By what mechanism do more dots mean less approximation IF only two dots can describe the sound 100% accurately?
I don't know this firsthand, but the great digital minds say that with only those two points they can tell you everything that is relevant within the passband.
You have a choice at a lower sample rate as the signal approaches the Nyquist limit: lower phase accuracy or lower amplitude. Which one you get depends on how the converters do time averaging.
Converters typically sample at a much higher rate, then average several samples together to get a value for a given chunk of time rather than using a single sample. If they use a single-sample method, you potentially get reduced amplitude. If they use the maximum positive/minimum negative value during the period, you get the amplitude, but the phase of the signal is shifted. If they use a moving average, you end up somewhere in between.
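To make those three options concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the 8x oversampling factor, the 20 kHz test tone, and the three picker functions (`single_sample`, `block_average`, `signed_peak`) are made-up names, and none of it is meant to describe what any real converter chip does.

```python
import math

OUT_RATE_HZ = 44100.0   # final sample rate
FACTOR = 8              # assumed oversampling factor (illustrative only)
TONE_HZ = 20000.0       # test tone close to the Nyquist limit
N_OUT = 64              # number of output samples to produce


def oversampled_sine(freq_hz, out_rate_hz, factor, n_out):
    """Unit-amplitude sine sampled at factor * out_rate_hz."""
    rate = out_rate_hz * factor
    return [math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(n_out * factor)]


def decimate(samples, factor, pick):
    """Reduce each block of `factor` samples to one value using `pick`."""
    return [pick(samples[i:i + factor])
            for i in range(0, len(samples), factor)]


def single_sample(block):
    # Keep the first sample in the block and ignore the rest.
    return block[0]


def block_average(block):
    # Average every sample in the block.
    return sum(block) / len(block)


def signed_peak(block):
    # Keep whichever sample is furthest from zero, sign included.
    return max(block, key=abs)


x = oversampled_sine(TONE_HZ, OUT_RATE_HZ, FACTOR, N_OUT)
pickers = {"single": single_sample,
           "average": block_average,
           "peak": signed_peak}

# Largest magnitude each strategy produced over the whole run.
peaks = {name: max(abs(v) for v in decimate(x, FACTOR, fn))
         for name, fn in pickers.items()}

for name, value in peaks.items():
    print(f"{name:8s} largest output magnitude: {value:.3f}")
```

The printed levels only compare the strategies inside this toy setup. The peak picker keeps the level because it grabs the largest value in each block, but that value can come from anywhere in the block, which is where the timing (phase) shift comes from.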
The reason is this: take a 1V sine wave at 22.05 kHz and sample it at 44.1 kHz, so you get two samples per cycle. If the samples land at the midpoint of the upward sweep, each one reads about .71V in magnitude. Now take another 22.05 kHz signal, this time at .71V, sample it at 44.1 kHz, and have the samples land on the peaks. You also get .71V for each sample. The two signals show up on your computer as identical even though one was supposed to be almost half again louder (about 41% higher in amplitude, or 3 dB).
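Here is a minimal Python sketch of that example, covering the case where the tone sits exactly at half the sample rate. The function name `sample_sine` and the 45 and 90 degree phase offsets are my own choices for illustration, and I use 0.7071 (the square root of 0.5) rather than a rounded .71 so the two sample sets match exactly.

```python
import math

FS = 44100.0        # sample rate in Hz
TONE_HZ = FS / 2    # 22.05 kHz, exactly half the sample rate


def sample_sine(amplitude, phase_deg, n_samples=8):
    """Sample amplitude * sin(2*pi*f*t + phase) at FS for n_samples points."""
    phase = math.radians(phase_deg)
    return [amplitude * math.sin(2 * math.pi * TONE_HZ * n / FS + phase)
            for n in range(n_samples)]


# 1V tone, samples landing halfway between zero crossing and peak.
loud = sample_sine(1.0, 45.0)

# 0.7071V tone, samples landing right on the peaks.
quiet = sample_sine(math.sqrt(0.5), 90.0)

for a, b in zip(loud, quiet):
    print(f"{a:+.4f}  {b:+.4f}")

# True if the two captures are indistinguishable sample for sample.
identical = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(loud, quiet))
print("captures identical:", identical)
```

Both columns alternate between the same positive and negative value, so nothing in the stored samples says which of the two tones was recorded.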
That's the problem as you approach the Nyquist point. Amplitude accuracy diminishes greatly, you get pumping for signals that approach half the sampling rate, etc. Now some folks will immediately jump on me for not mentioning that some of this is diminished by reconstruction filters, but realistically, no filter can create data that isn't there. If those two signals look the same when the computer captures them, they're going to play back the same, too.
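The pumping can be seen in the raw sample values with a short Python sketch. The 21 kHz tone and the 4-sample window are assumptions I picked for illustration, and this looks only at the stored samples, with no reconstruction filter applied.

```python
import math

FS = 44100.0         # sample rate in Hz
TONE_HZ = 21000.0    # steady unit-amplitude tone just below FS / 2
WINDOW = 4           # samples per short window (illustrative choice)
N_SAMPLES = 84       # 21 windows of 4 samples

samples = [math.sin(2 * math.pi * TONE_HZ * n / FS)
           for n in range(N_SAMPLES)]

# Largest sample magnitude inside each short window.
window_peaks = [max(abs(s) for s in samples[i:i + WINDOW])
                for i in range(0, N_SAMPLES, WINDOW)]

for i, peak in enumerate(window_peaks):
    print(f"window {i:2d}: peak {peak:.3f}")

print(f"lowest window peak:  {min(window_peaks):.3f}")
print(f"highest window peak: {max(window_peaks):.3f}")
```

The input tone never changes level, yet the window peaks swing from well under half scale to nearly full scale and back again.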
The big difference with doing the downsampling in software is that computers can compensate for what would otherwise be an irrecoverable encoding error if done in hardware. I'm not saying low-end downsampling code does this, just that it is possible in software, while in converter hardware it isn't really practical to do in real time with the same accuracy.