No. That school of thought has very few students.
For one thing you are confusing signal-to-noise ratio with theoretical dynamic range. At 16bit, you have a maximum theoretical dynamic range of 96dB (each bit buys you about 6dB, since amplitude doubles every 6dB, and 6*16=96), but if the source passed through 16bit converters, then you're going to have at least one bit's worth of noise, possibly two, which leaves you in the 84-90dB range in terms of S:N. The usable range of a 24bit system might extend to 110-120dB.
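If you want to sanity check that math, here's a quick Python sketch (the 20*log10 formula is the standard conversion; the "noisy bits" knob is just the ballpark estimate I'm making above):

import math

def dynamic_range_db(bits, noisy_bits=0):
    # 20*log10(2) is ~6.02dB, so each usable bit buys about 6dB of range.
    return 20 * math.log10(2 ** (bits - noisy_bits))

print(dynamic_range_db(16))      # ~96.3dB theoretical
print(dynamic_range_db(16, 2))   # ~84.3dB with two bits of converter noise
print(dynamic_range_db(24, 4))   # ~120.4dB, in the usable 24bit ballpark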
Also consider that as you go down the dB scale, you're using fewer and fewer bits to represent your variations in amplitude. In our perfect theoretical world, -90 to -96dB in a 16bit system is represented by one bit (i.e. two voltage levels). In a 24bit system, -90 to -96dB would be represented by 2^8 - 2^7 = 128 voltage levels.
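Same deal here; a little sketch of how many quantization levels fall in that bottom band, treating 6dB as exactly one bit and counting signed amplitudes (which is the convention that yields the 2^8 - 2^7 figure):

def levels_in_band(bits, db_hi, db_lo):
    # Treat 6dB as exactly one bit (really ~6.02dB, close enough here).
    full_scale = 2 ** (bits - 1)               # signed full-scale amplitude
    hi = full_scale >> round(abs(db_hi) / 6)   # amplitude at db_hi below full scale
    lo = full_scale >> round(abs(db_lo) / 6)   # amplitude at db_lo below full scale
    return hi - lo

print(levels_in_band(16, 90, 96))   # 1 -> a single step, two voltage levels
print(levels_in_band(24, 90, 96))   # 128 -> 2^8 - 2^7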
Now let's consider mathematical precision. You will not find a 16bit digital processor that processes internally at 16bit, even if it inputs & outputs at 16bit. Take the following example:
Let's work on a simple positive floating point scale of 0 to +1.0, and let's say that our input audio stream has a precision of 0.1 (i.e. one tenth). That means that the input audio stream will consist of samples from the set {0, 0.1, ..., 0.9, 1.0}. Not very precise of course, but it's easier to work with simple numbers.
Now let's say that we have a simple "fader" routine that changes the level of the input by multiplying by some user selectable value. Let's say the user has adjusted the "fader" such that the incoming stream will be multiplied by 0.3.
Taking into consideration only ONE sample, of say 0.5, we do our multiplication:
0.5 * 0.3 = 0.15
Now since we're working with an accuracy of only 0.1, the actual value arrived at by the "fader" will not be 0.15, but 0.1 (assuming truncation).
Now let's say that we pass that output (0.1) into another "fader" that has been set to 2.0:
0.1 * 2.0 = 0.2
Ok, that's all fine and dandy. But is 0.2 really what we want that sample to be? Let's say that our "faders" do internal processing with an accuracy of 0.01 and that the "faders" are connected using a signal chain of the same accuracy, and run that one more time:
0.5 * 0.3 = 0.15
0.15 * 2 = 0.3
Hmm, very interesting. So even though we started off with a sample that is only accurate to 0.1, using higher precision mathematics to process that sample resulted in a BETTER answer!
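If you want to watch that happen in code, here's a toy Python version; I'm using decimals so the arithmetic behaves exactly like the base-ten examples above, and the quantize helper is just something I made up for this post:

from decimal import Decimal, ROUND_DOWN

def quantize(x, step):
    # Truncate x down to the given step size (e.g. Decimal("0.1")).
    return x.quantize(step, rounding=ROUND_DOWN)

sample = Decimal("0.5")
tenth, hundredth = Decimal("0.1"), Decimal("0.01")

# Chain truncated to 0.1 between every stage:
low = quantize(quantize(sample * Decimal("0.3"), tenth) * 2, tenth)

# Same chain, but carried at 0.01 internally:
high = quantize(quantize(sample * Decimal("0.3"), hundredth) * 2, hundredth)

print(low)   # 0.2
print(high)  # 0.30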
Now that's all from a processing standpoint, but consider what happens when we increase the *real usable accuracy* of the input samples, and then process those samples at even higher accuracy. The result is simply more precise mathematics that will produce more accurately rounded or truncated samples when they are converted to their end bit depth. Without going through another boring example, it should be obvious enough.
Now on the topic of rounding and truncating, we come across the word dither. Yes, dither is certainly the addition of noise to a signal, but it's GOOD noise that results in a signal that more closely represents the original than rounding or truncation would. Consider the following sample stream:
0.15, 0.15, 0.15, 0.15, 0.15, 0.15
Not too exciting, but let's say we want to take the accuracy down to 0.1. If we truncate we end up with:
0.1, 0.1, 0.1, 0.1, 0.1, 0.1
...and if we round we end up with
0.2, 0.2, 0.2, 0.2, 0.2, 0.2
...but if we dither (using a simple method) we might get:
0.1, 0.2, 0.1, 0.2, 0.1, 0.2
...which more closely (or at least more pleasantly) approximates the original samples than rounding or truncation. Average those dithered samples over time and you get back to 0.15, which neither truncation nor rounding can claim.
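A dead simple way to pull that off in code (this is plain rectangular dither; real systems usually prefer triangular noise, but the principle is identical):

import random

def dither_to(x, step):
    # Add up to one step of noise, then truncate: values land on the two
    # nearest steps with odds that preserve the long-run average.
    return int((x + random.uniform(0, step)) / step) * step

stream = [0.15] * 6
print([round(dither_to(s, 0.1), 1) for s in stream])
# e.g. [0.1, 0.2, 0.1, 0.1, 0.2, 0.2] -- averages back out to ~0.15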
In the most extreme cases, dithering will literally sound like noise. In the usual case, however, dither will simply produce a better sound...not a noisier sound in the traditional sense.
And when we consider that in even a good 16bit system you're going to have about 1 bit's worth of noise (the bad kind of noise), using dither to go from 24bit (an analog source sampled with 24bit converters) to 16bit will actually produce a BETTER 16bit file!
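For the curious, the 24bit-to-16bit reduction with dither looks something like this sketch (triangular/TPDF dither spanning one 16bit LSB is the textbook approach; the function name and numbers here are just mine for illustration):

import random

def dither_24_to_16(sample24):
    # Triangular dither spanning +/- one 16bit LSB (256 counts at 24bit).
    noise = random.randint(0, 255) + random.randint(0, 255) - 255
    # Add the noise, then drop the bottom 8 bits.
    out = (sample24 + noise) >> 8
    # Clamp to the signed 16bit range in case we pushed past full scale.
    return max(-32768, min(32767, out))

print(dither_24_to_16(1000000))  # ~3906, give or take one LSB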
All that said, it is possible to have a 16bit system that sounds better than a 24bit system. This is the real world we're talking about here. However, a properly designed 24bit system will produce better results than a properly designed 16bit system. Starting off with 24bits of precision (albeit lower in the real world) will result in a greater amount of achievable detail when you start doing DSP, AND you'll carry that detail through, because you're dithering down to 24bits instead of 16bits for your output stream, which means that any continued processing can be done with less loss of resolution.
Holy shit it's really late, and I'm very much just rambling. I hope this makes sense because I don't have the time to go up and read what I just puked up.
Slackmaster 2000