OK, here it is, a couple of days late, sorry about that, but replete with clips, waveforms, & charts!
Before I revisit quantization distortion (QD), I want to take another look at sample rate, frequency response, and content. No one seems to believe poor apl or Mr. Lavry when they show mathematically that the Nyquist theorem holds: a bandwidth-limited signal can be reconstructed exactly from samples taken at more than twice its highest frequency. Again, if Nyquist is wrong, this is Nobel Prize material. I suspect the difficulty is that most people, myself included, haven't taken the higher math classes needed to truly understand the equations. I mean, I took college calculus and I can follow the graphs, but that's really not enough for me to prove or disprove it mathematically.
Mathematicians may eschew experiments (they don't like to get themselves all sullied, except perhaps with chalk, or grease pens, or whiteboard markers, or whatever they use in class these days), but I don't mind!
So, what about the way that waves look and sound at different sample rates? Here's a series of three files, each containing a mix of four high-frequency sine waves at staggered intervals. The first is the original file, generated at 96kHz.
Original 96kHz wave
And a look at its time-domain waveform:
And its frequency-domain FFT:
Looks pretty good, eh? It should: these frequencies are all below one quarter of the sample rate (24kHz).
Now, what happens if we downsample to 44.1kHz?
44.1kHz downsample
Looks like Atari 2600 quality graphics, doesn't it? But what about the frequency domain?
Hmmm. OK, one quick note: I peak normalized the downsampled wave so it would be easier to compare the waveforms visually. Sorry about that; it skews the FFT a bit. Subtract 1dB from all the peaks (actually, the entire line) and you'll be back to the original levels.
What we see is what the theory predicts: not only the 1dB attenuation of the lowest peak, but attenuation that grows as the frequency increases, because the resampler's anti-aliasing low-pass filter has to roll off before the new 22.05kHz Nyquist limit. You can show this effect not only with a few sine waves, but with white noise, program material, anything full-range; it's a consistent result. The sine waves have the advantage of letting you directly measure any resulting sideband distortion.
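If you'd like to see where that rising attenuation comes from without a resampler handy, here is a minimal sketch. It assumes a plain Hann-windowed sinc low-pass with 63 taps and a 20kHz cutoff at 96kHz, which is my own illustrative choice, not what any particular resampler (Wavelab's included) actually uses. It prints the filter's gain at a few frequencies, and the loss grows as you approach the cutoff.

```python
import math


def lowpass_taps(cutoff_hz, fs, num_taps=63):
    """Hann-windowed sinc low-pass FIR (illustrative parameters), unity gain at DC."""
    m = num_taps - 1
    fc = cutoff_hz / fs  # cutoff as a fraction of the sample rate
    taps = []
    for n in range(num_taps):
        k = n - m / 2  # distance from the center tap
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / m)  # Hann window
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]


def gain_db(taps, freq_hz, fs):
    """Magnitude response of the FIR at a single frequency, in dB."""
    w = 2 * math.pi * freq_hz / fs
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(math.hypot(re, im))


if __name__ == "__main__":
    fs = 96_000
    taps = lowpass_taps(20_000, fs)
    for f in (1_000, 10_000, 15_000, 18_000, 20_000):
        print(f"{f:>6} Hz: {gain_db(taps, f, fs):7.2f} dB")
```

A real resampler uses a much longer, carefully designed filter so the roll-off is steeper, but the shape of the story is the same: the closer a component sits to the cutoff, the more it loses.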
Finally, let's convert our 44.1kHz downsample back to 96kHz, and have a listen/look:
96kHz upsample
A little more attenuation there, again due to necessary filter behavior. But visually, the waveform looks all nice again.
In conclusion, this experiment showed that it is indeed possible to recreate a waveform accurately, less some high-frequency attenuation due to necessary filter behavior, so long as the material is bandwidth-limited.
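For the mathematically inclined, the reconstruction the theory promises is the Whittaker-Shannon interpolation formula, x(t) = sum over n of x[n] * sinc(fs*t - n). Here is a minimal sketch of it (my illustration only; real converters and resamplers use finite, windowed filters instead). It samples a 15kHz sine at 44.1kHz and rebuilds the waveform at points between the samples. The infinite sum is truncated to the 10,001 nearest samples, an arbitrary choice, so expect a small residual error.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(t, signal, fs, half_width=5000):
    """Whittaker-Shannon interpolation of x(t) from the samples x(n / fs).

    The ideal formula sums over every sample; this sketch keeps only the
    2 * half_width + 1 samples nearest t, which leaves a small truncation error.
    """
    center = round(t * fs)
    return sum(
        signal(n / fs) * sinc(fs * t - n)
        for n in range(center - half_width, center + half_width + 1)
    )


def tone(t):
    """A 15kHz sine: below the 22.05kHz Nyquist limit of 44.1kHz sampling."""
    return math.sin(2 * math.pi * 15_000 * t)


if __name__ == "__main__":
    fs = 44_100.0
    for position in (100.37, 250.5, 1234.81):  # positions measured in sample periods
        t = position / fs  # a time that falls between two samples
        print(f"t = {position} samples: rebuilt {reconstruct(t, tone, fs):+.5f}, "
              f"true {tone(t):+.5f}")
```

The rebuilt values land on the true sine even though only about three samples per cycle were stored, which is the whole point: band-limited means the samples are enough.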
.
.
.
Hey! Those were test signals again! Yes, you are correct, I promised real-world audio, and here it is!
The pipe organ samples! Here I am testing QD, but you could use these samples to retest what I did above, especially on the highest-pitched pipe. But I am getting ahead of myself; first, the pipes!
OK, these are three pipes pitched at different As, although I made no attempt to tune them: organ pipes have to be tuned to a specific wind pressure, and since I am blowing these by mouth, the pressure is going to vary a bit anyway. I'm not playing them together, so tuning isn't important here.
The first pipe is a principal, in the 2' octave. Principals sit dead in the middle of the pipe organ's timbral range, with more overtones than the flutes and fewer than the reeds. Principals are often used as the display rank; organ pipes actually come in all shapes and sizes, but if you walk into a hall with a pipe organ, chances are the rank you see is the principals. The rest of the pipes are hidden behind them.
So pipe #1 is mellow, with lots of overtones.
Pipe #2 is a flute d'amour, pitched two octaves higher. This is a wooden, stopped pipe, with fewer overtones than the principal.
Pipe #3 is another principal, an octave higher than pipe #2. Actually, in the highest registers all the pipes start to sound kinda the same, I guess because it's hard to build really weensy pipes in the various constructions. So wood ranks often have a metal top octave, and reed ranks might have a metal flue rank on top as well.
Enough about organs. The recording itself: each pipe was mouth-blown and recorded using a flat-response, omnidirectional, measurement-style mic with frequency response extending beyond 20kHz. The pipes were about 6" from the mic. The mic fed an ART Digital MPA, using its internal converters, which actually have quite a good A/D IC. I used the close distance to maximize the signal-to-electrical-noise ratio. As I discussed in an earlier post, noise, whether analog or added digitally, will dither away QD, and since we want to see whether we can generate QD, let's keep noise to the minimum possible! You will also hear something that sounds windy or staticky. That's air moving! Partly that's the nature of organ pipes, and partly it's because I don't actually put these things in my mouth, since they are made of lead! They are cupped in my hand, and some air escapes between my fingers.
I also discussed earlier that it's impossible to directly record QD with any 24 bit converter, because every 24 bit converter's own noise is sufficient to dither the signal. So my quest to generate QD won't happen at the point of recording; it will happen when truncating that signal to 16 bit, which must be done to produce an audio CD.
We know from wado's discussion, and from my previous experiment with noise-free test signals, that it's actually quite easy to generate very objectionable QD with fairly high-level signals. The test files I posted showed rather disgusting QD with a -54dBFS signal truncated to 16 bit. The big reason QD is so nasty is that it isn't harmonic distortion: the high-order products of the truncation fold back (alias) below Nyquist at frequencies unrelated to the musical harmonics.
So QD must be eliminated at every processing step to avoid that fate. For example, if you apply a fade, your DAW had better dither the result if it's returning to 24 bit. In fact, I would think all of them do these days; otherwise the flaw would be discovered quickly.
Therefore, if we want to go looking for QD, we have to break a process on purpose. Truncation to 16 bit is the best candidate, because it's both necessary and, in many DAWs, a manual process; by that I mean the DAW won't automatically apply dither. But some will, so be careful! I am using Wavelab 5, which does allow dither-free truncation when saving a 24 bit wave as a 16 bit file (WL expects you to apply dither when rendering the master bus, and you get to pick which dither to use). This is clearly seen in my previous test files, where I generated the QD on truncation.
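Here's the arithmetic behind that claim, as a small sketch. The converter noise floor of -115dBFS below is a hypothetical round number I picked for illustration; real converters vary, so check the datasheet. For comparison, standard TPDF dither (±1 LSB peak) has an RMS level of about 0.41 LSB.

```python
import math


def lsb_dbfs(bits: int) -> float:
    """Size of one LSB relative to full-scale peak for signed PCM, in dB."""
    return 20 * math.log10(2.0 ** -(bits - 1))


def noise_in_lsbs(noise_dbfs: float, bits: int) -> float:
    """Express an RMS noise level (dBFS) as a multiple of one LSB."""
    return 10 ** ((noise_dbfs - lsb_dbfs(bits)) / 20)


if __name__ == "__main__":
    converter_noise_dbfs = -115.0  # HYPOTHETICAL analog noise floor, for illustration
    for bits in (24, 16):
        print(f"{bits} bit: 1 LSB = {lsb_dbfs(bits):.1f} dBFS, "
              f"noise = {noise_in_lsbs(converter_noise_dbfs, bits):.2f} LSB RMS")
```

With those assumed numbers the noise is roughly 15 LSBs RMS at 24 bit, far more than dither needs, but only about 0.06 LSB at 16 bit, which is why truncation to 16 bit is where QD can show up.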
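To see what dither buys you when a processed value goes back to a fixed word length, here is a minimal sketch using TPDF dither, one common choice (I'm not claiming it's what any particular DAW uses). Values are in units of output LSBs. A constant detail of 0.3 LSB simply vanishes under truncation, while with dither its average survives, traded for a small amount of benign noise.

```python
import math
import random


def truncate(x: float) -> int:
    """Requantize with no dither: drop the fraction (round toward minus infinity)."""
    return math.floor(x)


def tpdf_requantize(x: float, rng: random.Random) -> int:
    """Add TPDF dither (triangular PDF, +/-1 LSB peak), then round to nearest."""
    dither = rng.random() - rng.random()  # difference of two uniforms is triangular
    return math.floor(x + dither + 0.5)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so runs are repeatable
    detail, n = 0.3, 100_000  # a constant detail smaller than one output LSB
    truncated = sum(truncate(detail) for _ in range(n)) / n
    dithered = sum(tpdf_requantize(detail, rng) for _ in range(n)) / n
    print(f"truncated mean: {truncated:.3f}, dithered mean: {dithered:.3f}")
```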
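In integer terms, dither-free truncation from 24 to 16 bit just throws away the low 8 bits. Here is a minimal sketch of that operation (an illustration of plain truncation, not Wavelab's code), applied to a -54dBFS sine like the one in my earlier test files, to show how few 16 bit levels such a quiet signal gets to use.

```python
import math


def truncate_24_to_16(sample: int) -> int:
    """Keep the top 16 bits of a signed 24-bit sample by discarding the low 8.

    Python's >> on negative ints rounds toward minus infinity, the same as an
    arithmetic shift on two's-complement hardware: plain truncation, no dither.
    """
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("not a signed 24-bit value")
    return sample >> 8


if __name__ == "__main__":
    peak_24 = round((2 ** 23 - 1) * 10 ** (-54 / 20))  # -54dBFS peak, in 24-bit units
    sine_24 = [round(peak_24 * math.sin(2 * math.pi * k / 1000)) for k in range(1000)]
    levels = {truncate_24_to_16(s) for s in sine_24}
    print(f"24-bit peak: {peak_24}, distinct 16-bit levels used: {len(levels)}")
```

That works out to roughly 130 distinct levels for the whole waveform, so the truncation error is a coarse, signal-dependent staircase rather than random noise.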
OK, without further ado, the sample files. Here is the source file, the original recording, processed only to silence (digital black) and trim the empty space between the samples:
Organ pipe original
You may wish to skip that download; I processed that file further for the test, and it's just there for full disclosure.
The reason further processing is necessary: as I have said, it's actually kinda hard to produce a real-world signal that causes QD. I've tried before and failed, using sources like a single piano note that fades into oblivion. The problem with that test is that the acoustic/electrical noise doesn't fade into oblivion, so as the signal decreases in amplitude, so does the signal-to-noise ratio, and the relatively louder noise dithers away any QD.
But if I take a signal with a high signal-to-noise ratio and apply a digital fade, the acoustic/electrical noise is faded along with the signal, and thus should be less capable of dithering away the QD. What we are looking for is inharmonic distortion that remains roughly constant as the signal amplitude decreases, resulting in a progressively worse signal-to-distortion ratio (nasty-sounding fadeouts!).
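Here's the arithmetic behind that idea, as a small sketch. A digital fade multiplies the signal and the recorded noise by the same gain, so their ratio never changes, but the noise measured in 16 bit LSBs keeps shrinking. The -80dBFS noise floor is a hypothetical figure for illustration, and the half-LSB threshold in the comment is only a commonly quoted rule of thumb for when Gaussian noise stops behaving like adequate dither, not a hard limit.

```python
import math

LSB16_DBFS = 20 * math.log10(2.0 ** -15)  # one 16-bit LSB, about -90.3dBFS


def noise_rms_lsb16(noise_dbfs: float, fade_gain_db: float) -> float:
    """RMS of the recorded noise, in 16-bit LSBs, after a digital gain change."""
    return 10 ** ((noise_dbfs + fade_gain_db - LSB16_DBFS) / 20)


if __name__ == "__main__":
    noise_dbfs = -80.0  # HYPOTHETICAL acoustic/electrical noise floor of the recording
    for gain_db in (0, -6, -12, -24, -48):
        lsb = noise_rms_lsb16(noise_dbfs, gain_db)
        # Rule of thumb only: well below ~0.5 LSB RMS, the noise no longer dithers well.
        verdict = "still dithers" if lsb >= 0.5 else "too quiet to dither"
        print(f"fade {gain_db:>4} dB: noise = {lsb:.3f} LSB RMS ({verdict})")
```

With these assumed numbers the noise is about 3.3 LSBs RMS at full level but drops to about 0.2 LSB after a 24dB fade, which is why the tail of a digital fade is where truncation should expose QD. A piano note that decays naturally never gets this treatment, because its noise floor stays put.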
So here's the same file with fades applied and the very last bit of breath running out trimmed off, in 24 bit:
Organ pipe fade
And truncated to 16 bit (if you wish to save yourself this download, do the truncation on the 24 bit file yourself):
Organ pipe trunc
And with dither applied, still at 24 bit:
Organ pipe dither
And the dithered 16 bit file (again, you can save yourself a download here):
Organ pipe dither 16 bit
I have my observations on those files, but I will save them for now, except for one: listen to, and look at, a section between the samples on the dithered files, where there is nothing but the dither. Dither routines are normally noise-shaped, which means the noise is weighted towards very high frequencies, where people can't easily hear it. So while the integrated (RMS) noise measures something like -90dBFS, here is what the dither looks like in the frequency domain:
And while you are looking at that and listening to the samples, consider what that implies for the effective dynamic range of 16 bit audio, both for your ability to hear signal below the -90dBFS noise rating and for any QD that may exist.
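Two small sketches bear on that question. The first is a first-order error-feedback noise shaper with TPDF dither: it pushes the requantization noise toward high frequencies by making the total error a first difference of the per-sample error. Real mastering dithers generally use higher-order, psychoacoustically weighted shaping, so treat this as a simplified illustration, not necessarily what Wavelab does. The second is the bookkeeping for why an FFT plot can show detail well below a -90dBFS RMS noise figure: a noise floor's total power is spread across all the FFT bins, so each bin sits far lower (this assumes white noise and ignores the window's noise bandwidth).

```python
import math
import random


def requantize(samples, rng, shape=True):
    """Round to integer LSBs with TPDF dither, optionally with 1st-order noise shaping.

    With shaping, each sample's error e[n] is subtracted from the next input,
    so the output is x[n] + e[n] - e[n-1]: the error passes through (1 - z^-1),
    whose gain 2*|sin(pi*f/fs)| is small at low f and largest near fs/2.
    """
    out, prev_err = [], 0.0
    for x in samples:
        u = x - prev_err if shape else x
        y = math.floor(u + rng.random() - rng.random() + 0.5)
        prev_err = y - u
        out.append(y)
    return out


def lag1_correlation(e):
    """Normalized lag-1 autocorrelation: about 0 for white, -0.5 for a first difference."""
    mean = sum(e) / len(e)
    c0 = sum((v - mean) ** 2 for v in e)
    c1 = sum((e[i] - mean) * (e[i + 1] - mean) for i in range(len(e) - 1))
    return c1 / c0


def per_bin_db(total_noise_dbfs, fft_size):
    """Per-bin level of white noise whose power spreads over fft_size / 2 bins."""
    return total_noise_dbfs - 10 * math.log10(fft_size / 2)


if __name__ == "__main__":
    rng = random.Random(5)
    fs = 44_100
    x = [10.3 * math.sin(2 * math.pi * 440 * n / fs) for n in range(50_000)]
    for shape in (False, True):
        err = [y - v for y, v in zip(requantize(x, rng, shape), x)]
        print(f"shaped={shape}: lag-1 correlation of error {lag1_correlation(err):+.2f}")
    print(f"-90dBFS of white noise over an 8192-point FFT: "
          f"{per_bin_db(-90.0, 8192):.1f}dB per bin")
```

With an 8192-point FFT, noise totaling -90dBFS works out to roughly -126dB per bin, and shaped dither lowers the low-frequency bins further still. That is why a dithered 16 bit file can carry tones that sit below its RMS noise rating, and why your ear, which listens in narrow bands rather than all at once, may still pick them out.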