I can't SEE the difference?!

Thread starter: icanhasanswers (New member)
My question is pretty straightforward: when I export 16-bit/44.1kHz and 24-bit/96kHz versions of the same track, and reload those back into my sequencer, I can't SEE any difference between the two. Why is that?

I stack them on top of each other, and zoom all the way in (Cubase shows the contour of the track underneath as well as on top), and the contours are precisely the same. I DO, however, see a difference between an mp3 file and a WAV file (albeit a very small one) - and mostly the contour seems to diverge near the end of the track (it stays the same for most of the track).

Is there a sensible reason why the contours of these files would be the same? Are the differences so small that they would be visible only in the sample editor?
 
Are the differences so small that they would be visible only in the sample editor?
Unless you are looking at a zoom level where you can see the individual 96kHz samples, no, I imagine you wouldn't see a difference.

And if none of your source tracks have audio information up that high (mics that pick up well above 20khz, samplers with high-res samples), the difference might only be the smallest amount of noise.

Don't know for sure, but logically that makes sense to me.
 
....?

Why would you expect to see a difference?

Why would you expect to even hear a difference for that matter?
 
....?

Why would you expect to see a difference?

Why would you expect to even hear a difference for that matter?

I was under the impression that a higher bit depth allowed for an increased dynamic range and signal-to-noise ratio. I would then assume this would be reflected in the waveform itself and, consequently, what is translated into sound.

Is that not the case?
 
No. Even 1-bit audio can represent audio quite accurately :p
 
Chibi Nappa - thanks for your thoughtful response.

noisewreck and Massive Master - will either one of you please elaborate on what you mean? What you are saying does not relate to what I have come to understand.

What I know: sample-rate means how often a sample is taken; because samples cannot be taken infinitely quickly, it seems to me that the more often a sample is taken, the more 'accurate' the representative waveform.

And the higher the bit-depth, the greater the dynamic range (and consequently, signal-to-noise ratio). I understand that each bit affords ~6dB in dynamic range.
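The ~6dB-per-bit figure follows directly from the math of linear PCM: each extra bit doubles the number of quantization steps, and doubling an amplitude ratio is about 6.02dB. A quick Python sketch (illustrative only, names are mine):

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: full scale over one
    quantization step, in dB -- about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

print(f"16-bit: {dynamic_range_db(16):.1f} dB")   # ~96.3 dB
print(f"24-bit: {dynamic_range_db(24):.1f} dB")   # ~144.5 dB
```

So the extra bits lower the noise floor; they don't change the shape of a waveform that's well above that floor, which is why the overview contours match.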

Unless by "Even 1-bit audio can represent audio quite accurately" you're referring to perfect representation of a very specific source input, one whose dynamic range is very small; which I am obviously not referring to, because I'm talking about recording in a general sense - a variety of sources - and most probably music. I would guess that 99% of the planet would assume that's what was meant - the remaining 1% being the types trying to be difficult or snide on purpose.
 
What I know: sample-rate means how often a sample is taken; because samples cannot be taken infinitely quickly, it seems to me that the more often a sample is taken, the more 'accurate' the representative waveform.
That's not really quite how it works. Digital sampling is based upon a very important principle in information theory called the Nyquist theorem. It's more than theory; it's the basis for virtually all modern digital communication technology. What Nyquist states is that for a limited-bandwidth signal (for example, an audio signal limited to a range of 20Hz - 20kHz), one needs a sample rate of twice the highest frequency in order to be able to losslessly (meaning 100% accurately) recreate that signal. This frequency is called the Nyquist frequency.

Therefore, the Nyquist frequency for audio up to 20kHz is 40kHz. Because of some realities of physical technology, we actually need to push the sample rate just a little bit in order to avoid certain kinds of distortion, which is why the Engineers That Be selected 44.1kHz as the main sample rate for audio instead of pure 40kHz.

What's important to remember is this sample rate is mathematically proven to be enough to losslessly/accurately recreate any and all frequencies up to 20kHz. Any higher sample rates are simply unnecessary unless you want to increase the frequency range. 96kHz - as a sample rate in and of itself - does not reproduce 20kHz any more accurately than 44.1kHz does. Period. This is why the theoretical waveform you see in your editor looks more or less the same.
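That lossless recreation isn't hand-waving; it's the Whittaker-Shannon interpolation formula. A minimal Python/NumPy sketch (the sum is truncated to a finite window, so the match is only as exact as the window is long; the 15kHz tone is just an example):

```python
import numpy as np

fs = 44_100.0
f0 = 15_000.0                        # tone well below Nyquist (22.05 kHz)
n = np.arange(-2000, 2000)           # sample indices around t = 0
x = np.sin(2 * np.pi * f0 * n / fs)  # the stored samples

def reconstruct(t):
    """Whittaker-Shannon interpolation: what the samples mathematically
    define *between* the sample instants (truncated to 4000 terms)."""
    return np.sum(x * np.sinc(fs * t - n))

# Evaluate at instants that fall between the stored samples; the
# interpolated values land back on the original continuous sine:
for frac in (0.25, 0.5, 0.75):
    t = frac / fs
    print(reconstruct(t), np.sin(2 * np.pi * f0 * t))
```

The interpolated values agree with the original sine to within the truncation error, even though 44.1k puts fewer than three samples on each cycle of a 15kHz tone.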

Then why do so many people hear differences between sample rates? That's almost certainly not because of the sample rate itself, but rather a function of the quality of the converter circuitry and design itself. Some converters simply operate and perform "better" at some sample rates than they do at others, much like some automobile engines have their power bands at different RPMs than others. It's not that the sample rate itself makes a difference, it's that the *converter* operates better (lower distortion, less jitter, things like that) when switched to operate at that speed. Anything above and beyond that is a psychosomatic reaction to marketing hype.

So, don't be surprised that you can't see a difference. There should not really be one. Also, don't be surprised if whatever difference you hear does not always follow the sample rate lockstep, because that's up to your ears and your converters far more than it is your sample rate.

HTH,

G.
 
That's not really quite how it works. Digital sampling is based upon a very important principle in information theory called the Nyquist theorem. It's more than theory; it's the basis for virtually all modern digital communication technology. What Nyquist states is that for a limited-bandwidth signal (for example, an audio signal limited to a range of 20Hz - 20kHz), one needs a sample rate of twice the highest frequency in order to be able to losslessly (meaning 100% accurately) recreate that signal. This frequency is called the Nyquist frequency.

Therefore, the Nyquist frequency for audio up to 20kHz is 40kHz. Because of some realities of physical technology, we actually need to push the sample rate just a little bit in order to avoid certain kinds of distortion, which is why the Engineers That Be selected 44.1kHz as the main sample rate for audio instead of pure 40kHz.

What's important to remember is this sample rate is mathematically proven to be enough to losslessly/accurately recreate any and all frequencies up to 20kHz. Any higher sample rates are simply unnecessary unless you want to increase the frequency range. 96kHz - as a sample rate in and of itself - does not reproduce 20kHz any more accurately than 44.1kHz does. Period. This is why the theoretical waveform you see in your editor looks more or less the same.

Then why do so many people hear differences between sample rates? That's almost certainly not because of the sample rate itself, but rather a function of the quality of the converter circuitry and design itself. Some converters simply operate and perform "better" at some sample rates than they do at others, much like some automobile engines have their power bands at different RPMs than others. It's not that the sample rate itself makes a difference, it's that the *converter* operates better (lower distortion, less jitter, things like that) when switched to operate at that speed. Anything above and beyond that is a psychosomatic reaction to marketing hype.

So, don't be surprised that you can't see a difference. There should not really be one. Also, don't be surprised if whatever difference you hear does not always follow the sample rate lockstep, because that's up to your ears and your converters far more than it is your sample rate.

HTH,

G.

Two quick little things jump out at me:

(01) It's actually called the "Nyquist-Shannon Sampling Theorem": Nyquist laid the groundwork for it, but I believe Shannon's work is what codified it into what we know today.

(02) Technically speaking, you need to sample at a rate higher than the Nyquist frequency to ensure proper recreation. This rate increase can be infinitesimally small, but it is a necessary aspect of the theorem. This aspect is often stated incorrectly (and I only really bring it up because it's kind of a pet peeve of mine).

Actually, when you think about it, this kind of makes intuitive sense, because sampling at exactly 2x the frequency of a pure-tone sine wave could result in a regular sampling of all the zero points.
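That intuition is easy to demonstrate: sample a sine at exactly twice its frequency, with the phase aligned the wrong way, and every sample lands on a zero crossing. A quick NumPy sketch (the 1kHz tone is arbitrary):

```python
import numpy as np

f0 = 1_000.0
fs = 2 * f0                   # sampling at *exactly* twice the frequency
n = np.arange(20)
samples = np.sin(2 * np.pi * f0 * n / fs)   # = sin(pi * n)
print(np.max(np.abs(samples)))  # every sample hits a zero crossing (~0)
```

The sampled data is all (numerically) zero, so the tone vanishes entirely; this is exactly why the theorem demands strictly *greater than* 2x.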
 
I can't SEE any difference between the two. Why is that?

Besides all the other explanations, I'm pretty sure the "waveform view" in most DAW programs is an 8-bit representation shown as a GIF file. If the waveform graphic had higher resolution, the files would be as large as the Wave data itself! As a test, I just loaded a 51 MB Wave file into Sound Forge, and the .sfk graphic it created is only 114 KB. So for this reason alone it makes sense that fine detail is not shown.
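The reduced "peak file" idea is easy to sketch: store one min/max pair per block of samples and draw that instead of the audio. This is pure NumPy and purely illustrative; the block size and reduction scheme here are my own guesses, not Sound Forge's actual .sfk format:

```python
import numpy as np

def overview(samples, block=1024):
    """Reduce audio to one (min, max) pair per block -- the kind of
    small 'peak file' a DAW draws instead of the raw sample data."""
    trimmed = samples[: len(samples) // block * block]
    blocks = trimmed.reshape(-1, block)
    return blocks.min(axis=1), blocks.max(axis=1)

audio = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)  # 1 s of audio
lo, hi = overview(audio)
print(len(audio), "samples drawn as", len(lo), "min/max pairs")
```

At that reduction ratio, two files that differ by a few quantization steps per sample produce identical overviews.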

--Ethan
 
Unless by "Even 1-bit audio can represent audio quite accurately" you're referring to perfect representation of a very specific source input, one whose dynamic range is very small; which I am obviously not referring to, because I'm talking about recording in a general sense - a variety of sources - and most probably music.

He was probably talking about Direct Stream Digital (like an SACD uses). You sample 1 bit at something crazy like 2,800 kHz. I haven't wrapped my head around the math yet, but it's something like a dense cluster of sampled 1's means the audio is at a peak and a dense cluster of 0's means the audio is at a trough.

A whole mess of uninterrupted 1's means you are at the top of a wave. A bunch of 1's with some 0's sprinkled in here and there means you are above the zero crossing but not at the top of the scale yet. Vice versa for troughs.



But all of that is something totally different from PCM, which is what your question was about.
 
(02) Technically speaking, you need to sample at a rate higher than the Nyquist frequency to ensure proper recreation. This rate increase can be infinitesimally small, but it is a necessary aspect of the theorem. This aspect is often stated incorrectly (and I only really bring it up because it's kind of a pet peeve of mine).
Correct me if I'm wrong, or if you're referring to something else, but my understanding is that the increase, which I did touch upon, is not because of the actual sampling math of the theorem, but because of the physics involved. The fact that we cannot construct a brick wall low-pass filter that strictly limits the bandwidth of the signal at exactly half the sampling frequency is what leads to aliasing distortion, and it's largely for this reason that the actual sampling frequency has to be slightly incremented.

That is, if there were such a thing in physical reality as a brick wall filter that allowed everything below 20kHz to be passed 100% unhindered and everything above 20kHz to be 100% blocked, then the increment in sampling rate would not be necessary. This is one reason why it's important to express the theorem as applying to a limited-bandwidth signal. But - by today's technology, anyway - we can't physically do that; some stuff above 20k will leak through the filter, requiring the sample rate to be increased to handle the extra frequencies.

But we're really discussing the wallpaper when it's what's in the room itself that's important for the explanation. The point is that a significantly higher sample rate (such as 96k, just for example) is not going to - due to the sample rate itself - make a more accurate copy of (to steer clear of the technical arguments at the edge) a 15k max waveform than a 44.1k sample rate will. Any increase in "resolution" is, in fact, an increase in frequency response, not an increase of accuracy at a set frequency.

*You may be right about the hyphenated name to ID the extension of the theorem to describe the implementation of sampling. And in fact, if you really wanted to get picky about it, Nyquist's work was not out of the blue, but was based upon other work published before his. But fairly or unfairly (depending upon how you look at it) the key frequency is virtually always referred to as the Nyquist frequency and not the Nyquist-Shannon frequency.

I have a similar peeve about the "Fletcher-Munson" name being applied to much more recent ISO-accepted hearing response curves which, while similar, are not the original Fletcher-Munson results at all. It's unfortunate that history has a tendency to over-simplify nomenclature.


G.
 
But, keep in mind that a square wave above 7,400 Hz will come out as a pure sine wave, since a 44.1 brickwall filter will chop off any odd harmonics above that frequency. And Ethan is absolutely correct about the "waveform view".
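Harvey's point can be checked by building a band-limited square wave from its Fourier series, keeping only the harmonics that fit below Nyquist. A Python/NumPy sketch (the 10kHz fundamental is an example; real anti-alias filtering works differently, but the surviving spectrum is the same):

```python
import numpy as np

def bandlimited_square(f0, fs, t):
    """Fourier series of a square wave (odd harmonics, 1/k amplitudes),
    truncated to harmonics below the Nyquist frequency fs/2."""
    out = np.zeros_like(t)
    k = 1
    while k * f0 < fs / 2:
        out += (4 / np.pi) * np.sin(2 * np.pi * k * f0 * t) / k
        k += 2
    return out

fs = 44_100.0
t = np.arange(1024) / fs
sq = bandlimited_square(10_000.0, fs, t)
# The 3rd harmonic (30 kHz) is above Nyquist, so only the 10 kHz
# fundamental survives: the "square" is just a sine of amplitude 4/pi.
```

For fundamentals below ~7.35kHz the 3rd harmonic fits under 22.05kHz and the loop keeps more terms, so the wave regains its square-ish shape.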
 
Correct me if I'm wrong, or if you're referring to something else, but my understanding is that the increase, which I did touch upon, is not because of the actual sampling math of the theorem, but because of the physics involved. The fact that we cannot construct a brick wall low-pass filter that strictly limits the bandwidth of the signal at exactly half the sampling frequency is what leads to aliasing distortion, and it's largely for this reason that the actual sampling frequency has to be slightly incremented.

That is, if there were such a thing in physical reality as a brick wall filter that allowed everything below 20kHz to be passed 100% unhindered and everything above 20kHz to be 100% blocked, then the increment in sampling rate would not be necessary. This is one reason why it's important to express the theorem as applying to a limited-bandwidth signal. But - by today's technology, anyway - we can't physically do that; some stuff above 20k will leak through the filter, requiring the sample rate to be increased to handle the extra frequencies.

My understanding of it is that these are separate issues. I.e., you're totally right about the physical domain having these restrictions, but the theoretical requirements also need more than 2x the Nyquist frequency. It's been a few years since college, but that's what I recall from my courses (almost all of which were theory-based rather than implementation-based).

It's likely that the physical restraints are the real reason we have formats spec'd the way we do, and I'm not going to argue that one for you.

I just double checked one of my theory books on the shelf (Bosi/Goldberg) and it explicitly uses the greater than sign (">") rather than "greater than or equal to" to express the theorem. So my thought is that the theoretical issue and the physical issue aren't necessarily related, though they have similar constraints.


That is, if there were such a thing in physical reality as a brick wall filter that allowed everything below 20kHz to be passed 100% unhindered and everything above 20kHz to be 100% blocked, then the increment in sampling rate would not be necessary. This is one reason why it's important to express the theorem as applying to a limited-bandwidth signal. But - by today's technology, anyway - we can't physically do that; some stuff above 20k will leak through the filter, requiring the sample rate to be increased to handle the extra frequencies.

I'm not sure if this is an accurate assessment or not. Sampling a genuinely band-limited signal doesn't, by itself, imply filtering to me (of course, there aren't any in the real world...), so I'm not sure I'm following your point.


But we're really discussing the wallpaper when it's what's in the room itself that's important for the explanation. The point is that a significantly higher sample rate (such as 96k, just for example) is not going to - due to the sample rate itself - make a more accurate copy of (to steer clear of the technical arguments at the edge) a 15k max waveform than a 44.1k sample rate will. Any increase in "resolution" is, in fact, an increase in frequency response, not an increase of accuracy at a set frequency.

Agreed. Like I said, it's just a pet peeve of mine so I feel compelled to clarify. I can at least claim that there's less of a chance of making a fool of yourself in front of signal processing experts if you pay attention to that detail :).


*You may be right about the hyphenated name to ID the extension of the theorem to describe the implementation of sampling. And in fact, if you really wanted to get picky about it, Nyquist's work was not out of the blue, but was based upon other work published before his. But fairly or unfairly (depending upon how you look at it) the key frequency is virtually always referred to as the Nyquist frequency and not the Nyquist-Shannon frequency.

My understanding is that the theorem is called the "Nyquist-Shannon Sampling Theorem" (or on occasion just the "Shannon Sampling Theorem"), and the frequency is just the "Nyquist frequency" (in all contexts). I've only ever heard "the Nyquist theorem" used informally. I'm only pointing it out because I've never seen a book or white paper refer to it as such, and if somebody wanted to go read up on it in a book (I know, who does that anymore?), then it's easier to find under the more accepted name.

Granted, I've only done about as much reading as you might expect from somebody who did a 4-year BSE program in the 21st Century...I haven't seen the vast majority of publications on the matter, or even what one could reasonably call "a lot" in academia. So the name thing was just the added info that I was aware of.
 
But, keep in mind that a square wave above 7,400 Hz will come out as a pure sine wave, since a 44.1 brickwall filter will chop off any odd harmonics above that frequency.

Right, but that's filtering the signal first: fundamentally changing it.

Sampling doesn't filter the signal in the same fashion: what you'll see is a complex signal because the higher order harmonics will be aliased against the 44.1 kHz sampling frequency.
 
There's a thread on the Cakewalk forums at the moment regarding differences between sample rates. The thread basically consists of a lot of people who don't have a clue what they're talking about completely ignoring the advice/knowledge of those who do! But there was a good analogy that popped up in response to another analogy (people do love to use analogies when trying to make a point) that tried to relate sample rate to something such as the number of colours a photocopier reproduces, i.e. more is better.

But the response was, what is the point in photocopying an image at 16.7m colours when the original only contains 256 colours? Draw the loose relationship between that and sampling, and relate that to the Nyquist-Shannon theorem, and it does make a point.
 
noisewreck and Massive Master - will either one of you please elaborate on what you mean? What you are saying does not relate to what I have come to understand.
Mine was a trick response (as signified by the :p), as I was specifically referring to DSD. DSD (Direct Stream Digital), on which the SACD format is based, runs at 1 bit but at 2.8224 MHz, using Delta-Sigma conversion/representation rather than PCM.
 
But, keep in mind that a square wave above 7,400 Hz will come out as a pure sine wave, since a 44.1 brickwall filter will chop off any odd harmonics above that frequency. And Ethan is absolutely correct about the "waveform view".
True, for a square wave running at 7,400Hz the next harmonic will be 22,200Hz. Question is, can you hear this harmonic? How many mics do you have that do not have a sharp drop-off at this frequency... maybe some Earthworks mics go into the stratosphere, but again, can you hear it? How many tape machines can record this without distortion?

[Edit]Also, I suspect some other stuff will go on as well to prevent it from sounding like a pure sine wave. Anyway, I remember the waves I generated in Reaktor that I speak about below don't sound like sine waves.[/Edit]

Sure, there are analog oscillators that can accurately generate square waves at 10-20kHz that will put most digital systems to shame. BTW, I've done tests generating square waves from Native Instruments' Reaktor at 10-15kHz, and when I look at the bounced audio, it doesn't look exactly like a square wave, but more like a trapezoid, regardless of the internal sample rate. Even at 384kHz (yeah, Reaktor can run at 384kHz internally) the wave still looks trapezoidal rather than like a pure square wave...

Of course you can forget about analog synth emulators; they seem to think "analog synth" means it can't produce square waves with any overshoots, and other crazy stuff.

I know, I am rambling, so I'll shuddup now :D
 
Is there a sensible reason why the contours of these files would be the same? Are the differences so small that they would be visible only in the sample editor?
The only way to see the differences is to zoom in to the sample level in the sample editor, where you can draw in the wave using the pencil tool. You'll also need to do this in the original project at the original sample rate that these files were recorded at. So, open your 44.1kHz file in a 44.1kHz project, double-click on the audio part to open the sample editor and zoom in until you can see the samples in all their jagged glory. Take a screenshot, and copy/paste it into some graphics-aware application.
Now open your 96kHz file in a 96kHz project, and follow the same steps as the above.

Compare the two.
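A screenshot-free alternative is to compare the numbers directly. This sketch simulates the two exports in memory by quantizing the same tone to 16 and 24 bits (pure NumPy, no dither, hypothetical tone; reading your actual files would need an audio library):

```python
import numpy as np

fs = 44_100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)   # 1 s test tone at half scale

def quantize(signal, bits):
    """Round to the nearest step of a signed integer of that width,
    simulating an export at that bit depth (no dither, for simplicity)."""
    scale = 2 ** (bits - 1)
    return np.round(signal * scale) / scale

diff = np.max(np.abs(quantize(x, 16) - quantize(x, 24)))
print(diff)   # at most about one 16-bit step: far below anything an
              # overview waveform could ever show
```

The worst-case difference is on the order of 2^-16 of full scale, which is exactly why the stacked contours look identical.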
 
But there was a good analogy that popped up in response to another analogy (people do love to use analogies when trying to make a point) that tried to relate sample rate to something such as the number of colours a photocopier reproduces, i.e. more is better.

But the response was, what is the point in photocopying an image at 16.7m colours when the original only contains 256 colours? Draw the loose relationship between that and sampling, and relate that to the nyquist-shannon theory and it does make a point.
I love the way people try to analogize to photography and image reproduction, and get the analogies totally wrong.

Another one folks often drag out is that an increase in sample rate is like an increase in pixel resolution; i.e. the more samples you have, the greater the resolution. This is probably the most common misconception I've seen.

That example you bring up is different in that it's talking about color resolution and not sharpness resolution, but it's still talking about an increase in resolution within the same bandwidth, which is incorrect. It assumes that those 16 million colors are all within the visible spectrum between infrared and ultraviolet, just like the 256-color GIF image it's trying to reproduce.

But that's not how sample rate actually works. An increase in sample rate does not increase the sharpness or the color resolution within the visible spectrum of the reproduction; rather it extends the spectrum beyond the visible into the ultraviolet. Assuming for the sake of argument that all other physical and technical design and construction constraints are equal, the analog of 44.1k in image processing will reproduce *all* colors in the visible spectrum - an analog scale, not color steps - just as well as the visual equivalent of 96k will. The difference is 96k will also reproduce colors that extend well into the ultraviolet, which we never see anyway.
Harvey Gerst said:
But, keep in mind that a square wave above 7,400 Hz will come out as a pure sine wave, since a 44.1 brickwall filter will chop off any odd harmonics above that frequency.
And we all know how pleasant a sibilant square wave sounds ;) :D. Seriously, though, Harvey, you're correct about that, but outside of a pure, single oscillator synth, where are you going to a) find a square wave above 7k at all, 2) find a square wave reproduced by your average analog synth or digital modeler of one that is actually anywhere near a real, true square wave even before conversion, iii) find one of sufficient amplitude to make the lack of third overtones and above at their resulting even smaller amplitudes significant, D) find anyone other than George who would actually want to listen to something so annoying as to fit the first three ;). (j/k George ;))

You do bring up a good point that there are technical limitations to Nyquist, but they are not ones that appear in nature and virtually never appear in music. If you really wanted to push the example, how about a 19k square wave? Definitely not accurately reproducible at 44.1. But nobody other than gear sluts will ever care.
Moseph said:
I just double checked one of my theory books on the shelf (Bosi/Goldberg) and it explicitly uses the greater than sign (">") rather than "greater than or equal to" to express the theorem. So my thought is that the theoretical issue and the physical issue aren't necessarily related, though they have similar constraints.
OK, I'm probably a victim of my own vocabulary; I should have said that the minimum sample rate *limit* is 2x in order to be technically correct. The problem is that such language almost always leads those new to the idea into the misunderstanding that the further above that rate one goes, the better, thus justifying the ultra-high sample rates.

I don't know for sure how close one can push that ">" towards ">=" - I believe it's pretty darn close to infinitely close - but it's an effect that is most certainly swamped by the requirements imposed by the filtering constraint that brings the sample rate way up to 44.1k. Which leads us to:
Moseph said:
Sampling a genuinely band-limited signal doesn't, by itself, imply filtering to me (of course, there aren't any in the real world...), so I'm not sure I'm following your point.
I was just continuing the explanation of what I had said before that. The point is that A/D conversion in the real world does include a low pass filter stage on purpose in order to ensure bandwidth-limiting of the signal. Higher frequency noise and harmonics and transients (oh my!) and such can otherwise sneak through. Because of gear limitations and musicality and such it won't be a large amount, but it's out there, we just can't hear it. The problem is, if you let that stuff through to the converter, it gets aliased back into the (potentially) audible range below the Nyquist target. Like George said, the half-sample rate becomes kind of a mirror that reflects higher frequencies back into lower frequency ranges.

This is why we want to - and do - low pass at ~20k and filter out above that. But because we can't build an "ideal" brick wall filter right at 20k, we have to let that filter slope past 20k a bit. The idea is that we should be able to build such a filter that reaches full attenuation by about 22k or so. Allowing for that and a little extra slack, we wind up with a sample rate of 44.1k to cover the full 22k of fully limited bandwidth.
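That mirror effect is easy to show numerically: an ultrasonic tone that sneaks past the filter produces exactly the same samples, at 44.1k, as an audible tone at fs - f. A NumPy sketch (the 25kHz tone is a hypothetical leaker; its alias lands at 44,100 - 25,000 = 19,100Hz):

```python
import numpy as np

fs = 44_100.0
n = np.arange(64)
above = np.sin(2 * np.pi * 25_000.0 * n / fs)   # ultrasonic tone let through
alias = -np.sin(2 * np.pi * 19_100.0 * n / fs)  # audible "mirror" at fs - f
print(np.max(np.abs(above - alias)))  # the two sequences coincide
                                      # (differ only by float rounding)
```

Once sampled, the converter has no way to tell the two apart, which is why the filtering has to happen *before* conversion.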

G.
 
But that's not how sample rate actually works. An increase in sample rate does not increase the sharpness or the color resolution within the visible spectrum of the reproduction; rather it extends the spectrum beyond the visible into the ultraviolet. Assuming for the sake of argument that all other physical and technical design and construction constraints are equal, the analog of 44.1k in image processing will reproduce *all* colors in the visible spectrum - an analog scale, not color steps - just as well as the visual equivalent of 96k will. The difference is 96k will also reproduce colors that extend well into the ultraviolet, which we never see anyway.

The analogies are always poor, I know, I was just quoting from the other thread for the possible benefit or musings of others :)

I think where most people stumble is when they can't understand how everything below the Nyquist frequency can be faithfully reproduced (assuming everything above the Nyquist frequency was filtered so no aliasing has occurred), and have a notion that having more samples (which they visualize as 'data points' on the waveform) will mean a more accurate reproduction - the word 'smoother' also pops up from time to time. I think part of this problem, and the way people visualize digital sampling as being 'jagged', is because you only really get to see the blocky dot-to-dot "waveforms" of most editors and not the smooth interpolations and reconstructions that actually occur during digital-to-analogue conversion. Intersample peaks are usually a good starting point for explaining this. I recall seeing some good images accompanied by some basic but insightful explanations of how the signal is reconstructed on a website somewhere, and if I can find it then I'll post a link.
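Intersample peaks make that point nicely: here every stored sample has magnitude ~0.707, yet the reconstructed (DAC-output) waveform peaks at 1.0 *between* samples. A NumPy sketch using truncated Whittaker-Shannon interpolation (the quarter-rate tone and 45° phase are chosen to force the effect):

```python
import numpy as np

fs = 44_100.0
f0 = fs / 4                          # tone at a quarter of the sample rate
n = np.arange(-1000, 1000)
# Phase chosen so every sample lands off-peak: |sample| is always ~0.707
x = np.sin(2 * np.pi * f0 * n / fs + np.pi / 4)

def reconstruct(t):
    """Truncated Whittaker-Shannon interpolation -- roughly what the
    reconstruction filter in a DAC outputs between samples."""
    return np.sum(x * np.sinc(fs * t - n))

sample_peak = np.max(np.abs(x))            # what the editor's dots show
true_peak = abs(reconstruct(0.5 / fs))     # halfway between two samples
print(sample_peak, true_peak)              # ~0.707 vs ~1.0
```

The blocky dot-to-dot view undersells the real waveform by about 3dB here, which is exactly the "smooth reconstruction" the editors never draw.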
 