Final words about increase in bit depth/sample rate?

MatsD

New member
There's a lot of talk these days about the benefits of 24 bit/96 kHz recording over the old CD standard. Often the technical knowledge is lacking, and instead you are fed final words like "There's a significant audible difference..." or "There's no audible difference whatsoever..."

Claimed benefits of recording at a greater bit depth are:
- Better dynamics/headroom - you can afford to lose a few bits here and there due to low levels and still have at least 16 used bits in all tracks when it's time for mixdown.
- Plugins that can work internally at higher bit depths sound better when fed 24 bits than 16.
- A multitude of audio tracks mixed down to a final stereo track places higher demands on the digital representation than just a few tracks. The 16 bit/44.1 kHz format is not sufficient to hold all this data accurately.

I feel comfortable with the first and second claims, but the third one I find questionable. Is it really true that it requires more digital "resources" to represent the sound of a full symphony orchestra than to represent a simple sine wave? And if so, is the CD standard format not sufficient for an accurate representation of very complicated waveforms (like the symphony orchestra or a multitude of mixed-down audio tracks)? Is there a real degradation in sound, increasing with every added track, when mixing down several 16/44.1 tracks to a single 16/44.1 master? Or is this just plain nonsense? If it is true, it also implies that you would benefit from mixing down a multitude of 16/44.1 tracks to a 24/96 (or at least a 24/44.1) master, which would more accurately be able to hold all the data from the tracks added together, provided all 24 bits are used and not just the lowest 16.

To elaborate here, mixing down a bunch of 16 bit tracks within the same system (i.e. Cubase Export Audio) to a final 24 bit mix with everything at unity (master set not to go above 0 dB) would result in a 24 bit file with only the bottom 16 bits used. But if you calculate the added headroom of these 8 unused bits, you should be able to increase the levels on the master faders n dB above 0 dB (calibrated for 16 bits) and thus get a 24 bit mixdown with all bits used. Mixing down from 16 bits doesn't equal mixing down 1x16 bits but Nx16 bits, where N is the number of tracks. When mixing down to 24 bits you can allow more of these Nx16 bits to be transferred to the mix, to put it simply. In real life the equation gets a bit more complicated if we consider that some tracks are stereo and some are mono and the final mix is stereo. Nx16 then corresponds to the total number of mono tracks (where each stereo track is regarded as two mono tracks). The resulting mix would have 2x24 bits capacity.
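To put rough numbers on that headroom reasoning, here is a toy calculation (a sketch only, not anything Cubase actually does): how many bits a mix bus needs to hold the worst-case sum of N full-scale 16 bit tracks.

```python
import math

def bits_needed_for_sum(num_tracks: int, bit_depth: int = 16) -> int:
    """Bits needed to hold the worst-case sum of num_tracks full-scale signed signals."""
    peak = 2 ** (bit_depth - 1) - 1              # e.g. 32767 for 16-bit signed audio
    worst_case = num_tracks * peak               # all tracks peaking at the same instant
    return math.ceil(math.log2(worst_case)) + 1  # +1 for the sign bit

print(bits_needed_for_sum(1))   # 16 -- one track, no extra headroom needed
print(bits_needed_for_sum(4))   # 18 -- four hot tracks summed at unity
print(bits_needed_for_sum(48))  # 22 -- even 48 tracks fit inside a 24 bit container
```

In other words, the unity-gain sum grows by roughly log2(N) bits, so a 24 bit container has room for the sum of dozens of hot 16 bit tracks before clipping.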

My guess is that audio quality is preserved no matter how many tracks are mixed down, due to the fact that every track is mixed down with only a fraction of its original amplitude. Everyone knows that the more tracks you have in a mix, the more you have to back off on the faders in order not to get a total signal above 0 dB and introduce digital clipping (provided the individual tracks are recorded at or near 0 dB level). The amplitudes of all tracks are added together and must be reduced to fit the headroom of the final mix. Less amplitude/volume means fewer bits are needed in the representation, thus no significant degradation in sound.

The benefits from an increase in sample rate are more questionable. According to Nyquist, 44.1 kHz can accurately reproduce frequencies up to 22.05 kHz, which is above most people's (and certainly most ear-abused musicians') hearing limit. Sufficient oversampling and good AD/DA converters are of course a must. What would be the benefits of increasing the sample rate to 96 kHz? Do you get a flatter frequency response and less distortion at the highest audible frequencies? Is the Nyquist theorem just theory, and is a 44.1 kHz sample rate, when it comes down to it, really not enough to accurately reproduce all audible frequencies? Where do, for example, oversampling or statistical quantization errors come into the picture, the latter suggesting greater accuracy for sampling frequencies above 44.1 kHz?
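As a numerical aside, the Nyquist limit itself is easy to check: a tone above fs/2 produces exactly the same samples as a mirrored tone below fs/2, which is why converters must filter out everything above the limit before sampling. A small pure-Python sketch:

```python
import math

fs = 44_100          # CD sample rate
f_high = 30_000      # a tone above the 22,050 Hz Nyquist limit
f_alias = fs - f_high  # 14,100 Hz -- where the tone folds back to

# The sampled values of the 30 kHz tone are indistinguishable from a
# phase-inverted 14.1 kHz tone: information above fs/2 is simply lost.
for n in range(10):
    s_high = math.sin(2 * math.pi * f_high * n / fs)
    s_folded = -math.sin(2 * math.pi * f_alias * n / fs)
    assert abs(s_high - s_folded) < 1e-9

print("30 kHz sampled at 44.1 kHz aliases to 14.1 kHz")
```

This says nothing about whether 96 kHz sounds better; it only illustrates that the 22.05 kHz ceiling is a hard mathematical property of 44.1 kHz sampling.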

These are, I think, important questions for anyone involved in digital recording. Any facts beyond "I hear the difference" are welcome. Hints of good literature (books, web sites) that deal with these questions would also be appreciated.

/Mats D, Sweden
 
I agree with you about those 2 facts of better-than-CD quality. In general CD quality is BETTER than most people's hearing and the environment they listen in (also, many people wouldn't realize that a 16-bit 44.1 kHz recording of a good vinyl album wasn't direct to digital unless they could hear the rumble, scratches, and wow). It is only the music/equipment industry that keeps pushing us to want more so that they can make more sales on existing intellectual property (e.g. 5.1 surround sound for anything other than movies). As far as I'm concerned, the only improvement they could usefully make to a stereo CD is to make it unscratchable.

I think one factor that is frequently ignored is that people's ears react differently to edges vs. continuous high frequencies. That is to say, ears can hear the momentary burst of high frequencies from a snare drum far better than they could hear a sine wave at 22 kHz. I don't know how that factor relates to this discussion however.

My final comment is that there will always be those audiophiles that spend more time listening to the gaps between the songs than they do listening to the music. I think that's because they enjoy the technology or the snobbery MORE than the music.
 
What I am most interested in is whether mixing down to a greater bit depth than the one used for the individual tracks has any benefits.

An example: My song is made up of four 16 bit mono tracks, recorded hot (just below 0 dB). The total dynamic content is 4x16=64 bits. When mixing down to 16 bit stereo format, I need to cut the dynamic range down to 2x16=32 bits by reducing the volume of each track. But if I mix down to 24 bit stereo format, I would have 2x24=48 bits capacity and would not need to reduce the total dynamic content of the four mono tracks as much, and that would be a real benefit. But I'm not sure this is possible in the digital domain. Perhaps you always end up with a 24 bit stereo mix where only the bottom 16 bits are used and the eight top bits are empty, which would indicate that it is useless to mix down to a greater bit depth than the original format. Perhaps you need some software or hardware device to resolve n 16 bit tracks in order to make full use of the 24 bit format.

/Mats D
 
I didn't have the stamina to read through your entire post. But this is how it is:
A 16-bit wave file is made up of sixteen-bit numbers. A sixteen-bit number is a number between 0 and 65,535 (in base ten :)). When you mix down a multitrack project to a stereo file, what happens is - not exactly, but let's say it anyway for the sake of simplicity - that the numbers for each of the tracks are added together. So mixing down 48 sixteen-bit tracks could give peaks of 48 x 65,535 = 3,145,680. And to represent that number you need ceil(log2(3,145,680)) = 22 bits. And I hope that was what you were asking for, or I'll look kind of stupid. :)
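That arithmetic can be double-checked directly; the exact worst-case product is 48 x 65,535 = 3,145,680, which needs 22 bits (Python used purely as a calculator here):

```python
peak = 48 * 65_535        # worst case: 48 unsigned 16-bit tracks all at full scale
print(peak)               # 3145680
print(peak.bit_length())  # 22 -- the number of bits needed to hold the sum
```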
 
MatsD said:
with only the bottom 16 bits used....


According to Nyquist, 44.1 kHz can accurately reproduce frequencies up to 22.05 kHz, which is above most people's (and certainly most ear-abused musicians') hearing limit. Sufficient oversampling and good AD/DA converters are of course a must. What would be the benefits of increasing the sample rate to 96 kHz?....


Usually it's the top 16 bits that are kept and the last 8 are simply truncated off. Truncation is bad!

The Nyquist theorem is great for higher frequencies, but suffers when the lower frequencies are sampled. The number of times you quantize a wavelength is directly related to the frequency. So a 20 Hz signal is so long that it's nearly impossible to get a good sample compared to a 20 kHz signal. There is a lot of ambience at the higher frequencies that the converters just are not designed to sample. There is a lot more to the issue of the 24/96 movement, but when you compare it to 16/44.1, it's clear there is more definition of everything. Plus, every time a process is done digitally, the recalculations change the actual signal to a different frequency.

Peace,
Dennis
 
OK, so there is a gradual degradation from high to low frequencies regardless of sample frequency, but less obvious the higher the sample frequency you use, due to the fact that lower frequencies equal longer wavelengths. OK, I buy that. You will primarily get a better representation of the lower frequencies by using 96 kHz instead of 44.1 kHz, right?

I'm still confused about the bit issue though. Why would it not be possible within the digital domain to add two 16 bit files to one 24 bit file? Mathematically this presents no problem as far as I can see. If you can add two 16 bit files together and then reduce the dynamic range to fit within the dynamic range of a new 16 bit file, why can't you add two 16 bit files together and then reduce the dynamic range (somewhat less) to fit within and fully occupy the greater dynamic range of a 24 bit file?
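In a toy model, the arithmetic really is trivial: two signed 16-bit samples can be summed into a 17-bit intermediate and shifted up so the worst case exactly fills a 24-bit container. This is a sketch of the idea only, not how any particular DAW scales its mix bus:

```python
def mix_two_16bit_to_24bit(a: int, b: int) -> int:
    """Sum two signed 16-bit samples and scale the result to 24-bit full scale."""
    acc = a + b          # worst case needs 17 bits (-65536 .. 65534)
    return acc << 7      # 17 + 7 = 24 bits: the mix occupies the full 24-bit range

# Two full-scale negative peaks land exactly on 24-bit negative full scale:
print(mix_two_16bit_to_24bit(-32768, -32768))   # -8388608
# Two full-scale positive peaks stay just inside 24-bit positive full scale:
print(mix_two_16bit_to_24bit(32767, 32767))     # 8388352
```

So nothing in the math forbids a fully occupied 24 bit mix; whether a given program actually scales the sum this way is a separate question.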

/Mats D
 
I am going to go out on a limb here and say that most of this discussion so far has missed many of the key points about increased bit depth and sample rate. Sorry, but there are many assumptions here that are plainly sin (off the mark...that is what sin means....:)).

First off, you all need to do some reading up on what frequencies above what humans can hear do to frequencies we CAN hear. It is complex and hard to explain, but it is must-read stuff. Sorry, I no longer have a link to some excellent articles on this subject, but a search on Google will find you something.

Second off, again, you all need to do some reading up on how sensitive the human ear actually is to volume differences!!! The ear has some astounding capabilities in hearing volume changes across its whole hearing range, and 16 bit does NOT cover it. 20-21 bits DOES.

24 bit doesn't mean that you can go LOUDER, it simply means that the converter's POTENTIAL noise floor is LOWER. Increased bit depth, on even a decent 24 bit converter, usually equals a sound that has more "depth", meaning an audible improvement in the dynamic range (the difference between the converter's self-noise and the highest volume achievable). Also, quantization errors in the lower bits are VASTLY improved WITHOUT dithering, but certainly are wonderfully improved WITH dithering at the input. Yes, your audio IS dithered while it is recorded. It HAS to be, because converters HAVE to convert an input voltage to a voltage that can be stored as a fixed volume increment while sampling. If the voltage at the input falls between two storable volume increments, the audio either has to be boosted or reduced to a volume increment that IS storable, and how FAR the voltage is boosted/reduced is what causes distortion. This is more evident at lower levels in the audio; thus, reverb trails from the room or otherwise are more affected by this. With dithering applied, though, a "shaped noise" is applied at the lowest bits to keep the least significant bit toggled on, and the inputted voltage is more or less "mixed" in with this shaped noise, so it really doesn't need to be boosted or reduced. Distortion is significantly reduced!!! BUT, at the expense of potential signal-to-noise ratio. This has generally been an acceptable trade-off, as a bit of noise at very low levels is often acceptable to the ear, but quantization errors ARE NOT!!!
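The dithered requantization described above can be sketched generically. This is plain TPDF (triangular) dither for a 24-to-16-bit reduction, with no noise shaping; real converters and mastering tools use more sophisticated variants:

```python
import random

def tpdf_dither_to_16bit(sample24: int) -> int:
    """Requantize a signed 24-bit sample to 16 bits with +-1 LSB triangular dither."""
    lsb = 1 << 8                                  # one 16-bit step, in 24-bit units
    # Sum of two uniforms gives a triangular PDF spanning +-1 LSB:
    noise = random.uniform(-lsb / 2, lsb / 2) + random.uniform(-lsb / 2, lsb / 2)
    q = round((sample24 + noise) / lsb)           # round to the nearest 16-bit step
    return max(-32768, min(32767, q))             # clamp to the 16-bit range
```

A constant 24-bit value of 100 (well below one 16-bit step of 256) truncates to 0 every single time, but with dither its average decoded value comes out near 100/256 ≈ 0.39: the low-level signal survives as a statistical property of the noise, which is the "audio below the noise can still be heard" effect being described.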

Crap, this is very complicated, and deserves those who can explain it better than I. http://www.digido.com has some excellent articles concerning bit depth/sample rate stuff that is well worth reading. It may take you a few reads to start understanding some of it, but it IS worth the time.

I will see if I can get one of our own here to compose an article, or at least give me a bit more information so that I may compose an article explaining it. There actually IS someone who frequents this BBS (I won't say who though...sorry....) who has been heavily involved in the R&D of many converters and has FORGOTTEN more about this stuff than all of us put together KNOW about it...:D I will ask him to assist, as this is an issue that really needs some clarification in simple layman's terms.

Frankly, I tire of seeing this stuff posted again and again. I am not saying this to insult those who post these kinds of questions and speculate. I know that you guys just want to know what the heck the "official" word about increased bit depth/sampling rate really is, free of any slant from manufacturer claims, and free of some big-named dude possibly putting out misinformation (oh boy!!! have I ever seen some big-named engineers/producers make some totally inaccurate claims in regards to digital audio....it is sad....no integrity at all!!!!). I am just frustrated that nobody has really approached the subject in a way that laymen CAN understand, backed with good science.

Let me see if that can be changed....I suspect it can! :D

Ed
 
None of what you two guys say in the last two posts is correct, or even makes any sense. But I have to get some sleep before I catch a train in four and a half hours, so I'll wait till tomorrow to set you straight. :) And hopefully someone else will have explained it by then... :D

Oops, Sonusman already did it. I really have to take a speed typing class... :)
 
Atomictoyz wrote: "The Nyquist theorem is great for higher frequencies, but suffers when the lower frequencies are sampled. The number of times you quantize a wavelength is directly related to the frequency. So a 20 Hz signal is so long that it's nearly impossible to get a good sample compared to a 20 kHz signal."

Regrettably, this is false. Nyquist stated the upper limit of the passband with respect to the sampling frequency, but not the lower limit.

Shannon's Sampled Data Theorem is the true basis for modern digital audio- Nyquist just kind of embroidered around the edges a little, years and years later. (;-) The fact is that Shannon's sampled data systems have no lower frequency limit, as such: they are theoretically accurate to ±1/2 LSB *right down to DC*.

In practice (as we use them in digital audio), there are usually highpass properties designed into the actual converter support circuitry. But if the converter itself were to be wrapped in DC-coupled analog circuitry for one reason or another, you're good right down to DC. Example: the digital multimeter in your toolbox.

You usually don't _want_ to be DC coupled for audio, though- there's no sense having your precious few bits used up in preserving DC offsets from previous stages, when you are trying to somehow preserve the ten-trillion-to-one dynamic range of our auditory apparatus! In practice, digital is too damned _good_ at preserving LF information: most knowledgeable people will actually roll off their signals below 20Hz or so just to keep the infrasonic stuff _out_ of their mixes, because digital sampling will preserve it very nicely. But real-world woofers can't do a thing with it...

Anyway, Nyquist is silent on the lower limit of the passband in a sampled data system for the simple reason that the lower limit *is* DC, with no reduced-resolution effects whatsoever down there.

Sonusman has offered excellent advice here: the essays at http://www.digido.com should be required reading for everybody *before* the religious arguments start. This is hard stuff, but it is _science_, not art: it can be understood, you really don't need to be an EE to understand it, and the pseudoscience and misunderstandings really should be put to rest.

They never will be, probably, because a lot of this stuff has taken on the mantle of religion. And regrettably, religious-style issues are used by many people as a convenient excuse not to probe further. But the truth is out there, and digido is an excellent place to start in achieving a _true_ understanding of resolution issues, distortion residuals, and the effects of proper dither compared to the effects of simple truncation. And learn not only that it can be heard (hell, even I can hear it, and everybody knows I'm as deaf as a post!), but _why_ it can be heard...

It's absolutely worth everybody's while to get up to speed on that stuff- and _then_, let's talk.
 
Oh yeah, all the above from our other resident digital expert!!! :)

Had a nice little 2 hour talk on the phone with skippy the other day. I am constantly impressed by the credentials some people who post around here have!!!

I'll try not to blow your cover too much skippy....;)

Ed
 
I was not implying that 24 bits would make anything sound louder; in fact, I was not stating the essence of the benefits at all. But if I am to do so, I meant that there ought to be a way of mixing down the sum of the dynamic ranges of several 16 bit tracks to fully occupy the dynamic range of a 24 bit file. What I am talking about is better dynamic resolution, not a boost in volume. Perhaps there's a fundamental reasoning error in my suggestion, but the technical stuff from you two guys didn't make me any wiser and seems to a great extent to deal with other issues.

The practical questions remain - can there in any way be any dynamic benefits of mixing down a bunch of 16 bit tracks to a 24 bit "master"? Is the nature of the digital domain such that I, no matter what, will always end up with a 24 bit file where 8 bits are unused? Is it impossible to reduce the sum of several 16 bit dynamic ranges to the full resolution of a 24 bit mixdown recording?

/Mats D
 
Again, go read those articles on http://www.digido.com . All will be answered there concerning what you are asking. What you are not getting is that we ARE in fact talking about the same thing, you just don't know it yet, thus you are asking the question.

I will throw this out. ANYTIME you apply DSP to digital audio, the bit depth changes. Anytime you change a volume in digital, and this includes combining two sounds together, the bit depth INCREASES. The trick is this, what to do with the increased bit depth?
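The bit growth from a simple volume change can be seen with plain integer arithmetic. Here the gain is represented in Q15 fixed point, which is one common way (assumed here purely for illustration) that DSPs encode a fader value:

```python
sample = 12_347              # a signed 16-bit sample value
gain_q15 = 26_214            # 0.8 expressed in Q15 fixed point (0.8 * 32768, rounded)
product = sample * gain_q15  # the exact result of the fader move
print(product.bit_length())  # 29 -- more bits than either operand alone
# Keeping only the top 16 of those bits throws real information away, which is
# exactly where dithering is supposed to step in before the word is shortened.
```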

In almost all software packages for mixing, dithering is offered. Whether you choose to use it depends upon many factors.

Okay, I will give you the simple answer. Yes, you will gain MANY benefits to rendering your mix of all 16 bit source tracks to a stereo 24 bit file. From there, the 24 bit file can be effectively reduced to 16 bit using dithering. Yes, that will sound much better than doing a 16 bit mix.

I will end there because I still encourage you to go read and understand what is happening with bit depth. Once you read through that stuff at digido.com, you should start to get a much better picture of why ANY increase in bit depth you can create at some point in your audio production will do nothing BUT help the audio. The fact that dithering helps the "perceived" bit depth of audio is part of the "science" that skippy was talking about.

Believe it or not, your 16 bit audio that you initially recorded could potentially have more "perceived" bit resolution. No, not totally "clean" dynamics, but enough to make a nice difference. It all centers around noise in the source. The right kind of noise is your friend, and THAT is what dithering is all about. The fact that your preamp might make the right kind of noise means that you can in fact have MORE THAN 16 bits of resolution (with only a bit of noise that will barely be audible) on a 16 bit converter. The noise keeps the least significant bit toggled on (hopefully), and since audio can be heard through this noise (yes, audio that is even lower in volume than the noise can be heard...go read about this on digido.com....), you effectively have more "perceived" resolution because now you are able to hear audio that is below the value represented by the least significant bit. It has to do with the fact that the noise keeps the least significant bit toggled on, and the usable audio and the noise are mixed together in that last bit. Now any audio that is below the noise can still be heard dynamically. You get it now?

What rendering a 24 bit mix gains you is the assurance that there is enough resolution to retain that low-level audio without introducing yet MORE noise which would cover it up. Thus, you definitely retain the full 16 bit resolution because you are asking the software to use more bits to represent it. I know, sounds crazy, but it works, and that is the way of things.

So no, you will not get the full benefits of 24 bit resolution unless you initially started out recording at 24 bit, but 24 bit rendering assures that you don't LOSE any resolution, and you may in fact have more "perceived" resolution than 16 bits, even though you recorded at 16 bits. Make more sense now?

I may be oversimplifying a bit, but that is the gist of it. I think I may ask skippy to help compose something a bit more in-depth and articulate about this.

Ed
 
I should clarify too that when you render at 24 bit, the audio sitting at the least significant bit does not suffer from the same degree of quantization errors, which lowers distortion; thus, our ears actually hear audio instead of distortion. Sorry, I should have included all that in the equation.

Ed
 
Some comment!!! I actually had you in mind, sjoko2, to help out on an article on this subject. You game?

Ed
 
All technical details about the benefits apart, you are at least suggesting that you can in fact render a 24 bit mixdown from a bunch of 16 bit tracks, which was my main issue. I'm aware that a DSP decrease in volume is in fact a decrease in the bit representation of the sound, and that was in fact the reason for my question. It seems a waste to decrease the resolution of all the 16 bit tracks to fit within a new 16 bit file if it is possible to decrease their bit depth less and fit them into a 24 bit file. Experiments, however, have indicated that mixing down from 16 bit to 24 bit has only rendered me a 24 bit file with 16 used bits and 8 bits of "air". I am mainly working with Cubase and I often mix down within the software using Export Audio. Perhaps I must use another method to record a fully occupied 24 bit file from my 16 bit tracks in Cubase.

I will read the articles you recommend, thanks for pointing them out to me.

/Mats D
 
"I'm aware that a DSP decrease in volume is in fact a decrease in the bit representation of the sound"

This is not true! It is exactly the opposite. ANY DSP applied to audio means that the audio needs more bit depth to represent it accurately. It all has to do with quantization errors. Once you understand that, the rest is a piece of cake and will make sense. Proper dithering HAS to be applied to the audio INTERNALLY, AFTER DSP, for the DSP to sound worth a shit when it is reduced back to its original bit resolution. Much of the software available does not do this at all, or not very well. So your rendered files at 24 bit may in fact only be 16 bit with 8 bits of "air" because your software may not be dithering properly. This is not a new problem with cheap software-based mixing applications.

I would not consider Cubase as being a high quality mixing platform. In fact, I would not consider ANY Windows based software to be even a suitable mixing platform. 32 bit floating point internal resolution is just not powerful enough to keep nuance in the audio unless you do NOTHING to the audio. Your typical ProTools system, or high end digital DSP boxes use at least 48 bit FIXED internal bit resolution. That is a very big difference, and you can HEAR it.

Ed
 
OK, I think I meant that in the final mix, each track will have a reduced dynamic range compared to the range of the original tracks, in order for all of them to sit in the context of a new 16 bit or 24 bit file. They all suddenly have to "share the space" of a file with the same (16 bit) or somewhat greater (24 bit) depth than they each had originally. That's why I'm interested in mixing down to 24 bit without any unused bits - to get as good a resolution as possible. Of course I can always use 24 bits all the way, or even 32 bit floating point with True Tape fake dynamic reserve, but I'm still curious whether mixing down to 24 bits within the software, or some other way, can render me a better resolution than mixing down to a 16 bit file. If the 24 bit file contains 8 unused bits, I have achieved nothing but occupying my hard disk space with unnecessarily big files.

Thanks for taking time to explain things.

/Mats D
 
Ed - I'm so busy at the moment trying to catch up it would be wrong for me to promise anything. But, if you want to write it, I will respond to you direct with any details you might need, suggestions for edits etc.

However, I think any article like that will need to start from the absolute basics, as from glancing through the questions raised and opinions rendered, a lot of people seem completely unaware of what bit rate or clock speed actually is or does.
 
While Ed and Sjoko2 are working on the definitive article about resolution and accuracy, allow me to regale you with an old-fart story.

I'm old enough (just barely) to have done a significant portion of my engineering education with a slide rule. Programmable calculators became cheap enough to be viable even for a destitute college student during my academic career, but when I started out the weapon of choice was the good old slipstick. And as a result of that, I learned a certain level of respect for numbers that seems to be missing in the digitally-driven education process these days.

When you do extensive calculations with a slide rule, you learn very quickly that you can only work to a certain level of precision. All your results come up with a limited number of decimal places that are actually meaningful, because the slide rule compares logarithms with limited accuracy. So, to take a quick and easy example, when you calculate the area of a circle, you'd measure the radius (let's say I measured it as 1.05 feet), square it (easily done on a slide rule) and multiply by pi (3.14). And that'd give you a result of 3.46 square feet. It was easy to keep track of the *precision* of your calculation, because it was very effectively limited by the means of the calculation.

Now, let's do that on my modern high-zoot calculator here. I'll just punch in (((1.05)**2)*pi), and it says right here that the answer is 3.4636590058....

Which is wrong, given the *precision* of my measurement of the radius! The *right* answer in an engineering sense is 3.46 +- .03 (accuracy and tolerance), because that's all the precision I made the basic measurement of the radius with. Just because the calculator gives me that _resolution_ (that many digits), doesn't mean that all those digits are correct! I used to love watching the people who could afford calculators get dinged on tests for reporting too much "accuracy": the fact is, it ain't accurate. Really.

Here's why. When I measured 1.05, that implies an accuracy of +- 1/2 of my least significant digit. My eyes are old and tired, so I don't read that ruler too well any more- that "1.05" could *actually* have been any value between 1.045 and 1.055, and I just couldn't tell the difference because my measurement system (ruler plus eyes) is no better than that. Assuming that the value really *meant* 1.050000000000000 is *absolutely* incorrect! Fact is, below that "5", we have no freakin' clue what's going on: the rest is just noise. In every measurement, there is uncertainty, you see. So if I plug in the limits of that measurement and report all those digits, look what happens:

(((1.045)**2)*pi) = 3.43069771754.....
(((1.055)**2)*pi) = 3.49667116326.....

So 3.46 is a reasonable *estimate*, and we'll divide the error band (the difference between those limits) in two and say that it's 3.46 +-.03. Which is true, and correct. Because of the (relative) inaccuracy of the initial measurement, the result *could occupy any value in that range*.

This is a great example of why you want your initial measurement to have as much _precision_ (read: resolution) as possible: if we could have measured that circle's radius more precisely as 1.0453 feet, the right answer for its area would be somewhere between (((1.04525)**2)*pi) and (((1.04535)**2)*pi), or 3.43233... to 3.43299..., or 3.4326 +- .0003, if you wanna get pedantic about it. We measure with 2 decimal places more *precision*, we get results with two decimal places more *accuracy*. Period. End of statement.
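If you want to replay that error-band bookkeeping yourself, here is the same interval arithmetic as a few lines of Python (used purely as a calculator, nothing more):

```python
import math

def area_with_uncertainty(radius: float, half_lsd: float):
    """Propagate +- half-a-least-significant-digit through the circle area formula."""
    lo = math.pi * (radius - half_lsd) ** 2   # smallest area consistent with the reading
    hi = math.pi * (radius + half_lsd) ** 2   # largest area consistent with the reading
    return (lo + hi) / 2, (hi - lo) / 2       # midpoint estimate and half-width error

mid, err = area_with_uncertainty(1.05, 0.005)
print(round(mid, 2), round(err, 2))   # 3.46 0.03 -- the "3.46 +- .03" from the text
```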

And that's why you got dinged on the test if you reported results to 10 decimal places: you were reporting _noise_ as fact.

Now, the application of this to digital audio and the 16-versus-24-bit problem may be a little less obvious. But the first thing that you must do is free yourself from the idea that all those digits/bits are meaningful, just because they're _there_. Unless you take great pains to make your measurements with unerring precision, and do your math with the _correct level_ of precision, some of that measurement uncertainty will show up in your result- and it is noise, dammit!

And for the other engineers lurking out there: yes, I *know* that the error bands here are actually (1/2LSD)**2, or .025 and .00025 in these examples- I'm trying to keep it simple, and focus on the behavior of the errors as extracted from the results, and not get hung up on the exact derivations of them or their exact values. Please bear with me for simplicity's sake- if it got any nerdier than this, nobody'd read it!

Hope that helps, anyway. Now, I'll turn it over to Ed and Sjoko.
 