When I export as WAV - how is the downsample done?

AGCurry said:
There's a lot of information in this thread but I think some of it is not quite right.

As I trust we all know, digital audio is a stream of samples taken at set intervals.

The sample depth - we're talking here about 16 bits and 24 bits -

I think you mean Bit Depth. Not to be confused with Bit Rate, of course.
 
I think you are absolutely right to record at 44.1 if the project is destined for a CD. If it were a music for video project you'd record at 48. In other words, use the sample rate at which the project will be delivered. If you were recording at high sample rates like 88.2 or 96, you'd use 88.2 for CD and 96 for video, as these are easily divided by 2 to get to your 44.1 and 48 delivery rates.

As far as 16 versus 24 goes, the first thing I want to say is that if you record at 24, do not just save as 16. You need to first determine whether your program is dithering to get to 16 or truncating. You don't want truncation, because then the top 8 bits just get lopped off and you can get artifacts.

Also, the quality of the dithering algorithm is very important and varies widely. These days it is easier to find good dithering, in my opinion. But you need to check your settings, because software will often have a dialogue box hidden somewhere where you can choose the quality of the dithering, e.g. "fast", "medium", or "slow". It takes more computational cycles to do better dithering, so they give you the choice. For a final master, choose the highest quality.
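
To make the truncation-versus-dithering distinction concrete, here's a minimal Python sketch of the two operations - my own illustration assuming signed 24-bit integer samples, not how any particular program actually does it:

    import random

    def truncate_to_16(s24):
        # Plain truncation: the bottom 8 bits are simply lopped off.
        return s24 >> 8

    def dither_to_16(s24):
        # TPDF dither: two uniform noise sources of +/- half a 16-bit
        # LSB (128 in 24-bit units), added before rounding to 16 bits.
        noise = random.randint(-128, 127) + random.randint(-128, 127)
        return (s24 + noise + 128) >> 8  # a real converter would also clamp

The only point is that dithering randomizes the rounding decision instead of making it blindly.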

I usually record at 24 bits and then dither down to 16. I feel I get more information that way. If you look at the bit stream on a bit monitor you may be surprised to see that not all of the top bits are being used. So 16-bit is really more like 12-14 bits, and 24-bit is usually 20-22 bits. If you are recording 24-bit and really getting only 19-20 bits, which is entirely possible, you still get more audio information in the data stream by using the higher bit depth and then dithering down to 16.

I should mention in this regard that when recording 24 bits all the higher bits are there, just empty. The actual data can top out at 20 or so.
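
For anyone who wants to check this on their own material, here's a rough Python sketch of what a bit meter measures, assuming you can get at the samples as signed integers (the helper name is just mine):

    def highest_used_bit(samples):
        # Crude bit meter: the most-significant bit position actually
        # exercised by a block of signed 24-bit samples.
        peak = max(abs(s) for s in samples)
        return peak.bit_length()  # 0 means digital silence

    quiet_passage = [123456, -98765, 456789]  # peaks just under 2^19
    print(highest_used_bit(quiet_passage))    # -> 19 of the 24 bits used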
 
AGCurry said:
Conversion from a higher sample depth to a lower sample depth is a matter of loading the sample into a memory location and masking off (throwing away) the extra bits. This is very simple and can be done in software or hardware with NO difference in results.

I'm not familiar with the term "masking off". Masking is what happens when one sound is "covered up" by another sound. Like with MP3s or dithering, the idea is that the noise created will be masked by the audio itself. Masking is what keeps you from hearing the noise created by throwing out those bits.

Also, different sample depths do not result in a change in file size.

Sure they do. 60 seconds of stereo audio at 44.1/16 bit takes up roughly 10 MB. The same thing at 24 bit takes up roughly 15 MB. It also affects Bit Rate. Stereo 44.1/16 is roughly 1.41 Mbps, while stereo 44.1/24 transfer rate is roughly 2.12 Mbps.
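
The arithmetic is easy to verify - a few lines of Python, taking 1 MB = 2^20 bytes, which matches those rough figures:

    rate, channels, seconds = 44100, 2, 60

    for bits in (16, 24):
        size_mb = rate * channels * (bits // 8) * seconds / 2**20
        mbps = rate * channels * bits / 1e6
        print(f"{bits}-bit: {size_mb:.1f} MB, {mbps:.2f} Mbps")
    # 16-bit: 10.1 MB, 1.41 Mbps
    # 24-bit: 15.1 MB, 2.12 Mbps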
 
Albert, you're confusing me a little there. When you are talking about the "top" bits, do you mean the most-significant bits or the least-significant bits? I tend to imagine digital words horizontally with the least-significant bit on the right. Working vertically, is your "top" bit the most- or least-significant bit?

When 24-bit is truncated to 16-bit it's the smallest value bits - the least-significant bits - that get lopped off. However, those are the most used bits in any digital signal as the numbers count up and down - unless you're recording a true square wave of course. Those 8 smallest value bits are the "filler" that give a smoother and more accurate "finish" to the sound of 24-bit recordings than you get with 16-bit. The least used bits are the most-significant bits.
 
Wow, so much to consider!

I have asked the Tech Support folks at Cakewalk how the MC3 program converts a file from 24 bit to 16 bit. At least I should know how they do it. Then I can consider my options and choose the recording parameters that will work best with my hardware and software and will best meet my needs & expectations.

Once again, thank you all for the information - I feel that I have learned a lot!

Steve
 
RAK said:
I'm not familiar with the term "masking off". Masking is what happens when one sound is "covered up" by another sound. Like with MP3s or dithering, the idea is that the noise created will be masked by the audio itself. Masking is what keeps you from hearing the noise created by throwing out those bits.

I'm using "masking" as a mathematical/programming term. Each bit (Binary Digit) is 1 or 0. If I have an 8-bit value, say:

10101010

and perform an AND operation with

00001111

I am MASKING the first four bits, resulting in 00001010. This is how sample depth - or, as you call it, "bit depth" - is converted from higher to lower.
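
In Python, for anyone who wants to see it run (the operation is the same in any language):

    value = 0b10101010
    mask  = 0b00001111
    # AND keeps only the bits where the mask has a 1.
    print(format(value & mask, '08b'))  # -> 00001010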

Conversion from 24 to 16 bits, as I stated before, means loss of dynamic range on the "soft" end. Noise is not created by "throwing away" bits; if anything, you may lose some noise if it exists in the original program material.

Dithering has nothing to do with bit-depth conversion (you probably know this, but others seem not to...).
 
RAK said:
Sure they do. 60 seconds of stereo audio at 44.1/16 bit takes up roughly 10 MB. The same thing at 24 bit takes up roughly 15 MB. It also affects Bit Rate. Stereo 44.1/16 is roughly 1.41 Mbps, while stereo 44.1/24 transfer rate is roughly 2.12 Mbps.

I'll take your word for it!
 
AGCurry said:
10101010

and perform an AND operation with

00001111

I am MASKING the first four bits, resulting in 00001010. This is how sample depth - or, as you call it, "bit depth" - is converted from higher to lower.
Not quite. It is the least-significant bits at the right hand side that get chopped off when down-converting.

A 24-bit word: 101101001101101110011001
will get truncated to
a 16-bit word: 1011010011011011
and 10011001 is discarded.

Thinking in terms of bytes, the two most-significant bytes are retained and the least-significant byte is discarded. That is why a 16-bit file is more compact - two bytes per word, whereas 24-bit is three bytes per word and, in some proprietary formats, four bytes per word.

Obviously, when you introduce dithering, down-converting is a little less crude.
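
The same truncation as a couple of lines of Python - a right shift rather than a mask, so it's the most-significant bits that are kept:

    word24 = 0b101101001101101110011001
    print(format(word24 >> 8, '016b'))   # kept: 1011010011011011
    print(format(word24 & 0xFF, '08b'))  # discarded byte: 10011001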
 
sjfoote said:
Wow, so much to consider!

I have asked the Tech Support folks at Cakewalk how the MC3 program converts a file from 24 bit to 16 bit. At least I should know how they do it. Then I can consider my options and choose the recording parameters that will work best with my hardware and software and will best meet my needs & expectations.

Once again, thank you all for the information - I feel that I have learned a lot!

Steve

Did you try the Music Creator forum on Cakewalk's site?
 
AGCurry said:
I'm using "masking" as a mathematical/programming term. Each bit (Binary Digit) is 1 or 0. If I have an 8-bit value, say:

10101010

and perform an AND operation with

00001111

I am MASKING the first four bits, resulting in 00001010. This is how sample depth - or, as you call it, "bit depth" - is converted from higher to lower.

Conversion from 24 to 16 bits, as I stated before, means loss of dynamic range on the "soft" end. Noise is not created by "throwing away" bits; if anything, you may lose some noise if it exists in the original program material.

Dithering has nothing to do with bit-depth conversion (you probably know this, but others seem not to...).

I guess I was just thinking more of lossy audio compression, and how psychoacoustics plays a role. When the data is thrown out, noise is created, and the idea is that it will be masked by other sounds. Here's another explanation from Wikipedia. It's not very scientific, but I can't find the better articles I have read on psychoacoustic modeling:

"While removing or reducing these 'unhearable' sounds may account for a small percentage of bits saved in lossy compression, the real savings comes from a complementary phenomenon - noise shaping. Reducing the amount of bits used to code a signal increases the amount of noise in that signal. In psychoacoustics based lossy compression, the real key is to 'hide' the noise generated by the bit savings in areas of the audio stream that cannot be perceived. This is done by, for instance, using very small amounts of bits to code the high frequencies of most signals - not because the signal has little high frequency information (though this is also often true as well), but rather because the human ear can only perceive very loud signals in this region, so that softer (noise) sounds 'hidden' there simply aren't heard."
 
iqi616 said:
Not quite. It is the least-significant bits at the right hand side that get chopped off when down-converting.

A 24-bit word: 101101001101101110011001
will get truncated to
a 16-bit word: 1011010011011011
and 10011001 is discarded.

Thinking in terms of bytes, the two most-significant bytes are retained and the least-significant byte is discarded. That is why a 16-bit file is more compact - two bytes per word, whereas 24-bit is three bytes per word and, in some proprietary formats, four bytes per word.

Obviously, when you introduce dithering, down-converting is a little less crude.

It really doesn't matter logically which end they're discarded from. Big-endian and little-endian and all that.
 
RAK said:
I guess I was just thinking more of lossy audio compression, and how psychoacoustics plays a role. When the data is thrown out, noise is created, and the idea is that it will be masked by other sounds. Here's another explanation from Wikipedia. It's not very scientific, but I can't find the better articles I have read on psychoacoustic modeling:

"While removing or reducing these 'unhearable' sounds may account for a small percentage of bits saved in lossy compression, the real savings comes from a complementary phenomenon - noise shaping. Reducing the amount of bits used to code a signal increases the amount of noise in that signal. In psychoacoustics based lossy compression, the real key is to 'hide' the noise generated by the bit savings in areas of the audio stream that cannot be perceived. This is done by, for instance, using very small amounts of bits to code the high frequencies of most signals - not because the signal has little high frequency information (though this is also often true as well), but rather because the human ear can only perceive very loud signals in this region, so that softer (noise) sounds 'hidden' there simply aren't heard."

I'll be dogged. It would seem that lossy compression creates noise that was not there in the original program. This article seems to be referring to algorithms used to convert, for example, WAV to MP3; does it also apply to changes in bit depth when no other conversion is occurring? I dunno.

I was thinking of the noise that is present in every recording, whether noticed or not. THAT noise would, logically, be reduced by decreasing bit depth.
 
AGCurry said:
It really doesn't matter logically which end they're discarded from. Big-endian and little-endian and all that.
But sonically it does matter. Discarding the eight most-significant bits would result in some serious distortion! :eek:

All bits are not equal, which would you rather lose? A few cents or a few hundred dollar bills? :)
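
To put numbers on that analogy, using the 24-bit word from my earlier post (a Python sketch, purely illustrative):

    sample = 0b101101001101101110011001   # 11,852,697 as an integer

    lose_lsbs = (sample >> 8) << 8        # zero the 8 least-significant bits
    lose_msbs = sample & 0xFFFF           # zero the 8 most-significant bits

    print(sample - lose_lsbs)   # 153: losing the cents
    print(sample - lose_msbs)   # 11796480: losing the hundred dollar bills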
 
AGCurry said:
I'll be dogged. It would seem that lossy compression creates noise that was not there in the original program. This article seems to be referring to algorithms used to convert, for example, WAV to MP3; does it also apply to changes in bit depth when no other conversion is occurring? I dunno.

I was thinking of the noise that is present in every recording, whether noticed or not. THAT noise would, logically, be reduced by decreasing bit depth.
When referring to dithering and the like, the term "noise" is meant in an information theory way more than in an audiology way.

When simply truncating the last bits off of a digital word, "noise" is created in the loss of accuracy of the word value. This is not (necessarily) "noise" in terms of hiss, static, popping, etc. (though some audible "noise" may be a side effect of such truncation). It is noise in terms of information loss. This is how it really should be approached.

Dithering adds "noise" to the last bits, but again what's important here is not that the noise sounds like a hiss if amplified loud enough (though it may), but rather that it's "noise" in the information theory definition; it's the addition of bit values that contain no usable information.

The result is a pseudo-randomization of the values of the last bits of the truncated words, a sort of "flattening" or "smearing" of the least significant bits. When converted to an analog signal, the audible artifacts of such flattening/smearing are less than if the truncated digits were just left alone as "rough cuts".

The seeming dichotomy is that one actually adds noise to the signal in order to reduce noise in the signal. The dichotomy disappears, however, when one realises that the added "noise" is digital and the reduction of "noise" is in the audible domain. Or, more accurately, one adds "noise" in the information theory sense in order to reduce "noise" in the audiology sense.
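
To demonstrate the point directly, here's a small Python sketch (assuming signed 24-bit integer samples and round-to-nearest requantization; the tpdf helper is just my illustration): a steady level sitting below the 16-bit LSB disappears entirely under plain requantization, but survives dithering as data buried in the noise.

    import random

    level = 64                  # one quarter of a 16-bit LSB, in 24-bit units
    samples = [level] * 100000

    def tpdf():
        # Triangular dither spanning roughly +/- one 16-bit LSB
        return random.randint(-128, 127) + random.randint(-128, 127)

    rounded  = [((s + 128) >> 8) << 8 for s in samples]           # no dither
    dithered = [((s + tpdf() + 128) >> 8) << 8 for s in samples]  # with dither

    print(sum(rounded) / len(rounded))    # 0.0 - the signal is simply gone
    print(sum(dithered) / len(dithered))  # ~63 - still there, under the noise

The added digital "noise" is exactly what lets the audible information survive.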

G.
 
iqi616 said:
Albert, you're confusing me a little there. When you are talking about the "top" bits, do you mean the most-significant bits or the least-significant bits? I tend to imagine digital words horizontally with the least-significant bit on the right. Working vertically, is your "top" bit the most- or least-significant bit?

When 24-bit is truncated to 16-bit it's the smallest value bits - the least-significant bits - that get lopped off. However, those are the most used bits in any digital signal as the numbers count up and down - unless you're recording a true square wave of course. Those 8 smallest value bits are the "filler" that give a smoother and more accurate "finish" to the sound of 24-bit recordings than you get with 16-bit. The least used bits are the most-significant bits.

You know more about this than me! I used the term "top" because on the bit meters I've worked with, the bits are represented from bottom to top. Not sure which is the least significant. The "top" bits as indicated on the meters are the empty ones.

Obviously, a little information in the wrong hands (mine) can be a dangerous thing!
 
SonicAlbert said:
You know more about this than me! I used the term "top" because on the bit meters I've worked with, the bits are represented from bottom to top. Not sure which is the least significant. The "top" bits as indicated on the meters are the empty ones.

Obviously, a little information in the wrong hands (mine) can be a dangerous thing!
Well, I've never personally used a bit meter so I'm probably about to embarrass myself.

Looking at the diagram on this page... http://www.izotope.com/support/help/ozone/pages/meters_dithering.htm

The most-significant bits are at the top and the least-significant bits at the bottom. So, when the signal is truncated it is bits 17 to 24 that are taken away, and if you loaded the 16-bit signal back into 24-bit, you'd see a big gap at the bottom, as their second explanatory figure shows.

Note that labelling a particular bit with a number is hazardous, as it is contextual. From my background, "bit 1" is the least significant bit, but sometimes the LSB is "bit 0". :) I guess they are counting down from 1 because they're trying to communicate bit depth rather than bit height: wearing our digital audio hats, everything hangs below 0 dBFS.

That page is brief but informative. The one point I'd make is that failing to use all 24 bits is not necessarily a bad thing. A quiet piece of music doesn't have to use as much of the available dynamic range as a loud piece of music does.
 
iqi616 said:
But sonically it does matter. Discarding the eight most-significant bits would result in some serious distortion! :eek:

All bits are not equal, which would you rather lose? A few cents or a few hundred dollar bills? :)

You're not understanding me. All I'm saying is that in certain architectures, the most significant bits will be on the left, while in others they will be on the right. It's not important that we know where they are, but that, as you say, the least significant ones are lost.
 
You must spread some Reputation around before giving it to SouthSIDE Glen again.
 
I spread a fair bit of pos rep. It's just that Glen earns it more than most.
 
AGCurry said:
You're not understanding me. All I'm saying is that in certain architectures, the most significant bits will be on the left, while in others they will be on the right. It's not important that we know where they are, but that, as you say, the least significant ones are lost.
But the convention when writing numbers (decimal, binary, octal, hexadecimal) is with the least significant digit on the right. I guess that's why I didn't understand you.
 