Bob Katz & dithering

  • Thread starter: Bulls Hit

Bulls Hit
Well-known member
I'm working my way through Mastering Audio, and Bob's descriptions of dithering (what it does and how it does it) got me wondering.

How do digital samples, which are just 16 or 24 bit numbers, contain all the music texture we hear?

For example, the 44,100 16 bit numbers that describe the first second of "Good Times Bad Times" hold all the information that, when converted back to analog, sounds like guitars, bass and drums. How much would those numbers have to change so that we heard the same tune but played with trumpets, violins and timpani?

How does a 16 bit number contain not just the amplitude of the music at that instant, but also the musical content? Is it possible to change the number so instead of hearing a guitar, we hear a violin?

I need to get my head around this before trying to absorb more of Bob's wisdom
 
Yes, those numbers could be changed so you'd hear the music differently, since any second of audio in any song contains the same amount of information.

Similar I guess to how different stories are contained in similar looking books with the same number of pages. Many different stories can be written with a few thousand words.

It is amazing, isn't it? The thought of recording sound really blows my mind when I think of it.
 
Excellent question. I've wondered this myself. But I've noticed that I can create several 3 minute wave files but each one is a different size. So there must be bits relating to each frequency present in a time slice.
 
Here's an introduction to the whole mess: Jean-Baptiste Joseph Fourier was a mathematician born in 1768, and one day he figured out that sine waves (which were fairly well understood mathematically) could be combined to produce any complex wave imaginable, even that of the opening second of "Good Times Bad Times." You couldn't very well do that with pencil and paper fast enough for it to mean anything, but as modern computers got faster somebody applied one to the "Fast Fourier Transform", which is the mathematical description of a complex wave in terms of multiple interacting sine waves (if you want the math, here it is: http://mathworld.wolfram.com/FourierTransform.html; just don't expect me to explain it). By the early 1980s, Sony and Philips were using the math to sample music, and for various reasons they settled on 44,100Hz as the sampling rate for the new CD standard. As any CD owner knows, this is frequent enough that, when you sample music and play it back, it still sounds like music. Of course, there is debate among high-end engineers (of whom Bob Katz is one) about how much of a difference sampling rates higher than 44.1kHz make on the music, but we're not going there. The important thing is that, yes, FFT allows an analog-to-digital convertor to record all the frequencies of that instant in time, which can then be reconstructed back to the music we hear via a digital-to-analog convertor.

All that is a separate issue from bit depth, and it's not the number of bits that determines how complex the digital waveform is, but the quality of the AD convertors. Bit depth simply describes how many possible levels of amplitude are available in a given sample or piece of equipment. The CD standard calls for 16 bits, which was a lot in the '80s, but not so hot now, which is why media other than CDs are 24 bit or higher: they're not locked into the limitations of 20-year-old technology.

Anyhow, when you have a very low level signal, the ADC doesn't know for sure whether it should be a "1" or a "0", so it may toggle back and forth between the two, causing noise. This is, remember, very low level, so there's a bit depth of only 1, regardless of whether the device is capable of 16 bit or something higher. So you have noise generated, and not pretty noise either. The solution is to ADD noise to bump that 1 bit sound up into the higher bits where it can be shaped, or masked, or somehow otherwise made less prominent. So dither is that added noise. It's like rocking your car back and forth to slosh the last of the gasoline in the tank into the fuel pickup so you can get home without stopping at a station.
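To make that concrete, here's a toy Python sketch (my own illustration, not from Bob's book) of why adding noise before quantizing rescues a signal that sits below the last bit:

```python
import math
import random

def quantize(x, dither=False):
    """Round an amplitude to whole integer steps (1 step = 1 LSB).
    With dither, triangular (TPDF) noise of about one LSB is added
    first - a toy version of the 'added noise' described above."""
    if dither:
        x += random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
    return math.floor(x + 0.5)

# A steady signal at 0.3 LSB: without dither every sample rounds to 0
# and the signal simply vanishes; with dither the average of many
# samples recovers it, traded for a low-level noise floor.
undithered = [quantize(0.3) for _ in range(10000)]
dithered = [quantize(0.3, dither=True) for _ in range(10000)]
avg = sum(dithered) / len(dithered)
```

The undithered list is all zeros, while `avg` comes out close to 0.3 - that's the whole trick: the low-level detail survives as a statistical property of the noise rather than being truncated away.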

Hope this helps.
 
EddieRay said:
Excellent question. I've wondered this myself. But I've noticed that I can create several 3 minute wave files but each one is a different size. So there must be bits relating to each frequency present in a time slice.
The files are not all exactly 3 minutes. If they were the same length they would be the same size.
 
lpdeluxe said:
The important thing is that, yes, FFT allows an analog-to-digital convertor to record all the frequencies of that instant in time, which can then be reconstructed back to the music we hear via a digital-to-analog convertor.

OK, so if I understand correctly, this FFT is an algorithm that, at any given instant, converts a bunch of frequencies into a single number? Does that conversion include the amplitude information at that instant as well?
 
Yes, within the limitations of the bit depth you are using.
 
This is what I'm struggling with.

A 16 bit number can have 65536 possible values. If we perform an FFT on 1 second of Good Times Bad Times, one of those samples might have a value of, say, 14222. If we do an FFT on 1 second of Back in Black, one of those samples could also be 14222. Yet the Back in Black sample won't 'sound the same' as the Good Times Bad Times sample. Will it?

Given that there are hundreds of musical instruments, their different characteristics and timbres, the range of frequencies and volumes, the ways they're played and the countless number of ways of combining these instruments together, it just seems like 16 bits aren't enough to hold all the information required to reproduce the analog sound.
 
Bulls Hit said:
OK, so if I understand correctly, this FFT is an algorithm that, at any given instant, converts a bunch of frequencies into a single number? Does that conversion include the amplitude information at that instant as well?

As I understand it (and I don't understand the math) FFT is used for DSP, not the conversion process itself, which is a relatively straightforward measurement of amplitude over time. It doesn't create a bunch of frequencies at a given instant, because there is no such thing, you can only measure frequency as a change in amplitude over time. What FFT does is split a complex waveform into multiple simple waves, so to speak, such that DSP like EQ can be performed.

A single sample just contains amplitude. A single sample cannot be said to contain any frequency information, because you can only derive frequency in relation to other samples. So the identical sample from different songs will sound the same, like this:

* click *
 
* click *

Well said. Once more: bit depth doesn't relate to the complexity of the music, but only to how amplitudes are quantized (and changed, because the digital level for a particular amplitude can never be exactly the same as the analog source).
 
Click for sure!

Even an FFT needs a collection of samples before it can make sense out of individual samples and do anything useful (at least in the tools I tend to use); a 50ms window is a common timeframe for an FFT to collect samples for something useful like RMS stats or a frequency spectrum.

Of course the ear can't make any sense out of 50ms worth of audio and needs longer still to put together the sounds that begin to identify an instrument's timbre and pitch. I don't know what my own minimum 'windowing' is; I would think maybe a second or so to identify a new sound, especially in the midst of the chaos of a full mix of other complex sounds. :D
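For anyone curious, here's roughly what a tool does with one of those 50ms windows. This is a deliberately naive DFT in plain Python (a real analyzer uses an FFT, which computes the same result far faster); the 8 kHz rate and 440 Hz tone are made-up numbers for the demo:

```python
import math

def spectrum(samples, rate):
    """Brute-force DFT magnitudes of one analysis window.
    Bin k corresponds to the frequency k * rate / len(samples) Hz."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags

rate = 8000
window = [math.sin(2 * math.pi * 440 * i / rate) for i in range(400)]  # 50 ms of a 440 Hz tone
mags = spectrum(window, rate)
peak_hz = mags.index(max(mags)) * rate / len(window)  # strongest bin lands at 440 Hz
```

Note that the whole 400-sample window goes into every bin - which is exactly the point being made: no single sample carries frequency information on its own.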
 
kylen said:
Of course the ear can't make any sense out of 50ms worth of audio and needs longer still to put together the sounds that begin to identify an instrument's timbre and pitch.

Sixteenth notes at 120bpm is 125ms, at which one can readily identify pitch and limited timbral info. Much faster than that, and pitches get blurred together, however a single pitch for 50ms might be identifiable.

I'd think 250ms is more than enough to pick out detail. I can name a lot of songs on the radio in less than 500ms.
 
mshilarious said:
Sixteenth notes at 120bpm is 125ms, at which one can readily identify pitch and limited timbral info. Much faster than that, and pitches get blurred together, however a single pitch for 50ms might be identifiable.

I'd think 250ms is more than enough to pick out detail. I can name a lot of songs on the radio in less than 500ms.
Yes - I tried this at 50ms & 100ms - I can't identify anything in a full mix with that sampling window size. I see what you mean though: depending on the tempo, things start to get pretty clear around 250ms, and by 500ms I have heard enough to win a contest! Put me on the radio :D
 
mshilarious said:
It doesn't create a bunch of frequencies at a given instant, because there is no such thing, you can only measure frequency as a change in amplitude over time.

Right, this is what I was missing. It's the series of samples over time that give the frequency info. I'm beginning to see the light.

So, back to dithering.

If I'm converting my 4 bit recording to 3 bit (this is a real early version of Cakewalk)
a sample 1101 without dithering would become 110. Right?
With dithering it becomes 111, in all cases or will some randomly remain at 110?

Also a sample 1110 converted without dithering becomes 111.
With dithering it....what? Stays at 111 or becomes 000?
 
To make a guitar sound like a trumpet, the envelope of attack, decay, etc, would need to be altered, as well as harmonic structure...

Much of what gives a sound its particular timbre is based upon which harmonics are present in the waveform, and their relative amplitudes, and then there are things as well like the envelope shape of the onset and offset. Hence how a synth is able to make almost any given sound based on attack, decay, sustain, etc., modulating a signal containing various frequencies of various types of waveforms.


An individual sample is just a measure of amplitude, and when you put them together, you get the approximated "graph" of the waveform present, which contains all the different frequencies and amplitudes present in the sound source. Without whatever it is the brain does when processing that soundwave, of course, it's pretty much a meaningless look at amplitude of sound pressure level over time.

=D

So is my understanding anyway.
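A quick additive-synthesis sketch of that idea (everything here - the rates, the harmonic recipes, the 20ms attack - is invented purely for illustration): the same pitch with different harmonic weights plus an amplitude envelope gives two recognizably different "instruments".

```python
import math

def tone(freq, harmonics, seconds=0.5, rate=8000):
    """Sum sine-wave harmonics with the given relative amplitudes,
    then impose a simple linear attack/decay envelope."""
    n = int(seconds * rate)
    attack = int(0.02 * rate)  # 20 ms attack
    out = []
    for i in range(n):
        t = i / rate
        s = sum(a * math.sin(2 * math.pi * freq * (h + 1) * t)
                for h, a in enumerate(harmonics))
        env = min(i / attack, 1.0) * (1 - i / n)  # fade in, then fade out
        out.append(s * env)
    return out

# Same 220 Hz pitch, different harmonic recipes -> different timbres
reedy  = tone(220, [1.0, 0.0, 0.5, 0.0, 0.3])    # mostly odd harmonics
brassy = tone(220, [1.0, 0.7, 0.5, 0.35, 0.25])  # all harmonics present
```

Both lists are nothing but amplitudes over time, yet they encode different timbres - which is the point of the post above.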
 
mattamatta said:
An individual sample is just a measure of amplitude, and when you put them together, you get the approximated "graph" of the waveform present, which contains all the different frequencies and amplitudes present in the sound source.

Yeah, it's just weird to me how a series of numbers holds not just the amplitude information but also the hundreds if not thousands of underlying individual frequencies that make up the music
 
Bulls Hit said:
Yeah, it's just weird to me how a series of numbers holds not just the amplitude information but also the hundreds if not thousands of underlying individual frequencies that make up the music
Well....

The sample value holds the amplitude information and nothing else.

The overtones, (or individual frequencies as you put it) come from graphing out that amplitude information Vs a time axis.

It takes a minimum of two samples to get any kind of frequency information (two samples per cycle corresponds to the absolute highest frequency possible).

It takes a whole mess of samples to get any frequency information at a lower pitch.

Say you plot the amplitude on a piece of paper vs the time the amplitude was recorded at. After you plot the first 500 samples, you have one wave that looks like an S laying on its side. However long it takes from the start of the S to the end of the S along the X axis (the time axis) is the period of the wave, and its inverse is the frequency, or how high the note is.

Now say you plot the first 500 samples and the S is not perfectly smooth. It's all jagged. Those jaggies are the overtones. Different jaggies make different overtones make different instrument sounds (to greatly simplify it).

In other words, no, a single sample doesn't hold any frequency or overtone information. It takes hundreds of them plotted out to get tone information. A single sample is only amplitude.
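Here's that plot-the-samples idea as a tiny Python experiment (441 Hz is chosen only because it gives a period of exactly 100 samples at 44.1 kHz): no single number knows the pitch, but counting samples between zero crossings recovers it.

```python
import math

rate, freq = 44100, 441.0  # 441 Hz -> exactly 100 samples per cycle
samples = [math.sin(2 * math.pi * freq * (i + 0.5) / rate) for i in range(500)]

# Each sample alone is just an amplitude. The pitch appears only in how
# the amplitudes repeat: find the upward zero crossings and count the
# samples between them.
ups = [i for i in range(1, len(samples)) if samples[i - 1] < 0 <= samples[i]]
period = ups[1] - ups[0]   # samples per cycle
measured_hz = rate / period
```

Any one entry of `samples` could belong to any song at all; only the pattern across hundreds of them pins down the 441 Hz tone.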
 
EddieRay said:
Excellent question. I've wondered this myself. But I've noticed that I can create several 3 minute wave files but each one is a different size. So there must be bits relating to each frequency present in a time slice.

This shouldn't be the case for a wav file recorded at the same sample rate and bit depth.
 
Bulls Hit said:
Right, this is what I was missing. It's the series of samples over time that give the frequency info. I'm beginning to see the light.

So, back to dithering.

If I'm converting my 4 bit recording to 3 bit (this is a real early version of Cakewalk)
a sample 1101 without dithering would become 110. Right?
With dithering it becomes 111, in all cases or will some randomly remain at 110?

Also a sample 1110 converted without dithering becomes 111.
With dithering it....what? Stays at 111 or becomes 000?

A 3 bit recording? Hopefully just keeping the example simple!

Dithering is essentially noise added to randomize the last bit, so it will sometimes be 1 and sometimes 0.

So 1101 without dithering gets truncated to 110
1101 with dithering is sometimes 111 sometimes 110

1110 without dithering gets truncated to 111
1110 with dithering is sometimes 111; sometimes the noise would push it past full scale, which would cause a digital over, so the result is clipped and stays at 111.
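The 4-bit-to-3-bit example can be played out in a few lines of Python (a toy: real dither is shaped noise rather than a coin flip, but the rounding behaviour is the same idea):

```python
import random

def reduce_4_to_3(sample, dither=False):
    """Drop the last bit of a 4-bit sample (0-15 becomes 0-7).
    With dither, a random 0 or 1 is added to the bit being discarded,
    so in-between values round up only some of the time; the result is
    clamped at full scale so the noise can never cause a wrap-around."""
    if dither:
        sample = min(sample + random.randint(0, 1), 15)
    return sample >> 1

# 1101 (13) truncates to 110 (6); dithered it is sometimes 111 (7).
# 1110 (14) truncates to 111 (7); dithered it stays at 111, because
# anything pushed past full scale just clips there.
```

Running `reduce_4_to_3(0b1101, dither=True)` repeatedly gives a mix of 110 and 111, matching the "sometimes 111, sometimes 110" answer above.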
 