Bob Katz & dithering

  • Thread starter: Bulls Hit
Farview said:
The files are not all exactly 3 minutes. If they were the same length they would be the same size.

Nope. It's just like I said. In fact I have several files that are smaller yet longer than others. File size does not appear to be directly related to length of time.
 
EddieRay said:
Nope. It's just like I said. In fact I have several files that are smaller yet longer than others. File size does not appear to be directly related to length of time.

What is the audio format for the files?
 
EddieRay said:
They're 16-bit, 44.1kHz .wav files.

at 44,100 samples per second/16 bits:
2 bytes/sample * 44,100 samples/sec = 88,200 bytes/sec
88,200 bytes/sec * 60 secs/minute = 5,292,000 bytes/minute
5,292,000 bytes/minute * 2 channels for stereo = 10,584,000 bytes/minute or approx 10MB/minute

How a computer allocates storage for files may not match this number exactly, but this is the math.
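Here's the same math as a quick Python sketch (just illustrative; a real WAV file also carries a small header on top of the raw audio data):

# Expected raw audio size for 16-bit / 44.1 kHz stereo, header not included.
def wav_data_bytes(minutes, sample_rate=44100, bits=16, channels=2):
    bytes_per_sample = bits // 8
    return int(minutes * 60 * sample_rate * bytes_per_sample * channels)

print(wav_data_bytes(3))   # 31,752,000 bytes, roughly 30 MB for 3 minutes
print(wav_data_bytes(5))   # 52,920,000 bytes, roughly 50 MB for 5 minutes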
 
masteringhouse said:
at 44,100 samples per second/16 bits:
2 bytes/sample * 44,100 samples/sec = 88,200 bytes/sec
88,200 bytes/sec * 60 secs/minute = 5,292,000 bytes/minute
5,292,000 bytes/minute * 2 channels for stereo = 10,584,000 bytes/minute or approx 10MB/minute

How a computer allocates storage for files may not match this number exactly, but this is the math.

I'm at work right now but my numbers went something like a 5:01 stereo wav file was 51 MB, and a 5:00 file was 58 MB. Also had a 3:13 song that was smaller in size than a 3:01 song.

I don't doubt your math but I do have a hard time believing that 3 minutes of silence is going to be the same size as 3 minutes of a 10-piece band. On the other hand, I'm struggling with how sonic content is represented by digital data.
 
EddieRay said:
I'm at work right now but my numbers went something like a 5:01 stereo wav file was 51 MB, and a 5:00 file was 58 MB. Also had a 3:13 song that was smaller in size than a 3:01 song.

I don't doubt your math but I do have a hard time believing that 3 minutes of silence is going to be the same size as 3 minutes of a 10-piece band. On the other hand, I'm struggling with how sonic content is represented by digital data.

Eddie -

As far as storage goes (for this audio format), it doesn't matter if it's silence or a 10-piece band. The 2 bytes of each sample represent the amplitude of the waveform at that particular sample (or 1/44,100 of a second). Same amount of storage either way. Maybe you're not trimming silence from the start and end of your wav file, so that even though the music is a particular length of time, the wav file actually represents a larger time interval.
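A quick way to convince yourself (a throwaway Python sketch; the file names here are made up): write the same number of samples of silence and of noise and compare the resulting file sizes.

import os
import random
import struct
import wave

def write_wav(name, samples, rate=44100):
    # 16-bit mono WAV: 2 bytes per sample, one channel.
    with wave.open(name, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

n = 3 * 44100  # 3 seconds' worth of samples
write_wav("silence.wav", [0] * n)
write_wav("band.wav", [random.randint(-32768, 32767) for _ in range(n)])
print(os.path.getsize("silence.wav"), os.path.getsize("band.wav"))  # identical sizes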
 
I don't know if this was straightened out or not; I only read most of the posts.

Most things DSP are based on Fourier, but that has little to do with the basic concept of digital audio.

Speakers produce sound by moving in and out, creating complex pressure waves in the atmosphere which we interpret as sound. When the speaker moves forward it creates a high pressure area; when it backs off it creates a low pressure area. The rate at which the speaker pushes and pulls relates directly to the frequency of the sound it produces, with 20,000 complete pushes and pulls a second being about as high a pitch as we can hear.

Digital audio is just a series of distances, negative and positive, out and in, that "imply" where the speaker cone is supposed to be at any given time.

Basic stuff, but if you get how a speaker is making noise in the room, a digital audio signal is pretty obvious.

I said imply, because the digital signal is not supposed to be interpreted directly into speaker cone positions or pressure variations. Take a 22,049 Hz signal at maximum volume. The wave file would look like one dot at the top, one at the bottom, one at the top, etc. There's no information in the file to produce a smoothly oscillating sine wave; this is inferred from the data by the converter. A guy called Nyquist has a theorem that says sine waves up to almost half the sample rate of a wave file can be extracted properly. In practice we get close.
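You can see that "one dot at the top, one at the bottom" picture in a small Python sketch (my own illustration, assuming the tone's peaks happen to line up with the sample clock):

import math

rate, freq = 44100, 22049   # tone just under Nyquist (44100 / 2 = 22050)
samples = [math.cos(2 * math.pi * freq * n / rate) for n in range(8)]
print([round(s, 3) for s in samples])
# roughly [1.0, -1.0, 1.0, -1.0, ...]; the smooth sine wave is inferred
# by the converter on playback, it isn't stored in the file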

The Fourier stuff comes into play with mp3s (and a zillion other things). I THINK this is how it works...

Take, oh say, 5 ms of audio and fight with it until it's a frequency graph and not a waveform anymore. Take half of the frequency bands, the loudest ones, throw the levels of those frequencies into a file, and throw the rest of the data away. You now have a file that's half as big as the one you started with.

The decoder does the opposite: it reads chunks of 5 ms of frequency data at a time and generates smooth sine waves to listen to.

The building blocks of this process are called the FFT (Fast Fourier Transform) and the Inverse FFT. This is an obvious simplification, but I'm pretty sure that's the basics of how it works.
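Here's a toy sketch of that "keep the loudest bands" idea using numpy's FFT. It's only an illustration of the concept; real MP3 encoding uses psychoacoustic models and different transforms.

import numpy as np

rate = 44100
t = np.arange(int(rate * 0.005)) / rate                  # ~5 ms of audio
chunk = np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 2500 * t)

spectrum = np.fft.rfft(chunk)                            # waveform -> frequency bins
loudest = np.argsort(np.abs(spectrum))[-len(spectrum) // 2:]
reduced = np.zeros_like(spectrum)
reduced[loudest] = spectrum[loudest]                     # keep the loudest half, drop the rest

decoded = np.fft.irfft(reduced, n=len(chunk))            # frequency bins -> waveform
print(np.max(np.abs(chunk - decoded)))                   # prints the (small) reconstruction error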
 
EddieRay said:
Nope. It's just like I said. In fact I have several files that are smaller yet longer than others. File size does not appear to be directly related to length of time.
The file sizes are directly linked to the length of the song. You must have different sample rates or bit depths going on. (We are talking about wav files, not mp3s, right?)
If there are 44,100 16-bit samples a second, you would have 88,200 16-bit samples in 2 seconds, etc. It really is simple math; the content of the song has nothing to do with the file size. The sample rate, bit depth and length determine the size of the file.
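If files of similar length come out wildly different in size, checking the format is the first step. A small sketch with Python's wave module (the file name is just a placeholder):

import wave

with wave.open("song.wav", "rb") as w:
    print("channels:   ", w.getnchannels())
    print("sample rate:", w.getframerate())
    print("bit depth:  ", w.getsampwidth() * 8)
    print("length (s): ", w.getnframes() / w.getframerate())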
 
Storing silence takes as much space as storing music. If you recorded 3 minutes of nothing on a cassette tape, it would use just as much tape as 3 minutes of an orchestra.
 
kylen said:
Doug H - are you OK ? :D
Heh, I guess the topic has moved on. I saw Fourier and some other things mentioned a few posts back and had to weigh in.
 
Farview said:
Storing silence takes as much space as storing music. If you recorded 3 minutes of nothing on a cassette tape, it would use just as much tape as 3 minutes of an orchestra.

Now THAT makes sense. Duh!
 
This is something I struggled a long time to understand; it finally made sense to me when someone explained it this way.

Imagine a rowboat sitting in the water. The level of the water on the side of the boat can only be a single height at any given time. You can drop pebbles into the water, making small waves, which makes the water level go up and down; the more pebbles you drop at a time, the more complicated the up-and-down motion, but it's still only a single height at any point in time.

Sound is like that with your ear: it doesn't matter how many instruments are playing or how many people are talking. The sound hitting your ear can only be one value at any instant (just like the water level on the side of the boat, only here it is pressure on your eardrum).

And if you've ever wondered how something as simple as a speaker can reproduce any kind of instrument or sound, it's just this process in reverse. The mistake is in thinking that because all these different instruments with their complex sounds average together to make a single value at one moment, all of that information is retained. All the individual information is lost and can't be reconstructed with any certainty. It's just as if you went around your block and found out how much each of your neighbors makes a year: if you averaged all the incomes together you would get a single value that, even though it is an accurate measure and was influenced by each person's income, doesn't by itself tell you how much anyone made, how many people there were, etc. So a WAV of an orchestra playing is an accurate measure of the sound they produced, but doesn't tell you anything with certainty about what instruments they were playing or the acoustic properties of those instruments.
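A tiny Python sketch of that "one water level at a time" idea (the two "instruments" here are just made-up sine waves): everything playing at once collapses to a single number per sample.

import math

rate = 44100
for n in range(5):
    t = n / rate
    instrument_a = 0.5 * math.sin(2 * math.pi * 220 * t)
    instrument_b = 0.3 * math.sin(2 * math.pi * 880 * t)
    # one value per sample; the individual parts can't be read back out of it
    print(round(instrument_a + instrument_b, 4))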
 
ok, this is how it works, at least in my understanding,

PART A - sample rate

any individual "sample" contains one measurement of voltage, as in how much voltage. a waveform on a wire is represented by a varying voltage, so any single sample contains the information for an amount of voltage.

by itself, one sample cannot represent a waveform. it takes more than one sample to make a waveform.

This is why it is said that the highest frequency that can be represented at a 44,100 Hz sample rate is 22,050 Hz. In this scenario, each 22,050 Hz waveform is represented by two samples: a peak and a trough.

If the frequency were 11,025 Hz, each waveform could be represented by 4 samples, and if it were 5,512 Hz, it could be represented by 8 samples, which would give an even better estimate of what the waveform once was.
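A quick Python check of those numbers (just the arithmetic above):

rate = 44100
for freq in (22050, 11025, 5512):
    print(freq, "Hz ->", round(rate / freq, 1), "samples per cycle")
# 22050 Hz -> 2.0, 11025 Hz -> 4.0, 5512 Hz -> 8.0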

PART B - bit depth

the number of bits determines the detail of the voltage reading. this is confusing, because it also determines the range of the voltage reading.

each bit = 2 possibilities...

so 1 bit = 2 possible voltage samplings: 0 or 1

2 bits = 4 possible voltage samples: 00, 01, 10, 11

3 bits = 8 possible voltage samples: 000, 001, 010, 011, 100, 101, 110, 111

etc, etc.

so, it can be said that each bit "doubles" the number of possible values.

well, in decibels, each 6 dB "doubles" the amplitude. this is convenient, because each bit can now represent about 6 dB.

16 bits = a 96 dB range and 20 bits = 120 dB.
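The rule of thumb as a quick Python check (each bit is worth 20 * log10(2), about 6.02 dB):

import math

for bits in (8, 16, 20, 24):
    print(bits, "bits ->", round(20 * math.log10(2 ** bits), 1), "dB")
# 8 -> 48.2, 16 -> 96.3, 20 -> 120.4, 24 -> 144.5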

here is the cool part (IMO).

TIMBRE
-------
why does an "A" note played on a piano sound different than an "A" note played on a guitar? Why does the "A" note on a Les Paul sound different than an "A" note on a Strat?

the answer is the harmonic structure. The varying relative levels of the harmonics are what determine the character of the instrument. It's a fact, look it up.

what this means is that if you have an A:440,

the detail and quality of the note are going to be present in frequencies up to the limit of human hearing and higher (as has been proven).

This is why 44,100 was chosen as the "standard" sample rate. It is roughly the upper limit of human hearing x 2 (remember, 2 samples are needed to make a waveform).

PUT IT ALL TOGETHER
---------------------

more bits = more dB, right? well, this also translates to more detail. the detail of an instrument is in its harmonic structure, and these harmonics are generally at a much lower volume than the fundamental notes. this is why capturing the low-level sounds is important. low volume material = more detail.

because it's not only a smaller level you can read, but also a smaller change in sample values, giving you more detail. everything is in ratios... so if a very loud sample is 1 and a very quiet one is 1/1000, but more bits could get you to 1/10000 (an even lower level), then that means you just gained all of the values between 0.9990 and 0.9999... each bit is like adding a significant digit. you just get more and more detail...
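A sketch of the "each bit is like adding a significant digit" point (my own example, not from the post): quantize a very quiet level at a few bit depths and see how much of it survives.

level = 0.001                              # a quiet signal, with full scale = 1.0
for bits in (8, 16, 24):
    step = 1.0 / (2 ** (bits - 1))         # smallest step a signed sample can represent
    stored = round(level / step) * step
    print(bits, "bits -> step", step, "-> stored as", stored)
# at 8 bits the 0.001 signal rounds to 0.0 (gone); at 16 and 24 bits it survives
# with progressively finer accuracy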


I'll try to post some diagrams.
 

Attachments

  • bitrate.webp
  • mixing.webp
OK, didn't read the whole thread, but to answer the original question...

You cannot take an individual sample and ask "what does this sample sound like?" Taken in isolation, it doesn't sound like anything. It is the difference from one sample to the next that matters. It is these changes over time that end up describing a waveform, which in turn describes the various instruments.

Incidentally you need even more time to figure out the sounds of different instruments in a mix. For example, take a mix and loop 1ms of it... what does it sound like?
 
So if I understand this right (given 16/44.1 rates), one sample will have its own 16 bit word to describe its amplitude, and there will be 44,100 of them per second, per track.

An analog tape machine might be capable of running at 30 inches of tape passing over the head per second.

A millisecond is 1/1000 of a second. (eg. 500 milliseconds is half a second)

So one millisecond of sound would be equal to 44.1 samples, or 0.03 inches of analog tape at 30 IPS. A length of tape around the width of a human hair (given that a hair is about 0.003") would be able to hold the equivalent of around 4 samples.

One sample would last for a duration of around 0.0227 milliseconds (roughly).
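Those numbers check out; here's the same arithmetic as a quick Python sketch:

rate, ips = 44100, 30
print(rate / 1000)                           # samples per millisecond: 44.1
print(ips / 1000)                            # inches of tape per millisecond at 30 IPS: 0.03
print(0.003 / (ips / 1000) * (rate / 1000))  # samples in a 0.003-inch length of tape: ~4.4
print(1000 / rate)                           # duration of one sample in ms: ~0.0227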


Or something.



sl
 