24 bit / 96 kHz discussion

  • Thread starter: jgohman

jgohman

New member
There are many products selling on the 24/96 feature and I want to start a discussion regarding whether this is overkill in many (but not all) circumstances.

I was a mastering engineer for Sony Music for several years and have had many discussions, teleconferences, etc. with company engineers on issues like jitter, sampling rate, dithering, etc. In many cases the discussions were qualified by the fact that we were a large company distributing DVDs, high-profile CDs, and other products.

I am not saying that the home computer recording project studio does not justify the quality of higher sampling rates and bit resolution, but here are my views on why these improvements end up mostly as hype:

1. The sonic improvement of jumping to 24 bits is signal to noise ratio. And it is significant. Only, for anyone with their computer in the same room as their monitoring environment, this S/N ratio jump is easily drowned out by the sound of the hard drive and fans in the CPU box. So, I think that anyone who claims to hear the difference between 16 and 20/24 bit (on the basis of signal to noise ratios) must have one heck of a quiet studio. The car or living room won't cut it either. I am not saying there isn't an audible sonic difference between different A/D converters though. And (for manufacturers) why take 24 bit converters and slap them on the PCI board? I understand: cost. But that S/N boost is foofed (new word) out by the stray RF flying around the motherboard and adjoining PCI cards, let alone the electronics on its own PCI card. (Also, 24/96 requires a huge amount of HD space.)

2. The reason to go to 96 kHz sampling rates is mainly for DVD production. How many project studios are doing this? Really, I am wondering- maybe there are quite a few. The other reason to jump to 96k would be to improve frequency response. The frequency response of your recording can extend up to half of your sampling rate. Can anyone hear to 48k? No. But sometimes upping the sampling rate is justified by moving the brick wall filter artifacts out of our range of hearing. I don't think I can hear the artifacts of 44.1k in my home studio. The artifacts are so low level that the previous discussion regarding S/N applies here. All I can hear is my hard drive and fans. And my hungry cat.
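To put rough numbers on point 1, here is a small Python sketch. It uses the textbook figure for an ideal quantizer driven by a full-scale sine (about 6.02 dB per bit plus 1.76 dB) and plain uncompressed PCM arithmetic for disk space. These are theoretical ceilings, not measurements of any real card, and the function names are made up for illustration.

```python
# Theoretical figures only; real converters fall well short of these.

def ideal_snr_db(bits):
    """SNR of an ideal N-bit quantizer driven by a full-scale sine wave."""
    return 6.02 * bits + 1.76


def megabytes_per_track_minute(bits, sample_rate_hz):
    """Uncompressed PCM storage for one mono track, in MB (10**6 bytes)."""
    bytes_per_second = (bits // 8) * sample_rate_hz
    return bytes_per_second * 60 / 1e6


for bits, rate in [(16, 44_100), (24, 44_100), (24, 96_000)]:
    print(f"{bits}-bit/{rate} Hz: "
          f"{ideal_snr_db(bits):.1f} dB ideal SNR, "
          f"{megabytes_per_track_minute(bits, rate):.2f} MB per track-minute")
```

On paper the jump from 16 to 24 bits is worth about 48 dB, and a 24/96 track eats a little over three times the disk space of a 16/44.1 track. Whether a room with a fan in it lets you hear any of that is the open question.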

I'm not trying to sound like a know-it-all. I really would like to see input. We all have views and I want to learn from them.
 
Well I mostly agree with you on the basis of the same principles you've used for justification, even though I've never had anything that can play back 24/96 to actually have a listen. However, as to point one: what if I'm using outboard converters and no microphone? I record synth stuff direct a lot of the time and I'm hoping my next synth will have 24 bit samples and a digital output. At that point a recording system that supported 24/96 would make a lot of sense. Also, any after-the-fact digital processing would have fewer artifacts, because a good chunk of them would (or could) be lopped off anyway when sampling down to CD. Many months back in a similar thread I predicted that soundcards would be including 24/96 capability long before they upgraded the rest of the design to match, simply because it's easier to market a buzzword like 24/96 than tell people that the design is "higher quality".
 
In defense of some of the members here, they have built sound rooms in their basements and keep their computers on the outside. But I'm with you: if you're going to use 24 bit converters, get an outboard like the Apogee Rosetta and keep it away from the computer.

Here is another point. I bet 70% of those who do home recording can't even hear the frequencies we are talking about. It would be interesting to see if there is something on the net with which we could test ourselves to see how far out our hearing goes, if we had earphones or speakers that are good enough of course.
 
I think 96K is overkill, as well as way beyond what the current specs of a decent computer system can handle (unless you don't need a lot of tracks). I'm finding 24/44.1 hard enough on a PIII800 with lots of memory and 7200RPM drives. But what about not necessarily having to hit -1 or -2 all the time to get a decent resolution? This way your resolution can stay in the 16 - 24bit range without too much grief on the front end. I think that's one advantage of 24bit. Also, most pro-sumer cards these days have the convertors located in the breakout box, away from any computer generated EMI/RF interference.
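The headroom point can be sketched numerically. Assuming the usual rule of thumb that each bit is worth about 6.02 dB, the snippet below estimates how many bits are still being exercised when the loudest peak sits some way below full scale. The function name is hypothetical, and this ignores converter noise entirely.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def bits_in_use(word_length_bits, peak_dbfs):
    """Bits still exercised when the loudest peak sits at peak_dbfs (<= 0)."""
    return word_length_bits + peak_dbfs / DB_PER_BIT


for peak in (-1, -12, -24):
    print(f"peak {peak:>4} dBFS: "
          f"16-bit -> {bits_in_use(16, peak):.1f} bits, "
          f"24-bit -> {bits_in_use(24, peak):.1f} bits")
```

So a 24 bit take that peaks a conservative 24 dB below full scale still has roughly 20 bits in play, while a 16 bit take recorded the same way is down to about 12.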
 
For those that work at home and bring their material to pro studios to continue the work there, I can see justification for working with 24bit.
Many people track at home and mix in pro studios, or as I do quite a bit, edit at home and export the edited material back. Why spend money on studio time when I can edit at home? Why track a direct sound there if I can do it at home?

I have used ADATs and DATs for quite a few years in many different studios and I can tell you that some of the converters on some of the audio cards today using 24 bit sound better than the older versions (still very popular in many pro studios today) of ADATs as well as some DATs.
 
DAW or HDR?

My 24 bits,
For over a year I've been putting together a PC recording system: TMD1000, IF-TAD Converter, two Pulsar cards, Emagic, Samplitude in the original Athlon 751 chip PC (my biggest problem). I've had some recording success, but many problems.
I recently bought a Fostex D824 HDR (24bit/96K, 8 tracks), and the sound quality was obviously better than the Pulsar PC. Fostex uses the same converter manufacturer as TASCAM's MX2424, but they are one step up above TASCAM's converter. I suspect the converters are responsible for the lifelike recording. Since I'm connected to the TMD with ADAT (the EBU card is optional for 96K sampling), my recording was 24bit/44.1K.
Was this same debate going on between 8 bit and 16 bit? One observation I heard was that 96K produces a smoother, warmer sound than 44.1. I guess an analogy could be photo resolution. A Nikon digital camera's high cost is due to the higher resolution in both the lens and the bits.
The 16bit/44.1 HDRs cost around $1000. The 24bit/96K D824 cost $1500 (it's best to get it for $1100 without the HD and put it in yourself).
It's always tempting to rationalize getting the less expensive, soon-to-be-outdated equipment, but 24bit/96K is the next standard. As for eating HD space, computer power and HD space are doubling every six months.
My 24bits worth
Chuck
 
I like the argument from Emeric about not having to hit -1 and -2 to get good S/N. There are a lot of valid arguments here. I also understand that 24/96 forces manufacturers into tighter design specs that really benefit us all (most of the time) in non-sonic ways.

But, sonically speaking, if you take a 96k sampled wave Vs. a 44.1k sampled wave, you will have to lop off at least half of that with your output lowpass filter in your DA converters. This all relies heavily on the quality of the DA ( and vice versa). I can understand how 96k could sound better if the DA had excellent jitter and filter specs, but really when it comes down to the DA's, we don't have any control over what the end listener will be listening through. So we really need to be concerned about the AD converters right? Emeric, I was led to believe that many of the break-out box systems have the AD's in the break-out box. But I'm finding that many don't. The Delta 66, Aardvarks, lexicon core 2 don't as far as I've been able to tell. If you take parallel signals and shove them through a wire in analog form until they reach the PCI card, there is bound to be crosstalk also. Has anybody had any experience with this?

I don't really think this argument particularly pertains to the 8 to 16 bit argument, because the sonic differences there are painfully apparent. But you're right, I may look back and laugh at my views in a few years.

With sonic quality getting so good on such cheap systems, where do any of you think the next major changes in audio are going to occur? I'm still a stereo purist and hate to think that everything is going to go surround. Laugh at me if you like, but I'll be a fuddy duddy on that point. I haven't heard a surround system that is very realistic. Binaural actually does the trick better as funny as that sounds.

Good stuff, everybody. thanks. Keep it going.
 
Not sure about the Delta 66; the Delta 1010 has convertors in the breakout box. All of Aardvark's products have the convertors in a breakout box. As far as crosstalk, I haven't noticed any at all, in either the Aardvark 20/20+ or the Delta 1010. I'm not sure what crosstalk would sound like in a digital setup. Easy enough on an analog system. I would think crosstalk on the interface cable would show up more as data corruption and be easily audible (maybe not)? I was thinking of error correction, but wouldn't that require an elaborate buffer/storage system? I really don't know... Skippy?
 
I was thinking of the situations in which you have a break-out box but the converters are on the PCI card. In that case, the crosstalk would sound just like analog crosstalk. Know what I mean? Crosstalk is uncommon in digital systems due to the threshold between high and low, but I imagine it could show up as data corruption. Luckily there is error correction built in to the audio streams.

I know that the 1010 has the converters in the box also, but I heard that the converters for the 66 are on the card. Can't believe everything I read. Wish I could.

Thanks, emeric.
 
Where's it going next?

jgohman,
You're right about dropping cost and quality. The D824 uses AKM 128x oversampling converters (A/D 5393 and D/A 4393) for $1100 w/o HD. ($100 & put it in yourself) Whether it's the converters or 24 bits, I've never heard recording like this.
As to where recording is going next, I think audio and video digital processing are going to merge more and more. And I think DVD surround will be the center of the experience. With an optional card, Fostex HDRs are equally dedicated to video P2 standards as well as audio.
Chuck
 
I joined this one pretty late but this is in response to the first post. When it comes to quantization levels I can't tell the difference above 20 bit. But as for sample rate I feel there is a major difference and that difference is harmonics. Harmonics are what make us hear the difference between a 10kHz tone played by a flute and the same tone played by a piano. If you sit in a room with a 30kHz tone pumpin' out the speakers you can't hear it, but your ears sense the change in air pressure which adds to the ambience factor of your music. PEACE
 
Second pass on this article: the first one read badly, and had some errors from typing before thinking...

jgohman is right. However, a lot of people don't understand the "aliasing artifact" thing, or why it matters so much with what you hear- so I'll take a swipe at it, because it is something everybody who plays with this stuff ought to understand. We're getting into the theory of sampled data systems a little bit, here, so please bear with me... Nerdy jit follows. Sorry if it annoys anyone.

Remember good old Nyquist's theorem? That states that the absolute maximum frequency that can be represented by a sampled data system is 1/2 the sampling rate (Fs). Which means that the absolute max frequency that a 44.1 kHz system can theoretically reproduce is 22.05kHz.

So why can't we use everything up to 22.05kHz? The filters on the A/D/A converters. People who haven't studied sampled-data systems usually don't know why these damned things are so critical.

What Nyquist tells us is that we can't have anything useful above Fs/2. What that *doesn't* tell us (in a casual reading, anyway) is that the reason is that everything above Fs/2 looks just like everything *under* Fs/2, as if it was reflected in a mirror: in the frequency domain between 22.05kHz and 44.1kHz you find your entire signal spectrum *repeated again, backwards*. Yup: your 100Hz boombox kickdrum is there, fullscale, at 44kHz (and again up at 88.1kHz), and that piccolo is kickin' ass up around 40...

This is called the "first alias". The sequence of baseband:backwards-alias:forwards-alias:backwards-alias then keeps going up to infinity in the frequency domain, repeating on 44.1kHz (Fs) intervals. Here's the problem: if you make the mistake of leaving stuff in the input signal that has components *above* Fs/2, that frequency-domain "mirror" created by the sampling process will fold them right down into the passband. A 25kHz sine wave sampled at 44.1kHz will show up in the reconstituted output as a distorted sine wave at 19.1kHz. A 30kHz sine wave will show up as a 14.1 kHz signal- and so on. Gawd help you if you leave a good lump of 88.1kHz in there, say maybe from the horizontal deflection coils on your high-zoot monitor: your kick drum will have some unintended company at 100Hz when everything is played back. Gotta get rid of all that stuff *before the converter*.
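If you want to check the folding arithmetic yourself, a few lines of Python will do it. This is only a sketch of the mirror rule described above (the spectrum repeats every Fs, and the upper half of each repeat comes out backwards); the function name is made up for the example.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone at f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz          # the spectrum repeats every fs_hz
    if folded > fs_hz / 2:         # upper half of each repeat is mirrored
        folded = fs_hz - folded
    return folded


FS = 44_100
for tone in (25_000, 30_000, 88_100):
    print(f"{tone} Hz sampled at {FS} Hz shows up at "
          f"{alias_frequency(tone, FS)} Hz")
```

That gives 19.1 kHz for the 25 kHz tone, 14.1 kHz for the 30 kHz tone, and puts a tone at 88.1 kHz right on top of a 100 Hz kick drum.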

Repeat after me: yucko. The antialiasing filter's job is to get rid of all these components, and keep only the baseband stuff that we can hear (and that we want), so that the folded-mirrored-aliased-frequency-munged stuff doesn't appear with our good stuff- which would sound like stirfried dogshit on wheels.

Why is 96kHz "better"? Because the antialiasing filter can be much, much easier to design. The filter design for a 44.1kHz sampling rate is an absolute bitch to do right: they are called "brickwall" filters because they have to roll off unbelievably quickly, and they can't have lumps out in alias land (on the other side of Fs/2), or you'll get all manner of mobile stirfried unpleasantness.

Designing a brickwall filter like that (no lumps in the stopband to let UHF aliasable crud slip through, and no audible effects in the passband below 20kHz) is a right bitch, *period*. You have to be _on your game_ as a circuit designer to do this. Pretty much all antialiasing filters for 44.1kHz use have ripples (peaks and valleys) or phase anomalies in the passband, and that's one of the things that we can hear: an Apogee's antialiasing filter is much tidier *in the passband* than whatever was built on the motherboard of your PC... Or the PCM501ES that we were talking about in another thread (whose antialiasing filter exhibited ripples that peaked at +6dB at 15kHz, which led to its deserved reputation as "too glassy and bright"...). Yes, you _can_ hear these artifacts- no question about it.

Think about it. The antialiasing filter for 44.1kHz has to drop roughly -60dB in the frequency range from 20kHz to 24.1kHz to be even vaguely useful: call it 220dB/octave? But the 96kHz system only has to do that -60dB drop between 20kHz and *76*kHz: a nice slow gradual 30dB/octave or thereabouts, which is very doable with a lower order filter, like maybe a Chebyshev or even a Butterworth. The benefit? Lower order filters are much easier to do with *no audible artifacts in the passband*.
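Here is a quick way to redo that back-of-envelope slope estimate for any sampling rate. It assumes the same rough targets used in the paragraph above: a 20 kHz passband edge, 60 dB of attenuation, and a stopband that has to start at Fs minus 20 kHz (the lowest frequency that folds back below 20 kHz). Pick stricter targets and the numbers get uglier; the function name is hypothetical.

```python
import math


def required_slope_db_per_octave(passband_edge_hz, fs_hz, attenuation_db=60.0):
    """Average slope needed to fall attenuation_db between the passband edge
    and fs - passband_edge, the lowest frequency that aliases into the passband."""
    stopband_edge_hz = fs_hz - passband_edge_hz
    octaves = math.log2(stopband_edge_hz / passband_edge_hz)
    return attenuation_db / octaves


for fs in (44_100, 48_000, 96_000, 192_000):
    slope = required_slope_db_per_octave(20_000, fs)
    print(f"Fs = {fs:>6} Hz: about {slope:.0f} dB/octave")
```

Under these assumptions 44.1 kHz needs something over 220 dB/octave, 96 kHz needs about 31, and 192 kHz about 19.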

The 44.1kHz system has a deadband (the difference between the upper frequency of interest and Fs/2- think of it as "no-man's-land") of only 2.05 kHz: a razor-thin margin to design with, by any measure. Great for maybe telephony, where digital audio all started, but bad news for us: passband artifacts are really unavoidable. A 96 kHz system has a deadband of a relatively huge 28kHz, and any artifacts that are left are way up beyond the audible range to boot. Win-win! You can trade off filter order, because you have room to work with. You can build a softer filter, and it will work better. Why build a brickwall, with its passband ripples and phase anomalies, *if you don't have to*?

I claim that the audible advantage of 96kHz systems is entirely a function of the antialiasing filters. Nyquist says that Fs/2 is the limit. It's not just a good idea: it's the law! (;-) But leaving more deadband margin for the circuit designer is the key to the success here. For a given bit depth, linearity of the D/A itself is important, for sure- but the antialias filter is _everything_, and the extra deadband is really the key to that.

192kHz is just icing on the cake: that reduces the filter requirements even further (to 20dB/octave or so, which you could probably even do with a fourth-order Butterworth). But with what people learned trying to hit the 44.1kHz requirements, 96 is _easy_. The extra deadband between 96 and 192 doesn't really buy you much. Therefore, you won't see anywhere near the audible improvement between 96 and 192 that you see between 44.1 and 96.

Shoot, even a 48kHz system can be a huge win in filter design, with its 4kHz deadband. And now you know why _that_ sounds so much better, in a purpose-built system: the deadband is twice as wide, so the filter slope only needs to be half as steep. With a lower-order filter, by definition the passband ripples are significantly less, and the ripples (if any) start at a higher frequency where they're harder to hear anyway. Even if you just reuse your 44.1kHz filter design at 48kHz, since you already paid for it, you're moving its warts up further away from the passband.
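The deadband figures are easy to tabulate. This little sketch uses the definition given earlier (Fs/2 minus the top of the audio band) and assumes 20 kHz as the top of the band; the function name is just for illustration.

```python
def deadband_hz(fs_hz, top_of_audio_hz=20_000):
    """Gap between the highest frequency of interest and the Nyquist limit."""
    return fs_hz / 2 - top_of_audio_hz


for fs in (44_100, 48_000, 96_000, 192_000):
    print(f"Fs = {fs:>6} Hz: deadband = {deadband_hz(fs) / 1000:.2f} kHz")
```

By that definition the designer gets 2.05 kHz to work with at 44.1 kHz, 4 kHz at 48 kHz, and 28 kHz at 96 kHz.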

Bottom line: IMHO, 44.1kHz was a mistake. But in 1981, inexpensive A/D/As couldn't really go any faster. We can all blame Sony and Philips on that one!

Does that make sense?

[Edited by skippy on 01-09-2001 at 14:40]
 
skippy said it

and yes, the 96khz samples sound warmer and smoother. The 44.1 might sound perfect--until you hear the same thing at 96khz.

and in a test that I heard somewhere on the internet, I don't remember where, you could always pick out the 24bit samples over the 16 bit samples, cos they sounded cleaner and clearer.

the thing now is: when you are ready to burn your cd and you have to convert everything back to 16/44.1 then does it make a difference? I am yet to find that out for myself, but other people say "heck yes"
 
Skippy- thank you. That was very well written.

I'd like to know (like CyanJaguar) what people's experiences are with converting back down to 16. It seems that there is actually the potential to distort the signal, however slight that may be. I'm not up on the current algorithms. Any suggestions?
 
Thanks for the kind words! One or two other things about high-order brickwall filters, and then I'll shut up. Forgot to mention these earlier, and that post was too dadgum long anyway...

I looked it up, rather than work from my aging and failing memory: the average rolloff of an antialiasing filter for 44.1kHz use needs to be on the order of 300dB/octave. *Now* I remember why I quit designing that stuff- even thinking about that sort of rolloff rate gives me hives, hairballs, headache, and hemorrhoids (the 4 H's of analog design). You can certainly do it: one quite common design is to use a high-order elliptical filter, like 7th or 9th. A 7th-order elliptical can be made to be flat up to 20kHz, notch out to -90dB by Fs/2 (22.05kHz), and have a minimum stopband attenuation of say -65dB out in the inevitable bumps that show up in the stopband. Not too shabby, and not too terribly difficult. It comes right out of the Analog Cookbook over on the shelf there.

But that circuit design has two problems, right off the bat. -65dB in the stopband was just peachy in the days when mics rolled off at 20kHz, and that's where *all* the signal came from. Nowadays, sound cards live in computers with switching power supplies and big deflection magnets and 1-GHz clock generators that radiate pure electronic _crap_ from basically DC to daylight. You can have huge crud leak in way the hell up in the stopband, be insufficiently attenuated, mirror down into the passband, and kill your noise floor on the spot. This is why serious converters live outside the computer box, and have their own (*very quiet*!) well regulated power supply. Some manufacturers of sound cards do their spec measurements with the card extended outside the box to minimize noise coupling. Do they _tell_ you that? No. That's why I've found a resource like http://www.pcavtech.com/ to be very useful: I can see what a candidate card does in a real system, measured by somebody who doesn't appear to be being given ad revenue....

Surprised? Think about it. Here's an example. That 1GHz clock ticker you're so proud of is square in the middle of the 45,351st alias of your 44.1kHz-sampled audio stream (45351.4739, more or less) which means it'll show up as, let's see, that's an odd alias so it's upside down... an 11.6 kHz tone. Hearing any unexpected pitched wheezes in your audio? It's probably some repetitive signal inside your machine, leaking past a not-quite-good-enough antialiasing filter, and being "heterodyned down" by the sampling process into your baseband. Computer recording makes it more important than ever to have that stopband be a serious freakin' _stopband_. You could possibly hear your power supply, your SCSI bus, whatever: it's all up to the antialiasing filter to keep it out. The conventional wisdom that "there's no energy above 20kHz in audio" goes right out the window inside a computer cabinet. The audio ain't the problem, in that case.
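The clock example can be worked through in Python. This sketch counts Fs/2-wide bands the same way the paragraph above does (band 0 is the baseband, odd-numbered bands come out mirrored). The function names are invented for the example, and it assumes the interference is a clean tone at exactly 1 GHz.

```python
def nyquist_zone(f_hz, fs_hz):
    """Index of the Fs/2-wide band containing f_hz (0 = baseband)."""
    return (2 * f_hz) // fs_hz


def fold_to_baseband(f_hz, fs_hz):
    """Where f_hz lands in the baseband; odd zones come out mirrored."""
    zone = nyquist_zone(f_hz, fs_hz)
    offset = f_hz - zone * fs_hz / 2    # position inside the zone
    return fs_hz / 2 - offset if zone % 2 else offset


CLOCK_HZ = 1_000_000_000
FS = 44_100
print(nyquist_zone(CLOCK_HZ, FS), fold_to_baseband(CLOCK_HZ, FS))
```

The clock falls in zone 45351, which is odd and therefore mirrored, and lands at 11600 Hz, matching the 11.6 kHz figure.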

The steeper the slope of a filter, the greater the *time domain* anomalies, as well. Run a 100Hz square wave through your A/D/A chain, and look at the results. A high order elliptical has a couple of poles real close to the edge of that mythical right half-plane (the start of resonant instability): one section needs to *peak* at almost +20dB very close to the corner frequency to buck the rolloffs of all the other sections, keeping the overall response flat out to 20kHz. That section is very high-Q, almost a doublet pattern in the frequency domain (a sharp peak followed by a deep notch). So the little bastard will *ring like a bell* at the peak frequency of that section if you kick it with a transient, like the edge of a square wave. There is no way around it! This is why square waves show ringing in the lower-end filters, and if it's bad enough, you can definitely hear that ringing as a "sheen" on transients. Does it show up in a frequency response sweep? No. It's _noise_, but it is _pitched_. Seriously- I'm not making this up. Better filters avoid this problem by careful and expensive design- or by not being as steep...
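To get a feel for how a +20dB peak and ringing go together, here is a deliberately simplified sketch. It treats the troublesome section as a plain textbook second-order lowpass on its own, which is an assumption made for illustration: a real elliptic section also has zeros, and the other sections reshape the overall response. For that simple model the resonant peak and the step overshoot both follow directly from Q.

```python
import math


def peak_gain_db(q):
    """Resonant peak of a second-order lowpass section (valid for q > 0.707)."""
    return 20 * math.log10(q / math.sqrt(1 - 1 / (4 * q * q)))


def step_overshoot(q):
    """Fractional overshoot of the step response (underdamped, q > 0.5)."""
    zeta = 1 / (2 * q)                       # damping ratio
    return math.exp(-math.pi * zeta / math.sqrt(1 - zeta * zeta))


for q in (1, 3, 10):
    print(f"Q = {q:>2}: peak {peak_gain_db(q):5.2f} dB, "
          f"step overshoot {100 * step_overshoot(q):4.1f} %")
```

In this model a peak of about +20dB corresponds to a Q of roughly 10, and a section like that overshoots a step by around 85 percent before the ringing dies away, where a Q of 1 overshoots by about 16 percent.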

Another problem: that +20dB peak is going to kill your headroom, since that section of the filter will get to its analog clipping threshold 20dB before the "real" threshold for the filter system taken as a whole! So the crunch can start early, and you have to make sure that your design does not hit the analog headroom limit *at any point in the filter* before the digital converter hits 0dB. How many soundcards bother to do that? Search me. Whaddaya want for $90? (;-)

Lots of design considerations here. See, *this* stuff I can both measure and hear, so I believe in it. This ain't the voices of the angels, and it's part of the reason I decided to bail on the 4H club and do nice simple digital design instead... (;-) Just thinking about this stuff for the first time in years has already got me started achin', scratchin', and heavin'. Anyway, relaxed filter design is a very compelling reason to dive into a higher sampling rate, if you are a manufacturer that is trying to sell high-end quality.

[Edited by skippy on 01-09-2001 at 20:53]
 
So I lied. I never could shut up.

Lastly, addressing jgohman and CyanJaguar's last question: both sample rate conversion and resolution reduction can be done *without adding significant noise or distortion* to the signal.

They can *also* be done really poorly, and therefore really screw it up in a major way. It's an exercise for the interested student to determine whether a given box does it well or not.

The sample rate conversion can be done without adding to the noise floor at all (by upsampling, running a digital lowpass filter, and then decimating). A good reference on the algorithms would be the comp.dsp FAQ- a quickie web search led to this conversation:

http://www.hr/josip/DSP/FAQ/28.html
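As a rough illustration of the upsample/filter/decimate idea for the 96 kHz to 44.1 kHz case, the sketch below only works out the integer factors, the intermediate rate, and the lowpass cutoff. It is the textbook rational-ratio scheme, not a description of what any particular converter box or editor does internally, and the function name is hypothetical.

```python
from fractions import Fraction


def resampling_plan(source_rate_hz, target_rate_hz):
    """Return (up, down, intermediate_rate_hz, lowpass_cutoff_hz)."""
    ratio = Fraction(target_rate_hz, source_rate_hz)   # reduced automatically
    up, down = ratio.numerator, ratio.denominator
    intermediate = source_rate_hz * up                 # rate after zero-stuffing
    cutoff = min(source_rate_hz, target_rate_hz) / 2   # the lower Nyquist limit
    return up, down, intermediate, cutoff


print(resampling_plan(96_000, 44_100))
```

For 96 kHz to 44.1 kHz that means upsampling by 147, lowpassing at 22.05 kHz while running at 14.112 MHz, then keeping every 320th sample. Practical implementations use polyphase filters so they never actually compute the samples that get thrown away.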

And on the resolution-reduction side, this is a good reference for "why dither is your friend":

http://www.digido.com/ditheressay.html

Anyway, the bottom line is "record with all the resolution you can, in both the time and word-length domains, and then downsample/dither-down *only once*, right when you print your mix to CD".
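For the word-length side, here is a minimal sketch of reducing a 24 bit sample to 16 bits with TPDF (triangular) dither. It is an illustration of the general idea only: it leaves out noise shaping, the function name is made up, and the fixed random seed is there purely so the demo repeats.

```python
import random


def reduce_to_16_bit(sample_24, rng):
    """Requantize one signed 24-bit sample to 16 bits with TPDF dither."""
    step = 1 << 8                              # one 16-bit LSB in 24-bit units
    # The sum of two uniform values is triangular noise spanning +/- 1 LSB.
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    quantized = round(sample_24 / step + dither)
    return max(-32768, min(32767, quantized))  # clamp to the 16-bit range


rng = random.Random(0)                         # fixed seed, only for repeatability
# A constant level one quarter of the way between two 16-bit codes.
value_24 = 100 * 256 + 64
outputs = [reduce_to_16_bit(value_24, rng) for _ in range(20_000)]
print(sum(outputs) / len(outputs))             # averages close to 100.25
```

Plain rounding would return 100 every time and the quarter-step of detail would be gone for good. With dither the individual samples hop between neighbouring codes, but their average still carries the in-between level, at the cost of a little benign noise.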

Near as I can tell, anyway... Your mileage may vary. Let's see what everybody else has to say, anyway. Now I really will shut up.
 
Thank you, again. No need to shut-up. If anybody doesn't want to read it, they can skip it. I found it very interesting.

When you talk about time anomalies in brick-wall filters, do you mean shifts in phase of the frequencies in the sloped area? I guess that makes sense.

This might be a dumb idea: What if I recorded white noise into my system at 44.1, played back two identical channels of the recording- only with one phase reversed and low-passed at higher frequencies. Would I hear some artifacts more clearly? Any other suggestions on how I can test my system without an investment?
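The polarity-flip idea is usually called a null test, and the measuring part is easy to sketch once both versions exist as sample-aligned lists of numbers. The snippet below is only an assumption-laden illustration: it uses a synthetic 1 kHz tone and a fake "processed" copy with a tiny gain error, not a real recording, and the function name is made up.

```python
import math


def residual_db(a, b):
    """RMS level of a - b (a summed with polarity-inverted b), in dB re 1.0."""
    diff = [x - y for x, y in zip(a, b)]
    rms = math.sqrt(sum(d * d for d in diff) / len(diff))
    return -math.inf if rms == 0 else 20 * math.log10(rms)


fs = 44_100
original = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
altered = [0.999 * x for x in original]   # stand-in for "after the converter"
print(residual_db(original, original))    # identical signals cancel completely
print(residual_db(original, altered))     # whatever is left is the difference
```

The catch with a real A/D/A loop is alignment: the two versions have to be lined up to the sample and matched in level before the leftover means anything.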
 
i'm not half as smart as any of you but DIGIDO.com has really detailed info that even i can understand.
if you want to learn, better read it!

guhlenn
 
Skippy - if you shut up, I'll beat you senseless. You obviously know your stuff and it's been a pleasure to read this thread.

Thanks

/Ola
 