What Would Be Your Ideal File to Receive for Mastering?

Anderton

New member
I've always recommended that anyone submitting a file to a mastering engineer leave some headroom to make sure there isn't clipping, not use a maximizer on the master bus to squash the dynamics, leave the fades to the ME, not add dithering in case files need to be crossfaded, etc. However, when I said this in the GearSlutz mastering forum, a mastering engineer said I knew nothing, that this was terrible advice, that he felt sorry for anyone who attended any of my seminars, that I was just in it for the money, that I had no right to say this given that I wasn't a full-time mastering engineer, etc. etc.

I don't believe he's correct, but hey, I'm always willing to learn something. Maybe times have changed. For example, I always used to ask for at least a 24-bit file at whatever sample rate they used for the original project. Maybe 32-bit or 64-bit files are the norm now...I don't know.

So mastering engineers, if someone handed you a file to master tomorrow, what characteristics would you want to see?

Thanks for any input you can provide.
 
24-bit at the project's sample rate or a multiple. Some "natural" headroom (non-limited), really any amount. Non-dithered for the most part.

Nothing really wrong with anything you said...
 
It's unfortunate that a lot of people don't have a better understanding of dither. When to use it, why and what kind.

Any time you quantize something to a fixed point format it should have dither on it. Your A/D converters should be dithering when you record tracks. Once you're in a DAW, the mix engine will generally be floating point math which can't be dithered. There's no way to do it. If you use hardware inserts for external effects processing or something, the output should have dither on it. When you're monitoring the mix the master buss should have dither on it. When you render a mix the output should have dither on it, either via a plugin on the master buss or an internal process of the DAW.

The reason is that when you quantize to a fixed point format (16 or 24 bit fixed for example, or even 48 bit fixed point as very old Pro Tools systems used to be), something happens called truncation. Truncation does two things to the audio, and dither counteracts both of them. The first thing is that when the waveform is very close to the zero crossing (meaning at a level very close to, but not quite, silence), there isn't enough resolution available to accurately set the level, so it gets rounded to the closest available spot. That causes horrid distortion with a spew of nasty harmonics. It happens at an extremely low volume level, so you'd have to amplify a 16 bit wave several times beyond normal levels to hear it directly. When you apply dither, the first bit (the Least Significant Bit) gets controlled by a random probability generator rather than the actual audio signal. Dither adds a sound very much like white noise or tape hiss. It also traps, de-correlates and gets rid of the truncation distortion.
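To make the LSB idea concrete, here's a minimal Python/numpy sketch of fixed point quantization with optional TPDF dither - the function name and the triangular noise choice are just for illustration, not any particular converter's or DAW's implementation:

```python
import numpy as np

def quantize(x, bits=16, dither=True):
    """Quantize a float signal in [-1, 1] to a fixed point grid.
    With dither on, about +/-1 LSB of triangular (TPDF) noise is added
    before rounding, so the bottom bit is driven by probability rather
    than by the audio signal itself."""
    steps = 2 ** (bits - 1)                      # grid steps per unit of amplitude
    if dither:
        tpdf = (np.random.uniform(-0.5, 0.5, np.shape(x)) +
                np.random.uniform(-0.5, 0.5, np.shape(x)))
        x = x + tpdf / steps                     # noise spans roughly one LSB
    return np.round(x * steps) / steps           # snap to the nearest grid value
```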

The other thing that happens is, if you're reading a page of text without an obstacle in the way like your hand, you can see all of the text. All of the information is intact. If you put your hand in your field of vision over part of the text, you're effectively blocking some of the information. This is truncation. If you flail your hand around rapidly in front of the text, your hand is still in the way but you can read all of the text. None of the information is lost. Audio is the same. If you try to record a sine wave in 16 bit format, the 16 bit format has a range of 96 dB. If you set the level of the sine wave to -100 dBfs and allow it to truncate, you'll get silence. Nothing. Digital black. Your signal is out of range. If you dither it, you'll hear white noise. Inside the white noise, you'll hear a sine wave at -100 dBfs with no distortion. Truncation throws away low level information. Dither preserves that information and allows the system to have linear transfer. This is very similar to what bias does on an analog tape.
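And a quick demo of the -100 dBFS sine example, reusing the quantize() sketch just above (rates and lengths are arbitrary):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
sine = 10 ** (-100 / 20) * np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone at -100 dBFS

truncated = quantize(sine, bits=16, dither=False)   # below the 16 bit grid: digital black
dithered = quantize(sine, bits=16, dither=True)     # tone survives, buried in the noise

print(np.max(np.abs(truncated)))   # 0.0 - the signal is simply gone
print(np.max(np.abs(dithered)))    # nonzero - a noise floor carrying the -100 dBFS tone
```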

If you're in a 24 bit or 48 bit fixed point format, it's impossible to hear quantization distortion or dither noise. It happens at a signal level below the thermal noise floor of the electronics. Even in 16 bit, the noise and/or distortion is at gnat fart signal levels, a few orders of magnitude less than tape hiss. The point of it is linear transfer. Truncation prevents it. Dither enables it. It's going to have an effect on the audio signal where any kind of infinite process is used. A good example of an infinite process is reverb. Spatial cues, panning width, the depth, size, punch, separation and space of individual sounds - these are all things that get negatively affected by truncation and preserved by dither. People have made the observation that analog audio, like a nice piece of vinyl on a good system, sounds deeper, wider and less fatiguing than digital audio. Dither makes digital sound a lot more like analog.

So if you don't dither at any of the key points like when you print the tracks, run hardware inserts and render the mix, truncation gets baked into the signal. It can't be removed after the fact.
 
There actually is a form of "dither" in floating point math known as "denormal" noise. It seems to be built into the underlying architecture of most of our computers, and coders have to do things to override it if they don't want it. Basically, when the bottom bit should be somewhere between 0 and 1, you can round up every time, round down every time, or just set it randomly, and most of the time the "noise" from randomization works better than always rounding the same way.
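The "flip a coin" rounding described there is usually called stochastic rounding; a minimal sketch of the idea (the function name is mine, not any library's):

```python
import numpy as np

def stochastic_round(x):
    """Round probabilistically: 3.25 rounds down 75% of the time and up 25%
    of the time, so the rounding error averages out to zero instead of
    always leaning the same way."""
    lower = np.floor(x)
    return lower + (np.random.uniform(size=np.shape(x)) < (x - lower))
```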

To the OP -

Mix until you're done and happy with the sound. Leave nothing for the mastering engineer to do. If you don't trust yourself and are leaving things for the ME to fix or finish, you're doing everybody a disservice.

But if you're mixing a relatively diverse collection of songs one by one, maybe in separate sessions, it probably is best to leave the final decisions about absolute levels and dynamic range to the mastering stage.

So like, if you've got some compressor on your Master bus and it's snorting and wheezing and sounds fucking awesome, leave it on and render through it. If the whole thing sounds good but you're adding a limiter or clipper just to get the integrated loudness up to some standard, I'd say leave that off.

Then render to floating point and don't worry at all (!!!) about the levels. Absolute meter readings - peak or RMS - mean absolutely nothing as long as they fall somewhere between like -300 and +300. If the ME needs to turn it up or down, they will. As long as it's not actually clipped, anybody who claims it's "too loud" or doesn't have enough "headroom" is actually just incompetent. Even rendering to 24 bit, you've got an absurd amount of dynamic range available. As long as you don't run out of headroom (peaks over 0 dBFS) and the noise in the mix is louder than that of the 24 bit floor, it doesn't matter at all whether it peaks at 0 or -12. Digital gain/attenuation is perfectly clean and noiseless and practically free. Still, use floating point just in case something does go over 0 dBFS. Then the ME can decide how to smash that peak down rather than letting the truncation just chop it off.
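If it helps to see the "digital gain is practically free" point, here's a quick numpy check of taking a float mix down 12 dB and back up again (the buffer is just a stand-in):

```python
import numpy as np

mix = (np.random.randn(44100) * 0.1).astype(np.float32)   # stand-in mix, peaks well under 0 dBFS
round_trip = mix * np.float32(10 ** (-12 / 20)) * np.float32(10 ** (12 / 20))
print(np.max(np.abs(round_trip - mix)))   # well under 1e-7, i.e. below even a 24 bit LSB
```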
 
I think that noise in general is part of what's "missing" from those sterile digital mixes. Adding noise actually randomizes the distortion at both ends.

The "quantization error" is exactly what I said. The correct value is somewhere between 0 and 1 but we have literally no way of knowing what it should be else we'd probably round to the nearest and he even better off. Instead, we flip a coin.

But - especially when we're using a "curvy" nonlinear processor to handle peaks and set our ceiling - randomizing that one bit makes a real (if extremely subtle) difference in what comes out. It changes the character of the distortion.

Also, I grew up listening to cassette tapes, and it's the same thing with vinyl. There was always that (not actually) subtle indication from the surface noise of the medium that you're still listening to the album. The hiss between songs tells you there's more to come, and when it stops you know you at least have to flip it over.

So I tend to add a rather large amount of noise in my full album renders. Literally just a noise generator parallel to the actual tracks, mixed in across the entire timeline of the album, through the master chain. I usually do have tiny fades at the heads and tails of individual tracks to avoid clicks, but they're not long enough to notice that the hiss dropped out.
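For what it's worth, the same parallel-noise idea can be sketched outside the DAW too - here with numpy and the soundfile library, where the filenames and the -60 dBFS level are just placeholders (and note this adds the hiss after the master chain rather than through it, unlike the approach above):

```python
import numpy as np
import soundfile as sf   # third-party library for WAV read/write

album, fs = sf.read("album_render.wav")                        # hypothetical full-album render
hiss = np.random.randn(*np.shape(album)) * 10 ** (-60 / 20)    # broadband noise at roughly -60 dBFS
sf.write("album_with_hiss.wav", album + hiss, fs, subtype="FLOAT")   # stay float; dither at the final bounce
```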
 
If I recall correctly, Pro Tools before it went floating point was 40 bit fixed point.

It was 48 bit fixed point. There was a thread several miles long somewhere (probably the DUC) with people complaining about the sound. Basically, when you started the mix it sounded fine, as long as you didn't do anything. As you got deeper into the mix, it sounded progressively worse. Eventually, as a result of that thread, Pro Tools gave everybody the option to use the 48 bit fixed, "self dithering" mix engine. It sounded great, and everyone was happy.

Then Pro Tools went to a 32 bit floating point mix engine. The hard-learned lesson about dither got forgotten when the manual said not to bother with dither on 24 bit files. To be fair, it can be difficult to hear the difference in 24 bit, depending on the source material. I wouldn't leave that to chance.
 
I have never had a problem sending 24 bit 44.1 files for mastering, and if people want me to master, as long as it's a WAV file and not an MP3, it's all OK.

Alan.
 
Everything is eventually converted down to 44.1 16 bit. Even lower when it's made into an MP3. So, that's all I need. Note - unless it's mastered for iTunes, but that's not the point here.

Example - it's like if a video had to be converted down to 720. Me working with 1080, 4K, a million K video is all irrelevant. If I was converting to 720, all I need is 720.

This is just common sense...
 

There are good reasons to do processing at higher resolution than the delivery format.
 
Yeah.... I don't want 16-bit. And I tend to work in - well, "unlimited" resolution (analog) much of the time. But upsampling is pretty common to get rid of aliasing that can happen along the way and many plugs that don't upsample natively just tend to sound a bit (nicer?) at multiples. And 44.1/16 might be *a* delivery format, but it's rarely the only one... Same with video. Same with photography. There may certainly be a point of diminishing returns - and I'm not one of those "everything should be recorded in DSD" people, but to start at the lowest common denominator isn't usually the wisest move.
 
Mix until you're done and happy with the sound. Leave nothing for the mastering engineer to do. If you don't trust yourself and are leaving things for the ME to fix or finish, you're doing everybody a disservice.

That's fine, but the question was about what you want to receive as a mastering engineer, where you don't have control over the mix. I've received files that had clipping, files that were rendered at 16 bits without dithering, and files that registered as 0 but had significant intersample distortion (like +3 dB!), so whoever mixed the thing didn't realize that even though the file itself wasn't distorting, they were mixing while monitoring a distorted signal through the D/A converter. If you trust the people doing the mix to deliver you something that's incredible, great. My experience is that isn't always the case. I ask clients to send me a version with master bus processing bypassed and another with their processing included. Often, I'll find the processed one has a stereo limiter. Just replacing it with a multiband one often makes the client much happier.
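For anyone wondering how a file can read 0 dBFS on a sample-peak meter and still hit +3 dB after reconstruction, here's a rough Python sketch of the idea behind true-peak metering (an oversampled peak check, not a compliant BS.1770 meter):

```python
import numpy as np
from scipy.signal import resample_poly

fs = 48000
n = np.arange(4800)
x = np.sin(2 * np.pi * (fs / 4) * n / fs + np.pi / 4)   # fs/4 tone with a 45 degree phase offset
x /= np.max(np.abs(x))                                  # sample peaks now read exactly 0 dBFS

up = resample_poly(x, 4, 1)                             # 4x oversample to approximate the reconstructed waveform
print(20 * np.log10(np.max(np.abs(up))))                # roughly +3 dB - the D/A has to swing past full scale
```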

But if you're mixing a relatively diverse collection of songs one by one, maybe in separate sessions, it probably is best to leave the final decisions about absolute levels and dynamic range to the mastering stage.

Spot on. The most recent album I mastered was Trigger, by Bryan Ferry's guitarist. It had been recorded in multiple studios, by multiple engineers. I didn't ask that there be no master bus processing, but the engineers delivered that anyway. The mixes were all good, but they didn't hang together in a unified way at all. Mastering solved that issue. I've had the same issues with compilation and live albums. I suppose purists could argue that if the cuts were wildly different, then that's the way it should be. I just don't happen to agree, but of course if that's what the artist wants, that's what they'll get.

Then render to floating point and don't worry at all (!!!) about the levels. Absolute meter readings - peak or RMS - mean absolutely nothing as long as they fall somewhere between like -300 and +300. If the ME needs to turn it up or down, they will. As long as it's not actually clipped, anybody who claims it's "too loud" or doesn't have enough "headroom" is actually just incompetent.

Again, I'm not delivering the mix, but I wish the people doing mixes followed your advice. I see clipping a fair amount - if not baked into the file, then levels that foster intersample distortion, which has the potential to influence the mix because people aren't hearing an accurate representation of the music. So I advise people to leave some headroom, which just about guarantees they're not going to run out of headroom when the signal hits the analog reconstruction filter in their audio interface (not to mention that the most non-linear part of D/A converters is at the very lowest and very highest levels - I know a lot of engineers, with platinum records on the wall to back up their opinions, who never let signals peak much above -6 or even lower because they swear it sounds better).

Still, use floating point just in case something does go over 0 dBFS. Then the ME can decide how to smash that peak down rather than letting the truncation just chop it off.

That was actually one of the main points I was curious about. I've always been happy to receive 24-bit files, but I wondered if MEs were starting to ask for 32-bit or 64-bit FP. I don't know the answer to that one.
 
But upsampling is pretty common to get rid of aliasing that can happen along the way and many plugs that don't upsample natively just tend to sound a bit (nicer?) at multiples.

Yup. With virtual instruments, if people record at 96 kHz, the foldover distortion usually isn't a problem. But at 44.1/48, it definitely can be. When I can hear it, I recommend that the client go back to the mix, save the synth preset and export the MIDI track, and open a 96 kHz or 192 kHz project. Then, import the MIDI file, load the synth and preset, and render at the higher sample rate. This ensures that the foldover distortion won't get baked into the audible range when they resample back down to 44.1/48 and bring it back to their original project. It can make a really huge improvement with some synths and amp sims.
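The resample-back-down step can also be done offline with a decent sample rate converter; a minimal sketch with scipy and soundfile, where the filenames and rates are placeholders for whatever the project actually uses:

```python
import soundfile as sf
from scipy.signal import resample_poly

audio, fs = sf.read("synth_render_96k.wav")        # hypothetical 96 kHz render of the synth part
down = resample_poly(audio, 48000, fs, axis=0)     # polyphase SRC back to a 48 kHz project
sf.write("synth_render_48k.wav", down, 48000, subtype="FLOAT")   # stay float; dither only at the final fixed point bounce
```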

Also, I've found that offline oversampling can often produce a better sound with virtual instruments than real-time oversampling. A software engineer explained to me that this is entirely possible, because the algorithms that are designed to work in real time have to cut corners compared to the ones that have the luxury of chewing up as many CPU cycles as they want.
 