I'm not really in total disagreement with all that you said, except that one has to ask: is there something there that I may not be hearing today, for a variety of reasons?
What I *honestly* don't understand, Tom, is how an ME can perform their job based on what they "might not be hearing". This makes little sense to me in a couple of ways:
First, the whole idea behind all the ear training, mastering suite design, $10k audiophiliac monitors, boutique gear, etc. is to hear what is there as accurately as possible, certainly with far more accuracy than the vast majority of playback systems and ears out there in the field. If you can't hear it there, how can it be heard anywhere else? That's what we mix engineers are paying you mastering guys for.
Second is the assumption that what one doesn't hear is an improvement over what one does hear, or at least would be if only one could hear it. This assumption basically says that dithering is *always* - 100% of the time - better for the client and the production than truncation is. In this modern world of production values, where no property of the production, from dynamic range to signal distortion, is sacred or immune from the fangs of the producer or client, how is it that dithering is always the appropriate sound and answer, every time, for every production? Especially in those cases where one's ear cannot verify it? I can't think of a single other processing effect done strictly for sound quality reasons that is applicable, necessary, and desired 100% of the time.
BTW, one last comment. Dithering is more than adding noise; it helps to remove harmonic distortion caused by quantization.
Technically, it IS noise, just not in the traditional analog sense in which we're used to thinking. Remember, we're talking digital information here, and this is all an offshoot of information theory, which roughly defines noise as any signal that does not itself carry any information.* This noise is actually another form of quantization distortion, added on top of the QD of truncation. This additional distortion can sometimes yield test measurements that indicate lower HD numbers, sure, but when I hear that used as an argument for the use of dither, I flash back to the late 70s/early 80s and the last big debate over harmonic distortion.
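To make that trade concrete, here's a quick Python sketch - purely my own illustration, not anything out of an actual mastering chain - that quantizes a low-level sine two ways, plain truncation versus TPDF-dithered rounding, and compares the level at the 3rd harmonic. Truncation leaves the error correlated with the signal, so it shows up as harmonic spurs; dither trades those spurs for a flat noise floor:

```python
import numpy as np

fs = 48000                    # sample rate (Hz); one second of samples -> 1 Hz FFT bins
f0 = 1000                     # test tone (Hz), chosen to land exactly on an FFT bin
n = np.arange(fs)
x = 0.1 * np.sin(2 * np.pi * f0 * n / fs)   # low-level sine exaggerates the effect

bits = 8                      # deliberately coarse word length for visibility
q = 2.0 ** (1 - bits)         # quantization step for a +/-1.0 full-scale signal

# Plain truncation: the error is correlated with the signal -> harmonic spurs.
truncated = np.floor(x / q) * q

# TPDF dither (+/-1 LSB triangular noise) added before rounding: the error
# becomes signal-independent noise and the harmonics dissolve into the floor.
tpdf = (np.random.rand(fs) - np.random.rand(fs)) * q
dithered = np.round((x + tpdf) / q) * q

def spectrum_db(sig):
    """Hann-windowed magnitude spectrum, dB relative to the peak bin."""
    s = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    return 20 * np.log10(s / s.max() + 1e-12)

bin3 = 3 * f0                 # 3rd harmonic sits at 3 kHz = bin 3000
print("3rd harmonic, truncated: %6.1f dB" % spectrum_db(truncated)[bin3])
print("3rd harmonic, dithered:  %6.1f dB" % spectrum_db(dithered)[bin3])
```

None of which settles the audibility question, of course; it just shows what the measurement folks are pointing at when they say dither "removes" harmonic distortion.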
That time the debate was in the form of HD measurement specifications in audiophile amplifiers, integrated amps and receivers. There were spec wars back then that kept pushing HD numbers lower and lower in this gear. It wasn't unusual to find a manufacturer boasting an HD level of 0.0005% in expensive full-page advertising campaigns and claiming that they had to sound better than their competitor whose amp spec'd out a full ten times worse at 0.005%. It was all eventually deemed to be a bunch of baloney when rigorous testing showed not only that the audibility of HD was entirely frequency- and content-dependent, but that on average anything below 0.1% or so was either completely inaudible or so swamped by S/N, intermodulation (IM) distortion, or any of a raft of other factors as to be rendered meaningless.
OK, Tom, I can completely understand the audiophiliac desire and ethic to which many mastering engineers subscribe to make things as good as possible. But there comes a point where it becomes pragmatically unnecessary. We're lucky with dither in that exercising the process is a matter of a non-destructive mouse click, so it certainly doesn't hurt to give it a try. But to default to it, flipping bits when they simply don't matter (when you are not sure of what you hear, even with the very, very best of gear and training), just is not a pragmatic choice, IMHO.
-----
One last analogy. We often hear the term around here that "this is not rocket science" (or, jokingly, "rocket surgery".) What if it were? Then we'd obviously be using every last decimal point of precision we could muster and get things as right as possible, right?
Not necessarily.
The amount of "correction" (more accurately, "fudging") that dither applies to truncation is akin to the difference between the accuracy of Newtonian mechanics and Einsteinian relativity. By using relativistic equations, we can achieve a level of accuracy in our calculations and theories that we simply cannot get with the old apple falling from Newton's tree.
So when the engineers at JPL and NASA program the computers that control their spacecraft to the Moon, Mars, Jupiter, or even Pluto, they load their navigation software up with Einstein's equations, right? NOPE. It's all the same basic Newtonian force = mass x acceleration stuff, used as readily by the Mars Pathfinder mission controllers as by WWII artillery officers. The level of refinement offered by relativistic mechanics simply is not needed at the scale and speed of spaceflight (not until we figure out warp drive, anyway).
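If you want to see just how plain that Newtonian bookkeeping is, here's a toy sketch along the same lines (the numbers are invented for illustration, and real trajectory software adds perturbation terms, not relativity): a bare-bones Euler integrator stepping F = m*a through a ballistic arc.

```python
# Toy Newtonian trajectory: integrate F = m*a step by step. Good enough for
# artillery tables and, at a far more refined level, interplanetary probes.
g = 9.81                 # gravitational acceleration (m/s^2)
m = 10.0                 # projectile mass (kg) -- cancels out of a = F/m
dt = 0.01                # time step (s)

x, y = 0.0, 0.0          # position (m)
vx, vy = 300.0, 300.0    # initial velocity (m/s): a 45-degree launch

t = 0.0
while y >= 0.0:
    fx, fy = 0.0, -m * g         # the only force here: gravity
    ax, ay = fx / m, fy / m      # Newton: a = F / m
    vx += ax * dt                # integrate acceleration -> velocity
    vy += ay * dt
    x += vx * dt                 # integrate velocity -> position
    y += vy * dt
    t += dt

print(f"Range: {x:.0f} m after {t:.1f} s -- no Lorentz factor required.")
```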
Nor is the refinement of dither usually necessary in the macro world of musical reproduction. Perhaps when working on laboratory calibration or refined, scientific-experiment-quality audio such refinement matters, but without it the music will get to our ears just fine, the same way the Voyager probes made it to the outer planets.
If you guys want to continue clicking that mouse, be my guest. I just ask that anybody who does be honest with themselves as to why they are doing it.
Is it actually because it is a real improvement in the resulting production? If so, fine. If not, then why?
G.
*Well, to get really technical, most dithering algorithms are not random or even pseudo-random, but rather follow fairly simplistic patterns. In this way they do themselves convey the information of that pattern, but it is information that is not directly relevant to, and smudges, the information within the signal to which it is applied. It is in fact this "smudging" of information that folks call the "removal of harmonic distortion" or "de-correlation" of the sample information.