It's all about resolution, or accuracy. Let's take a very simple example:
Take a simple decimal sample set that has an accuracy of 0.01 (one hundredth):
5.55, 5.55, 5.55, 5.55, 5.55, 5.55
Now if we were to decrease the resolution of these samples to 0.1 (one tenth) by truncation (lopping off the last digit), we'd get:
5.5, 5.5, 5.5, 5.5, 5.5, 5.5
That's pretty close, but we're ignoring the fact that each sample was just as close to 5.6 as it was to 5.5!
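If you want to see that in a few lines of Python (just my own sketch, not part of the original example), store the samples as whole hundredths so the arithmetic stays exact:

    samples = [555] * 6                          # 5.55 each, stored in hundredths
    truncated = [s // 10 for s in samples]       # lop off the last digit: 55 == 5.5

    print([t / 10 for t in truncated])           # [5.5, 5.5, 5.5, 5.5, 5.5, 5.5]
    print(sum(samples) / len(samples) / 100)     # 5.55 -- the true average
    print(sum(truncated) / len(truncated) / 10)  # 5.5  -- every sample lost the same 0.05

Every sample gets pushed down by the same 0.05, so the error never averages out.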
Now if we were to use a "dither" technique, we might add a very small amount of noise to the original samples before truncating. Let's add noise that looks like this:
0.05, 0.00, 0.05, 0.00, 0.05, 0.00
If we add that to our original samples, we get:
5.60, 5.55, 5.60, 5.55, 5.60, 5.55
Now let's truncate these samples:
5.6, 5.5, 5.6, 5.5, 5.6, 5.5
There, that's a much more accurate representation of our original sample set. Now the real-life dither we're talking about is a thousand times more complex than this, but the principle is essentially the same.
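Continuing that little Python sketch (again, my own illustration, and real dither would use random noise rather than a fixed pattern):

    samples = [555] * 6                          # 5.55 each, in hundredths
    noise   = [5, 0, 5, 0, 5, 0]                 # the 0.05 / 0.00 pattern from above

    dithered  = [s + n for s, n in zip(samples, noise)]   # 560, 555, 560, ...
    truncated = [d // 10 for d in dithered]               # 56, 55, 56, ... (tenths)

    print([t / 10 for t in truncated])           # [5.6, 5.5, 5.6, 5.5, 5.6, 5.5]
    print(sum(truncated) / len(truncated) / 10)  # 5.55 -- the original average survives

The individual samples are now noisier, but on average they land right back on 5.55, which is exactly the trade dither makes: a little added noise in exchange for getting rid of the systematic truncation error.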
Another way I've heard dither described that's pretty cool is this:
Hold your hand up in front of your monitor with your fingers spread. You'll notice that much of the screen is blocked and you no longer have a good view of it. Now move your hand back and forth very rapidly. All of a sudden you've got a much better view of what's on the screen, even though your hand is still blocking it!
Slackmaster 2000