OK, class A. Class A means that the amplifying device (transistor or tube) never turns off within its intended operating range (ideally, all the way out to the power rails). Let's take a single transistor as a basic example. Say we have a +/-16V power supply, and the input of the transistor is biased to ground (0V). That's a class A amplifier. Let's assume an 8 ohm load, so we can pretend we're driving a speaker (a real speaker is a bit more complicated, but bear with me).
At idle, that is, with no AC input signal, how much current is our transistor conducting? Using round figures, the "emitter" of the transistor will also be at 0V (actually about 0.7V below the input, or "base", but let's keep it simple). That 0V goes through the 8 ohm load to the negative rail at -16V. The third leg of the transistor (a transistor is a three-legged device) is called the "collector", and that's what we hook +16V to. This arrangement is called an emitter follower, or a "common collector" stage.
OK, by Ohm's Law the current is (0V - (-16V)) / 8 ohms = 2 amps. Power is V * A, so the load dissipates 16V * 2A = 32 watts.
Or is it? We have a 32V supply (+/-16V). 16V drops across the load. The other 16V drops across the transistor, with the same 2A flowing through it. So the supply delivers 64W in total, and half of it is wasted in the transistor.
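If you want to check that arithmetic, here's a quick Python sketch. It just plugs in the numbers from the example (+/-16V rails, 8 ohm load, emitter at 0V, ignoring the 0.7V drop); the variable names are my own.

```python
# Idle (no signal) power in the single-transistor class A example.
# Values from the example: +/-16 V rails, 8 ohm load, emitter at 0 V.
V_POS, V_NEG = 16.0, -16.0
R_LOAD = 8.0
v_emitter = 0.0  # ignoring the ~0.7 V base-emitter drop

i_idle = (v_emitter - V_NEG) / R_LOAD        # Ohm's law through the load
p_load = (v_emitter - V_NEG) * i_idle        # 16 V across the load
p_transistor = (V_POS - v_emitter) * i_idle  # 16 V across the transistor
p_supply = (V_POS - V_NEG) * i_idle          # 32 V across the pair

print(f"current {i_idle} A, load {p_load} W, transistor {p_transistor} W, supply {p_supply} W")
```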
OK, that's with no AC input. Now let's say we have an input signal, a sine wave of +/-16V peak (this would be called 32Vpp, or peak-to-peak). At the +16V peak, we have 16V on the base and emitter, which means 32V across the 8 ohm load = 4A, which yields 128W at that instant! Kickass!
At the -16V peak, the emitter is down at -16V, the same as the bottom end of the load, so the transistor just turns off. No current, no power.
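OK, if we map out all the points in between and average them, the picture is less rosy than that 128W peak. The supply current still averages 2A, so the supply still delivers 64W. The load averages 48W and the transistor 16W, but 32W of that load power is just the idle (DC) current cooking the speaker. The useful part is the 16V-peak sine, and to get its power you use its "root of the mean squares" (RMS) voltage: square each point, average them, and take the square root. For a sine that's the peak divided by the square root of 2, about 11.3V, and (11.3V)^2 / 8 ohms = 16W. (People call this "RMS power", though strictly it's average power computed from the RMS voltage.) So we get 16W of signal for 64W from the supply: 25% efficiency.
Therefore, class A can never be more than 50% efficient, and that's the best case, with the speaker coupled through an output transformer (the usual arrangement in tube amps); a directly coupled version like ours tops out at 25%. At least half the power is always wasted. In practice, class A amps do worse than these limits, and tubes are worse still because of the power lost in their heaters.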
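Here's a rough Python sketch of the "map out all the points and average them" step, if you want to check the numbers. It samples one cycle of the 16V-peak sine and averages instantaneous V * I for the supply, the load, and the transistor. It assumes an ideal transistor (the emitter exactly follows the base, no 0.7V drop); the rail, load, and signal values are the ones from the example.

```python
import math

# Class A emitter follower: +/-16 V rails, 8 ohm load returned to the -16 V
# rail, input a 16 V-peak sine around 0 V. Assumes an ideal transistor
# (emitter follows base exactly, no 0.7 V drop).
V_POS, V_NEG, R_LOAD, V_PEAK = 16.0, -16.0, 8.0, 16.0
N = 10_000  # samples over one full cycle

p_supply = p_load = p_transistor = 0.0
for k in range(N):
    v_e = V_PEAK * math.sin(2 * math.pi * k / N)  # emitter voltage
    i = (v_e - V_NEG) / R_LOAD                    # load current, never negative here
    p_supply += (V_POS - V_NEG) * i / N           # average of instantaneous V * I
    p_load += (v_e - V_NEG) * i / N
    p_transistor += (V_POS - v_e) * i / N

# Useful output is only the AC part: a 16 V-peak sine into 8 ohms.
v_rms = V_PEAK / math.sqrt(2)
p_signal = v_rms ** 2 / R_LOAD
efficiency = p_signal / p_supply

print(f"supply {p_supply:.1f} W, load {p_load:.1f} W, transistor {p_transistor:.1f} W")
print(f"signal {p_signal:.1f} W, efficiency {efficiency:.0%}")
```

The load's 48W average is mostly that 2A of idle current; only the AC part counts as signal.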
Well, if we want a stinkin' loud amp that doesn't need a dryer plug to get power and doesn't weigh 400 pounds from the giant heatsinks needed to keep the transistor from melting while it burns off all that wasted power, we start to think about a more efficient design.
Let's use two transistors: an NPN from the +16V rail to the output, and a PNP from the output to the -16V rail, with the speaker now hooked from the output to ground. Same 0V input bias, same +/-16V input signal. This is a class B amp. What happens?
The positive transistor is on only when the input signal is positive, the negative transistor . . . you get it. Since each transistor is on only half the time, and nothing flows at idle, far less power is wasted for a given power into the load. At 0V idle, there is no power wasted at all. The maximum possible efficiency approaches 80% (pi/4, about 78.5%)! Sweet, smaller heatsinks = lighter unit!
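Here's the same kind of sketch for the class B version. It assumes ideal transistors that hand off perfectly at 0V, with the speaker going from the output to ground and the same 16V-peak sine. Only one device conducts at any moment, and whichever one it is pulls its current from a 16V rail.

```python
import math

# Class B push-pull: +/-16 V rails, 8 ohm load from output to ground,
# 16 V-peak sine. Assumes ideal devices that switch exactly at 0 V
# (no crossover gap), so the output follows the input perfectly.
V_RAIL, R_LOAD, V_PEAK = 16.0, 8.0, 16.0
N = 10_000  # samples over one full cycle

p_supply = p_load = 0.0
for k in range(N):
    v_out = V_PEAK * math.sin(2 * math.pi * k / N)
    i = v_out / R_LOAD
    # Only one device conducts at a time; it draws |i| from its 16 V rail.
    p_supply += V_RAIL * abs(i) / N
    p_load += v_out * i / N

p_transistors = p_supply - p_load  # what's left is heat in the two devices
efficiency = p_load / p_supply

print(f"supply {p_supply:.2f} W, load {p_load:.2f} W, both transistors {p_transistors:.2f} W")
print(f"efficiency {efficiency:.1%} (pi/4 = {math.pi / 4:.1%})")
```

With these assumptions the speaker gets 16W while the supply delivers only about 20.4W, and the efficiency lands on pi/4. The class A version was pulling 64W from the supply at idle.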
But wait, there's a problem. Our poor transistors can't switch on and off perfectly at 0V. Mostly because of that 0.7V base-emitter drop I mentioned above, each transistor needs about 0.7V of drive before it conducts at all, so there's a small-signal range, roughly -0.7V to +0.7V, where both are off. That creates a huge amount of distortion, called "crossover distortion", and because that dead zone is a fixed size, it eats a bigger fraction of the signal when the signal is quiet. That's extra bad.
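To see why it's worst when quiet, here's a crude Python model: a hard 0.7V dead zone on each side and otherwise perfect transistors (both are simplifying assumptions; real devices turn on gradually). It measures how much of the output is harmonic junk, relative to the fundamental, at a few signal levels.

```python
import math

# Crude model of an unbiased class B output stage. Assumptions: a hard
# 0.7 V turn-on threshold for each transistor, otherwise perfect devices.
V_BE = 0.7

def class_b_out(v_in):
    """Output voltage for one input sample: nothing until |v_in| exceeds V_BE."""
    if v_in > V_BE:
        return v_in - V_BE
    if v_in < -V_BE:
        return v_in + V_BE
    return 0.0

def thd(v_peak, n=10_000):
    """Harmonic content relative to the fundamental over one sine cycle."""
    sines = [math.sin(2 * math.pi * k / n) for k in range(n)]
    outs = [class_b_out(v_peak * s) for s in sines]
    # Fundamental amplitude (the model is odd-symmetric, so no cosine or DC term).
    b1 = 2 * sum(y * s for y, s in zip(outs, sines)) / n
    residual_rms = math.sqrt(sum((y - b1 * s) ** 2 for y, s in zip(outs, sines)) / n)
    return residual_rms / (abs(b1) / math.sqrt(2))

for v_peak in (1.0, 4.0, 16.0):
    print(f"{v_peak:4.1f} V peak: THD {thd(v_peak):.1%}")
```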
How do we fix that? Simple: we give back some power and bias the inputs slightly apart, so that both devices are on around the zero crossing, just not all the time. That reduces the crossover distortion and, more importantly, pushes what's left to a level where it only occurs when the signal is loud: quiet signals stay in the region where both devices conduct (class A), and one device only turns off once the signal gets big. That's a class AB amp.
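Here's a slightly less crude sketch of the bias fix, still piecewise-linear. Assumptions (mine, not from the text above): each transistor conducts once its base-emitter voltage passes 0.7V, each has a 0.22 ohm emitter resistor, the load is 8 ohms to ground, and the bias circuit spreads the two bases +/-v_bias around the input. With v_bias = 0 it's class B; with v_bias = 0.75V about 0.23A flows at idle, which makes it class AB. The output is found by bisection, which works because the current mismatch only ever rises with the output voltage.

```python
import math

# Piecewise-linear sketch of a complementary emitter-follower output stage.
# Assumed values (not from the post): 0.7 V turn-on, 0.22 ohm emitter
# resistors, 8 ohm load to ground. The two bases sit at v_in + v_bias (NPN)
# and v_in - v_bias (PNP).
V_BE, R_E, R_LOAD = 0.7, 0.22, 8.0

def stage_out(v_in, v_bias):
    """Output voltage, found by bisection on the load-current mismatch."""
    def mismatch(v_out):
        i_top = max(0.0, v_in + v_bias - V_BE - v_out) / R_E    # NPN sources current
        i_bot = max(0.0, v_out - (v_in - v_bias) - V_BE) / R_E  # PNP sinks current
        return v_out - (i_top - i_bot) * R_LOAD  # zero when Ohm's law is satisfied
    lo, hi = -100.0, 100.0
    for _ in range(60):  # mismatch rises with v_out, so bisection is safe
        mid = (lo + hi) / 2
        if mismatch(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def thd(v_peak, v_bias, n=2000):
    """Harmonic content relative to the fundamental over one sine cycle."""
    sines = [math.sin(2 * math.pi * k / n) for k in range(n)]
    outs = [stage_out(v_peak * s, v_bias) for s in sines]
    b1 = 2 * sum(y * s for y, s in zip(outs, sines)) / n  # fundamental amplitude
    residual_rms = math.sqrt(sum((y - b1 * s) ** 2 for y, s in zip(outs, sines)) / n)
    return residual_rms / (abs(b1) / math.sqrt(2))

for label, v_bias in (("class B ", 0.0), ("class AB", 0.75)):
    i_idle = max(0.0, v_bias - V_BE) / R_E
    for v_peak in (1.0, 10.0):
        print(f"{label} (idle {i_idle:.2f} A), {v_peak:4.1f} V peak: THD {thd(v_peak, v_bias):.3%}")
```

With these made-up values the AB stage stays in its both-on region up to roughly 3.7V of input, so the 1V signal comes out clean, and the 10V signal only picks up a small gain kink where one device drops out.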
OK, which is better? Well, that's a complicated question. In a power amp, we can't ignore efficiency (small-signal stuff is very often class A because efficiency doesn't matter there). First, you have to understand that class AB is a range. An amp could be biased so it's nearly class A right up to its absolute maximum power, or it could be set just past that crossover region to get rid of the worst distortion while keeping close to maximum efficiency, with a small amount of distortion considered acceptable. Both have to be labeled class AB, but their performance could be very different.
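To put rough numbers on "class AB is a range", here's a sketch for a push-pull stage on the same +/-16V rails into 8 ohms. It assumes idealized linear devices, where the pair stays in class A until the peak load current reaches twice the idle current; real transistors only approximate that. The three idle currents are arbitrary examples.

```python
# Rough numbers for "class AB is a range": a push-pull pair on +/-16 V rails
# into 8 ohms at different idle currents. Assumes idealized linear devices,
# which stay in class A until the peak load current reaches 2x the idle current.
V_RAIL, R_LOAD = 16.0, 8.0
P_MAX = V_RAIL ** 2 / (2 * R_LOAD)  # 16 W: the biggest sine the rails allow

def bias_point(i_idle):
    """Return (idle dissipation in W, output power that is still class A in W)."""
    p_idle = 2 * V_RAIL * i_idle  # idle current flows straight from rail to rail
    i_peak_class_a = 2 * i_idle   # one device cuts off beyond this load current
    p_class_a = min(P_MAX, i_peak_class_a ** 2 * R_LOAD / 2)
    return p_idle, p_class_a

for i_idle in (0.05, 0.25, 1.0):
    p_idle, p_a = bias_point(i_idle)
    print(f"idle {i_idle:.2f} A: {p_idle:4.1f} W wasted at idle, class A up to {p_a:5.2f} W")
```

At 1A idle it's class A all the way to the 16W the rails allow, but it burns 32W doing nothing (50% efficient at best, as promised); at 50mA it's class A only for the first 40mW and acts like class B above that.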
Now we have the marketing problem. Say you build a tube amp that is class AB (it would work a bit differently than the transistor example, but the concepts are the same), but is biased pretty far into class A. An intelligent designer would set that point by measuring output distortion at various levels and choosing a bias point that's a good compromise between efficiency and performance. The finest audiophile ears might be completely unable to hear the difference between that and a pure class A amp. As long as the volume doesn't get loud, it IS a class A amp.
But the sales brochure would still say class AB, so nobody would buy it. Therefore, once you commit to AB, it's very tempting to go for a bias closer to class B than class A, because that gives you the marketing advantage of lighter weight and/or more power.
So . . . in conclusion, an intelligently designed class AB amp is probably best, but there's no way of knowing whether a particular class AB amp is a good or bad compromise without listening to it or measuring it.
Compression and breakup have to do with behavior at maximum rather than minimum levels, and I think those have more to do with the type of device and overall circuit design than with class A vs. class AB. For example, take the stuff I was saying to VP the other day about push-pull vs. single-ended. Push-pull, whether class A or class AB, tends toward symmetrical breakup, while single-ended tends toward asymmetrical breakup. Asymmetrical breakup adds even-order harmonic distortion, which generally sounds better. Then there's the tube vs. MOSFET vs. BJT debate, negative feedback, and so forth. So a lot goes into a design that affects those sorts of things.
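A toy illustration of the symmetrical vs. asymmetrical point: clip a sine equally on both halves, then harder on one half, and measure the harmonics. The clip levels are arbitrary, and this isn't a model of any particular push-pull or single-ended circuit; it only shows where the even-order content comes from, not what sounds better.

```python
import math

# Toy comparison of symmetrical vs. asymmetrical clipping of one sine cycle.
# The clip levels are arbitrary illustration values, not a circuit model.

def clip(x, lo, hi):
    """Flatten x to the range [lo, hi]."""
    return max(lo, min(hi, x))

def harmonic(samples, h):
    """Amplitude of the h-th harmonic, via a single-bin DFT over one cycle."""
    n = len(samples)
    a = sum(y * math.cos(2 * math.pi * h * k / n) for k, y in enumerate(samples))
    b = sum(y * math.sin(2 * math.pi * h * k / n) for k, y in enumerate(samples))
    return 2 * math.hypot(a, b) / n

N = 4096
sine = [1.5 * math.sin(2 * math.pi * k / N) for k in range(N)]
symmetrical = [clip(x, -1.0, 1.0) for x in sine]   # both halves flattened equally
asymmetrical = [clip(x, -1.0, 0.6) for x in sine]  # top half flattened harder

for name, wave in (("symmetrical", symmetrical), ("asymmetrical", asymmetrical)):
    h1 = harmonic(wave, 1)
    print(f"{name:12s} 2nd: {harmonic(wave, 2) / h1:6.2%}  3rd: {harmonic(wave, 3) / h1:6.2%}")
```

Symmetrical clipping leaves the 2nd harmonic at essentially zero (only odd harmonics), while the lopsided version picks up a sizable 2nd harmonic.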