I've always viewed it as "your playback system is what it is."
I don't see any particular reason why you can't get a good mix out of any relatively accurate playback device, be it far-field loudspeakers, near-field loudspeakers, headphones, or earbuds. The only condition I would place on it is that the playback device needs to be reasonably accurate in both the transient (i.e., accurate temporal-shift representation) and steady-state (i.e., accurate frequency response) domains. Of course, that's the killer right there.
I've always maintained that on a low budget, you can get a better-sounding result out of a pair of headphones than out of a similarly budgeted pair of loudspeakers. I say "budgeted" because speakers interact with the listening room, which might require more money to fix (or not, as Glen mentioned).
Before getting to the anecdotal evidence, I just want to point out that no system is perfect.
For headphones, you get the benefits of low output-volume requirements, typically lower cost, more distinct spatial imaging, and zero room interaction. On the flip side, you also have to contend with a generally less accurate response, faster listener fatigue, no way for several people to listen together, and zero room response (yes, it's both an advantage and a disadvantage).
For loudspeakers, the converse of the above is typically true (since I framed those points as a contrast between the two), though generally speaking there's an additional element of relative fragility: it's easier to cause mechanical damage to a piece of furniture that weighs 20+ pounds and has exposed moving parts than to a lightweight accessory that generally has its moving parts tucked away. I know what you're thinking, but this matters in particularly small spaces (like some home studios), where you might have to consider that you could accidentally nudge or bump your speakers while moving gear (or yourself) around.
Anyway, for the anecdotal part (to fit in with everyone else): I've done listening tests in-store for both headphone and near-field playback devices.
For my money, my Beyerdynamic DT770s ($200 new, typical street price) hold up to my scrutiny at least as well as (and typically better than) any commonly manufactured new speaker pair on the market today under about $800/pair (new, street price). To get to speakers that I really thought outshone those headphones, I very quickly found myself in the $1000/pair range. Given that room treatment was also out of the question for me at the time (I was in college and constantly moving between rented apartments), there wasn't really any doubt about how I should go about putting my system together. Even if it hadn't been, room treatment could easily have cost as much as the headphones themselves.
Have all my mixes been great? Definitely not. But I'm more apt to believe that the fault lies mostly with me and my ears than with the deficiencies of the playback system (and some of the mixes have been good, too). I have never heard a convincing argument that there is some inherent deficiency making it impossible to mix well on headphones. There are certainly arguments for why it's not preferable, but that's not really the topic at hand.
As always, YMMV.