There are advantages and disadvantages to either option, as some have mentioned. The kick-ass thing about headphones is that, for the most part, they'll sound exactly the same no matter which way you turn your head, which room you're in, etc. That reliable repeatability and the inherent portability are two important factors. On top of that, a decently-isolated closed pair of headphones will let you mix both quieter and louder than speakers might allow in a residential situation, and at any time of day, thanks to the super-low noise floor (compared to most apartments, houses with kids, etc.).
The disadvantages, however, are really impossible to ignore. The biggest issues are the warped sound stage and the difference in detail and ambience levels that come with having foam-surrounded speakers bolted right next to your ears. The reality is that most people listen to music using speakers, either at home, or in their cars, or between bands at a concert, or on a "boom box" in their kitchen, or whatever, and these speakers are often non-optimally placed in a noisy environment with its own built-in ambience. Using reference monitors in a room will allow you to listen to and make critical decisions about your mix in an environment much more similar to how your audience will be listening 90% of the time. The reason PROPER room treatment matters is that if you mix using reference monitors in a non-treated or poorly-treated room, you'll be listening in an environment that isn't representative of anywhere else, and so you run the very real risk of making a record that sounds great at home and on headphones, but inconsistent (at best) or shitty (likely) in many other environments.
As for room treatment, the simplest things to address & understand are also (conveniently) the most important. As there are a shit-whack of fantastic resources out there giving you the specifics of room treatment, I'll give you an incredibly over-simplified version that anyone can understand:
#1 - Just like reference monitors or headphones, a flat response (no super-quiet bass or super-loud mids or anything) is generally what you're going for.
#2 - Small rooms (bedrooms, dens, etc.) sound like shit because the sound waves coming out of your speakers are bouncing all over the fucking place and knocking into each other, canceling each other out or combining into a vortex of sound-vomit. Think of it like two kids bouncing around in a small pool, and your audio is the surface of the water. This is particularly a problem at low frequencies, which is why most home recordings have a shitty/odd low end.
#3 - To fix the low-end, put bass traps into essentially every corner you can fit them in. It's difficult-to-impossible to put too many bass traps into corners (meaning: where your walls meet, where your ceiling meets your walls, where your floor meets your walls, and where all three meet). Even two or three is WAY better than none.
#4 - High-frequency sound bounces all over shit. Want to have a fun experiment that will illustrate this perfectly? Get a ~1'x1' mirror or a piece of glass (CAREFUL!), turn some music on, and move the mirror/glass around near your ear. Crazy stuff.
#5 - When high-frequency sound is bouncing all over shit (y'know, like your walls and ceiling), your imaging gets all shitty and bad. Get that mirror from #4 and have a friend move it along the walls between you and your speakers while you sit in your mix position. Wherever you see your speakers in the mirror, that's where you'll want to put some mid/high-frequency absorbers up. Good foam works pretty well; actual rigid fiberglass traps work better. Make 6 of 'em and put them on both side walls, on the ceiling, and on the wall behind you, in the spots where you could see your speakers in the mirror.
This will get you 75% there, and in a WAY better environment for mixing, and it shouldn't cost you more than a couple hundred bucks.
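If you're curious where the low-end trouble from #2 is likely to land in your particular room, here's a rough sketch. It uses the standard axial room-mode formula, f = n * c / (2 * L), which gives the frequencies where sound bouncing between two parallel surfaces piles up into standing waves. The 343 m/s speed of sound, the function name, and the example room dimensions are all my own assumptions for illustration, and it ignores tangential and oblique modes entirely, so treat it as a ballpark and not a replacement for actually measuring your room.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: approximate speed of sound in air at ~20 C


def axial_modes(dimension_m, max_hz=200.0):
    """Return axial standing-wave frequencies (Hz) for one room dimension.

    Uses f = n * c / (2 * L) for n = 1, 2, 3, ... and stops at max_hz.
    """
    if dimension_m <= 0:
        raise ValueError("dimension_m must be positive")
    modes = []
    n = 1
    while True:
        freq = n * SPEED_OF_SOUND_M_S / (2.0 * dimension_m)
        if freq > max_hz:
            break
        modes.append(round(freq, 1))
        n += 1
    return modes


# Hypothetical small bedroom, in metres (swap in your own measurements).
room = {"length": 3.5, "width": 3.0, "height": 2.4}

for name, size in room.items():
    print(f"{name} ({size} m): {axial_modes(size)}")
```

Frequencies that show up close together across two or three dimensions are the ones most likely to boom or vanish at your mix position, which is a decent hint about why the corners need the bass traps from #3.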
Friendly Tip: Don't put moving blankets, egg cartons, or too much foam all over the walls, 'cause you'll end up with a dead-sounding room at best, and a smeary confusing turd mess at worst (likely).
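If you don't have a friend handy for the mirror trick in #5, you can rough out the same spot with a tape measure. The mirror works because sound reflects off a flat wall the way light does, so the reflection point is where a straight line from the speaker's "mirror image" behind the wall to your head crosses the wall. The sketch below assumes a flat side wall, a top-down 2D view, and made-up positions; the function name and the coordinate setup are mine, not any standard tool.

```python
def first_reflection_point(speaker, listener):
    """Find where a side-wall first reflection hits, in a top-down 2D view.

    Each position is (distance_from_wall_m, distance_along_wall_m).
    The wall itself is the line where distance_from_wall_m == 0.
    Returns the distance along the wall (m) to the reflection point.
    """
    speaker_from_wall, speaker_along = speaker
    listener_from_wall, listener_along = listener
    if speaker_from_wall <= 0 or listener_from_wall <= 0:
        raise ValueError("speaker and listener must both be off the wall")

    # Mirror the speaker through the wall, then walk the straight line
    # from that image to the listener until it crosses the wall.
    fraction = speaker_from_wall / (speaker_from_wall + listener_from_wall)
    return speaker_along + fraction * (listener_along - speaker_along)


# Hypothetical setup: speaker 1.0 m off the side wall and 0.5 m out from
# the front wall; mix position 1.5 m off the side wall and 2.0 m back.
spot = first_reflection_point(speaker=(1.0, 0.5), listener=(1.5, 2.0))
print(f"Centre an absorber about {spot:.2f} m along the side wall")
```

Run it once per speaker per wall (and for the ceiling, using heights in place of wall distances), then double-check with the real mirror before you put holes in anything.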
Hope this helps!