In regard to monitors, I'd go with the JBLs. You won't regret it, especially for monitoring keys. Low E on a bass guitar is around 41 Hz. 5" drivers won't get close to that, and your keys go even lower. I'd suggest 8" drivers.
As for your room, early reflections won't be a big problem, but you can clap and whoop to find the resonances and the overtones you'll want to treat, or better yet, you can get physical:
At an air temperature of 64 degrees F, sound travels at about 1123 feet per second, or 13.476 inches every thousandth of a second. If our room is 12' x 12' (144" x 144") with 9' (108") ceilings, and (for the sake of simplicity) we plop ourselves down perfectly in the centre of it and play a single, staccato note, this is what will happen: The sound of the note will take 5.343 thousandths of a second to hit each of the four walls (72" away) and roughly as long to bounce back to the point of origin. That's 10.69 thousandths of a second, total. The same sound will reach the ceiling (54" away) in 4.007 thousandths of a second and return in as much, for a total of 8.01 thousandths. These return times, being unequal, will be heard as room sound. Wall reflection 10.69 divided by ceiling reflection 8.01 is about 1.33. Because that isn't a whole number, the reflections do not return to the microphone at an agreeable rate.
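If you'd rather not do the arithmetic by hand, here is a rough Python sketch of the same sums. The function name `round_trip_ms` is just something I made up for the example, and it assumes the same 1123 feet per second and the same dead-centre position as above.

```python
SPEED_FT_PER_S = 1123.0                      # speed of sound used above (about 64 degrees F)
INCHES_PER_MS = SPEED_FT_PER_S * 12 / 1000   # 13.476 inches per millisecond


def round_trip_ms(distance_in):
    """Milliseconds for sound to reach a surface distance_in inches away and come back."""
    return 2 * distance_in / INCHES_PER_MS


# Listener at the exact centre of a 144" x 144" x 108" room,
# so each surface is half a room dimension away.
wall_ms = round_trip_ms(144 / 2)
ceiling_ms = round_trip_ms(108 / 2)

print(f"wall round trip:    {wall_ms:.2f} ms")
print(f"ceiling round trip: {ceiling_ms:.2f} ms")
print(f"ratio:              {wall_ms / ceiling_ms:.4f}")
```

Worked from the unrounded times, the ratio is simply 144 to 108, or 4:3. Any extra decimals you get from dividing the rounded figures are only rounding.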
Stay with me, because I'm really simplifying this for the example. The example assumes that we are sitting dead centre of the room...which will rarely be the case in reality. If we move in any direction, the reflections will return several milliseconds sooner or later. And the speed of sound increases as air temperature rises, though only slightly: at 70 degrees F it would be roughly 1130 feet per second.
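To put a number on the temperature effect, here is a small sketch using the usual textbook approximation for dry air (331.3 metres per second at 0 degrees C, scaled by the square root of absolute temperature). That formula and the function name are my additions, not something measured in your room, and humidity is ignored.

```python
import math


def speed_of_sound_ft_per_s(temp_f):
    """Approximate speed of sound in dry air, in feet per second."""
    temp_c = (temp_f - 32) * 5 / 9                       # Fahrenheit to Celsius
    metres_per_s = 331.3 * math.sqrt(1 + temp_c / 273.15)
    return metres_per_s / 0.3048                         # metres to feet


for temp_f in (64, 70, 80):
    print(f"{temp_f} F: {speed_of_sound_ft_per_s(temp_f):.0f} ft/s")
```

Across normal room temperatures the change works out to well under one percent, so it nudges the reflection times rather than transforming them.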
On top of that, a room of this size will actually resonate at two different fundamental frequencies: one set by the distance from one wall to the next, and one set by the distance from the floor to the ceiling. These two fundamental tones will be accentuated, as well as their first and second overtones. In other words, most rooms are 'tuned' to at least a couple of tones because architects and builders have a tendency to work with 90-degree angles, meaning that the walls are either perpendicular or parallel to one another. If the room is rectangular, rather than square, there will be three fundamental tones, corresponding to the length, breadth, and height of the room. Oh, crap!
Remaining on the subject of our example room, here's how we find those fundamentals:
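Where ƒ represents the frequency in question, and C represents the speed of sound, merely divide C by the width of the room to arrive at ƒ (ƒ = C / width). That is, 1123 (s.o.s. in feet per second) divided by 12 (room width in feet) equals 93.58 Hz. That is close to F# below C3. Using the same formula, the floor-to-ceiling measurement corresponds to 124.78 Hz, which is close to B below C3. Strictly speaking, the lowest standing wave between two parallel surfaces fits half a wavelength into the gap, so the room also resonates an octave below each of those figures (C divided by twice the distance, about 46.8 Hz and 62.4 Hz here). The note names stay the same.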
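Here's a rough sketch of that calculation in Python, in case you want to plug in your own room. The function names are made up for the example, the note lookup assumes equal temperament with A4 = 440 Hz (so middle C is C4), and the formula is the standard one for standing waves between two parallel surfaces, n x C / (2 x distance). With n=2 it gives the same figures as dividing C by the distance; n=1 is the octave below.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz):
    """Nearest equal-tempered note name, assuming A4 = 440 Hz (MIDI note 69)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def axial_mode_hz(dimension_ft, n=1, speed_ft_per_s=1123.0):
    """n-th standing-wave frequency between two parallel surfaces dimension_ft apart."""
    return n * speed_ft_per_s / (2 * dimension_ft)


# Example room from above: 12 ft between walls, 9 ft floor to ceiling.
for label, feet in (("wall to wall", 12), ("floor to ceiling", 9)):
    for n in (1, 2):
        freq = axial_mode_hz(feet, n)
        print(f"{label}, n={n}: {freq:.2f} Hz (near {nearest_note(freq)})")
```

Swap in your own dimensions in feet, and pass a different `speed_ft_per_s` if your room runs much warmer or cooler.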
Hope that helps!