Latency, yes, that will be a problem. However, an "audio router" could solve much of that by heavily prioritizing audio traffic over everything else, much the way routers today can prioritize voice over IP traffic.
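To make the voice over IP comparison concrete, here is a rough sketch of how an application can ask routers for that kind of priority today: it marks its UDP packets with the DSCP "Expedited Forwarding" code point (46), the marking commonly used for VoIP. The function names are made up for illustration, and whether any router along the path actually honors the marking is entirely up to the networks in between. This assumes a platform that exposes `IP_TOS`, such as Linux.

```python
import socket

# DSCP "Expedited Forwarding" (EF), the code point commonly used for VoIP.
EF_DSCP = 46


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper bits of the IP TOS byte.

    The lower two bits of the byte are used for ECN and are left at zero.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_audio_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Hypothetical helper: open a UDP socket whose packets carry the DSCP mark.

    Assumes the OS exposes IP_TOS (Linux does). Routers are free to ignore
    or rewrite the marking.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

An "audio router" in the sense above would be the box that actually acts on a marking like this, moving marked packets to the front of the queue.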
Also consider this: ADAT Lightpipe (8 channels of audio) carries 9.216 Mbit/s of audio data at a 48 kHz sample rate with 24 bits per sample (8 x 48,000 x 24).
Also, since one bass player in Japan only needs one of those eight channels, the audio routers could split the 8 channels up in a proprietary way and receive a single channel (1.152 Mbit/s) from that bass player. That leaves 7 other channels available for simultaneous recording of other musicians.
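The channel arithmetic is easy to check. The sketch below computes the raw audio payload only (channels x bits per sample x sample rate); it assumes no framing, packet headers, or compression, so real traffic would be somewhat higher. Note that the 9.216 and 1.152 Mbit/s figures work out to 24 bits per sample; at 16 bits the numbers are smaller.

```python
SAMPLE_RATE_HZ = 48_000
ADAT_CHANNELS = 8


def payload_bitrate(channels: int, bits_per_sample: int,
                    sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Raw PCM payload in bits per second, ignoring framing and headers."""
    return channels * bits_per_sample * sample_rate_hz


for bits in (16, 24):
    all_channels = payload_bitrate(ADAT_CHANNELS, bits)
    one_channel = payload_bitrate(1, bits)
    print(f"{bits}-bit: {ADAT_CHANNELS} ch = {all_channels / 1e6:.3f} Mbit/s, "
          f"1 ch = {one_channel / 1e6:.3f} Mbit/s")
# 16-bit: 8 ch = 6.144 Mbit/s, 1 ch = 0.768 Mbit/s
# 24-bit: 8 ch = 9.216 Mbit/s, 1 ch = 1.152 Mbit/s
```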
Most cable modem providers offer business class service with traffic prioritized over residential customers, and they give you 20 Mbit/s of bandwidth on their entry level plan. Buy real bandwidth from a real provider and the latency issue is not gone, but it is at least significantly reduced.
For example, with my bandwidth, if I ping "yahoo.com" from any of my servers down in the basement, I get a reply in the 2 to 3 millisecond range.
That's getting very close to what is necessary for real-time recording over wide area networks.
A friend of mine who writes device drivers for a living (for a video card manufacturer) wrote both of us a Vista driver so that the output of his Windows mixer appears as an input device on my Windows mixer. If he plays an MP3 or sings into the mic (badly, but sings nonetheless), I can move the volume control for that remote device and listen in at 48 kHz, 16-bit mono quality, roughly one channel of ADAT.
We put this together and we're just a bunch of monkeys. Imagine what a professional team of engineers working for, say, Cisco could do. I hope much better.
I think it's very doable in the near future. I really do.