when you say "the peaks above the wall at about 15k" do you mean every peak 15k and up that are clearly crossing the thick rectangle??
Pretty much, yeah. There may be the odd peak here or there that isn't directly related, but most of what you see in that spectrogram that rises above about 15.5k or so corresponds to a sibilant or percussive moment in the vocal track.
If you look at the FFT (which I also did) you probably won't see much that's immediately recognizable as untoward sibilance in the "typical" range (around 6-8kHz); the area is filled with peaks and valleys, but nothing really jumps out at those key points in the vocals. This matches the spectrogram, which also shows no noticeable concentration of energy in that frequency range at those points in time (or anywhere else in the clip, for that matter). More importantly, it matches what I hear, which isn't the typical harsh, fuzzy-sounding sibilance, but rather something with more of a colder, high-frequency sizzle.
Back on the FFT, you can visually confirm this by looking up at about 16k and above. As with most MP3s, the FFT slopes down sharply and crashes to zero right around 15k or slightly above. But when the sibilant or percussive parts of the vocal hit, you'll see the area just to the right of that drop-off jump with some smaller but significant lobes of energy.
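If you'd rather put a number on that than eyeball the chart, here is a rough sketch in plain Python (standard library only). It assumes you already have a short frame of mono samples as a list of floats and that you know the sample rate. The function name `high_band_fraction` and the 15.5k cutoff are my own illustrative choices, not anything taken from a particular analyzer. It uses a naive DFT, so keep the frames short (a few hundred samples).

```python
import cmath
import math


def high_band_fraction(frame, sample_rate, cutoff_hz):
    """Return the fraction of a frame's spectral energy at or above cutoff_hz."""
    n = len(frame)
    # Hann window to keep leakage from low-frequency content out of the top band.
    windowed = [
        s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    total = 0.0
    high = 0.0
    # Bins 1 .. n/2 - 1: skips DC and the Nyquist bin.
    for k in range(1, n // 2):
        w = -2j * math.pi * k / n
        bin_val = sum(x * cmath.exp(w * i) for i, x in enumerate(windowed))
        power = abs(bin_val) ** 2
        total += power
        if k * sample_rate / n >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


# Hypothetical check with synthetic tones (not the actual vocal):
sr = 44100
dull = [math.sin(2 * math.pi * 2000 * i / sr) for i in range(256)]
bright = [d + math.sin(2 * math.pi * 18000 * i / sr) for i, d in enumerate(dull)]
print(high_band_fraction(dull, sr, 15500))
print(high_band_fraction(bright, sr, 15500))
```

Run over successive frames of the decoded MP3, the frames that land on a sibilant or percussive moment should come back with a noticeably larger fraction than the frames around them.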
Once again, I gotta ask you what YOU hear. Are you hearing this on the original WAV of the vocal itself, or do you not hear it until you sum your tracks into a stereo mix, or is it in the MP3 encoding that it appears?
EDIT: OK, I just took a listen to, and a look at, the vocal-only WAV file. It is extremely rich in high-frequency content, all the way up to about 22k. At the sibilant points there does seem to be some peaking around 8-10k, which in itself is on the high end of typical sibilance, but even more striking IMHO is the amount of energy *above* 10k at those points. Simply put, that is an extremely bright recording.
Attached is an FFT snapshot taken at about 24.9 seconds into the file. Note the wide "peak" extending from about 8k all the way up to about 14k, and the fact that though things drop off from there, there is still quite a bit of energy all the way up to 22k (the end of the chart).
I'm not sure of the cause at this time, but I'd still start by trying what I suggested for the MP3, only on the vocal WAV itself: low-pass at about 10k or so with a fairly steep slope to take out a lot of that HF energy.
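In practice you'd just reach for the EQ or filter in your editor, but to show what "low-pass at about 10k with a fairly steep slope" amounts to, here is a minimal sketch of a windowed-sinc FIR low-pass in plain Python. Everything here is an assumption for illustration: the 44.1k sample rate, the 255-tap length, the Blackman window, and the helper names `lowpass_taps`, `fir_filter` and `gain_at`. More taps gives a steeper slope.

```python
import math


def lowpass_taps(cutoff_hz, sample_rate, num_taps=255):
    """Windowed-sinc low-pass FIR taps (Blackman window), normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        x = i - mid
        # Ideal (infinite) low-pass impulse response, sampled at x.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        # Blackman window trades a wider transition band for a deep stopband.
        window = (0.42
                  - 0.5 * math.cos(2 * math.pi * i / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * i / (num_taps - 1)))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def fir_filter(samples, taps):
    """Direct convolution; output has the same length as the input."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * samples[n - k]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, sample_rate):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)


taps = lowpass_taps(10000, 44100)
for f in (5000, 10000, 14000):
    print(f, gain_at(taps, f, 44100))
```

The loop at the end prints the filter's gain well below the cutoff, at the cutoff, and well above it, so you can see how much of the 10k-and-up energy would be taken out before committing to it on the vocal.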
G.