Closets generally make horrible recording spaces (except in the movies). The only time (okay, there were several) I was able to turn a "closet" into something resembling a vocal booth, I had the drywall completely removed, the stud cavities filled with massive (no pun intended - we're talking kilograms per cubic meter) glass or rockwool (Roxul Safe 'n Sound worked pretty well) insulation, covered the insulation with cloth (furring strips to cover the staples) and added extra layers of drywall to the opposite sides. Still not great - but not too bad either. Higher voices obviously work better than lower (because, physics: the longer wavelengths of low frequencies are much harder to absorb in a small space), but it gave somewhat usable stuff.
And we can't access the clips.
[EDIT] But I could access the page -- Definitely "something" going on that I can't put my finger on. Might be a "semi-cheesy" mic, might be a condenser where a dynamic would be more appropriate, might be some upper-midrange comb filtering. Would be interested in knowing your chain and the space (rough dimensions and surfaces) used for the clips.
Now - on the "sensitive" side - I don't do a crap load of VO work anymore, but I've got a few hours in if you know what I mean... Your end T's are pretty good ("devote" for example) but you've got a decent bunch of "fors" that are coming out as "fers" -- Don't feel bad, it's fairly common, especially if you're new-ish to the game. And I wouldn't come in without a "pro-tip" as I suffered from that liability myself.
[PRO TIP] When presented with a script on paper, cross out the word "for" and write "4" over it. If you're lucky enough to have a digital copy in the form of a doc or txt file, replace "for" with "four" -- practice the lines once or twice (as they may be visually confusing on the cold run) and go 'fer it.
Once you see those "4's" or "fours" it's like magic. Your brain just pronounces them automatically. And it's a subtle thing - A fast "for" can make you trip up in a long sentence. But it's important to get that "aw" instead of "er" in there. One of my warm-ups was (is) always "one small, two medium, three large, four cream, four sugar" and that eventually made it so I didn't need to make those annotations anymore. [/PRO TIP]
"One small" - reminds the mouth to end one word before the next or you wind up with "onezmall" -- "Two medium" - the end of "two" puts the mouth in the perfect low spot to start "medium" -- "Three large" - the high position at the end of "three" is the exact same position as the start of "large" except for the position of the tongue when it pushes forward for the "L" -- and then the "four cream" (end of "four" again, the exact same position as the start of "cream" except for the back of the tongue) and "four sugar" (end of "four" prepares the *tongue* for "sugar" but the shape of everything else needs to change - then the lips are like a see-saw in motion if you stay on "four sugar" a few times - both lips tense and raise for the "sh" and both relax and protrude for the "fou"). Plus, it just gets the whole "four" sound in mind. Go ahead everyone, read it slowly. I actually taught this one to my voice coach.
But as I do it less as of late, I've gone back to the "4" and "four" on the script. Just did a bunch of station ID's last week and needed to replace a "for" with "four" after I said "fer" several times (thank Jeebus I do my own editing before anything goes out...).
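If the script is a plain txt file, the "for" to "four" swap doesn't have to be done by hand. Here's a rough sketch in Python of how I'd imagine it working. The function name `mark_fors` and the sample line are made up for illustration, and it assumes you only want the standalone word swapped (so "before" and "forget" are left alone, though "for-profit" would become "four-profit"):

```python
import re

def mark_fors(script: str) -> str:
    """Replace the standalone word "for" with "four", keeping its capitalization."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        if word.isupper():        # "FOR" -> "FOUR"
            return "FOUR"
        if word[0].isupper():     # "For" -> "Four"
            return "Four"
        return "four"             # "for" -> "four"

    # \b word boundaries keep "before", "forget", "fort" etc. untouched
    return re.sub(r"\bfor\b", swap, script, flags=re.IGNORECASE)

if __name__ == "__main__":
    line = "For best results, wait for the tone before you go for it."
    print(mark_fors(line))
```

Run it on a copy of the script, not the original, since you'll want the real spelling back for anything that gets handed to a client.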
I don't mean to turn this into a critique session - but that ("fer") was more distracting to me than any audio anomalies you might be worried about. I've done a zillion "less than wonderfully recorded" VO editing sessions - many with the producer / director in the room (or more recently, on Zoom or something). Audio quality can be massaged to some extent as long as the recordings are "decent" (yours can certainly be improved, but they're at least consistent). "Fer" instead of "for" or "Firmiliar" instead of "FAMiliar" are the ones where we look at each other and decide if a re-do (or an alternate VOA) is in order. [/EDIT]