You've just touched on the secret behind "mic modeling", basically.
The problem with trying to model one mic with another is that the published specs are never what you actually need. The pretty graphs in the mic catalogs were usually made with 1/3-octave (or broader) filters, which produce nice, smooth-looking graphs. The real frequency response of a mic (especially the "colored" ones!) is a much different thing if you measure it with a narrower filter, with resolution down to a few Hz.
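To put a rough number on how much a 1/3-octave measurement hides, here's a quick Python sketch. The peak itself (8 dB tall, about 5 Hz wide, parked at 9 kHz) is made up for illustration and is not measured data from any real mic; the function names are mine too.

```python
import math

def narrow_peak_db(f, f0=9000.0, gain_db=8.0, width_hz=5.0):
    """Hypothetical mic response in dB: flat, plus one very narrow peak."""
    return gain_db / (1.0 + ((f - f0) / (width_hz / 2.0)) ** 2)

def third_octave_average_db(f_center, response_db, step_hz=1.0):
    """Power-average a response over the 1/3-octave band around f_center."""
    lo = f_center * 2 ** (-1 / 6)
    hi = f_center * 2 ** (1 / 6)
    n = int((hi - lo) / step_hz)
    powers = [10 ** (response_db(lo + i * step_hz) / 10) for i in range(n + 1)]
    return 10 * math.log10(sum(powers) / len(powers))

fine = narrow_peak_db(9000.0)                             # what a few-Hz filter sees
smoothed = third_octave_average_db(9000.0, narrow_peak_db)  # what the catalog graph sees
print(f"narrow-band reading: {fine:.1f} dB, 1/3-octave reading: {smoothed:.2f} dB")
```

The band around 9 kHz is roughly 2 kHz wide, so a peak only a few Hz wide gets averaged down to a small fraction of a dB and the graph looks flat.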
Generally speaking, the "color" created by a mic consists of a number of very narrow spectral peaks, and maybe some general frequency response bumps. The general bumps show up in the prettified 1/3-octave graph; the narrow peaks don't. The mic modeling companies use a technique called "deconvolution" to build a much more accurate picture of a mic's actual impulse response, with resolution down to just a few hertz. They can then digitally build up a filter set that can mimic these very narrow resonances. The math is annoying for anyone other than a serious nerd, but the bottom line is that they are essentially deriving the Fourier transform (ooh, yuk!) of the mic's impulse response, and modeling that: processing the signal in the frequency domain, and not the time domain. More yuk.
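For the curious, the basic deconvolution idea can be sketched in a few lines of Python with NumPy. This is only a textbook-style illustration, not what any modeling company actually runs: the "mic" is a made-up three-tap impulse response, the test signal is plain noise, and `eps` is a small regularization constant I picked to avoid dividing by nearly nothing.

```python
import numpy as np

def deconvolve(recorded, reference, n_fft, eps=1e-8):
    """Estimate an impulse response by regularized frequency-domain division."""
    Y = np.fft.rfft(recorded, n_fft)
    X = np.fft.rfft(reference, n_fft)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n_fft)

rng = np.random.default_rng(0)
reference = rng.standard_normal(4096)        # known test signal fed to the "mic"

true_ir = np.zeros(64)                       # hypothetical mic impulse response
true_ir[0], true_ir[10], true_ir[25] = 1.0, 0.5, -0.3

recorded = np.convolve(reference, true_ir)   # what the "mic" delivers
estimated_ir = deconvolve(recorded, reference, n_fft=8192)[:64]
print(np.round(estimated_ir[[0, 10, 25]], 3))
```

Once you have the impulse response, its FFT is the fine-resolution frequency response the catalog graphs never show.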
To attempt this with a standalone EQ, you would need a multiband parametric that can be tweaked to provide a number of extremely high-Q resonant peaks superimposed over a set of broader, much lower-Q general bumps and valleys. The problem is getting the data to start with, to know which knobs to twist: the mic companies sure won't give it to you, and that's the stock-in-trade of the modelers, so they won't either.
I'm hardly saying that it can't be done. However, it is difficult, if not outright impossible, to mimic the narrow resonances with a 1/3-octave EQ: the Q of each section is too low, so you simply can't achieve the narrow high-Q resonances that are the heart and soul of "color" in the mic sense, and you get unintended phase errors outside the filter passband that sound like dogmeat. It gets ugly. To really do it right, you need several (many!) sections of a very good parametric EQ, and this is best done in the digital domain, not in the analog domain: DSP was _designed_ for doing frequency-domain tweaks of exactly this nature.
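Here's a small Python sketch of the Q problem, using the standard peaking-filter biquad from the widely circulated Audio EQ Cookbook. The numbers are my own assumptions for illustration: Q = 4.3 is roughly what a 1/3-octave-wide band works out to, Q = 900 gives a peak around 10 Hz wide at 9 kHz, and the 96 kHz sample rate is arbitrary.

```python
import cmath
import math

def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (Audio EQ Cookbook form)."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin)
    a = (1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin)
    return b, a

def gain_db_at(coeffs, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    b, a = coeffs
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))

FS = 96000.0
broad = peaking_biquad(9000.0, 6.0, 4.3, FS)     # roughly a 1/3-octave section
narrow = peaking_biquad(9000.0, 6.0, 900.0, FS)  # hypothetical high-Q resonance

for name, coeffs in (("Q=4.3", broad), ("Q=900", narrow)):
    at_peak = gain_db_at(coeffs, 9000.0, FS)
    nearby = gain_db_at(coeffs, 9050.0, FS)
    print(f"{name}: {at_peak:.2f} dB at 9000 Hz, {nearby:.2f} dB at 9050 Hz")
```

Both filters hit +6 dB dead on 9 kHz, but 50 Hz away the low-Q section is still boosting by nearly the full amount while the high-Q one has dropped back to almost nothing. That's the difference between a broad bump and a resonance.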
With deconvolution-convolution techniques, theoretically you can take an SM-57, mic a source, deconvolve out the SM-57's many peculiarities (leaving a "perfect" representation of the uncolored input), and then convolve that "perfect" input with the impulse response of, say, a Neumann U47, and get something that sounds like it was recorded with a U47.
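In toy form, the whole deconvolve-then-convolve chain looks something like the Python/NumPy sketch below. Big caveats: the two impulse responses are short made-up sequences standing in for the source and target mics (nothing to do with a real SM-57 or U47), the function name `remodel` and the constant `eps` are my own, and this idealized case has none of the real-world problems with missing bandwidth or noise.

```python
import numpy as np

def remodel(recorded, ir_source_mic, ir_target_mic, n_fft, eps=1e-9):
    """Divide out one mic's response and apply another's, in the frequency domain."""
    Y = np.fft.rfft(recorded, n_fft)
    Ha = np.fft.rfft(ir_source_mic, n_fft)
    Hb = np.fft.rfft(ir_target_mic, n_fft)
    # Regularized inverse of the source mic, times the target mic's response.
    transfer = Hb * np.conj(Ha) / (np.abs(Ha) ** 2 + eps)
    return np.fft.irfft(Y * transfer, n_fft)

rng = np.random.default_rng(1)
source = rng.standard_normal(2048)                 # the "uncolored" sound
ir_source_mic = np.array([1.0, 0.4, -0.2])         # hypothetical input mic
ir_target_mic = np.array([1.0, -0.3, 0.25, 0.1])   # hypothetical target mic

recorded = np.convolve(source, ir_source_mic)      # what the input mic captured
wanted = np.convolve(source, ir_target_mic)        # what the target mic would capture

remodeled = remodel(recorded, ir_source_mic, ir_target_mic, n_fft=4096)[:len(wanted)]
print("max error:", np.max(np.abs(remodeled - wanted)))
```

It works almost perfectly here only because the made-up input mic passes every frequency with plenty of level, so there is always something to divide by.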
Right. That's the theory, and that's what Antares and whoever else is selling. In practice, this falls a little short, because you can't ever completely work around the really gross characteristics of the input mic (like there not being any signal below 30Hz or above 15kHz to work with, or really huge phase response errors due to the mass of the diaphragm, or whatnot). Unsurprisingly, for modeling, the better the input mic, the better the output results... It's your call as to whether this works well enough for your application.
Anyway, it's not so much differences in how the mics are made as it is individual frequency characteristics that are well beyond the resolution of the marketing-driven graphs you can get. The big-dollar vintage condensers will have 4 or 5 resonant peaks of +6-8dB (or more!) that are only a few Hz wide, up in the 7-12kHz range. You and I hear it, and call it "air"... and marvel at the fact that two mics with "flat" 1/3-octave response curves can sound so different...
What makes a mic sound "good"? The two pints of gray soup between the ears of the listener! Your mileage may vary.
Hope that helps...