To me, the easiest way to think of an A/D converter is to imagine a waveform travelling from left to right along a horizontal axis: ~~~~~~~
An A to D converter essentially maps the position of the wave in timeslices (left to right positions) and amplitude (distance from the mid-line). These positions also correspond to changes in air pressure over time, which is what sound is.
The mapping gets more accurate as you make the timeslices thinner and the measurements from the mid-line finer. The thinness of the timeslices is the sampling rate (e.g., 44.1 kHz means 44,100 slices of time in a single second), and the fineness of the measurements from the mid-line is the bit depth (16-bit, 24-bit, etc.).
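To make that concrete, here's a minimal sketch (hypothetical values, not any real converter's internals) of how sampling rate and bit depth turn a continuous wave into numbers:

```python
import math

SAMPLE_RATE = 44_100           # timeslices per second (44.1 kHz)
BIT_DEPTH = 16                 # fineness of the mid-line measurement
LEVELS = 2 ** (BIT_DEPTH - 1)  # 32,768 steps above and below the mid-line

def sample_and_quantize(freq_hz, num_samples):
    """Return integer sample values for a full-scale sine wave."""
    samples = []
    for n in range(num_samples):
        t = n / SAMPLE_RATE  # the moment this timeslice is taken
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # -1.0 .. 1.0
        # Round the continuous amplitude to the nearest available step
        samples.append(round(amplitude * (LEVELS - 1)))
    return samples

# Five timeslices of a 1 kHz tone
first_slices = sample_and_quantize(1000, 5)
```

A higher sampling rate gives you more entries per second; more bits give you more (and finer) steps between the mid-line and full scale.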
Thus the A/D device first has to capture a highly accurate picture of the analog wave. From there, the mapping to digital data is fairly straightforward, although some devices do interesting mathematical modelling when rounding data that falls between slices or bit points. The biggest issue, in my mind, is how accurately the signal is presented to the digitizing process. That's where Lucid, Apogee and others stand apart from the mutt converters.
EEs, you can rightly take me to task on my informal knowledge of electronics. However, I think you'll agree that the basic message is sound.