The hardest thing to wrap your head around is the part about the signal being band-limited. Any 'detail' that existed between the samples would have to be at a frequency above Nyquist (and above our ability to hear), and that rules out all of the other, seemingly possible, paths.
For example: at a 44.1 kHz sample rate, if the original waveform had a squiggle in it that fit entirely between two adjacent samples, that squiggle would have a period shorter than one sample interval. Its frequency would therefore be higher than the sample rate itself, which is twice the Nyquist frequency (22.05 kHz) and far higher than anyone claims to be able to hear. A squiggle like that can't exist in a band-limited signal: it would be removed by the anti-aliasing filter during conversion, so it could not possibly be part of the reconstruction. That's why, even though the possibilities seem limitless, there really is only one possible band-limited waveform that intersects those points.
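If you want to see this numerically, here is a rough sketch in Python with NumPy. It is only an illustration, not how a real converter works: the test tones, the squiggle amplitude, and the helper names (`band_limited`, `squiggle`, `sinc_reconstruct`) are all made up for the example, and the reconstruction is a plain truncated sinc sum (Whittaker-Shannon interpolation). It builds a signal from two tones below Nyquist, adds a squiggle at exactly the sample rate that happens to be zero at every sample instant, and shows that both versions produce the same samples while the reconstruction follows only the band-limited one.

```python
import numpy as np

FS = 44_100.0        # sample rate in Hz
NYQUIST = FS / 2     # 22,050 Hz

def band_limited(t):
    # Hypothetical test signal: two tones, both below Nyquist.
    return np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 15_000 * t + 0.3)

def squiggle(t):
    # A wiggle at exactly the sample rate (well above Nyquist).
    # It is zero at every sample instant t = n / FS, so sampling cannot see it.
    return 0.3 * np.sin(2 * np.pi * FS * t)

def sinc_reconstruct(samples, fs, t):
    # Whittaker-Shannon interpolation: sum of sinc pulses, one per sample.
    # np.sinc(x) is the normalized sinc, sin(pi*x) / (pi*x).
    n = np.arange(len(samples))
    return np.sinc(fs * t[:, None] - n[None, :]) @ samples

n = np.arange(4096)
t_samples = n / FS

samples_clean = band_limited(t_samples)
samples_with_squiggle = band_limited(t_samples) + squiggle(t_samples)

# Points a quarter of the way between samples, taken from the middle of the
# record so that truncating the (ideally infinite) sinc sum matters less.
t_between = (np.arange(2000, 2100) + 0.25) / FS

recon = sinc_reconstruct(samples_with_squiggle, FS, t_between)

# How far the reconstruction is from the band-limited original between samples.
max_error = np.max(np.abs(recon - band_limited(t_between)))

# How far the squiggled waveform is from the band-limited one between samples.
squiggle_size = np.max(np.abs(squiggle(t_between)))

print("samples identical:", np.allclose(samples_clean, samples_with_squiggle, atol=1e-9))
print("reconstruction error vs band-limited signal:", max_error)
print("size of squiggle between samples:", squiggle_size)
```

The two sets of samples match to within floating-point noise, so the samples alone can't distinguish the two waveforms. The band limit is what settles it: the squiggle lives above Nyquist, so it is excluded, and the reconstruction lands on the band-limited curve (up to a small error from truncating the sinc sum to a finite number of samples).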