Newbie interested in digital recording of voice on microcontroller and PC

richard7882

New member
Hi. I've been trying to find a forum that specifically focuses on the recording of voice using a microcontroller or PC. I would like to know how a PC or a microcontroller records and plays say a WAV file. Anyhow I came across Homerecording.com and joined to see if I might find someone who has the knowledge I am looking for. I am also trying to find a good book that would help me understand how WAV files are created and played back - I mean in some technical detail, but not too technical. Regards. Rich

P.S. As an example of what I'm trying to learn: In the WAV file there will be a start of the audio data (PCM) at some memory address. How in basic terms does a program know where the start and end of this data is, in memory?
 
I suspect very few, if any of us know the technical information you seek. It's not something we would need to know - I guess in the same way a concert pianist would not know the exact string length for Bb1, or know the chemical composition of the plating on the string. It physically fits, and produces the right tone?

I looked up the answer you seek (I think). I suspect we musicians need to know none of this.

Microsoft WAVE soundfile format

Ignoring the graphics, this is what it says.
The canonical WAVE format starts with the RIFF header:

Offset  Size  Name            Description
0       4     ChunkID         Contains the letters "RIFF" in ASCII form
                              (0x52494646 big-endian form).
4       4     ChunkSize       36 + SubChunk2Size, or more precisely:
                              4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
                              This is the size of the rest of the chunk
                              following this number. This is the size of the
                              entire file in bytes minus 8 bytes for the
                              two fields not included in this count:
                              ChunkID and ChunkSize.
8       4     Format          Contains the letters "WAVE"
                              (0x57415645 big-endian form).

The "WAVE" format consists of two subchunks: "fmt " and "data":
The "fmt " subchunk describes the sound data's format:

Offset  Size  Name            Description
12      4     Subchunk1ID     Contains the letters "fmt "
                              (0x666d7420 big-endian form).
16      4     Subchunk1Size   16 for PCM. This is the size of the
                              rest of the Subchunk which follows this number.
20      2     AudioFormat     PCM = 1 (i.e. Linear quantization)
                              Values other than 1 indicate some
                              form of compression.
22      2     NumChannels     Mono = 1, Stereo = 2, etc.
24      4     SampleRate      8000, 44100, etc.
28      4     ByteRate        == SampleRate * NumChannels * BitsPerSample/8
32      2     BlockAlign      == NumChannels * BitsPerSample/8
                              The number of bytes for one sample including
                              all channels. I wonder what happens when
                              this number isn't an integer?
34      2     BitsPerSample   8 bits = 8, 16 bits = 16, etc.
        2     ExtraParamSize  if PCM, then doesn't exist
        X     ExtraParams     space for extra parameters

The "data" subchunk contains the size of the data and the actual sound:

Offset  Size  Name            Description
36      4     Subchunk2ID     Contains the letters "data"
                              (0x64617461 big-endian form).
40      4     Subchunk2Size   == NumSamples * NumChannels * BitsPerSample/8
                              This is the number of bytes in the data.
                              You can also think of this as the size
                              of the rest of the subchunk following this
                              number.
44      *     Data            The actual sound data.
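
For anyone who wants to see the layout above in action, here is a minimal Python sketch that decodes those 44 bytes. It assumes the canonical layout only (PCM, "fmt " immediately followed by "data") and relies on the fact that the numeric fields in a RIFF file are stored little-endian. The names parse_canonical_header and make_test_wav are made up for this illustration; the test file is generated with Python's standard wave module rather than taken from a real recording.

```python
import io
import struct
import wave

# "<" = little-endian, which is how RIFF stores its numeric fields.
# 4s = 4-byte ASCII tag, I = unsigned 32-bit, H = unsigned 16-bit.
CANONICAL_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")  # 44 bytes in total


def parse_canonical_header(raw):
    """Decode the first 44 bytes of a canonical PCM WAV file into a dict."""
    if len(raw) < CANONICAL_HEADER.size:
        raise ValueError("too short to hold a canonical WAV header")
    (chunk_id, chunk_size, riff_format,
     fmt_id, fmt_size, audio_format, num_channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = CANONICAL_HEADER.unpack_from(raw, 0)
    if chunk_id != b"RIFF" or riff_format != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if fmt_id != b"fmt " or fmt_size != 16 or data_id != b"data":
        raise ValueError("not the canonical 44-byte layout")
    return {
        "chunk_size": chunk_size,
        "audio_format": audio_format,
        "num_channels": num_channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "data_offset": CANONICAL_HEADER.size,  # PCM starts right after the header
        "data_size": data_size,                # PCM ends data_size bytes later
    }


def make_test_wav(num_frames=100, rate=8000):
    """Build a small mono 16-bit WAV in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(bytes(num_frames * 2))  # silence
    return buf.getvalue()


if __name__ == "__main__":
    print(parse_canonical_header(make_test_wav()))
```

The two values that matter for the original question are data_offset and data_size: between them they give the start and the end of the audio relative to the first byte of the file.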

We may not be the right forum for your question really - perhaps you already know this information and it doesn't answer it for you? Sorry - but we're committed users of data, and making sound come out is challenge enough sometimes!
 
Hi. Yes, I've come across the RIFF specs for a WAV file. I even downloaded a RIFF viewer a few hours ago. I do think it is a bit of a long shot that the information I am looking for will turn up here, but you never know. :-) I might make some progress by joining a forum of an IC manufacturer who produces voice recorder chips. Most of these chips seem to be made in China. Not sure if the West makes voice recorder chips. :-) Also, some microcontroller manufacturers offer application notes on voice recording. Rich
 
I shall (of course!) direct you to Sound On Sound | The World's Premier Music Recording Technology Magazine

There are people there that I am sure know everything there is to know about computer sound recording down to the nth digit!

Hugh Robjohns is technical editor and BBC trained. Then Pete Kain is of Scan Computers fame and gives unstintingly of his no doubt very valuable time. There is a contributor "KAFTAT" who has "named and shamed" many an audio interface (AI) manufacturer because he does PROPER tests on round trip latency and pulls no punches. I suspect he has a very deep understanding of the whole process.

I on the other hand just switch the 'kker on and try to remember how I did "that" last week. You can ask me pretty much anything about analogue valves mind!

Dave.
 
The piece it seems you're trying to understand is the connection between the analog input/output hardware and the part of the "computer" that manipulates the audio. That connection has two halves. One is the chip/ASIC (could be more than one, but almost always one these days) that converts analog to digital and back. The other is the set of resources managed by the operating system, chiefly the driver that moves data between system memory and the AD/DA hardware. The application works with the file system and the driver to move data between memory and file storage in the proper order so the audio file is correct, and probably tells the driver either how to process the data or which bits to flip in the ASIC (my guess is that things like 16/24-bit settings are managed in the chip hardware or firmware). Things like WAV vs. any other PCM format are probably handled on the software side, since I don't think there's much difference, while conversion to/from compressed formats would most likely be in the application.

This is simplified and ignores the discussion of how the audio driver knows which bus the chip is on and manages using (e.g.) USB protocols to address and move data to/from the audio chip. Not having written any audio software, but other system software apps and drivers, I have no doubt made a glaring error or two, and, of course, what/how you write these pieces will depend on the hardware and operating system you're targeting.

I'm sure there's open source media player application and driver software that probably has all you'd need, once you had the specific spec of whatever chip it is you're looking at. At least, if you're targeting one of the major OSs that run on x86. If you're into microcontroller programming, you'll have a bit more work ahead.
 
I wonder if anyone can answer this question: Is the start location of the PCM data in a WAV file, recorded in the meta data of the file, or, is it ascertained by complex methods, by the application in conjunction with the OS? I hope this question makes sense. :-) I guess, the basic answer is either yes or no to the first question. Rich

P.S. I think the answer is no, the start of PCM is not recorded in the meta data of the file. That might work if the memory address or otherwise location address for the start remained constant.
 
Just a cursory glance at the post from rob aylestone suggests the PCM data is at a constant offset from the start of the file. The file/WAV header tells how much data there is from that point.

But, note that the format of the data is also dependent on other information in that header, so you have to decode that to know how to "program" the chip in order to have the data converted back to analog properly.
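
To make that concrete, here is a rough Python sketch of how the header fields change the meaning of the very same bytes. It handles only 8-bit and 16-bit PCM; the function name decode_pcm is invented for the example. The conventions it relies on are standard for WAV: 8-bit samples are unsigned (silence sits at 128), 16-bit samples are signed and little-endian, and channels are interleaved within each frame.

```python
import struct


def decode_pcm(data, bits_per_sample, num_channels):
    """Turn raw PCM bytes into a list of frames.

    Each frame is a tuple holding one integer per channel.
    """
    if bits_per_sample == 8:
        code = "B"  # unsigned, 0..255
    elif bits_per_sample == 16:
        code = "h"  # signed, -32768..32767
    else:
        raise ValueError("this sketch handles only 8- and 16-bit PCM")

    # BlockAlign from the header: bytes per frame across all channels.
    block_align = num_channels * bits_per_sample // 8
    frame = struct.Struct("<" + code * num_channels)

    # Ignore a trailing partial frame, if any.
    usable = len(data) - len(data) % block_align
    return [frame.unpack_from(data, offset)
            for offset in range(0, usable, block_align)]


if __name__ == "__main__":
    same_bytes = bytes([0x00, 0x80, 0xFF, 0x7F])
    print(decode_pcm(same_bytes, 8, 1))   # four mono 8-bit samples
    print(decode_pcm(same_bytes, 16, 2))  # one stereo 16-bit frame
```

The same four bytes come out as four samples or as one stereo frame depending on what the header said, which is why a player (or the code driving a DAC) has to read the "fmt " fields before touching the data.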
 
Richard, somewhere I have an AES journal with the original article detailing the Japanese development of PCM digital sound recording (to video tape).

If I can find it I shall post it to you if you want it.

Dave.
 
Most of these chips seem to be made in China. Not sure if the West makes voice recorder chips. :-) Also, some microcontroller manufacturers offer application notes on voice recording. Rich

I know Cirrus Logic is a fabless chipmaker. They design their products and contract out to foundries like TSMC in Taiwan. Analog Devices and Texas Instruments make chips in fabs of their own, including in the US. For the answers you seek, you should be looking for white papers published by these companies. The WAV format was developed by Microsoft and IBM, but I think you already know that. Justin Frankel would be a good resource. Can you find any documentation from him?

Ask him directly....

Ask Justin Frankel
 
Well, I just asked Mr Frankel. I would post the link but I'm not allowed to post links yet. The link is contained in the above post where it says "Ask Justin Frankel". I figure that if I can ascertain whether the header plays any part in an application knowing the starting address (which will be a memory address or an HDD address) of the PCM data that resides in the WAV file, that is sufficient progress at this point.
 
Just a cursory glance at the post from rob aylestone suggests the PCM data is at a constant offset from the start of the file. The file/WAV header tells how much data there is from that point.

Yes, the header is 44 bytes in length. Apparently then, the start of the PCM data in the WAV file, which is the actual audio, is 44 bytes away from the start of the file. I've not read anything yet though that says the header is used to let an application know the HDD address or memory address of the start of the PCM.
 
So, the application typically does not know, or at least care about, the actual CHS of the file's location on disk. It lets the operating system's file management part worry about that. From the file management system, it would open the file for reading or writing, then (e.g.) if reading, use information from the file system (i.e., about the file's size) to transfer the file into memory. A large file would be read in pieces most likely, and a file write would almost always be done in chunks, especially on an operation where the file length is not known beforehand.
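
As a rough illustration of "read in pieces", here is a small Python sketch. The application only ever names a file and asks for the next N bytes; where those bytes sit on the disk never appears in the code. The function names and the 4096-byte piece size are arbitrary choices for the example, and the file is a throwaway temporary file rather than real audio.

```python
import os
import tempfile


def read_in_pieces(path, piece_size=4096):
    """Yield a file's contents piece by piece.

    The operating system resolves where the bytes live on disk; the
    application sees only a stream of bytes counted from offset 0.
    """
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:  # empty result means end of file
                break
            yield piece


def demo():
    """Write a 10000-byte scratch file and report the piece sizes read back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(10000))
        path = tmp.name
    try:
        return [len(piece) for piece in read_in_pieces(path)]
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A player would typically hand each piece to the audio driver as it arrives instead of collecting them, so the whole file never needs to be in memory at once.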

The [logical] memory address of the data would be managed by the program. It obtains a block of memory from the operating system (either dynamically or as part of the program's initial static storage requirements) and so has an address "in hand". It then has to keep track of where in that block it is obtaining bytes from, e.g., to inspect a WAV file header, or to pass a memory location and size to the audio system/driver for transfer to the audio device, or to the file system to fill (a read) or send to a storage device (a write).

Metadata contained as part of any file's content would almost never contain any information about its location in memory or on the file system, though it typically will have data about the data, i.e., metadata, and that may include an offset to the start of other data, e.g., if the header is not fixed in size, or the data itself has content in sections that can vary in length.
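
WAV is a case of exactly that: every chunk carries its own length, so a reader can hop from chunk to chunk until it reaches "data" instead of trusting the number 44. Below is a minimal Python sketch of that walk. find_chunk and build_wav are hypothetical helper names, and the extra "LIST" chunk in the demo is dummy filler, there only to show the offset moving.

```python
import struct


def find_chunk(raw, wanted):
    """Return (payload_offset, payload_size) of the first chunk tagged `wanted`.

    Offsets are counted from the first byte of the file image.
    """
    if raw[0:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first sub-chunk starts after "RIFF", size, "WAVE"
    while pos + 8 <= len(raw):
        tag, size = struct.unpack_from("<4sI", raw, pos)
        if tag == wanted:
            return pos + 8, size
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("chunk not found: %r" % wanted)


def build_wav(pcm, extra=b""):
    """Assemble a mono 16-bit 8 kHz WAV image (assumes even-length pcm),
    optionally with an extra chunk between "fmt " and "data"."""
    fmt = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, 1, 8000, 16000, 2, 16)
    data = struct.pack("<4sI", b"data", len(pcm)) + pcm
    body = b"WAVE" + fmt + extra + data
    return struct.pack("<4sI", b"RIFF", len(body)) + body


if __name__ == "__main__":
    pcm = bytes(10)
    print(find_chunk(build_wav(pcm), b"data"))
    dummy_list = struct.pack("<4sI", b"LIST", 4) + b"INFO"
    print(find_chunk(build_wav(pcm, dummy_list), b"data"))
```

With no extra chunk the walk lands on offset 44, matching the canonical layout; with the 12-byte dummy chunk inserted it lands 12 bytes later. Either way the header gives a position relative to the start of the file, never a disk or memory address.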

This is turning more into a discussion about application programming than audio per se.
 
The WAV file is going to start at whatever address the operating system places it at in RAM when it pulls the file off the hard drive/internet/etc. The header is going to start at address 0000 of the file. Like you said, the header is 44 bytes long, then the actual data. Does the operating system place another header in front of the WAV file header? Not inside the file itself: whatever the file system keeps for indexing or file management (name, size, location on disk) is stored separately, so it adds no length to the header.


Agreed with Keith; your questions are probably beyond the scope of this forum. Though we might have programmers among the membership, we are more about the actual recording of music rather than the mechanics behind the audio files. I hope you find your answers and if you do, report back here. We are a curious bunch and would gain from the background knowledge.
 
In answer to the question "Does the header in a WAV file (also a RIFF file) play any part in allowing an audio application to know the starting address of the PCM data that resides in the file?" Justin Frankel said "Yes" and offered this link: Microsoft WAVE soundfile format

When one considers the matter, one thinks one might be able to substitute "audio application" with "Operating System". I guess.

Just one more thing: when I started thinking about the issue, the context was a microcomputer and a WAV file stored in PROM. In that situation the WAV file is read directly from memory that cannot be written to, and the memory addresses are static, which may make for a less complicated situation.
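
For that PROM case the arithmetic is short enough to sketch. The Python below fakes a memory image with a bytes object; on a real part the same sums would be done in C or assembler against the actual memory map. WAV_BASE, make_prom_image and pcm_address_range are all invented for the illustration, and the file is assumed to use the canonical 44-byte layout.

```python
import struct

WAV_BASE = 0x0100  # hypothetical address where the WAV image was burned


def make_prom_image(pcm):
    """Fake PROM contents: erased bytes (0xFF), then a mono 8-bit 8 kHz WAV."""
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",
        b"fmt ", 16, 1, 1, 8000, 8000, 1, 8,
        b"data", len(pcm))
    return b"\xff" * WAV_BASE + header + pcm + b"\xff" * 16


def pcm_address_range(memory, wav_base):
    """Return (first_address, one_past_last_address) of the PCM data."""
    # In the canonical layout the "data" tag and its size sit at offset 36.
    tag, size = struct.unpack_from("<4sI", memory, wav_base + 36)
    if tag != b"data":
        raise ValueError("not the canonical 44-byte layout")
    start = wav_base + 44       # fixed base address + fixed header length
    return start, start + size  # Subchunk2Size marks where the audio ends


if __name__ == "__main__":
    prom = make_prom_image(bytes([128] * 32))
    first, end = pcm_address_range(prom, WAV_BASE)
    print(hex(first), hex(end))
```

So the start address is the known base plus the header length, and the end address is the start plus Subchunk2Size read out of the header. The playback loop then steps from the first address to the last at the sample rate, feeding each value to the DAC.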

But, anyway, I'm happy to consider I've got some help here. I feel I'm on the road, as it were. Rich
 
Cool. Glad to hear you have your project moving forward. :) Also glad to hear someone like Justin Frankel is approachable for Q's and A's.

Best wishes.
 
...
When one considers the matter, one thinks one might be able to substitute "audio application" with "Operating System". I guess.
...
In almost any sense, no, those are not equivalent. Now, if all you want to do is create a simple device that renders the same audio output and does nothing else, you *could* extend the operating system to perform that task on receiving a specific interrupt, e.g., from a button press, motion sensor, etc. But you are describing something more like an embedded system than an application.

There are good reasons why application code is kept out of the operating system. It is very difficult to debug your application if a fault in it causes the system to halt.
 
Hi. What I actually meant is that in the sentence that I wrote to Justin Frankel I could have said "Operating System" rather than "application". Because I was thinking perhaps it might be more the case that it's the Operating System knowing where the start of the PCM data is, rather than the application.
 
And I can assure you that's not the case on something like Windows, OS X, Linux, whatever.

So, the operating system is a kind of application, using the latter term in the broadest sense, but (simplistic description) its purpose is running other applications and managing attached internal and external resources. Except for the case of application loading, where the organization of data vs. instructions within the executable object is of interest, the operating system doesn't care what's in a file or block of memory, only "who" owns it, who can access it and at what privilege, whether it's readable, writable, etc. I.e., your statement is wrong.

I'm going to suggest you take a course in computer architecture, as well as application programming.
 
Please note I said: "Because I was thinking perhaps it might be more the case that..." I don't want words putting in my mouth. I suggest you take note of that. I wasn't trying to say this or that IS the case. Obvious I think to most. How could I have been so definitive since I don't know much about the subject? I couldn't.
 