Voice activation

  • Thread starter: IanKay

IanKay

New member
Hello :)

I was wondering if someone could help me. I am developing some voice-activated software. I want to set up a mic that can hear me anywhere in a room (typical house-sized rooms).

My problem is that I know nothing about mics. I'm guessing it would need some sort of background noise cancellation, but what sort of attributes should I be looking for? I have seen terms like "frequency range" and "output impedance", but I don't know what they mean or what values would be good or bad for what I'm trying to accomplish.

If anyone could offer some advice I would be very grateful!

Thanks :)
 
You're going to run into some problems if you want 'background noise cancellation' and for it to pick you up anywhere in the room. They're kinda contradictory!

If I wanted something to be able to hear me talk anywhere in the room, I would set up an omni (omnidirectional mic), then compress the signal fairly heavily to even out the level differences caused by how loudly I'm speaking and how far I am from the mic. However, this setup would pick up a lot of room noise, which the compression would accentuate further.
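As a rough illustration of the omni-plus-heavy-compression idea, here is a minimal sketch of a static compressor curve in Python. The threshold, ratio and makeup gain are hypothetical values chosen for the example, there is no attack/release behaviour, and levels are treated as already-measured dBFS numbers rather than real audio.

```python
def compress_db(level_db, threshold_db=-40.0, ratio=8.0, makeup_db=20.0):
    """Static (no attack/release) compressor curve in dB.

    Above the threshold, every `ratio` dB of input becomes 1 dB of output;
    makeup gain is then added to everything, including quiet room noise.
    All default values are illustrative assumptions.
    """
    if level_db > threshold_db:
        out_db = threshold_db + (level_db - threshold_db) / ratio
    else:
        out_db = level_db
    return out_db + makeup_db


# Hypothetical input levels in dBFS
near_voice, far_voice, room_noise = -10.0, -34.0, -60.0
for name, level in [("near voice", near_voice), ("far voice", far_voice), ("room noise", room_noise)]:
    print(f"{name}: {level:6.2f} dBFS -> {compress_db(level):6.2f} dBFS")
```

With these numbers the 24 dB gap between the near and far voice shrinks to 3 dB (-16.25 vs -19.25 dBFS), which is the level-evening effect described above. But the room noise comes up by 20 dB, and the gap between the far voice and the noise shrinks from 26 dB to about 21 dB.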

It depends on how tolerant the software you're developing is and how much room noise you can afford to have in the signal being fed into it. I've used this method with a really heavy compressor as a talkback mic before (it lets you be anywhere in the room and whisper, shout, etc.), and it works great with a human listening at the other end, but it might not be suitable for your situation. In fact, that's how the big drum sound on "In the Air Tonight" (Phil Collins) was achieved, but that's drifting off topic a bit (quite a good story, though)!

To reduce the amount of 'room noise' you get, you would need to do two things...
a) Use a mic with a tight polar pattern (e.g. cardioid or hyper-cardioid) and talk on-axis
b) Move closer to the mic

...however both these things prevent you from being anywhere in the room.
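To put rough numbers on a) and b), the sketch below uses the textbook idealized first-order polar patterns, s(theta) = a + (1 - a) * cos(theta), with a = 1 for omni, 0.5 for cardioid and 0.25 for hypercardioid, plus the free-field inverse-square law for distance. Real mics deviate from these curves with frequency, and in a real room the reverberant sound does not fall off with distance the way the direct sound does, so treat the figures as idealized. The function names are made up for this example.

```python
import math

# Idealized first-order polar patterns: s(theta) = a + (1 - a) * cos(theta)
PATTERN_A = {"omni": 1.0, "cardioid": 0.5, "hypercardioid": 0.25}


def pattern_gain_db(pattern, angle_deg):
    """Relative sensitivity (dB re on-axis) of an ideal first-order mic at angle_deg off-axis."""
    a = PATTERN_A[pattern]
    s = a + (1.0 - a) * math.cos(math.radians(angle_deg))
    # A perfect null (s == 0) would be -inf dB; clamp to -120 dB for printing
    return 20.0 * math.log10(max(abs(s), 1e-6))


def distance_gain_db(old_distance_m, new_distance_m):
    """Change in direct-sound level (free field, inverse-square law) when the talker moves."""
    return 20.0 * math.log10(old_distance_m / new_distance_m)


for pattern in PATTERN_A:
    side = pattern_gain_db(pattern, 90)
    rear = pattern_gain_db(pattern, 180)
    print(f"{pattern:>13}: 90 deg {side:7.1f} dB, 180 deg {rear:7.1f} dB")

print(f"moving from 3 m to 0.5 m: {distance_gain_db(3.0, 0.5):+.1f} dB of direct sound")
```

The cardioid and hypercardioid only reject sound arriving off-axis, which is why you have to talk on-axis; moving closer raises your direct sound while the diffuse room sound stays roughly the same, which is why b) works.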


The only other solution I can think of that would allow you to move around the room and have very little room noise would be a wireless lavalier mic. That could get costly though.
 
You don't want compression, as that will make it more difficult to discriminate between room noise and voice. You are looking either for a peak level that would be triggered by your voice (presuming your voice would always be louder than ambient noise), or you would need to incorporate frequency discrimination in your DSP routine.

That could be as simple as a high-pass filter, since much ambient noise is low-frequency. But if you need to discriminate between sliding a chair across the floor and your voice, then you would need a fairly sophisticated routine to look for the sonic signature of your voice. A couple of characteristics common to voices and not common to random noises are a harmonic series and a characteristic envelope: noises from dropping things are sudden; voices are sustained.

You may need to combine many of those techniques. For example, perform an FFT on a window, ignore frequencies below 100 Hz (and probably above 5 kHz, although I doubt they are significant), pick out the strongest frequency peaks and compare them to the next couple of windows. If the peak frequencies are still within a reasonable tolerance of the first detected peak, then it's likely the source is a voice (or an instrument!) and not ambient noise. Although if a sliding chair has a resonance, all bets are off.

A final possibility is to train the DSP on the user's voice, looking for their specific overtone series. That should limit false positives from other resonant sources.
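The window-and-compare procedure described here can be sketched in Python as a toy. Assumptions: a 16 kHz sample rate, 512-sample windows and a 40 Hz tolerance are picked for the example; a naive per-bin DFT stands in for a real FFT so nothing beyond the standard library is needed; only the single strongest peak is tracked rather than a set of peaks; and `peak_frequency`, `looks_sustained` and `tone` are hypothetical helper names.

```python
import cmath
import math

SAMPLE_RATE = 16000   # assumed sample rate in Hz
WINDOW = 512          # assumed window length in samples (bin width 31.25 Hz)
LOW_HZ, HIGH_HZ = 100.0, 5000.0  # band suggested in the post


def peak_frequency(window):
    """Frequency (Hz) of the strongest spectral bin between LOW_HZ and HIGH_HZ."""
    n = len(window)
    # Hann taper to reduce leakage from strong out-of-band noise (e.g. low rumble)
    tapered = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(window)]
    lo = math.ceil(LOW_HZ * n / SAMPLE_RATE)
    hi = math.floor(HIGH_HZ * n / SAMPLE_RATE)
    best_bin, best_mag = lo, -1.0
    for k in range(lo, hi + 1):
        # Naive single-bin DFT; real code would use an FFT library instead
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(tapered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * SAMPLE_RATE / n


def looks_sustained(samples, n_windows=3, tolerance_hz=40.0):
    """True if the strongest in-band peak stays put across n_windows consecutive windows."""
    if len(samples) < n_windows * WINDOW:
        raise ValueError("need at least n_windows * WINDOW samples")
    peaks = [peak_frequency(samples[i * WINDOW:(i + 1) * WINDOW]) for i in range(n_windows)]
    return all(abs(p - peaks[0]) <= tolerance_hz for p in peaks[1:])


def tone(freqs_amps, n):
    """Hypothetical test signal: sum of sine waves given as (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f, a in freqs_amps)
            for t in range(n)]


# A "voice-like" harmonic series (250 Hz fundamental) under loud 50 Hz rumble
voice = tone([(50, 4.0), (250, 1.0), (500, 0.5), (750, 0.25)], 3 * WINDOW)
# Three unrelated bursts, each at a different pitch
bursts = tone([(250, 1.0)], WINDOW) + tone([(1000, 1.0)], WINDOW) + tone([(2000, 1.0)], WINDOW)
print(looks_sustained(voice), looks_sustained(bursts))
```

Band-limiting to 100 Hz to 5 kHz is what stops the loud 50 Hz rumble from winning, and the window-to-window comparison is what rejects the pitch-hopping bursts. As noted above, a resonant chair squeak with a steady pitch would still fool it.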

As you can see, this is mostly a DSP problem and not so much a microphone problem.
 
Maybe use the compressor "backwards", as an expander, along with a noise gate so that only sounds over a certain threshold pop through.
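For reference, here is a minimal Python sketch of the two processes suggested here: a static downward-expander curve and a crude block-based noise gate. The threshold, ratio, floor and block size are hypothetical values, and the block-based gate is a simplification of how hardware or plugin gates behave.

```python
import math


def expand_db(level_db, threshold_db=-45.0, ratio=4.0, floor_db=-100.0):
    """Static downward expander curve (the compressor 'run backwards').

    Above the threshold the level passes unchanged; below it, every 1 dB
    drop in input becomes `ratio` dB of drop in output. A gate is the
    limiting case of a very large ratio. Defaults are illustrative only.
    """
    if level_db >= threshold_db:
        return level_db
    return max(threshold_db + (level_db - threshold_db) * ratio, floor_db)


def noise_gate(samples, threshold_rms, block=256):
    """Crude block-based gate: keep blocks whose RMS reaches the threshold, zero the rest.

    Real gates add attack, hold and release times to avoid chopping words.
    """
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        out.extend(chunk if rms >= threshold_rms else [0.0] * len(chunk))
    return out
```

Both act only on level, so they help only when your voice is reliably louder than the room noise.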

I thought computers were supposed to make all of this easy?
 
Expanders still don't help. Either the peak signal is voice or it isn't. If it is voice, then you don't need any processing to detect it, just a simple peak indicator. If it is not necessarily voice, making it louder or softer doesn't help.
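The "simple peak indicator" can be sketched in a few lines of Python. The block size and the -20 dBFS trigger threshold are hypothetical, and the sketch assumes, as the post does, that voice peaks are always louder than the ambient noise. The second print shows the simplest case of the gain point: scaling the whole signal moves every block by the same amount (about +6 dB for a gain of 2), so the voice-to-noise difference stays the same.

```python
import math


def block_peaks_dbfs(samples, block=256):
    """Peak level of each block in dBFS (1.0 = full scale), like a simple peak meter."""
    peaks = []
    for start in range(0, len(samples), block):
        peak = max(abs(x) for x in samples[start:start + block])
        peaks.append(20.0 * math.log10(peak) if peak > 0 else float("-inf"))
    return peaks


def triggered_blocks(samples, threshold_dbfs=-20.0, block=256):
    """Indices of blocks whose peak reaches the threshold (i.e. 'voice detected')."""
    return [i for i, p in enumerate(block_peaks_dbfs(samples, block)) if p >= threshold_dbfs]


# Hypothetical input: quiet room noise, a loud word, then quiet noise again
signal = [0.01] * 256 + [0.5] * 256 + [0.02] * 256
print(triggered_blocks(signal))

# Applying gain shifts every block by the same number of dB,
# so the gap between voice and noise is unchanged
louder = [2.0 * x for x in signal]
print([round(b - a, 2) for a, b in zip(block_peaks_dbfs(signal), block_peaks_dbfs(louder))])
```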

Computers do make this very easy insofar as you don't need any analog processing (compressors/expanders are relatively expensive circuits to execute in analogland). But you do need someone who knows how to write DSP routines.
 