Audio with Video: How much temporal adjustment for distance is too much

Shadow_7
I do more event-style stuff with video (or try to anyway), and I seem to be ultra sensitive to sync issues. I'm rarely closer than 5 yards to my visual reference with the camcorder and mics, and even though I've been syncing the external audio to the audio that the camcorder recorded to a T (within 1 to 5 samples at 48,000 samples per second), I find myself wondering about distance and the speed of sound, and whether I should compensate for that. I tend to think out loud and crunch numbers a lot (if you haven't noticed).

So 29.97 fps (30000/1001) video with 48kHz audio comes out to about 1601.6 audio samples (of 48000 per second) per frame of video.

(30000/1001) = frame rate

(48000/(30000/1001)) = samples per frame (i.e. 1601.6)
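Or in quick script form, just to sanity-check the arithmetic:

Code:
# Sanity check of the samples-per-frame figure above.
FRAME_RATE = 30000 / 1001          # NTSC frame rate, ~29.97 fps
SAMPLE_RATE = 48000                # audio samples per second

samples_per_frame = SAMPLE_RATE / FRAME_RATE
print(f"{samples_per_frame:.1f} audio samples per video frame")   # ~1601.6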

1100 feet per second (+/- 100 feet per second) = speed of sound

So if I'm, say, 5 yards (15 feet) from the subject, and the mics are in (relatively) the same location as the camcorder, should I also compensate for the 15/1100 time differential, i.e. the speed of sound?

15/1100 = 0.0136.... of a second between source and mic.

0.01363636*48000 = 655 samples (rounded up).

655/48000 * 1000 ≈ 13.6 milliseconds (almost 14 ms after the visual reference)
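Scripted, that's roughly this (same 1100 ft/s figure as above):

Code:
SPEED_OF_SOUND_FT_S = 1100   # rough figure, as above
SAMPLE_RATE = 48000

def acoustic_lag(distance_ft):
    """Lag between the visual event and the sound reaching a mic distance_ft away."""
    seconds = distance_ft / SPEED_OF_SOUND_FT_S
    return seconds * 1000, round(seconds * SAMPLE_RATE)   # (milliseconds, samples)

ms, samples = acoustic_lag(15)   # 5 yards
print(f"{ms:.1f} ms, {samples} samples")   # ~13.6 ms, ~655 samples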

Or roughly 2/5ths of a frame, i.e. more than 1/3rd of a frame. For some reason I seem sensitive to this. In the real world we're used to hearing sound delayed from its source while we watch that source, but in video, where you can zoom in up close even though you're far away, I find it annoying to be that much out of sync. I don't want the audio to be early (our brains yell and scream and can't compensate for that), but I can tell that it's noticeably off, and yet damn close. I can tell even when it's not a full frame (or more) off, and otherwise matches what the camcorder recorded.

Now I don't go around speeding up and slowing down the audio based on the distance to the source and the perceived zoom factor in the middle of a video. But should I or should I not be taking the comfortable average thereof? I don't have or use (currently) a wide-angle lens, so perspective-wise the camcorder is already a few feet in front of its own mics, even with the zoom all the way out.

So, should I compensate, or just watch more kung fu theater? Or are there other things in play that are throwing my perceptions off, like monitor latency (LCD)? Should I also compensate for that? It's most noticeable when viewing 60p video of something with drums in it. And my perceptions seem to differ based on how rested (or not) I am, even when the source material hasn't changed. Or maybe my camcorder is a bit off / odd. I always seem to have at least one extra video frame relative to the duration of the audio track.

I am by no means a full frame out of sync, and I'm probably overthinking it. I just have the one pair of mics, so I don't have an on-stage or lapel mic to sync with. But if my mics were in that other place, and my camcorder had audio inputs, I would imagine that this temporal adjustment would be compensated for at the speed of electrons / light. I'm just having to take a more manual approach. Or not.

I record audio externally, so I have the content to pick and choose from, and I can even add to the tail end to match the number of frames. And I'm already lining up and extracting that audio, so outside of a little math, it's not really more work per se for my current flow.
 
You're overthinking it.

You can't get audio in closer sync to video than within one frame. That's a 25th of a second in PAL/50 Hz countries and a 30th of a second in NTSC/60 Hz countries. It takes that long to build a frame so working down to individual samples is pointless.

Most people can't detect video as out of sync in less than 2 frames--the best might detect one frame out.

Now, audio travels at roughly 1100 feet per second. When setting up delays for live events, I use 1ms per foot of distance as a "back of the envelope" approximation. Since one frame of video in NTSC is 33.3ms, that equates to a bit over 30 feet of distance being one frame of delay, and 60+ feet being 2 frames of delay.

Usually, if you're 30 feet or more away, you can't see enough detail to be worried about exact sync anyway but, if you have to (a long telephoto shot or similar), just slip the sound by one frame/two fields for every 30 feet of distance. Any more accuracy is wasted because of the duration of each individual frame.

...and, whatever you do, don't stare at it trying to detect sync problems. The brain plays tricks and, as soon as you're looking for problems, you'll see them.

Hope this helps,

Bob
(Who worked in broadcast TV sound for 35+ years prior to retirement.)
 
I was thinking the 1ft/ms thing also.... one frame = 30ft is a cool rule of thumb to have. Thanks Bob!!
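In script form, that rule of thumb is basically just rounding to the nearest frame. A quick sketch with the same 1 ms/ft and NTSC frame duration:

Code:
MS_PER_FOOT = 1.0                      # ~1 ms of sound travel per foot
MS_PER_FRAME = 1000 / (30000 / 1001)   # ~33.4 ms per NTSC frame

def frames_to_slip(distance_ft):
    """Whole frames of audio slip for a given mic-to-subject distance."""
    return round(distance_ft * MS_PER_FOOT / MS_PER_FRAME)

for d in (15, 30, 60, 100):
    print(d, "ft ->", frames_to_slip(d), "frame(s)")   # 0, 1, 2, 3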

One other aspect that you might consider is realtime rendering vs. offline rendering. You didn't mention which program you're editing with, or if you're even digital (I'm assuming you are). If you're watching a realtime preview, like in Vegas, you might see a sync issue in the preview that isn't there in the offline render. This would be due to your CPU trying to chug through everything in realtime: edits, effects, etc. You said you're recording externally, so I'm guessing you're importing audio. Be sure your waveforms are lined up and let it go; when you render offline, the audio should be synced up.

That's been my experience, fwiw, ymmv, etc...
 
30' is only 10 yards though. I can detect it, but I'm looking for it. And for music things, particularly drums, if you're looking for it, it's very evident. 32nd notes at 120bpm work out to 16 notes a second, roughly one note every couple of frames at 30fps. Kind of obvious to me that the drummer stopped playing a note or two ago (a frame or two). More so at 60p, where you can actually see most of the played notes.

Basically, at 48kHz and roughly 1100 ft/s:

3.64 samples per inch
43.64 samples per foot
130.9 samples per yard
1,309.1 samples per ten yards
At 20 yards you're well over a frame off in terms of sync: 2,618-ish samples, or about 1.6 frames.
At 73-ish feet, you're two frames off.
At about 110 feet (36.7 yards) you're three frames off. That seems noticeable to me. Certainly enough to warrant an adjustment if you're at full zoom, for those of us who record in stadium-esque venues from afar. Assuming something musical, where the prolonged nature of the out-of-sync-ness will gnaw at you.
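Scripted, the same table (same 1100 ft/s and 48kHz figures):

Code:
SPEED_OF_SOUND_FT_S = 1100
SAMPLE_RATE = 48000
SAMPLES_PER_FRAME = SAMPLE_RATE / (30000 / 1001)   # ~1601.6

samples_per_foot = SAMPLE_RATE / SPEED_OF_SOUND_FT_S   # ~43.64
samples_per_inch = samples_per_foot / 12               # ~3.64
samples_per_yard = samples_per_foot * 3                # ~130.9

for yards in (10, 20, 24.3, 36.7):                     # 24.3 yd ~ 73 ft, 36.7 yd ~ 110 ft
    samples = yards * samples_per_yard
    print(f"{yards} yd: {samples:.0f} samples = {samples / SAMPLES_PER_FRAME:.1f} frames")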
 
Wow dude - this is a very interesting thread - and I think you are a little crazy - but that is a good thing, because the creative people who do the best work are always a little crazy :)

Is this all musical instruments that you are recording? For music videos? You mention the drums, which is why I wonder if it's all instruments.

I am guessing you are recording with a mixer and mics and syncing that up to the video? Are you using the video's internal audio to sync the mix to? It seems like the distance factor would create different latency in the video mic and the mix.

I think it would be an easier solution to use some creative editing than to try to sync every single aspect of the performance (unless you are creating a technical instructional video for the instrument). For entertainment, I would think a lot of editing would more than make up for it, especially since most viewers are not going to be thinking about these things :)
 
I'm thinking that any syncing problem would go unnoticed by all but a tiny number of your viewing audience. It is a degree of perfection that might ultimately satisfy you, but has no appreciable impact.

So rather than worry about it, I would concentrate on the artistic aspects of the video.
 
In my case, the camcorder has no audio input, and the mics are at roughly the same location as the camcorder. Audio goes into a field recorder that I edit in post. Really, all I'm doing is finding a sync point and trimming the long audio segment to match the shorter video segment. I only have one camcorder, so nothing too exotic just yet. I'm having to find and edit the sync point anyway, so crazy or not, I'm just adjusting the offset a smidge, which I'm adjusting anyway to line up the field recorder audio to the camcorder audio. It's mostly scripted to speed up the field recorder audio (1.000117788) to match the camcorder's speed, plus other resampling tricks I've learned. So all that is pretty much a done deal and mostly automated. I just need to extract the audio segment, and sometimes the lag (physics) is just too much to meet my goals.
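The ratio itself just falls out of measuring the same span on both devices. Something like this, in rough script form (the durations here are made-up placeholders; only the 1.000117788 figure is from my actual project):

Code:
# Measure the span between two sync points on both devices, then speed up
# the field recorder audio by the ratio so the lengths match.
# These durations are made-up placeholders, not real measurements.
camcorder_span_s = 3600.000      # span between two sync points per the camcorder
recorder_span_s  = 3599.576      # the same span per the field recorder

speed_ratio = camcorder_span_s / recorder_span_s
print(f"speed ratio: {speed_ratio:.9f}")   # ~1.0001178 for these placeholder numbers

# e.g. handed to sox's speed effect (assuming sox is in the chain):
#   sox field.wav matched.wav speed 1.000117788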

It is mostly music, aside from a few speakers (a pastor) who appear to have a side gig as a ventriloquist. It gets weird to see the lips move a noticeable duration before hearing the audio. I do some marching-band-like stuff, so the distance and therefore the lag can be quite dramatic. Not quite 300 yards away most of the time, but sometimes it can come off that way. If my camcorder had audio inputs and I did audio on a boom arm, this lag would already be mostly compensated for by the location of the mic. But my mics are in the same location as the camcorder, and otherwise out in the audience in the middle to rear of the venue.
 
A couple of things:

First, the reason I say getting down to the sample is over-thinking it is that your video isn't "continuous" like human vision. It's a series of freeze frames with a 1/30th of a second duration. For this reason, corrections more accurate than one frame (33ms) are meaningless.

Second, it concerns me that you're running the field recorder faster to compensate. If all was working properly, the delay between the picture and sound would be consistent as long as the distance to the subject remains the same. If both camera and sound recorder were running at a perfectly accurate speed all you should need to do is find a sync point and lock them together. The fact that you have to run the sound recorder slightly fast indicates that there is some drift in the system--and you'll be very lucky if this is consistent. I suspect you're compensating more for inaccuracies in the speed of the camcorder, field recorder or both rather than the distance.

In the professional world, there are specific ways to ensure that the camera and sound recorders work to perfect sync--"crystal sync", time code, pilot tones, etc. but these are more difficult to emulate with consumer level gear.

Bob
 
Well, there is no word clock on my gear. Or ADAT, or whatever the current option(s) are. Kind of hard to call them standards. We'd have to call 8-tracks standard, and from a certain POV, I guess they "WERE".

While the video is not continuous or "complete", I do record in 60p, so that's double the information, or half the "stutter" of typical video. Granted, I have to render out to 30p for most deliverables, if only to save on file sizes. But I do have some say in which 30p (odd or even frames) to choose from.

Most of my syncing is for peace of mind, and for repeatability in editing / experimentation. If it's out of sync, even mildly, that severely impacts the repeatability of doing the same thing over and over (with different settings). The speed adjustment is to limit the drift over time between two independent devices. Audio is captured at 24/192 and resampled to 48kHz for the deliverable, so there is some flex in there IMO. It's still not an ideal edit, but a possible one without much in the way of noticeable degradation. That shift of sorts is less than one second per HOUR of content; closer to 1/5th of a second per hour to be more thorough, or 6 frames at 30p. Noticeable IMO: say, a trumpet player having his horn "down" and yet still playing at the end of an hour-long concert. A movement which can be done in 6 frames or less.

My edit for now is something like this on my current project.

= 00:14:01.26025
(the offset in the audio that matches the 00:00:00.00000 of the video audio)

= 00:14:45.26025
(adding 44 seconds because that's where we are extracting the video from / start point)

+ 0.00290 (alignment)
(Frames rarely land exactly on a full second, so this is the handful of samples from the full second to the next frame boundary)

+ 0.00600 (distance)
(approximate shift based on distance from the camcorder and the speed of sound)

- 0.00288 (LCD latency of 6ms)
(my gear lies to me, often, so at least help it look like it's telling the truth)
(chances are good that someone else / everyone else has the same or similar gear)

where .##### is the number of samples (/48000 for the fraction of a second).

So.....

What WAS an offset of 00:14:45.26025
Will likely be extracted from offset 00:14:45.26627

It's just 602 samples (best guess), but that's 602/1601.6 of a full frame, in the proximity of 40% of the way to the next frame. Significant IMO. More so if a CRT is used, without the 6ms latency factor: i.e. past halfway, or a full frame off at 60p. Yeah, most ordinary folks won't be looking for it and "MIGHT" not notice it. But there is a certain pop factor when you nail it IMO. Like being in tune, as opposed to NOT in tune.
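In script form, it's just sample bookkeeping. A sketch using the numbers from this project:

Code:
SAMPLE_RATE = 48000

# All values in samples at 48 kHz (the .##### portion of the timecodes above).
base_offset  = 26025    # field recorder offset that matches the video's 00:00:00
alignment    =   290    # nudge to the next frame boundary past the full second
distance     =   600    # speed-of-sound lag for the camcorder-to-subject distance
lcd_latency  =  -288    # pull back 6 ms for the monitor's display lag

extract_offset = base_offset + alignment + distance + lcd_latency
print(extract_offset, "samples =", round(extract_offset / SAMPLE_RATE, 5), "s")
# 26627 samples ~ 0.55473 s, i.e. extract from 00:14:45.26627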
 
did you stop to think that at 30' away
there is naturally a 30ms delay in sound (1ft/ms)
and you are >removing< the reality from the film?
 
You may also want to consider the distance the viewer will be sitting from the screen they are watching.

If they are sitting at a computer monitor it should not be a big deal, but if they are in their living room and the couch is, say, 9ft (about 3 meters) from the television and speakers, then there is going to be an additional slight delay (roughly 9 ms at 1 ms per foot, on the order of your recorded delay) which compounds with whatever delay is already baked into the video.

- you may need to have the audio actually be ahead of the video for those people. But conversely, if they have placed their speakers behind the couch, then there will already be a slight inversion (hardly noticeable, since the picture effectively arrives at the speed of light, UNLESS producers are already syncing for an average viewing distance, in which case those viewers would actually hear the sound before they saw the source) - hopefully these people have adjusted their audio systems to compensate for that already.

Just something to think about...
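In rough numbers (a sketch using the same 1 ms per foot figure; the recorded offsets are just examples):

Code:
MS_PER_FOOT = 1.0    # ~1 ms of sound travel per foot

def perceived_offset_ms(recorded_offset_ms, viewing_distance_ft):
    """What the viewer hears: the offset baked into the video plus their own room."""
    return recorded_offset_ms + viewing_distance_ft * MS_PER_FOOT

print(perceived_offset_ms(14, 9))   # ~23 ms for the 5-yard / 9 ft couch example
print(perceived_offset_ms(30, 9))   # ~39 ms if 30 ft of delay was left in the recording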
 
did you stop to think that at 30' away
there is naturally a 30ms delay in sound (1ft/ms)
and you are >removing< the reality from the film?

Except that when you zoom, you don't "look" 30' away anymore. And in a noisy environment where you might be inclined to read the lips, it helps if the audio is mostly there and in near-perfect sync. Enough that the full head shot at full zoom and the delay are at odds with each other, IMO. The reality factor is that I have to sync to the closest (on video) sound source with a visual cue that is sync-able. Most of the time that's the guy on the stage, not the circus of an audience within 10'+ of me. Like I said, I'm not really doing anything that wouldn't already be done if the mic was on a boom over the subject and my camcorder had an audio input, i.e. what Hollyweird has already trained us to expect.

- 200 samples (wind)

Oh well. It looks like I'll have to invest in a small weather station or something.
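If I ever do get that weather station, the temperature correction is at least simple math. A sketch using the standard dry-air approximation (wind and humidity ignored here):

Code:
SAMPLE_RATE = 48000

def speed_of_sound_ft_s(temp_f):
    """Approximate speed of sound in dry air, converted to feet per second."""
    temp_c = (temp_f - 32) * 5 / 9
    meters_per_s = 331.3 + 0.606 * temp_c   # common dry-air approximation
    return meters_per_s * 3.28084

def lag_samples(distance_ft, temp_f):
    return round(distance_ft / speed_of_sound_ft_s(temp_f) * SAMPLE_RATE)

print(lag_samples(15, 95))   # ~623 samples on a hot day
print(lag_samples(15, 32))   # ~662 samples at freezing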
 
The thing is, working to the sample is overkill when the video frames come in 33ms chunks (16.7ms apiece at 60p). Just round off your correction to the closest frame.

Of course, you'd improve your product greatly if you ran mics up by the pulpit/stage anyway... or tied into a PA mixer or used radio microphones. Your sound, recorded from the distances you're talking about, must be pretty poor anyway. If your mic is really 60 or 100 feet away from the sound source, lip sync is the least of your issues.

...and, the drift in your external recorder worries me far more than the delay due to distance.

Bob
 
Why would that drift worry you? It's expected, even normal. We are talking about a pro field recorder and a consumer camcorder with NO syncing mechanisms. And the drift in question is less than a second over the course of an hour. But it's enough to make clips longer than 10 minutes with, say, drums stand out as being out of sync, even if they started in sync. Unless you compensate for said drift.

The distance is not normally an issue (much), given that the sources I record can/do exceed 100dB at a distance. In proximity you'd have all sorts of timing issues (100+ members) that are otherwise solved by being further away, barring a few odd gigs. I call it a three-to-one ratio, not to be confused with the more widely used 3-to-1 ratio for getting mono-ish tracks from multiple sources in proximity. My version is being 3 times further away from the group than the group (or stage) is wide.

Of course I could just be compensating for other quirks of the camcorder, or of the editing software and its results. I'd really need to run some tests to isolate what percentage is distance (physics) and what percentage might be a software quirk that drops or adds a few hundred samples at the ends of the tracks. I already know that the audio track on the camcorder comes up over a frame short in a lot of cases, and that shortfall appears to be mostly missing from the back end of the clips. But it's possible that the audio starts a few samples early (or late), since it ends a significant number of samples early. I'm just guessing without some tests that measure multiple aspects: i.e. a football field (with accurate lines), the camcorder over one line, and clicks at each yard line to give a sync reference that can be used to calculate the true speed of sound (relative to the devices' clocks) and any oddities at the start of the recording.
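The analysis side of that test could be as simple as a straight-line fit: the slope gives the effective speed of sound per the devices' own clocks, and the intercept exposes any fixed start-of-recording offset. A sketch (the delay values below are invented placeholders, not real measurements):

Code:
# Fit measured click delay vs. distance: slope = 1/speed of sound,
# intercept = any fixed offset introduced at the start of the recording.
# These delay values are invented placeholders, not real measurements.
import numpy as np

distance_ft = np.array([0.0, 30.0, 60.0, 90.0, 120.0])
delay_s     = np.array([0.002, 0.029, 0.056, 0.084, 0.111])   # click offsets in the audio

slope, intercept = np.polyfit(distance_ft, delay_s, 1)
print(f"effective speed of sound: {1 / slope:.0f} ft/s")              # ~1100 ft/s here
print(f"fixed offset at zero distance: {intercept * 1000:.1f} ms")    # ~2 ms here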
 
The reason I say the drift worries me is that it's just that, drift rather than an offset.

The distance thing is easy--if you're 60 feet away from the sound source you just move the sound once by 60ms and leave it. If there's a 2 field coding delay in the camera, you can fix that once.

However, drift is something you have to be constantly playing with and, barring something easy to judge by, you have to be constantly guessing. With lip sync, it's very easy to start "seeing things". It's the one thing on your list of considerations that is difficult to fix properly.

(OT aside... I used to frequently work on pieces by a TV reporter we nicknamed "liver lips". There was something about his mouth that made him LOOK out of sync even when we knew it was absolutely perfect, as regulated by pro level gear with timecode etc.)

Anyway, I firmly believe that any calculation down to the sample level is bashing your head against a brick wall, especially with variables like a drifting recorder, digital coding delays, LCD lag, sample rate conversions etc. etc. Just work to the nearest frame, since that's the smallest meaningful unit of video.

Anyhow, good luck with your efforts.

Bob
 
By drift I just mean that the clocks on the different devices do not run at exactly the same speed. Barring any drastic changes in temperature, there shouldn't be too much drift of that other nature while recording, just different lengths of results at the end of the day, until you speed up one to match the other in length for a given piece of content. We're not talking reel-to-reel tape machines that can and do drift depending on friction and other factors, or the lack of friction due to moisture issues. None of that is in play here.

Most times my sync points are percussive instruments with a definite sync point. Syncing voice is by far more challenging. Applause and other things can get me to line up the audio to the video as recorded on the camcorder, and it's just a minor adjustment, if any, from there. And there are probably lots of folks like senor liver lips, but that's not my normal genre. I'm just trying to avoid drum hits when the drummer's sticks are in the rebound / away position, or applause when the hands are near the shoulders and not contacting each other. That's due to the simple physics of a given location, not really much to do with the recording chain.
 