I've not done this myself, but here are a few logical points that might apply or be worth watching for.
If they were both digital recorders you likely won't see huge time drift between them, but if there is some, you could perhaps split the recordings into sections and slide one or the other. Audible combing or the like would most likely show up in the purer direct-path parts picked up by the naturally delayed audience recorder. And the same rules apply here as when combining any offset signals: short offsets show up in the top end first, a 50-50 mix gives the deepest aberrations, and the diffuse stuff doesn't comb because it's random.
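
To put rough numbers on those rules, here's a minimal sketch, assuming the simplest possible model: one signal summed with a single delayed copy of itself, at mix gains a and b. The function names are made up for illustration, and real recordings (with their diffuse, random content) won't behave this cleanly.

```python
import cmath
import math


def first_notch_hz(delay_s):
    """Lowest cancelled frequency: where the delay equals half a period."""
    return 1.0 / (2.0 * delay_s)


def comb_gain(freq_hz, delay_s, a, b):
    """Linear gain of a*x(t) + b*x(t - delay) at one frequency."""
    return abs(a + b * cmath.exp(-2j * math.pi * freq_hz * delay_s))


def notch_depth_db(a, b):
    """Peak-to-notch ratio in dB for positive mix gains a and b."""
    if a == b:
        return math.inf  # equal levels cancel completely at the notches
    return 20.0 * math.log10((a + b) / abs(a - b))


# Shorter offsets push the first notch higher up the spectrum.
for delay_ms in (0.05, 0.5, 5.0):
    print(delay_ms, "ms ->", first_notch_hz(delay_ms / 1000.0), "Hz")

# The closer the mix is to 50-50, the deeper the notches.
for a, b in ((0.9, 0.1), (0.7, 0.3), (0.5, 0.5)):
    print(a, b, "->", notch_depth_db(a, b), "dB")
```

In this model the notches sit at odd multiples of the first one, so a very short offset only touches the top end, while a longer one moves the notches down into the mids and packs them closer together.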