Programmatic control of audio & video delay

I have a rather unusual situation where I am combining sources, some of which are Zoom callers who need a delay added to them to be in sync with the other inputs. However, because there will be lots of different callers with different latency, I need to adjust the delay every time we switch to a new caller.

So, I am hoping there is some way to programmatically access the audio and video delay settings on a source, so that I can connect them to an external controller for fast and easy adjustment.

Is this at all possible?

Are you trying to sync up music?

If so, this system won’t work as latency through Zoom is variable. You should use one of the dedicated systems for collaborative music making:

https://www.jamkazam.com/

I am at the early stages of testing right now, but I was assuming that there would be some stability in the latency of a single Zoom participant (e.g. in a breakout room). Different latencies for different people, yes, but I just need to get one remote person in sync with me.

Are you suggesting that an individual's latency varies, or were you thinking that I need to sync up multiple people at once?

I will take a look at your suggestions too, although the solution needs to be simple and frictionless (no extra signups, etc.) for the end users.

I guess my question would be why you need better sync than what Zoom or mimoCall is offering by default?

And yes, latency of each person varies individually.

Let’s just assume for the moment that, yes, I do need to do this :grinning: and return to my original question: is it possible? I can’t see anything in the web hook docs about this level of control.

The audio and video delay settings on a source are not exposed through the HTTP API.

I would still love to hear the use case for this.

This use case is for a karaoke event where I can’t expect much more from the participants than being able to join a Zoom call.

I send music and video to a remote singer (wearing headphones), who sings back to me. I then mix their singing with the backing track that I sent them. To do that, I split the music video feed, send one feed with no delay to the singer, and delay the other feed to match their singing.
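To make the split-and-delay idea concrete, here is a minimal sketch in Python of what "delay the other feed" means at the sample level. It is only an illustration, not how any particular mixing software implements its delay setting; the function names `delay_ms_to_samples` and `delay_signal` and the 48 kHz sample rate are assumptions made for the example.

```python
def delay_ms_to_samples(delay_ms: float, sample_rate: int = 48000) -> int:
    """Convert a delay in milliseconds to a whole number of samples.

    The 48 kHz default is an assumption for this sketch.
    """
    return round(delay_ms * sample_rate / 1000)


def delay_signal(samples, delay_samples: int):
    """Return a copy of `samples` shifted later by `delay_samples`.

    The start is padded with silence and the tail is dropped, so the
    result has the same length as the input.
    """
    if delay_samples < 0:
        raise ValueError("delay must not be negative")
    samples = list(samples)
    pad = min(delay_samples, len(samples))
    return [0.0] * pad + samples[: len(samples) - pad]


# The undelayed feed goes to the singer; the delayed copy is what gets
# mixed with the singing that comes back.
backing = [1.0, 2.0, 3.0, 4.0]
to_singer = backing
to_mix = delay_signal(backing, 2)  # [0.0, 0.0, 1.0, 2.0]
```

In practice the delay would be the round trip to the singer and back, so it would be thousands of samples rather than two.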

Of course, there would be a line-up process prior to each song to establish the delay for that particular singer, which is where an easy-to-operate way of adjusting the delay would be so useful.
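One way the line-up could be done, purely as an assumption on my part, is to send a short test sound (such as a click), record what comes back, and look for the offset where the two line up best. The sketch below does a brute-force cross-correlation in plain Python; `estimate_delay` is a hypothetical helper, and the result is only a snapshot, since the latency can drift during the song.

```python
def estimate_delay(reference, returned, max_lag: int) -> int:
    """Return the lag (in samples) at which `returned` best matches `reference`.

    Tries every lag from 0 to max_lag and keeps the one with the highest
    correlation score. Ties go to the smallest lag.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            value * returned[i + lag]
            for i, value in enumerate(reference)
            if i + lag < len(returned)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# A click sent at sample 1 comes back at sample 4, so the delay is 3 samples.
sent = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
received = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
lag = estimate_delay(sent, received, max_lag=6)
```

Dividing the lag by the sample rate gives the delay in seconds. Note that noise suppression on the call may alter a test sound, so this may need a louder or longer signal than a single click.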

Sounds cool!

I found a couple of articles online:


In both cases, the solution is to uncouple the singing from the music and then synchronise the music at both ends. Of course, instead of Zoom, you could also use mimoCall and get better audio quality for the singing.