Twilio Conference uses a jitter buffer to smooth out irregularities in media packet arrival times when mixing audio for conference participants. This buffer results in fewer audio artifacts, but it introduces a fixed delay (that is, latency) into each participant's audio.
If a participant suffers from extremely high jitter (commonly seen with applications or browsers on networks experiencing poor conditions), the jitter buffer may swell to compensate, causing that participant's media to be delayed. Once the jitter buffer has grown, it will not shrink, even if the jitter on the media stream is eliminated. At sizes greater than ~250 ms, participants can perceive the jitter buffer as audio latency.
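The grow-but-never-shrink behavior described above can be illustrated with a small toy model. This is only a sketch for intuition, not Twilio's implementation: the class name GrowOnlyJitterBuffer, the 20 ms starting depth, and the rule that the buffer grows to exactly the largest jitter seen are all assumptions made for the example. Only the ~250 ms perceptibility figure comes from the text.

```python
from dataclasses import dataclass

# Approximate size above which participants notice latency (from the text above).
PERCEPTIBLE_MS = 250.0


@dataclass
class GrowOnlyJitterBuffer:
    """Toy model of a jitter buffer that grows to absorb jitter and never shrinks."""

    size_ms: float = 20.0  # hypothetical starting depth, chosen for illustration

    def observe(self, jitter_ms: float) -> float:
        # Grow to cover the observed jitter; keep the current size otherwise.
        self.size_ms = max(self.size_ms, float(jitter_ms))
        return self.size_ms

    @property
    def perceptible(self) -> bool:
        # True once the buffer is large enough to be heard as latency.
        return self.size_ms > PERCEPTIBLE_MS


def simulate(jitter_samples_ms):
    """Return the buffer size after each jitter sample, in order."""
    buf = GrowOnlyJitterBuffer()
    return [buf.observe(j) for j in jitter_samples_ms]


# One 300 ms spike keeps the buffer at 300 ms after the network recovers.
sizes = simulate([5, 10, 300, 8, 4])
print(sizes)  # [20.0, 20.0, 300.0, 300.0, 300.0]
```

In this model a single burst of jitter is enough to push the buffer past the perceptible threshold for the rest of the conference, which matches the behavior described above.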
Customers have asked why Voice Insights shows poor carrier-related metrics, indicating a bad network, on an otherwise clean call leg, i.e. one where the PSTN leg shows no packet loss or jitter.
If you look at the specifics of these metrics, the culprit for the bad network tag is latency.
The reason for this latency is the conference jitter buffer. The conference engine is doing its best to play out packets from all legs of the conference in an orderly fashion. When jitter increases on the leg experiencing poor network conditions, the conference adapts to the inbound jitter, which introduces internal latency. This in turn causes the internal latency on the PSTN calls to increase.
For more on Twilio's Conference jitter buffer, please see this blog post: https://www.twilio.com/en-us/blog/improve-call-experience-new-twilio-conference-jitter-buffer-controls