In telephony, echo is the distracting effect of hearing your own voice played back after a short delay. The human ear cannot distinguish an echo from the original sound if the delay is less than about 1/15 of a second (roughly 67 milliseconds). Both the volume and the length of the delay affect how annoying an echo is. Interestingly, echo on PSTN and VoIP calls is about the same volume. On a PSTN-to-PSTN call the echo is loud but barely delayed, so it is less noticeable. On a VoIP call the echo can be both loud and delayed, which makes it noticeable. The time it takes to encode, packetize, and decode a VoIP signal for transmission on the PSTN is long enough to produce a noticeable echo, which is why VoIP hardware and software are designed with echo cancellation.

Echo is an inherent part of PSTN telephony. At the user end, the caller talks on a two-wire loop, and the signal is handed to a trunk that carries it over a four-wire circuit across the network. The device that converts between the two-wire and four-wire signals is called a hybrid interface, and it is the main culprit of PSTN echo, because part of the signal arriving on the four-wire side leaks back toward the sender. Early echo cancellation devices were designed in the 1960s as dedicated hardware, and modern echo cancellation in the PSTN is done by digital signal processing built into telephone switches.

VoIP echo becomes audible because of the time it takes to process a signal for transmission to and from the PSTN. The VoIP codec samples your voice thousands of times per second, groups the samples into packets that typically hold 10 to 30 milliseconds of audio, and sends those packets across the internet before they are decoded for the phone lines. The packets can also be delayed as they travel across the VoIP network, which adds to the echo delay. The same principle applies in the other direction, because the sound being received must also be compressed and routed across the VoIP network. The risk of echo is most severe when a hybrid interface reflects the signal and a VoIP gateway adds delay, because the reflection then registers as a distinct echo in the listener’s ears.
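To make the delay arithmetic concrete, here is a minimal sketch in Python. It assumes a simplified model in which the echo's round trip is twice the one-way delay, and the one-way delay is the sum of packetization, codec processing, network transit, and jitter buffering. The function names and all of the millisecond figures are hypothetical illustrations, not measurements; only the 1/15-second threshold comes from the text above.

```python
# Threshold quoted earlier in this article: 1/15 of a second, about 66.7 ms.
ECHO_THRESHOLD_MS = 1000 / 15


def round_trip_delay_ms(packetization_ms, codec_ms, network_ms, jitter_buffer_ms):
    """Estimate how long a talker waits before the echo comes back.

    Simplifying assumption: the path is symmetric, so the round trip
    is twice the one-way delay.
    """
    one_way = packetization_ms + codec_ms + network_ms + jitter_buffer_ms
    return 2 * one_way


def echo_is_noticeable(delay_ms, threshold_ms=ECHO_THRESHOLD_MS):
    """True when the echo arrives late enough to be heard as a separate sound."""
    return delay_ms > threshold_ms


# Hypothetical example figures, chosen only for illustration.
pstn_delay = round_trip_delay_ms(0, 0, 5, 0)      # circuit-switched call, no packets
voip_delay = round_trip_delay_ms(20, 5, 40, 40)   # packetized call with a jitter buffer

print("PSTN:", pstn_delay, "ms, noticeable:", echo_is_noticeable(pstn_delay))
print("VoIP:", voip_delay, "ms, noticeable:", echo_is_noticeable(voip_delay))
```

Under these assumed numbers the PSTN echo stays well under the threshold while the VoIP echo exceeds it, which matches the point that the same reflection becomes audible once packet processing adds delay.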

A common echo cancellation approach is to generate multiple copies of the received signal, each delayed by a slightly different amount. The algorithm detects which self-generated delay lines up with the actual echo, and lowers the volume of the matching signal. Algorithms, and consequently VoIP products, differ in how reliably they recognize the copy and in how much they lower the volume. Too much echo cancellation is also possible, and it results in choppy sound or even an echo of its own.
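The delayed-copy idea described above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not how any particular product works: it assumes a single echo path with a fixed delay and gain, and it subtracts the matching copy rather than just lowering the volume. Production cancellers typically use adaptive filters that track a changing echo path. The names `estimate_echo` and `cancel_echo` are hypothetical.

```python
import random


def estimate_echo(reference, received, max_delay):
    """Find the delayed copy of `reference` that best matches `received`.

    Returns (delay_in_samples, gain) for the best-matching copy.
    """
    best_delay, best_gain, best_score = 0, 0.0, 0.0
    for delay in range(1, max_delay + 1):
        # Compare the received signal with a copy delayed by `delay` samples.
        dot = 0.0
        energy = 0.0
        for n in range(delay, len(received)):
            copy = reference[n - delay]
            dot += received[n] * copy
            energy += copy * copy
        if energy == 0.0:
            continue
        score = dot * dot / energy  # received energy explained by this copy
        if score > best_score:
            best_delay, best_gain, best_score = delay, dot / energy, score
    return best_delay, best_gain


def cancel_echo(reference, received, delay, gain):
    """Subtract the scaled, delayed copy of `reference` from `received`."""
    cleaned = list(received)
    for n in range(delay, len(cleaned)):
        cleaned[n] -= gain * reference[n - delay]
    return cleaned


# Synthetic demo: the "echo" is the reference at half volume, 40 samples late.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
received = [0.0] * 40 + [0.5 * s for s in reference[:-40]]

delay, gain = estimate_echo(reference, received, max_delay=80)
cleaned = cancel_echo(reference, received, delay, gain)
print("estimated delay:", delay, "samples, gain:", gain)
print("residual energy:", sum(s * s for s in cleaned))
```

The quality trade-off mentioned above shows up here as well: if the estimated delay or gain is wrong, the subtraction removes the wrong signal and can add artifacts instead of removing echo.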

Echo cancellation is built into virtually every VoIP phone and gateway, where it works alongside the codec. Carefully calibrated echo cancellation can even make PSTN calls sound better on VoIP phones. New codecs, like every aspect of computing, are being developed all the time, and VoIP providers adopt them as the technology evolves.
