AEC - Basics

Acoustic Echo Cancellation (AEC) prevents the far-end participants in a conference call from hearing their own voices echoed back to them. Every phone call or teleconference has a near end and a far end: the near end is your location, and the far end is the location of the other participant in the call. Each location has at least one microphone and one loudspeaker.

When you talk, your microphone picks up your voice and transmits it to the far end, where the loudspeaker lets the far-end participants hear you. When the far end talks, the far-end microphone and the near-end loudspeaker let you hear what is being said at the far end.

If only one room’s microphone is turned on at any time, there is no problem. This is called half-duplex communication, and it is not a very satisfying experience: each person’s microphone is muted while the other person is talking. You cannot hear the far end while you are talking, and if you want to interject with a point or ask for clarification, you cannot reach the far end until they stop talking. For these reasons, half-duplex conferencing is not acceptable in most cases.

The problem occurs when both microphones are turned on at the same time. This is called full-duplex conferencing. When the far end begins talking, the far-end talker is picked up by the far-end microphone and sent to the near-end loudspeaker. The near-end loudspeaker audio is then picked up by the near-end microphone and sent back to the far-end loudspeaker. This might not initially seem like a problem, but the round-trip latency of a phone call using analog lines is usually at least 80 to 100 milliseconds. VoIP calls have even longer latencies, and video conference latency can often be 1 second or more. With a delay that long, the returned audio is heard as a distinct echo, so far-end talkers hear their own voices echoed back every time they speak, which makes it nearly impossible to communicate.

An audio signal may be eliminated by mixing it with an inverted version of itself, so it should be possible to make the microphone ignore the sound coming out of the loudspeaker. We know exactly what the audio signal looks like when it is sent to the loudspeaker. However, that signal does not exactly match the audio picked up by the microphone. The audio coming out of the loudspeaker is reflected multiple times by surfaces in the room, and these reflections arrive at the microphone at different times. Each reflection has also had different frequencies absorbed or blocked by the various surfaces and objects in the room, so each one sounds very different from the original signal and from the other reflections.
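
The short Python sketch below illustrates this point. It is only a toy model: the 440 Hz test tone, the 8 kHz sample rate, and the three made-up reflections in `room` are assumptions chosen for illustration, not measurements of any real room. It shows that a signal mixed with its own inverse cancels completely, while subtracting the raw loudspeaker signal from a simulated microphone signal leaves most of the energy behind.

```python
import math

def convolve(signal, impulse_response):
    """Direct-form convolution, truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)

# Hypothetical loudspeaker signal: a 440 Hz tone sampled at 8 kHz.
loudspeaker = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(400)]

# Ideal case: a signal mixed with an inverted copy of itself.
ideal_residual = [s + (-s) for s in loudspeaker]

# Toy room (assumed values): three reflections that arrive at
# different delays with different gains.
room = [0.0] * 40
room[5] = 0.6
room[17] = 0.3
room[39] = -0.2
mic = convolve(loudspeaker, room)

# Naive attempt: subtract the raw loudspeaker signal from the mic signal.
naive_residual = [m - s for m, s in zip(mic, loudspeaker)]

print("loudspeaker energy:   ", energy(loudspeaker))
print("ideal residual energy:", energy(ideal_residual))
print("naive residual energy:", energy(naive_residual))
```

Because the room delays and reshapes the loudspeaker signal before it reaches the microphone, the inverted copy no longer lines up with what the microphone hears, and the naive subtraction does not cancel it.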

To remove the loudspeaker audio from the microphone signal, the AEC algorithm first needs to figure out what the loudspeaker audio sounds like when it gets to the microphone. The AEC algorithm compares the microphone audio to the audio being sent to the loudspeaker to estimate the room impulse response. This room impulse response then becomes the basis for the filter that is used to eliminate the loudspeaker audio from the microphone signal: the filter is applied to the loudspeaker signal to predict the echo, and that prediction is subtracted from the microphone signal.
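
The following Python sketch shows one way this estimation could work. The text above does not name a specific algorithm; the normalized least mean squares (NLMS) adaptive filter used here is a common textbook choice and is an assumption, as are the function name `nlms_echo_canceller`, the step size, and the eight-tap toy `room`. The simulation also assumes that only the far end is talking and that there is no background noise; a practical canceller would additionally have to handle both ends talking at once.

```python
import random

def convolve(signal, impulse_response):
    """Direct-form convolution, truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)

def nlms_echo_canceller(loudspeaker, mic, num_taps, step=0.5, eps=1e-8):
    """Return (echo-cancelled mic signal, estimated room impulse response)."""
    weights = [0.0] * num_taps   # current estimate of the room response
    history = [0.0] * num_taps   # recent loudspeaker samples, newest first
    residual = []
    for x, d in zip(loudspeaker, mic):
        history.insert(0, x)
        history.pop()
        # Predict the echo by filtering the loudspeaker signal.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Subtract the predicted echo from what the microphone heard.
        e = d - echo_estimate
        # Nudge the estimate in the direction that reduces the error,
        # normalized by the recent loudspeaker signal power.
        norm = sum(h * h for h in history) + eps
        gain = step * e / norm
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights

# Toy simulation (all values are assumptions for illustration).
random.seed(0)
room = [0.0, 0.5, 0.0, 0.3, 0.0, 0.0, -0.2, 0.1]
loudspeaker = [random.gauss(0.0, 1.0) for _ in range(4000)]
mic = convolve(loudspeaker, room)   # the mic hears only the echo

residual, estimated_room = nlms_echo_canceller(loudspeaker, mic, num_taps=len(room))

print("true room:     ", room)
print("estimated room:", [round(w, 3) for w in estimated_room])
print("echo energy before:", energy(mic[-500:]))
print("echo energy after: ", energy(residual[-500:]))
```

In this idealized setup the estimated response settles onto the simulated room response, and the signal that would be sent back to the far end contains almost none of the loudspeaker audio. Real rooms have impulse responses that are thousands of samples long and that change whenever people or objects move, so the estimate has to keep adapting for the whole call.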