AEC – A Complete Guide to Reference

Applies to:

ControlSpace EX-1280C conferencing sound processor

Correct routing and processing of the Acoustic Echo Cancellation (AEC) reference is critical for preventing echo in conference rooms. It is also one of the most challenging aspects of conference room design.

The Conference Room Router (CRR) goes a long way toward ensuring that the AEC reference is configured correctly. However, there are often questions about how processing outside the CRR affects the AEC reference, and there may be designs that use AEC without using the CRR.

Below are some principles of the use of the AEC reference in good conference room designs and some advice about common applications.

The Reference Must Contain the Correct Signals

In general, the AEC reference should receive a mix of all the far-end and program audio that will be played through the loudspeakers. Furthermore, all the loudspeakers in the room should play that same mix.

Signals Missing from the Reference

If a signal is not in the AEC reference, the AEC will not cancel it. If a matrix cross-point between a far-end input and the AEC reference is mistakenly muted, that far-end will hear its own echo, but the problem will not be noticeable in the local room. This is a common pitfall that the CRR prevents.

If program audio is missing from the reference, the far-end may hear a muddy or reverberant version of the program audio (including the direct mix of program audio sent to the far-end, along with what is picked up locally by the mics). This is less severe than echo from the far-end, but still not ideal. In most cases, it is preferable to include program audio in the AEC reference. One exception to this is in the case of positional audio, described below.

Extra Signals in the Reference

If a signal is sent to the AEC reference but not played through the loudspeakers, the AEC could diverge whenever that signal is active. While the signal is active, the microphone picks up no corresponding audio, so the AEC converges toward an echo path with no echo in it. It must then re-converge when a far-end signal (one that belongs in the AEC reference) becomes active.
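
The two failure modes above (signals missing from the reference, and extra signals in it) amount to comparing two sets of sources. The following is a minimal sketch of that comparison, not a ControlSpace API; the function name and the source labels (codec_rx, phone_rx, program, mic_3) are hypothetical.

```python
def check_reference(loudspeaker_sources, reference_sources):
    """Compare what the loudspeakers play with what one AEC reference receives."""
    spk = set(loudspeaker_sources)
    ref = set(reference_sources)
    return {
        # Played in the room but absent from the reference: will not be cancelled.
        "missing": sorted(spk - ref),
        # In the reference but never played: the AEC may diverge when it is active.
        "extra": sorted(ref - spk),
    }


# Hypothetical routing for one microphone's AEC reference.
result = check_reference(
    loudspeaker_sources=["codec_rx", "phone_rx", "program"],
    reference_sources=["codec_rx", "program", "mic_3"],
)
print(result)
```

In this made-up routing, phone_rx would be reported as missing and mic_3 as extra. A design built on the CRR should not need such a check, since the CRR creates the reference mix itself.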

Voice-Lift

Some designers put microphone signals in their own AEC references to try to improve gain-before-feedback in voice-lift systems. The benefit of doing this is questionable, although it could work well in some rooms. In other cases, the feedback reduction behavior of the AEC could be inconsistent, or residual echo could be audible to the far-end during double-talk.

In most cases, it is preferable to use pre-AEC microphone signals for voice-lift and to leave the microphones out of the AEC references. This provides the lowest latency for local reinforcement and will cause no problems with the AEC.

Stereo

Stereo signals in a mono AEC reference are theoretically a problem. If the stereo signal has much separation, significantly different signals will be played from the left and right loudspeakers. The echo paths from the left and right loudspeakers to the microphone have different impulse responses, and a mono AEC cannot converge to both at the same time. An AEC with a stereo reference is designed to handle this properly, while a mono AEC is not.
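
One way to see the limitation is to split the stereo pair into a mid (mono) part and a side (difference) part. The notation below is an assumption introduced for illustration: x_L and x_R are the loudspeaker signals, h_L and h_R are the impulse responses from each loudspeaker to the microphone, * denotes convolution, and y is the echo at the microphone (near-end speech and noise are omitted).

```latex
\begin{aligned}
m &= \tfrac{1}{2}(x_L + x_R), \qquad s = \tfrac{1}{2}(x_L - x_R) \\
y &= h_L * x_L + h_R * x_R \\
  &= (h_L + h_R) * m + (h_L - h_R) * s
\end{aligned}
```

A mono AEC whose reference is m can model the first term, but it has no access to s. The second term is left as residual echo unless the signal has no separation (s = 0) or the two echo paths happen to be identical (h_L = h_R).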

In practice, an AEC with a mono reference will work well most of the time in a stereo conference room. Except for highly customized telepresence rooms, microphones are rarely panned even when sent to stereo codecs. Far-end audio is almost always mono, and far-end audio is our primary concern for AEC.

Dialog-heavy program audio is also likely to have most of its energy panned to the center, and would be canceled even with a mono AEC reference. If residual echo from program audio is audible, it will have low latency and likely be perceived by the far-end as additional reverberation or muddiness, rather than as a distinct echo.

Positional Audio

Positional audio is different from stereo in that each loudspeaker is playing a distinctly different channel. For example, a telepresence room might have two codecs connected to different sites and play the audio from each site from loudspeakers near the corresponding display. An AEC with a multi-channel reference is required to properly cancel echo in this type of system. With a mono AEC reference, echo would likely be heard frequently in this type of system, whenever someone at a different far-end begins to talk.

Positional audio can work with a mono AEC reference if only one loudspeaker position is used at a time. For example, loudspeakers in the ceiling or table may be preferred during a voice-only conference, as local participants will be facing the center of the table. During a video conference, loudspeakers near the display would be preferred. If voice-only far-end audio is also rendered to the display loudspeakers during a video conference, a mono AEC reference can handle both scenarios (although echo may be audible until the AEC converges at the beginning of a call after switching between the two modes).

If program audio is rendered on different loudspeakers than far-end audio (e.g., program audio from front loudspeakers and far-end audio from ceiling loudspeakers), it may be preferable to leave program audio out of the AEC reference. This may result in muddy program audio being heard at the far-end, but is less likely to cause audible echo.

Room Combining

Room combining itself is not a cause of AEC problems. However, if room combining is done by hand with parameter sets that adjust matrix cross-points, mistakes are easy to make. Some of the AEC references could easily contain extra far-end or program audio signals, or be missing required ones. Echo will be audible from the associated microphones and the problem could be very difficult to diagnose. Fortunately, the Conference Room Combiner removes this burden from the designer, and prevents this type of mistake.

Even with a properly configured room combining system, echo may be briefly audible after the room configuration changes. A configuration change alters the echo paths between microphones and loudspeakers in different partitions, and the AEC needs to re-converge afterward.

The Echo Path Must Not Contain Nonlinear or Time-Varying Processing

The AEC’s adaptive filter can only model a linear, time-invariant echo path. Anything nonlinear or time-varying in the echo path can severely impair the performance of the AEC. The echo path can be described as:

Any processing on the loudspeaker output that is not present on the AEC reference
The acoustic path between the loudspeaker and microphone (including the loudspeaker and microphone themselves)
Any processing on the microphone input before the AEC

A typical conference room design is shown in the diagram below. The signals that are part of the echo path are highlighted in red.

[Figure: typical conference room design, with the signals in the echo path highlighted in red]

Dynamics

Dynamics processing may constantly change its gain. When applied in the echo path, the AEC must constantly readapt to these changes, creating a strong possibility of frequent residual echo.
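
The effect can be illustrated with a toy simulation. The sketch below uses a textbook NLMS adaptive filter as a stand-in for an AEC; it is not the algorithm used in the EX-1280C. The 8-tap room response, step size, noise level, and gain pattern are all made-up values. The reference is taken before the gain, and the loudspeaker plays the signal after the gain, so the gain sits in the echo path.

```python
import math
import random

ROOM = [0.6, -0.3, 0.2, 0.1, -0.05, 0.03, 0.02, -0.01]  # hypothetical echo path


def residual_db(gain_at, n_samples=4000, mu=0.5, seed=1):
    """Run NLMS against a loudspeaker gain g[n]; return residual/echo power in dB."""
    rng = random.Random(seed)
    taps = len(ROOM)
    w = [0.0] * taps          # adaptive filter coefficients
    ref_hist = [0.0] * taps   # what the AEC reference sees (pre-gain)
    spk_hist = [0.0] * taps   # what the loudspeaker plays (post-gain)
    err_pow = echo_pow = 0.0
    for n in range(n_samples):
        x = rng.uniform(-1.0, 1.0)                   # far-end signal (white noise)
        ref_hist = [x] + ref_hist[:-1]
        spk_hist = [gain_at(n) * x] + spk_hist[:-1]
        echo = sum(h * s for h, s in zip(ROOM, spk_hist))
        mic = echo + rng.gauss(0.0, 1e-3)            # small microphone noise
        est = sum(wi * r for wi, r in zip(w, ref_hist))
        e = mic - est                                # residual after cancellation
        norm = sum(r * r for r in ref_hist) + 1e-9
        w = [wi + mu * e * r / norm for wi, r in zip(w, ref_hist)]
        if n >= n_samples // 2:                      # measure after initial convergence
            err_pow += e * e
            echo_pow += echo * echo
    return 10.0 * math.log10(err_pow / echo_pow)


fixed = residual_db(lambda n: 0.5)
pumping = residual_db(lambda n: 1.0 if (n // 50) % 2 == 0 else 0.25)
print(f"fixed gain: {fixed:.1f} dB, varying gain: {pumping:.1f} dB")
```

With a fixed gain the filter simply absorbs the gain into its coefficients, and the residual should settle near the noise floor. With the gain switching every 50 samples, the filter has to chase each change, and the residual should be much higher.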

Compressors/limiters are often applied to loudspeaker outputs to prevent clipping. If such processing is required, the AEC reference must be created from the loudspeaker signal after compression/limiting (meaning the AEC reference output of the CRR would be unused). If the loudspeaker signal is stereo, it must be mixed down to mono after compression/limiting before being passed to the AEC reference.

Attempting to compensate for dynamic processing on the loudspeaker output by placing the same processing with the same settings before the AEC reference may not work as reliably. There is some risk that the two dynamics processing blocks may not apply the same gain at the same time, particularly if stereo to mono conversion in the CRR causes a slightly different level to appear at a mono compressor/limiter when compared to those seen by a stereo compressor/limiter on the loudspeaker outputs.
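
A single sample pair is enough to show how the two approaches can disagree. This is a minimal sketch in which a hard clipper stands in for a real compressor/limiter; the threshold and sample values are made up.

```python
def limit(sample, threshold=0.5):
    """Hard limiter: a deliberately crude stand-in for a real compressor/limiter."""
    return max(-threshold, min(threshold, sample))


def mono(left, right):
    """Equal-weight stereo-to-mono mixdown."""
    return 0.5 * (left + right)


left, right = 0.9, 0.1  # one hypothetical stereo sample pair

# Reference taken from the loudspeaker feeds after limiting, then mixed to mono.
ref_post = mono(limit(left), limit(right))

# Duplicate approach: mix to mono first, then apply a separate limiter to the mix.
ref_dup = limit(mono(left, right))

print(ref_post, ref_dup)
```

Here only the left channel exceeds the threshold, so the stereo limiter reduces it while the mono limiter sees a mix that sits right at the threshold and passes it unchanged. The two candidate references differ, and only the first one matches what the loudspeakers actually play.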

If dynamics are needed on microphone inputs, they should be applied post-AEC.

Automatic Microphone Mixing

Automatic microphone mixers frequently change the gain on each microphone channel. When they switch between microphones, this drastically changes the echo path in the mix. It is tempting to put an AMM before a single AEC channel, because AEC resources are limited. However, the gain changes in the echo path make the AEC perform poorly in general. An AEC should be placed on each microphone signal before the AMM.
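
The ordering rule for microphone inputs (AEC first, then any AMM or dynamics) can be expressed as a simple check on a processing chain. This is only an illustrative sketch: the block labels and the list-of-strings representation are assumptions, not how ControlSpace Designer describes a signal chain.

```python
# Block types that change gain over time (illustrative labels only).
TIME_VARYING = {"amm", "compressor", "limiter"}


def blocks_before_aec(chain):
    """Return the time-varying blocks that sit ahead of the AEC in a mic chain."""
    if "aec" not in chain:
        return []
    pre_aec = chain[: chain.index("aec")]
    return [block for block in pre_aec if block in TIME_VARYING]


print(blocks_before_aec(["hpf", "amm", "aec"]))                # AMM is in the echo path
print(blocks_before_aec(["hpf", "aec", "amm", "compressor"]))  # nothing to flag
```

The first chain is flagged because the AMM precedes the AEC. The second follows the recommended order: a fixed filter ahead of the AEC, with the AMM and dynamics after it.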

Volume Control

Volume control causes the same problem as dynamics processing, but far less often. If volume control is applied in the echo path, residual echo may be briefly audible after a user changes volume.

Distortion

It is unlikely that distortion is intentionally introduced in the signal path of a conferencing system. Bad gain structure or poor quality components could introduce distortion in the echo path. The distortion cannot be modelled by the AEC and will cause residual echo. This echo will sound obviously distorted (much more so than the pre-AEC microphone signal).

Linear, Time-Invariant Processing Is (Usually) Safe

Any processing that is linear and time-invariant can be modeled by the AEC’s adaptive filter. Only extreme settings of this type of processing may be a problem.

Gain

Fixed gain will not usually cause problems for the AEC unless extreme gain settings (i.e., a poor gain structure) are used. Often the problem caused by bad gain settings will be from the resulting distortion, rather than the gain itself.

In some designs, a great deal of gain is applied in the amplifier, and the volume control applies a lot of attenuation to the signal. This can result in a very low level at the AEC reference, which can affect the AEC’s double-talk detector’s ability to determine when to adapt. Ideally, good nominal levels should be seen by the AEC reference when the volume control is at a comfortable setting.

Equalization

Equalization can normally be applied in the echo path without causing problems. There is no need to base the AEC reference on the equalized version of the signal, or to apply a duplicate EQ to the AEC reference. If the EQ is used to flatten the loudspeaker and room response, an un-equalized AEC reference is probably more representative of the echo path.

Likewise, any filters on microphone inputs are generally harmless, and can be applied pre-AEC if desired. In some cases, such as with a HPF on a mic with a lot of low-frequency noise, applying EQ pre-AEC can be beneficial.

If crossovers are used, the AEC reference must receive the full-band signal, and not one of the crossover outputs.

If the EQ is applying a lot of boost, the AEC might perform slightly better if that boost is accounted for in the AEC reference.

Delay

A modest delay in the echo path is not a problem for the AEC. Some delay naturally exists due to the distance between the loudspeaker and microphone, and the audio buffering in the system.

Excessive delay (more than a few tens of milliseconds) can be a problem for the AEC. The delay has the effect of reducing the available tail length of the AEC’s adaptive filter. It can also confuse the double-talk detector of the AEC by skewing the time alignment of the AEC reference and the echo.

Delay in the echo path isn’t necessarily added by the designer. Many displays add delay (sometimes more than 100 ms) for lip sync. Adding delay to only the AEC reference may be necessary to compensate for this.

Care must be taken not to insert more delay before the AEC reference than exists in the echo path. This causes the echo to be non-causal (the echo arrives before the reference), which cannot be modeled by the AEC and will cause audible echo at the far-end.
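
The delay considerations in this section reduce to simple arithmetic. The sketch below is a rough budget rather than a model of any particular AEC: the function name is hypothetical, and the 200 ms tail length and 120 ms display delay in the examples are made-up figures, not EX-1280C specifications.

```python
def delay_check(echo_path_delay_ms, reference_delay_ms, tail_length_ms):
    """Summarise how bulk delay affects an AEC with the given tail length."""
    # Delay still left in the echo path after delaying the reference.
    uncompensated = echo_path_delay_ms - reference_delay_ms
    return {
        # The echo must not arrive before the reference.
        "causal": uncompensated >= 0,
        # Uncompensated bulk delay uses up part of the adaptive filter's tail.
        "usable_tail_ms": max(0.0, tail_length_ms - max(0.0, uncompensated)),
    }


# Display adds 120 ms for lip sync; hypothetical 200 ms tail.
print(delay_check(120, 0, 200))    # no compensation: only 80 ms of tail remains
print(delay_check(120, 100, 200))  # reference delayed 100 ms: 180 ms of tail remains
print(delay_check(120, 150, 200))  # over-compensated: non-causal
```

Leaving some margin below the measured echo-path delay, as in the second example, keeps the reference safely ahead of the echo.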