Draft Andre Adrian
Document: draft-aec-03.txt DFS Deutsche Flugsicherung
Category: Experimental
December 13, 2004
Expires: ?
Voice over Internet Acoustic Echo Cancellation
Status of this Memo
This document specifies an Acoustic Echo Cancellation implementation
for hands-free Voice over Internet telephony and requests discussion
and suggestions for improvements.
Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
You are allowed to use this source code in any open source or closed
source software you want. You are allowed to use the algorithms for a
hardware solution. You are allowed to modify the source code.
You are not allowed to remove the name of the author from this memo or
from the source code files. You are not allowed to monopolize the
source code or the algorithms behind the source code as your
intellectual property.
This source code is free of royalty and comes with no warranty.
Abstract
This document specifies an acoustic echo cancellation (AEC)
implementation for voice over IP. Because of the large latency in VoIP
communication (tens to hundreds of milliseconds), AEC is necessary.
The presented implementation is based on the well-known Normalized
Least Mean Square (NLMS) and Geigel Double Talk Detector (DTD)
algorithms. To improve performance, a pre-whitening filter is used.
The presented algorithm therefore belongs to the NLMS-pw family. The
NLMS-pw family is known to give good echo cancellation for moderate
processing resources. The complexity of this algorithm is O(3*L),
where L is the number of taps in the NLMS filter.
Table of Contents
1. INTRODUCTION
2. AEC PRINCIPLES
3. AEC algorithms
3.1. Finite Impulse Response (FIR) Highpass Filter
3.2. Geigel Double Talk Detector
3.3. Normalized Least Mean Square - Pre-Whitening Filter
4. References
A. The C++ Source Code
A.1 aec.h
A.2 aec.cpp
A.3 aec_test.cpp
A.4 Compile source code
A.5 Test source code
1. INTRODUCTION
A hands-free telephone or full-duplex intercom system has a feedback
or echo problem because the output from the loudspeaker feeds into the
microphone. Several methods can be used to reduce or eliminate the
problem:
1.) Reduce the overall amplification. If the system amplification is
less than 1, feedback dies away. This solution leads to poor volume.
2.) Use Acoustic Echo Suppression. Echo Suppression is realized with
speech activated switches. Suppression reduces the full-duplex
telephone to half-duplex. The switches can even "switch away"
beginnings of words.
3.) Use Acoustic Echo Cancellation. This is realized with an adaptive
or learning filter. First the filter learns the acoustics from the
given microphone and loudspeaker signals. After learning, the filter
can calculate an estimated microphone signal from the loudspeaker
signal.
This estimated mic signal is subtracted from the real mic signal. The
difference signal no longer contains the loudspeaker signal - the
feedback loop is broken.
The Least Mean Square (LMS) algorithm from Widrow and Hoff has been
known since 1960. Unfortunately the LMS is a slow learner. The learning
speed or
convergence rate is controlled by a constant value. This value in the
LMS can only be optimized for loud signals or for weak signals.
Optimizing for loud signals produces slow convergence with weak
signals. Optimizing for weak signals gives divergence with loud
signals. Divergence can be defined as "the filter does not reduce the
echo but does increase the echo" and is very ugly.
The Normalized LMS (NLMS) has a constant convergence rate for loud and
weak signals because the parameter that controls the convergence rate
is derived from the signal energy.
For a white noise signal, where all frequencies have the same energy,
the NLMS performs well. But human speech has more energy in low
frequencies than in high frequencies. Therefore, an NLMS gives good
echo cancellation for low frequencies and poor echo cancellation for
high frequencies. A pre-whitening filter in front of the echo
cancellation filter transforms human speech into something more "white
noise" like - the energy of high frequency signals is similar to the
energy of low frequency signals. The presented algorithm uses the
simplest pre-whitening filter possible, a first-order (one-pole)
highpass filter with a transfer frequency equal to half of the sample
frequency (4kHz for the narrowband sample frequency of 8kHz).
Because the pre-whitening filter is fixed, the complexity of this
NLMS-pw filter is still the same as for the NLMS filter.
One important point should be remembered: The AEC in your telephony
device helps your telephony partner to hear no echo. Therefore AEC is
an altruistic algorithm.
2. AEC PRINCIPLES
The core of the acoustic echo cancellation is described in the
introduction. Besides the NLMS-pw, three more blocks are used:
1.) A highpass filter for the microphone signal. Telephone users are
used to a frequency range between 300Hz and 3400Hz. Narrowband VoIP
can give 0Hz to 4000Hz. After hearing a VoIP signal with frequencies
below 300Hz, testers complained about the bad quality. A 300Hz
cut-off filter limits the sound as in a telephone.
The highpass filter in use is a 13-tap finite impulse response (FIR)
filter. An FIR filter was used because of its stability.
2.) A double talk detector. The AEC filter should only learn if the
signal from the microphone originates from the loudspeaker signal
only. If the local or near-end user is talking, the filter can no
longer learn successfully. Detection of user talking is done by
comparing the volume levels of loudspeaker and microphone.
This implementation uses the well-known Geigel DTD.
3.) An Acoustic Echo Suppressor (AES) or Non Linear Processor (NLP).
If the Double talk detector (DTD) detects "no talking", the
microphone signal gets attenuated by 6dB. This is done to suppress
echo artefacts.
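The 6dB attenuation mentioned for the NLP corresponds to a linear gain
factor. The following is a minimal sketch of that conversion and of
the attenuation step. The names dbToGain and applyNlp are illustrative
assumptions and are not part of the source code in Appendix A.

```cpp
#include <cassert>
#include <cmath>

// Convert an attenuation in dB (given as a positive number) into a
// linear gain factor. Hypothetical helper, not part of aec.h.
inline float dbToGain(float attenuationDb) {
    assert(attenuationDb >= 0.0f);
    return std::pow(10.0f, -attenuationDb / 20.0f);
}

// Sketch of the NLP step: attenuate the sample while the DTD reports
// "no talking", pass it through unchanged otherwise.
inline float applyNlp(float sample, bool nearEndTalking,
                      float attenuationDb) {
    return nearEndTalking ? sample : sample * dbToGain(attenuationDb);
}
```

An attenuation of 6dB gives a gain of about 0.5, which agrees with the
constant M6dB in aec.h.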
AEC block diagram. Sin is the microphone signal, Rout and Rin are the
loudspeaker signal. Sout is the echo-cancelled microphone signal:
           +--+           +            +---+
Sin -->----|HP|--+--------->(+)---+--->|NLP|--->-- Sout
           +--+  |          /|\   |    +---+
                 |          -|    |
                \|/          |    |
               +---+      +----+  |
               |DTD|----->|NLMS|<-+
               +---+      +----+
                /|\         /|\
                 |           |
                 |           |
Rout -<----------+-----------+---------------<-- Rin
Figure 1.) AEC block diagram
3. AEC algorithms
This chapter gives the mathematical background to the source code.
This document will not give derivations of the algorithms or proofs.
See references for more information.
3.1. Finite Impulse Response (FIR) Highpass Filter
First, ambient noises are often prominent in the frequency range below
300Hz. Typical examples are fans (2800 RPM corresponds to 46.7Hz) and
hard disks (7200 RPM corresponds to 120Hz).
Second, small loudspeakers often have a resonance frequency around
80Hz. This resonance is a non-linearity for the echo cancellation.
Third, and maybe most important, users are used to telephone quality
with a 300Hz cut-off.
The FIR filter has 13 taps, which gives a group delay of about 0.8ms
at the 8kHz sample frequency. An FIR filter was used because of its
stability.
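The numbers in this section can be checked with a few lines of
arithmetic. The sketch below assumes a linear-phase (symmetric) FIR
filter, for which the group delay is (taps - 1) / 2 samples, and the
narrowband sample frequency of 8kHz. The function names are
illustrative and do not appear in Appendix A.

```cpp
#include <cassert>

// Rotation speed in revolutions per minute to fundamental frequency
// in Hz.
inline double rpmToHz(double rpm) {
    return rpm / 60.0;
}

// Group delay in milliseconds of a linear-phase (symmetric) FIR
// filter: (taps - 1) / 2 samples, converted with the sample rate.
inline double firGroupDelayMs(int taps, double sampleRateHz) {
    assert(taps >= 1 && sampleRateHz > 0.0);
    return (taps - 1) / 2.0 * 1000.0 / sampleRateHz;
}
```

For 13 taps at 8000Hz this gives 6 samples or 0.75ms, which rounds to
the 0.8ms quoted in the text.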
3.2. Geigel Double Talk Detector
Talk detection can be done with a threshold for the microphone signal
only. This approach is very sensitive to the threshold level. A more
robust approach is to compare microphone level with loudspeaker level.
The threshold in this solution will be a relative one. Because we deal
with echo, it is not sufficient to compare only the actual levels, but
we have to consider previous levels, too.
The Geigel DTD combines these ideas in one simple formula: The last L
levels (index 0 for now and index L-1 for L-1 samples ago) of the
loudspeaker signal are compared to the current microphone signal. To
avoid problems with phase, the absolute values are used.
Double talk is declared if:
|d| >= c * max(|x[0]|, |x[1]|, .., |x[L-1]|)
where |d| is the absolute level of the current microphone signal,
c is a threshold value (typical value 0.5 for -6dB or 0.71 for -3dB),
|x[0]| is the absolute level of the current loudspeaker signal,
|x[L-1]| is the absolute level of the loudspeaker signal L-1 samples ago.
See references 3, 7, 9.
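The Geigel formula can be sketched in a few lines. The following is a
minimal, unoptimized illustration that recomputes the maximum over all
L stored levels for every sample. The class name GeigelSketch and its
members are assumptions made for this example. The implementation in
Appendix A uses a block-wise maximum and a hangover timer instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal Geigel double talk detector over the last L loudspeaker
// samples. Illustrative names, not those of Appendix A.
class GeigelSketch {
    std::vector<float> absX_;  // ring buffer of |x[0]| .. |x[L-1]|
    std::size_t pos_ = 0;      // slot that receives the next sample
    float c_;                  // threshold, e.g. 0.5 for -6dB
public:
    GeigelSketch(std::size_t L, float c) : absX_(L, 0.0f), c_(c) {
        assert(L > 0);
    }

    // d: current microphone sample, x: current loudspeaker sample.
    // Returns true if double talk is declared for this sample.
    bool isDoubleTalk(float d, float x) {
        absX_[pos_] = std::fabs(x);
        pos_ = (pos_ + 1) % absX_.size();
        const float maxX = *std::max_element(absX_.begin(), absX_.end());
        return std::fabs(d) >= c_ * maxX;
    }
};
```

Note that the formula also declares double talk when the loudspeaker
has been silent for L samples, because every microphone level is then
greater than or equal to zero.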
3.3. Normalized Least Mean Square - Pre-Whitening Filter
The NLMS-pw, NLMS and LMS belong to the family of gradient-descent
based algorithms. The good features of gradient-descent based
algorithms are simplicity and robustness.
First we look at the "echo cancelling" formula, the convolution. This
formula subtracts the microphone signal estimated from the loudspeaker
signal from the real microphone signal.
e = d - X' * W
where e is the linear error signal or echo-cancelled microphone signal,
d is the desired signal or the microphone signal with echo,
X' is the transpose of the loudspeaker signals vector,
W is the adaptive weights vector.
With a matching vector W the echo cancellation can be perfect.
Unfortunately, learning the vector W has limitations. The loudspeaker
is not the only audio source at filter learning. Ambient sounds and
noises, system internal amplifier and converter noises and
non-linearities of loudspeaker and microphone have a negative impact
on learning.
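The error formula e = d - X' * W can be sketched as follows. The
function names dotProduct and echoCancel are illustrative assumptions.
The source code in Appendix A uses fixed-size arrays instead of
std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product X' * W of two equally long vectors.
inline float dotProduct(const std::vector<float>& a,
                        const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// e = d - X' * W: subtract the estimated microphone signal from the
// real microphone signal d.
inline float echoCancel(float d, const std::vector<float>& X,
                        const std::vector<float>& W) {
    return d - dotProduct(X, W);
}
```

If W matches the echo path exactly and the microphone picks up nothing
but the echo, the result is zero.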
Due to the simplicity of the LMS, all elements of W are updated with
the same "mikro * e" term. This simple approach makes the LMS robust
and keeps its demand for processing resources moderate, but this "one
term fits all" approach also prevents "perfect" learning.
The LMS algorithm has the update formula:
W[n+1] = W[n] + 2*mikro*e*X[n]
where W[n+1] is the new adaptive weights vector,
W[n] is the previous adaptive weights vector,
mikro is the step size constant or variable,
e is the error signal,
X[n] is the loudspeaker signals vector.
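One LMS update step can be written directly from the formula. This is
a minimal sketch with an assumed function name lmsUpdate. It is not
the code of Appendix A, which implements the normalized variant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One LMS step: W[n+1] = W[n] + 2*mikro*e*X[n]
// W is updated in place; X holds the loudspeaker samples.
inline void lmsUpdate(std::vector<float>& W,
                      const std::vector<float>& X,
                      float e, float mikro) {
    assert(W.size() == X.size());
    const float gain = 2.0f * mikro * e;  // same term for all taps
    for (std::size_t i = 0; i < W.size(); ++i) {
        W[i] += gain * X[i];
    }
}
```

Because e itself grows with the loudspeaker level, the size of the
update grows roughly with the square of the signal amplitude when
mikro is fixed. This is why a single constant cannot suit loud and
weak signals at the same time.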
The constant scalar mikro becomes a variable in NLMS. This variable is
calculated from the loudspeaker signals vector with:
          1
mikro = ------
        X' * X
where X' is the transpose of the loudspeaker signals vector,
X is the loudspeaker signals vector.
Note: The vector dot product is a scalar. It is the sum of the
element-wise multiplication of both vectors.
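The variable step size can be sketched as shown below. The lower bound
minEnergy is an assumption added for this example to avoid a division
by a value close to zero during silence. The function name
nlmsStepSize is illustrative.

```cpp
#include <cassert>
#include <vector>

// mikro = 1 / (X' * X), where X' * X is the energy of the loudspeaker
// signals vector. Assumption: the energy is clamped to minEnergy so
// that silence does not produce a huge step size.
inline float nlmsStepSize(const std::vector<float>& X, float minEnergy) {
    assert(minEnergy > 0.0f);
    float energy = 0.0f;
    for (float v : X) {
        energy += v * v;  // dot product of X with itself
    }
    if (energy < minEnergy) {
        energy = minEnergy;
    }
    return 1.0f / energy;
}
```

For X = (1, 2, 2) the energy is 9, so mikro is 1/9. A louder signal
gives a smaller step, which keeps the convergence rate independent of
the signal level.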
The constant value 2 in the LMS formula changes into a stability
"tuning" constant. For stable adaptation this constant should be
between 0 and 1. This NLMS-pw uses a value of 0.7.
For the weights vector update and the calculation of mikro, the
NLMS-pw uses highpass-filtered values of e and X. The filtered values
are used because the NLMS converges best with white noise signals, and
human voice is not white noise. The fixed highpass filter approach
used in this NLMS-pw does not increase the overall complexity.
With
ef = highpass(e)
Xf = highpass(X)
we get our NLMS-pw weights vector update formulas:
          0.7
mikro = --------
        Xf' * Xf
W[n+1] = W[n] + mikro*ef*Xf[n]
where ef is the highpass-filtered value of e,
Xf is the highpass-filtered value of X,
and the other values are as above.
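Both filters are first-order highpass filters with a transfer
frequency of 4000Hz. The source code in Appendix A implements them
with the recursive single pole class IIR1.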
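The update formulas can be put together in a compact sketch. As a
stated assumption, the pre-whitening filter here is a plain first
difference, 0.5 * (in[n] - in[n-1]), chosen only because it is easy to
follow. It is not the IIR1 filter of Appendix A. The class names
FirstDifference and NlmsPwSketch and the parameter minEnergy are
likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-in pre-whitening highpass: 0.5 * (in[n] - in[n-1]).
class FirstDifference {
    float prev_ = 0.0f;
public:
    float highpass(float in) {
        const float out = 0.5f * (in - prev_);
        prev_ = in;
        return out;
    }
};

class NlmsPwSketch {
    std::vector<float> x_, xf_, w_;  // index 0 holds the newest sample
    FirstDifference fx_, fe_;        // pre-whitening for x and e
    float stepsize_, minEnergy_;
public:
    NlmsPwSketch(std::size_t taps, float stepsize, float minEnergy)
        : x_(taps, 0.0f), xf_(taps, 0.0f), w_(taps, 0.0f),
          stepsize_(stepsize), minEnergy_(minEnergy) {
        assert(taps > 0 && minEnergy > 0.0f);
    }

    // Process one sample pair and return e = d - X' * W.
    float process(float mic, float spk, bool update) {
        // shift both delay lines and insert the newest samples
        for (std::size_t i = x_.size() - 1; i > 0; --i) {
            x_[i] = x_[i - 1];
            xf_[i] = xf_[i - 1];
        }
        x_[0] = spk;
        xf_[0] = fx_.highpass(spk);

        float estimate = 0.0f;  // X' * W
        float energy = 0.0f;    // Xf' * Xf
        for (std::size_t i = 0; i < x_.size(); ++i) {
            estimate += x_[i] * w_[i];
            energy += xf_[i] * xf_[i];
        }
        const float e = mic - estimate;
        const float ef = fe_.highpass(e);

        if (update) {
            if (energy < minEnergy_) energy = minEnergy_;  // assumption
            const float mikroEf = stepsize_ * ef / energy;
            for (std::size_t i = 0; i < w_.size(); ++i) {
                w_[i] += mikroEf * xf_[i];  // W += mikro*ef*Xf
            }
        }
        return e;
    }

    const std::vector<float>& weights() const { return w_; }
};
```

A caller would pass update = false while the double talk detector
reports near-end speech, so that the weights are only adapted on echo.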
For other pre-whitening algorithms see references 6, 8, 9. For non-LMS
echo cancellation algorithms see references 6 and 9.
4. References
[1] B. Widrow, M. E. Hoff Jr., "Adaptive switching circuits", Western
Electric Show and Convention Record, Part 4, pages 96-104,
Aug. 1960
[2] B. Widrow, et al, "Stationary and Nonstationary Learning
Characteristics of the LMS Adaptive Filter", Proc. of the IEEE,
vol. 64 No. 8, pp. 1151-1162, Aug. 1976
[3] D.L. Duttweiler, "A twelve-channel digital echo canceller", IEEE
Trans. Commun., Vol. 26, pp. 647-653, May 1978
[4] B. Widrow, S.D. Stearns, Adaptive Signal Processing,
Prentice-Hall, 1985
[5] D. Messerschmitt, D. Hedberg, C. Cole, A. Haoui, P. Winship,
"Digital Voice Echo Canceller with a TMS32020", Application report
SPRA129, Texas Instruments, 1989
[6] R. Storn, "Echo Cancellation Techniques for Multimedia
Applications - a Survey", TR-96-046, International Computer
Science Institute, Berkeley, Nov. 1996
[7] J. Nikolic, "Implementing a Line Echo Canceller using the block
update and NLMS algorithms on the TMS320C54x DSP", Application
report SPRA188, Texas Instruments, Apr. 1997
[8] M. G. Siqueira, "Adaptive Filtering Algorithms in Acoustic Echo
Cancellation and Feedback Reduction", Ph.D. thesis, University of
California, Los Angeles, 1998
[9] T. Gaensler, S. L. Gay, M. M. Sondhi, J. Benesty, "Double-Talk
robust fast converging algorithms for network echo cancellation",
IEEE trans. on speech and audio processing, vol. 8, No. 6,
Nov. 2000
[10] M. Hutson, "Acoustic Echo Cancellation using Digital Signal
Processing", Bachelor of Engineering (Honours) thesis, The School
of Information Technology and Electrical Engineering, The
University of Queensland, Nov 2003
[11] A. Adrian, "Audio Echo Cancellation", Free Software/Open Source
Telephony Summit 2004, German Unix User Group, Geilenkirchen,
Germany, Jan. 16-20, 2004
Appendix A. The C++ Source Code
/***************** A.1 APPENDIX aec.h *****************/
/* aec.h
*
* Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
*
* Acoustic Echo Cancellation NLMS-pw algorithm
*
* Version 1.3 filter created with www.dsptutor.freeuk.com
*/
#ifndef _AEC_H /* include only once */
// use double if your CPU does software-emulation of float
typedef float REAL;
/* dB Values */
const REAL M0dB = 1.0f;
const REAL M3dB = 0.71f;
const REAL M6dB = 0.50f;
const REAL M9dB = 0.35f;
const REAL M12dB = 0.25f;
const REAL M18dB = 0.125f;
const REAL M24dB = 0.063f;
/* dB values for 16bit PCM */
/* MxdB_PCM = 32767 * 10 ^(-x / 20) */
const REAL M10dB_PCM = 10362.0f;
const REAL M20dB_PCM = 3277.0f;
const REAL M25dB_PCM = 1843.0f;
const REAL M30dB_PCM = 1026.0f;
const REAL M35dB_PCM = 583.0f;
const REAL M40dB_PCM = 328.0f;
const REAL M45dB_PCM = 184.0f;
const REAL M50dB_PCM = 104.0f;
const REAL M55dB_PCM = 58.0f;
const REAL M60dB_PCM = 33.0f;
const REAL M65dB_PCM = 18.0f;
const REAL M70dB_PCM = 10.0f;
const REAL M75dB_PCM = 6.0f;
const REAL M80dB_PCM = 3.0f;
const REAL M85dB_PCM = 2.0f;
const REAL M90dB_PCM = 1.0f;
const REAL MAXPCM = 32767.0f;
/* Design constants (Change to fine tune the algorithms) */
/* The following values are for hardware AEC and studio quality
* microphone */
/* maximum NLMS filter length in taps. A longer filter length gives
* better Echo Cancellation, but slower convergence speed and
* needs more CPU power (Order of NLMS is linear) */
#define NLMS_LEN (80*8)
/* convergence speed. Range: >0 to <1 (0.2 to 0.7). Larger values give
* more AEC in lower frequencies, but less AEC in higher frequencies. */
const REAL Stepsize = 0.7f;
/* minimum energy in xf. Range: M70dB_PCM to M50dB_PCM. Should be equal
* to microphone ambient Noise level */
const REAL Min_xf = M75dB_PCM;
/* Double Talk Detector Speaker/Microphone Threshold. Range <=1
* Large value (M0dB) is good for Single-Talk Echo cancellation,
* small value (M12dB) is good for Double-Talk AEC */
const REAL GeigelThreshold = M6dB;
/* Double Talk Detector hangover in taps. Not relevant for Single-Talk
* AEC */
const int Thold = 30 * 8;
/* for Non Linear Processor. Range >0 to 1. Large value (M0dB) is good
* for Double-Talk, small value (M12dB) is good for Single-Talk */
const REAL NLPAttenuation = M12dB;
/* Below this line there are no more design constants */
/* Exponential Smoothing or IIR Infinite Impulse Response Filter */
class IIR_HP {
REAL x;
public:
IIR_HP() { x = 0.0f; };
REAL highpass(REAL in) {
const REAL a0 = 0.01f; /* controls Transfer Frequency */
/* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
x += a0 * (in - x);
return in - x;
};
};
/* 13 taps FIR Finite Impulse Response filter
* Coefficients calculated with
* www.dsptutor.freeuk.com/KaiserFilterDesign/KaiserFilterDesign.html
*/
class FIR_HP13 {
REAL z[14];
public:
FIR_HP13() { memset(this, 0, sizeof(FIR_HP13)); };
REAL highpass(REAL in) {
const REAL a[14] = {
// Kaiser Window FIR Filter, Filter type: High pass
// Passband: 300.0 - 4000.0 Hz, Order: 12
// Transition band: 100.0 Hz, Stopband attenuation: 10.0 dB
-0.043183226f, -0.046636667f, -0.049576525f, -0.051936015f,
-0.053661242f, -0.054712527f, 0.82598513f, -0.054712527f,
-0.053661242f, -0.051936015f, -0.049576525f, -0.046636667f,
-0.043183226f, 0.0f
};
memmove(z+1, z, 13*sizeof(REAL));
z[0] = in;
REAL sum0 = 0.0, sum1 = 0.0;
int j;
for (j = 0; j < 14; j+= 2) {
// optimize: partial loop unrolling
sum0 += a[j] * z[j];
sum1 += a[j+1] * z[j+1];
}
return sum0+sum1;
}
};
/* Recursive single pole IIR Infinite Impulse response filter
* Coefficients calculated with
* http://www.dsptutor.freeuk.com/IIRFilterDesign/IIRFiltDes102.html
*/
class IIR1 {
REAL x, y;
public:
IIR1() { memset(this, 0, sizeof(IIR1)); };
REAL highpass(REAL in) {
// Chebyshev IIR filter, Filter type: HP
// Passband: 3700 - 4000.0 Hz
// Passband ripple: 1.5 dB, Order: 1
const REAL a0 = 0.105831884f;
const REAL a1 = -0.105831884f;
const REAL b1 = 0.78833646f;
REAL out = a0 * in + a1 * x + b1 * y;
x = in;
y = out;
return out;
}
};
/* Recursive two pole IIR Infinite Impulse Response filter
* Coefficients calculated with
* http://www.dsptutor.freeuk.com/IIRFilterDesign/IIRFiltDes102.html
*/
class IIR2 {
REAL x[2], y[2];
public:
IIR2() { memset(this, 0, sizeof(IIR2)); };
REAL highpass(REAL in) {
// Butterworth IIR filter, Filter type: HP
// Passband: 2000 - 4000.0 Hz, Order: 2
const REAL a[] = { 0.29289323f, -0.58578646f, 0.29289323f };
const REAL b[] = { 1.3007072E-16f, 0.17157288f };
REAL out =
a[0] * in +
a[1] * x[0] +
a[2] * x[1] -
b[0] * y[0] -
b[1] * y[1];
x[1] = x[0];
x[0] = in;
y[1] = y[0];
y[0] = out;
return out;
}
};
// Extension in taps to reduce mem copies
#define NLMS_EXT (10*8)
// block size in taps to optimize DTD calculation
#define DTD_LEN 16
class AEC {
// Time domain Filters
IIR_HP hp00, hp1; // DC-level remove Highpass
FIR_HP13 hp0; // 300Hz cut-off Highpass
IIR1 Fx, Fe; // pre-whitening Highpass for x, e
// Geigel DTD (Double Talk Detector)
REAL max_max_x; // max(|x[0]|, .. |x[L-1]|)
int hangover;
// optimize: less calculations for max()
REAL max_x[NLMS_LEN / DTD_LEN];
int dtdCnt;
int dtdNdx;
// NLMS-pw
REAL x[NLMS_LEN + NLMS_EXT]; // tap delayed loudspeaker signal
REAL xf[NLMS_LEN + NLMS_EXT]; // pre-whitening tap delayed signal
REAL w[NLMS_LEN]; // tap weights
int j; // optimize: less memory copies
int lastupdate; // optimize: iterative dotp(x,x)
double dotp_xf_xf; // double to avoid loss of precision
double Min_dotp_xf_xf;
REAL s0avg;
public:
AEC();
/* Geigel Double-Talk Detector
*
* in d: microphone sample (PCM as floating point value)
* in x: loudspeaker sample (PCM as floating point value)
* return: 0 for no talking, 1 for talking
*/
int dtd(REAL d, REAL x);
/* Normalized Least Mean Square Algorithm pre-whitening (NLMS-pw)
* The LMS algorithm was developed by Bernard Widrow
* book: Widrow/Stearns, Adaptive Signal Processing, Prentice-Hall, 1985
*
* in mic: microphone sample (PCM as floating point value)
* in spk: loudspeaker sample (PCM as floating point value)
* in update: 0 for convolve only, 1 for convolve and update
* return: echo cancelled microphone sample
*/
REAL nlms_pw(REAL mic, REAL spk, int update);
/* Acoustic Echo Cancellation and Suppression of one sample
* in d: microphone signal with echo
* in x: loudspeaker signal
* return: echo cancelled microphone signal
*/
int doAEC(int d, int x);
float getambient() {
return s0avg;
};
void setambient(float Min_xf) {
dotp_xf_xf = Min_dotp_xf_xf = NLMS_LEN * Min_xf * Min_xf;
};
};
#define _AEC_H
#endif
/***************** A.2 APPENDIX aec.cpp *****************/
/* aec.cpp
*
* Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
*
* Acoustic Echo Cancellation NLMS-pw algorithm
*
* Version 1.3 filter created with www.dsptutor.freeuk.com
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "aec.h"
/* Vector Dot Product */
REAL dotp(REAL a[], REAL b[]) {
REAL sum0 = 0.0, sum1 = 0.0;
int j;
for (j = 0; j < NLMS_LEN; j+= 2) {
// optimize: partial loop unrolling
sum0 += a[j] * b[j];
sum1 += a[j+1] * b[j+1];
}
return sum0+sum1;
}
AEC::AEC()
{
max_max_x = 0.0f;
hangover = 0;
memset(max_x, 0, sizeof(max_x));
dtdCnt = dtdNdx = 0;
memset(x, 0, sizeof(x));
memset(xf, 0, sizeof(xf));
memset(w, 0, sizeof(w));
j = NLMS_EXT;
lastupdate = 0;
s0avg = M80dB_PCM;
setambient(Min_xf);
}
REAL AEC::nlms_pw(REAL mic, REAL spk, int update)
{
REAL d = mic; // desired signal
x[j] = spk;
xf[j] = Fx.highpass(spk); // pre-whitening of x
// calculate error value
// (mic signal - estimated mic signal from spk signal)
REAL e = d - dotp(w, x + j);
REAL ef = Fe.highpass(e); // pre-whitening of e
// optimize: iterative dotp(xf, xf)
dotp_xf_xf += (xf[j]*xf[j] - xf[j+NLMS_LEN-1]*xf[j+NLMS_LEN-1]);
if (update) {
// calculate variable step size
REAL mikro_ef = Stepsize * ef / dotp_xf_xf;
// update tap weights (filter learning)
int i;
for (i = 0; i < NLMS_LEN; i += 2) {
// optimize: partial loop unrolling
w[i] += mikro_ef*xf[i+j];
w[i+1] += mikro_ef*xf[i+j+1];
}
}
if (--j < 0) {
// optimize: decrease number of memory copies
j = NLMS_EXT;
memmove(x+j+1, x, (NLMS_LEN-1)*sizeof(REAL));
memmove(xf+j+1, xf, (NLMS_LEN-1)*sizeof(REAL));
}
return e;
}
int AEC::dtd(REAL d, REAL x)
{
// optimized implementation of max(|x[0]|, |x[1]|, .., |x[L-1]|):
// calculate max of block (DTD_LEN values)
x = fabsf(x);
if (x > max_x[dtdNdx]) {
max_x[dtdNdx] = x;
if (x > max_max_x) {
max_max_x = x;
}
}
if (++dtdCnt >= DTD_LEN) {
dtdCnt = 0;
// calculate max of max
max_max_x = 0.0f;
for (int i = 0; i < NLMS_LEN/DTD_LEN; ++i) {
if (max_x[i] > max_max_x) {
max_max_x = max_x[i];
}
}
// rotate Ndx
if (++dtdNdx >= NLMS_LEN/DTD_LEN) dtdNdx = 0;
max_x[dtdNdx] = 0.0f;
}
// The Geigel DTD algorithm with Hangover timer Thold
if (fabsf(d) >= GeigelThreshold * max_max_x) {
hangover = Thold;
}
if (hangover) --hangover;
return (hangover > 0);
}
int AEC::doAEC(int d, int x)
{
REAL s0 = (REAL)d;
REAL s1 = (REAL)x;
// Mic Highpass Filter - to remove DC
s0 = hp00.highpass(s0);
// Mic Highpass Filter - telephone users are used to 300Hz cut-off
s0 = hp0.highpass(s0);
// ambient mic level estimation
s0avg += 1e-4f*(fabsf(s0) - s0avg);
// Spk Highpass Filter - to remove DC
s1 = hp1.highpass(s1);
// Double Talk Detector
int update = !dtd(s0, s1);
// Acoustic Echo Cancellation
s0 = nlms_pw(s0, s1, update);
// Acoustic Echo Suppression
if (update) {
// Non Linear Processor (NLP): attenuate low volumes
s0 *= NLPAttenuation;
}
// Saturation
if (s0 > MAXPCM) {
return (int)MAXPCM;
} else if (s0 < -MAXPCM) {
return (int)-MAXPCM;
} else {
return (int)roundf(s0);
}
}
/***************** A.3 APPENDIX aec_test.cpp *****************/
/* aec_test.cpp
*
* Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
*
* Test stub for Acoustic Echo Cancellation NLMS-pw algorithm
* Author: Andre Adrian, DFS Deutsche Flugsicherung
*
*
* compile
c++ -O2 -o aec_test aec_test.cpp aec.cpp -lm
*
* Version 1.3 set/get ambient in dB
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "aec.h"
#define TAPS (80*8)
typedef signed short MONO;
typedef struct {
signed short l;
signed short r;
} STEREO;
float dB2q(float dB)
{
/* Dezibel to Ratio */
return powf(10.0f, dB / 20.0f);
}
float q2dB(float q)
{
/* Ratio to Dezibel */
return 20.0f * log10f(q);
}
/* Read a raw audio file (8KHz sample frequency, 16bit PCM, stereo)
* from stdin, echo cancel it and write it to stdout
*/
int main(int argc, char *argv[])
{
STEREO inbuf[TAPS], outbuf[TAPS];
fprintf(stderr, "usage: aec_test [ambient in dB] <in.raw >out.raw\n");
AEC aec;
if (argc >= 2) {
aec.setambient(MAXPCM*dB2q(atof(argv[1])));
}
int taps;
while (taps = fread(inbuf, sizeof(STEREO), TAPS, stdin)) {
int i;
for (i = 0; i < taps; ++i) {
int s0 = inbuf[i].l; /* left channel microphone */
int s1 = inbuf[i].r; /* right channel speaker */
/* and do NLMS */
s0 = aec.doAEC(s0, s1);
/* copy back */
outbuf[i].l = 0; /* left channel silence */
outbuf[i].r = s0; /* right channel echo cancelled mic */
}
fwrite(outbuf, sizeof(STEREO), taps, stdout);
}
float ambient = aec.getambient();
float ambientdB = q2dB(ambient / 32767.0f);
fprintf(stderr, "Ambient = %2.0f dB\n", ambientdB);
fflush(NULL);
return 0;
}
/***************** A.4 APPENDIX Compile source code *****************/
On a Linux system with GNU C++ compiler enter:
g++ aec_test.cpp aec.cpp -o aec_test -lm
/***************** A.5 APPENDIX Test source code *****************/
The microphone and loudspeaker signals have to be synchronized on a
sample-to-sample basis for acoustic echo cancellation to work.
An AC97-compliant on-board soundcard in a Personal Computer can be set
to a special stereo mode: The left channel records the microphone
signal and the right channel records the loudspeaker signal.
To set-up a Linux PC with ALSA sound system, microphone connected to
Mic in and loudspeaker connected to right Line out enter:
amixer -q set 'Master',0 50% unmute
amixer -q set 'PCM',0 80% unmute
amixer -q set 'Line',0 0% mute
amixer -q set 'CD',0 0% mute
amixer -q set 'Mic',0 0% mute
amixer -q set 'Video',0 0% mute
amixer -q set 'Phone',0 0% mute
amixer -q set 'PC Speaker',0 0% mute
amixer -q set 'Aux',0 0% mute
amixer -q set 'Capture',0 50%,0%
amixer -q set 'Mic Boost (+20dB)',0 1
amixer -q cset iface=MIXER,name='Capture Source' 0,5
amixer -q cset iface=MIXER,name='Capture Switch' 1
To test the acoustic echo cancellation we simulate a real telephone
conversation in 5 steps:
(1) record far-end speaker,
(2) perform acoustic echo cancellation (this should change nothing)
(3) playback far-end speaker and at the same time record near-end spk.
(4) perform acoustic echo cancellation
(5) playback near-end speaker (far-end speech should be cancelled)
To record 10 seconds of speech into the file b.raw enter:
arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b.raw
To perform AEC at the far-end enter:
./aec_test <b.raw >b1.raw
To playback file b1.raw and simultaneously record b2.raw enter both
commands in one go:
aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b1.raw &
arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b2.raw
To perform AEC at the near-end enter:
./aec_test <b2.raw >b3.raw
To playback the echo-cancelled near-end enter:
aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b3.raw