Blind Source Separation - Finding Needles in Haystacks


Blind Source Separation: Finding Needles in Haystacks

Scott C. Douglas
Department of Electrical Engineering
Southern Methodist University
douglas@lyle.smu.edu

Signal Mixtures are Everywhere

• Cell Phones
• Radio Astronomy
• Brain Activity
• Speech/Music

How do we make sense of it all?

Example: Speech Enhancement

Example: Wireless Signal Separation


Outline of Talk

• Blind Source Separation: general concepts and approaches
• Convolutive Blind Source Separation: application to multi-microphone speech recordings
• Complex Blind Source Separation: what differentiates the complex-valued case
• Conclusions

Blind Source Separation (BSS): A Simple Math Example

[Block diagram: s(k) → A → x(k) → B → y(k)]

• Let s_1(k), s_2(k), …, s_m(k) be signals of interest.
• Measurements: for 1 ≤ i ≤ m,
  x_i(k) = a_i1 s_1(k) + a_i2 s_2(k) + … + a_im s_m(k)
• Sensor noise is neglected.
• Dispersion (echo/reverberation) is absent.
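As a concrete, hedged illustration of this mixing model, here is a minimal Python/NumPy sketch; the talk's own code is in MATLAB, and the source statistics, matrix values, and sizes below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 3, 10000                    # number of sources/sensors and number of samples

# Illustrative non-Gaussian sources s_1(k), ..., s_m(k) (Laplacian amplitudes)
s = rng.laplace(size=(m, N))

# Unknown square mixing matrix A (here just a random example)
A = rng.normal(size=(m, m))

# Instantaneous, noise-free measurements: x_i(k) = a_i1 s_1(k) + ... + a_im s_m(k)
x = A @ s
```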

Blind Source Separation Example (continued)

[Block diagram: s(k) → A → x(k) → B → y(k)]

• Can show: the s_i(k)'s can be recovered as
  y_i(k) = b_i1 x_1(k) + b_i2 x_2(k) + … + b_im x_m(k)
  up to permutation and scaling factors (the matrix B "is like" the inverse of matrix A).
• Problem: How do you find the demixing b_ij's when you don't know the mixing a_ij's or the s_j(k)'s?
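To make the permutation/scaling ambiguity concrete: any demixer of the form B = P D A^(-1), with P a permutation matrix and D an invertible diagonal matrix, returns the sources exactly, just reordered and rescaled. A hedged NumPy sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 3, 10000
s = rng.laplace(size=(m, N))         # unknown sources
A = rng.normal(size=(m, m))          # unknown mixing matrix
x = A @ s                            # observed mixtures

P = np.eye(m)[[2, 0, 1]]             # an arbitrary permutation
D = np.diag([0.5, -2.0, 3.0])        # arbitrary nonzero scalings
B = P @ D @ np.linalg.inv(A)         # one of infinitely many valid demixers

y = B @ x
print(np.allclose(y, P @ D @ s))     # True: sources recovered up to permutation/scaling
```

A blind algorithm must find such a B from x alone, which is the problem the rest of the talk addresses.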

Why Blind Source Separation?

(Why not Traditional Beamforming?)

• BSS requires no knowledge of sensor geometry. The system can be uncalibrated, with unmatched sensors.
• BSS does not need knowledge of source positions relative to the sensor array.
• BSS requires little to no knowledge of signal types; decisions/detections can be pushed to the end of the processing chain.

What Properties Are Necessary for BSS to Work?

• Separation can be achieved when (# of sensors) ≥ (# of sources).
• The talker signals {s_j(t)} are statistically independent of each other and
  - are non-Gaussian in amplitude, OR
  - have spectra that differ from each other, OR
  - are non-stationary.

Statistical independence is the critical assumption.

Entropy is the Key to Source Separation

Entropy: a measure of regularity.
• In physics, entropy increases (less order).
• In biology, entropy decreases (more order).
• In BSS, separated signals are demixed and have "more order" as a group.

First used in 1996 for speech separation.

Convolutive Blind Source Separation

• Mixing system is dispersive [mixing model equation shown on slide].
• Separation system: B(z) is a multichannel filter.

Goal of Convolutive BSS

• Key idea: For convolutive BSS, sources are arbitrarily filtered and arbitrarily shuffled.

Non-Gaussian-Based Blind Source Separation

Basic Goal: Make the output signals look non-Gaussian, because mixtures look "more Gaussian" (from the Central Limit Theorem).

Criteria based on this goal:
- Density modeling
- Contrast functions
- Property restoral [e.g., the (Non-)Constant Modulus Algorithm]

Implications:
- Separating capability of the criteria will be similar.
- Implementation details (e.g., optimization strategy) will yield performance differences.
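A quick numerical check of the Central Limit Theorem intuition above (an illustrative sketch, not code from the talk): non-Gaussian sources such as Laplacian signals have large excess kurtosis, while mixtures of them typically have kurtosis much closer to zero, i.e., they look "more Gaussian."

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis; approximately zero for a Gaussian signal."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2)**2 - 3.0

rng = np.random.default_rng(1)
m, N = 4, 200000
s = rng.laplace(size=(m, N))             # Laplacian sources: excess kurtosis near 3
x = rng.normal(size=(m, m)) @ s          # random mixtures of the sources

print([round(excess_kurtosis(si), 2) for si in s])   # roughly 3 for each source
print([round(excess_kurtosis(xi), 2) for xi in x])   # typically much closer to 0
```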


BSS for Convolutive Mixtures

Idea: Translate the separation task into the frequency domain and apply multiple independent instantaneous BSS procedures.
- Does not work due to permutation problems.

A Better Idea: Reformulate the separation task in the context of multichannel filtering.
- The separation criterion "stays" in the time domain, so there is no implied permutation problem.
- Can still employ fast convolution methods for efficient implementation (see the sketch below).
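A minimal sketch of the fast-convolution point above, assuming a multichannel FIR separation system stored as an array B[i, j] of filters (Python/SciPy; the channel count, filter length, and coefficients are placeholder assumptions, not the talk's algorithm):

```python
import numpy as np
from scipy.signal import fftconvolve

def multichannel_separate(x, B):
    """Apply a time-domain multichannel FIR separation system.

    x : (m, N) array of sensor signals
    B : (m, m, L) array of filters; B[i, j] filters input channel j into output i
    Returns the (m, N) outputs y_i(k) = sum_j (B[i, j] * x_j)(k).
    """
    m, N = x.shape
    y = np.zeros((m, N))
    for i in range(m):
        for j in range(m):
            # FFT-based convolution of channel j with filter B[i, j]
            y[i] += fftconvolve(x[j], B[i, j], mode="full")[:N]
    return y

# Toy usage with random data and filters
rng = np.random.default_rng(2)
x = rng.normal(size=(2, 8000))
B = rng.normal(size=(2, 2, 64)) / 64
y = multichannel_separate(x, B)
```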

Natural Gradient Convolutive BSS Algorithm [Amari/Douglas/Cichocki/Yang 1997]

[Coefficient update equation shown on slide], where f(y) is a simple vector-valued nonlinearity.

Criterion: density-based (maximum likelihood)
Complexity: about four multiply/adds per tap
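For intuition only, here is a hedged sketch of the corresponding natural gradient update in the simpler instantaneous (non-convolutive) case, B <- B + mu (I - f(y) y^T) B with a tanh nonlinearity for super-Gaussian (speech-like) sources; the algorithm cited above extends this idea to multichannel FIR filters. The step size, nonlinearity, and data are illustrative assumptions:

```python
import numpy as np

def natural_gradient_ica(x, mu=0.002, n_passes=20, seed=0):
    """Instantaneous natural-gradient ICA (a simplification of the convolutive algorithm)."""
    m, N = x.shape
    B = np.eye(m)
    rng = np.random.default_rng(seed)
    for _ in range(n_passes):
        for k in rng.permutation(N):
            y = B @ x[:, k]                                # current output sample
            fy = np.tanh(y)                                # simple vector nonlinearity f(y)
            B += mu * (np.eye(m) - np.outer(fy, y)) @ B    # natural gradient step
    return B

# Toy usage: separate a 2x2 instantaneous mixture of Laplacian sources
rng = np.random.default_rng(3)
s = rng.laplace(size=(2, 5000))
x = rng.normal(size=(2, 2)) @ s
B = natural_gradient_ica(x)
y = B @ x    # approximates the sources up to permutation and scaling
```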

Blind Source Separation Toolbox

• A MATLAB toolbox of robust source separation algorithms for noisy convolutive mixtures (developed under govt. contract).
• Allows us to evaluate relationships and tradeoffs between different approaches easily and rapidly.
• Used to determine when a particular algorithm or approach is appropriate for a particular (acoustic) measurement scenario.

Speech Enhancement Methods

• Classic (frequency-selective) linear filtering:
  - Only useful for the simplest of situations.
• Single-microphone spectral subtraction (see the sketch after this slide):
  - Only useful if the signal is reasonably well separated to begin with (> 5 dB SINR).
  - Tends to introduce "musical" artifacts.

Research Focus: How to leverage multiple microphones to achieve robust signal enhancement with minimal knowledge.
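For reference, a minimal single-microphone spectral-subtraction sketch of the kind criticized above (Python/SciPy): the noise magnitude spectrum is estimated from an assumed noise-only leading segment and subtracted from the STFT magnitude, with a spectral floor. The frame length, floor, and noise-only assumption are illustrative, and this simple form readily produces the "musical" artifacts mentioned:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_seconds=0.5, nperseg=256, floor=0.05):
    """Basic magnitude spectral subtraction with a spectral floor."""
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    hop = nperseg // 2                                       # default 50% overlap
    n_noise_frames = max(1, int(noise_seconds * fs / hop))
    noise_mag = np.abs(Z[:, :n_noise_frames]).mean(axis=1, keepdims=True)
    mag = np.abs(Z)
    cleaned_mag = np.maximum(mag - noise_mag, floor * mag)   # subtract, keep a floor
    Z_clean = cleaned_mag * np.exp(1j * np.angle(Z))         # reuse the noisy phase
    _, enhanced = istft(Z_clean, fs=fs, nperseg=nperseg)
    return enhanced
```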


Novel Techniques for Speech Enhancement

Blind Source Separation: Find all the talker signals in the room - loud and soft, high- and low-pitched, near and far away … without knowledge of any of these characteristics.

Multi-Microphone Signal Enhancement:

Using only the knowledge of “target present” or “target absent” labels on the data, pull out the target signal from the noisy background.

SMU Multimedia Systems Lab Acoustic Facility

• Room (Nominal Configuration)
  - Acoustically treated, RT = 300 ms
  - Non-parallel walls to prevent flutter echo
• Sources
  - Loudspeakers playing recordings as well as "live" talkers
  - Distance to mics: 50 cm
  - Angles: -30°, 0°, 27.5°
• Sensors
  - Omnidirectional microphones (AT803b)
  - Linear array (4 cm spacing)

Data collection and processing entirely within MATLAB. Allows for careful characterization, fast evaluation, and experimentation with artificial and human talkers.

Blind Source Separation Example

Talker 1 (MG) and Talker 2 (SCD) → Convolutive Mixing (Room) → Separation System (Code)

Performance improvement: between 10 dB and 15 dB for "equal-level" mixtures, and even higher for unequal-level ones.

Unequal Power Scenario Results

Time-domain CBSS methods provide the greatest SIR improvements for weak sources; no significant improvement in SIR if the initial SIR is already large

Multi-Microphone Speech Enhancement

[Block diagram: a speech source and noise sources reach microphones y_1, …, y_n; linear processing driven by an adaptive algorithm produces outputs z_1, …, z_n, where z_1 contains most of the speech and z_n contains most of the noise.]

Speech Enhancement via Iterative Multichannel Filtering

• System output at time k: a linear adaptive filter [equation shown on slide].
• [The filter coefficients form] a sequence of (n x n) matrices at iteration k.
• Goal: adapt [these matrices] over time such that the multichannel output contains signals with maximum speech energy in the first output.

Multichannel Speech Enhancement Algorithm

• A novel* technique for enhancing target speech in noise using two or more microphones via joint decorrelation.
• Requires a rough target identifier (i.e., when talker speech is present).
• Is adaptive to changing noise characteristics.
• Knowledge of source locations, microphone positions, and other characteristics is not needed.

Details in [Gupta and Douglas, IEEE Trans. Audio, Speech, Lang. Proc., May 2009]. *Patent pending.
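The actual method is specified in the paper cited above. As a hedged illustration of the joint-decorrelation idea, one common way to use only "target present"/"target absent" labels is a generalized eigenvalue decomposition between speech-present and noise-only covariance matrices, steering the first output along the dominant generalized eigenvector. Everything below (names, the simple covariance estimates, the static GEVD itself) is an assumption for illustration, not the patented algorithm:

```python
import numpy as np
from scipy.linalg import eigh

def gevd_enhancer(y, speech_present):
    """y: (n_mics, N) real-valued recording; speech_present: length-N boolean mask.

    Returns a demixing matrix whose first row maximizes the ratio of
    speech-present energy to noise-only energy (a generalized Rayleigh quotient).
    """
    Rs = np.cov(y[:, speech_present])        # covariance over speech-present samples
    Rn = np.cov(y[:, ~speech_present])       # covariance over noise-only samples
    # Generalized symmetric eigenproblem Rs w = lambda Rn w (eigenvalues ascend)
    _, W = eigh(Rs, Rn)
    return W[:, ::-1].T                      # rows ordered from most to least "speechy"

# Usage sketch: z = gevd_enhancer(y, labels) @ y, where z[0] carries most of the speech
```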


Performance Evaluations

Setup 1 (Acoustic Lab):
• Room: acoustically treated, RT = 300 ms; non-parallel walls to prevent flutter echo
• Sources: loudspeakers playing BBC recordings (Fs = 8 kHz), 1 male talker, 1-2 noise sources; distance to mics: 1.3 m; angles: -30°, 0°, 27.5°
• Sensors: linear array, adjustable (4 cm spacing)

Setup 2 (Conference Room):
• Room: ordinary conference room (RT = 600 ms)
• Sources: loudspeakers playing BBC recordings (Fs = 8 kHz), 1 male talker, 1-2 noise sources; angles: -15°, 15°, 30°
• Sensors: omnidirectional microphones (AT803b); linear array, adjustable (4 cm nominal spacing)

Audio Examples

• Acoustic Lab: initial SIR = -10 dB, 3-mic system (before/after clips)
• Acoustic Lab: initial SIR = 0 dB, 2-mic system (before/after clips)
• Conference Room: initial SIR = -10 dB, 3-mic system (before/after clips)
• Conference Room: initial SIR = 5 dB, 2-mic system (before/after clips)

Effect of Noise Segment Length on Overall Performance

Diffuse Noise Source Example

• Noise source: SMU campus-wide air handling system.
• Data was recorded using a simple two-channel portable M-Audio recorder (16-bit, 48 kHz) with its associated "T"-shaped omnidirectional stereo array at arm's length, then downsampled to 8 kHz.

Air Handler Data Processing

• Step 1: Spatio-temporal GEVD processing on a frame-by-frame basis with L = 256, where Rv(k) = Ry(k-1); that is, data was whitened to the previous frame.
• Step 2: Least-squares multichannel linear prediction was used to remove tones.
• Step 3: Log-STSA spectral subtraction was applied to the first output channel.
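As a hedged, single-channel simplification of the tone-removal idea in Step 2: fit least-squares linear-prediction coefficients and keep the prediction residual, which suppresses strongly predictable (tonal) components. The talk's version is multichannel (each channel predicted from the past of all channels); the prediction order and test signal below are illustrative assumptions:

```python
import numpy as np

def lp_residual(x, order=32):
    """Least-squares linear-prediction residual (suppresses tonal components)."""
    N = len(x)
    # Data matrix of past samples: the row for time k holds x[k-1], ..., x[k-order]
    X = np.column_stack([x[order - d - 1 : N - d - 1] for d in range(order)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)   # LS prediction coefficients
    return target - X @ coeffs                            # what the predictor cannot explain

# Toy usage: a steady "air-handler" tone buried in noise is largely removed
fs = 8000
t = np.arange(4 * fs) / fs
x = np.sin(2 * np.pi * 180.0 * t) + 0.3 * np.random.default_rng(4).normal(size=t.size)
e = lp_residual(x, order=32)
```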

Complex Blind Source Separation

[Block diagram: s(k) → A → x(k) → B → y(k)]

• Signal model: x(k) = A s(k).
• Both the s_i(k)'s in s(k) and the elements of A are complex-valued.
• Separating matrix B is complex-valued as well.
• It appears that there is little difference from the real-valued case…

Complex Circular vs. Complex Non-Circular Sources

• (Second-order) circular source: the energies of the real and imaginary parts of s_i(k) are the same.
• (Second-order) non-circular source: the energies of the real and imaginary parts of s_i(k) are not the same.

[Constellation plots on slide: Circular, Circular, Non-Circular]
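A quick numerical check of this definition (an illustrative sketch with assumed example signals): compare the average energies of the real and imaginary parts, or equivalently look at the pseudo-variance E[s^2], which is near zero for second-order circular signals.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100000

# Circular example: equal-power, independent real and imaginary parts
circ = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)

# Non-circular example: a BPSK-like signal (real-valued up to a fixed phase)
noncirc = rng.choice([-1.0, 1.0], size=N) * np.exp(1j * 0.3)

for name, sig in [("circular", circ), ("non-circular", noncirc)]:
    print(name,
          "E[Re^2] =", round(np.mean(sig.real**2), 3),
          "E[Im^2] =", round(np.mean(sig.imag**2), 3),
          "|E[s^2]| =", round(abs(np.mean(sig**2)), 3))
```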

• • •

Why Complex Circularity Matters in Blind Source Separation

• Fact #1: It is possible to separate non-circular sources by decorrelation alone if their non-circularities differ [Eriksson and Koivunen, IEEE Trans. IT, 2006].
• Fact #2: The strong-uncorrelating transform is a unique linear transformation for identifying non-circular source subspaces using only covariance matrices.
• Fact #3: Knowledge of source non-circularity is required to obtain the best performance of a complex BSS procedure.

Complex Fixed Point Algorithm [Douglas 2007]

NOTE: The MATLAB code involves both transposes and Hermitian transposes… and no, those aren't mistakes! (A small illustration of why follows below.)
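One reason both kinds of transpose appear: the sample covariance uses the Hermitian (conjugate) transpose, while the pseudo-covariance, which carries the non-circularity information the algorithm exploits, uses the plain transpose. A small Python/NumPy illustration (not the algorithm's actual MATLAB code; the signals and mixing matrix are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
m, N = 4, 20000

# Non-circular sources (BPSK-like, each with a fixed phase) and a complex mixing matrix
s = rng.choice([-1.0, 1.0], size=(m, N)) * np.exp(1j * rng.uniform(0, np.pi, size=(m, 1)))
A = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
x = A @ s

R = (x @ x.conj().T) / N    # covariance:        Hermitian transpose
P = (x @ x.T) / N           # pseudo-covariance: plain transpose

# Far from zero here because the sources are non-circular; near zero for circular sources
print(np.linalg.norm(P) / np.linalg.norm(R))
```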

Performance Comparisons

Complex BSS Example

[Plots on slide: Original Sources, Sensor Signals, CFPA1 Outputs]

• 16-element ULA, λ/4 spacing; 3000 snapshots
• SINRs per element: -17, -12, -5, -12, -12 (dB)
• DOAs (degrees): -45, 20, -15, 49, 35
• Output SINRs (dB): 7, 24, 18, 15, 23
• Complexity: ~3500 FLOPs per output sample


Conclusions

• Blind Source Separation provides unique capabilities for extracting useful signals from multiple sensor measurements corrupted by noise.
• Little to no knowledge of the sensor array geometry, the source positions, or the source statistics or characteristics is required.
• Algorithm design can be tricky.
• Opportunities exist for applications in speech enhancement, wireless communications, and other areas.

For Further Reading

My publications page at SMU: http://lyle.smu.edu/~douglas/puball.html

Available for download there:
• 82% of my published journal papers
• 75% of my published conference papers