
Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings

Sree Harsha Yella, Fabio Valente (Idiap Research Institute)

August 31, Interspeech 2011, Florence, Italy

Feature level combination of diarization systems


Speaker Diarization

Speaker diarization addresses the task of "who spoke when":
  Estimation of the number of speakers.
  Identification of the speech segments corresponding to each speaker.

Common approaches:
  HMM/GMM modeling
  Top-down splitting
  Bottom-up clustering

Non-parametric method: the Information Bottleneck (IB) framework.

Complementary nature of systems: as in ASR, diarization systems can be combined.


Combining diarization systems

Piped approaches (initializing a system with the output of another) (Moraru et al., 2002, 2003).
  Do not influence every step in diarization.

Voting between outputs of multiple systems (Tranter, 2005).
  Performs late combination of outputs.

Integrated approaches (Moraru et al., 2003; Bozonnet et al., 2010).
  Require changing some parameters/modules of the individual diarization systems.

Current work: overcomes these problems by performing feature level combination.


TANDEM features used in ASR

[Figure: spectral features s_t → phoneme posteriors p(Y|s_t) → Log + PCA → TANDEM features → HMM/GMM]

Diarization task is unsupervised.

[Figure: spectral features s_1, ..., s_N with the IB diarization output (clusters c1, c2, c3) → relevance variables y_1, ..., y_L, p(Y|s_t) → Log + PCA → HMM/GMM]


Outline of the talk

1. State-of-the-art HMM/GMM diarization
2. Speaker diarization based on IB
3. Information Bottleneck features
4. Experimental setup and results
5. Conclusions


Speaker diarization in an HMM/GMM system

Short-time spectral features (MFCC) as input
Speech/non-speech detection
Uniform segmentation / speaker change detection
Agglomerative clustering using HMM/GMM speaker models with minimum duration
  Nearest clusters according to a distance measure are merged
  Viterbi realignment to smooth cluster boundaries
  Iterates until a stopping criterion is satisfied


IB Objective function

Consider a set of input variables X and associated relevance variables Y.
The clustering representation C:
  maximizes the mutual information with respect to Y, i.e., maximizes I(Y, C)
  is compact, i.e., minimizes I(C, X)

Maximize F = I(Y, C) − β I(C, X)

The solution is obtained through agglomerative clustering.
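As a rough illustration of the objective F = I(Y, C) − β I(C, X), the following Python sketch evaluates it for a hard partition of X. The toy distributions, the function names and the value of β are assumptions made for this example and do not come from the talk.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution given as rows."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p_ab in enumerate(row):
            if p_ab > 0.0:
                mi += p_ab * math.log(p_ab / (row_marg[a] * col_marg[b]))
    return mi

def ib_objective(p_x, p_y_given_x, assignment, n_clusters, beta):
    """F = I(Y, C) - beta * I(C, X) for a hard assignment x -> c."""
    n_y = len(p_y_given_x[0])
    joint_cy = [[0.0] * n_y for _ in range(n_clusters)]
    joint_cx = [[0.0] * len(p_x) for _ in range(n_clusters)]
    for x, c in enumerate(assignment):
        joint_cx[c][x] = p_x[x]  # p(c, x) = p(x) when x is assigned to c
        for y in range(n_y):
            joint_cy[c][y] += p_x[x] * p_y_given_x[x][y]
    return mutual_information(joint_cy) - beta * mutual_information(joint_cx)

# Toy example (assumed values): four segments, two relevance variables.
p_x = [0.25, 0.25, 0.25, 0.25]
p_y_given_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
good = ib_objective(p_x, p_y_given_x, [0, 0, 1, 1], 2, beta=0.1)
bad = ib_objective(p_x, p_y_given_x, [0, 1, 0, 1], 2, beta=0.1)
```

In this toy case both partitions are equally compact, so the partition that groups segments with similar p(Y|x) scores higher because it preserves more information about Y.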


Agglomerative IB (AIB)

Estimate P(Y|X).
Initialization with every element of X as a singleton cluster.
The two clusters (c_i, c_j) whose merge results in the minimum loss of the IB function are merged.
  The loss can be obtained in closed form (JS divergence).
  The relevance variable distributions p(Y|c_i), p(Y|c_j) are averaged to give p(Y|c_new).
The merging continues until a model selection criterion is met.
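The merging procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: only the JS-divergence loss in I(Y, C) is used as the merge cost, the averaging of p(Y|c_i) and p(Y|c_j) is taken to be weighted by the cluster priors, and a fixed target number of clusters (the hypothetical parameter `n_target`) stands in for the model selection criterion. All names and toy values are illustrative.

```python
import math

def kl(p, q):
    """KL divergence KL[p || q] in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def merge_cost(p_ci, p_cj, py_ci, py_cj):
    """Loss (p(ci) + p(cj)) * JS[p(Y|ci), p(Y|cj)] and the merged p(Y|c_new)."""
    p_new = p_ci + p_cj
    wi, wj = p_ci / p_new, p_cj / p_new
    py_new = [wi * a + wj * b for a, b in zip(py_ci, py_cj)]
    js = wi * kl(py_ci, py_new) + wj * kl(py_cj, py_new)
    return p_new * js, py_new

def aib(p_x, p_y_given_x, n_target):
    """Greedy agglomerative merging down to n_target clusters."""
    # Each cluster is (member indices, prior p(c), distribution p(Y|c)).
    clusters = [([i], p, list(py)) for i, (p, py) in enumerate(zip(p_x, p_y_given_x))]
    while len(clusters) > n_target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost, py_new = merge_cost(clusters[i][1], clusters[j][1],
                                          clusters[i][2], clusters[j][2])
                if best is None or cost < best[0]:
                    best = (cost, i, j, py_new)
        _, i, j, py_new = best
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1], py_new)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Toy example (assumed values): four segments, two relevance variables.
p_x = [0.25, 0.25, 0.25, 0.25]
p_y_given_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
partition = sorted(sorted(members) for members, _, _ in aib(p_x, p_y_given_x, 2))
```

On the toy input, the two segments that favour the first relevance variable end up in one cluster and the other two in the second.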


Comparison with HMM/GMM Clustering

            HMM/GMM                              IB
Modeling    a separate GMM for each speaker c    relevance variables Y from a background GMM
Distance    Modified BIC                         JS divergence
Output      mapping X → C                        mapping X → C and p(Y|C)


Information Bottleneck features

[Figure: IB diarization output. Segments x_1, ..., x_6, with frames s_t^1, ..., s_t^6, are labelled with clusters c1, c2, c3 and described over the relevance variables y_1, ..., y_L.]

The frames corresponding to segment x_j are represented as s_t^j.

$$F = [p(Y|s_1^1), \ldots, p(Y|s_t^j), \ldots, p(Y|s_T^N)], \quad t = 1, \ldots, T$$

TANDEM processing can be applied to F:
  The probabilities p(Y|s_t^j) are gaussianized by applying a logarithm.
  PCA is applied to de-correlate and reduce the dimensionality.

The resulting matrix F_IB is referred to as the Information Bottleneck features.
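The log + PCA step can be sketched as follows. This is a minimal sketch under assumptions that are not stated in the talk: every frame simply inherits the relevance-variable distribution of its segment, probabilities are floored before the logarithm, and only the leading principal component is extracted (by power iteration) to keep the example short. All function names and toy values are hypothetical.

```python
import math

def frame_level_log_posteriors(segment_posteriors, frames_per_segment, floor=1e-10):
    """Repeat each segment's distribution over its frames and take the log."""
    rows = []
    for dist, n_frames in zip(segment_posteriors, frames_per_segment):
        log_dist = [math.log(max(p, floor)) for p in dist]
        rows.extend([list(log_dist) for _ in range(n_frames)])
    return rows

def first_principal_component(rows, n_iter=200):
    """Leading PCA direction via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]
    v = [1.0 / (k + 1) for k in range(d)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(r[k] * v[k] for k in range(d)) for r in centered]
    return v, projected

# Toy example (assumed values): three segments of 2, 3 and 1 frames.
segment_posteriors = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.7, 0.2, 0.1]]
rows = frame_level_log_posteriors(segment_posteriors, [2, 3, 1])
direction, features = first_principal_component(rows)
```

Frames from segments with the same distribution receive the same one-dimensional feature, and frames from the differing segment fall on the opposite side of the mean.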


Integration of MFCC and IB features

[Figure: meeting recording → feature extraction (MFCC) → MFCC → IB diarization → p(Y|C) → transformation (log + PCA) → HMM/GMM diarization → diarization output]

The integration can happen in two ways:
  Concatenating IB features to MFCC feature vectors (IB aug).
  Multistream modelling (IB multistr), where clustering is based on the combined likelihood given by

$$w_{mfcc} \log b_c^{mfcc} + w_{F_{IB}} \log b_c^{F_{IB}}$$

  where b_c^{mfcc} and b_c^{F_IB} are GMMs trained on MFCC and F_IB features and (w_mfcc, w_FIB) are the combination weights.
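The multistream score is a weighted sum of two GMM log-likelihoods. A minimal Python sketch, assuming diagonal-covariance GMMs stored as hypothetical `(weights, means, variances)` tuples; the default weights are the (0.9, 0.1) values reported in the tuning, while the toy model is invented for the example.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """log b(x) for a diagonal-covariance GMM, computed with log-sum-exp."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_gauss = -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                               for xi, m, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_gauss)
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def multistream_log_likelihood(x_mfcc, x_fib, gmm_mfcc, gmm_fib,
                               w_mfcc=0.9, w_fib=0.1):
    """w_mfcc * log b_c^mfcc(x_mfcc) + w_FIB * log b_c^FIB(x_fib)."""
    return (w_mfcc * gmm_log_likelihood(x_mfcc, *gmm_mfcc)
            + w_fib * gmm_log_likelihood(x_fib, *gmm_fib))

# Toy model (assumed): a single standard-normal component in one dimension.
toy_gmm = ([1.0], [[0.0]], [[1.0]])
single = gmm_log_likelihood([0.0], *toy_gmm)
combined = multistream_log_likelihood([0.0], [0.0], toy_gmm, toy_gmm)
```

Because the weights sum to one, feeding the same model and the same observation to both streams returns the single-stream log-likelihood, which is a quick sanity check on the combination.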


Experiments and Results

Test dataset: 24 meetings from the NIST RT06/RT07/RT09 evaluation datasets.
19 MFCC features are extracted from beamformed audio.

Speech/non-speech detection is based on the AMIDA system.

Speech/non-speech error (%), all meetings:
Miss    FA     SpNsp
7.3     0.4    7.7

Tuning uses a separate development set:
  Optimal number of PCA components: 2 (covering more than 80% of the PCA variance).
  (w_mfcc, w_FIB) = (0.9, 0.1).


Experiments and Results

Diarization Error Rate (DER): sum of speech/non-speech error and speaker error.

Speaker error (relative improvement over the baseline in parentheses):
Baseline       12.0 (-)
IB aug         13.5 (-12.5%)
IB multistr     9.7 (+19%)

[Figure: per-meeting speaker error (vertical axis, 0 to 30) for Baseline, IB_aug and IB_multistr over the CMU, EDI, IDI, NIST, TNO and VT meetings.]
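The relative figures in parentheses follow directly from the speaker error values. A short Python check, where the helper name is an assumption of this sketch and a positive result means the error went down:

```python
def relative_improvement(baseline, system):
    """Relative reduction of the error with respect to the baseline, in percent."""
    return 100.0 * (baseline - system) / baseline

aug = relative_improvement(12.0, 13.5)       # negative: the error increased
multistr = relative_improvement(12.0, 9.7)   # positive: the error decreased
```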


Conclusions

The paper proposes an effective method for combining diarization systems at the feature level.
The proposed combination method does not require any modification to the original systems.
Two strategies for combining with MFCC features were investigated.
Evaluation results showed that multistream combination decreases the speaker error, whereas simple augmentation increases it.


Thank You

Questions?


Agglomerative IB (AIB)

Input: distribution p(y|x); trade-off parameter β
Output: C_m, an m-partition of X, m ≤ |X|

Initialization: C ≡ X

Main loop: while |C| > 1
  {i, j} = arg min_{i', j'} ΔF(c_i', c_j')
  Merge {c_i, c_j} ⇒ c_r in C

The loss in merging two clusters c_i, c_j: (p(c_i) + p(c_j)) JS[p(Y|c_i), p(Y|c_j)]

Model selection is based on an information-theoretic criterion:
  Minimum Description Length (MDL)
  Normalized Mutual Information (NMI)


IB principle applied to diarization

Input X → fixed-length segments of speech
Relevance variables Y → components of a background GMM

$$f(s) = \sum_{j=1}^{L} w_j \mathcal{N}(s, \mu_j, \Sigma_j)$$

Relevance variable distribution p(y|x) estimated from:

$$p(y_i | s_k) = \frac{w_i \mathcal{N}(s_k, \mu_i, \Sigma_i)}{\sum_{j=1}^{L} w_j \mathcal{N}(s_k, \mu_j, \Sigma_j)}, \quad i = 1, \ldots, L$$

Output of IB diarization:
  Hard partition of X into C clusters (p(c_i|x_j) ∈ {0, 1})
  p(Y|c_i), i = 1, ..., |C|
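The component posteriors p(y_i|s_k) can be computed as in the following Python sketch. It assumes diagonal covariances and works in the log domain for numerical stability; the function names and the two-component toy model are hypothetical.

```python
import math

def log_gaussian_diag(s, mean, var):
    """Log density of a diagonal-covariance Gaussian at the frame s."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(s, mean, var))

def relevance_posteriors(s, weights, means, variances):
    """p(y_i | s) for i = 1..L: normalized weighted component likelihoods."""
    log_num = [math.log(w) + log_gaussian_diag(s, m, v)
               for w, m, v in zip(weights, means, variances)]
    top = max(log_num)
    num = [math.exp(t - top) for t in log_num]
    total = sum(num)
    return [n / total for n in num]

# Toy background GMM (assumed values): two 1-D components centred at 0 and 4.
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]
midway = relevance_posteriors([2.0], weights, means, variances)
near_first = relevance_posteriors([0.0], weights, means, variances)
```

A frame halfway between the two components gets posterior 0.5 for each, while a frame at the first mean is assigned almost entirely to the first component.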


Evaluation

Diarization Error Rate (DER) is used as the metric for diarization:

$$DER = \frac{\sum_{\text{all seg}} dur(seg) \, [\max(N_{ref}(seg), N_{sys}(seg)) - N_{correct}(seg)]}{\sum_{\text{all seg}} dur(seg) \, N_{ref}(seg)}$$

Speech/non-speech error and speaker error

[Figure: example timeline over intervals T1, ..., T10 with reference labels L1, L2, L3 (including an L1/L3 overlap and no-speech regions) and system labels S1, S2, S3.]

Mapping: S1 → L1, S3 → L2, S2 → L3

$$DER = \frac{T2 + T4 + T6 + T8 + T9}{T1 + T2 + T3 + T4 + T7 + 2 \cdot T8 + 2 \cdot T9 + T10}$$
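The DER formula translates directly into code. In this Python sketch each segment is a hypothetical `(duration, n_ref, n_sys, n_correct)` tuple; the segment list is invented for illustration and does not reproduce the timeline of the slide.

```python
def diarization_error_rate(segments):
    """DER from (duration, n_ref, n_sys, n_correct) tuples, one per segment."""
    error = sum(dur * (max(n_ref, n_sys) - n_correct)
                for dur, n_ref, n_sys, n_correct in segments)
    scored = sum(dur * n_ref for dur, n_ref, _, _ in segments)
    return error / scored

segments = [
    (4.0, 1, 1, 1),  # correct speaker
    (1.0, 1, 0, 0),  # missed speech
    (1.0, 0, 1, 0),  # false alarm
    (2.0, 1, 1, 0),  # wrong speaker
    (2.0, 2, 1, 1),  # overlap, only one of the two speakers found
]
der = diarization_error_rate(segments)
```

Here the error time is 1 + 1 + 2 + 2 = 6 and the scored speaker time is 4 + 1 + 2 + 4 = 11, so the DER is 6/11.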


GCC-PHAT

Compute the TDOA between two channels s_i[n] and s_j[n].
The Generalized Cross-Correlation with PHAse Transform is defined as:

$$G_{PHAT}(f) = \frac{S_i(f) S_j^*(f)}{|S_i(f)| \, |S_j(f)|}$$

The TDOA of s_i w.r.t. s_j is estimated as

$$d_{PHAT}(i, j) = \arg\max_d R_{PHAT}(d)$$

where R_PHAT(d) is the inverse Fourier transform of G_PHAT(f).
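A small Python sketch of the GCC-PHAT delay estimate follows. It uses a naive O(n^2) DFT from the standard library so that it stays self-contained, treats the signals as circular, and guards the normalization with a small `eps`; these choices, the function names and the toy signal are assumptions of the sketch.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for short toy signals)."""
    n = len(signal)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat_tdoa(s_i, s_j, eps=1e-12):
    """Lag d (in samples) that maximizes R_PHAT(d) for two equal-length channels."""
    spec_i, spec_j = dft(s_i), dft(s_j)
    g_phat = []
    for a, b in zip(spec_i, spec_j):
        cross = a * b.conjugate()
        g_phat.append(cross / max(abs(cross), eps))  # keep only the phase
    r_phat = [value.real for value in dft(g_phat, inverse=True)]
    n = len(r_phat)
    peak = max(range(n), key=lambda d: r_phat[d])
    return peak if peak <= n // 2 else peak - n  # map the index to a signed lag

# Toy signal (assumed values); s_j is s_i circularly delayed by 3 samples.
s_i = [0.0, 1.0, -0.5, 0.3, 0.8, -0.2, 0.1, 0.0,
       0.4, -0.7, 0.2, 0.0, 0.0, 0.6, -0.3, 0.1]
s_j = s_i[-3:] + s_i[:-3]
lag = gcc_phat_tdoa(s_i, s_j)
```

With this sign convention, a channel s_j that is a copy of s_i delayed by 3 samples yields a lag of -3, and identical channels yield 0.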


IB objective function

Minimize I(C, X) − β I(Y, C)

$$I(C, X) = \sum_{x \in X, c \in C} p(x) p(c|x) \log \frac{p(c|x)}{p(c)} \qquad (1)$$

$$I(Y, C) = \sum_{y \in Y, c \in C} p(c) p(y|c) \log \frac{p(y|c)}{p(y)} \qquad (2)$$

$$\begin{cases} p(c|x) = \dfrac{p(c)}{Z(\beta, x)} \exp(-\beta \cdot KL[p(y|x) \| p(y|c)]) \\ p(y|c) = \dfrac{1}{p(c)} \sum_x p(y|x) p(c|x) p(x) \\ p(c) = \sum_x p(c|x) p(x) \end{cases} \qquad (3)$$


Normalized Mutual Information

$$NMI = \frac{I(Y, C)}{I(Y, X)}$$

Monotonic function of the number of clusters.
Represents the mutual information I(Y, C) preserved by the clustering representation as a fraction of the initial value I(Y, X).

[Figure: normalized mutual information (0 to 1) versus number of clusters (0 to 600).]


Minimum Description Length

Minimize the coding length of the representation:

$$F_{MDL} = L(m) + L(X|m) = N \log \frac{N}{M} + N [H(Y|C) + H(C)]$$


KL Realignment

The initial segmentation is obtained from AIB clustering.
P(Y|C) is estimated from the segmentation:

$$P(y_j | c_i) = \frac{1}{p(c_i)} \sum_{x_t : x_t \in c_i} p(y_j | x_t) p(x_t)$$

The best segmentation is obtained from Viterbi segmentation:

$$c^{opt} = \arg\min_c \sum_t KL[p(Y|x_t) \| p(Y|c_t)] - \log(a_{c_t c_{t+1}})$$
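The realignment can be sketched as a standard Viterbi search in which the local cost is a KL divergence. This Python sketch assumes that the transition penalty −log a is accumulated inside the sum over t, that there is no initial-state prior, and that no minimum duration constraint is enforced; the names and the toy distributions are illustrative only.

```python
import math

def kl(p, q):
    """KL divergence KL[p || q] in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def kl_viterbi(py_x, py_c, trans):
    """Cluster sequence minimizing the summed KL cost plus transition penalties."""
    n_c = len(py_c)
    cost = [kl(py_x[0], py_c[c]) for c in range(n_c)]
    back = []
    for t in range(1, len(py_x)):
        new_cost, pointers = [], []
        for c in range(n_c):
            # Best predecessor k for cluster c, including -log a(k, c).
            prev = min(range(n_c), key=lambda k: cost[k] - math.log(trans[k][c]))
            pointers.append(prev)
            new_cost.append(cost[prev] - math.log(trans[prev][c])
                            + kl(py_x[t], py_c[c]))
        cost = new_cost
        back.append(pointers)
    state = min(range(n_c), key=lambda c: cost[c])
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy example (assumed values): two clusters, six segments.
py_c = [[0.9, 0.1], [0.1, 0.9]]
trans = [[0.9, 0.1], [0.1, 0.9]]
py_x = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55],
        [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
path = kl_viterbi(py_x, py_c, trans)
```

The third segment is locally closer to the second cluster, but the transition penalty keeps it with its neighbours, which is the smoothing effect the realignment is meant to provide.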


Objective function – Realignment

Consider I(X, Y) − I(C, Y):

$$\begin{aligned}
I(X, Y) - I(C, Y) &= \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} - \sum_{y,c} p(y, c) \log \frac{p(y, c)}{p(y) p(c)} \\
&= \sum_{x,y,c} p(x, y, c) \log \frac{p(x, y) p(c)}{p(y, c) p(x)} \\
&= \sum_{x,y,c} p(y|x) p(c|x) p(x) \log \frac{p(y|x)}{p(y|c)} \\
&= \sum_x p(x) \sum_c p(c|x) \sum_y p(y|x) \log \frac{p(y|x)}{p(y|c)} \\
&= \sum_x p(x) \sum_c p(c|x) \, KL(p(Y|x) \| p(Y|c)) \\
&= \sum_{x,c} p(x, c) \, KL(p(Y|x) \| p(Y|c))
\end{aligned} \qquad (4)$$


Sequential Clustering

AIB is a greedy algorithm.
Sequential Information Bottleneck (SIB) refines the objective function for a given partition:

1. A sample is drawn at random from the current partition and represented as a separate (singleton) cluster.
2. This singleton cluster is merged into the cluster that results in the minimum loss of mutual information.
3. Steps 1 and 2 are repeated for all samples until convergence.

We use SIB to refine the output produced with AIB.


SIB Results

             AIB     AIB + SIB
MFCC         17.1    16.6
MFCC+TDOA     9.9     8.6
4 feature     6.7     6.0


Viterbi Realignment

             Before realign.    After realign.
MFCC         24.7               19.1
MFCC+TDOA    11.6                9.9
4 feat        8.3                6.7


JS Divergence

$$I(C, Y) = \sum_c \sum_y p(c) p(y|c) \log \frac{p(y|c)}{p(y)}$$

$$\Delta I_{CY} = I(C^b, Y) - I(C^a, Y)$$

Let c_i and c_j be merged together to obtain \bar{c}:

$$\Delta I_{CY} = p(c_i) \sum_y p(y|c_i) \log \frac{p(y|c_i)}{p(y)} + p(c_j) \sum_y p(y|c_j) \log \frac{p(y|c_j)}{p(y)} - p(\bar{c}) \sum_y p(y|\bar{c}) \log \frac{p(y|\bar{c})}{p(y)}$$

$$p(\bar{c}) = p(c_i) + p(c_j)$$

$$p(y|\bar{c}) p(\bar{c}) = p(y|c_i) p(c_i) + p(y|c_j) p(c_j)$$

$$\Delta I_{CY} = p(\bar{c}) \, JS_\Pi[p(y|c_i) \| p(y|c_j)]$$
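The closed-form result can be checked numerically. The Python sketch below computes I(C, Y) before and after merging two clusters and compares the difference with p(c̄) JS_Π; the three-cluster toy distribution and the function names are assumptions made for this check, with Π taken as the cluster priors normalized by p(c̄).

```python
import math

def mutual_information_cy(p_c, py_c):
    """I(C, Y) = sum_c sum_y p(c) p(y|c) log(p(y|c) / p(y)), in nats."""
    n_y = len(py_c[0])
    p_y = [sum(pc * row[y] for pc, row in zip(p_c, py_c)) for y in range(n_y)]
    return sum(pc * row[y] * math.log(row[y] / p_y[y])
               for pc, row in zip(p_c, py_c) for y in range(n_y) if row[y] > 0.0)

def kl(p, q):
    """KL divergence KL[p || q] in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def weighted_js(p_ci, p_cj, py_ci, py_cj):
    """JS divergence weighted by the normalized priors, plus the merged p(y|c_bar)."""
    p_bar = p_ci + p_cj
    py_bar = [(p_ci * a + p_cj * b) / p_bar for a, b in zip(py_ci, py_cj)]
    js = (p_ci * kl(py_ci, py_bar) + p_cj * kl(py_cj, py_bar)) / p_bar
    return js, py_bar

# Toy example (assumed values): three clusters, three relevance variables.
p_c = [0.5, 0.3, 0.2]
py_c = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]

js, py_bar = weighted_js(p_c[1], p_c[2], py_c[1], py_c[2])
before = mutual_information_cy(p_c, py_c)
after = mutual_information_cy([p_c[0], p_c[1] + p_c[2]], [py_c[0], py_bar])
delta = before - after
```

Up to floating-point error, `delta` equals `(p_c[1] + p_c[2]) * js`, which is why AIB can score candidate merges without recomputing the full mutual information.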

