Again in the E step we have to consider two kinds of latent variables, x^s and x^g, that respectively designate the cluster (i.e. the speaker) and the Gaussian component:

\log q(x^s, x^g) = \langle \log p(Y, x^s, x^g \,|\, \theta) \rangle_{q(\theta)} + const

where \langle \cdot \rangle_{q(\theta)} means average w.r.t. q(\theta). Note that q is always understood to be conditioned on the data Y. It can be shown that, as the amount of data grows, the penalty term reduces to (p/2) \log T, where p is the number of free parameters, i.e. the free energy becomes the Bayesian Information Criterion (BIC). To find the optimum q(\theta) and q(X), an EM-like algorithm is proposed, based on the following steps:
\log q(X) = \langle \log p(Y, X \,|\, \theta) \rangle_{q(\theta)} + const \qquad (15)

\log q(\theta) = \log p(\theta) + \langle \log p(Y, X \,|\, \theta) \rangle_{q(X)} + const \qquad (16)
Iteratively applying eq. (15) and eq. (16), it is possible to estimate the variational posteriors for both parameters and hidden variables. If the prior p(\theta) belongs to a conjugate family, the posterior distribution q(\theta) will have the same form as p(\theta). An interesting property of VB learning is that extra degrees of freedom are not used, i.e. the model prunes itself. There are two possible opinions about the correctness of this self-pruning: on the one hand it is not satisfactory, because predictions will not take into account the uncertainty that models with extra parameters can provide; on the other hand it can be used to find the optimal model while learning the model itself, by initializing it with many parameters and letting the model prune those that are not used.
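The self-pruning effect can be seen directly in the Dirichlet posterior over mixture weights: a component that receives no data keeps only its prior pseudo-count, and its geometric-mean weight exp(Ψ(λ_j) − Ψ(Σ_k λ_k)) falls below the naive plug-in ratio λ_j / Σ_k λ_k. The following sketch (illustrative Python, not code from the paper; the digamma approximation is a standard recurrence-plus-asymptotic-series implementation) makes this concrete:

```python
import math

def digamma(x):
    """Digamma function psi(x), using the recurrence psi(x) = psi(x+1) - 1/x
    to push the argument above 6, then an asymptotic series."""
    r = 0.0
    while x < 6.0:                       # move into the asymptotic regime
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f / 252))

# Dirichlet posterior pseudo-counts with prior lambda0 = 1 per component:
# component 0 absorbed 100 frames, component 1 absorbed none (it is pruned).
lam = [1.0 + 100.0, 1.0 + 0.0]
total = sum(lam)

naive = [l / total for l in lam]                             # plug-in weights
vb = [math.exp(digamma(l) - digamma(total)) for l in lam]    # VB geometric weights

# The unused component is down-weighted more aggressively under VB.
print(naive[1], vb[1])
```

Over iterations this extra suppression of unused components is exactly what drives them toward zero weight.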
4. Speaker clustering using VB

In this section we derive the formulas that can be used to estimate the parameters of model (4). Before applying the EM-like algorithm previously described, we have to define prior probabilities on the parameters. Let us therefore define the following prior distributions, which belong to the conjugate family:
p(w) = Dir(w; \lambda_w^0), \quad p(c_i) = Dir(c_i; \lambda_c^0), \quad p(\mu_{ij} \,|\, \Gamma_{ij}) = N(\mu_{ij}; \rho^0, (\xi^0 \Gamma_{ij})^{-1}), \quad p(\Gamma_{ij}) = W(\Gamma_{ij}; \nu^0, \Phi^0) \qquad (17)

where Dir(\cdot) designates a Dirichlet distribution, N(\cdot) a Normal distribution and W(\cdot) a Wishart distribution. The advantage of using probability functions that belong to the conjugate family is that the posterior probabilities will have the same analytical form as the priors. So let us introduce the parameter posterior probabilities:
q(w) = Dir(w; \lambda_w), \quad q(c_i) = Dir(c_i; \lambda_{c_i}), \quad q(\mu_{ij} \,|\, \Gamma_{ij}) = N(\mu_{ij}; \rho_{ij}, (\xi_{ij} \Gamma_{ij})^{-1}), \quad q(\Gamma_{ij}) = W(\Gamma_{ij}; \nu_{ij}, \Phi_{ij}) \qquad (18)
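As a concrete illustration of the conjugacy argument, a prior and its posterior can share a single container type, since they differ only in hyperparameter values. This is an illustrative Python sketch (the class and field names follow the notation above but are otherwise hypothetical, and the 1-D case is assumed so the Wishart scale is a scalar):

```python
from dataclasses import dataclass

@dataclass
class DirichletNormalWishart:
    """Hyperparameters of the conjugate prior/posterior for one Gaussian
    component: Dirichlet count, Normal mean/scale, Wishart dof/scale."""
    lam: float   # Dirichlet pseudo-count for the mixture weight
    rho: float   # location of the component mean
    xi: float    # scaling of the precision on the mean
    nu: float    # Wishart degrees of freedom
    phi: float   # Wishart scale (scalar in the 1-D case)

# A broad prior and the posterior after absorbing 50 frames have the same
# analytical form -- only the hyperparameter values change.
prior = DirichletNormalWishart(lam=1.0, rho=0.0, xi=1.0, nu=1.0, phi=1.0)
posterior = DirichletNormalWishart(lam=51.0, rho=0.8, xi=51.0, nu=51.0, phi=12.3)
```

Keeping one type for both makes the M step a pure map from (prior, sufficient statistics) to a new instance of the same type.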
Figure 1 shows a directed graph that represents the model. It is now possible to apply the EM-like algorithm, which consists in iteratively applying equations (15) and (16).
Developing (19), it is possible to derive formulas similar to formulas (5) and (6), but computed on the basis of the parameter expected values instead of the parameter values themselves. We will designate these expected values with \tilde{w}_i and \tilde{c}_{ij}.
\gamma_{ij}(t) \propto \tilde{w}_i \, \tilde{c}_{ij} \, \tilde{\Gamma}_{ij}^{1/2} \exp\left( -\frac{d}{2\,\xi_{ij}} - \frac{\nu_{ij}}{2} (y_t - \rho_{ij})^T \Phi_{ij}^{-1} (y_t - \rho_{ij}) \right)

with the normalization \sum_{i,j} \gamma_{ij}(t) = 1, and the cluster posterior obtained by marginalizing over the components, \gamma_i(t) = \sum_j \gamma_{ij}(t),
where d is the dimension of the acoustic vectors. The parameter expected values can be computed as follows:
\log \tilde{w}_i = \Psi(\lambda_{w_i}) - \Psi\!\left(\sum_k \lambda_{w_k}\right)

\log \tilde{c}_{ij} = \Psi(\lambda_{c_{ij}}) - \Psi\!\left(\sum_k \lambda_{c_{ik}}\right)

\log \tilde{\Gamma}_{ij} = \sum_{k=1}^{d} \Psi\!\left(\frac{\nu_{ij} + 1 - k}{2}\right) + d \log 2 - \log |\Phi_{ij}|
where \Psi(\cdot) is the digamma function. In the M step, we know that the posterior distributions will have the same form as the prior distributions. The reestimation formulas for the parameters are given by:
N_{ij} = \sum_t \gamma_{ij}(t) \qquad (28)

\bar{c}_{ij} = \frac{N_{ij}}{\sum_k N_{ik}} \qquad (29)

\bar{\mu}_{ij} = \frac{1}{N_{ij}} \sum_t \gamma_{ij}(t) \, y_t \qquad (30)

\bar{\Sigma}_{ij} = \frac{1}{N_{ij}} \sum_t \gamma_{ij}(t) \, (y_t - \bar{\mu}_{ij})(y_t - \bar{\mu}_{ij})^T \qquad (31)
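The responsibility-weighted statistics above can be sketched in a 1-dimensional form (illustrative Python, not from the paper; `gammas` holds the E-step responsibilities \gamma_{ij}(t) of a single component over the frames):

```python
def sufficient_statistics(gammas, y):
    """Responsibility-weighted statistics for one Gaussian component,
    1-D case: soft count, weighted mean, weighted variance."""
    N = sum(gammas)                                          # soft count
    mu_bar = sum(g * yt for g, yt in zip(gammas, y)) / N     # weighted mean
    sigma_bar = sum(g * (yt - mu_bar) ** 2
                    for g, yt in zip(gammas, y)) / N         # weighted variance
    return N, mu_bar, sigma_bar
```

With all responsibilities equal to one, these reduce to the ordinary sample count, mean, and variance, which is a convenient sanity check.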
and hyperparameter reestimation formulas are given by:
\lambda_{c_{ij}} = \lambda_c^0 + N_{ij}, \qquad \lambda_{w_i} = \lambda_w^0 + \sum_j N_{ij} \qquad (32)

\xi_{ij} = \xi^0 + N_{ij}, \qquad \nu_{ij} = \nu^0 + N_{ij} \qquad (33)

\rho_{ij} = \frac{\xi^0 \rho^0 + N_{ij} \, \bar{\mu}_{ij}}{\xi^0 + N_{ij}} \qquad (34)

\Phi_{ij} = \Phi^0 + N_{ij} \, \bar{\Sigma}_{ij} + \frac{\xi^0 N_{ij}}{\xi^0 + N_{ij}} (\bar{\mu}_{ij} - \rho^0)(\bar{\mu}_{ij} - \rho^0)^T \qquad (35)
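A scalar (1-dimensional) sketch of the hyperparameter updates, in illustrative Python with hypothetical names; in the 1-D case the Wishart scale and the outer product reduce to plain numbers:

```python
def update_hyperparameters(N, mu_bar, sigma_bar, lam0, xi0, nu0, rho0, phi0):
    """Conjugate hyperparameter updates for one component, 1-D case.

    N         -- soft count sum_t gamma(t) for this component
    mu_bar    -- responsibility-weighted mean of the data
    sigma_bar -- responsibility-weighted variance of the data
    the *0 arguments are the prior hyperparameters."""
    lam = lam0 + N                                   # Dirichlet pseudo-count
    xi = xi0 + N                                     # mean-precision scaling
    nu = nu0 + N                                     # Wishart degrees of freedom
    rho = (xi0 * rho0 + N * mu_bar) / (xi0 + N)      # shrunk posterior mean
    phi = (phi0 + N * sigma_bar                      # prior scale + scatter
           + (xi0 * N / (xi0 + N)) * (mu_bar - rho0) ** 2)  # mean-shift term
    return lam, xi, nu, rho, phi
```

Two limiting cases are worth noting: with N = 0 the posterior equals the prior (nothing is learned for an unused component, which is the pruning behaviour discussed earlier), and as N grows the posterior mean \rho converges to the data mean \bar{\mu}.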