HAL Id: hal-00416675
https://hal.archives-ouvertes.fr/hal-00416675
Submitted on 29 Oct 2014
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Organizing Gaussian mixture models into a tree for scaling up speaker retrieval
Jamal Rougui, Marc Gelgon, D. Aboutajdine, Noureddine Mouaddib, M. Rziza
To cite this version:
Jamal Rougui, Marc Gelgon, D. Aboutajdine, Noureddine Mouaddib, M. Rziza. Organizing Gaussian mixture models into a tree for scaling up speaker retrieval. Pattern Recognition Letters, Elsevier, 2007, 28 (11), pp.1314-1319. ⟨hal-00416675⟩
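Speaker models: $S_1, \dots, S_M$. Target search cost: $< O(M)$.

Each of the $M$ speakers $k$ is represented by a Gaussian mixture model

$$S_k(x) = \sum_{i=1}^{m_k} w_k^i \, N_k^i(x),$$

where $N_k^i(x)$ is a Gaussian component with mean $\mu_k^i$ and covariance $\Sigma_k^i$, and $w_k^i$ is its weight. $D$ denotes the data. Among the $M$ models, two models $S_1$ and $S_2$ are associated with a common model $S_{12}$.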
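The likelihoods involved are $p(D|S_{12})$, $p(D|S_1)$ and $p(D|S_2)$. The model $S_{12}$ has $m_{12}$ components, whereas $S_1$ and $S_2$ together have $m_1 + m_2$. The criterion is built on the differences

$$E_{S_k}[\ln p(D|S_k)] - E_{S_k}[\ln p(D|S_{12})], \quad k = 1, 2.$$

Since the first term does not depend on $S_{12}$, the estimate $\hat{S}_{12}$ is

$$\hat{S}_{12} = \arg\min_{S_{12}} \left[ -\int S_1(x) \ln S_{12}(x)\, dx - \int S_2(x) \ln S_{12}(x)\, dx \right].$$

This amounts to minimizing $KL(S_{1+2} \| S_{12})$, with $S_{1+2}(x) = \frac{1}{2}\left(S_1(x) + S_2(x)\right)$:

$$\hat{S}_{12} = \arg\min_{S_{12}} \left[ -\int S_{1+2}(x) \ln \frac{S_{12}(x)}{S_{1+2}(x)}\, dx \right],$$

$$\hat{S}_{12} = \arg\min_{S_{12}} \left[ -\sum_{i=1}^{m_1+m_2} w_{1+2}^i \int N_{1+2}^i(x) \ln S_{12}(x)\, dx \right],$$

where $N_{1+2}^i$ is the $i$-th component of $S_{1+2}$.

$KL_m$ is an approximation of the $KL$ divergence between $S_{1+2}$ and $S_{12}$:

$$\hat{S}_{12} = \arg\min_{S_{12}} \left[ KL_m(S_{1+2} \| S_{12}) \right] = \arg\min_{S_{12}} \left[ \sum_{i=1}^{m_1+m_2} w_{1+2}^i \min_{j=1}^{m_{12}} KL(N_{1+2}^i \| N_{12}^j) \right],$$

$$KL_m(S_k \| S_{12}) = \sum_{i=1}^{m_k} w_k^i \min_{j=1}^{m_{12}} KL(N_k^i \| N_{12}^j), \quad k = 1, 2.$$

The $KL$ divergence between two Gaussians $(\mu_1, \Sigma_1)$ and $(\mu_2, \Sigma_2)$ is

$$\frac{1}{2}\left( \log \frac{|\Sigma_2|}{|\Sigma_1|} + Tr(\Sigma_2^{-1} \Sigma_1) + (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) - \delta \right),$$

where $\delta$ is the dimension of the feature space. A mapping $\pi$ associates the $m_1 + m_2$ components of $S_{1+2}$ with the $m_{12}$ $(< m_1 + m_2)$ components of $S_{12}$; $\pi^0$ denotes its initial value.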
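The Gaussian $KL$ divergence and the $KL_m$ sum can be illustrated with a minimal Python sketch. It assumes diagonal covariances (so that the standard library suffices), and the names `kl_gauss_diag` and `kl_m` as well as the representation of a mixture as a list of `(weight, mean, variances)` tuples are illustrative choices, not taken from the paper.

```python
import math

def kl_gauss_diag(mu1, var1, mu2, var2):
    """KL(N1 || N2) for Gaussians with diagonal covariances.

    Assumption: covariances are diagonal and given as lists of variances.
    The '- 1.0' per dimension sums to the '- delta' term of the formula.
    """
    return 0.5 * sum(
        math.log(v2 / v1) + v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )

def kl_m(mix_a, mix_b):
    """KL_m(A || B): for each component of A, the KL divergence to the
    closest component of B, weighted by the component weight in A."""
    return sum(
        w * min(kl_gauss_diag(mu, var, mu_b, var_b) for _, mu_b, var_b in mix_b)
        for w, mu, var in mix_a
    )

# Toy one-dimensional example (hypothetical values).
s1 = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
s12 = [(1.0, [0.0], [1.0])]
print(kl_gauss_diag([0.0], [1.0], [1.0], [1.0]))  # 0.5
print(kl_m(s1, s12))                              # 4.0
```

Because each component of the first mixture is matched to a single component of the second, the cost is one Gaussian-to-Gaussian divergence per pair of components, with no integration over $x$.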
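The initial mapping is $\pi^0$. Dissimilarities between speaker models are gathered in an $M \times M$ matrix whose entry for a pair $(S_1, S_2)$ is the symmetrized divergence $KL_m(S_1 \| S_2) + KL_m(S_2 \| S_1)$. The pair is then represented by $S_{1+2}$ and its reduced model $S_{12}$. A balanced binary tree over the $M$ models has depth $\log_2(M)$, and its construction relies on $KL_m$.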
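As a hedged illustration of the symmetrized $KL_m(S_1 \| S_2) + KL_m(S_2 \| S_1)$ dissimilarity over $M$ models, the sketch below fills an $M \times M$ matrix and picks the closest pair, which is one plausible way to select the next two models to group. Diagonal covariances, the tuple representation of mixtures, and the helper names `symmetric_klm_matrix` and `closest_pair` are assumptions made for the example.

```python
import math

def kl_gauss_diag(mu1, var1, mu2, var2):
    # KL between diagonal-covariance Gaussians (assumption: diagonal covariances).
    return 0.5 * sum(
        math.log(v2 / v1) + v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )

def kl_m(mix_a, mix_b):
    # Weighted sum of the KL from each component of A to its closest component of B.
    return sum(
        w * min(kl_gauss_diag(mu, var, mb, vb) for _, mb, vb in mix_b)
        for w, mu, var in mix_a
    )

def symmetric_klm_matrix(models):
    """M x M matrix of KL_m(Sa || Sb) + KL_m(Sb || Sa)."""
    n = len(models)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d[a][b] = d[b][a] = kl_m(models[a], models[b]) + kl_m(models[b], models[a])
    return d

def closest_pair(d):
    """Indices (a, b), a < b, of the smallest off-diagonal entry."""
    n = len(d)
    return min((d[a][b], a, b) for a in range(n) for b in range(a + 1, n))[1:]

# Three hypothetical single-component speaker models in one dimension.
models = [
    [(1.0, [0.0], [1.0])],
    [(1.0, [1.0], [1.0])],
    [(1.0, [10.0], [1.0])],
]
d = symmetric_klm_matrix(models)
print(closest_pair(d))  # (0, 1)
```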
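The model $S_{12}$ is estimated iteratively. Starting from an initial mapping $\hat{\pi}^0$ with $it = 0$, the model is first re-estimated for the current mapping $\hat{\pi}^{it}$:

$$\hat{S}_{12}^{it} = \arg\min_{S_{12} \in \mathcal{S}_{m_{12}}} KL_m(S_{1+2}, S_{12}, \hat{\pi}^{it}),$$

where $\mathcal{S}_{m_{12}}$ is the set of mixtures with $m_{12}$ components. The parameters of component $j$ of $S_{12}$ are

$$\hat{w}_{12}^j = \sum_{i \in \pi^{-1}(j)} w_{1+2}^i,$$

$$\hat{\mu}_{12}^j = \frac{\sum_{i \in \pi^{-1}(j)} w_{1+2}^i \, \mu_{1+2}^i}{\hat{w}_{12}^j},$$

$$\hat{\Sigma}_{12}^j = \frac{\sum_{i \in \pi^{-1}(j)} w_{1+2}^i \left( \Sigma_{1+2}^i + (\mu_{1+2}^i - \hat{\mu}_{12}^j)(\mu_{1+2}^i - \hat{\mu}_{12}^j)^T \right)}{\hat{w}_{12}^j},$$

where $\pi^{-1}(j)$, here $\hat{\pi}^{-1,it}(j)$, is the set of components of $S_{1+2}$ mapped to component $j$ of $S_{12}$.

Given $\hat{S}_{12}^{it}$, the mapping $\pi^{it+1}$ from $\{1, \dots, m_1 + m_2\}$ to $\{1, \dots, m_{12}\}$, which associates the components of $S_{1+2}$ with those of $\hat{S}_{12}^{it}$, is then updated:

$$\hat{\pi}^{it+1} = \arg\min_{\pi} KL_m(S_{1+2}, \hat{S}_{12}^{it}, \pi),$$

that is, each component $i$ of $S_{1+2}$ is assigned to the closest component $j$ of $\hat{S}_{12}^{it}$:

$$\pi^{it+1}(i) = \arg\min_{j} KL(N_{1+2}^i \| N_{12}^j).$$

The two steps alternate until $\pi^{it+1} = \pi^{it}$.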
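The alternation between the parameter update and the mapping update can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: covariances are taken to be diagonal, mixtures are lists of `(weight, mean, variances)` tuples, and the names `refit`, `reassign` and `reduce_mixture`, the `max_iter` cap and the handling of empty groups (marked `None`) are all choices made for the example.

```python
import math

def kl_gauss_diag(mu1, var1, mu2, var2):
    # KL between diagonal-covariance Gaussians (assumption: diagonal covariances).
    return 0.5 * sum(
        math.log(v2 / v1) + v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )

def refit(components, assign, n_targets):
    """Weight, mean and variance of each target component j, computed from
    the source components i with assign[i] == j (moment matching)."""
    dim = len(components[0][1])
    targets = []
    for j in range(n_targets):
        members = [c for c, a in zip(components, assign) if a == j]
        if not members:
            targets.append(None)  # empty group: no component to estimate
            continue
        w = sum(c[0] for c in members)
        mu = [sum(c[0] * c[1][d] for c in members) / w for d in range(dim)]
        var = [
            sum(c[0] * (c[2][d] + (c[1][d] - mu[d]) ** 2) for c in members) / w
            for d in range(dim)
        ]
        targets.append((w, mu, var))
    return targets

def reassign(components, targets):
    """Map each source component to the closest non-empty target component."""
    return [
        min(
            (kl_gauss_diag(mu, var, t[1], t[2]), j)
            for j, t in enumerate(targets) if t is not None
        )[1]
        for _, mu, var in components
    ]

def reduce_mixture(components, init_assign, n_targets, max_iter=100):
    """Alternate refit and reassign until the mapping stops changing."""
    assign = list(init_assign)
    for _ in range(max_iter):
        targets = refit(components, assign, n_targets)
        new_assign = reassign(components, targets)
        if new_assign == assign:
            break
        assign = new_assign
    return refit(components, assign, n_targets), assign

# Hypothetical S_{1+2}: four 1-D components, deliberately poor initial mapping.
s_1plus2 = [
    (0.25, [0.0], [1.0]), (0.25, [0.2], [1.0]),
    (0.25, [10.0], [1.0]), (0.25, [10.2], [1.0]),
]
s_12, mapping = reduce_mixture(s_1plus2, [0, 1, 0, 1], n_targets=2)
print(mapping)  # [0, 0, 1, 1]
```

In this toy run the mapping settles after one reassignment: the two components near 0 share one target and the two near 10 share the other, each target carrying weight 0.5.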
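[Figure: binary tree of the speaker GMMs built using the similarity criterion; grouping of the speaker GMMs according to the binary tree map, by tree level; binary tree to N-tree transformation.]

The grouping is assessed through the divergence between parent and child models, $KL_m(S_{parent} \| S_{child})$, summed over the tree:

$$\sum_{S_p \in \text{parents}} \; \sum_{S_c \in S_p} KL_m(S_p \| S_c).$$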
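Consider a parent node $S_p$ with children $\{S_1, S_2, \dots\}$. The parent model is such that

$$\log p(D|S_p) \approx \log p(D|S_c).$$

Once $\log p(D|S_p)$ has been computed, the likelihoods $\log p(D|S_k)$ of the children $k$ are approximated from it:

$$\log \tilde{p}(D|S_k) \approx \log p(D|S_p) + KL(S_p \| S_k), \quad k = 1, 2, \dots$$

where $KL(S_p \| S_k)$ is evaluated through $KL_m$. With

$$MinERR = \min_k KL(S_p \| S_k),$$

the approximated values $\log \tilde{p}(D|S_k)$ lie in the interval

$$\left[ \log p(D|S_p) + MinERR, \; \log p(D|S_p) + MaxERR \right].$$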
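The interval can be computed directly from the parent log-likelihood and the parent-to-child divergences, as in the short sketch below. The text only gives the definition of MinERR; taking MaxERR as the maximum of $KL(S_p \| S_k)$ over the children is an assumption made here by analogy, and the function name `likelihood_interval` is hypothetical.

```python
def likelihood_interval(log_p_parent, kl_parent_to_children):
    """Interval [log p(D|Sp) + MinERR, log p(D|Sp) + MaxERR].

    kl_parent_to_children holds KL(Sp || Sk) for each child k.
    Assumption: MaxERR is the maximum of these values (MinERR is the minimum).
    """
    min_err = min(kl_parent_to_children)
    max_err = max(kl_parent_to_children)  # assumed definition of MaxERR
    return (log_p_parent + min_err, log_p_parent + max_err)

# Hypothetical values: one parent log-likelihood and three child divergences.
print(likelihood_interval(-100.0, [0.5, 2.0, 1.25]))  # (-99.5, -98.0)
```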