HAL Id: hal-01722007
https://hal.sorbonne-universite.fr/hal-01722007
Submitted on 2 Mar 2018
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Sparse Coding of Pitch Contours with Deep Auto-Encoders
Nicolas Obin, Julie Beliao
To cite this version:
Nicolas Obin, Julie Beliao. Sparse Coding of Pitch Contours with Deep Auto-Encoders. Speech
Prosody, Mar 2018, Poznan, Poland. �hal-01722007�
Nicolas Obin 1 , Julie Beli˜ao 2
1 IRCAM, CNRS, Sorbonne Universit´e, Paris, France
2 Unbabel, Lisbon, Portugal
Abstract
This paper presents a sparse coding algorithm based on deep auto-encoders for the stylization and the clustering of pitch contours. The main objective of the proposed algorithm is to learn a set of pitch templates that can be easily interpreted by humans and whose combination can approximate efficiently the observed pitch contours. The proposed learning architec- ture is based on deep auto-encoders, commonly used to learn non-linear and low-dimensional latent representations that ap- proximate the observed data. The proposed deep architecture is based on stacked auto-encoders and the sparsity of the net- work is investigated in order to learn a more robust and ge- neral representation of the pitch contours (dropout, denoising auto-encoder, sparsity regularization). The deep auto-encoding of the pitch contours is illustrated and discussed on the TIMIT American-English speech database † with comparison of other existing stylization and clustering algorithms.
Index Terms: speech prosody, pitch contour, sparse coding, deep auto-encoders
1. Introduction
The representation of pitch contours is a fundamental task in the modeling of speech prosody, from the construction of linguistic models for the understanding of speech prosody and its interaction with other linguistic domains to the learning of generative models for text-to-speech synthesis applications [1].
The computation of such representations can be decomposed into: the estimation of the fundamental frequency (F 0 ) (alter- natively, “pitch”) of the speech signal and its segmentation into speech units, the stylization of the pitch variations by removing their undesired variability, and its encoding into meaningful labels [2, 3] or templates [4, 5, 6] that can be interpreted by humans and learned by machines.
On the one side, stylization of pitch contours is generally achieved by preserving the desired variability of the pitch variations, generally described as their slow time variations.
This can be simply done by low-pass filtering the pitch variations [2] but more generally by decomposing the pitch variations onto a set of previously defined bases with some desired properties: typically smoothed or slowly time-varying bases such as polynomial [7, 8, 9], cosinusoidal [10], and B-splines [11]. Mathematically speaking, this decomposition is equivalent to a dimensionality reduction of the observed pitch contours since their are represented into the space defined by the bases, whose number is smaller than the number of values of the pitch contours. Besides, this decomposition may in turn be interpreted as pitch templates from which any observed pitch contours can be reconstructed by linear combination of
†. The resource produced in this paper is freely available on : https://github.com/nicolasobin/.
the bases. The obtained stylization can serve as a “denoising”
prior to some advanced encoding of the pitch contours [2]
or can be used directly as encoding [12, 13]. On the other side, clustering has been used for the prototyping and the visualization of pitch contours. State-of-the-art pitch contour clustering algorithms include: k-means and weighted k-means [14, 15], self-organizing maps (SOM, [16, 17]), functional principal component analysis (FPCA, [18]), and recurrent neural networks (RNN, [5]). Clustering can be seen as a dimensionality reduction technique, in which the bases used for decomposition are no longer fixed a priori but learned from the data. The reduction is achieved by assigning each pitch contour to a cluster, and reducing its variability to the centroid of this cluster. The centroids are representative of the cluster and assumed to encode the meaningful pitch information, and will further be referred to as pitch templates. One important assumption about these clustering techniques is that it is usually a hard clustering: each data is associated to one-and-only-one centroid (k-means, SOM), while the observed pitch contours may actually be the result of a combination of some latent and underlying pitch templates.
This paper establishes a sparse coding algorithm based on
deep neural network (DNN) for the modeling and the repre-
sentation of pitch contours. The main originality of the propo-
sed approach is to learn the bases used for the decomposition
of pitch contours from the speech database, by assuming that
the observed pitch contours result from a (linear or non-linear)
combination of the underlying learned pitch templates. To do
so, the learning of the pitch templates is achieved by deep auto-
encoders, whose principle is to determine a latent representa-
tion of the pitch contours in a low-dimensionality space which
best approximates the observed pitch contour. A deep enco-
der is a deep neural network used for dimensionality reduction
to learn a latent representation of the data in a low dimension
space which best approximates the data. The deep auto-encoder
allows to learn non-linear relationships between the observed
data and the latent representation of the hidden layers. Deep
auto-encoders can be learned by pre-training using restricted
Boltzman machines (RBM-AE [19]) or stacked auto-encoders
(SAE, [20]). Besides, sparsity of the network is generally en-
couraged [21], specializing subparts of the network to some
specific aspects of the data. Sparse networks provides a better
interpretation of the network, compression of the data, and may
also be a way to prevent over-fitting and increase robustness to
noise. Sparsity can be obtained by sparsity regularization [22],
dropout [23], and denoising auto-encoders [24]. In this paper,
sparse auto-encoders will be used in order to improve the inter-
pretability of the pitch templates learned by the network.
The remaining of the paper is organized as follows: in Sec- tion 2 we describe the deep auto-encoder architectures used to encode, with a focus on the sparsity of the network. The pro- posed algorithms are illustrated and discussed in Section 3 on a task of pitch contour stylization and clustering from the TIMIT database [25].
2. Deep Coding of Pitch Contours
The proposed algorithm for the coding of pitch contours is based on deep auto-encoders, a family of unsupervised deep neural networks commonly used for the learning of latent repre- sentation of data in some low-dimensional space which concen- trates the meaningful information used to encode the data. This section presents the fundamentals of a deep auto-encoder and some of its variants, including sparsity regularization and de- noising auto-encoder.
2.1. Auto-encoder
auto-encoder W 1
W' 1
L AE ( x x , x x 0 0 )
q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 ot+1ot+2ot+d
...
ot+d-1116 C HAP T E R 8. D IS C R E T E M O D E LLI NG O F S P E E C H P R O S O D Y
8 .3 .1 Se g m e n ta l H M M s
S egm en tal HM M s [R u ss el an d M o or e, 1985, Le v in son , 1986, G al es an d Y ou n g, 1993, O st en d or f et al ., 1996] w er e in tr o d u ce d in sp ee ch re cogn it ion in w h ic h st at e se q u en ce s ar e ex p lic it ly re p re se n te d as se gm en ts w it h an ex p lic it m o d el lin g of th e se gm en t st at e- o cc u p an cy . S egm en tal HM M is a ge n er al iz at ion of h id d en M ar k ov m o d el (HM M ) th at ad d re ss es tw o p rin ci p al lim it at ion s of th e con v en tion al h id d en M ar k ov m o d el : 1) st at e d u rat ion m o d el lin g, an d 2) as su m p tion of con d it ion al in d ep en d en ce of th e ob se rv at ion se q u en ce gi v en th e st at e se q u en ce .
Le t d efi n e o =[ o 1 ,. .., o T ] an ob se rv at ion se q u en ce of le n gt h T , q =[ q 1 ,. ..q T ] th e as so ci at ed st at e seq u en ce, wh er e q t [1 ,N ], s =[ s 1 ,. ..s M ] th e as so ci at ed se gm en t se q u en ce of le n gt h M , an d d =[ d 1 ,. ..d M ] th e cor re sp on d in g se gm en t d u rat ion s.
A se gm en tal h id d en M ar k ov m o d el is d efi n ed in a sim ilar m an n er to th e con v en tion al h id d en M ar k ov m o d el w it h a re for m u lat ion of th e st at e se q u en ce in to se gm en ts an d th e ad d of an ex p lic it st at e d u rat ion :
=( , A , B , D ) (8. 19)
wh er e:
• is th e in it ial st at e p rob ab ili ty d is tr ib u tion : = { ⇥ i } N i =1
⇥ i = P ( s 1 = i ) i [1 ,N ] (8. 20)
• A is th e se gm en t tr an sit ion p rob ab ili ty d is tr ib u tion : A = { a i,j } N i,j =1
a i,j = P ( s n = j | s n 1 = i ) t [1 ,T ] (8. 21) i, j [1 ,N ]
• B is th e ou tp u t p rob ab ili ty d is tr ib u tion : B = { b i ( s n ) } N i =1
b i ( s n )= P ( o [ t ( n 1 )+1 : t ( n )] | s n = i ) t [1 ,T ] (8. 22) i [1 ,N ]
• D is th e se gm en t d u rat ion p rob ab ili ty d is tr ib u tion : D = { b i ( s n ) } N i =1
[
116 C HAP T E R 8. D IS C R E T E M O D E LLI NG O F S P E E C H P R O S O D Y 8 .3 .1 Se g m e n ta l H M M s S egm en tal HM M s [R u ss el an d M o or e, 1985, Le v in son , 1986, G al es an d Y ou n g, 1993, O st en d or f et al ., 1996] w er e in tr o d u ce d in sp ee ch re cogn it ion in w h ic h st at e se q u en ce s ar e ex p li ci tl y re p re se n te d as se gm en ts w it h an ex p li ci t m o d el li n g of th e se gm en t st at e- o cc u p an cy . S egm en tal HM M is a ge n er al iz at ion of h id d en M ar k ov m o d el (HM M ) th at ad d re ss es tw o p ri n ci p al li m it at ion s of th e con v en ti on al h id d en M ar k ov m o d el : 1) st at e d u rat ion m o d el li n g, an d 2) as su m p ti on of con d it ion al in d ep en d en ce of th e ob se rv at ion se q u en ce gi v en th e st at e se q u en ce . Le t d efi n e o =[ o 1 ,. .., o T ] an ob se rv at ion se q u en ce of le n gt h T , q =[ q 1 ,. .. q T ] th e as so ci at ed st at e seq u en ce, wh er e q t [1 ,N ], s =[ s 1 ,. .. s M ] th e as so ci at ed se gm en t se q u en ce of le n gt h M , an d d =[ d 1 ,. .. d M ] th e cor re sp on d in g se gm en t d u rat ion s. A se gm en tal h id d en M ar k ov m o d el is d efi n ed in a si m il ar m an n er to th e con v en ti on al h id d en M ar k ov m o d el w it h a re for m u lat ion of th e st at e se q u en ce in to se gm en ts an d th e ad d of an ex p li ci t st at e d u rat ion : =( , A , B , D ) (8. 19) wh er e: • is th e in it ial st at e p rob ab il it y d is tr ib u ti on : = { ⇥ i } N i =1 ⇥ i = P ( s 1 = i ) i [1 ,N ] (8. 20) • A is th e se gm en t tr an si ti on p rob ab il it y d is tr ib u ti on : A = { a i,j } N i,j =1 a i,j = P ( s n = j | s n 1 = i ) t [1 ,T ] (8. 21) i, j [1 ,N ] • B is th e ou tp u t p rob ab il it y d is tr ib u ti on : B = { b i ( s n ) } N i =1 b i ( s n )= P ( o [ t ( n 1 )+1 : t ( n )] | s n = i ) t [1 ,T ] (8. 22) i [1 ,N ] • D is th e se gm en t d u rat ion p rob ab il it y d is tr ib u ti on : D = { b i ( s n ) } N i =1 [
d
q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 116CHAPTER8.DISCRETEMODELLINGOFSPEECHPROSODY
8.3.1SegmentalHMMs
SegmentalHMMs[RusselandMoore,1985,Levinson,1986,GalesandYoung,1993,Ostendorfetal.,1996]wereintroducedinspeechrecognitioninwhichstatesequencesareexplicitlyrepresentedassegmentswithanexplicitmodellingofthesegmentstate-occupancy.SegmentalHMMisageneralizationofhiddenMarkovmodel(HMM)thataddressestwoprincipallimitationsoftheconventionalhiddenMarkovmodel:1)statedurationmodelling,and2)assumptionofconditionalindependenceoftheobservationsequencegiventhestatesequence.
Letdefineo=[o1,...,oT]anobservationsequenceoflengthT,q=[q1,...qT]theassociatedstatesequence,whereqt[1,N],s=[s1,...sM]theassociatedsegmentsequenceoflengthM,andd=[d1,...dM]thecorrespondingsegmentdurations.
AsegmentalhiddenMarkovmodelisdefinedinasimilarmannertotheconventionalhiddenMarkovmodelwithareformulationofthestatesequenceintosegmentsandtheaddofanexplicitstateduration:
=(,A,B,D)(8.19)
where:•istheinitialstateprobabilitydistribution:={⇥i}Ni=1
⇥i=P(s1=i)i[1,N](8.20)
•Aisthesegmenttransitionprobabilitydistribution:A={ai,j}Ni,j=1
ai,j=P(sn=j|sn1=i)t[1,T](8.21)
i,j[1,N]
•Bistheoutputprobabilitydistribution:B={bi(sn)}Ni=1
bi(sn)=P(o[t(n1)+1:t(n)]|sn=i)t[1,T](8.22)
i[1,N]
•Disthesegmentdurationprobabilitydistribution:D={bi(sn)}Ni=1
P(o[t+1:t+d]|s,d)to
(8.24)
P(d|s)(8.25)
[ 116CHAPTER8.DISCRETEMODELLINGOFSPEECHPROSODY
8.3.1SegmentalHMMs
SegmentalHMMs[RusselandMoore,1985,Levinson,1986,GalesandYoung,1993,Ostendorfetal.,1996]wereintroducedinspeechrecognitioninwhichstatesequencesareexplicitlyrepresentedassegmentswithanexplicitmodellingofthesegmentstate-occupancy.SegmentalHMMisageneralizationofhiddenMarkovmodel(HMM)thataddressestwoprincipallimitationsoftheconventionalhiddenMarkovmodel:1)statedurationmodelling,and2)assumptionofconditionalindependenceoftheobservationsequencegiventhestatesequence.
Letdefineo=[o1,...,oT]anobservationsequenceoflengthT,q=[q1,...qT]theassociatedstatesequence,whereqt[1,N],s=[s1,...sM]theassociatedsegmentsequenceoflengthM,andd=[d1,...dM]thecorrespondingsegmentdurations.
AsegmentalhiddenMarkovmodelisdefinedinasimilarmannertotheconventionalhiddenMarkovmodelwithareformulationofthestatesequenceintosegmentsandtheaddofanexplicitstateduration:
=(,A,B,D)(8.19)
where:•istheinitialstateprobabilitydistribution:={⇥i}Ni=1
⇥i=P(s1=i)i[1,N](8.20)
•Aisthesegmenttransitionprobabilitydistribution:A={ai,j}Ni,j=1
ai,j=P(sn=j|sn1=i)t[1,T](8.21)
i,j[1,N]
•Bistheoutputprobabilitydistribution:B={bi(sn)}Ni=1
bi(sn)=P(o[t(n1)+1:t(n)]|sn=i)t[1,T](8.22)
i[1,N]
•Disthesegmentdurationprobabilitydistribution:D={bi(sn)}Ni=1
P ( o
[t+1:t+d]|s , d) to
( 8 .2 4 )
P ( d |s) ( 8 .2 5 )
[
x 0
x z
denoising auto-encoder
W 1 W' 1
L DAE ( x x , x x 0 0 )
q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 ot+1ot+2ot+d
...
ot+d-1116 C HAP T E R 8. D IS C R E T E M O D E LLI NG O F S P E E C H P R O S O D Y
8 .3 .1 Se g m e n t a l H M M s
S e gm e n t al HM M s [R u s s e l an d M o or e , 1985, Le v in s on , 1986, G al e s an d Y ou n g, 1993, O s t e n d or f e t al ., 1996] w e r e in t r o d u c e d in s p e e c h r e c ogn it ion in w h ic h s t at e s e q u e n c e s ar e e x p li c it ly r e p r e s e n t e d as s e gm en ts w it h an e x p li c it m o d e ll in g of t h e s e gm e n t s t at e -o c c u p an c y . S e gm e n t al HM M is a ge n e r al iz at ion of h id d e n M ar k o v m o d e l ( HM M ) t h at ad d r e s s e s tw o p r in c ip al li m it at ion s of t h e c on v e n t ion al h id d e n M ar k o v m o d e l: 1) s t at e d u r at ion m o d e ll in g, an d 2) as s u m p t ion of c on d it ion al in d e p e n d e n c e of t h e ob s e r v at ion s e q u e n c e gi v e n t h e s t at e s e q u e n c e .
Le t d e fi n e o =[ o 1 ,. .., o T ] an ob s e r v at ion s e q u e n c e of le n gt h T , q =[ q 1 ,. .. q T ] t h e as s o c iat e d s t at e seq u en ce, wh er e q t [1 ,N ], s =[ s 1 ,. .. s M ] t h e as s o c iat e d s e gm e n t s e q u e n c e of le n gt h M , an d d =[ d 1 ,. .. d M ] t h e c or r e s p on d in g s e gm e n t d u r at ion s .
A s e gm e n t al h id d e n M ar k o v m o d e l is d e fi n e d in a s im il ar m an n e r t o t h e c on v e n t ion al h id d e n M ar k o v m o d e l w it h a r e for m u lat ion of t h e s t at e s e q u e n c e in t o s e gm e n t s an d t h e ad d of an e x p li c it s t at e d u r at ion :
=( , A , B , D ) ( 8. 19)
wh er e:
• is t h e in it ial s t at e p r ob ab il it y d is t r ib u t ion : = { ⇥ i } N i =1
⇥ i = P ( s 1 = i ) i [1 ,N ] ( 8. 20)
• A is t h e s e gm e n t t r an s it ion p r ob ab il it y d is t r ib u t ion : A = { a i,j } N i,j =1
a i,j = P ( s n = j | s n 1 = i ) t [1 ,T ] ( 8. 21) i, j [1 ,N ]
• B is t h e ou t p u t p r ob ab il it y d is t r ib u t ion : B = { b i ( s n ) } N i =1
b i ( s n )= P ( o [ t ( n 1 ) +1 : t ( n )] | s n = i ) t [1 ,T ] ( 8. 22) i [1 ,N ]
• D is t h e s e gm e n t d u r at ion p r ob ab il it y d is t r ib u t ion : D = { b i ( s n ) } N i =1
[
116 C HAP T E R 8. D IS C R E T E M O D E LLI NG O F S P E E C H P R O S O D Y 8 .3 .1 Se g m e n t a l H M M s S e gm e n t al HM M s [R u s s e l an d M o or e , 1985, Le v in s on , 1986, G al e s an d Y ou n g, 1993, O s t e n d or f e t al ., 1996] w e r e in t r o d u c e d in s p e e c h r e c ogn it ion in w h ic h s t at e s e q u e n c e s ar e e x p li c it ly r e p r e s e n t e d as s e gm en ts w it h an e x p li c it m o d e ll in g of t h e s e gm e n t s t at e -o c c u p an c y . S e gm e n t al HM M is a ge n e r al iz at ion of h id d e n M ar k o v m o d e l ( HM M ) t h at ad d r e s s e s tw o p r in c ip al li m it at ion s of t h e c on v e n t ion al h id d e n M ar k o v m o d e l: 1) s t at e d u r at ion m o d e ll in g, an d 2) as s u m p t ion of c on d it ion al in d e p e n d e n c e of t h e ob s e r v at ion s e q u e n c e gi v e n t h e s t at e s e q u e n c e . Le t d e fi n e o =[ o 1 ,. .., o T ] an ob s e r v at ion s e q u e n c e of le n gt h T , q =[ q 1 ,. .. q T ] t h e as s o c iat e d s t at e seq u en ce, wh er e q t [1 ,N ], s =[ s 1 ,. .. s M ] t h e as s o c iat e d s e gm e n t s e q u e n c e of le n gt h M , an d d =[ d 1 ,. .. d M ] t h e c or r e s p on d in g s e gm e n t d u r at ion s . A s e gm e n t al h id d e n M ar k o v m o d e l is d e fi n e d in a s im il ar m an n e r t o t h e c on v e n t ion al h id d e n M ar k o v m o d e l w it h a r e for m u lat ion of t h e s t at e s e q u e n c e in t o s e gm e n t s an d t h e ad d of an e x p li c it s t at e d u r at ion : =( , A , B , D ) ( 8. 19) wh er e: • is t h e in it ial s t at e p r ob ab il it y d is t r ib u t ion : = { ⇥ i } N i =1 ⇥ i = P ( s 1 = i ) i [1 ,N ] ( 8. 20) • A is t h e s e gm e n t t r an s it ion p r ob ab il it y d is t r ib u t ion : A = { a i,j } N i,j =1 a i,j = P ( s n = j | s n 1 = i ) t [1 ,T ] ( 8. 21) i, j [1 ,N ] • B is t h e ou t p u t p r ob ab il it y d is t r ib u t ion : B = { b i ( s n ) } N i =1 b i ( s n )= P ( o [ t ( n 1 ) +1 : t ( n )] | s n = i ) t [1 ,T ] ( 8. 22) i [1 ,N ] • D is t h e s e gm e n t d u r at ion p r ob ab il it y d is t r ib u t ion : D = { b i ( s n ) } N i =1 [
d
q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 q1q1q1q1q1q1q1q1q1q2q2q2q2q2q2q2q2q2q2q2q2q2q2q2q3q3q3q3q3q3q3q3 s1
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
4. 2 S e gm e n t al HM M 3 4 . 2 Se g m e n t a l H M M 4. 2. 1 I n t r o d u c t i on T h e s t at e d u r at i on i s e x p l i c i t l y m o d e l l e d t h r ou gh t h e s e gm e n t s e gm e n t s e q u e n c e s . o =( o 1 ,. .., o
T) : ob s e r v at i on s e q u e n c e q =( q 1 ,. .., q
T) : s t at e s e q u e n c e s =( s 1 ,. .., s
M) : s e gm e n t s e q u e n c e wh er e: s = q [1:
d] T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 7) = ar g m ax
s[1:M]m ax
d[1:M]P ( o [1:
T] | s [1:
M] , d [1:
M] ) | {z }
observation probabilityP ( d [1:
M] | s [1:
M] ) | {z }
segment probabilityP ( s [1:
M] ) | {z } s e g m e n t t r a ns i t i o n pr o ba bi l i t y
( 8) 4. 2. 2 T r ai n i n g T h e ob s e r v at i on p r ob ab i l i t i e s o bs e r v a t i o n =( , B ) an d t h e s e gm e n t al p r ob ab i l i t i e s seg m en t a l = ( S , S
A) ar e t r ai n e d s e p ar at e l y . T h e ob s e r v at i on p r ob ab i l i t i e s ar e t r ai n e d w i t h r e s p e c t t o c on v e n t i on al c on t e x t - d e p e n d e n t HM M s . T h e s e gm e n t al p r ob ab i l i t i e s ar e t r ai n e d w i t h a n or m al d i s t r i b u t i on . 4. 2. 3 S y n t h e s i s T h e op t i m al s e gm e n t s e q u e n c e ¯s gi v e n an ob s e r v at i on s e q u e n c e o i s gi v e n b y : ¯ ( s , d ) = ar g m ax
s,dP ( s , d | o ) ( 9) T h e op t i m al s e q u e n c e i s d e t e r m i n e d u s i n g a r e f or m u l at i on of t h e c on v e n t i on al Vi t e r b i Al gor i t h m ( V A) .
s2s3 116CHAPTER8.DISCRETEMODELLINGOFSPEECHPROSODY
8.3.1SegmentalHMMs
SegmentalHMMs[RusselandMoore,1985,Levinson,1986,GalesandYoung,1993,Ostendorfetal.,1996]wereintroducedinspeechrecognitioninwhichstatesequencesareexplicitlyrepresentedassegmentswithanexplicitmodellingofthesegmentstate-occupancy.SegmentalHMMisageneralizationofhiddenMarkovmodel(HMM)thataddressestwoprincipallimitationsoftheconventionalhiddenMarkovmodel:1)statedurationmodelling,and2)assumptionofconditionalindependenceoftheobservationsequencegiventhestatesequence.
Letdefineo=[o1,...,oT]anobservationsequenceoflengthT,q=[q1,...qT]theassociatedstatesequence,whereqt[1,N],s=[s1,...sM]theassociatedsegmentsequenceoflengthM,andd=[d1,...dM]thecorrespondingsegmentdurations.
AsegmentalhiddenMarkovmodelisdefinedinasimilarmannertotheconventionalhiddenMarkovmodelwithareformulationofthestatesequenceintosegmentsandtheaddofanexplicitstateduration:
=(,A,B,D)(8.19)
where:
•istheinitialstateprobabilitydistribution:={⇥i}Ni=1
⇥i=P(s1=i)i[1,N](8.20)
•Aisthesegmenttransitionprobabilitydistribution:A={ai,j}Ni,j=1
ai,j=P(sn=j|sn1=i)t[1,T](8.21)
i,j[1,N]
•Bistheoutputprobabilitydistribution:B={bi(sn)}Ni=1
bi(sn)=P(o[t(n1)+1:t(n)]|sn=i)t[1,T](8.22)
i[1,N]
•Disthesegmentdurationprobabilitydistribution:D={bi(sn)}Ni=1
P(o[t+1:t+d]|s,d)to
(8.24)
P(d|s)(8.25)
[ 116CHAPTER8.DISCRETEMODELLINGOFSPEECHPROSODY
8.3.1SegmentalHMMs
SegmentalHMMs[RusselandMoore,1985,Levinson,1986,GalesandYoung,1993,Ostendorfetal.,1996]wereintroducedinspeechrecognitioninwhichstatesequencesareexplicitlyrepresentedassegmentswithanexplicitmodellingofthesegmentstate-occupancy.SegmentalHMMisageneralizationofhiddenMarkovmodel(HMM)thataddressestwoprincipallimitationsoftheconventionalhiddenMarkovmodel:1)statedurationmodelling,and2)assumptionofconditionalindependenceoftheobservationsequencegiventhestatesequence.
Letdefineo=[o1,...,oT]anobservationsequenceoflengthT,q=[q1,...qT]theassociatedstatesequence,whereqt[1,N],s=[s1,...sM]theassociatedsegmentsequenceoflengthM,andd=[d1,...dM]thecorrespondingsegmentdurations.
AsegmentalhiddenMarkovmodelisdefinedinasimilarmannertotheconventionalhiddenMarkovmodelwithareformulationofthestatesequenceintosegmentsandtheaddofanexplicitstateduration:
=(,A,B,D)(8.19)
where:•istheinitialstateprobabilitydistribution:={⇥i}Ni=1
⇥i=P(s1=i)i[1,N](8.20)
•Aisthesegmenttransitionprobabilitydistribution:A={ai,j}Ni,j=1
ai,j=P(sn=j|sn1=i)t[1,T](8.21)i,j[1,N]
•Bistheoutputprobabilitydistribution:B={bi(sn)}Ni=1
bi(sn)=P(o[t(n1)+1:t(n)]|sn=i)t[1,T](8.22)
i[1,N]
•Disthesegmentdurationprobabilitydistribution:D={bi(sn)}Ni=1