
A Hidden Markov Model based Speech Recognition Approach to Automated


12.4.2 Decoding Phase

Once the pre-computation phase has been completed, the decoding process is straightforward. An input speech signal comprising n observation vectors, which in our case are the XOR of two unknown sequences of vectors, is fed as input to the recognizer. Every path from the start node to the end node of the recognition network that passes through exactly n emitting states is a potential recognition hypothesis. Each of these paths has a log probability, which is computed by summing the log probabilities of the individual transitions in the path and the log probability of each emitting state generating the corresponding XORed vector. Within a model, transitions are determined from the model parameters (a_ij); between two models, the transitions are regarded as constant; and in large recognition networks, the transitions between word ends are determined by the language model likelihoods attached to the word-level networks. The job of the decoder is to find the paths through the network that have the highest log probability. The HTK tool HVite can be used for this purpose.

Fig. 12.3 Recognition network (node labels in the figure: sil, ao, k, l, l, dh, START_SIL, m, ax, eh, ax, aa, oh, END_SIL)
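The path score described in this section can be illustrated with a minimal sketch of log-domain Viterbi decoding. This is not HVite's implementation: the function name `viterbi_log`, the two-state toy model, and all probability values are hypothetical, and the emission log probabilities are supplied directly as a table rather than computed from XORed observation vectors.

```python
import math

def viterbi_log(log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for one observation sequence.

    log_start[j]    : log probability of starting in state j
    log_trans[i][j] : log probability of the transition i -> j
    log_emit[t][j]  : log probability that state j emits observation t
    """
    n_states = len(log_start)
    # Score of the best partial path ending in each state after observation 0.
    score = [log_start[j] + log_emit[0][j] for j in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at time t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            # Path log probability = sum of transition and emission log probs.
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path

# Hypothetical toy model: 2 emitting states, 3 observations.
log = math.log
log_start = [log(0.6), log(0.4)]
log_trans = [[log(0.7), log(0.3)],
             [log(0.4), log(0.6)]]
log_emit = [[log(0.5), log(0.1)],
            [log(0.4), log(0.3)],
            [log(0.1), log(0.6)]]

best_log_prob, best_path = viterbi_log(log_start, log_trans, log_emit)
print(best_path, math.exp(best_log_prob))
```

With n observations the returned path always contains exactly n emitting states, which matches the definition of a recognition hypothesis given above.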

12.4.3 Experimental Results

The performance of the recognizer can be analysed using the HResults tool of HTK. It reads in a set of label files output by the recognition tool (HVite in our case) and compares them with the corresponding reference transcription files. For the analysis of speech recognition output, the comparison is based on a dynamic programming based string alignment procedure. The experimental results for the different acoustic feature extraction mechanisms are given in Table 12.1. The best accuracy was obtained with Mel Frequency Cepstral Coefficients (MFCC) with delta and acceleration coefficients: a 39-dimensional vector comprising the first 12 MFCC coefficients, the zeroth MFCC coefficient, which reflects the total energy in the frame, 13 delta coefficients estimating the first-order derivatives of the MFCC coefficients, and 13 acceleration coefficients estimating the second-order derivatives. This is in line with the accuracies conventionally reported in speech recognition for these acoustic features.
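The string alignment used for scoring can be sketched as follows. This is an illustration only: the function names and the example label sequences are hypothetical, and unit costs for substitution, deletion, and insertion are an assumption made here for simplicity (HResults applies its own penalty weights). With N reference labels, H hits, and I insertions, the two usual figures are percent correct, 100 H / N, and accuracy, 100 (H - I) / N.

```python
def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) of a
    minimum-cost alignment of hyp against ref (unit costs assumed)."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (cost, hits, subs, dels, ins) for the best alignment
    # of ref[:i] with hyp[:j].
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, rows):
        table[i][0] = (i, 0, 0, i, 0)   # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, 0, j)   # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            c, h, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, n)        # hit
            else:
                diag = (c + 1, h, s + 1, d, n)    # substitution
            c, h, s, d, n = table[i - 1][j]
            delete = (c + 1, h, s, d + 1, n)
            c, h, s, d, n = table[i][j - 1]
            insert = (c + 1, h, s, d, n + 1)
            table[i][j] = min(diag, delete, insert, key=lambda cell: cell[0])
    return table[-1][-1][1:]

def correct_and_accuracy(ref, hyp):
    """Percent correct and percent accuracy relative to the reference."""
    hits, subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * hits / n, 100.0 * (hits - ins) / n

# Hypothetical reference and recognizer output (phone labels).
ref = ["sil", "dh", "ax", "k", "ao", "l", "sil"]
hyp = ["sil", "dh", "eh", "k", "ao", "l", "l", "sil"]

counts = align_counts(ref, hyp)
correct, accuracy = correct_and_accuracy(ref, hyp)
print(counts, round(correct, 2), round(accuracy, 2))
```

Accuracy penalizes insertions whereas percent correct does not, so accuracy is the stricter of the two figures.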

Table 12.1 Recognition accuracies of different acoustic features

SNo  Feature extraction mechanism                                 Recognition accuracy (%)
1.   Linear predictive coefficients                               65.93
2.   Linear predictive reflection coefficients                    69.06
3.   Linear predictive cepstral coefficients                      72.09
4.   Mel frequency cepstral coefficients (MFCC)                   74.72
5.   Linear predictive cepstral + delta coefficients              77.15
6.   Mel frequency cepstral + delta + acceleration coefficients   79.96
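The composition of the 39-dimensional vector in the best-performing configuration can be sketched as follows. The sketch assumes the standard regression formula for dynamic features, d_t = sum_k k (c_{t+k} - c_{t-k}) / (2 sum_k k^2), with a window of two frames and edge frames replicated; the window width, the function names, and the ramp-shaped input are assumptions for illustration, not values taken from the experiments.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a list of equal-length frames."""
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for dim in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][dim]  # replicate last frame
                behind = frames[max(t - k, 0)][dim]     # replicate first frame
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out

def add_dynamic_features(static):
    """Append delta and acceleration coefficients to each static frame."""
    d1 = deltas(static)   # first-order derivatives
    d2 = deltas(d1)       # second-order derivatives (deltas of deltas)
    return [s + a + b for s, a, b in zip(static, d1, d2)]

# Hypothetical static frames: 12 MFCCs plus the zeroth coefficient = 13 values,
# here a simple ramp so that the true slope is 1.0 per frame.
static = [[float(t)] * 13 for t in range(6)]
features = add_dynamic_features(static)
print(len(features[0]), features[2][13])
```

Each output frame holds 13 static, 13 delta, and 13 acceleration values, giving the 39 dimensions reported for the last row of Table 12.1.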

12.5 Future Work and Conclusion

This chapter addresses the key reuse problem of stream ciphers, in the form of plaintext XORs, for speech signals encoded with modern encoding techniques. Text-based plaintext XORs have been discussed in the literature for quite some time, and the techniques have matured. Conventional speech recognition tools such as HTK can be employed effectively for the automated cryptanalysis of two-time pads in the case of speech signals. Two main approaches for achieving this have been discussed, and experimental results for the training part modification have been presented. The decoding part modification can be taken up as future work. A detailed complexity analysis of the training and decoding parts of the recognition technique also remains to be carried out in future work.
