International Science Index


Speaker Independent Quranic Recognizer Based on Maximum Likelihood Linear Regression

Abstract: An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic and is recited all over the world. In this research, a Quranic-based, speaker-independent automatic speech recognizer was developed and tested. The system is based on the tri-phone Hidden Markov Model (HMM) and Maximum Likelihood Linear Regression (MLLR). MLLR computes a set of transformations that reduces the mismatch between an initial model set and the adaptation data. It uses a regression class tree and estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th chapter of the Quran, as recited by five of the most famous readers of the Quran, was used for training and testing. The chapter includes about 2000 distinct words. The advantages of using Quranic verses as the database for this recognizer are the uniqueness of the words and the high level of orderliness between verses. Recognition accuracy on the test data ranged from 68% to 85%.
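To make the mean transformation mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard MLLR formulation, in which an adapted Gaussian mean is obtained as W·ξ, where ξ = [1, μ_1, ..., μ_n] is the extended mean vector and W = [b | A] combines a bias vector b with a linear transform A. The function name `mllr_adapt_mean` and all numeric values are hypothetical and chosen only for illustration. Estimating W from adaptation data and the regression class tree are not shown.

```python
def mllr_adapt_mean(W, mu):
    """Apply an MLLR-style mean transform: return W * [1, mu_1, ..., mu_n].

    W  : list of n rows, each with n + 1 entries (first column is the bias b,
         the remaining columns form the linear transform A).
    mu : speaker-independent Gaussian mean, a list of n floats.
    """
    xi = [1.0] + list(mu)  # extended mean vector; the leading 1 carries the bias
    if any(len(row) != len(xi) for row in W):
        raise ValueError("each row of W must have len(mu) + 1 entries")
    return [sum(w * x for w, x in zip(row, xi)) for row in W]


# Hypothetical 2-dimensional example: b = (0.5, -1.0), A = diag(2.0, 1.0).
W = [[0.5, 2.0, 0.0],
     [-1.0, 0.0, 1.0]]
speaker_independent_mean = [1.0, 3.0]

adapted_mean = mllr_adapt_mean(W, speaker_independent_mean)
print(adapted_mean)  # [2.5, 2.0]
```

In a regression-class setup, one such W would be shared by all Gaussians assigned to the same class, so a small amount of adaptation data can shift many model means at once.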