
Automatic Detection of Some Tajweed Rules

Dahlia Omran
Systems and Biomedical Engineering, Faculty of Engineering, Cairo University, Giza, Egypt
cerona77_omran@hotmail.com

Sahar Fawzi
Information Technology & Computer Science, Nile University, Giza, Egypt
sfawzi@nu.edu.eg

Ahmed Kandil
Systems and Biomedical Engineering, Faculty of Engineering, Cairo University, Giza, Egypt
ahkandil@eng1.cu.edu.eg

Abstract— Correct understanding of the Holy Quran is an essential duty for all Muslims. Tajweed rules guide the reciter to perform Holy Quran reading exactly as it was uttered by Prophet Muhammad, peace be upon him. This work focuses on the recognition of one Quranic recitation rule, the Qalqalah rule, which applies to five letters of the Arabic alphabet (Baa/Daal/Jeem/Qaaf/Taa) having sukun vowelization. The proposed system used Mel Frequency Cepstral Coefficients (MFCC) as the feature extraction technique, and a Convolutional Neural Network (CNN) model for recognition. The available dataset consists of 3322 audio samples from different surahs of the Quran for four professional readers (Sheikh): AlHussary, AlMinshawy, Abdel Baset, and Ayman Swayed. The best results were obtained using Ayman Swayed's audio samples, with a validation accuracy of 90.8%.

Keywords— Quranic recitation rules, Qalqalah rule, Mel Frequency Cepstral Coefficients, Convolutional Neural Networks (CNN)

I. INTRODUCTION

The Holy Quran is the main source of guidance for all Muslims, and delivering the precise meaning of its words is a crucial issue. Although the Holy Quran is written in classical Arabic, it is completely different from any other Arabic content. For an accurate reading of the Holy Quran, recitation rules must be followed. These rules, also called Tajweed rules, apply certain pronunciation manners, articulation positions, and intonation characteristics to letters in specific situations. They include merging two letters' sounds, applying high stress when pronouncing a letter, prolonging a letter's pronunciation for a specific duration, and many more [1]. Tajweed rules are hard to localize and to apply, especially for non-Arabic speakers [2]. Teaching Tajweed rules may be considered a complicated and confusing task, as it needs prolonged sessions of direct contact between the instructor and the students. Tajweed rules should be applied to deliver the correct pronunciation and meaning of the Quran and, consequently, to maintain its integrity and authenticity [3].

Automatic Speech Recognition (ASR) is the interactive process by which machines recognize human spoken words based on the information embedded in those words [4-5]. ASR enables a machine to receive, interpret, and translate audio signals (words or orders) and react accordingly [6].

Deep Neural Networks (DNN) are promising and efficient techniques for ASR [7]. DNNs have profoundly revolutionized the field of speech recognition through different kinds of models such as Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and transformer networks [8]. CNNs have shown promising results in pattern recognition and prediction, so they are used to detect and localize different patterns [9-10]. CNNs have also proven reliable in handling speech signals and in improving the speaker invariance of the acoustic model [11].

The main objective of our research is to construct a detection system for one Quranic recitation rule, the Qalqalah rule, with a CNN as the speech recognition model. The proposed system should help users read Quran verses correctly, especially readers with limited knowledge of Tajweed rules and non-Arabic speakers.

II. LITERATURE REVIEW

Alagrami et al. [1] worked on four Tajweed rules (Edgham Meem, Ekhfaa Meem, Tafkheem Lam, Tarqeeq Lam). Their dataset contained a total of 657 recordings, all segmented manually, with an average sample duration of 4 seconds. The applied feature extraction technique employs 70 filter banks as the main method, and a Support Vector Machine (SVM) was adopted for classification. In the test stage, every model of the system used 30% of the data, and the validation accuracy was 99%.

Damer et al. [12] considered eight Tajweed rules and applied several feature extraction techniques such as Mel-Frequency Cepstral Coefficients (MFCC), Wavelet Packet Decomposition (WPD), and Linear Predictive Coding (LPC). Different classification techniques were used, such as k-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Multilayer Perceptron (MLP) neural networks. They concluded that MFCC achieved the highest accuracy in the feature extraction phase and that SVM scored the highest classification accuracy, 94.4%, obtained when all features except the LPC features were applied to the SVM.

Tabbal et al. [13] created a system for Holy Quran Tajweed rules recognition using Sphinx tools. In the proposed system, the MFCC technique was used to find the most informative audio features, and a Hidden Markov Model (HMM) was used for classification. The system scored 85-90% accuracy when tested on a small chapter of the Quran.

The E-Hafiz system was created in [14] to help ordinary readers recite the Quran correctly by training them on Tajweed rules based on expert readers. The MFCC technique was applied to extract features from recorded voices for specific verses, and these features were used to build a speech recognition model using Vector Quantization (VQ). This model was used to compare readers' trials against experts' reference readings, and any mismatch at the word level was highlighted.



Another speech recognition system for Tajweed rule checking was proposed using a hybrid of the Mel Frequency Cepstral Coefficient algorithm and Vector Quantization (MFCC-VQ) [15]. The system performance was tested on Qalqalah phonemes. The hybrid MFCC-VQ, tested against the original MFCC, recorded real-time factor improvements of 86.928%, 90.495%, and 64.683% for males, females, and children respectively.

Mahdy et al. [16] developed a speech-enabled Computer-Aided Pronunciation Learning system named HAFSS, used in teaching Holy Quran recitation. A classification algorithm was implemented that tests phoneme durations to detect recitation errors related to them. The system accuracy was 62.4% measured on pronunciation errors, a "Repeat Request" was issued in 22.4% of cases, and 14.9% of total errors were falsely accepted.

III. METHODOLOGY

The main stages conducted through this research to reach our objectives are represented in Fig. 1.

[Fig. 1: Data Acquiring → Data Preprocessing → Feature Extraction → Recognition Model]
Fig. 1. The proposed system block diagram

A. Dataset Preparation

The basic step in our work is acquiring an adequate number of high-quality audio signals to build our reference dataset. Mushaf Tajweed was used to locate verses containing the needed Tajweed rule; it is a printed version of the Holy Quran that highlights Tajweed rules with certain colors (color-coded rules).

Samples were acquired from records of the professional reciters AlHussary, Ayman Swayed, AlMinshawy, and Abdel Baset. All these records follow the narration of Hafs on the authority of Asim. The composed dataset consists of 3322 samples from the different readers. Each reader was assigned an ID number as represented in Table 1.

TABLE 1. DATASET COMPONENTS ACCORDING TO READER (SHEIKH)

Reader's ID | Number of Samples | Records of Sheikh
1 | 653 | AlHussary
2 | 1432 | Ayman Swayed
3 | 633 | AlMinshawy
4 | 604 | Abdel Baset

Samples' characteristics:
1. Qalqalah letter samples were spotted manually using ocenaudio 8.3.
2. Each sample was set to an equal duration to ensure capturing all sound characteristics and articulation.
3. Fixed pre and post edges were added to each sample so that its core information is gained, centralized, and preserved during the training phase.
4. Samples were collected from stereo audio files with a 22050 Hz sampling rate.
5. In addition to the five sets of Qalqalah letters, there is a sixth set, named "No", containing samples of the same letters but without the Qalqalah characteristics.

B. Data Pre-processing

Samples must be preprocessed before being fed to the model, as shown in the sketch after this list:
• Convert all audio samples from .mp3 form to .wav form.
• Average the stereo channels of every dataset file and save it as single-channel (mono) audio.
• Convert all audio samples into Comma-Separated Values (CSV) file form.
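These steps can be scripted; the following is a minimal Python sketch, assuming pydub (backed by FFmpeg) and numpy are available, with hypothetical file names:

```python
import numpy as np
from pydub import AudioSegment

def preprocess(mp3_path: str, wav_path: str, csv_path: str) -> None:
    """Convert one .mp3 sample to a mono .wav file plus a CSV of its raw samples."""
    audio = AudioSegment.from_mp3(mp3_path)       # decode the .mp3 form (needs FFmpeg)
    audio = audio.set_channels(1)                 # average the stereo channels to mono
    audio = audio.set_frame_rate(22050)           # keep the 22050 Hz sampling rate
    audio.export(wav_path, format="wav")          # save the .wav form
    raw = np.array(audio.get_array_of_samples())  # integer PCM samples
    np.savetxt(csv_path, raw[None, :], delimiter=",", fmt="%d")  # CSV form

preprocess("qaaf_001.mp3", "qaaf_001.wav", "qaaf_001.csv")  # hypothetical sample
```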

C. Feature Extraction

Features were mainly extracted as Mel-Frequency Cepstral Coefficients (MFCC) and their derivatives. Twelve MFCCs were obtained; the delta and delta-delta coefficients and the energy of the signal sample were also computed and added to the feature set. The features generated from each audio sample were used as one input to the speech recognition model [17].
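A minimal sketch of this feature extraction using librosa; the 22050 Hz rate follows the dataset description, while the framing parameters are library defaults and RMS as the energy measure is an assumption:

```python
import librosa
import numpy as np

def extract_features(wav_path: str) -> np.ndarray:
    """12 MFCCs plus their delta, delta-delta, and frame energy, as described above."""
    y, sr = librosa.load(wav_path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)  # 12 coefficients per frame
    delta = librosa.feature.delta(mfcc)                 # first-order derivatives
    delta2 = librosa.feature.delta(mfcc, order=2)       # second-order derivatives
    energy = librosa.feature.rms(y=y)                   # frame energy (RMS, an assumption)
    return np.vstack([mfcc, delta, delta2, energy])     # shape: (37, n_frames)

features = extract_features("qaaf_001.wav")  # hypothetical file from the dataset
```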
The dataset was divided into training and testing sets with an 80:20 ratio [18-20].
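A sketch of this 80:20 split with scikit-learn's train_test_split; the placeholder arrays and the stratification option are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# placeholders standing in for the real feature maps and labels
X = np.random.rand(3322, 37, 44)         # one 2D feature map per audio sample
y = np.random.randint(0, 6, size=3322)   # six classes: five Qalqalah letters + "No"

# 80:20 train/test split, stratified so every class keeps its proportion (an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```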
D. The Recognition Model

A Convolutional Neural Network (CNN) deep learning model was built and used for recognition of the required rule. The model is constructed from four layers, each consisting of a convolutional layer, which includes a linear operation (convolution) and a non-linear operation (activation), followed by a pooling layer. After the fourth layer come a global average pooling (flatten) layer and, finally, a fully connected dense layer [21]. The Convolutional Neural Network layout is represented in Fig. 2.

[Fig. 2 shows the layer stack of the network]
Fig. 2. Convolutional Neural Networks layout [22]

• Convolutional Neural Networks Model
CNN architectures achieve their highest performance when manipulating two-dimensional data such as images, which is why all used audio signals are reshaped into 2D form [23]. The first convolutional layer uses 16 filters, the second uses 32, the third uses 64, and the fourth uses 128. Filters in all four convolutional layers are of size 2, and in all four layers the activation function is the Rectified Linear Unit (ReLU) [21].
• Pooling layer
A 2D max-pooling filter of size 2 slides over each feature map; max-pooling keeps each patch's maximum value and discards the other values [11].
• Dropout
The dropout rate is set to 0.2 in all layers to prevent overfitting [24].
• Flatten layer
Before the final layer, a flatten layer is created using global average pooling, which heavily downsamples each feature map to a one-dimensional array by taking the average of its elements [21].
• Dense layer
This fully connected layer, in which all inputs are connected to all outputs, is the final layer of the model and uses a softmax activation function [25-26].
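A minimal Keras sketch of the four-block architecture described above; the input shape, optimizer, and loss function are assumptions, as the paper does not state them:

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(37, 44, 1), num_classes=6):
    """Four Conv2D blocks (16/32/64/128 size-2 filters, ReLU), each followed by
    size-2 max-pooling and 0.2 dropout, then global average pooling and softmax."""
    model = models.Sequential([layers.Input(shape=input_shape)])
    for filters in (16, 32, 64, 128):
        model.add(layers.Conv2D(filters, kernel_size=2, padding="same",
                                activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=2, padding="same"))
        model.add(layers.Dropout(0.2))
    model.add(layers.GlobalAveragePooling2D())  # the "flatten" layer described above
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()  # model.fit(X_train[..., None], y_train, ...) would train it
```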

IV. RESULTS

Several experiments were conducted representing different combinations of reciters' samples. Six classes are used, representing the five Qalqalah letters and the no-Qalqalah case. The highlighted row in each table represents the highest validation accuracy.

The first experiment was conducted by training the model with the combined audio samples of the four reciters and testing it with the samples of one, two, three, or four reciters. Table 2 presents the results of this experiment.

TABLE 2. TESTING WITH DIFFERENT RECITERS' COMBINATIONS

Reader's ID (1 2 3 4) | Accuracy
√ | 0.84
√ √ | 0.89
√ √ | 0.75
√ √ | 0.78
√ √ √ | 0.85
√ √ √ √ | 0.78

A sequence of experiments was then conducted using different combinations of reciters' samples for both training and testing. Tables 3 to 5 present the results of these experiments.

TABLE 3. RESULTS USING THE SAMPLES OF ONE RECITER

Reader's ID | Accuracy
1 | 0.83
2 | 0.908
3 | 0.68
4 | 0.88

TABLE 4. RESULTS USING SAMPLES OF TWO RECITERS

Reader's ID (1 2 3 4) | Accuracy
√ √ | 0.856
√ √ | 0.740
√ √ | 0.818
√ √ | 0.766
√ √ | 0.893
√ √ | 0.726

TABLE 5. RESULTS USING SAMPLES OF THREE RECITERS

Reader's ID (1 2 3 4) | Accuracy
√ √ √ | 0.790
√ √ √ | 0.796
√ √ √ | 0.771
√ √ √ | 0.855

TABLE 6. RESULTS USING SAMPLES OF FOUR RECITERS

Reader's ID (1 2 3 4) | Accuracy
√ √ √ √ | 0.811

Table 7 introduces a comparison between the obtained results and the results of Ismail et al. [15].

TABLE 7. COMPARISON BETWEEN THE MFCC-VQ APPROACH [15] AND OUR PROPOSED WORK

| MFCC-VQ Approach | Our Proposed Work
Feature extraction technique | MFCC-VQ | MFCC
Classification model | Codebook | Convolutional Neural Network
Dataset description | 45 speakers (20 males, 20 females, 5 children), 1750 samples; Surah Al-Ikhlas and 10 Qalqalah phonemes | Four professional reciters, 3322 samples; most Holy Quran surahs
Accuracy | 87%, 90.5%, and 64.7% for males, females, and children respectively | 85-91%

V. DISCUSSION

When the model was trained on the samples of all four reciters, the best accuracy of 89% was acquired when testing with Sheikh AlHussary's and Sheikh Ayman Swayed's samples. When using the same reciter's samples for both training and testing, the highest accuracy of 90.8% was obtained with Sheikh Ayman Swayed. Using samples of two reciters, the highest accuracy of 89.3% was acquired with the combination of Sheikh Ayman Swayed's and Sheikh Abdel Baset's samples. When combining the samples of three reciters, the highest accuracy of 85.5% was gained with the samples of AlHussary, Ayman Swayed, and Abdel Baset. Using the samples of all four reciters, the accuracy reached 81.1%.

Comparing the obtained results with the closest previous research, conducted by Ismail et al. [15], the proposed model's results were superior even though our dataset covers the Qalqalah rule throughout the whole Holy Quran.

VI. CONCLUSION

The results show the ability of the CNN to recognize the Qalqalah pattern in different verses and surahs of the Holy Quran with promising accuracy. Increasing the number of training samples may result in better accuracy.
REFERENCES

[1] A. Alagrami, M. Eljazzar, "SMARTAJWEED: Automatic Recognition of Arabic Quranic Recitation Rules", International Conference on Computer Science, Engineering and Applications, 2020.
[2] I. Shafaf, F. A. Abdul Rahim, and A. Zaaba, "Automatic Tajweed Rules Recognition using k-Nearest Neighbour (k-NN)", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Vol. 8, Issue 2S11, 2019.
[3] M. Ammar, M. Sunara, M. Salamb, "Quranic Verses Verification using Speech Recognition Techniques", Jurnal Teknologi (Sciences & Engineering), 73:2, pp. 99-106, 2015.
[4] V. Vineet, K. P. Aditya, and P. Y. Satya, "Speech Recognition using Machine Learning", IEIE Transactions on Smart Processing and Computing, Vol. 10, No. 3, June 2021.
[5] V. Këpuska and H. Elharati, "Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions", Journal of Computer and Communications, pp. 1-9, 2015.
[6] D. Nagajyothi, P. Siddaiah, "Speech Recognition Using Convolutional Neural Networks", International Journal of Engineering & Technology, 7 (4.6), pp. 133-137, 2018.
[7] G. Hinton et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition", IEEE Signal Processing Magazine, pp. 82-97, 2012.
[8] I. Papastratis, "Speech Recognition: a review of the different deep learning approaches", https://theaisummer.com/speech-recognition/, 2021.
[9] S. Bhatia, A. Devi, R. I. Alsuwailem, A. Mashat, "Convolutional Neural Network Based Real Time Arabic Speech Recognition to Arabic Braille for Hearing and Visually Impaired", Frontiers in Public Health, 10:898355, doi: 10.3389/fpubh.2022.898355, 2022.
[10] J. Heaton, "Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep Learning: The MIT Press, 800 pp, ISBN: 0262035618", Genetic Programming and Evolvable Machines, 19:305-307, 2018.
[11] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, D. Yu, "Convolutional Neural Networks for Speech Recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 10, 2014.
[12] N. Damer, M. Al-Ayyoub, I. Hmeidi, "Automatically Determining Correct Application of Basic Quranic Recitation Rules", The International Arab Conference on Information Technology, Yassmine Hammamet, Tunisia, 22-24 December 2017.
[13] H. Tabbal, W. El Falou and B. Monla, "Analysis and implementation of a 'Quranic' verses delimitation system in audio files using speech recognition techniques", 2nd International Conference on Information & Communication Technologies, pp. 2979-2984, doi: 10.1109/ICTTA.2006.1684889, 2006.
[14] A. Muhammad, Z. ul Qayyum, W. M. Mirza, S. Tanveer, A. M. Martinez-Enriquez, A. Syed, "E-Hafiz: Intelligent System to Help Muslims in Recitation and Memorization of Quran", Life Science Journal, 9(1), 2012.
[15] A. Ismail, M. Y. I. Idris, N. M. Noor, Z. Razak, Z. M. Yusoff, "MFCC-VQ Approach for Qalqalah Tajweed Rule Checking", Malaysian Journal of Computer Science, Vol. 27(4), 2014.
[16] Sh. Mahdy, A. Samir, O. Abd-Elhamid, S. Hamid, M. Rashwan, M. Shahin, W. Nazih, "Computer Aided Pronunciation Learning System Using Speech Recognition Techniques", International Conference on Speech and Language Processing, INTERSPEECH 2006 - ICSLP, September 17-21, Pittsburgh, Pennsylvania, 2006.
[17] U. Kiran, "MFCC Technique for Speech Recognition", Data Science Blogathon, 2021. https://www.analyticsvidhya.com/blog/2021/06/mfcc-technique-for-speech-recognition
[18] J. Brownlee, "Train-Test Split for Evaluating Machine Learning Algorithms", Machine Learning Mastery, 2020. https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
[19] D. E. Birba, "A Comparative Study of Data Splitting Algorithms for Machine Learning Model Selection", Dissertation, KTH Royal Institute of Technology, Stockholm, Sweden, 2020.
[20] V. Aruchamy, "How To Do Train Test Split Using Sklearn In Python - Definitive Guide", Stack Vidhya, 2021. https://www.stackvidhya.com/train-test-split-using-sklearn-in-python/
[21] R. Yamashita, M. Nishio, R. K. G. Do, K. Togashi, "Convolutional neural networks: an overview and application in radiology", Insights into Imaging, 9:611-629, 2018.
[22] Developers Breach, https://www.developersbreach.com/convolution-neural-network-deep-learning/
[23] K. I. Taher, A. M. Abdulazeez, "Deep Learning Convolutional Neural Network for Speech Recognition: A Review", International Journal of Science and Business, Vol. 5, Issue 3, pp. 1-14, 2021.
[24] E. Rady, A. Hassen, N. M. Hassan, M. Hesham, "Convolutional Neural Network for Arabic Speech Recognition", Egyptian Journal of Language Engineering, Vol. 8, No. 1, 2021.
[25] J. Brownlee, "Dropout Regularization in Deep Learning Models with Keras", Machine Learning Mastery, 2022. https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/
[26] M. S. Abdo, S. A. Fawzi, "Arabic Speech Segmentation Into Syllables Using Neural Networks", Journal of Engineering and Applied Science, Vol. 63, No. 5, pp. 371-389, Oct. 2016.
