
Proceedings

Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences
19-22 Safar 1435 Hijri (22-25 December 2013) Al-Madinah Al-Munawwarah, Saudi Arabia

NOORIC1435/2013

Volume: 1
IT Research Center for the Holy Quran and Its Sciences (NOOR) Taibah University Conference Sponsors and Collaborators


Copyright 2013 Taibah University, All Rights Reserved.

CITATION: In the proceedings of the Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences (NOORIC1435/2013), Volumes 1 & 2, Al-Madinah Al-Munawwarah, Kingdom of Saudi Arabia. 19-22 Safar 1435 Hijri (22-25 December 2013).

PUBLISHER: IT Research Center for the Holy Quran and Its Sciences (NOOR), Taibah University, P.O. Box 30002, Al-Madinah Al-Munawwarah 41477, Kingdom of Saudi Arabia. Safar 1435 Hijri (December 2013).

ISSN: 1658-6476


NOORIC1435/2013
The Taibah University International Conference on Advances in Information Technology for the Holy Quran and its Sciences (NOORIC) aims to gather worldwide researchers, academic scholars and industrial practitioners that represent a wide variety of IT specializations for the service of the Holy Quran and its sciences. The conference shall provide a platform for sharing and exploring high-quality and recent research results, industrial/applied practices and future trends in the domain of Information Technology for the service of the Holy Quran and Its Sciences. NOORIC is technically co-sponsored by the IEEE and the IEEE Computer Society.

Contributions from scientists, scholars and industry practitioners of novel and high-quality research and case studies were submitted in the following research themes: IT for the Science of the Holy Quran Recitations (Qira'at); IT for the Science of Tafseer of the Holy Quran; IT Security for the Holy Quran and Its Sciences; Virtual Learning Environment for the Holy Quran and Its Sciences; Standardization and Quality Assurance of IT applications and products of the Holy Quran and Its Sciences.

The focus of NOORIC1435/2013 is on the following specific research topics: Speech and Image Processing; Pattern Recognition; Artificial Intelligence; Cloud Computing; Semantic and Web Ontology; Knowledge Management; Web- and Grid-based Simulation; Data and Text Mining; Information Retrieval and Management; Indexing, storage and data-analysis techniques; Information Security; Internet Security; Computer Based Education; eLearning; Blended and Collaborative Learning; Innovative Online Learning Systems; Advanced Technologies for People with Special Needs; Multimedia Computing; Computer Games; Computer Animation; Quality Assurance; Standardization.

Accepted research papers were considered for oral and poster presentations, with all accepted papers being published in these conference proceedings and in the IEEE Xplore Digital Library, if applicable. In addition to the technical sessions featuring original contributions from authors of accepted papers, the conference also includes a number of invited talks and panel discussion sessions that feature presentations from internationally renowned scientists and scholars. Finally, NOORIC1435/2013 is also hosting a Forum (Multaqa) for researchers and interested stakeholders in Information Technology for the service of the Holy Quran and Its Sciences during the first day of the conference program.


Similarity Evaluation of English translations of the Holy Quran


Mohd Zamri Murah
Center for Artificial Intelligence Technology
Fakulti Teknologi Sains Maklumat
Universiti Kebangsaan Malaysia, MALAYSIA
zamri@ftsm.ukm.my

Abstract: In this paper, we evaluate the similarity of English translations of the Holy Quran using four computational methods: bag of words, term-frequency, tfidf and latent semantic indexing. We used twenty-one translation pairs from seven English translations (Hilali, YusufAli, Sahih, Shakir, Arberry, Pickthall, Maududi) in our experiments. The similarity measures were evaluated pairwise. Based on our results, seven translation pairs have high similarity measures: (Hilali, YusufAli), (Hilali, Sahih), (Sahih, Shakir), (Hilali, Pickthall), (Pickthall, Shakir), (Hilali, Shakir), (Shakir, Arberry). Nine translation pairs have low similarity measures: (Hilali, Maududi), (Maududi, Pickthall), (Maududi, YusufAli), (Hilali, Arberry), (Arberry, YusufAli), (Maududi, Arberry), (Pickthall, YusufAli), (Sahih, YusufAli), (Pickthall, Sahih). These results offer new insights, from a computational perspective, into the similarity between the English translations. We also conclude that the translations (Hilali, Sahih, Shakir, YusufAli, Pickthall) could be clustered into one group, and the translations (Maududi, Arberry) into another group. These results could be used for classification of English translations and for comparing future translations.

Keywords: similarity measures; Quran translations

I. INTRODUCTION

The Quran has been translated into many languages to benefit those who neither speak nor understand Arabic. Currently, there are more than 60 English translations of the Quran [1]. There have been many studies of the English translations based on themes [2], content analysis, morphological analysis [3] and other aspects. The results of these studies were based on experts' opinions or views.

In this paper, we use a computational linguistic approach to evaluate the similarity of English translations of the Holy Quran. In a computational approach, the English translations are evaluated using quantitative methods. It offers an alternative method to analyze the English translations. The results of this analysis provide new insights about the English translations, and might support or contradict other studies about them.

The research question is: given two English translations of a verse from the Holy Quran, how similar are the translations? In this paper, we evaluate the similarity of seven English translations based on four computational models from the natural language processing domain: bag of words, term-frequency, tfidf and latent semantic indexing. The results determine whether one English translation is similar to another English translation based on given criteria.

A similarity measure is defined as how similar one object is to another object. The objects can be images, documents, sentences or others. There are many methods to calculate similarity measures between objects. In this paper, we use similarity measures from natural language processing, which are concerned with similarity between objects that involve sentences, words or terms. The similarity measure is a number in the range (0.0, 1.0). A high similarity measure indicates that the two objects are similar based on certain features. For instance, a similarity measure of 0.80 between two sentences indicates that they are very similar, and a similarity measure of 0.05 indicates otherwise.

II. RELATED WORKS

Many people have reviewed the English translations of the Quran. Kidwai [1] has reviewed and commented on many English translations. Mohammed [4] provided some critical views of many English translations. Some of the translations were for a specific group of people such as Shia [5] or Ahmadiah [6]. Many factors such as ideological bias, Arabic fluency [7], mastery of English, and knowledge of Islamic Law influenced the translations of the Holy Quran. Because of these numerous factors, each translation is different and unique. Nassimi [2] has done a thematic comparison of the English translations of the Holy Quran. These studies have shown that there exist similarities and differences between the English translations.

In this paper, we evaluate the similarity between the English translations using a computational method called a similarity measure [8]. A similarity measure has many applications such as image retrieval, information retrieval and query analysis. For instance, we can use a similarity measure to compare two documents to determine whether they are similar or not. Another example is in query formation: when a user queries "beautiful car", the method can propose a similar query such as "high performance vehicle" or "expensive ride".

The similarity measures in this paper are based on similarity measures from the computational linguistics or natural language processing domain. There are many similarity measures for documents, sentences, and words [8].

One method is the bag of words (bows) model. This model is based on the concept of set theory from mathematics. In bows, the set contains unordered words without duplication. The similarity measure is the ratio between the sizes of the intersection and the union of the two sets. No external information is used in the model.

Another method is the term-frequency model. This model projects the terms and their frequencies into vectors in a vector space. Then, the similarity measure is calculated using the Cosine Distance of the vectors.

Another method is the term-frequency/inverse document frequency model. Unlike the two previous models, this model uses a corpus. The basic concept is that a corpus will provide a better representation of the terms used in the model. This model uses the term-document matrix to derive the weight for each term. These weights are projected into a vector space. Then, the similarity measure is calculated using the Cosine Distance of the vectors.

The latent semantic indexing model uses the concept that there are latent or hidden relationships between the terms in the documents [9][10]. These latent relationships can be calculated using a Singular Value Decomposition (SVD) approach. This model also uses a corpus. This model extends the tfidf model by using different weights for the terms in the vector space. In this model, the term-document matrix is reduced in its dimension.

III. METHODS

We describe the methods used to measure the similarity. In our study, we used seven English translations. Each translation has 6236 translated verses within 114 surah. We measured similarity at three different levels: whole translation, surah and verse. Let $t_i$ be the translations, where $i \in \{1, \dots, 7\}$. We calculated the similarity measure pairwise. Since we have $N = 7$ translations, we have $\binom{N}{2} = \frac{N(N-1)}{2} = 21$ pairs. Let $s$ be the random variable for surah and $s_i$ be the $i$th surah, where $i \in \{1, \dots, 114\}$.
Let $v_{ijt}$ be the translated English verse, where $i$ is the surah number, $j$ is the verse number, and $t$ is the translation.

A. Bag of words model (bows)

A bag of words consists of an unordered set of terms or words. Given two translated verses $v_{ijl}$ and $v_{ijk}$, the similarity measure $S_{ij}$ between two translations $(l, k)$ is

$$S_{ij}(t_l, t_k) = \frac{|\mathrm{set}(v_{ijl}) \cap \mathrm{set}(v_{ijk})|}{|\mathrm{set}(v_{ijl}) \cup \mathrm{set}(v_{ijk})|}$$
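As an illustration, the bows measure above can be sketched in a few lines of Python. This is a minimal sketch rather than the code used in the experiments: the function names and the tokenizer (lowercasing and keeping alphabetic words only) are assumptions. The two verses are the Hilali and Pickthall translations of the first verse of surah al-Fatihah quoted in the Example section.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bows_similarity(verse_a, verse_b):
    """Ratio of shared terms to all terms used by either verse."""
    set_a, set_b = set(tokenize(verse_a)), set(tokenize(verse_b))
    union = set_a | set_b
    if not union:
        return 0.0  # two empty verses: define the similarity as 0
    return len(set_a & set_b) / len(union)


hilali = "In the Name of Allah, the Most Beneficent, the Most Merciful"
pickthall = "In the name of Allah, the Beneficent, the Merciful."
score = bows_similarity(hilali, pickthall)
print(score)  # 0.875 (7 shared terms out of 8 distinct terms)
```

Because only set membership matters, repeated words such as "the" and "most" count once.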

B. Term-frequency (tf)

In the term-frequency model, we derived term-frequency vectors for the translated verses $v_{ijl}$ and $v_{ijk}$. We then calculated the similarity measure between the vectors using Cosine Distance:

$$S_{ij}(t_l, t_k) = \frac{v_{ijl} \cdot v_{ijk}}{|v_{ijl}|\,|v_{ijk}|}$$
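The cosine measure above can be sketched as follows. This is an illustrative sketch, not the experimental code: the function name and the representation of a term-frequency vector as a Python dict are assumptions. The counts are the Hilali and Pickthall columns of Table I.

```python
import math


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two sparse term-frequency vectors (dicts)."""
    dot = sum(count * tf_b.get(term, 0) for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)


# Term counts taken from Table I (zero counts omitted).
hilali_tf = {"allah": 1, "beneficent": 1, "merciful": 1, "most": 2, "name": 1}
pickthall_tf = {"allah": 1, "beneficent": 1, "merciful": 1, "name": 1}
tf_score = cosine_similarity(hilali_tf, pickthall_tf)
print(round(tf_score, 4))  # 0.7071
```

For these counts the value is about 0.707, which is close to the value reported for this pair in the Example section.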

In this model, the similarity measure depends on both the terms and the frequency of the terms in the translations.

C. Term-frequency/inverse document frequency (tfidf)

This model uses a corpus. In this study, each verse has $N$ translations, which are used as the corpus for that verse. From the corpus, we derived the weight $w$ for each term in the translations:

$$w = tf \times idf = tf \times \log\frac{N}{n_i}$$
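A small sketch of this weighting, applied to the four-translation corpus of Table I, is given below. The function name is hypothetical, and the base-2 logarithm and the scaling of each vector to unit length are assumptions; they are chosen here because they appear to reproduce the weights listed in Table II.

```python
import math


def tfidf_vectors(corpus_tf):
    """Return unit-length tf * log2(N / n_i) weight vectors, one per document."""
    n_docs = len(corpus_tf)
    doc_freq = {}  # n_i: number of documents that contain term i
    for tf in corpus_tf.values():
        for term in tf:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    vectors = {}
    for name, tf in corpus_tf.items():
        raw = {t: c * math.log2(n_docs / doc_freq[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors[name] = {t: (w / norm if norm else 0.0) for t, w in raw.items()}
    return vectors


# Term counts from Table I (zero counts omitted).
corpus = {
    "Hilali": {"allah": 1, "beneficent": 1, "merciful": 1, "most": 2, "name": 1},
    "Pickthall": {"allah": 1, "beneficent": 1, "merciful": 1, "name": 1},
    "YusufAli": {"allah": 1, "merciful": 1, "most": 2, "name": 1, "gracious": 1},
    "Shakir": {"allah": 1, "beneficent": 1, "merciful": 1, "name": 1},
}
weights = tfidf_vectors(corpus)
# The vectors have unit length, so the cosine reduces to a dot product.
similarity = sum(
    w * weights["Pickthall"].get(t, 0.0) for t, w in weights["Hilali"].items()
)
print(round(weights["Hilali"]["most"], 4), round(similarity, 4))  # 0.9791 0.2032
```

Terms that occur in every translation of the verse (allah, merciful, name) get weight zero, so the similarity is driven only by the terms on which the translations differ.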

Here $tf$ is the term frequency, that is, the frequency of a term within a translation, and $idf$ is the inverse document frequency, where $n_i$ is the number of translations in the corpus that contain term $i$. Given two translated verses $v_{ijl}$ and $v_{ijk}$, we derive the weight for each term in the translations $(l, k)$. Then, the term vectors are projected into the vector space using the weights. We used Cosine Distance to measure the similarity between the vectors.

D. Latent semantic indexing (lsi)

In latent semantic indexing, we assumed there are latent (hidden) relationships between the terms in the translations. First, we created a corpus of $N$ translations for each verse. Then, we derived the term-document matrix from the corpus. We computed the Singular Value Decomposition (SVD) of the term-document matrix. We constructed the term vectors for the translations $t_l$ and $t_k$ based on the corpus. We used Cosine Distance to measure the similarity between the vectors.

E. Surah similarity measures

Let $s$ be a random variable for a surah. The surah similarity measure is the average similarity measure for $s_i$:

$$s_i(t_l, t_k) = \frac{1}{J} \sum_{j=1}^{J} S_{ij}(t_l, t_k)$$

where $J$ is the number of verses in surah $i$ and $(l, k)$ are the translations.

F. Average similarity measures

The average similarity measure for a pair of translations $(t_l, t_k)$ is

$$S(t_l, t_k) = \frac{1}{114} \sum_{i=1}^{114} s_i(t_l, t_k)$$
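The two averages can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical and the verse scores are made-up values, not results from the experiments.

```python
from statistics import mean


def surah_similarity(verse_scores):
    """Average the verse-level scores S_ij of one surah."""
    return mean(verse_scores)


def translation_similarity(scores_by_surah):
    """Average the surah-level scores s_i over all surahs."""
    return mean(surah_similarity(scores) for scores in scores_by_surah.values())


# Hypothetical verse scores for two short surahs (illustrative values only).
scores_by_surah = {1: [0.5, 0.25, 0.75], 2: [1.0, 0.0]}
surah_one = surah_similarity(scores_by_surah[1])
overall = translation_similarity(scores_by_surah)
print(surah_one, overall)  # 0.5 0.5
```

Note that, as in the formula, every surah contributes equally to the overall average regardless of how many verses it contains.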

For the bows model, the value of $S_{ij}(t_l, t_k)$ lies between 0.0 and 1.0. A similarity measure of 1.0 indicates perfect similarity, where both translated verses use the same terms. In this model, only the terms used are important.


IV. EXPERIMENTS

A. Dataset

The dataset consists of seven ($N = 7$) English translations obtained from http://tanzil.net. The English translations are Hilali [11], Shakir [12], Pickthall [13], Sahih [14], YusufAli [15], Maududi [16] and Arberry [17]. Each translation consists of 6236 translated verses. We created 21 pairs of translations.

B. Steps in the experiments

1) For each pair of translations, we calculate the similarity measures of each verse based on the four methods.
2) The similarity measures from step 1 are divided into five ranges: (0.0-0.2), (0.2-0.4), (0.4-0.6), (0.6-0.8), (0.8-1.0). For each range, we count the number of translated verses with a similarity measure within that range.
3) Then, we calculate the average similarity measure of each surah.
4) Then, we calculate the average similarity measure for each pair of translations.

C. Tools

All experiments were implemented in the Python programming language using the gensim library.

D. Example

We evaluated the similarity of the first verse in surah al-Fatihah ($v_{11}$), given two English translations from Hilali ($t_1$) and Pickthall ($t_2$):

t1 = "In the Name of Allah, the Most Beneficent, the Most Merciful"
t2 = "In the name of Allah, the Beneficent, the Merciful."

First, we compared $t_1$ and $t_2$ using bows. The set for $t_1$ is (beneficent, name, of, allah, most, merciful, in, the) and the set for $t_2$ is (beneficent, name, of, allah, merciful, in, the). Thus, the similarity measure is $S(t_1, t_2) = 7/8 = 0.875$. This indicates that 87.5% of the terms in $t_1$ also appear in $t_2$.

Second, we used the tf model to compare $t_1$ and $t_2$. In this model, we calculated the frequency of each term (beneficent, name, etc.) for the translations $(t_1, t_2)$. The term frequencies are shown in Table I. From the table, we derived the term vectors and calculated the similarity measure using Cosine Distance. The similarity measure is $S(t_1, t_2) = 0.701$. This indicates that, based on term and frequency features, $t_1$ has a similarity measure of 0.701 with $t_2$.
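Step 2 of the procedure above, together with the enumeration of the 21 translation pairs, can be sketched as follows. The names are hypothetical, the example scores are illustrative, and the handling of scores that fall exactly on a range boundary (assigned to the higher range here) is an assumption, since the text does not specify it.

```python
import bisect
from itertools import combinations

TRANSLATIONS = ["Hilali", "Maududi", "Pickthall", "Sahih", "Shakir", "Arberry", "YusufAli"]
# Upper edges of the first four ranges; scores above 0.8 fall in the last range.
UPPER_EDGES = [0.2, 0.4, 0.6, 0.8]


def count_by_range(scores):
    """Count verse scores per range: (0.0-0.2), (0.2-0.4), ..., (0.8-1.0)."""
    counts = [0] * (len(UPPER_EDGES) + 1)
    for score in scores:
        counts[bisect.bisect_right(UPPER_EDGES, score)] += 1
    return counts


pairs = list(combinations(TRANSLATIONS, 2))
print(len(pairs))  # 21

# Illustrative verse scores for one translation pair.
example_scores = [0.875, 0.05, 0.45, 0.61, 0.99, 0.2032]
print(count_by_range(example_scores))  # [1, 1, 1, 1, 2]
```

Running `count_by_range` once per pair and per method would yield one row of a table such as Table IV or Table V.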
Table I. Term-frequency for terms from four translations (Hilali, Pickthall, YusufAli, Shakir) for the first verse in surah al-Fatihah

terms        Hilali   Pickthall   YusufAli   Shakir
allah        1        1           1          1
beneficent   1        1           0          1
merciful     1        1           1          1
most         2        0           2          0
name         1        1           1          1
gracious     0        0           1          0

For the tfidf and lsi models, we used $N = 4$ translated verses as a corpus. In the tfidf model, we calculated the tfidf values for each term from Table I. The tfidf values are shown in Table II. Based on the tfidf values, we projected the term vectors into a vector space and calculated the similarity measure using Cosine Distance. Thus, we have $S(t_1, t_2) = 0.2032$.

Table II. tfidf for terms from four translations (Hilali, Pickthall, YusufAli, Shakir) for the first verse in surah al-Fatihah

terms        Hilali   Pickthall   YusufAli   Shakir
allah        0        0           0          0
beneficent   0.2031   1           0          1
merciful     0        0           0          0
most         0.9791   0           0.7071     0
name         0        0           0          0
gracious     0        0           0.7071     0

In the lsi model, we assumed there are latent (hidden) relationships between the terms in the translations. In this method, the term-document matrix (Table I) is decomposed into three separate matrices using Singular Value Decomposition. Based on lsi, we have $S(t_1, t_2) = 0.4472$ for the first verse of surah al-Fatihah for the Hilali/Pickthall translation pair.

Thus, for the first verse in surah al-Fatihah using the Hilali/Pickthall pair of translations, we have four similarity measures: bows (0.875), tf (0.701), tfidf (0.2032), and lsi (0.4472). We can repeat the same process for other translation pairs. From these results, we can observe which translation pairs have the highest similarity measures.

Next, we used the Hilali/YusufAli translation pair to evaluate the similarity of the English translations of surah al-Fatihah. The results are shown in Table III. From Table III, we observed that bows indicated low similarity between Hilali/YusufAli (0.4173). This was because Hilali uses extra words to explain certain terms. Based on tf, there was a high average similarity between Hilali/YusufAli (0.6223). Based on tfidf, there were two verses with low similarity: verse 2 (0.052) and verse 4 (0.097). The lsi model indicated a high similarity between Hilali/YusufAli (0.7574). The average similarity measures are bows (0.4173), tf (0.6223), tfidf (0.4586) and lsi (0.7574). These averages could be used to compare the Hilali/YusufAli pair with other translation pairs at the surah level.

Table III. Similarity measures for surah al-Fatihah for Hilali/YusufAli

Verse   bows     tf       tfidf    lsi
1:1     0.875    0.7071   0.8459   0.8744
1:2     0.2917   0.6180   0.052    0.8225
1:3     0.375    0.6455   0.7963   0.7103
1:4     0.1875   0.6626   0.097    0.8524
1:5     0.2632   0.4221   0.7329   0.4168
1:6     0.5      0.7303   0.3885   0.9881
1:7     0.4286   0.5708   0.2971   0.6372
ave.    0.4173   0.6223   0.4586   0.7574
min     0.1875   0.4221   0.052    0.4168
max     0.8750   0.7303   0.8459   0.9881
range   0.6875   0.3282   0.7939   0.5613

V. RESULTS AND DISCUSSION

A. Similarity at verse level

We have 21 translation pairs from 7 English translations of the Holy Quran. For each translation pair, we measure the similarity for each verse using four similarity measures. We have divided the similarity measures into 5 ranges for ease of interpretation: (1.0-0.8), (0.8-0.6), (0.6-0.4), (0.4-0.2), (0.2-0.0). For example, Table IV shows the number of verses that have similarity measures within these ranges using the bows and tf models.

In Table IV, the first column of each range shows the results from the bows model. For example, when we compared the Hi-Ma translations, we have 25 verses with similarity measures in (0.8-1.0), and so on. Based on bows, Hi-Ma have a low similarity, which indicates that Hilali and Maududi use different terms in their translations. The second column of each range shows the results from the tf model. For example, when we compared Hi-Ma, we have 258 verses with similarity measures in (0.8-1.0), and so on. Based on tf, Hilali-Maududi has a high similarity based on term and frequency features.

Table IV. The number of verses with similarity measures within five different ranges using the bows and tf models

         (0.8-1.0)      (0.6-0.8)      (0.4-0.6)      (0.2-0.4)      (0.0-0.2)
Pair     bows   tf      bows   tf      bows   tf      bows   tf      bows   tf
Hi-Ma    25     258     260    3041    2143   2009    3301   513     507    93
Hi-Pi    94     1018    577    3003    2828   1675    2342   453     395    878
Hi-Sa    82     1485    592    3254    3135   1185    2122   269     305    43
Hi-Sh    53     994     415    3299    2821   1483    2578   389     369    71
Hi-Ar    19     492     171    2860    1776   2152    3588   615     682    117
Hi-YA    175    1294    1047   3123    3063   1404    1746   341     205    74
Ma-Pi    44     370     199    2454    1767   2470    3620   775     606    167
Ma-Sa    63     718     339    3075    2232   1892    3158   466     444    85
Ma-Sh    56     735     327    3113    2577   1828    2904   474     372    86
Ma-Ar    59     618     323    2764    2116   2112    3215   625     523    117
Ma-YA    31     422     224    2683    1924   2294    3507   688     550    149
Pi-Sa    48     741     313    3132    2461   1820    3010   470     404    82
Pi-Sh    53     766     456    3134    2948   1787    2466   462     313    77
Pi-Ar    44     656     343    2962    2432   1967    2986   542     431    109
Pi-YA    28     506     264    2922    2282   2121    3193   557     469    130
Sa-Sh    60     1359    546    3358    3151   1206    2233   251     246    62
Sa-Ar    39     774     331    3052    2469   1871    2963   445     434    94
Sa-YA    41     627     277    2984    2278   2051    3180   469     460    105
Sh-Ar    56     970     487    3065    2869   1677    2477   438     347    86
Sh-YA    30     584     320    2921    2516   2078    2933   521     437    132
Ar-YA    19     359     138    2619    1682   2349    3682   741     715    168

Hi=Hilali, Ma=Maududi, Pi=Pickthall, Sa=Sahih, Sh=Shakir, Ar=Arberry, YA=YusufAli

In the bows model, we only compared the terms used in the translations. A high similarity indicates that both translations used the same terms, and vice versa. If we count the number of verses with a similarity measure greater than 0.6, the top similar translations for bows are Hi-YA (1222), Hi-Sa (674) and Hi-Pi (671). In the tf model, we used both the terms and the term frequencies. A high similarity indicates that both translations used the same terms with the same frequencies. If we count the number of verses with a similarity measure greater than 0.6, the top similar translations are Hi-Sa (4739), Sa-Sh (4717) and Hi-YA (4417). Based on these two methods, we concluded that the translation pairs that have high similarity are Hi-YA, Hi-Sa, Hi-Pi and Sa-Sh. These results indicate that these translation pairs use the same words in their translations.

Table V shows the number of verses with similarity measures within five different ranges using the tfidf and lsi models. It is important to note that the tfidf and lsi models are based on a corpus. Thus, it is inappropriate to compare Table IV and Table V on verse frequencies. In the tfidf model, we used tfidf as a weight for each term in the translations. The top similar translations, based on the number of verses with similarity measures greater than 0.60, are Hi-YA (231), Hi-Pi (153) and Ma-Ar (132). For the lsi model, the top similar translations are Hi-YA (277), Hi-Sa (252) and Ma-Ar (242).

Table V. The number of verses with similarity measures within five different ranges using the tfidf and lsi models

         (0.8-1.0)      (0.6-0.8)      (0.4-0.6)      (0.2-0.4)      (0.0-0.2)
Pair     tfidf  lsi     tfidf  lsi     tfidf  lsi     tfidf  lsi     tfidf  lsi
Hi-Ma    33     164     12     11      67     17      462    99      5662   5961
Hi-Pi    67     188     86     27      216    62      863    201     5004   5758
Hi-Sa    59     226     53     26      216    74      1088   282     4820   5628
Hi-Sh    44     183     21     16      121    47      634    160     5416   5830
Hi-Ar    25     166     8      3       38     19      309    87      5856   5961
Hi-YA    98     232     133    45      423    122     1518   356     4064   5481
Ma-Pi    57     199     11     4       49     16      299    64      5820   5953
Ma-Sa    51     179     23     12      109    29      627    117     5426   5899
Ma-Sh    54     192     45     8       125    26      768    136     5244   5874
Ma-Ar    85     229     47     13      133    20      706    90      5265   5884
Ma-YA    40     154     18     4       56     12      518    85      5604   5981
Pi-Sa    48     170     17     14      79     16      431    100     5661   5936
Pi-Sh    53     174     18     21      92     22      551    129     5522   5890
Pi-Ar    59     175     28     7       147    29      717    96      5285   5929
Pi-YA    26     159     36     11      175    25      748    91      5251   5950
Sa-Sh    46     201     48     25      160    61      992    161     4990   5788
Sa-Ar    45     171     22     11      68     24      545    99      5556   5931
Sa-YA    32     181     19     12      83     23      485    119     5617   5901
Sh-Ar    63     195     41     14      209    23      781    124     5142   5880
Sh-YA    28     168     25     7       100    28      517    121     5566   5912
Ar-YA    25     168     13     10      63     8       377    77      5758   5973

Hi=Hilali, Ma=Maududi, Pi=Pickthall, Sa=Sahih, Sh=Shakir, Ar=Arberry, YA=YusufAli

We observed that, at the verse level, Hi-YA has high similarity based on all four models. Other translation pairs that have high similarity at the verse level are Hi-Sa, Sa-Sh and Hi-Pi. It is a surprise to see that the Ma-Ar pair has a high similarity based on tfidf and lsi. From these results, we could divide the translations into groups. For example, one group could have (Hilali, YusufAli, Sahih), another group could have (Pickthall, Shakir, Maududi) and the third group has (Arberry). Each group indicates how similar each translation is to the other translations.

B. Surah similarity measures

We calculated the average surah similarity in Table VI and Table VII. For example, when we compared the Hi-Ma translations, we had 0 surah with average surah similarity in (1.0-0.8), 0 surah in (0.8-0.6), and so on. This indicated that, at the surah level, there were very few translations of a surah that were similar. The reason may be that there are many verses in a surah. Some translations of the verses in the surah have high similarity measures, and others have low similarity measures. This indicates inconsistent similarity measures within a surah between the translations. These results support the hypothesis that translation is difficult: if translation were easy, the similarity measures would be almost identical for each of the verses. The top similar translations for each model are: bows (Hi-YA, Sa-Sh), tf (Hi-Sa, Sa-Sh, Hi-YA), tfidf (Hi-Sa, Hi-YA), lsi (Hi-YA, Ma-Pi). These results indicate that, at the surah level, the translations of Hilali, YusufAli, Sahih and Shakir are highly similar, and the translations of Maududi, Pickthall and Arberry have low similarity.

C. Average similarity measures for translations

The average similarity measure is the similarity measure for the whole translations, pairwise. These are shown in Table VIII.
The top five similar translations are shown in Table IX and the bottom five similar translations are shown in Table X.

VI. CONCLUSION

In this paper, we have evaluated the similarity of seven English translations of the Holy Quran using four different methods from computational linguistics. The significance of our study is that the similarity measures are based on computational linguistics, rather than on human experts' evaluations. These results have a potential use in grouping the translations into different groups. These methods could be extended to evaluate the similarity of the Arabic verses in the Holy Quran.

Table VI. The number of surah with the average similarity measure within each range, using the bows and tf models

Pairs: Hi-Ma Hi-Pi Hi-Sa Hi-Sh Hi-Ar Hi-YA Ma-Pi Ma-Sa Ma-Sh Ma-Ar Ma-YA Pi-Sa Pi-Sh Pi-Ar Pi-YA Sa-Sh Sa-Ar Sa-YA Sh-Ar Sh-YA Ar-YA
(0.8-1.0) bows tf: 1 1 1 1 2
(0.6-0.8) bows tf: 1 0 0 2 1 1 2 1 62 72 91 72 57 91 37 74 74 60 45 66 70 71 58 87 73 64 76 62 36
(0.4-0.6) bows: 16 70 77 62 7 101 13 37 61 37 14 38 71 44 27 81 46 25 74 40 8
(0.4-0.6) tf: 52 40 23 39 51 23 74 40 40 51 66 43 38 41 53 27 40 47 35 49 76
(0.2-0.4) bows tf: 98 42 37 52 105 13 100 77 53 77 100 73 42 69 86 33 68 89 38 73 105 1 2 5 0 3 0 1 2 3 4 5 2 3 0 1 3 1 3 2
(0.0-0.2) bows tf: 1 2 1 1 1 1 1 1 -

Hi=Hilali, Ma=Maududi, Pi=Pickthall, Sa=Sahih, Sh=Shakir, Ar=Arberry, YA=YusufAli

Table VII. The number of surah with the average similarity measure within each range, using the tfidf and lsi models

Pairs: Hi-Ma Hi-Pi Hi-Sa Hi-Sh Hi-Ar Hi-YA Ma-Pi Ma-Sa Ma-Sh Ma-Ar Ma-YA Pi-Sa Pi-Sh Pi-Ar Pi-YA Sa-Sh Sa-Ar Sa-YA Sh-Ar Sh-YA Ar-YA
(0.8-1.0) tfidf: 1
(0.8-1.0) lsi:
(0.6-0.8) tfidf: 2 2 3 1 1 5 2 4 1 2 2 3 1 3 1 1 2 2 4 2
(0.6-0.8) lsi: 1
(0.4-0.6) tfidf: 49 85 94 79 34 96 20 57 77 79 79 46 61 60 22 81 57 46 69 49 24
(0.4-0.6) lsi: 1 1 1 1 1 1 1 2 1 1 1 3 1
(0.2-0.4) tfidf: 61 25 17 32 77 13 92 53 36 33 33 64 49 50 87 32 55 66 38 61 89
(0.2-0.4) lsi: 3 2 8 6 6 9 9 4 3 7 4 2 3 3 8 7 2 4 4 7 3
(0.0-0.2) tfidf: 2 2 2 2 1 3 1 2 2 1
(0.0-0.2) lsi: 110 111 105 107 107 105 105 110 110 106 109 110 110 110 106 106 112 110 107 106 111

Hi=Hilali, Ma=Maududi, Pi=Pickthall, Sa=Sahih, Sh=Shakir, Ar=Arberry, YA=YusufAli

Table VIII. The average similarity measures for all translations using four different methods

Pair     bows   tf     tfidf   lsi
Hi-Ma    0.37   0.61   0.093   0.054
Hi-Pi    0.42   0.64   0.133   0.068
Hi-Sa    0.43   0.69   0.147   0.081
Hi-Sh    0.41   0.65   0.109   0.064
Hi-Ar    0.35   0.60   0.070   0.052
Hi-YA    0.47   0.67   0.187   0.088
Ma-Pi    0.36   0.57   0.072   0.054
Ma-Sa    0.38   0.62   0.105   0.058
Ma-Sh    0.40   0.62   0.118   0.060
Ma-Ar    0.38   0.60   0.114   0.063
Ma-YA    0.36   0.58   0.092   0.048
Pi-Sa    0.39   0.63   0.088   0.055
Pi-Sh    0.42   0.63   0.098   0.058
Pi-Ar    0.39   0.61   0.109   0.055
Pi-YA    0.38   0.60   0.109   0.052
Sa-Sh    0.43   0.68   0.131   0.068
Sa-Ar    0.39   0.63   0.092   0.055
Sa-YA    0.38   0.61   0.092   0.057
Sh-Ar    0.41   0.64   0.122   0.061
Sh-YA    0.39   0.61   0.095   0.055
Ar-YA    0.34   0.57   0.075   0.051

Hi=Hilali, Ma=Maududi, Pi=Pickthall, Sh=Shakir, Ar=Arberry, YA=YusufAli, Sa=Sahih

Table IX. The top five average similarity measures for translations

method   pairs
bows     Hi-YA   Hi-Sa   Sa-Sh   Hi-Pi   Pi-Sh
tf       Hi-Sa   Sa-Sh   Hi-YA   Hi-Sh   Hi-Pi
tfidf    Hi-YA   Hi-Sa   Hi-Pi   Sa-Sh   Sh-Ar
lsi      Hi-YA   Hi-Sa   Hi-Pi   Sa-Sh   Hi-Sh

Table X. The bottom five average similarity measures for translations

method   pairs
bows     Hi-Ma   Ma-Pi   Ma-YA   Hi-Ar   Ar-YA
tf       Ma-Ar   Pi-YA   Ma-YA   Ma-Pi   Ar-YA
tfidf    Sh-YA   Pi-Sa   Ar-YA   Ma-Pi   Hi-Ar
lsi      Ma-Pi   Hi-Ar   Pi-YA   Ar-YA   Ma-YA

ACKNOWLEDGMENT

The author would like to thank two anonymous referees for their valuable comments. This research was conducted under a research grant ERGS/1/2013/ICT01/UKM/03/5 from Universiti Kebangsaan Malaysia and the Ministry of Education, Malaysia.

REFERENCES

[1] A. R. Kidwai, "Translating the untranslatable: a survey of English translations of the Quran," The Muslim World Book Review, vol. 7, no. 4, pp. 66-71, 1987.
[2] D. M. Nassimi, "A thematic comparative review of some English translations of the Quran," Ph.D. dissertation, University of Birmingham, 2008.
[3] M. S. Jumeh, "The loss of meaning in translation: its types and factors with reference to ten English translations of the meaning of the Quran," Ph.D. dissertation, The University of Wales, Lampeter, 2006.
[4] K. Mohammed, "Assessing English translations of the Quran," Middle East Quarterly, 2005.
[5] A. B. Husain, The Holy Quran: A Translation with Commentary: According to Shia Traditions and Principles. Moayyedul-Uloom Association, Madrasatul Waizeen, 1931.
[6] N. Robinson, "Sectarian and ideological bias in Muslim translations of the Quran," Islam and Christian-Muslim Relations, vol. 8, no. 3, pp. 261-278, 1997.
[7] Q. O. Al-Azzawi and Z. H. Alwan, "Translation assessment of three English translations of words of patience in some selected verses," Journal of Advanced Social Research, vol. 3, no. 6, 2013.
[8] D. Jurafsky and J. H. Martin, Speech and Language Processing. Pearson Prentice Hall, 2008.
[9] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407, 1990.
[10] S. T. Dumais, G. W. Furnas, T. K. Landauer, S. Deerwester, and R. Harshman, "Using latent semantic analysis to improve access to textual information," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1988, pp. 281-285.
[11] M. Al-Hilali and M. Khan, The Noble Quran: English Translation of the Meaning and Commentary. King Fahd Complex for the Printing of Holy Quran, Madinah, KSA, 1985.
[12] M. S. Shakir, The Glorious Quran: With Translation and Transliteration, 2nd ed. Qum: Ansariyan Publications, 2001.
[13] M. Pickthall, The Meaning of the Glorious Koran: An Explanatory Translation. London: Random House, 1992, vol. 105.
[14] S. International, The Qur'an: Arabic Text with Corresponding English. Jeddah, Saudi Arabia: Abul-Qasim Pub. House, 1997.
[15] A. Yusuf Ali, The Holy Quran: Translation and Commentary. Dar Al-Qiblah, Jeddah, KSA, 1982.
[16] S. A. A. Maududi, Tafheemul Quran. Delhi, India: Maktabah Jamaat Islami, 1965.
[17] A. J. Arberry, The Koran Interpreted: A Translation. Simon and Schuster, 1996.
