To cite this article: Sreerupa Das & Rajkumar Roychoudhury (2006) Readability modelling
and comparison of one and two parametric fit: A case study in Bangla*, Journal of
Quantitative Linguistics, 13:01, 17-34, DOI: 10.1080/09296170500500843
Taylor & Francis makes every effort to ensure the accuracy of all the information
(the “Content”) contained in the publications on our platform. However, Taylor
& Francis, our agents, and our licensors make no representations or warranties
whatsoever as to the accuracy, completeness, or suitability for any purpose
of the Content. Any opinions and views expressed in this publication are the
opinions and views of the authors, and are not the views of or endorsed by
Taylor & Francis. The accuracy of the Content should not be relied upon and
should be independently verified with primary sources of information. Taylor and
Francis shall not be liable for any losses, actions, claims, proceedings, demands,
costs, expenses, damages, and other liabilities whatsoever or howsoever caused
arising directly or indirectly in connection with, in relation to or arising out of the
use of the Content.
This article may be used for research, teaching, and private study purposes.
Any substantial or systematic reproduction, redistribution, reselling, loan,
sub-licensing, systematic supply, or distribution in any form to anyone is expressly
forbidden. Terms & Conditions of access and use can be found at
http://www.tandfonline.com/page/terms-and-conditions
Downloaded by [Northeastern University] at 16:06 09 November 2014
Journal of Quantitative Linguistics
2006, Volume 13, Number 1, pp. 17 – 34
DOI: 10.1080/09296170500500843
ABSTRACT
INTRODUCTION
The readability formulae for English may not be directly applicable to
Bangla. This is because, while European scripts are pseudo-phonetic,
Bangla is a syllabic script with glyphs representing clusters and ligatures.
That is, there are certain features or parameters in Bangla which need to
be incorporated in the index to give more accurate scores for Bangla texts.
This paper describes an attempt, perhaps for the first time, to bridge
this gap between Bangla and English. We have extracted a set of
parameters from the older Flesch index (Flesch, 1948) and, based on that,
created a miniature readability model for testing on Bangla documents.
The goal of the present paper is to explore and analyse the “readability”
of a few Bangla texts by a number of authors of high repute.
Since “difficulty level” is a qualitative concept, drawing any concrete
inference about its significance and interpretation in various
domains of the Bangla language first requires quantifying it, i.e.
transforming a qualitative judgement into a quantitative measure.
There are many factors in language structure which make a text
“easy”, “moderate” or “difficult” (Mikk, 1995, 1999). From the viewpoint
of a linguist, two such factors are average sentence length (total
words/total sentences) and number of syllables per 100 words (total
syllables/total words * 100).
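As a rough illustration of these two factors, the sketch below computes the average sentence length and the number of syllables per 100 words from raw counts. The function name and the counts in the usage lines are hypothetical, and counting syllables in Bangla text is assumed to have been done beforehand.

```python
def readability_factors(total_words, total_sentences, total_syllables):
    """Return (average sentence length, syllables per 100 words) from raw counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    average_sentence_length = total_words / total_sentences
    syllables_per_100_words = total_syllables / total_words * 100
    return average_sentence_length, syllables_per_100_words


# Hypothetical counts, not taken from the paper's samples.
asl, nosy = readability_factors(total_words=120, total_sentences=8, total_syllables=300)
print(asl, nosy)
```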
Our ultimate aim is then to build a readability index via multiple
regression using those two factors. The steps to be taken towards this
goal are illustrated in Diagram A.
Sample Survey
This step involves collecting data, which are of two principal types in our
study. The first type consists of the sample texts, which are drawn at random
from the CORPUS. These are then given to a group of readers who share
a common educational and cultural background. Then we collect the
Diagram A
Parameter Extraction
This step involves studying the various standard readability parameters.
Principally, one needs to investigate the correlation coefficients between
various parameters in the text. In this step we again draw some random
samples from the corpus, which are then given to the readers for their
responses. This step serves a two-fold purpose. First, it smooths out any
irregularities or discrepancies present in the data (more precisely, it
corrects for the bias, if any, in the responses). Second, from the
responses, we can get a clear picture of which parameters reflect reading
ease or difficulty.
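The correlation check described in this step can be sketched with a plain Pearson coefficient. The function below is a generic implementation, not the authors' code, and the parameter values and reader scores are hypothetical numbers chosen to lie on a straight line.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: average sentence length of five texts and the mean
# reader score for each. The scores fall exactly on a decreasing line,
# so the coefficient comes out at the negative extreme.
sentence_lengths = [10.0, 12.0, 15.0, 18.0, 20.0]
reader_scores = [80.0, 74.0, 65.0, 56.0, 50.0]
r = pearson_r(sentence_lengths, reader_scores)
print(round(r, 6))
```

A coefficient near -1 or +1 would suggest that the parameter tracks reading ease closely, while a value near 0 would suggest it carries little information for the index.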
Underlying Principles
Defining and selecting a readability formula requires some attention
to the underlying question: What exactly constitutes a readable
document? Specifically, what features of the text play an important role
in determining readability? Many factors can be suggested as having an
influence on readability: the proportion of less frequent words, the type-
token ratio, word length, sentence length, frequency of personal
references, and so on (Bhattacharya, 1965). A survey of the features
that have been used in various readability formulae reveals the following
list of features:
Experiments have been carried out to study the correlation between such
factors and the readability scores observed in tests of reading
comprehension. Klare (1968) and others (McLaughlin, 1966) have
shown that the two most common variables in a readability formula are:
Table 1. Showing different readability indices along with their mathematical expressions.
S, total number of syllables; P, total number of words with three or more syllables; W,
total number of words; R, reading ease score in the range 0 (hard) to 100 (easy); T, total
number of sentences; G, grade level in the range 0 (very easy) to 12 (very hard).
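As an illustration of how indices of this kind are evaluated from the symbols defined in the caption, the sketch below uses the widely published English forms of the Flesch reading ease score and the Flesch-Kincaid grade level. Whether Table 1 lists exactly these expressions is an assumption, and the counts in the usage lines are hypothetical.

```python
def flesch_reading_ease(S, W, T):
    """Flesch reading ease R from syllables S, words W and sentences T."""
    return 206.835 - 1.015 * (W / T) - 84.6 * (S / W)


def flesch_kincaid_grade(S, W, T):
    """Flesch-Kincaid grade level G from syllables S, words W and sentences T."""
    return 0.39 * (W / T) + 11.8 * (S / W) - 15.59


# Hypothetical English passage: 100 words, 5 sentences, 150 syllables.
R = flesch_reading_ease(S=150, W=100, T=5)
G = flesch_kincaid_grade(S=150, W=100, T=5)
print(round(R, 3), round(G, 2))
```

Both expressions are calibrated on English, which is why the coefficients cannot simply be reused for Bangla.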
Extraction of Variables
Since Indian languages, and especially Bangla, with which we have started our
work, are inflectional, certain features are worth noting.
Because of this inflectional nature, the average word length (in terms of
syllables) can be longer than that of Western European languages like
English. The Bangla word Kariachilam (“I had done”) is five syllables
long, but it would not be classed with the “hard words”, as it is not an
uncommon or difficult word at all. Since this is a general feature of the
language, the length of words in terms of syllables can easily be ignored
EXPERIMENTAL PROCEDURE
Sampling Scheme
For the present study, we consider a set of seven Bangla documents (a
detailed description of the authors is given in the Appendix). The documents
are numbered arbitrarily from 1 to 7, without regard to their content.
From each sample, a page is selected at random; from that page, a paragraph
is again selected at random. Below
we give the list of documents along with the names of the authors. The
names are given according to the numbering of the samples (see Table 2).
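The two-stage selection (a random page, then a random paragraph from that page) can be sketched as follows. The nested-list representation of a document, the placeholder paragraph strings and the seed are all hypothetical; the paper does not say how the random draws were made.

```python
import random


def sample_paragraph(document, rng):
    """Pick one page at random, then one paragraph at random from that page.

    `document` is a list of pages; each page is a non-empty list of paragraphs.
    """
    page = rng.choice(document)
    return rng.choice(page)


# Hypothetical stand-ins for numbered documents; each string marks a paragraph.
documents = {
    1: [["d1-p1-a", "d1-p1-b"], ["d1-p2-a"]],
    2: [["d2-p1-a"], ["d2-p2-a", "d2-p2-b", "d2-p2-c"]],
}

rng = random.Random(2006)  # fixed seed so the draw can be repeated
chosen = {number: sample_paragraph(doc, rng) for number, doc in documents.items()}
print(chosen)
```

Note that drawing the page first and the paragraph second does not give every paragraph in a document the same chance of selection when pages hold different numbers of paragraphs.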
E 0 – 20 Very difficult
D 20 – 40 Difficult
C 40 – 60 Standard
B 60 – 80 Easy
A 80 – 100 Very easy
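The grading scale can be applied mechanically to a score between 0 and 100, as in the sketch below. The function name is hypothetical, and because the printed ranges share their end points, assigning a boundary score such as 60 to the easier band is an assumption.

```python
def grade_for_score(score):
    """Map a 0-100 readability score to (grade, description).

    Boundary scores (20, 40, 60, 80) go to the easier band; the scale as
    printed does not say which band they belong to.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    bands = [
        (80, "A", "Very easy"),
        (60, "B", "Easy"),
        (40, "C", "Standard"),
        (20, "D", "Difficult"),
        (0, "E", "Very difficult"),
    ]
    for lower_bound, grade, description in bands:
        if score >= lower_bound:
            return grade, description


print(grade_for_score(58))
```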
Model Building
Next, a model is fitted to the observed score (y) by multiple
linear regression. Based on the above data, the fitted model becomes:
$$ y = 69.425 - 1.204\,x_1 + 0.014\,x_2 \tag{1} $$
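A minimal sketch of how a model of this form can be evaluated and fitted is given below. The default coefficients are those of equation (1), reading the coefficient of x1 as negative. The least-squares routine solves the normal equations in plain Python, which is an illustrative choice rather than the authors' procedure, and the (x1, x2) pairs are hypothetical values, not the paper's data.

```python
def predict_score(x1, x2, b0=69.425, b1=-1.204, b2=0.014):
    """Evaluate y = b0 + b1*x1 + b2*x2 (defaults follow equation (1))."""
    return b0 + b1 * x1 + b2 * x2


def fit_two_predictors(x1s, x2s, ys):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1s, x2s)]
    # Build X^T X (3x3) and X^T y (length 3), stored as one augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


# Hypothetical predictor values; the responses are generated from equation (1),
# so the fit should recover its coefficients.
x1_values = [8.0, 12.0, 15.0, 20.0, 25.0]
x2_values = [220.0, 260.0, 240.0, 300.0, 280.0]
y_values = [predict_score(a, b) for a, b in zip(x1_values, x2_values)]

b0, b1, b2 = fit_two_predictors(x1_values, x2_values, y_values)
print(round(b0, 3), round(b1, 3), round(b2, 3))
print(round(predict_score(10, 250), 3))
```

With real survey data the responses would not lie exactly on a plane, and the fitted coefficients would then minimise the sum of squared residuals instead of reproducing the data.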
1 23 F 3 1 3 3 4 2 5
2 23 F 4 2 5 3 3 2 3
3 23 F 4 1 2 3 3 1 3
4 23 F 3 1 2 4 4 1 5
5 23 M 3 1 2 3 3 2 2
6 23 M 5 1 3 3 4 1 4
7 23 M 4 3 2 3 4 2 3
8 23 M 3 2 4 5 3 1 2
9 23 M 4 1 3 4 3 1 2
10 23 M 3 1 3 5 3 1 2
11 24 F 4 2 3 4 3 1 5
12 24 F 4 1 3 2 3 1 4
13 24 F 4 1 2 2 2 1 3
14 24 F 3 1 4 4 4 2 3
15 24 F 4 1 2 5 3 1 3
16 24 F 3 2 3 4 4 1 5
17 2 M 2 2 3 2 3 2 3
18 2 M 4 1 3 3 3 1 2
19 26 F 3 1 3 2 3 1 3
20 26 M 3 1 2 4 3 1 4
21 29 F 4 3 3 3 3 3 3
22 31 F 4 1 3 2 3 1 3
23 31 M 4 2 3 4 3 2 3
24 32 M 2 2 4 3 2 4 5
25 33 F 3 1 2 2 2 2 4
26 35 F 3 1 4 5 4 2 4
27 36 F 4 1 4 3 3 1 3
28 37 F 2 2 3 3 3 1 4
29 45 F 3 2 3 3 3 1 4
30 46 M 3 2 3 3 3 1 4
31 47 F 2 1 3 1 4 1 4
32 53 F 3 3 3 3 3 2 4
33 55 F 4 1 3 3 2 1 3
34 63 M 4 3 3 4 3 1 3
35 66 F 4 1 4 5 3 2 3
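The sketch below shows one way the reader responses could be summarised per sample. It assumes that each row holds a serial number, an age, a sex code and then one rating for each of the seven samples; the rows are copied as printed. The conversion of a mean rating to a 0 to 100 score (each rating taken as the midpoint of a 20-point band) is a guess, not a procedure stated in the text.

```python
RESPONSES = """\
1 23 F 3 1 3 3 4 2 5
2 23 F 4 2 5 3 3 2 3
3 23 F 4 1 2 3 3 1 3
4 23 F 3 1 2 4 4 1 5
5 23 M 3 1 2 3 3 2 2
6 23 M 5 1 3 3 4 1 4
7 23 M 4 3 2 3 4 2 3
8 23 M 3 2 4 5 3 1 2
9 23 M 4 1 3 4 3 1 2
10 23 M 3 1 3 5 3 1 2
11 24 F 4 2 3 4 3 1 5
12 24 F 4 1 3 2 3 1 4
13 24 F 4 1 2 2 2 1 3
14 24 F 3 1 4 4 4 2 3
15 24 F 4 1 2 5 3 1 3
16 24 F 3 2 3 4 4 1 5
17 2 M 2 2 3 2 3 2 3
18 2 M 4 1 3 3 3 1 2
19 26 F 3 1 3 2 3 1 3
20 26 M 3 1 2 4 3 1 4
21 29 F 4 3 3 3 3 3 3
22 31 F 4 1 3 2 3 1 3
23 31 M 4 2 3 4 3 2 3
24 32 M 2 2 4 3 2 4 5
25 33 F 3 1 2 2 2 2 4
26 35 F 3 1 4 5 4 2 4
27 36 F 4 1 4 3 3 1 3
28 37 F 2 2 3 3 3 1 4
29 45 F 3 2 3 3 3 1 4
30 46 M 3 2 3 3 3 1 4
31 47 F 2 1 3 1 4 1 4
32 53 F 3 3 3 3 3 2 4
33 55 F 4 1 3 3 2 1 3
34 63 M 4 3 3 4 3 1 3
35 66 F 4 1 4 5 3 2 3
"""


def mean_ratings(table_text, n_samples=7):
    """Mean of each of the last `n_samples` columns (one column per sample)."""
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    ratings = [[int(value) for value in row[-n_samples:]] for row in rows]
    return [sum(column) / len(ratings) for column in zip(*ratings)]


means = mean_ratings(RESPONSES)
# Assumed conversion: rating k stands for the midpoint of the k-th 20-point band.
scores = [20 * m - 10 for m in means]
print([round(m, 3) for m in means])
print([round(s, 2) for s in scores])
```

Under this assumed mapping the first and seventh columns give 58.0 and about 58.57, which agree with the observed means reported for samples 1 and 7. Not every column reproduces its reported mean in this way, so the mapping should be read as a guess and not as the authors' stated procedure.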
Table columns: sample number; observed score (mean), y; total words, W; total sentences, T; average sentence length (ASL), x1 = W/T; total syllables, S; number of syllables per 100 words (NOSY), x2 = S/W * 100.
RESULTS
1 58 55.90698 58.59208
2 80.28 54.89941 62.47238
3 51.14 56.03511 58.16459
4 55.71 67.58832 51.24107
5 52.28 52.88495 61.25923
6 57.20 53.37981 59.47422
7 58.57 58.96056 62.20943
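One simple way to compare two fits against the observed scores is the mean absolute error, sketched below with the values tabulated above. The sketch assumes that the second column is the observed mean score and that the third and fourth columns are fitted scores from two alternative models; the labels `fit_a` and `fit_b` are hypothetical, since the column headings are not given here.

```python
observed = [58, 80.28, 51.14, 55.71, 52.28, 57.20, 58.57]
fit_a = [55.90698, 54.89941, 56.03511, 67.58832, 52.88495, 53.37981, 58.96056]
fit_b = [58.59208, 62.47238, 58.16459, 51.24107, 61.25923, 59.47422, 62.20943]


def mean_absolute_error(actual, fitted):
    """Average of |actual - fitted| over all samples."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


mae_a = mean_absolute_error(observed, fit_a)
mae_b = mean_absolute_error(observed, fit_b)
print(round(mae_a, 3), round(mae_b, 3))
```

On these figures the fourth column has the smaller mean absolute error, and in both columns the largest single deviation occurs at sample 2.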
ACKNOWLEDGEMENT
The authors are grateful to the referees for their immensely constructive suggestions.
REFERENCES
Farr, J., Jenkins, J. J., & Paterson, D. J. (1951). Simplification of Flesch reading ease
formula. Journal of Applied Psychology, 35(5), 333–337.
Gunning, R. (1952). The Technique of Clear Writing. New York: McGraw Hill.
Hochhauser, M. (1997). Some overlooked aspects of consent form readability.
Institutional Review Bulletin: A Review of Human Subjects Research, 19(5), 5–9.
Hou, H. S. (1983). Digital Document Processing. USA: Wiley-Interscience Inc.
Jespersen, O. (1922). Language, Its Nature, Development and Origin. London: George
Allen and Unwin Ltd (reprint 1947).
Klare, G. R. (1968). The role of word frequency in readability. Elementary English, 45,
12–22.
APPENDIX
Ashish Nandy
Ashish Nandy, a well-known Indian intellectual, writes on cultural and
sociological topics. His important books include An Ambiguous Journey
to the City (Oxford University Press) and a biography of Pramathesh
Borua, a legendary figure in the history of Indian films.
Bani Basu
Bani Basu, the prolific Bangla writer, was born on 11 March 1939, in
Calcutta. She is one of the most talented and creative women writers of
Bangla literature. She graduated with English Honours from Scottish
Church College, Calcutta, and obtained her MA in English Literature
from the University of Calcutta. She sets her stories in contemporary
West Bengal and populates them with lifelike characters, providing
an introspective glimpse of modern society.
She also translated works including The Sonnets of Sri Aurobindo
(Srinvantu), the love stories of Somerset Maugham (1984, Rupa) and the
best stories of D. H. Lawrence (1987, Rupa). Her first novel was
Janmobhumi, Matribhumi, published in Anandoloke in 1987. Her first
short story of its kind was published in Desh magazine in 1981. Her widely
read novels include Gandharbi, Pancama Purusha and Ashtama garbha.
Her short stories Svetpatharera Thala and Radhanagar reflect her versatility.
She is at present a lecturer in English at Bijoykrishna Girls’ College,
Howrah. She has received many important awards, including the Tarashankar
Puroshkar (1991), the Sahitya Setu Puraskar (1995), and the Ananda and Siromoni
Award (1997).
The excerpt in the sample is taken from Maitreyo jatak, which reflects
her stylized use of language, her strong sense of history and sociology,
and her excellent craftsmanship.
Lila Majumdar
Lila Majumdar, one of our best and best-loved children’s writers in
Bengali, was born in a famous Brahmo milieu in 1908. As a young
woman, she was a stellar student of English literature, topping the
Calcutta University MA. Her restless creativity did not allow her to
settle into the discipline of teaching, but she had distinguished stints of
school and college teaching, having been head-hunted by Rabindranath
Tagore. She wrote bestselling cookbooks and household hint books,
which are benchmarks of excellence in their field, worked successfully
for years in All India Radio (1956 – 1963), and took an active interest
in social welfare activities organized by pioneering civil society
organizations.
Her children’s books, such as Din Dupure, Padipisir Barmi Baksa, and
Halde Pakhir Palak are some of the best fantasy, adventure, and ghost
stories in Bangla; their sensitivity and zany imagination have kept readers
in thrall for decades. Another aspect of her oeuvre is the lesser known but
district. He attended school in the village and for higher studies came to
Calcutta where he settled down. He was associated for long with the
Ananda Bazar group as a journalist. After his retirement, he dedicated
himself fully to creative literary writing.
At the beginning of his literary career he wrote poems, but his
versatility in Bengali short stories caught the attention of critics as well as
the general public. He is the author of over 170 books, his first published
work being Nil Gharer Nati. He has received many accolades, including
the Ananda and Bibhuti Smriti awards and the Narsimha Das award of
Delhi University. The sample is taken from Alik Manush, which received
many awards, including the Bhualka award (1990), the Sahitya Akademy
award and the Bankim Purashkar (1994).
Sunil Gangopadhyay
Sunil Gangopadhyay, perhaps the most popular living Bangla author,
was born on 7 September 1934 at Faridpur, now in Bangladesh. He
received his Master’s degree in Economics from the University of
Calcutta in 1954. He is currently associated with the Ananda Bazar group, a
major publishing house in Calcutta.
Author of well over 200 books, Sunil Gangopadhyay has excelled in
different genres. He is also the founder-editor of Krittibaas, a seminal
poetry magazine which became a platform for a new generation of
poets. He is also known for his unique prose style. Eka Ebong
Koyekjon is one of his well-known works of fiction. Sei Somoy (Those
Days), a work of historical fiction, received the Sahitya Akademy
Award in 1985. He has also written travelogues, children’s fiction,
novels and essays. Among his pen names are Nil Lohit, Sanataan
Pathak and Nil Upadhyay. The sample is taken from a children’s
adventure story.
Shyamdulal Kundu
He is an intellectual who occasionally writes articles about Bangla
grammar. The excerpt here is taken from an article that states rules of
spelling in Bangla.
Somprokash Bandopadhyay
He is a computer specialist who writes occasionally. The excerpt here is taken
from his book on computer science.