
Computers & Education 128 (2019) 132–144



journal homepage: www.elsevier.com/locate/compedu

An intelligent diagnostic framework: A scaffolding tool to resolve academic reading problems of Thai first-year university students
Chayaporn Kaoropthai (a), Onjaree Natakuatoong (a), Nagul Cooharojananone (b,∗)
(a) Department of Educational Technology and Communications, Faculty of Education, Chulalongkorn University, Bangkok, 10330, Thailand
(b) Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand

ARTICLE INFO

Keywords: Academic reading; Data mining; Intelligent tutoring systems; Post-secondary education; Teaching/learning strategies

ABSTRACT

To accommodate teaching an English class with varied language abilities, an intelligent diagnostic framework (IDF) employing the two-step clustering (TSC) technique of data mining was proposed. A tailor-made diagnostic test on the 10 underlying academic reading skills was constructed. Each skill was measured by four test items using a pass criterion of 75% (≥ 3 out of 4). The TSC was performed on the skill scores and ten personal attributes of 297 first-year university students. The precluster step generated three subclusters. Further analysis (N = 221) created a predictive solution of five clusters with 95.5% accuracy. A final analysis using Pearson's correlation revealed four groups of positive relationships. Lead users from each type were then assigned self-tutoring lessons to learn for two weeks. The results revealed that 56% of lead users had equal or higher scores and 68% of them passed an equal or higher number of skills than in the pretest. Students' types disclosed by the TSC were thus able to predict and the IDF was able to diagnose and scaffold most of the students in academic reading skills. Because the IDF was not so powerful for lower-proficiency students, future research should focus more on those students.

1. Introduction

One of the most important problems that first-year university students encounter is a lack of academic reading ability. These
students are challenged with an abrupt transition from high school to university studies. They face highly intense
academic expectations, which normally require a lot of reading and independent study (Donald, 2002; Halpern, 1998). All these high
school graduates are expected to meet the standards of university academic environments (Pawan & Honeyford, 2009). Although
they might have been successful in reading at the high school level, the amount, contents and purpose of reading in university are still
a major problem for them (Freebody & Freiberg, 2011; Mulcahy-Ernt & Caverly, 2009).
In addition to the students' reading experience, the university instructors' expectations are also a problem. They usually believe
that all first-year university students must possess a sufficient English ability. Considering the situation in Thailand, the average
English Ordinary National Educational Test (O-NET) scores of high school graduates nationwide were very low at 25.4 ± 12.6% in
2013, 23.4 ± 11.6% in 2014 and 25.0 ± 12.5% in 2015 (NITES). At Mae Fah Luang University (MFU), where the participants of
this study were from, the students' average O-NET scores in 2013, 2014 and 2015 were 27.8 ± 11.8%, 33.6 ± 12.7% and
30.8 ± 11.7%, respectively. Importantly, the high standard deviations within each year and the variation in means across years reflect
the very large variation between students in their English proficiency. Each year, a large number of students were not able to pass the
English foundation course, leading to a considerably high percentage of students dropping out. In the 2015 academic year, for


Corresponding author.
E-mail addresses: ckpor19@gmail.com (C. Kaoropthai), nonjaree@chula.ac.th (O. Natakuatoong), Nagul.C@chula.ac.th (N. Cooharojananone).

https://doi.org/10.1016/j.compedu.2018.09.001
Received 7 July 2017; Received in revised form 4 September 2018; Accepted 7 September 2018
Available online 08 September 2018
0360-1315/ © 2018 Elsevier Ltd. All rights reserved.

instance, 375 (10.8%) out of 3488 students and 195 (25.8%) out of 755 students failed English 1 in the first and the second semester,
respectively. Because English is the medium of instruction at MFU, it is essential to resolve this problem. Accordingly, these students
need guidance and support to enhance their academic reading ability to enable them to cope with university study (Alexander, 2005;
Alvarez & Risko, 2009; Faridah & Michelle, 2009).
Because the first problem, students' lack of experience with information-dense and independent reading, is too complex to address
here, this study focuses first on the second problem, their insufficient English ability.
The way we read something will depend on our purpose. We read different texts in different ways. In academic reading, we need
to be flexible when we read (Charubusp, 2010; Goodman, 1967; Hermida, 2009; Oakhill, Cain, & Elbro, 2015; Scarcella, 2003). We
may need to read quickly to find relevant sections in order to then read carefully when we have found what we want. The reading
strategies, such as scanning to find the book or chapter, skimming to get the gist and careful reading of important passages are
essential. It is also important to learn about how texts are structured. Academic reading needs to be interactive. We have to work at
constructing the meaning from what we read. We construct the meaning using our knowledge of the language and knowledge of the
world, continually predicting and assessing.
Academic reading is reading focusing on fundamental comprehension skills essential for understanding, analyzing, and discussing
academic documents: textbook chapters, abstracts, journal articles, research reports etc. Flowerdew and Peacock (2001) dis-
tinguished macro- and micro-reading skills that students need to develop academic reading. The macro-skills include learners' ability
to use their background knowledge to make sense of new material and fit new knowledge into their schema. Significant micro-skills
are recognizing logical relationships, definitions, generalizations, examples, explanations, predictions, and distinguishing fact from
opinion.
As early as the 1940s, Davis (1944) proposed the nine basic comprehension skills in reading. These nine basic skills include:

1. Knowledge of word meanings


2. Ability to select the appropriate meaning for contextual settings
3. Ability to follow the organization of a passage and to identify antecedents and references in it
4. Ability to select the main thought of a passage
5. Ability to answer questions that are specifically answered in a passage
6. Ability to answer questions that are answered in a passage but not in the words in which the question is asked
7. Ability to draw inferences from a passage about its contents
8. Ability to recognize the literary devices used in a passage and to determine its tone and mood
9. Ability to determine a writer's purpose, intent, and point of view, i.e., to draw inferences about a writer (p.186)

As can be seen, many of these comprehension skills are tested in the TOEFL reading test. The following are the ten TOEFL reading
question types:
Basic Information and Inferencing questions.

1. Factual information questions


2. Negative factual information questions
3. Inference questions
4. Rhetorical purpose questions
5. Vocabulary questions
6. Reference questions
7. Sentence simplification questions
8. Insert text questions
Reading to Learn questions
9. Prose summary
10. Fill in a table (Educational Testing Service, 2006, p.20)

Whereas the TOEFL reading test aims to measure academic reading ability, the reading module of the IELTS has two different
versions: the Academic and the General Training. The differences between the two versions are in the topics and the source of the
reading passages. In the Academic Reading version, the topics are general-interest topics written for a general audience, from journals,
magazines, books, and newspapers. In the General Training Reading version, on the other hand, the topics are basic social English
training topics, from notices, flyers, timetables, documents, newspaper articles, instructions, and manuals. The types of questions
used in the IELTS Reading module include:

1. Multiple-choice questions
2. Short-answer questions
3. Completing sentences
4. Completing notes, summary, tables, flowcharts
5. Labeling a diagram
6. Choosing headings for paragraphs or sections of a text
7. Locating information


8. Identifying points of view


9. Identifying writer's claims
10. Classifying information
11. Matching lists of phrases (Lougheed, 2010, p.54)

A problem facing many language teachers is having to teach large classes with varied proficiency levels. What strategies should
be employed to facilitate their teaching? Although a variety of solutions exist, one important step is a diagnostic process. In order to
scaffold students according to their different needs, an efficient diagnostic tool is essential. In response to this particular need, it is
important to create an effective diagnostic test and an intelligent procedure that can generate the most appropriate number of groups
within a big dataset according to the similarities of their problems and personal attributes (Kaoropthai, Natakuatoong, &
Cooharojananone, 2016). Because the problem of academic words and intense information addressed earlier requires a lot of in-
dependent reading experience (Donald, 2002; Halpern, 1998), it is too complicated to include in this study. Thus, the main objective
of this research was to fill the gap by constructing a diagnostic academic reading test and utilizing the data mining (DM) technique of
two-step clustering (TSC) to disclose reader subpopulations and figure out their patterns in achieving their academic reading test
scores. The results are then employed in scaffolding students using appropriate tailor-made self-tutoring lessons.
Because this paper reports the researchers' entire study, the core information in the first part of the paper will be generally the
same as that in the international conference paper entitled “Diagnosing the English as a foreign language (EFL) reading problems
using two-step cluster analysis”, which is the first part of the study, presented at ITHET 2016 (September 8–10, 2016) in Istanbul,
Turkey, published in IEEE Xplore.

2. Literature review

2.1. Educational data mining (EDM)

Data mining (DM) is a computer-based information system (CBIS) devoted to scanning big data repositories, generating information, and
discovering knowledge (Vlahos, Ferratt, & Knoepfle, 2004). DM reveals data patterns, organizes information of hidden relationships,
structures association rules, composes clusters, and contributes to many kinds of findings not easily produced by other CBISs
(Howard, Yang, Ma, Maton, & Rennie, 2018; Peña-Ayala, 2014).
In the education arena, DM application has been employed for knowledge discovery, decision making, and recommendation (Asif,
Merceron, Ali, & Haider, 2017; Vialardi-Sacin, Bravo-Agapito, Shafti, & Ortigosa, 2009), and recognized as the educational data mining
(EDM) research field (Anjewierden, Kollöffel, & Hulshof, 2007). EDM is commonly used to design models, tasks, methods, and
algorithms for exploring big data from educational settings.
There are many EDM techniques that can be applied on educational data, which can provide valuable outcomes to assist in
addressing educational issues and problems (Yang & Li, 2018). EDM tasks that have been used for these purposes include: clustering,
prediction, classification, association, sequential pattern, text mining, and outlier detection (Romero & Ventura, 2007). Many EDM
tasks are related and employed to support each other. Clustering is the process of grouping physical or abstract objects into classes of
similar objects. According to Klösgen and Zytkow (2002), clustering and classification are both classification methods. Clustering can
be defined as an unsupervised classification and classification is a supervised classification. Classification is also related to prediction.
Classification predicts class labels, whereas prediction predicts continuous-valued functions (Romero & Ventura, 2007).
Based on a meta-analysis (Mohamad & Tasir, 2013), the most popular technique for EDM is clustering, followed by classification,
sequential pattern, and association rule analysis. According to an earlier survey covering 1995 to 2005, however, association rule
analysis was employed in most of the studies on EDM (Romero & Ventura, 2007), as it does not require extensive expertise
(Merceron & Yacef, 2007).
In EDM studies, predictive modeling is usually used in predicting student performance, and the most popular task to predict
student performance is classification. The techniques or algorithms applied for it include Decision Tree, Neural Networks, Naïve
Bayes, K-Nearest Neighbor, and Support Vector Machine (Shahiri, Husain, & Rashid, 2015). In the current study, a neural network
approach using two-step cluster (TSC) analysis was adopted for classification.
Neural networks are among the popular techniques used in EDM. A key advantage of a neural network is its ability to detect all
possible interactions between predictor variables (Gray, McGuinness, & Owende, 2014). It also requires less formal statistical
training to develop, can implicitly detect complex non-linear relationships between independent and dependent variables, and can be
developed using multiple different training algorithms (Arsad, Buniyamin, & Manan, 2014; Tu, 1996). Therefore, the neural network
technique is considered one of the best prediction methods.
The TSC analysis is an exploratory tool designed to reveal natural groupings or clusters within a dataset that would otherwise not be
apparent. The algorithm employed by this procedure has several desirable features that differentiate it from other clustering
techniques (IBM Corp., 2013). Because most datasets in the real world contain both categorical and numeric attributes, traditional
clustering algorithms cannot handle these types of data effectively (Shih, Liu, & Hsu, 2010). The TSC analysis is appropriate for this
study because it can handle both continuous and categorical variables and it also generates the optimum number of clusters. It
divides data into groups or clusters of similar objects. Each group or cluster consists of objects that are similar to one another and
dissimilar to objects in other groups. In the first step of the procedure, it preclusters the records into many small subclusters. Then, it
clusters the subclusters from the precluster step into the desired number of clusters.
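The two-stage idea described above can be illustrated with a minimal sketch. This is not the TSC procedure used in the study: it assumes purely numeric data, Euclidean distance, a hand-picked distance threshold, and a fixed target number of clusters, whereas the actual TSC algorithm also handles categorical variables and selects the number of clusters automatically. The function names and sample points are hypothetical.

```python
from math import dist


def centroid(cluster):
    """Mean point of a cluster stored as a running sum and a count."""
    return [s / cluster["n"] for s in cluster["sum"]]


def precluster(points, threshold):
    """Step 1: one sequential pass over the records.

    Each point joins the nearest existing subcluster whose centroid lies
    within `threshold`; otherwise it starts a new subcluster.
    """
    subclusters = []
    for p in points:
        best, best_d = None, threshold
        for sc in subclusters:
            d = dist(p, centroid(sc))
            if d <= best_d:
                best, best_d = sc, d
        if best is None:
            subclusters.append({"sum": list(p), "n": 1})
        else:
            best["sum"] = [s + x for s, x in zip(best["sum"], p)]
            best["n"] += 1
    return subclusters


def merge_subclusters(subclusters, k):
    """Step 2: repeatedly merge the two closest subclusters until k remain."""
    clusters = [{"sum": list(sc["sum"]), "n": sc["n"]} for sc in subclusters]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: dist(centroid(clusters[ij[0]]),
                                              centroid(clusters[ij[1]])))
        merged = {
            "sum": [a + b for a, b in zip(clusters[i]["sum"], clusters[j]["sum"])],
            "n": clusters[i]["n"] + clusters[j]["n"],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical two-dimensional records, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
subclusters = precluster(points, threshold=2.0)
final = merge_subclusters(subclusters, k=2)
print(len(subclusters), sorted(c["n"] for c in final))
```

Working on subcluster summaries in the second step, rather than on individual records, is what keeps the approach practical for larger datasets.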


2.2. Scaffolding

In education, scaffolding can be defined as a process or technique through which a teacher or more knowledgeable peer adds
supports for learners in order to enhance learning or facilitate the mastery of tasks. The term scaffolding was introduced by Wood,
Bruner, and Ross (1976). The concept of scaffolding is similar to the work of Vygotsky (1978), who proposed the zone of proximal
development (ZPD), which is the difference between what a learner can do without help and what he or she can do with help. The
learner's ZPD is thus the area just beyond what the learner can do unaided but within reach with appropriate support (Vygotsky, 1978).
In this paper, scaffolding is defined as the support that helps EFL learners to comprehend academic texts, to perform reading
comprehension tasks and to produce a meaningful output. This support or guidance is in the form of scaffolds or “the precise help that
enables a learner to achieve a specific goal that would not be possible without some kind of support” (Sharpe, 2006, p.212).
Scaffolding was used in this study because it was able to support the learning needs of individual learners based on the diagnostic
results of the IDF. The aim was for scaffolding to help individual learners acquire the minimal academic reading
skills needed for their level before they could progress to the next level. The scaffolds for each type of learner in this
study were the two or three particular academic reading skill self-tutoring lessons chosen to support them in achieving their next level.

3. Research questions

The main objective of this study was to propose an IDF employing a DM-TSC approach to disclose reader subpopulations for the
purpose of scaffolding first-year university students in academic reading skills. To this end, two research questions were
formulated:

1. What should the structure and algorithm of the IDF be like?


2. To what extent is the IDF able to enhance the academic reading ability of first-year university students?

4. Materials and methods

4.1. Participants

The subjects for the first part of the study were 297 first-year university students who were studying English 1 (a foundation
English course) in the second semester of the academic year 2015 at MFU, a medium-sized university in northern Thailand. Although
a convenience sampling technique was adopted, it was evident that these students were from all academic areas, representing all
levels of English language proficiency. The data collected from these students were used in the DM-TSC analysis to determine the
reader subpopulations.
The subjects for the second part of the study were 570 first-year university students who were studying English 1 in the first
semester of the academic year 2016. The data from these students were used to determine the prediction accuracy of the DM process
and to draw 30 students as lead users to implement the IDF. The term lead user was used because the researchers adopted the concept
of piloting an invented product. Lead users can be defined as users of a new or enhanced product, service, or technology that
currently experience needs still unknown to the general public (von Hippel, 1986).

4.2. Procedures

This section describes the construct of academic reading comprehension and the two instruments used in the study and how they
were developed.

4.2.1. The construct of academic reading comprehension


To investigate the factors contributing to reading comprehension, 10 common reading skills were extracted and adapted
from the construct of reading comprehension proposed by Davis (1944) and the reading question types in the TOEFL and the IELTS
(Educational Testing Service, 2006; Hawkey & Green, 2012; Hudson, 1996; Lougheed, 2010; van Bemmel & Tucker, 2010; Weir,
Hawkey, Green, Unaldi, & Devi, 2009). The 10 selected reading skills and their characteristics or abilities are displayed in Table 1.

4.2.2. The diagnostic academic reading test (DART)


The instrument used in this study was a tailor-made DART, which was constructed following the test development procedure shown in
Fig. 1 (adapted from Kaoropthai, Natakuatoong, & Cooharojananone, 2016). After the process of refining the test using the item
analysis, the final version of the academic reading test consisted of six reading passages with 40 multiple-choice questions. The test
measured the 10 skills or abilities essential for academic reading, with four items for each skill.
In assessing the learners' reading skills, a 75% (≥3 out of 4) pass criterion was adopted. Each skill that met this criterion was
coded “1”; otherwise it was coded “0”. The DM was employed to diagnose the reading problems. The data used were collected
from 297 first-year university students who took the DART and completed a questionnaire concerning their personal attributes. The
TSC method was then employed to disclose the reader subpopulations according to their problems and personal attributes.
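The pass criterion described above can be expressed as a short scoring routine. The sketch below assumes each student's responses are already marked 0 or 1 per item and grouped by skill number; the function names and the sample responses are hypothetical and are not part of the DART itself.

```python
ITEMS_PER_SKILL = 4
PASS_MARK = 3  # at least 3 of 4 items correct, i.e. the 75% criterion


def skill_profile(item_scores_by_skill):
    """Convert {skill_no: [0/1 item marks]} into {skill_no: 0/1 pass flag}."""
    profile = {}
    for skill, items in item_scores_by_skill.items():
        if len(items) != ITEMS_PER_SKILL:
            raise ValueError(f"skill {skill} needs {ITEMS_PER_SKILL} item marks")
        profile[skill] = 1 if sum(items) >= PASS_MARK else 0
    return profile


def skills_passed(profile):
    """Total number of skills passed (0 to 10 for the full test)."""
    return sum(profile.values())


# Hypothetical marks for three of the ten skills.
responses = {1: [1, 1, 1, 0], 2: [1, 0, 1, 0], 5: [1, 1, 1, 1]}
profile = skill_profile(responses)
print(profile, skills_passed(profile))
```

With all ten skills supplied, the total from `skills_passed` corresponds to the 0 to 10 score range used in the cluster tables.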


Table 1
The characteristics of the 10 academic reading skills.
No.  Skill                            Characteristic

1    Explicitly stated meanings       Being able to find the answer or fact which is clearly stated
2    Vocabulary                       Being able to figure out the meaning of unfamiliar words using contextual clues
3    Facts and opinions               Being able to differentiate facts from opinions
4    Pronoun reference                Being able to identify which word, phrase or sentence the pronoun refers to
5    Sentence structure: core parts   Being able to understand the meaning of the sentence using the grammatical knowledge of subject and predicate
6    Main idea                        Being able to identify the main idea and to differentiate major details from supporting details
7    Text structure                   Being able to identify how the information within a text is organized: chronological order, compare and contrast, cause and effect, or problem and solution
8    Inferred meanings                Being able to draw a conclusion on the basis of information within a text and reasoning
9    Author's purpose                 Being able to identify the author's reasons for writing: to inform, to entertain or to persuade
10   Author's viewpoint: tone         Being able to identify the author's attitude toward a subject: admiring, concerned, disapproving, negative, objective, etc.

Fig. 1. Test construction procedure of the diagnostic academic reading test (DART).

4.2.3. Academic reading skill self-tutoring lessons (10 skills)


The second instrument necessary for the study was the academic reading skill self-tutoring lessons. These 10 self-tutoring lessons
were used as self-tutoring scaffolds for learners to learn by themselves. This type of scaffolding is called instructional scaffolding. The
lessons were prepared by the authors in the following six steps:

1. Reviewing the literature related to academic reading of EFL first-year university students
2. Defining the objectives and the contents of each of the 10 academic reading skill self-tutoring lessons (Table 2)
3. Determining the structure of the self-tutoring lessons: objectives, instructions, contents and answer key
4. Constructing the 10 self-tutoring lessons
5. Having five experts evaluate the Item-Objective Congruence Index (IOC) of the self-tutoring lessons
6. Revising the self-tutoring lessons according to the IOC results and suggestions from the experts.
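As an illustration of step 5, the sketch below computes an IOC value in the way the index is conventionally calculated: each expert rates an element as +1 (congruent), 0 (unsure) or -1 (not congruent), and the index is the mean rating. The ratings shown are hypothetical, and the 0.5 acceptance cut-off is a commonly used convention assumed here; the paper does not state the threshold the authors applied.

```python
VALID_RATINGS = {-1, 0, 1}
ASSUMED_CUTOFF = 0.5  # conventional threshold, not taken from the paper


def ioc(ratings):
    """Item-Objective Congruence: mean of the expert ratings (+1, 0, -1)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be -1, 0 or +1")
    return sum(ratings) / len(ratings)


def needs_revision(ratings, cutoff=ASSUMED_CUTOFF):
    """Flag an element for revision when its IOC falls below the cut-off."""
    return ioc(ratings) < cutoff


# Hypothetical ratings from five experts for two lesson elements.
print(ioc([1, 1, 1, 0, 1]), needs_revision([1, 1, 1, 0, 1]))
print(ioc([1, 0, -1, 0, 1]), needs_revision([1, 0, -1, 0, 1]))
```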

Table 2
The objectives and contents of each of the 10 academic reading skill self-tutoring lessons.

1. Explicitly stated meanings
   Objective: To be able to find the answer or fact which is clearly stated
   Content: Practice of scanning to look for specific information, such as names or numbers
2. Vocabulary
   Objective: To be able to figure out the meaning of unfamiliar words using contextual clues
   Content: Practice using different types of context clues to figure out the meanings of unfamiliar words
3. Facts and opinions
   Objective: To be able to differentiate facts from opinions
   Content: Practice of identifying facts or opinions, making use of verbs
4. Pronoun reference
   Objective: To be able to identify which word, phrase, or sentence the pronoun refers to
   Content: Practice of identifying words, phrases or sentences that the pronouns refer to
5. Sentence structure: core parts
   Objective: To be able to understand the meaning of the sentence using the grammatical knowledge of subject and predicate
   Content: Practice of using the core parts to identify the subject and the main verb of complicated sentences to understand their meanings
6. Main idea
   Objective: To be able to identify the main idea and to differentiate major details from supporting details
   Content: Practice of identifying the main idea of a paragraph or a text
7. Text structure
   Objective: To be able to identify how the information within a text is organized: chronological order, compare and contrast, cause and effect, or problem and solution
   Content: Practice of identifying different text structures, such as compare and contrast, cause and effect, problem and solution by skimming
8. Inferred meanings
   Objective: To be able to draw a conclusion on the basis of information within a text and reasoning
   Content: Practice inferring the meaning of text that is not directly stated
9. Author's purpose
   Objective: To be able to identify the author's reasons for writing: to inform, to entertain, or to persuade
   Content: Practice identifying the purpose of the author for writing a text
10. Author's viewpoint: tone
   Objective: To be able to identify the author's attitude toward a subject: admiring, concerned, disapproving, negative, objective, etc.
   Content: Practice identifying tones of texts making use of the word choice

Table 3
The initial three subclusters (cluster distribution).
Cluster   Score   N     %

1         4–10    51    17.2
2         1–5     170   57.2
3         0       76    25.6
Total             297   100

The final version of the self-tutoring lessons consisted of 10 separate academic reading skill lessons. Based on the DART (pre-test)
scores and the DM results, two or three of these self-tutoring lessons, chosen according to the students' weak points, will be assigned
to each type of student to learn (see Table 7). For example, students in type Z, who passed only one skill (see also Table 8), will be
assigned to learn Skill 1 (Explicitly stated meanings) and Skill 5 (Sentence structure: core parts). The students will have two weeks to
practice the two or three skills from the self-tutoring lessons before taking the post-test (the DART).
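The assignment rule can be pictured as a simple lookup from learner type to lesson numbers. The sketch below uses the skill names from Table 1 and the assignments reported for the five lead-user types in Section 5.2; the function name and the error handling for other types are assumptions made for illustration.

```python
# Skill names as listed in Table 1.
SKILLS = {
    1: "Explicitly stated meanings",
    2: "Vocabulary",
    3: "Facts and opinions",
    4: "Pronoun reference",
    5: "Sentence structure: core parts",
    6: "Main idea",
    7: "Text structure",
    8: "Inferred meanings",
    9: "Author's purpose",
    10: "Author's viewpoint: tone",
}

# Lessons practiced by the five lead-user types, as reported in Section 5.2.
LESSONS_BY_TYPE = {
    "Z": (1, 5),
    "X": (2, 5),
    "E": (2, 3, 5),
    "Y": (5, 7),
    "D": (7, 8),
}


def lessons_for(learner_type):
    """Return (skill number, skill name) pairs assigned to a learner type."""
    if learner_type not in LESSONS_BY_TYPE:
        # Assignments for the remaining types are not given in this section.
        raise ValueError(f"no lesson assignment listed for type {learner_type!r}")
    return [(n, SKILLS[n]) for n in LESSONS_BY_TYPE[learner_type]]


print(lessons_for("Z"))
```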

4.3. Statistical analysis

This study employed management tools for big data. The TSC analysis was employed in this DM process by using the RapidMiner
program. Pearson's correlation was applied in order to reveal the relationship between different reading skills. For research question
2 (RQ2), percentage points were used to determine the effects of the IDF.

5. Results and discussion

This section presents and discusses the findings of the data analyses conducted to answer the research questions.

5.1. RQ1: what should the structure and algorithm of the IDF be like?

The precluster step was performed on the test scores for the 10 reading skills of the 297 students. Using the 75% pass criterion for
each skill, the highest possible score would be 10. This initial analysis generated three subclusters according to their scores, generally,
from high to low (Table 3). However, since the score of every member in Subcluster 3 was 0, it was excluded from further analysis.
The data from the two remaining subclusters (N = 221) were then used as input for further clustering, which generated five clusters of
reader subpopulations, again, from higher to lower scores (Table 4).
In order to determine the relationship between the skills in each cluster, further analysis using Pearson's correlation was computed
to help reveal the readers' patterns of performance. Table 5 displays the positive and negative correlations between the skills within
each cluster.
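For readers who want to see how such a within-cluster correlation is obtained, the sketch below computes Pearson's r between the 0/1 pass flags of two skills. The data are hypothetical and the function name is illustrative; it does not reproduce the authors' analysis, which also reports significance levels.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Returns None when either sequence has no variance (for example, when
    every student in the cluster passed the skill), since r is undefined.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical pass flags (1 = passed, 0 = failed) for six students.
skill_3 = [1, 1, 0, 0, 1, 0]
skill_8 = [1, 1, 0, 0, 1, 1]
print(pearson_r(skill_3, skill_8))
```

A positive value indicates that students who pass one skill tend to pass the other, and a negative value indicates the opposite tendency.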
Based on the positive correlations, the five final clusters can be labeled according to the nature of the skills involved in each
cluster. The five clusters are thus named: Inferential skill, Sentential skill, Interpretative skill, Literal skill, and Structural skill,
respectively. In Cluster 1 (Inferential skill), for example, there was a negative correlation between Skill#2 and Skill#9 but a positive
correlation between Skill#3 and Skill#8, and Skill#5 and Skill#6. We can, therefore, predict that readers who get Skill#2
(Vocabulary) right will tend to get Skill#9 (Author's purpose) wrong, and vice versa. On the other hand, readers who get Skill#3
(Facts and opinions)/Skill#5 (Sentence structure: core parts) right will tend to get Skill#8 (Inferred meanings)/Skill#6 (Main idea)
right, and vice versa.
The information in Table 5 reveals that there were clear relationships among the 10 skills. These relationships can be summarized
to reflect the readers' patterns of performance, as shown in Table 6.
Table 4
The five final clusters (cluster distribution).
Cluster   Score   N     %

1         4–10    29    13.1
2         2–6     40    18.1
3         1–5     62    28.1
4         1–4     43    19.5
5         1–3     47    21.3
Total             221   100

Table 5
The relationships between different reading skills (Pearson correlation).
Cluster                    Correlation between skills (a)   Direction   Sig.

1 (Inferential skill)      2 & 9                            –           0.05
                           3 & 8                            +           0.05
                           5 & 6                            +           0.05

2 (Sentential skill)       2 & 6                            –           0.01
                           8 & 10                           –           0.01
                           1 & 7                            –           0.05
                           3 & 5                            +           0.05

3 (Interpretative skill)   2 & 6                            –           0.01
                           3 & 4                            +           0.05
                           4 & 10                           +           0.05
                           6 & 9                            –           0.05

4 (Literal skill)          3 & 5                            –           0.01
                           5 & 8                            –           0.01
                           1 & 5                            +           0.05

5 (Structural skill)       1 & 4                            –           0.01
                           4 & 7                            –           0.05
                           5 & 7                            +           0.05

(a) For skill number codes, see Table 1.

Table 6
The patterns of corresponding skills.
Skill (a)   Relationship between skills (a)
            Positive (+)   Negative (–)

1           5              7, 4
2                          9, 6
3           8, 5, 4        5
4           10             7
5           6              8
6                          9
7                          (1, 4)
8                          10
9                          (2, 6)
10          (4)            (8)

(a) For skill number codes, see Table 1.

Based on the positive relationships, the 10 reading skills can be divided into four groups:

Group 1: skills 1 & 5 (Explicitly stated meanings and Sentence structure: core parts)
Group 2: skills 3, 8, 5 & 4 (Facts and opinions, Inferred meanings, Sentence structure: core parts and Pronoun reference)
Group 3: skills 4 & 10 (Pronoun reference and Author's viewpoint)
Group 4: skills 5 & 6 (Sentence structure: core parts and Main idea)

These four groups reflect that individual skills in each group have something in common, and so there are four patterns of
corresponding skills. Taking the negative relationships into consideration, there can also be 10 other patterns of inverse relationship.
However, for instructional purposes, only the four groups of positive relationship skills are implemented to facilitate students' self-
tutoring.
In conclusion, this IDF is an instructional framework to further develop a prototype instructional model. The answer to RQ1 (What
should the structure and algorithm of the IDF be like?) is summarized schematically in the algorithm flow (Fig. 2) and work flow
(Fig. 3) diagrams.
The algorithm flow diagram (Fig. 2) demonstrates the computation process used in the IDF. There were three major steps. The
first step involved the use of a Genetic Algorithm with the first sample (N = 298) to classify learners into a desirable number of clusters
(10 clusters). In the second step, the Neural Network was employed with the same sample of 298 learners and their 10 selected
attributes to estimate the best prediction accuracy of the system (accuracy = 95%). In the last step, the Trained Neural Network was
applied to the second sample (N = 489) to finally classify them into learner types (10 clusters) based on the IDF. The final result was
used as a basis for assigning appropriate self-tutoring lessons to scaffold each type of learners.
Fig. 2. Schematic diagram showing the algorithm flow procedure in the IDF.

The work flow procedures in the IDF are depicted in Fig. 3. The first part was constructed to set conditions for a diagnosis
before testing the data. This part consists of (1) developing a diagnostic test, and (2) classifying learners into 10 types (10 clusters)
according to their profiles (attributes), in order to determine self-tutoring lessons for each learner type. The second part is an
intelligent system using web-based testing to determine learner types based on their test scores and attributes. Ten academic reading
skill self-tutoring lessons were developed. The pre-clustering and two-step clustering at this stage were performed on the test score
and attribute data of the first sample (N = 298). The Trained Neural Network was used to analyze the test scores and attributes of the
second sample (N = 489) to classify their learner types and scaffold them with the self-tutoring lessons each type needs.

5.2. RQ2: to what extent is the IDF able to enhance the academic reading ability of first-year university students?

After establishing the relationships between the 10 reading skills and discovering patterns of students' reading performance,
another group of 570 first-year university students in the first semester of the 2016 academic year was recruited to test the
prediction accuracy of the DART. The 570 students took the DART and completed a questionnaire concerning ten personal attributes:
(1) school within the university, (2) high school graduated from, (3) high school study program, (4) university admission category,
(5) high school GPA, (6) O-NET score, (7) experience abroad, (8) grade level when first learning English, (9) readings for
language enhancement, and (10) major types of language problems.
Because 81 of the 570 students passed either 0 or all 10 skills, and so showed obvious patterns of performance, their data were
excluded from the subsequent analysis. Further analysis of the data from the remaining 489 students revealed 10 types of
students/readers: Z, X, E, Y, D, T, A, C, N and B (Table 7). The researchers planned to select 30 lead users from these 10 types. The
term lead users was used to signify that the IDF was a pilot product. However, because representatives were allocated in proportion
to the size of each type, each of the last five types would have had no more than one representative, so the researchers decided to
use lead users from only the first five types.
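
The proportionate allocation behind Table 7 can be illustrated with a short calculation. The sketch below is not the authors' procedure: it assumes that each type's share of the 30 lead users is simply 30 × (type size / total size) and that shares are rounded to the nearest whole number. The names `TYPE_SIZES` and `proportional_quota` are hypothetical.

```python
# Type sizes as reported in Table 7 (489 students in total).
TYPE_SIZES = {"Z": 129, "X": 121, "E": 81, "Y": 73, "D": 45,
              "T": 19, "A": 10, "C": 6, "N": 4, "B": 1}
LEAD_USERS = 30


def proportional_quota(sizes, total_slots):
    """Return each type's unrounded share of the available slots."""
    population = sum(sizes.values())
    return {t: total_slots * n / population for t, n in sizes.items()}


# Shares when all 10 types compete for the 30 slots.
all_types = proportional_quota(TYPE_SIZES, LEAD_USERS)

# Shares when only the five largest types are kept (assumption: simple rounding).
top_five = {t: TYPE_SIZES[t] for t in ("Z", "X", "E", "Y", "D")}
quota_top_five = proportional_quota(top_five, LEAD_USERS)
lead_users = {t: round(q) for t, q in quota_top_five.items()}

if __name__ == "__main__":
    for t, q in all_types.items():
        print(f"{t}: {q:.2f}")
    print(lead_users)
```

Under these assumptions the five-type shares round to 9, 8, 5, 5, and 3 lead users, which matches the counts in Table 7; the unrounded shares computed this way may differ from the tabulated values in the second decimal place.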
The purposive sampling technique was employed to select 30 lead users proportionately from Types Z (9), X (8), E (5), Y (5) and D
(3). These 30 lead users all came from three class sections that had enough students of each of the five types.
They were assigned self-tutoring lessons on two or three skills, based on the DART scores and the DM results. The skills that lead
users practiced were as follows: Type Z, skills 1, 5; Type X, skills 2, 5; Type E, skills 2, 3, 5; Type Y, skills 5, 7; and Type D, skills 7, 8.
They had two weeks to practice the skills assigned to them before taking the posttest. After two weeks, only 25 lead users came and
took the posttest. The other five students did not come, probably because they were not prepared for it.

Fig. 3. Schematic diagram showing the work flow in the IDF.

Table 8 displays the lead users' scores (out of 40) and skills passed (out of 10) on the pretest and the posttest,
and their accomplishment status, i.e., whether the posttest scores and skills passed were higher than, lower than, or the same as
in the pretest (before the two weeks of assigned self-study).
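
As a reading aid for Table 8, the sketch below shows how a total score out of 40 and a count of skills passed out of 10 could be derived from item-level responses. It assumes the test design stated in the abstract (four items per skill, pass criterion of at least 3 correct out of 4). The item ordering, the function name `score_dart`, and the example responses are hypothetical.

```python
ITEMS_PER_SKILL = 4   # each skill is measured by four test items
PASS_MARK = 3         # pass criterion: at least 3 of 4 items correct (75%)
N_SKILLS = 10


def score_dart(responses):
    """Summarise one learner's 40 item results (True = correct).

    Assumes items are ordered skill by skill: items 0-3 belong to skill 1,
    items 4-7 to skill 2, and so on (a hypothetical layout).
    """
    if len(responses) != ITEMS_PER_SKILL * N_SKILLS:
        raise ValueError("expected 40 item responses")
    skill_scores = [
        sum(responses[i:i + ITEMS_PER_SKILL])
        for i in range(0, len(responses), ITEMS_PER_SKILL)
    ]
    passed = [s >= PASS_MARK for s in skill_scores]
    return {
        "total_score": sum(skill_scores),
        "skills_passed": sum(passed),
        "failed_skills": [k + 1 for k, ok in enumerate(passed) if not ok],
    }


# Hypothetical learner: skill 1 fully correct, skill 2 three of four correct,
# and one correct item on each of the remaining eight skills.
example = [True] * 4 + [True, True, True, False] + [True, False, False, False] * 8
result = score_dart(example)
```

The example also illustrates why two lead users with similar total scores can differ in the number of skills passed: the pass count depends on how the correct answers are distributed across skills, not only on their sum.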
Table 8 demonstrates that, based on the pretest results, lead users in Types Z, X, and Y, who passed 1 to 3 skills (scores of 11–22),
can be classified as low proficiency learners. Lead users in Type E, who passed 4 to 7 skills (scores of 17–30), are considered
moderate to high proficiency learners. Lead users in the last type, Type D, who passed 4 to 5 skills (scores of 19–24), are moderate
proficiency learners.

Table 7
Numbers of lead users from each type of students and skills to practice.

Type    No. of     No. of rep.       No. of rep.      No. of       Skills to
        students   out of 10 types   out of 5 types   lead users   practice (a)
Z       129        7.91              8.62             9            1, 5
X       121        7.42              8.01             8            2, 5
E       81         4.97              5.41             5            2, 3, 5
Y       73         4.48              4.89             5            5, 7
D       45         2.76              3.01             3            7, 8
T       19         1.17              –                –            –
A       10         0.61              –                –            –
C       6          0.37              –                –            –
N       4          0.25              –                –            –
B       1          0.06              –                –            –
Total   489

(a) For skill number codes, see Table 2.

Table 8
Lead users' comparative scores and skills passed.

Type     Lead   Pretest   Pretest         Posttest   Posttest        Score    Skills passed
         user   score     skills passed   score      skills passed   change   change
Z        1      12        1               x          x
(N = 9)  2      14        1               11         1               -        0
         3      15        1               12         1               -        0
         4      11        1               11         1               0        0
         5      14        1               14         2               0        +
         6      12        1               13         2               +        +
         7      17        1               13         1               -        0
         8      14        1               x          x
         9      16        1               x          x

X        10     16        2               18         2               +        0
(N = 8)  11     14        2               15         0               +        -
         12     21        2               29         6               +        +
         13     22        2               25         4               +        +
         14     16        2               x          x
         15     10        2               11         0               +        -
         16     13        2               8          1               -        -
         17     18        2               11         1               -        -

E        18     27        6               24         6               -        0
(N = 5)  19     17        4               25         5               +        +
         20     26        5               34         9               +        +
         21     26        5               22         5               -        0
         22     30        7               30         7               0        0

Y        23     21        3               12         2               -        -
(N = 5)  24     21        3               15         3               -        0
         25     17        3               15         2               -        -
         26     17        3               x          x
         27     18        3               20         2               +        -

D        28     23        4               28         6               +        +
(N = 3)  29     24        5               30         5               +        0
         30     19        4               12         1               -        -

Total: Score: 11 higher (+), 3 the same (0), 11 lower (-). Skills passed: 7 higher (+), 10 the same (0), 8 lower (-).

+ = higher; 0 = the same; - = lower; x = missing.

Table 9 shows that in four of the five types (Z, X, E, and D) at least half of the lead users had equal or higher scores in the
posttest than in the pretest. Type Y was the only group in which just one lead user had an equal or higher posttest score while three
had lower scores. Among the lead users in Type Z, who had passed only one skill in the pretest and thus had the weakest English
language foundation of the five groups, only three out of six had equal or higher scores. In Type X, where all lead users had passed
two skills and so possessed a slightly stronger English language foundation than Type Z, five out of seven had equal or higher scores
in the posttest. In Type E, whose five lead users had passed four to seven skills and so had moderate to good English language
foundations, three had equal or higher scores and two had lower scores in the posttest. Although the latter two obtained lower
posttest scores, they still passed the same numbers of skills (five and six) in the posttest. In Type D, where the three lead users had
passed four to five skills (a moderate English language foundation), one had a lower posttest score. Type Y, where all four lead
users had passed three skills, was the only type in which more lead users had lower posttest scores than equal or higher ones. In
terms of the number of skills passed, two of those with lower scores also dropped from three to two skills passed.

Table 9
Numbers of lead users according to their comparative pretest-posttest scores.

Type    No. of lead users
        Equal or higher posttest score   Lower posttest score   Total
Z       3                                3                      6
X       5                                2                      7
E       3                                2                      5
Y       1                                3                      4
D       2                                1                      3
Total   14 (56%)                         11 (44%)               25


In summary, 9 out of 17 lead users in the low proficiency groups (Types Z, X, and Y) had equal or higher posttest scores than in the
pretest, as did 5 out of 8 in the higher proficiency groups (Types E and D). Overall, 14 lead users (56%) had equal or higher
scores and 11 lead users (44%) had lower scores in the posttest than in the pretest.
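
The counts in Table 9 can be re-derived from the individual results in Table 8. The sketch below transcribes the pretest and posttest total scores of the 25 lead users who took the posttest and tallies them per type. The data layout and the names `SCORES` and `tally_scores` are illustrative choices, not part of the original study.

```python
# (pretest score, posttest score) for the 25 lead users with a posttest,
# transcribed from Table 8 and grouped by learner type.
SCORES = {
    "Z": [(14, 11), (15, 12), (11, 11), (14, 14), (12, 13), (17, 13)],
    "X": [(16, 18), (14, 15), (21, 29), (22, 25), (10, 11), (13, 8), (18, 11)],
    "E": [(27, 24), (17, 25), (26, 34), (26, 22), (30, 30)],
    "Y": [(21, 12), (21, 15), (17, 15), (18, 20)],
    "D": [(23, 28), (24, 30), (19, 12)],
}


def tally_scores(scores):
    """Count lead users with equal-or-higher vs. lower posttest scores."""
    table = {}
    for learner_type, pairs in scores.items():
        same_or_up = sum(post >= pre for pre, post in pairs)
        table[learner_type] = (same_or_up, len(pairs) - same_or_up)
    return table


table9 = tally_scores(SCORES)
total_up = sum(up for up, _ in table9.values())
total_down = sum(down for _, down in table9.values())
n = total_up + total_down
low = sum(table9[t][0] for t in "ZXY")    # low proficiency types
high = sum(table9[t][0] for t in "ED")    # higher proficiency types

if __name__ == "__main__":
    print(table9)
    print(f"{total_up}/{n} = {total_up / n:.0%}, {total_down}/{n} = {total_down / n:.0%}")
```

Counting a tie as "equal or higher" is what produces the 14 versus 11 split; if only strict gains were counted, the first figure would be 11, as in the "+" column total of Table 8.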
Table 10 displays the development of lead users who passed one to seven skills in the pretest. Of these 25 lead users, 17 (68%)
passed the same number of skills or more in the posttest, while eight (32%) passed fewer skills than in the pretest. The data also
revealed that all 11 lead users who passed one, five, six, or seven skills in the pretest passed an equal or greater number of skills in
the posttest. This suggests that the self-tutoring of two or three skills was able to scaffold the lead users' academic reading ability
to a certain extent, and helped them pass an equal or greater number of skills in the posttest. The results are consistent with the
concept of educational scaffolding, namely that appropriate support can enable learners to do what they cannot do without help
(Sharpe, 2006; Vygotsky, 1978; Wood et al., 1976). Of the 14 lead users who passed two, three, or four skills in the pretest, six passed
an equal or greater number of skills in the posttest, but eight passed fewer skills than in the pretest. Thus, the self-tutoring of two or
three skills might not be sufficient for lead users who initially passed only two, three, or four skills and so had low to moderate
foundations in the English language. According to Vygotsky's (1978) zone of proximal development (ZPD), when the gap between
what learners can do without help and what they can do with help is too large, learners need additional support before they can
succeed. The self-tutoring of two or three skills for two weeks might therefore not be adequately effective for lead users with a
limited foundation in the English language (i.e., those who passed only 2–4 of the 10 skills), because only six out of 14 (42.9%)
subsequently passed the same number of skills or more in the posttest. In spite of this drawback, 17 out of 25 lead users overall
(68%) passed an equal or greater number of skills in the posttest. This suggests that the self-tutoring program was effective for lead
users with a moderate to good English language foundation (5–7 skills passed), but that two weeks may be too short for lead users
with a rather limited English language foundation (2–4 skills passed).
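
Table 10 and the six-out-of-14 figure can likewise be checked against Table 8. The following sketch groups the 25 lead users by the number of skills they passed in the pretest. The list `SKILLS` is transcribed from Table 8, while the variable and function names are hypothetical.

```python
from collections import defaultdict

# (skills passed in pretest, skills passed in posttest) for the 25 lead users
# who took the posttest, transcribed from Table 8 in lead-user order.
SKILLS = [
    (1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 1),          # Type Z
    (2, 2), (2, 0), (2, 6), (2, 4), (2, 0), (2, 1), (2, 1),  # Type X
    (6, 6), (4, 5), (5, 9), (5, 5), (7, 7),                  # Type E
    (3, 2), (3, 3), (3, 2), (3, 2),                          # Type Y
    (4, 6), (5, 5), (4, 1),                                  # Type D
]


def tally_by_pretest(pairs):
    """Map pretest skill count -> [equal or more passed, fewer passed]."""
    table = defaultdict(lambda: [0, 0])
    for pre, post in pairs:
        table[pre][0 if post >= pre else 1] += 1
    return dict(sorted(table.items()))


table10 = tally_by_pretest(SKILLS)

# Lead users who passed 2-4 skills in the pretest (limited foundation).
mid_group = [table10[k] for k in (2, 3, 4)]
mid_kept = sum(row[0] for row in mid_group)
mid_total = sum(sum(row) for row in mid_group)
share = mid_kept / mid_total

if __name__ == "__main__":
    for pre, (kept, fewer) in table10.items():
        print(pre, kept, fewer, kept + fewer)
    print(f"{mid_kept}/{mid_total} = {share:.1%}")
```

Grouping by pretest skill count rather than by learner type is what separates the pattern discussed above: every lead user outside the 2–4 band held or improved their skill count, while fewer than half of those inside it did.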
To summarize, 14 lead users (56%) achieved an equal or higher total score (out of 40) in the posttest, while 17 lead
users (68%) passed an equal or greater number of skills in the posttest than in the pretest. Thus, self-tutoring of two or three skills
for two weeks can be considered fairly effective.

Table 10
Numbers of lead users according to their comparative pretest-posttest skills passed.

Skills passed   No. of lead users
in pretest      Equal or more skills passed   Fewer skills passed   Total
1               6                             0                     6
2               3                             4                     7
3               1                             3                     4
4               2                             1                     3
5               3                             0                     3
6               1                             0                     1
7               1                             0                     1
Total           17 (68%)                      8 (32%)               25

6. Conclusions

In order to diagnose the reading problems of EFL first-year university students, a tailor-made academic reading test was
administered to 297 students at MFU, a university in northern Thailand. Following the IDF procedure, the TSC analysis was performed
on the skill scores of the 10 reading skills measured in the test. The precluster step generated three subclusters, and further clustering
revealed five subpopulations or reader patterns. A final analysis using Pearson's correlation indicated that there were four patterns of
corresponding skills to be used for students' self-tutoring.
For the second stage of this study, a group of 570 first-year university students in the first semester of the 2016 academic year
were asked to take the DART and complete a questionnaire on 10 personal attributes. The data were analyzed using DM
techniques to test the accuracy of the IDF. The prediction accuracy of the system was highest (95.5%) when utilizing three major
attributes: the university's school studied, high school graduated, and high school GPA score.
Further analysis of the data revealed 10 types of students/readers. Because only 30 lead users were recruited, the
last five types were not evaluated owing to the small number of members in those groups.
The 30 lead users from the first five groups were assigned two or three reading skills for self-tutoring for two weeks before taking a
posttest. Five lead users did not show up to take the test, so the results reported were from 25 lead users. In terms of total score,
14 lead users (56%) had equal or higher scores in the posttest, and 17 lead users (68%) passed an equal or greater number of skills
than in the pretest.
We can, therefore, conclude that the IDF is able to diagnose the students' strengths and weaknesses, and predict what skills each
type of student urgently needs to learn to scaffold them one step further in their academic reading ability.
We have demonstrated that the IDF using the TSC data mining technique can be an effective tool to diagnose and scaffold
first-year university students. Because our large datasets contain both categorical and numeric attributes, the TSC analysis is an
appropriate choice: it can handle both continuous and categorical data (Shih et al., 2010).
The advantages of the artificial neural network (ANN) are that it requires minimal formal statistical training to
develop, that it can detect complex non-linear relationships between independent and dependent variables as well as interactions
between predictor variables, and that multiple training algorithms are available to develop it (Arsad, Buniyamin, & Manan, 2014;
Gray, McGuinness, & Owende, 2014; Tu, 1996).
Although the IDF appears to be successful in diagnosing and predicting what skills each type of student urgently needs to learn to
scaffold them, the overall results can be further enhanced. The scaffolding using two or three discrete-skills self-tutoring lessons
might not be sufficiently effective. The emphasis in the discrete-skills approach is on facilitating short-term reading comprehension;
study skills often are not taught, or are taught generically, without reference to specific academic assignments (Shih, 1992). In reality,
most first-year university students lack academic reading skills, because university-level reading greatly differs from high-school
reading (Hermida, 2009; Shih, 1992). Most students employ non-university strategies to read academic texts, which results in a
surface approach to reading when they should take a deep approach (Hermida, 2009). Also, EAP reading instruction
should and can assist ESL students to make the transition from “learning to read” to “reading to learn” (Shih, 1992).
For future research, the following recommendations are made:

1. The amount of material for self-tutoring of every skill must be sufficient to accommodate readers who have a limited English
language foundation. In addition, assorted types of exercises are recommended.
2. The duration of two weeks may not be sufficient for the self-tutoring of two or three skills, especially for those readers with a
limited English language foundation. It is also important to encourage learners to be self-regulated in performing the self-tutoring.
3. The use of proportionate lead users for each reader type may not be appropriate if the total number of lead users is only 30. If the
total number of participants is small, an equal number of lead users for each reader type may be more desirable.
4. It would be interesting to include other variables such as self-regulation, attitude, technology use, and gender in the study.
5. Besides its use in academic reading, the IDF can be applied in different language areas and a wide range of academic disciplines.
6. Because one of the best ways to learn to read is by reading, it would be interesting to implement free voluntary reading (FVR),
i.e., reading because you want to (Krashen, 2004), as a tool to enhance academic reading ability.
7. To scaffold students in their academic reading ability, the discrete-skills approach might not be sufficiently effective (Shih, 1992);
a deep approach to reading may be a promising alternative to investigate (Hermida, 2009).

Acknowledgments

The authors express their gratitude to the 90th Anniversary of Chulalongkorn University Fund (Ratchadaphiseksomphot
Endowment Fund) for financial support of this research study.

References

Alexander, P. A. (2005). The path to competence: A lifespan developmental perspective on reading. Journal of Literacy Research, 37(4), 413–436.
Alvarez, M. C., & Risko, V. J. (2009). Motivation and study strategies. In R. F. Flippo, & D. C. Caverly (Eds.). Handbook of college reading and study strategy research (pp.
199–219). (2nd ed.). New York: Routledge.
Anjewierden, A. A., Kollöffel, B., & Hulshof, C. (2007). Towards educational data mining: Using data mining methods for automated chat analysis and support inquiry
learning processes. Paper presented at the International Workshop on Applying Data Mining in e-Learning (ADML).
Arsad, P. M., Buniyamin, N., & Manan, J. A. B. (2014). Students' English language proficiency and its impact on the overall students' academic performance: An
analysis and prediction using Neural Network Model. WSEAS Transactions on Advances in Engineering Education, 11, 44–53.
Asif, R., Merceron, A., Ali, S. A., & Haider, N. G. (2017). Analyzing undergraduate students' performance using educational data mining. Computers & Education, 113,
177–194. https://doi.org/10.1016/j.compedu.2017.05.007.
van Bemmel, E., & Tucker, J. (2010). IELTS to success: Preparation tips and practice tests (3rd ed.). Milton, Qld: John Wiley and Sons.
Charubusp, S. (2010). Effects of academic literacy-based intervention on Thai university students' English reading proficiency and reading self-efficacy. Bangkok, Thailand:
Chulalongkorn University (Unpublished doctoral dissertation).
Davis, F. B. (1944). Fundamental factors of comprehension of reading. Psychometrika, 9, 185–197.
Donald, J. G. (2002). Learning to think: Disciplinary perspectives. The Jossey-Bass Higher and Adult Education Series: ERIC.
Educational Testing Service (2006). The official guide to the new TOEFL iBT. New York: McGraw-Hill Education.
Faridah, P., & Michelle, H. (2009). In R. F. Flippo, & D. C. Caverly (Eds.). Academic literacy and the new college learner (2nd ed.). New York: Routledge.


Flowerdew, J., & Peacock, M. (Eds.). (2001). Research perspectives on English for academic purposes. Cambridge: Cambridge University Press.
https://doi.org/10.1017/CBO9781139524766.
Freebody, P., & Freiberg, J. M. (2011). The teaching and learning of critical literacy: Beyond the “show of wisdom”. In M. L. Kamil, P. D. Pearson, E. B. Moje, & P. P.
Afflerbach (Eds.), Handbook of reading research: Vol. 4 (pp. 432–452). New York: Routledge.
Goodman, K. S. (1967). Reading: A psycholinguistic guessing game. Literacy Research and Instruction, 6(4), 126–135.
Gray, G., McGuinness, C., & Owende, P. (2014). An application of classification models to predict learner progression in tertiary education. Paper presented at the 2014
IEEE international advance computing conference (IACC).
Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Disposition, skills, structure training, and metacognitive monitoring. American
Psychologist, 53(4), 449–455.
Hawkey, R., & Green, A. (2012). An empirical investigation of the process of writing Academic Reading test items for the international English Language Testing System Studies
in Language Testing. Cambridge University Press.
Hermida, J. (2009). The importance of teaching academic reading skills in first-year university courses. International Journal of Research & Review, 3, 20–30.
von Hippel, E. (1986). Lead users: A source of novel product concepts. Management Science, 32(7), 791–805. https://doi.org/10.1287/mnsc.32.7.791.
Howard, S. K., Yang, J., Ma, J., Maton, K., & Rennie, E. (2018). App clusters: Exploring patterns of multiple app use in primary learning contexts. Computers &
Education, 127, 154–164. https://doi.org/10.1016/j.compedu.2018.08.021.
Hudson, T. (1996). Assessing second language academic reading from a communicative competence perspective: Relevance for TOEFL 2000. Princeton, NJ: Educational
Testing Service.
IBM Corp (2013). IBM SPSS statistics for windows, version 22.0. Armonk, NY: IBM Corp.
Kaoropthai, C., Natakuatoong, O., & Cooharojananone, N. (2016). Diagnosing the English as a foreign language (EFL) reading problems using two-step cluster analysis.
Paper presented at the 2016 international conference on information technology based higher education and training (ITHET).
Klösgen, W., & Zytkow, J. M. (2002). The knowledge discovery process. In W. Klösgen, & J. M. Zytkow (Eds.). Handbook of data mining and knowledge discovery
(pp. 10–21). New York: Oxford University Press, Inc.
Krashen, S. D. (2004). The power of reading: Insights from the research. ABC-CLIO.
Lougheed, L. (2010). Barron's IELTS: International English language testing system (2nd ed.). New York: Barron’s Educational Series.
Merceron, A., & Yacef, K. (2007). Revisiting interestingness of strong symmetric association rules in educational data. Paper presented at the International Workshop on
Applying Data Mining in e-Learning 2007.
Mohamad, S. K., & Tasir, Z. (2013). Educational data mining: A review. Procedia-Social and Behavioral Sciences, 97, 320–324.
Mulcahy-Ernt, P. I., & Caverly, D. C. (2009). Strategic study-reading. Handbook of college reading and study strategy research: vol. 2, (pp. 177–198).
NIETS. O-NET (Ordinary National Educational Test). Retrieved from: http://www.niets.or.th/uploads/content_pdf/pdf_1500007940.pdf.
Oakhill, J., Cain, K., & Elbro, C. (2015). Understanding and teaching reading comprehension: A handbook. New York: Routledge.
Pawan, F., & Honeyford, M. A. (2009). Academic literacy. Handbook of college reading and study strategy research, Vol. 2, 26–46.
Peña-Ayala, A. (2014). Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications, 41(4), 1432–1462.
Romero, C., & Ventura, S. (2007). Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 33(1), 135–146.
Scarcella, R. (2003). Academic English: A conceptual framework. UC Berkeley: University of California Linguistic Minority Research Institute. Retrieved from https://
escholarship.org/uc/item/6pd082d4.
Shahiri, A. M., Husain, W., & Rashid, N. A. (2015). A review on predicting student's performance using data mining techniques. Procedia Computer Science, 72,
414–422.
Sharpe, T. (2006). ‘Unpacking’ scaffolding: Identifying discourse and multimodal strategies that support learning. Language and Education, 20(3), 211–231.
https://doi.org/10.1080/09500780608668724.
Shih, M. (1992). Beyond comprehension exercises in the ESL academic reading class. Tesol Quarterly, 26(2), 289–318.
Shih, M.-J., Liu, D.-R., & Hsu, M.-L. (2010). Discovering competitive intelligence by mining changes in patent trends. Expert Systems with Applications, 37(4),
2882–2890. https://doi.org/10.1016/j.eswa.2009.09.001.
Tu, J. V. (1996). Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical
Epidemiology, 49(11), 1225–1231.
Vialardi-Sacin, C., Bravo-Agapito, J., Shafti, L., & Ortigosa, A. (2009). Recommendation in higher education using data mining techniques. Paper presented at the
2nd international conference on educational data mining (EDM), Cordoba, Spain, July 1–3, 2009.
Vlahos, G. E., Ferratt, T. W., & Knoepfle, G. (2004). The use of computer-based information systems by German managers to support decision making. Information &
Management, 41(6), 763–779.
Vygotsky, L. S. (1978). In V. J. Steiner, M. Cole, S. Scribner, & E. Souberman (Eds.). Mind in society: Development of higher psychological processes. Cambridge, MA:
Harvard University Press.
Weir, C., Hawkey, R., Green, A., Unaldi, A., & Devi, S. (2009). The relationship between the academic reading construct as measured by IELTS and the reading
experiences of students in their first year of study at a British university. International English Language Testing System (IELTS) Research Reports 2009: vol. 9,
(pp. 97–156).
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. The Journal of Child Psychology and Psychiatry and Allied Disciplines, 17(2), 89–100.
Yang, F., & Li, F. W. B. (2018). Study on student performance estimation, student progress analysis, and student potential prediction based on data mining. Computers &
Education, 123, 97–108. https://doi.org/10.1016/j.compedu.2018.04.006.
