
2016 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery

Pages: 224 - 233


Discovering Bloom Taxonomic Relationships between Knowledge Units Using
Semantic Graph Triangularity Mining

Fatema Nafa, Javed I. Khan, Salem Othman, and Amal Babour


Department of Computer Science, Kent State University
Kent, Ohio, USA
Email: {fnafa, javed, sothman, ababour}@kent.edu

Abstract—Inferring Bloom's Taxonomy relationships among knowledge units is important and challenging. This paper proposes a novel method that identifies the revised Bloom's Taxonomy levels among knowledge units in a Semantic Cognitive Graph (SCG) by using graph triangularity. The method determines significant relationships among knowledge units by utilizing the triangularity of knowledge units in the computer science domain. We share an experiment that evaluates and validates the method on three textbooks. The performance analysis shows that the method succeeds in discovering the hidden associations among knowledge units and classifying them.

Keywords—Semantic Graph; Text Mining; Knowledge Unit; Graph Mining; Bloom Taxonomy.

I. INTRODUCTION

Text is the most significant repository of human knowledge. Discovering useful knowledge from a large text is known as text mining. Text mining has been applied in a great number of fields and topics. One of its goals is to improve the quality of textbooks [1] and to extract dependencies between knowledge units in a textbook [2]. Moreover, understanding a new knowledge unit often relies on the understanding of the current knowledge units [3][4]. Connecting the dots among the knowledge units in a textbook based on cognitive skills is a cognitive problem. A learner is not expected to understand the textbook based solely on the given ordering of knowledge units. Thus, a shared language that provides a highlighted learning map of a textbook based on cognitive skills is needed.

The taxonomy idea was first introduced by Benjamin Bloom [4]. Bloom identified three domains of educational activities: the cognitive domain (mental skills), the affective domain (growth in feelings or emotional areas), and the psychomotor domain (physical skills) [5]. The cognitive domain is divided into six levels: 1) knowledge, 2) comprehension, 3) application, 4) analysis, 5) synthesis, and 6) evaluation. The Bloom model was modified in 2001 by Anderson and a team of cognitive psychologists [6], and significant changes were made to the Bloom's Taxonomy model. The original taxonomy of educational objectives is referred to as Bloom's Taxonomy [3], and Anderson's work is known as the Revised Bloom's Taxonomy [6]. The Revised Bloom's Taxonomy was further adapted into the Computer Science based Cognitive Domain (CSCD) [7] to make it appropriate for the concept domain in computer science. In this paper, only the cognitive domain is used.

To the best of our knowledge, there has been a significant amount of research from two perspectives. From the linguistics perspective, theorists developed three different taxonomies to represent the three domains of learning: a cognitive taxonomy focused on intellectual learning, an affective taxonomy concerned with the learning of values and attitudes, and a psychomotor taxonomy that addressed the motor skills related to learning. One of the cognitive taxonomies [4] is known as Bloom's Taxonomy. Bloom's Taxonomy has been applied in the field of computer science for various purposes, such as managing course design [6], measuring the cognitive difficulty levels of computer science materials [8], and structuring assessments [9]. Bloom's Taxonomy has also been used in grading instead of grading on a curve [10]. Additionally, from the mining perspective, there has been some interesting research on extracting relations among concepts. A relation could be a synonym, a hypernym, an association, etc. [11][12]. These relations are successfully used in different domains and applications [3].

Our work is related to the extraction of semantic relations and to graph triangularity mining of knowledge units. The extracted relationships are represented as a graph. There has been some research on graphical text representations, such as Concept Graphs [13] and ontologies [14]. The authors of [13] proposed Concept Graph Learning to represent relations among concepts derived from prerequisite relations among courses. Manually predicting the relationships among knowledge units based on the CSCD is not a good solution to this problem, because it is very time-consuming and requires experts in the domain of computer science. Using graph triangularity mining is a promising way to reach this goal.

This paper presents a method for mining cognitive skills among the knowledge units in a textbook. The method is based on using graph triangularity to discover relationships among knowledge units, and it has the flexibility of producing a new sequential ordering of the knowledge units in a textbook. According to the experimental evaluations, the method can efficiently identify the relationship type among knowledge units. Building an automatic technique to assist in organizing knowledge units based on the level of cognitive skills will provide a new learning trajectory for learners and will help in circumventing their deficits in learning any textbook.

The rest of the paper is organized as follows. The problem definition is presented in Section II. Section III provides information about our method as well as a detailed description of the method steps. The experiment setup and evaluation of the method are explained in Section IV. Section V presents the conclusion and future work.

II. PROBLEM DEFINITION

In this section, we introduce some terms used in this paper and then define the problem.

Concept (C): Captures the most important terms in a text that describe a specific domain.

978-1-5090-5154-0/16 $31.00 © 2016 IEEE 224


DOI 10.1109/CyberC.2016.52
Semantic Cognitive Graph (SCG): A directed graph SCG = (C, V) where C (concepts) represents the nodes and V (verbs) represents the labels of the relationships among concepts.

Knowledge Unit (KU): The smallest part of knowledge in a given domain, consisting of a group of sentences kui = {s1, s2, …, sm}.

Cognitive Skill (CS): Describes the recall or recognition of specific facts, procedural patterns, and concepts that serve in the development of intellectual abilities and skills.

Bloom Taxonomy (βi): A classification system developed in 1956 by education psychologist Benjamin Bloom to categorize the intellectual skills and behaviors that are important for learning [2].

Problem Definition

Given: 1) a textbook TB that contains a set of knowledge units KU = {ku1, ku2, …, kun}, where n is the number of knowledge units in TB, each knowledge unit is a group of sentences kui = {s1, s2, …, sm}, and each sentence is composed of a sequence of concepts si = {c1, …, co}; and 2) the Computer-Science based Cognitive Domain (CSCD) levels (Understanding, Analyzing, Applying-Evaluating, and Creating), denoted as {B1, B2, B3, B4} respectively [7], where each level has a subset of measurable verbs. The objective of this work is to find the interconnection function f(x): KU → βi that maps knowledge units (KU) to the CSCD levels βi = {B1, B2, B3, B4} using the subsets of measurable verbs. To handle this problem, we create a semantic graph GT from the given textbook TB in order to find the relationships among concepts within each knowledge unit and then the relationships among the knowledge units themselves. Mining the Semantic Cognitive Graph triangularity outputs the hidden links among knowledge units according to the CSCD levels.

For example, consider that two knowledge units (KU#1 and KU#2) have been chosen from a textbook, and we need to find a relationship between them based on the CSCD level. As described in the problem definition, we have two main problems:

1) Converting a textbook to a semantic graph GT, a directed graph GT = (C, V) where C (concepts) represents the nodes and V (verbs) represents the labels of the relationships among concepts.

2) Finding the relationship (X) between knowledge unit #1 and knowledge unit #2 based on the CSCD levels βi = {B1, B2, B3, B4}. Fig. 1 represents a sub-part of GT for two knowledge units.

Figure 1. Two Knowledge Units and the Relationships Between Them.

III. OVERVIEW OF THE METHOD

In this section, we present our proposed method, which we divide into five steps. A) The text pre-processing step, an important step that converts the original textbook into a new, structured, normalized textbook, where the normalized text serves as the input for the other steps. B) The semantic graph extraction step, a technique based on the verb, where the verb expresses the action in the sentence; it handles both simple and complex sentence structures (e.g., clauses and conjunctive sentences). C) The relationships indexing step, a way of indexing the normalized textbook in order to retrieve the sentence(s) corresponding to given concept(s). D) The relationships ordering step, a way of ordering the extracted concepts to imitate the prerequisite relationships defined by Bloom's Taxonomy [4]. E) The Semantic Cognitive Graph triangularity extraction step, a technique for grouping a set of concepts according to a target concept that represents a knowledge unit. Fig. 2 presents these steps, where the input is a textbook in Portable Document Format (PDF), denoted as TB, and the output is a set of knowledge-unit subgraphs G = {G1, G2, …, Gn}, where n is the number of clusters, clustered according to the CSCD. Now let us describe each step of our method in detail.

Figure 2. Overview of the method. (Step #1, text preprocessing: extract the TOC of TB, convert the PDF to text T, stop-word filtering(T), stemming(T), tokenizing(T), parsing(T), producing Normalized(T'); Step #2, textual graph extraction: relationships extraction; Step #3: relationships indexing; Step #4: relationships ordering into Bloom Taxonomy levels and knowledge unit clustering (Knowledge Unit 1, Knowledge Unit 2, …); Step #5: Bloom graph visualization.)

A. Text Preprocessing

In this step, we first determine the input, which is a textbook TB. To obtain the desired set of clusters, the TB is treated as a set of knowledge units KU = {ku1, …, kun}, where each knowledge unit kui has a set of sentences kui = {s1, …, sm}, and each sentence si is a set of concepts si = {c1, …, co}. For each sentence, we first remove the stop words using 1) the standard stop words list [15], which consists of words that serve general communication and do not add information, and 2) the specific stop words list, which contains words that have no discriminative value within the context of the domain concepts. We build the specific stop words list manually, because there is no stop list related to the domain under study.

Second, we extract adjacent two-word phrase concepts that appear in the sentence, known as bigrams [16]. Bigrams, such as "Binary-Tree", help to acquire domain concepts that are strongly associated with the knowledge unit, which resulted in higher performance as they provide more direct semantic information. Third, stemming is performed on the sentence to remove and replace word
suffixes in order to arrive at the common root form of specific concepts. Stemming is accomplished using the Porter Stemmer [17]. Fourth, we split each sentence into a sequence of tokens, where tokens are individual concepts. While there are a number of tokenization models for sentences, the Natural Language Toolkit (NLTK) tokenizer [18] has been used because of its memory efficiency and high-speed processing.

Finally, we parse each sentence to get its part-of-speech tags (verb, noun, adjective, etc.) using a set of formal grammar rules that help to gain an understanding of the precise meaning of the sentence. The Stanford parser has been used [19]. The following example illustrates the preprocessing steps.

i) A sentence S is chosen from the textbook [20].
S: "The heap sort algorithm starts by using build max heap to build a max heap on the input array."
ii) Stop words are removed from S. S: "heap sort algorithm starts using build max heap build max heap input array."
iii) Bigrams are extracted from S. S: "heapsort algorithm starts using build maxheap build maxheap inputarray."
iv) Stemming is performed on S. S: "heap sort algorithm starts use build max heap build max heap input array."
v) Tokenizing is completed on S. S: ['heap', 'sort', 'algorithm', 'start', 'use', 'build', 'max', 'heap', 'build', 'max', 'heap', 'input', 'array']
vi) Parsing is performed on S. S: [('heap', 'NN'), ('sort', 'NN'), ('algorithm', 'NN'), ('start', 'NNS'), ('use', 'VBG'), ('build', 'JJ'), ('max', 'NN'), ('heap', 'NN'), ('build', 'VB'), ('max', 'NN'), ('heap', 'NN'), ('input', 'NN'), ('array', 'NN')].

Algorithm 1 in Fig. 3 summarizes the preprocessing steps.

Algorithm 1: Text Preprocessing
Input: TB
Output: T'
1. Read(TB)
2. Separate(TB) into (T)
3. For each t ∈ T do:
4.   Separate(t) into (S)
5.   For each s ∈ S do:
6.     Remove stop-words(s)
7.     Generate bigrams(s)
8.     Stem(s)
9.     Tokenize(s)
10.    Tag(s)
11.  T' = Join(s)
12. Return T'

Figure 3. Text Preprocessing Algorithm.

B. Semantic Graph Extraction

From the previous step, we get normalized knowledge units with the needed information. For each sentence in each knowledge unit, we extract the relationships, in the form of concept-verb-concept, among concepts that are semantically connected and close to each other in the same knowledge unit. Lyons and other structural linguists [20] hold that "words cannot be defined independently from other words. A word's relationship with other words is part of the meaning of the word."

As shown in Algorithm 2 in Fig. 4, the input is a normalized knowledge unit and the output is a textual graph GT. The algorithm starts by searching each sentence si in T' to extract a candidate pair of concepts (c1, c2). If there is a verb vi between them in the sentence si, then we measure the distance of (c1, c2) from vi using equation (1). If the distance satisfies the α threshold, the algorithm saves the triple [C1, V, C2] as a graph structure, where C1 and C2 represent nodes and vi represents an edge between them. We maintain two sets: the leader nouns (LN) and the follower nouns (FN) of Ci. The set LN consists of the concepts that occur before vi in the sentences. Similarly, the set FN consists of the concepts that occur after vi in the sentences. This algorithm cannot by itself identify the direction of the relationship C1 → C2 or C2 → C1; thus, our assumption that C1 ∈ LN and C2 ∈ FN is used to ensure the direction of the relationship from c1 to c2.

Dij = ((Vi − Ci) + (Vi − Cj)) − (2 · (Vi − Ci) · (Vi − Cj))   (1)

Algorithm 2: Semantic Graph Extraction
Input: (T': text as string)
Output: GT(N, L), where GT is a directed graph

Def Extraction(S: sentences, alpha: integer):
  For each S in T':
    Tokens = nltk.word_tokenize(S)
    Tag = nltk.pos_tag(Tokens)
    For each concept C in Tag:
      Position = Tag.index(C)
      // Check the tag of concept C
      If C[1][0] == "V" or C[1][0] == "N" then:
        count = count + 1
        If C[1][0] == "V":
          Vlist.append((C[0], count))
        Else:
          Nlist.append((C[0], count))
  Return (Vlist, Nlist, Position)

// Check the position of each concept in the sentence
Def Check_Pos(Vlist, Nlist, Position):
  LN = []   // list for the leader nouns
  FN = []   // list for the follower nouns
  For each noun n in Nlist:
    // Leader nouns precede the verb; follower nouns succeed it
    If position of n < Position:
      LN.append(n)
    Else:
      FN.append(n)
  Return [LN, Verb, FN]

Def Distance(LN: list, FN: list, Verb: list, α: float):   // α >= 0.5
  If len(LN) > 0 and len(FN) > 0:
    For C1 in LN:
      For C2 in FN:
        If C1[0] != C2[0]:
          D1 = V[1] − C1[1]
          D2 = V[1] − C2[1]
          D = (D1 + D2) − (2 · D1 · D2)
          If D <= α:
            Triple.append((C1[0], V[0], C2[0], D))
  Return Triple
Figure.4. Semantic Graph Extraction Algorithm.

C. Relationships Indexing

In order to get an index of the sentence(s) that contain a triple [C1, V, C2], we use the algorithm illustrated in Fig. 5 for indexing and build an index-format-based dictionary. The algorithm converts T' into a set of sentences S = {s1, s2, …, sn} and then saves the table of contents (TOC) of the textbook, including page numbers, as a key to keep track of the hierarchy of knowledge units, sub-knowledge units, paragraphs, sentences, and concepts, respectively {Ti, STi, Pi, Si, Ci}. The dictionary saves the index format based on the page numbers returned from the TOC, and then the positions of the concepts in the sentences are computed to build the full indexing format. For example, the input to the algorithm is a textbook, its TOC, and a triple [C1, V, C2]; the purpose is to find the index for the triple, so the indexing system returns the indexing format for [C1, V, C2].

Let us assume that our TOC is as in Table I and that T' is the given textbook; we need to search for the hierarchical data for the triple [C1, V, C2]. The indexing algorithm returns the result in Table II.

TABLE I. TABLE OF CONTENTS OF A TEXTBOOK
1.     Knowledge unit            20
1.1.   Sub-knowledge unit        22
1.1.1. Sub-sub-knowledge unit    28

TABLE II. INDEXING FORMAT
Indexing Format      Triple
1:1:1.1:1:1:1.2.3    [C1, V, C2]

The numbers from left to right mean: 1: the textbook number, 1: the knowledge unit number, 1.1: the sub-knowledge unit number, 1: the paragraph number, 1: the sentence number, 1: the position of ci in si, 2: the position of vi in si, and 3: the position of c2 in si. The last three digits are local indexing relative to si; that is, the format encodes (book number, knowledge unit number, sub-section number, sentence number, leader concept position, verb position, and follower concept position).

Algorithm 3: Relationships Indexing
Input: (TOC, TB, [C1, V1, C2])
Output: Index format for [Ci, Vi, Cj]

Def Knowledge_Unit(p):
  Knowledge_Unit = open('TOC.txt', 'r')
  dic = {}
  lastfound = ''
  For line in Knowledge_Unit:
    sent = line.split()
    l = len(sent)
    first = sent[0]
    last = sent[l-1]
    mid = sent[1:l-1]
    If int(last) <= p:
      lastfound = first
      dic[first] = ' '.join(mid)
  pp = ''
  If '.' in lastfound:
    pp = pp + str(lastfound.split('.')[0]) + ':' + str(lastfound) + ':'
  Else:
    pp = pp + str(lastfound) + ':'
  Return pp

Def Knowledge_Unit_Index(sent, lst):
  Index = ''
  words = sent.strip().split(' ')
  For itm in lst:
    If itm in words:
      pos = words.index(itm)
      Index = Index + str(pos) + '.'
  Return Index

Figure.5. Relationships Indexing Algorithm.

D. Relationships Ordering

Once we extract the relationships between knowledge units, an ordering method is needed to imitate the prerequisite relationships defined by Bloom's Taxonomy [4], in order to provide a way to assemble the puzzle of learning new knowledge units based on the cognitive skills required for each knowledge unit. We utilize a topological sort with the Depth First Search (DFS) graph technique [21] to find the subgraphs in the textual graph GT that represent the knowledge units, and then we determine and order the prerequisite relationships between them. Finally, as shown in Fig. 6, we return the knowledge units in the reverse order of their finishing times, and the output is the ordered graph G'T, which contains all the knowledge units with the prerequisite relationships between them.

Algorithm 4: Relationships Ordering
Input: GT
Output: G'T
Def Topological_DFS(GT):
1. For each Knowledge_Unit in GT:
2.   DFS(GT, Knowledge_Unit, Cycle)
3.   If cycle == Found:
4.     Break   // cycle found in the graph
5. Orderlist = []
6. Return G'T
Figure.6. Relationships Ordering Algorithm.

E. Knowledge Unit Clustering

After we convert our textbook into the ordered graph G'T, it is time to cluster the ordered knowledge units in G'T into the Computer-Science based Cognitive Domain (CSCD) levels βi = {B1, B2, B3, B4}. We explore the triangle relationships in G'T as a way to measure the connectivity among concepts and how those concepts are connected to a given knowledge unit, and we also investigate the associations among the knowledge units themselves. The graph-triangularity-based technique treats triangles from a geometrical point of view, where they indicate strong connectivity between the concepts and the knowledge unit. Our technique is based on the shared edges between triangles in G'T. Before we start describing the graph-triangularity mining technique, there are a number of parameters it counts on to improve the clustering quality:

1) A knowledge unit can be a concept in itself, and it can be the main knowledge unit. Also, sometimes a knowledge unit appears only as a concept but not as a knowledge unit. In this case we add it manually.
2) The overlap among concepts in knowledge units defines how strongly the knowledge units are connected.
3) There is overlap among the verbs that connect the concepts based on the Bloom measurable verbs list [22]; some verbs overlap among levels, and it is hard for the method to make the decision. Qualified human judges are required to provide the correct clustering to guide our system.
4) A knowledge unit can be related to the domain or can be more general. In the latter case, weak connectivity appears between knowledge units, and we cluster them as a disconnected class.
5) Well-written knowledge units in the textbook produce a worthy cluster, while poorly written knowledge units result in an ordinary cluster.

Algorithm 5: Knowledge Unit Clustering
Input: G'T
Output: Knowledge unit clusters as a directed graph
Def ExtractTriangles_From_Graph(self):
1. Triangle = []
2. Nodes = set()
3. For each n in Graph(self):
4.   Nodes.add(n)
5.   Nodes_Neighbour = set()
6.   Neighbours = set(self.Graph[n])
7.   For Neighbour in Neighbours:
8.     If Neighbour in Nodes:
9.       continue
10.    Nodes_Neighbour.add(Neighbour)
11.    For Neighbour_of_Neighbour in Neighbours.intersection(self.Graph[Neighbour]):
12.      If Neighbour_of_Neighbour in Nodes or Neighbour_of_Neighbour in Nodes_Neighbour:
13.        continue
14.      Triangle.append((n, Neighbour, Neighbour_of_Neighbour))
15. Return Triangle(GB)
Figure.7. Knowledge Unit Clustering Algorithm.

Algorithm 5 in Fig. 7 performs the graph-triangular mining as follows. The input is the ordered graph G'T, and C(t) denotes the neighborhood of a target concept t, where t represents a knowledge unit to be analyzed, such as quicksort. If C(t) is the neighborhood of t, where t ⊆ T, then C(t) = {∪t∈T N(t) \ t}. Let e be the edge relationship; e ∈ {Understanding, Analyzing, Applying, Evaluating, Creating} is recommended to achieve high cognitive skills. Any learner who acquires these knowledge units in sequence is said to have reached high cognitive skills in learning a knowledge unit.

We say a subgraph of G'T is triangulated if every cycle of length greater than three has an edge joining two nonadjacent concepts of the cycle. A triangular cluster is a subgraph extracted from G'T. Our clustering methodology is divided into four types as follows:

1) CST (Strong-Triangle): if t denotes the target node which shares multiple triangles with C(t), then C(t) ∈ CST.
CST = {N[t] | Adj[t] is K3}
This category of clustering represents the concepts that are part of one of the highest levels of skill, the Creating level, because the connectivity between the concepts is represented by one of the verbs in Bloom's measurable verbs list [24], and a specific combination of the concepts creates the cluster. The concepts in this class are required to be at a high level of learning. Fig. 8 represents the concept triangles that share the edge (tB) with t, i.e., CST = {tB, tC}.

2) CMT (Multi-Triangle): if t denotes the target node which represents one of the nodes of any triangle with t, then C[t] ∈ CMT.
CMT = {C(t) | N(C(t)) is K3}
The concepts in this clustering represent another of the highest levels of skills, the Evaluating level; the triangularity displays knowledge about multiple instances of the knowledge unit. Additionally, it requires a judgment based on some form of measurement criteria for some knowledge units. Fig. 8 represents the concept triangles in this class, i.e., CMT = {tED, tWZ}.

3) CWC (Weak-Connection): if t denotes the target node which has neighbors but does not form any triangle with t, then N(t) ∈ CWC.
CWC = {C(t) | N(C(t)) ≠ Kn}
Fig. 8 represents the concepts with no triangularity in this class. The concepts in this clustering represent the Applying and Analyzing level of skills, because the relationship triangularity is represented by the connectivity of the components of the concepts based on Bloom verbs, which shows an Applying majority for the other concepts as well as for the knowledge units.

4) CDT (Disconnected): if a neighbor of t has neighbors but does not form any triangle with t, then C(C(t)) ∈ CDT.
CDT = {C(t) \ t}
This type of clustering represents the concepts at one of the lowest levels of skill, the Understanding and Remembering level, because the connectivity of the concepts with the knowledge unit is feeble. Most of the concepts are common and not related to a specific domain. Fig. 8 represents the disconnected concepts in this class, i.e., CDT = {O, P, Q, R, S, V}.

Figure.8. Categories of Triangularity.

Example of the Proposed Method

The system starts by classifying the knowledge units into the βi of the cognitive skills in any given textbook. For example, consider a knowledge unit in an algorithms textbook about the Quick-Sort algorithm; we need to classify this knowledge unit into βi. This section will start
explaining our system phases through this knowledge unit. Fig. 9 shows the knowledge unit.

Figure. 9. A KU from a Textbook.

The first phase (A) starts with the preprocessing of the given knowledge unit; after all the cleaning, we get the output shown in Fig. 10.

Figure.10. Text Preprocessing of the KU.

After that, the second phase (B) extracts the relations among the concepts in the knowledge unit; the output is the textual graph in Fig. 11. We translate our problem into a graph to make it easier to deal with. Fig. 11 shows all the possible relationships from the given knowledge unit, where the relationships are verb-based; some sentences have a simple structure while others have a complex one.

Figure.11. Semantic Graph GT for a KU.

After the third phase (C), we need to find the indexing of the relationships in the textbook to save the knowledge unit sequences using the pseudo-code in Fig. 5, with which we index all the concepts in the knowledge unit. Table III presents a sample of the indexing result.

TABLE III. A SAMPLE OF THE INDEXING RESULT
Indexing Format      Triple
1:7:7.1:1:1:1.2.3    [Quicksort-Algorithm has Worst-case]
1:7:7.1:1:1:1.2.5    [Quicksort-Algorithm has input-array]
1:7:7.1:1:2:6.7.12   [Quicksort-Algorithm is sorting]
……                   ………………………………………..

After the fourth phase (D), our goal is to measure the relationships among the extracted knowledge units. In order to measure the prerequisite relationships, we propose using a topological sort, as explained in Section III-D. The output of this phase is the reordering of the prerequisite relationships among knowledge units. Table IV shows a sample of the prerequisite relationships.

TABLE IV. PREREQUISITE RELATIONSHIPS AMONG CONCEPTS
Knowledge unit
subarrays
Divide_and_conquer
Quicksort algorithm
iteration
values
……………

After the fifth phase (E), in which we reach our goal, we cluster the concepts in each knowledge unit into the βi of cognitive skills and then cluster the knowledge unit itself. Fig. 12 demonstrates all the concepts clustered into βi for the analyzed knowledge unit, Quick-sort, based on triangular clustering.

Figure.12. Bloom Graph GB for KU #1.

IV. EXPERIMENT SETUP AND EVALUATION

In this section, we first discuss the data set used for testing our system, and then the evaluation metrics are presented.

A. Experiment Setup

We test our method in two ways: locally and globally. Locally means testing the behavior of the method using knowledge units from the same textbook. Globally means using three high-quality textbooks that are used in computer science as course materials. We apply our method to see how it performs on the textbooks. Table V shows statistical information about the three textbooks. As our method comprises five steps, we start with the preprocessing of the PDF, which is the most important step as it includes stop-word filtration to keep only the domain-specific concepts. Then the relationship extractor stage is used to extract the semantic graph, controlled by the α threshold, for each knowledge unit, as illustrated in Fig. 13; in this step there is also further filtration to achieve a high-quality result by removing isolated knowledge units from the textbooks. Then, to keep the hierarchy among the concepts in the knowledge unit, the indexing methodology is used. Prerequisite relationships
among candidate concepts and knowledge units are order, Eigenvector Centrality (μ) as in [25] presents the
and then we cluster all concepts in the knowledge unit then importance of the target concept neighbors which measure
cluster the knowledge units themselves are then clustered how well-connected a knowledge unit is to other highly
into CSCD levels which will help regroup the textbook connected concepts GT.
based on the cognitive skills to know at which cognitive
skills each knowledge unit must be given for learners. C. Evaluation
In order to evaluate the quality of the GB for each
TABLE V. PHYSICAL CHARACTERISTICS OF THE knowledge unit, we are interested in two different
TEXTBOOKS measures. The first one expresses the completeness of the
Book1 Book2 Book3 set of Bloom relationships, that is, how many valid Bloom
relationships are found with respect to the total number of
TOC depth 4 3 2
measures, which should have been found. This is the recall
Number of Knowledge unit 120 60 30 rate.
Number of extracted 8500 8200 3000 The second measure indicates the reliability of the set
Relationships
of Bloom relationships found in the knowledge unit, that is,
Number of concepts 1060 1020 950
how many valid Bloom relationships are found with respect
Number of verbs 610 480 300 to the total number of Bloom in the knowledge units; this
Overlaps of Knowledge units 400 300 220 is the precision rate. TABLE VI presents the comparison
between the ground truth and our system. These two rates
Fig.15 represents The triangular distribution of were evaluated using knowledge units from the textbooks
concepts in knowledge unit#1. It can be clearly seen that containing all this information as presented in Fig.14.
the number of triangles is high for the concepts which is To compute the metrics, we compare our system with
strongly related to the knowledge unit and the rest of the ground truth by asking the opinion of PhD students using
concepts which have no triangles either common concepts their own background knowledge and additional resources
or related to other knowledge units which helps to connect to cluster some knowledge units from a textbook and
the knowledge units together. determine level of the cognitive skills for each knowledge
Fig.16 shows the graph connectivity measures for unit. Of course, we cannot ask them to do that for a large
KU#1 from different views. All of the measures prove how number of knowledge units because the manual process is
strongly the concepts related to the using knowledge unit time consuming. The students created a semantic graph for
are, which appears at the beginning of the chart while the each knowledge unit, so each graph, graph, we perform the
common concepts come at the end of the chart. Fig.16 ground truth in three knowledge units from the textbook.
illustrates how the neighbors are triangulated with the main The purpose is to create final clustering GB for each
knowledge unit. knowledge unit as possible as the output from our system.
B. Graph Connectivity Measures

At this step, to measure the concept and knowledge-unit connectivity, which can be correlated with the graph metrics, we collected and calculated the following success measures:

Clustering Coefficient (φ): the measure that shows the connectivity among knowledge units and the concepts related to them. According to [23], the mathematical formula of φ is as follows:

    φ_i = 2e / (k(k - 1))                                (2)

where i is a knowledge unit with degree deg(i) = k in GT, e is the number of edges among the k neighbors of i, and φ_i takes values 0 ≤ φ_i ≤ 1.

Degree Centrality (ω): as in [24], it shows how the interactions of a target concept with the other concepts in GT are represented in the knowledge unit. Our results show the high centrality of the target concept in GT. ω is defined as in equation (3):

    ω_i = deg(i)                                         (3)

Betweenness Centrality (γ): illustrates the connectivity between a target concept and its neighbors through the shortest paths between concepts; it is calculated as follows [23]:

    γ(w) = Σ_{(i,j)∈V(w)} σ_ij(w) / σ_ij                 (4)

where σ_ij is the number of shortest paths between i and j, and σ_ij(w) is the number of those paths that pass through w.

TABLE VI. PHYSICAL CHARACTERISTICS OF THE TEXTBOOKS

Book Name                         # Knowledge Units   Bloom Trajectory for KU
Introduction to Algorithms        35                  90
Algorithms                        10                  40
Data Structures and Algorithms    30                  25

V. CONCLUSION AND FUTURE WORK

In this paper we presented a method that finds the connectivity between knowledge units according to cognitive skills. The method is an improved version of our previous work [7]. The results show the high performance of the system from two different views: the graph view and the metric evaluation view. They also show that well-written knowledge units in textbooks are classified correctly; on the other hand, knowledge units with poor writing suffer from weak relatedness among their concepts, which leads to only ordinary classification. Based on our analytical results, it is possible to conclude that by using Bloom's Taxonomy we can decide which parts of a textbook to use at which Bloom level to match the learner's skills. This provides a way to identify a range of different learning trajectories. For future work, we will investigate the use of the method to evaluate online learning resources.
TABLE VII. PERFORMANCE COMPARISON BETWEEN GROUND TRUTH AND OUR SYSTEM

                           Metric      Knowledge   Knowledge   Knowledge
                                       Unit #1     Unit #2     Unit #3
Ground Truth               Precision   95.0        95.2        97.7
                           Recall      96.1        96.4        96.0
Our system                 Precision   97.6        96.1        97.1
                           Recall      97.0        95.2        96.0
[Figure omitted: number of correctly classified vs. misclassified knowledge units as the α threshold varies from 0.01 to 1.]
Figure.13. KU Controlled by α Threshold.
[Figure omitted: true negative rate for KU #1 across cognitive levels 0-5; values range from roughly 0.988 to 1.0.]
Figure.14. True Negative Rate for KU #1.
[Figure omitted: number of triangles per concept in KU #1, ranging from 0 to about 10.]
Figure.15. Triangularity Distribution for KU #1.
[Figure omitted: connectivity measures (φ, ω, γ, μ) plotted over the concepts of KU #1, e.g., algorithm_Quicksort, Partition, subarray, array, pivot, divide_and_conquer, loop, procedure, element, key, assignment, condition.]
Figure.16. Graph Connectivity Measures for KU #1.

[Figure omitted: the same connectivity measures for KU #1, KU #2 (e.g., Divide, Conquer, merge_sort, specifications, correctness), and KU #3 (e.g., index, item, iteration, output, process, recursive_calls, sets, terminates, values).]
Figure.17. Graph Connectivity Measures for KU #1, KU #2, KU #3.
REFERENCES

[1] Agrawal, Rakesh, et al. "Data mining for improving textbooks." ACM SIGKDD Explorations Newsletter 13.2 (2012): 7-19.
[2] Liu, Jun, et al. "Mining learning-dependency between knowledge units from text." The VLDB Journal 20.3 (2011): 335-345.
[3] Fader, Anthony, Stephen Soderland, and Oren Etzioni. "Identifying relations for open information extraction." Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011, pp. 1535-1545.
[4] Bloom, B. S., and Krathwohl, D. R. Taxonomy of Educational Objectives: The Classification of Educational Goals. Handbook I: Cognitive Domain. 1956.
[5] Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., ... and Wittrock, M. C. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, abridged edition. White Plains, NY: Longman, 2001.
[6] Scott, T. "Bloom's taxonomy applied to testing in computer science classes." Journal of Computing Sciences in Colleges, 2003, pp. 267-274.
[7] Nafa, F., and Khan, J. "Conceptualize the Domain Knowledge Space in the Light of Cognitive Skills." Proceedings of the 7th International Conference on Computer Supported Education, 2015.
[8] Oliver, D., Dobele, T., Greber, M., and Roberts. "This course has a Bloom Rating of 3.9." Proceedings of the Sixth Australasian Conference on Computing Education, 2004, pp. 227-231.
[9] Schmitz, Michael, et al. "Open language learning for information extraction." Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012.
[10] Johnson, C. G., and Fuller, U. "Is Bloom's taxonomy appropriate for computer science?" Proceedings of the 6th Baltic Sea Conference on Computing Education Research: Koli Calling 2006, pp. 120-123. ACM, 2006.
[11] Hearst, Marti A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th Conference on Computational Linguistics, Volume 2. Association for Computational Linguistics, 1992.
[12] Ritter, Alan, Stephen Soderland, and Oren Etzioni. "What Is This, Anyway: Automatic Hypernym Discovery." AAAI Spring Symposium: Learning by Reading and Learning to Read, 2009.
[13] Rajaraman, Kanagasabai, and A.-H. Tan. "Mining semantic networks for knowledge discovery." Third IEEE International Conference on Data Mining (ICDM 2003). IEEE, 2003.
[14] Fürst, Frédéric, and Francky Trichet. "Axiom-based ontology matching." Proceedings of the 3rd International Conference on Knowledge Capture. ACM, 2005.
[15] Buckley, Chris, James Allan, and G. Salton. "Automatic retrieval with locality information using SMART." Proceedings of the First Text REtrieval Conference (TREC-1), 1993.
[16] Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Ann Arbor MI 48113.2 (1994): 161-175.
[17] Porter, Martin F. "An algorithm for suffix stripping." Program 14.3 (1980): 130-137.
[18] MacMahon, Matt, Brian Stankiewicz, and Benjamin Kuipers. "Walk the talk: Connecting language, knowledge, and action in route instructions." Def 2.6 (2006).
[19] Lyons, John. Linguistic Semantics: An Introduction. Cambridge University Press, 1995.
[20] Chen, Danqi, and Christopher D. Manning. "A Fast and Accurate Dependency Parser using Neural Networks." EMNLP, 2014.
[21] Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 2009.
[22] Pickard, Mary J. "The new Bloom's taxonomy: An overview for family and consumer sciences." Journal of Family and Consumer Sciences Education 25.1 (2007): 45-55.
[23] Newman, Mark E. J. "A measure of betweenness centrality based on random walks." Social Networks 27.1 (2005): 39-54.
[24] Albert, Réka, Hawoong Jeong, and Albert-László Barabási. "Error and attack tolerance of complex networks." Nature 406.6794 (2000): 378-382.
[25] Bonacich, Phillip, and Paulette Lloyd. "Eigenvector-like measures of centrality for asymmetric relations." Social Networks 23.3 (2001): 191-201.