
A Scalable Heterogeneous Big Data Framework for e-Learning Systems
David Otoo-Arthur
School of Computer Science and Applied Mathematics
University of the Witwatersrand
Johannesburg, South Africa
dotooarthur@gmail.com

Terence L. van Zyl
School of Computer Science and Applied Mathematics
University of the Witwatersrand
Johannesburg, South Africa
Terence.VanZyl@wits.ac.za

Abstract—The adoption of e-learning systems in higher education is a remarkable phenomenon that has redefined teaching and learning. E-learning was initially proposed to allow people to learn for personal accomplishment without physically attending a traditional university or academic setting. These systems continue to provide an efficient and flexible approach to teaching and learning. However, the rapid integration of ICTs and the expansion of data from these systems remain a major concern for the education community. In this study, we propose a smart and secure data flow architectural framework for e-learning that uses a rich set of big data tools within a distributed and parallel analysis platform. The Reference Model of Open Distributed Processing (RM-ODP) guided the development of this Big Data framework for e-learning analytics. Using the RM-ODP model as a benchmark, we classify the educational institution's architecture in terms of existing elements, functions and processes to understand stakeholders' views for the development of the framework. Our approach uses an existing distributed computing environment and offers an adaptable standard framework to improve data acquisition, storage, processing, analysis, security and virtualization for e-learning systems. We implement a scalable and adaptable big data framework for e-learning (BiDeL), using Apache Spark as a case. We also show how the big data concept could be integrated into online learning systems to improve teaching and learning in higher education. The proposed framework was applied to both batch and streaming datasets of students' online activities on Moodle LMSs. The performance of the BiDeL framework shows improved data integration and data governance. This big data framework and the general view of the current state of the art in "big data" technologies will serve as a guide for the creation of e-learning systems that create new value from existing but underused data. The acquired value will provide decision-makers in higher education with new insights from data to enhance productivity in teaching and learning. These insights can lead to innovation and competitiveness in the academic space, and even create entirely new teaching and learning models.

Index Terms—Big Data framework, Online Learning, e-Learning, higher education, RM-ODP

I. INTRODUCTION

The rapid integration of ICTs and the quest to leverage technologies to enhance teaching and learning have escalated higher education activities globally. In the past, e-learning focused on computer-based methods such as stand-alone computers with instructions on CD-ROMs. The launch of the internet has brought many developments in e-learning. It offers the opportunity to share slideshows and audio/video, conduct webinars, and promote effective instructor-learner communication via messages and chat.

E-learning systems, otherwise known as Learning Management Systems (LMSs), allow learners to fit learning into their lifestyle. Recent studies suggest that the interest in and adoption of LMSs in higher education result from institutional pedagogical goals and flexibility in content delivery. They also result from user engagement with course materials without geographical constraint [1]. Notwithstanding the benefits, the ever-increasing data has brought serious implications for education worldwide. Romero et al. [2] argued that the massive data generated from LMSs could be examined in two ways. First, it could give insight into student learning and into instructors' effectiveness in supporting learners. Second, it could support informed decisions that enhance teaching and learning. Sin and Muthu [3] opine that LMSs introduce unique data formats as a result of the different descriptions and compositions of the data entries. Working on a similar metric that connects these data entries will allow for the collection and better analysis of information. According to Labrinidis and Jagadish [13], variety, data complexity, time, data privacy and scaling are the major issues of Big Data that all levels of education face in creating value from data. They explain that these issues emanate right from the acquisition of data up to the decision-making point. At that point, data to be kept or discarded must be consistent and accurate with respect to the metadata.

The massive build-up of e-learning systems data continues, and so do the problems it presents as a result of the big "3V" (Volume, Variety and Velocity). The number of logins by student users and tutors (volume) is overwhelming. So are the varying data formats and lengths (variety), and the streaming of forum discussions and online chats (velocity). Again, dealing with data from the numerous e-learning systems that institutions use is a challenge. These problems make processing Big Data with conventional analysis approaches increasingly difficult. Consequently, developing alternative frameworks for undertaking big data analytics in higher education is indispensable.

Otoo-Arthur and Van Zyl [4] show that few studies have attempted to integrate a Big Data framework and learning analytics in higher education. Although these attempts are significant steps towards developing frameworks that support Big Data and learning analytics, they mainly focused on analytics with

978-1-7281-6770-1/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Oman Virtual Science Library (Masader). Downloaded on July 18,2021 at 12:42:41 UTC from IEEE Xplore. Restrictions apply.
little attention to data integration and governance. Again, obtaining an adaptable standard framework with a theoretical underpinning that could serve as a benchmark for Big Data analytics in higher education remains an issue.

II. JUSTIFICATION FOR RESEARCH AND CONTRIBUTION OF THIS STUDY

To date, existing literature shows a technological gap in the development of a standardised framework model for e-learning. In particular, the few works that explored the use of Big Data technologies in higher education focused on analytics. These works did not consider all the institutional capabilities of a successful Big Data practice. This study, therefore, attempts to address most of the issues identified in our previous systematic review of Big Data frameworks for higher education. The main objective of this study is to develop a general framework standard that considers data integration and data governance with a theoretical underpinning. The proposed framework borrows the notion of the Reference Model of Open Distributed Processing (RM-ODP) and explores the benefits of combining Big Data and e-Learning. The intended framework will provide a structure for higher institutions that want to start with big data or aim to improve their Big Data competencies further. Our framework provides a standard reference model that can be implemented across higher institutions.

We propose a Big Data framework architecture that models higher institutions' existing functions, elements and processes based on various stakeholders' viewpoints. The term framework, in our context, describes a conceptual or real-time architectural model that supports big data analytics in higher education. The rest of this study is organised as follows. Section III identifies various works that relate to this study. Section IV describes big data in the context of higher education. It is followed by Section V, which presents an overview of e-learning in higher education. Section VI describes the approach for integrating big data and e-learning systems. We further discuss how institutions will use theory to support big data analytics in higher education in Section VII. In Section VIII, we present the proposed big data framework for e-learning analytics, followed by the presentation of preliminary results in Section IX. Finally, we draw our conclusions and suggest future research in Section X.

III. RELATED WORKS

Large-scale data challenges in e-learning systems have spurred a large number of studies on developing a framework that will handle the massive data these systems generate. In the context of big data, very few works have integrated this dimension with e-learning in higher education. In our previous study, we found ten (10) key frameworks that tried to provide a guide to integrating big data analytics in e-learning in higher education. We note that four main stages characterised these proposed frameworks:

- data acquisition;
- data processing;
- developing models for data mining;
- presentation and visualisation of results.

However, these frameworks were found to suffer several limitations. These include the lack of a thorough discussion of data security, privacy and ownership. They also include the lack of theories that underpin the framework and the models for analytics.

Similarly, Anshari et al. [5] examined the internet behaviour and online learning of students in a big data context. Their study identified two significant challenges of big data in e-learning. The first was the extraction of unstructured data from multiple sources and platforms. The second was the role of the data scientist. It is against this backdrop that we undertake this experiment to propose a new big data framework model for e-learning systems. The new model leverages emerging big data technologies to increase scalability, performance and availability.

IV. BIG DATA IN HIGHER EDUCATION

A. Defining Big Data

There are many definitions proposed for big data. Generally, these definitions focus on the scope, characteristics and diversity of information. The prevalence of digital technologies and ever-increasing data-reliant applications have spread the term across many disciplines. These include commerce, medicine, security and information science. In the field of education, the term big data has received much attention recently [4].

According to Andrea et al. [6], the term "Big Data" is used with variations and lacks a consensual definition. They further explained that four main themes anchor the definition of Big Data: big data, information technology, methods and impact. These themes must be connected to bring clarity to the concept of big data. They define Big Data as "the Information asset characterised by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value."

McKinsey Global Institute [7] defines "Big Data" as "datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyse." Xu and Shi [8] point out that volume, variety, velocity and veracity are the main features that define Big Data.

The National Institute of Standards and Technology (NIST) [9] describes big data as the deluge of data in today's networked, digitised, sensor-laden and information-driven world. NIST indicates that the growth rates of big data speeds, volumes and complexities are overwhelming traditional approaches to data management and analytics.

There is widespread agreement on the inherent benefits and current limitations of big data. Despite this, several questions remain at the centre of discourse. They concern its definition, attributes and essential characteristics, its integration with existing systems, and its technology and standardisation.

We examined the most important occurrences of the term "Big Data" from both business and academic perspectives. The former describes Big Data as a new strategic resource in the digital era and the critical factor driving innovation. The latter defines Big Data as a collection of data with complexity, diversity, heterogeneity and high potential value that is difficult to process and analyse in a reasonable time.

B. Characteristics of Big Data

There are several characteristics proposed to differentiate Big Data from traditional data. Many studies point to five (5) main characteristics of Big Data, the five Vs (Volume, Velocity, Variety, Veracity and Value) [10], [11]. However, not all of these characteristics are required for data to conform to the big data tag. Generally, high volume, high velocity and high variety are the key characteristics that define Big Data [12], [13], [14], [9].

1) Volume: Volume, as a feature of Big Data, refers to the storage space required to record and store data [15]. They further explain that Big Data is qualitatively different from traditional data [16]. Generally, Big Data requires terabytes (2⁴⁰ bytes) or petabytes (2⁵⁰ bytes) of storage space (The Economist, 2010 in [15]). The emergence of ubiquitous devices and the Internet of Things (IoT) has contributed to the massive growth of data. Massive data is generated from business processes, social media, machines, sensors and networks. It keeps increasing at an incredible speed, making it difficult to manage with traditional database systems. Reports from the International Data Corporation (IDC) in 2011 indicate that the overall volume of data created and copied in the world was 1.8 ZB (10²¹ B). This volume has been projected to double every two years [17], [18]. Strom [19] argues that processing and analysing Big Data, or storing it on a single machine, is computationally demanding. Batty [20] raises the issue of the limited computational power of conventional statistical and visualisation techniques for analysing these data. Recent advances in storage solutions such as NoSQL and Hadoop make it possible to collect, store and manipulate large data in a timely manner and in real time.

2) Velocity: Velocity describes the speed at which data streams in, often every second, and must be handled. Traditionally, data was sampled occasionally and processed in batches. In the Big Data era, the real-time creation of data is the biggest challenge that organisations face. Data velocity comes in two kinds: the frequency of data generation, and the frequency of handling, recording and publishing of data [15].

3) Variety: This feature covers structured, semi-structured and unstructured data, and seems to be Big Data's weakest attribute [15]. Traditional data stores, such as spreadsheets and databases, held structured data arranged in rows and columns. Presently, data comes in structured, semi-structured and unstructured forms such as audio, video, images, PDFs and many more. Ninety percent (90%) of the data generated is unstructured [17]. The challenge is to identify these formats and apply appropriate tools and algorithms to interpret these datasets.

4) Veracity: Grolinger et al. [21] opine that one of the characteristics we often overlook is data veracity. We primarily collect data for decision-making. Collecting the right information or data will, therefore, provide reliable information for decision-making. Inappropriate and inaccurate data could cause problems for organisations and consumers (Gualtieri in [22]). Ensuring that all data and analyses are accurate is essential to the Big Data paradigm.

5) Value: The availability of data is creating value for many organisations, and educational institutions are no exception. McKinsey Global Institute [7] reports that Big Data has a potential annual value of €250 billion to Europe's public sector administration. The significance of data lies in the value attained from its analysis. For educational institutions, the value lies in using data to create an enabling environment. Such an environment promotes teaching and learning based on insights derived from data mining.

C. Data Sources

Generally, Big Data sources in higher education take system/machine-automated or human-mediated forms. We examine the characteristics and nature of Big Data using the three (3) key features (3Vs) suggested by Laney [14]. This gives a picture of the various sources of Big Data in HE. Table I shows how these data type parameters combine to produce multiple forms of Big Data.

V. OVERVIEW OF E-LEARNING

The history of e-learning as presently adopted in higher education traces its roots to the works of Bitzer [23] and Suppes [24]. Both sought to situate the use of technology within a broader educational agenda. Since the 1960s, e-learning has evolved in many ways, affecting education, businesses and the training sector. E-learning, as defined in the context of higher education, refers to the use of both software-based and online learning [25].

During the 1960s, there were few computer applications in higher education. Many institutions thought the exorbitant cost of technology would prevent its ubiquitous uptake as an educational tool [26]. In the future, Suppes [24] argued, it would be possible for all students to have access to the services of a personal tutor in the form of a computer.

Further, he asserted that the most compelling reasons for using computers in education are learner-centred instruction and supportive dialogue. This conjecture was based on Bloom's (1984) research, which found that one-to-one tutoring improved learners' achievement by two standard deviations over group instruction [26].

Bitzer created PLATO, a time-shared computer system, in 1962. It addressed the delivery of computer-based literacy programmes and laid the ground that brought meaning to Suppes' work.

Wooley [27] argued that the PLATO system pioneered many online features two decades before the emergence of the web. These were online forums and message boards, instant messaging, chat rooms, email, remote screen sharing and multiplayer games. Together they led to one of the world's first online communities.

There is, however, no single agreed explanation or evolutionary point from which e-learning originates [26]. The emergence of the World Wide Web (WWW) around the 1990s changed the face of e-learning with respect to instructional learning. Table II provides the historical perspective of e-Learning based on its macro-level features.

TABLE I
GENERAL OUTLOOK OF BIG DATA SOURCES IN HIGHER EDUCATION

| Category | Source of Data | Type of Data | Volume: Number of Records | Volume: Volume per Record | Volume: Volume (TBs, PBs, ZBs) | Velocity: Frequency of Generation | Velocity: Frequency of Handling, Recording and Publishing | Variety |
|---|---|---|---|---|---|---|---|---|
| Machine-Generated | Transaction Data | System-log data (including biological traits like fingerprint, retina and iris patterns, DNA, voice waves and signature) | Medium | Medium | Medium | Real-time | Instant and Delayed | Unstructured |
| | | Web-log data | High | Low | High | Real-time | Instant | Structured |
| | | LMS-log data | High | Medium | High | Batch, near real-time and real-time | Instant and Delayed | Structured, Semi-Structured and Unstructured |
| | | Learner behavioural traits | Medium | Low | Medium | Near real-time | Instant and Delayed | Semi-Structured |
| | Sensed Data | Access data (ICAM, Wi-Fi, LAN, vehicle sensors) | High | Low | Medium | Real-time | Instant | Unstructured |
| | | Hearing/Visual aids | Low | Low | Medium | Real-time | Instant | Unstructured |
| | | Fire detection and warning | Low | Low | Low | Real-time | Instant | Unstructured |
| Human-Mediated | Mobile Communication Data | M-Learning data | High | Low | Medium | Near real-time and real-time | Instant | Structured |
| | | Mobile phone data | High | Low | High | Near real-time and real-time | Instant | Structured and Semi-Structured |
| | Digital Camera Data | CCTV | High | High | High | Real-time | Instant | Unstructured |
| | | Web camera | High | High | High | Near real-time | Delayed | Unstructured |
| | Administrative | Staff/Faculty and Student Record (including HR system data, SIS, admission data) | High | Medium | Medium | Batch | Delayed | Structured and Semi-Structured |
| | e-Learning | WBT data (including MOOCs, LMS, LCMS) | High | Medium | Medium | Batch, near real-time and real-time | Instant and Delayed | Structured, Semi-Structured and Unstructured |
| | Websites | Web searches | High | Low | High | Real-time | Instant | Structured and Unstructured |
| | | Web scraping | High | Medium | High | Real-time | Instant | Semi-Structured |
| | | Click streams | High | Low | High | Real-time | Instant | Structured |
| | Social media | Streaming audio-visuals | High | High | High | Real-time | Instant | Unstructured |
| | | Collaborative learning | Low | Low | Medium | Near real-time | Instant and Delayed | Structured and Semi-Structured |
| | | Social networks (Facebook, Twitter, Flickr, etc.) | High | High | High | Real-time | Instant and Delayed | Semi-Structured and Unstructured |
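Table I can also be read as a catalogue that an institution might consult when deciding how each source should be ingested. This matters because the proposed framework handles both batch and streaming datasets. The minimal Python sketch below is offered only as an illustration. It transcribes four rows of the table, keeping just the "Frequency of Generation" and "Variety" columns. It then applies a hypothetical routing rule, which is an assumption rather than something stated in the table. Under that rule, batch-generated sources go to a batch path and (near) real-time sources go to a streaming path. Sources such as LMS-log data, which span both, need both paths. The names `DataSource`, `ingestion_paths` and `STREAMING_MODES` are illustrative and do not belong to any framework or library.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Generation frequencies that the hypothetical rule below treats as "streaming".
STREAMING_MODES = frozenset({"real-time", "near real-time"})


@dataclass(frozen=True)
class DataSource:
    """One row of Table I, reduced to the two columns this sketch uses."""
    name: str
    generation: FrozenSet[str]  # "Frequency of Generation" column
    variety: FrozenSet[str]     # "Variety" column


def ingestion_paths(source: DataSource) -> Set[str]:
    """Hypothetical rule (an assumption, not from Table I): batch-generated
    data goes to a batch path, (near) real-time data to a streaming path."""
    paths = set()
    if "batch" in source.generation:
        paths.add("batch")
    if source.generation & STREAMING_MODES:  # non-empty intersection
        paths.add("streaming")
    return paths


# Four rows transcribed from Table I (values lower-cased for matching).
SOURCES = [
    DataSource("Staff/Faculty and Student Record",
               frozenset({"batch"}),
               frozenset({"structured", "semi-structured"})),
    DataSource("LMS-log data",
               frozenset({"batch", "near real-time", "real-time"}),
               frozenset({"structured", "semi-structured", "unstructured"})),
    DataSource("Click streams",
               frozenset({"real-time"}),
               frozenset({"structured"})),
    DataSource("Web camera",
               frozenset({"near real-time"}),
               frozenset({"unstructured"})),
]

if __name__ == "__main__":
    for src in SOURCES:
        # Flag sources with unstructured data, which may suit schema-flexible
        # (e.g. NoSQL) storage better than a relational table.
        flag = " (has unstructured data)" if "unstructured" in src.variety else ""
        print(f"{src.name}: {sorted(ingestion_paths(src))}{flag}")
```

Under this rule, the student record goes only to the batch path. Click streams and the web camera go only to the streaming path. LMS-log data is routed to both, which is the situation the hybrid (Lambda-style) processing discussed later is meant to address.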

A. Features of e-Learning

According to Cheung (2007), four functional features underpin every e-learning platform:

(i) curriculum design, which looks at how the design of the syllabus, study schedule, class activities and study sequences, and the preparation of learning resources and materials, support teaching and learning;
(ii) information exchange and discussion, which focuses on using real-time and non-real-time channels such as chat rooms, forums and electronic mailboxes;
(iii) performance assessment, which refers to the assessment and grading of assignments and tests;
(iv) course administration, which refers to learner data maintenance and system administration support.

Bhatia [45] also suggested four features of e-learning based on the needs, abilities and background of learners. These are connectivity or networking, and flexibility in learning times. They also include interactivity and collaboration between teachers and learners, and a virtual learning environment (VLE), which offers convenient access to educational materials from a distant location.

Another emerging generation of e-learning is mobile learning (M-Learning). This e-learning technique uses wireless technologies for delivering instruction and learning. M-learning allows learners to collaboratively merge their learning experiences in a shared environment [46]. Currently, learners can interact with teachers from any location with internet and web access. This makes mobile learning the state of the art for distance learning. The ubiquitous nature of mobile devices such as smartphones and tablets creates a learning environment that fits the diverse needs of the learner. The principal target of the next generation of e-learning systems is to use current emerging technologies to provide new techniques of learning, training and education [47].

TABLE II
HISTORICAL PERSPECTIVE OF E-LEARNING

| Period | Focus | Identifiable Educational Features |
|---|---|---|
| 1940 - 1985 | Programmed Instructions (Lim); Drill and Practice; Computer-Assisted Learning (CAL) | Learning and instructions were rooted in the theories of behaviourism and reinforcement [28] and [29]. Programming tools were built for self-instruction, and user-computer interactions were localised. |
| 1983 - 1990 | Computer-Based Training (CBT): Multimedia | Computer-based instruction models dominate within interactive multimedia courseware. The constructivist approach to learning begins to influence the design and use of educational software. Learning instructions were passive and did not give room for learners to receive feedback from the instructors. |
| 1990 - 1995 | Web-Based Training (WBT) | Content delivery is internet-based, with active learner models and limited end-to-end interactions. Learning and instructions were deeply rooted in the constructivist approach [30], [31], [32]. |
| 1995 - 2005 | e-Learning (Smart Learning Environments (SLEs)) | Interactive Internet-based flexible instructional delivery is highly visible. Online multimedia courseware dominates. Learning instructions rely mostly on cognitive and constructivist models; social networking emerges, and remote user access becomes possible. |
| 2005 - date | Mobile-Learning (m-Learning - SLEs and Ubiquity Learning (UL)) | Interactive Internet-based and multimedia courseware become conventional; social networking and remote-user access improve. Communication is accessible, mobile and ubiquitous, and contextualisation is possible. Learning instruction is deeply rooted in cognitive, constructionism and social constructivist approaches [33], [34], [35], [36], [37], [38], [39]. |
| 2010 - date | Augmented Reality (AR); Virtual Reality (VR) | Accessibility, mobility, ubiquity, communication, context-learning and social interaction are conventional. Interactive design (supports 2D and 3D formats, extendable (mobile, collaborative)) combines physical and virtual worlds and gaming. Cognitive, constructionism and social constructivist approaches underpin learning and instructions. AR and VR have the prospect of becoming disruptive technologies in the delivery of educational materials at all levels, from public outreach activities to expert-level teaching at undergraduate and postgraduate levels [40], [41], [42], [43], [44]. |

B. Big Data Technologies for e-Learning

Several Big Data technologies are used in HE to manage, store, analyse and visualise large datasets from e-learning systems. Generally, we classify these Big Data technologies into six (6) categories: middleware, programming languages, databases, mining tools, ETL tools and visualisation tools [4].

1) Middleware: The essential role of middleware is to provide an environment to manage heterogeneous, complex and distributed infrastructure.

Apache Hadoop, an implementation of Google's MapReduce, is the most popular middleware for carrying out analytics over large-scale data in HE. Hadoop is an open-source platform that provides an excellent means of handling and processing big data in a distributed computing domain. Hadoop development was primarily driven by Doug Cutting and Tom White in 2006. Their aim was to offer local computation and storage, scaling with high scalability from a single server to thousands of machines. The Hadoop core system has three parts:

- a processing part based on the MapReduce processing model, which allows data to be processed in parallel across clusters;
- Yet Another Resource Negotiator (YARN), which is responsible for resource management;
- a storage part known as the Hadoop Distributed File System (HDFS) [48].

Mahout is an open-source machine learning library built on top of Hadoop to provide real-time distributed analytics capabilities for massive datasets. Mahout was an initial effort to provide a collection of scalable machine learning algorithms for Big Data on Hadoop. It offers distributed and scalable data mining algorithms for conventional clustering and classification tasks, such as k-means clustering, nearest neighbour and naïve Bayes [49], [50].

Spark is an open-source, in-memory, general-purpose distributed framework for Big Data processing that is known to work in tandem with Hadoop. Spark's API for distributed programming extends MapReduce concepts by offering a much richer set of operations that readily allow for parallelisation. The core engine of Spark implements several modules for computational analytics, which are useful in optimising huge data management workflows. According to Franklin [51], these modules include Spark SQL for queries on structured, relational data, GraphX for graph analytics, MLlib for machine learning applications, and Spark Streaming. Spark also provides APIs for languages such as Java, Scala, R and Python, which makes it flexible for data analytics [52], [53].

Flink is a parallel streaming dataflow processing engine designed to solve problems that emanate from micro-batch models (such as Spark Streaming). Flink has been designed to perform in-memory computations and execute in typical cluster environments at any scale, with programming abstractions in Java and Scala. Jobs run in Flink as streams, with tasks executed in cyclic dataflow iterations. Flink implements two iteration operators:

- the standard (bulk) iterator, which works with a single partial solution;
- the delta iterator, which employs a set of next entries to process (the workset) and a solution set.

These iterators are designed to reduce the amount of data sent between nodes. They create an execution plan with equivalent semantics that avoids data access conflicts. Flink also offers a sophisticated fault-tolerance mechanism and efficient resource planning for storage and network. In addition, Flink provides scalable machine learning algorithms and an intuitive API for building complex data analysis pipelines, similar to Spark's MLlib [10].

Storm is an open-source, general-purpose, scalable and partially fault-tolerant programme for real-time data stream processing [10]. Storm provides an environment that allows for

the stable distributed process while delegating the complexity of parallel processing and framework recovery mechanisms. Storm's parallel framework provides three (3) key modules:

- Nimbus, which distributes code for parallel processing, delegates tasks and handles errors;
- a supervisor, which initiates the processing of multifaceted events;
- ZooKeeper, which coordinates services for parallel applications [10].

Apache Kafka is used to ingest data streams into the processing platforms. LinkedIn initially developed Kafka in 2010 for collecting and delivering high volumes of log data with low latency. Kafka provides a flexible messaging system designed to be fast and scalable for log collection. Kafka is written in Java and Scala. It supports real-time streaming and operational data, distributed processing and parallel data loading into the Hadoop architecture. Kafka combines off-line and on-line processing to support real-time computation, which makes it a suitable architecture for messaging tools and data pipelines [54].

Apache Flume is a flexible application for data feeding that works in tandem with Hadoop. Flume combines with a processing framework to provide an efficient means to aggregate and move large amounts of streaming data. Flume's compatibility with new Hadoop products and distributions makes it an ideal choice for entrepreneurs. However, Flume tends to lose data occasionally due to the unavailability of event replication [55], [56].

Tez is a generalised, open-source dataflow framework that provides scaffolding and library components for building fast, scalable and efficient dataflow-centric engines. Tez allows computations to be modelled as a DAG (Directed Acyclic Graph), which delivers greater control and customisation for both batch and interactive data processing. Tez also provides an efficient and scalable implementation of state-of-the-art features such as YARN. Its experimentation support via pluggable APIs also gives framework writers and researchers an excellent opportunity to innovate quickly [57].

Hybrid processing combines big data platforms to synthesise both batch and stream processing based on the Lambda architecture. The Lambda data processing framework is designed to handle massive data by leveraging the merits of both batch and stream methods. This high-level architecture usually presents three layers:

- the batch layer, which manages the master dataset stored in a distributed system;
- the serving layer, which loads and exposes the views of the batch layer in a data store for querying;
- the speed layer, which deals with new data at low latency.

Finally, the complete result is produced by merging the batch and real-time views [58], [59].

2) Programming Languages: There are several programming tools used in conjunction with Big Data middleware. […] is the preferred programming language for data scientists (cite). Python enables developers to roll out programs and get prototypes running, making the development process much faster. Once a project is on its way to becoming an analytical tool or application, it can be ported to more sophisticated languages such as Java or C if necessary.

Sqoop is an application designed to transfer data between relational database servers and Hadoop. Sqoop allows for easy import and export of data from structured data stores. These include relational databases (e.g., Oracle and MySQL), enterprise data warehouses and NoSQL systems.

Scala is a functional and object-oriented programming language for data science, designed for distributed processing and scalability. Scala provides faster performance than Python and requires less code than Java.

The R programming language is one of the most widely used languages among statisticians and data miners for data analysis. R provides an extensive environment for data analysis, processing and visualisation.

3) Databases: Many database technologies have been proposed for handling Big Data. The best-known platforms are the traditional relational database management systems (RDBMSs), which store information in a structured format (rows and columns). These RDBMSs manage and manipulate data using the Structured Query Language (SQL). Common RDBMSs include MySQL, Oracle, MS SQL Server, PostgreSQL, MariaDB and Sybase.

NoSQL is another database technology used to store unstructured data. NoSQL databases are becoming very popular due to the growing trend of Big Data. NoSQL databases do not provide the same level of consistency as RDBMSs, but they offer fast performance. Popular NoSQL databases include Cassandra, Redis, MongoDB, Couchbase and Oracle's NoSQL database.

4) Data Mining Tools: There are various data mining tools that data scientists use for analytics. Popular free, open-source tools include KNIME, Orange, RapidMiner and WEKA. Other software, such as IBM's and SAS's tools, provides data science functionality in a closed-source way.

5) ETL Tools: An ETL tool is software for Extracting, Transforming and Loading data. ETL tools are commonly used to fetch both historical and transactional data for developing data warehouses, as well as for data migration projects. Popular ETL tools include Scriptella, Pentaho, KETL, OpenRefine, Boomi, InfoSphere and Azure Data Factory.

6) Visualisation Tools: Data visualisation tools present data
technologies to perform analytics. Prominent among these Big in a graphical format that makes it easy to understand and
Data programming tools are Python, Sqoop, Scala, and R. interpret. Common Big Data visualisation tools include kibana,
Python is an uncomplicated and robust application that elastic search, tableau and infograph. These tools offer effec-
delivers both power and complexity of traditional compiled tive and fast means to explore large datasets, spot trends and
languages along with the ease-of-use of simpler scripting identify unexpected relationships and correlation within the
and interpreted languages. Recent studies show that Python datasets.
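The Lambda architecture described above produces its complete result by merging a batch view with a real-time view. The following minimal Python sketch illustrates that merge by counting LMS events per learner. It is not Kafka, Spark or Hadoop code. The `Event` record, the `cutoff` timestamp and the functions `batch_view`, `SpeedLayer` and `query` are hypothetical names chosen for this example.

In the sketch, the batch layer recomputes a view from the master dataset up to the last batch run. The speed layer counts only the events that arrived afterwards, and a query adds the two views together.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    learner_id: str  # hypothetical learner identifier
    timestamp: int   # hypothetical event time in seconds


def batch_view(master_dataset, cutoff):
    """Batch layer: recompute per-learner counts from all events up to `cutoff`."""
    return Counter(e.learner_id for e in master_dataset if e.timestamp <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.counts = Counter()

    def ingest(self, event):
        if event.timestamp > self.cutoff:
            self.counts[event.learner_id] += 1


def query(learner_id, batch, speed):
    """Merge step: complete result = batch view + real-time view."""
    return batch[learner_id] + speed.counts[learner_id]


master = [Event("alice", 10), Event("bob", 20), Event("alice", 30)]
cutoff = 30  # time of the last completed batch run
batch = batch_view(master, cutoff)  # the view the serving layer would expose

speed = SpeedLayer(cutoff)
for event in [Event("alice", 40), Event("carol", 45)]:
    speed.ingest(event)

print(query("alice", batch, speed))
print(query("carol", batch, speed))
```

In a real deployment, new events would also be appended to the master dataset. The speed layer's state would then be discarded once the next batch run covers those events. The sketch omits this step for brevity.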

Authorized licensed use limited to: Oman Virtual Science Library (Masader). Downloaded on July 18,2021 at 12:42:41 UTC from IEEE Xplore. Restrictions apply.
C. Technologies for e-Learning
Several notable technologies and emerging models are currently driving e-learning in HE [4]. The past decade has recorded exponential growth in educational technologies. However, many higher education institutions are still at the early stage of understanding and utilising these technologies to enhance students' learning experience. This study identified six key emerging technology trends for e-learning and their educational characteristics.
1) Smart Learning in HE: There has been a sharp rise in the use of Smart Learning Environments (SLEs), such as smart e-learning, mobile learning and smart universities, which are transforming the delivery of teaching and learning in HE. Using these systems, instructors and learners can communicate (locally and remotely), share course content, view course content in a preferred language, ask questions, present (locally and remotely) and hold discussions through forums and chats. Administrators can also manage administrative and academic activities efficiently with school management systems. Nonetheless, setting up an SLE that meets the criteria defined in Smart Classroom standards and data governance remains a challenge [60].
2) Big Data Analytics in HE: With the increasing digitisation of higher education, a large amount of data has accumulated, and its size is increasing at an unprecedented rate. Big Data carrying digital traces of a person's life can be used to uncover private details or even predict the future behaviour of individuals [61]. We can examine these massive data to discover in-depth knowledge and values that are key to improving educational outcomes and explaining educational phenomena [2]. This technological trend promises tools for the real-time collection, storage, privacy, security and analysis of data about learning environments.
3) Cloud Computing in HE: Higher education faces the problem of computational infrastructure. Cloud computing offers an excellent platform that facilitates efficient use of computing resources, simplifies management, and improves services safely and securely. Cloud computing guarantees all stakeholders in HE (students, teachers, parents, administrators and government) access to essential data using any device from the workplace.
4) Artificial Intelligence and Machine Learning in HE: Artificial Intelligence (AI) and Machine Learning (ML) are among the most researched topics and are often used interchangeably in higher education. AI and ML can suggest changes to the e-learning system and support real-time skills and knowledge assessment [60]. AI and ML combine with statistical and data mining tools to monitor and evaluate learners' behaviour. These concepts have been widely used in several areas:
- predicting students' performance and attrition rates;
- managing student enrolment;
- identifying learners with low motivation;
- undertaking social network analysis of students' collaboration;
- supporting security and risk mitigation;
- estimating learners' affect;
- driving recommendation systems [62], [63], [64], [65], [66], [67], [68].
5) Augmented Reality (AR) and Virtual Reality (VR) in HE: The use of AR and VR in HE has gained prominence in the last decade. These technologies can deliver an interactive blended learning experience created from the combination of real and virtual environments or materials in the classroom [69]. Learners can now access virtual laboratories anywhere without the need for a physical system. They can practise in a "safe" environment before using real, physical components. As suggested by many authors [70], [71], [72], [73], AR/VR could increase learners' interest, encourage creative thinking and allow for collaboration between tutors and learners. Research and development on AR/VR technology focuses on a whole ecosystem around smartphones, including applications and educational content, games and social networks. This ecosystem creates immersive three-dimensional spatial experiences that address new ways of human-computer interaction [73].
6) Internet of Things (IoT) in HE: The recent acceptance of the Internet of Things (IoT) and the Web of Things (WoT) in the consumer space has led HE to explore how this area could be useful in education. IoT is presently reforming the academic space in several ways. It provides support for learners with disabilities [74], automates learning tasks, monitors instructors and learners, and collects data in the education setting [75]. According to [76], an IoT-enhanced LMS will transmit lecture material and live video streams of lectures to students anywhere on campus quickly and efficiently.
VI. INTEGRATING BIG DATA ANALYTICS AND E-LEARNING
The development of Big Data in higher education requires a thorough look at the issues that delineate its implementation. The general picture of Big Data issues (Fig. 1) can be viewed from the needs of educational institutions.
The adoption of smart learning systems, student information systems and other educational applications in higher education is generating massive data that puts pressure on existing systems. Collecting data, and securing the information while deriving the needed value from processing these large datasets (data governance), is a challenge. Moreover, the need to examine these vast and varied datasets to provide timely information for decision-making is increasing the need for learning analytics in higher education. Informed decision-making will not only allow for timely intervention but also allow higher education institutions to gain a competitive advantage within the academic space.
The quest for analytics consequently demands substantial computational resources. This issue therefore requires high-performance computing, hardware innovation and virtual systems to handle the computational needs.
Beyond these issues is the need for innovative techniques to capture, analyse, store, search, visualise, share and secure data. The demand for effective Big Data tools and technologies, and for statistics-based and machine-learning-based models, is an imminent issue that needs to be addressed.

Fig. 1. Big Data issues in HEIs. HEIs commonly face three Big Data challenges: the need for learning analytics, the need for computational resources and the need for effective Big Data technologies.
We believe that solutions to the above needs will have a positive impact on Big Data in higher education. These impacts include, but are not limited to, performance prediction, planning and management of stakeholders, behaviour visualisation and analysis, social network analysis and students' collaboration, skill estimation, and course and programme recommendation.
VII. USING THEORY TO SUPPORT BIG DATA ANALYTICS IN HIGHER EDUCATION
Theories place attention on some of the variables recognised as essential in a framework. Over the years, many research findings in educational Big Data analytics have provided a wide variety of evidence-based models and practices for use in higher educational institutions. However, these results are empirical and pay little attention to the theories that drive the analytic process. Eccles et al. [77] assert that these approaches rely on expensive trial and error. They also offer minimal confidence that a success, once achieved, can be replicated.
Generalising the findings of these studies in education to support the learning phenomenon could be daunting. This difficulty often arises from a limited understanding of three things: the characteristics of the targeted behaviour (learners), the professionals (staff/faculty) and the environment in which they operate. The features of these variables may influence the effectiveness of different model implementations.
Davies et al. [78] opine that only 10% of guideline implementation strategies provide a clear justification for their strategies. Nilsen [79] suggests that a weak theoretical basis makes it challenging to understand and explain how and why implementation succeeds or fails. This outcome limits the ability to identify successful implementation factors. Without those factors, it is hard to develop better approaches for more successful implementations. In HE, not much has been done to establish a theoretical basis for implementation or strategies to facilitate it.
Academics describe a theory as being made up of variables, a domain of application, a set of relationships among the variables and specific predictions [80], [81]. Sutherland [82] defines theory as an ordered set of assertions about generic behaviour or structure assumed to hold throughout a significantly broad range of specific instances. Hunt [80] explains that these statements indicate how and why specific phenomena will occur.
In general, a theory may constitute a system of ideas or general principles designed to structure our observation, understanding and explanation of the world [82], [83]. A functional theory provides a clear explanation of how and why specific relationships lead to specific events.
Another issue that arises in using theory to support Big Data analytics in higher education is the overarching objective of using models, frameworks and theories. Models involve the simplification of a phenomenon or an aspect of a phenomenon. Models relate closely to theories, and the difference is not always clear. Models may describe theories with a more narrowly defined scope of explanation. A model is descriptive, whereas a theory is explanatory as well as descriptive [83]. Frameworks usually denote a structure, overview, outline, system or plan consisting of various descriptive categories (such as concepts, constructs or variables) and the relations between them that are presumed to account for a phenomenon. Frameworks do not provide explanations; they only describe empirical phenomena by fitting them into a set of categories [83].
To advance theoretical concepts in Big Data analytics in HE, we need to understand them from the perspective of implementation science in education. This means examining and demonstrating what makes interventions work in real-world contexts. An essential objective for the practitioner is to make use of evidence-based practice, which depends on robust research evidence of efficacy and effectiveness.
VIII. PROPOSED BIG DATA FRAMEWORK FOR E-LEARNING ANALYTICS (BIDEL)
Designing a Big Data framework is a daunting task. It requires a well-articulated process that breaks the design down into actionable tasks. These tasks offer a common reference point for developing frameworks and models for learning analytics. This study adopts the Reference Model of Open Distributed Processing (RM-ODP) as a benchmark for abstraction. Abstraction is essential for understanding the framework and for reusing the general pattern of reasoning.
RM-ODP defines the essential concepts necessary to specify a reference model from five prescribed viewpoints defined in ISO/IEC
10746¹. Viewpoints are a form of abstraction achieved by using a selected set of architectural concepts and structuring rules in order to focus on particular concerns within a system (ISO/IEC 10746-2:2009²).
RM-ODP defines five essential viewpoints of a system and its environment. They are:
• Enterprise viewpoint, which describes the business requirements and how to achieve them. The focus here is the purpose, scope and policies of a system.
• Information viewpoint, which focuses on the semantics of information and information processing.
• Computational viewpoint, which considers all aspects of the functional decomposition of the distributed system into objects and how these objects interact.
• Engineering viewpoint, which concerns the mechanisms and functions required to support distributed interaction among objects in the system.
• Technology viewpoint, which focuses on the technological choices of a system.
The strategy of an HEI constitutes its business model, which reflects the main elements of the HEI. These strategies include:
- institutional policies, goals and targets;
- curriculum and instructional designs;
- pedagogical designs and assessment;
- learning resources.
All of these have a direct impact on the principles and position of the HEI in the academic space.
¹ Available at: https://www.iso.org/standard/55723.html
² Available at: https://www.iso.org/obp/ui/iso:std:55723:en
Fig. 2. The core elements of HEIs, showing the various tasks that could emanate from their environment in line with the RM-ODP viewpoints: institutional (enterprise), information, computational, engineering and technology.
A. Contextualising the framework using RM-ODP Viewpoints
1) Enterprise viewpoint: The enterprise viewpoint captures the HEI's tasks, from short- and medium- to long-term targets, that result in its core functions. At the framework level, design requirements must support the core functions of HEIs. These requirements include flexibility to modification, composability, availability, adaptability, fault tolerance, processing capacity and short response time.
2) Information viewpoint: The information viewpoint captures invariants that describe the related concepts, relationships and workflows. At the high level, it may refer to classes such as faculties, schools, departments and units. Developing the BD framework will therefore have to bring out the relationships among the elements/modules of the architecture. The components of the framework can also be described from an information viewpoint using an information flow model (IFM). The IFM traces the flow from data acquisition to decision making. Pertinent issues at this stage include data security and privacy, data visualisation and data ownership.
3) Computational viewpoint: The BD framework design can represent patterns of interaction that include batch processing and near-real-time or real-time data processing. The computational viewpoint deals with transaction management, virtualisation, platform-specific rules and the development of fast and efficient algorithms. These computational algorithms include machine learning, statistics-based learning and other techniques such as Hadoop MapReduce [4]. We can also describe the behaviour of framework components based on their communication style (either event-driven or messaging).
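The MapReduce technique listed under the computational viewpoint can be illustrated with a small, single-process Python sketch. It does not use the Hadoop API. The log format (`learner_id,course_id,action`) and the function names are assumptions made for illustration.

The map step emits one key-value pair per record. The shuffle step groups the values by key, and the reduce step aggregates each group. Here the result is a count of actions per course.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical LMS log lines in the assumed format "learner_id,course_id,action".
LOG_LINES = [
    "u1,c101,view",
    "u2,c101,submit",
    "u1,c102,view",
    "u1,c101,view",
]


def map_phase(line):
    """Map: emit one ((course_id, action), 1) pair per log record."""
    _learner_id, course_id, action = line.split(",")
    yield (course_id, action), 1


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the grouped values for one key."""
    return key, sum(values)


mapped = chain.from_iterable(map_phase(line) for line in LOG_LINES)
result = dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped).items())
print(result)
```

In Hadoop, the framework performs the shuffle and runs the map and reduce functions in parallel across the cluster. Only the per-record and per-key logic would be supplied by the developer.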

4) Engineering viewpoint: From the engineering viewpoint, we consider distribution and its related issues. These include data warehousing, hardware and operating system considerations. BD system components are the subject of the computational viewpoint. Another engineering dimension is the design style and dashboards for visualisation.
5) Technology viewpoint: The technology viewpoint describes the BD tools for the system management processes. Otoo-Arthur and Van Zyl [4] identify six categories of BD tools: middleware, programming languages, database technologies, data mining tools, ETL tools and data visualisation tools.
One issue in developing the BD framework is making the semantics of the relationships among the components explicit. The discussion of the RM-ODP viewpoints shows significant considerations for developing the framework. Fig. 2 outlines the core elements of an HEI and the various tasks that could emanate from its environment in line with the RM-ODP viewpoints.
B. High-Level Architectural Framework for HEIs
The high-level system data flow framework of Big Education Data (Fig. 3) is designed to support both batch and streaming operations. A typical HE Big Data solution employs a subset of the framework's components, depending on its desired functionality. The proposed approach is made up of four layers:
- big education data source;
- information management;
- machine learning and analytics;
- dashboards and visualisation.
Data from various applications contribute to the big education data sources (first layer). These data sources include camera data, mobile app data, e-learning data, machine and sensor data, administrative data, transaction and usage logs, email and messaging data, and social media data.
Data from these sources are sent to the information management layer. Operations executed in this layer include data encryption, which ensures the confidentiality, privacy and provenance of data and information while analytics is performed.
The data lake serves as a storehouse for massive data in its native format, without constraints on the data type (structured, semi-structured and unstructured). The data lake (Fig. 4) consists of three parts: a landing (transient) zone, a staging zone and an analytic sandbox. The landing zone undertakes preliminary cleaning, filtering and aggregation of raw data. Data that does not require pre-processing goes directly to the staging zone. The analytic sandbox performs basic analytics whose findings HEIs may not use directly. Technologies for implementing the data lake include HDFS, Cassandra, HBase and MongoDB. For data processing, Storm, Spark, Flume, Sqoop or Hadoop MapReduce may be options.
From the data lake, the relevant information is sent to the big data warehouse (local or cloud-based infrastructure), from which it can be extracted for analysis by a big data application. The primary function of the information management layer is to ensure that data has been pre-processed for noise reduction, consistency and reliability. This pre-processing follows the directives of HEI professionals.
The third layer, machine learning and analytics, comprises both a real-time processing engine and a batch processing module for streaming data, data modelling and data analysis. This layer adopts the CRoss-Industry Standard Process for Data Mining (CRISP-DM) approach to extract big education insights. CRISP-DM is very flexible when analytics is used to solve complex HEI issues, and it provides a structured approach to planning a big educational data mining task. The machine learning and analytics layer focuses on developing practical algorithms and models for data analytics. Fig. 5 shows the types of data analytics that could be performed on big education data. The more complex the analytics, the more value it adds to improving HEIs.
The fourth layer is dashboards and visualisation (DV). This layer focuses on developing dashboards for creative data exploration with interactive visualisation. Dashboards simplify reports into graphs and charts for more insight while letting users change the data interactively. Using visualisations, HEIs can easily spot areas needing attention, identify which factors influence performance, and predict the success of their goals and strategies.
C. Implementation of BiDeL Framework
The rapid development of technology has inspired the use of e-learning systems in HEIs. As noted earlier, e-learning is characterised by a large amount of data generated through learners' and lecturers' online interactions. Consequently, the need for efficient big data analytics systems and techniques is imminent. Realising such a system will enable the management and analysis of HEIs' large datasets to enhance learners' experience on e-learning platforms.
Fig. 6 illustrates the BiDeL framework proposed to address e-learning big education data issues in HEIs. BiDeL has three main modules drawn from the high-level big data framework for HEIs. Together, they implement one instance of handling massive data in e-learning systems. The BiDeL framework provides a general view of the HEI big data ecosystem. This ecosystem supports multi-structured dataset processing and analysis, and it shows the roles of its actors (learners, lecturers, faculty, management, data scientists and the pedagogical team).
1) Data capture and information management module: The data capture module collects log data from the e-learning system and cleans, filters and aggregates it using the ELT and ETL processes. We employ the ELT process because it gives us the full view of the data, from which we decide which parts are useful. Data captured from the e-learning system may include:
- analytics data from learner interaction (assessment and other learning activities);
- survey data (students' perception of instruction);
- learners' collaboration data (social sharing, chats, forums and wikis);
- learner data (knowledge, profile, skills and level);
- pedagogical resources (texts, images, videos, audio tutorials and multimedia).
Unstructured data constitutes about 95% of the e-learning data. To facilitate loading from the database server, we use Flume for collecting, aggregating and moving LMS log data. We use Sqoop to import and export all data types between the database server and Hadoop HDFS.
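The ELT-style capture step described above loads raw log data first, then cleans, filters and aggregates it. It can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the BiDeL implementation. The CSV columns, the validity rules in `is_valid` and the per-learner, per-day aggregation are hypothetical. In the framework itself, this work would run on Hadoop/Spark after ingestion with Flume or Sqoop.

Records are kept unchanged in a `landing` list, mirroring the data lake's landing zone. Only the cleaned aggregates are passed on to a `staging` structure.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical raw LMS log export; the column names are assumptions.
RAW = """learner_id,event,timestamp
u1,course_viewed,2021-03-01T09:00:00
u2,quiz_submitted,2021-03-01T09:05:00
,course_viewed,2021-03-01T09:06:00
u1,quiz_submitted,not-a-date
u1,course_viewed,2021-03-02T10:00:00
"""


def load_raw(text):
    """Load step: keep every record unchanged (the 'landing zone')."""
    return list(csv.DictReader(io.StringIO(text)))


def is_valid(record):
    """Assumed cleaning rule: a learner id is present and the timestamp is ISO 8601."""
    if not record["learner_id"]:
        return False
    try:
        datetime.fromisoformat(record["timestamp"])
    except ValueError:
        return False
    return True


def transform(records):
    """Transform step: filter invalid rows, then count events per learner per day."""
    daily = Counter()
    rejected = []
    for record in records:
        if is_valid(record):
            day = record["timestamp"][:10]  # YYYY-MM-DD prefix of the timestamp
            daily[(record["learner_id"], day)] += 1
        else:
            rejected.append(record)  # kept for later inspection, not discarded
    return daily, rejected


landing = load_raw(RAW)
daily, rejected = transform(landing)
staging = dict(daily)  # cleaned aggregates handed on to the staging zone
print(staging)
print(f"{len(rejected)} rows rejected")
```

The rejected rows are kept rather than discarded. This preserves the full view of the data that motivates the ELT choice, so cleaning rules can be revised later without re-ingesting the source.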

Fig. 3. A high-level framework data flow of the big education data processing system. The big data framework supports batch and streaming operations and has four main stages: big education data source, information management, machine learning and analytics, and dashboard and visualisation.
Fig. 4. Data lake architecture showing how structured and unstructured big education data is stored. Basic operations here include data cleaning, filtering and aggregation (ELT process).
Fig. 5. Types of big data analytics for improving decision making in HEIs, from simple analytics to more complex types. The more complex the analysis, the more value we have. Source: https://www.scnsoft.com/blog/4-types-of-data-analytics
2) Machine learning and analytics module: The machine learning and analytics module is the central point of the BiDeL framework. We implement Spark on top of Hadoop's HDFS to build a secure data pipeline for the data ingested from the information management module. Two implementations are possible under this module: streaming and batch processing of LMS log data. All machine learning algorithms for developing HEI analytics models fall under this module. These may include supervised learning, unsupervised learning, statistics-based learning and other techniques such as fuzzy logic, PageRank, TF-IDF, genetic algorithms and gradient boosting.
Fig. 6. Proposed BiDeL framework for e-learning systems in HEIs. Three main modules are drawn from the high-level framework to deal with massive data from e-learning systems. BiDeL combines multiple big data technologies for processing data from the e-learning systems. The various roles of the system's actors are also presented.
3) Dashboard and visualisation module: This module applies actionable insights from data analytics to facilitate decision-making in HEIs. These decisions include content management, learner behaviour monitoring, improving pedagogical strategies and course recommendation.
The field of education big data analytics is multidisciplinary. It draws on data structures and algorithms from computer science, statistics and probability from mathematics, and HEI domain knowledge. The overarching issues of these disciplines make developing and implementing theories a complicated practice. A theoretical underpinning of the framework is therefore essential. It will serve as a reference model for education practitioners and researchers and help establish a correct concept of big education data that facilitates information technology in teaching. The RM-ODP approach used here gives a broader view of big education data analytics in HEIs from the perspective of all three disciplines. The BiDeL framework analyses the layers of education data following an extensive data development strategy. These strategies include the concept of data-driven teaching, the digitisation of teaching and learning, and policy-driven analysis of HEI big data.
IX. PRELIMINARY RESULTS
A. Setting up the Environment Configuration
We performed the initial experiment on a workstation with 16GB memory, an Intel Core i7-7500U CPU @ 2.90 GHz and a 1TB hard disk. On top of this ran the Hortonworks Sandbox HDP 2.6.5 with an allocation of 8GB main memory. The default configuration parameters for Flume, Sqoop and Spark were used for the exploration. We chose the HDP sandbox for its command-line and graphical dashboard features, together with its various built-in Big Data tools, which make it flexible to use. The configuration could, however, be scaled up from a single machine to multiple machines.
B. Implementation
Our initial experiments showed impressive performance using Sqoop to import and export all data types between the Moodle server and Hadoop HDFS. We seek to implement three
other algorithms, in addition to the existing ones that Spark provides in its MLlib, to develop early warning systems for e-learning management systems. Our future work will provide more detailed information on the implementation methods, algorithms and results.
X. CONCLUSION AND FUTURE WORKS
This paper explores the potential integration of the Big Data paradigm into higher education, in an era where ubiquitous technologies are becoming an integral part of distance learning. We provide a general outlook of Big Data sources in higher education. Our work provides a high-level framework data flow of Big Data in higher education that is scalable for context-specific systems. The BiDeL framework provides high performance with respect to its storage distribution and parallel processing. BiDeL leverages the numerous big data tools and the algorithms provided by the Spark MLlib library, which can improve users' experience in online learning. We also highlight the need to ground learning models in theories that drive the analytics processes.
Ongoing work will focus on implementing the e-learning framework using a combination of big data technologies. We envisage an early warning system for users of e-learning systems that predicts when students are likely to disengage from the system. This system will allow real-time monitoring of students' learning behaviour in the e-learning system.
REFERENCES
[1] E. Dahlstrom, D. C. Brooks, and J. Bichsel, "The current ecosystem of learning management systems in higher education: Student, faculty, and IT perspectives," Research report, ECAR, Louisville, CO, Sep. 2014. Available: http://www.educause.edu/ecar. 2014 EDUCAUSE, CC BY-NC-ND, Tech. Rep., 2014.
[2] C. Romero and S. Ventura, "Data mining in education," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 3, no. 1, pp. 12–27, 2013.
[3] K. Sin and L. Muthu, "Application of big data in education data mining and learning analytics – a literature review," ICTACT Journal on Soft Computing, vol. 5, no. 4, 2015.
[4] D. Otoo-Arthur and T. Van Zyl, "A systematic review on big data analytics frameworks for higher education - tools and algorithms," in Proceedings of the 2019 2nd International Conference on E-Business, Information Management and Computer Science, ser. EBIMCS '19. New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Available: https://doi.org/10.1145/3377817.3377836
[11] K. Stefanova and D. Kabakchieva, "Educational data mining perspectives within university big data environment," in 2017 International Conference on Engineering, Technology and Innovation (ICE/ITMC). IEEE, 2017, pp. 264–270.
[12] Y. Demchenko, C. de Laat, and P. Membrey, "Defining architecture components of the big data," IEEE, vol. 9, no. 4, pp. 104–112, 2014.
[13] S. Pandey and V. Tokekar, "Prominence of MapReduce in big data processing," in 2014 Fourth International Conference on Communication Systems and Network Technologies (CSNT). IEEE, 2014, pp. 555–560.
[14] D. Laney, "3D data management: Controlling data volume, velocity, and variety," Application Delivery Strategies, File 949, META Group Inc., Stamford, 2001.
[15] R. Kitchin and G. McArdle, "What makes big data, big data? Exploring the ontological characteristics of 26 datasets," Big Data & Society, vol. 3, no. 1, p. 2053951716631130, 2016.
[16] R. Kitchin, "Big data and human geography: Opportunities, challenges and risks," Dialogues in Human Geography, vol. 3, no. 3, pp. 262–267, 2013.
[17] J. Taylor, Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics. Pearson Education, 2011.
[18] M. Chen, S. Mao, and Y. Liu, "Big data: A survey," Mobile Networks and Applications, vol. 19, no. 2, pp. 171–209, 2014.
[19] D. Strom, "Big data makes things better," Slashdot, 3 Aug. 2012.
[20] M. Batty, S. Gray, A. Hudson-Smith, R. Milton, O. O'Brien, and F. Roumpani, "Visualizing spatial and social media," Innovations in Digital Research Methods, pp. 245–270, 2015.
[21] K. Grolinger, M. Hayes, W. A. Higashino, A. L'Heureux, D. S. Allison, and M. A. Capretz, "Challenges for MapReduce in big data," in 2014 IEEE World Congress on Services (SERVICES). IEEE, 2014, pp. 182–189.
[22] D. Kabakchieva, K. Stefanova et al., "Big data approach and dimensions for educational industry," Economic Alternatives, vol. 4, pp. 47–59, 2015.
[23] D. L. Bitzer, P. G. Braunfeld, and W. Lichtenberger, "PLATO II: A multiple-student, computer-controlled, automatic teaching device," Programmed Learning and Computer-Based Instruction, pp. 205–216, 1962.
[24] P. Suppes, "Modern learning theory and the elementary-school curriculum," American Educational Research Journal, vol. 1, no. 2, pp. 79–93, 1964.
[25] L. Campbell, "What does the "e" stand for," Melbourne: Department of Science and Mathematics Education, The University of Melbourne, 2004.
[26] T. T. Kidd, "A brief history of eLearning," in Web-Based Education: Concepts, Methodologies, Tools and Applications. IGI Global, 2010, pp. 1–8.
[27] D. Wooley, "PLATO: The emergence of online community," 1994. Retrieved November 1, 2002.
[28] M. Schittek, N. Mattheos, H. Lyon, and R. Attström, "Computer assisted learning. A review," European Journal of Dental Education: Review Article, vol. 5, no. 3, pp. 93–100, 2001.
[29] C. S. Lim, K. N. Tang, and L. K. Kor, Drill and Practice in Learning (and Beyond). Boston, MA: Springer US, 2012, pp. 1040–1042. [Online]. Available: https://doi.org/10.1007/978-1-4419-1428-6_706
[30] G.-J. Hwang, L.-Y. Chiu, and C.-H. Chen, "A contextual game-based learning approach to improving students' inquiry-based learning performance in social studies courses," Computers & Education, vol. 81, pp. 13–25, 2015.
[5] M. Anshari, Y. Alas, N. Yunus, N. Sabtu, and M. Hamid, “Online [31] S. E. Bond, S. P. Crowther, S. Adhikari, A. J. Chubaty, P. Yu, J. P.
learning: trends, issues, and challenges in the big data era,” Journal Borchard, C. S. Boutlis, W. W. Yeo, and S. Miyakis, “Evaluating the
of E-Learning and Knowledge Society, vol. 12, pp. 121–134, 01 2016. effect of a web-based e-learning tool for health professional education on
[6] A. De Mauro, M. Greco, and M. Grimaldi, “A formal definition of big clinical vancomycin use: Comparative study,” JMIR medical education,
data based on its essential features,” Library Review, vol. 65, no. 3, pp. vol. 4, no. 1, p. e5, 2018.
122–135, 2016. [32] C. S. Chai, L. Tan, F. Deng, and J. H. L. Koh, “Examining pre-service
[7] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and teachers’ design capacities for web-based 21st century new culture of
A. H. Byers, “Big data: The next frontier for innovation, competition,” learning,” Australasian Journal of Educational Technology, vol. 33,
Washington, DC: McKinsey Global Institute, 2011. no. 2, 2017.
[8] Z. Xu and Y. Shi, “Exploring big data analysis: Fundamental scientific [33] D. Parsons and H. Ryu, “A framework for assessing the quality of mobile
problems,” Annals of Data Science, vol. 2, no. 4, pp. 363–372, 2015. learning,” in Proceedings of the international conference for process
[9] W. L. Chang, G. von Laszewski et al., “Nist big data interoperability improvement, research and education. Citeseer, 2006, pp. 17–27.
framework: Volume 8, big data reference architecture interfaces,” Tech. [34] M. Sharples, I. Arnedillo-Sánchez, M. Milrad, and G. Vavoula, “Mobile
Rep., 2018. learning (pp. 233-249),” The Netherlands: Springer, 2009.
[10] A. Mohamed, M. K. Najafabadi, Y. B. Wah, E. A. K. Zaman, and [35] A. Kukulska-Hulme, M. Sharples, M. Milrad, I. Arnedillo-Sánchez, and
R. Maskat, “The state of the art and taxonomy of big data analytics: G. Vavoula, “Innovation in mobile learning: A european perspective,”
view from new big data framework,” Artificial Intelligence Review, pp. International Journal of Mobile and Blended Learning (IJMBL), vol. 1,
1–49, 2019. no. 1, pp. 13–35, 2009.

Authorized licensed use limited to: Oman Virtual Science Library (Masader). Downloaded on July 18,2021 at 12:42:41 UTC from IEEE Xplore. Restrictions apply.
[36] M. Kearney, S. Schuck, K. Burden, and P. Aubusson, “Viewing mobile learning from a pedagogical perspective,” Research in Learning Technology, vol. 20, no. 1, p. n1, 2012.
[37] L. Briz-Ponce, J. A. Juanes-Méndez, F. J. García-Peñalvo, and A. Pereira, “Effects of mobile learning in medical education: a counterfactual evaluation,” Journal of Medical Systems, vol. 40, no. 6, p. 136, 2016.
[38] C. Pimmer, M. Mateescu, and U. Gröhbiel, “Mobile and ubiquitous learning in higher education settings. A systematic review of empirical studies,” Computers in Human Behavior, vol. 63, pp. 490–501, 2016.
[39] M. Al-Emran, H. M. Elsherif, and K. Shaalan, “Investigating attitudes towards the use of mobile learning in higher education,” Computers in Human Behavior, vol. 56, pp. 93–102, 2016.
[40] J. Yip, S.-H. Wong, K.-L. Yick, K. Chan, and K.-H. Wong, “Improving quality of teaching and learning in classes by using augmented reality video,” Computers & Education, vol. 128, pp. 88–101, 2019.
[41] N. Pellas, P. Fotaris, I. Kazanidis, and D. Wells, “Augmenting the learning experience in primary and secondary school education: a systematic review of recent trends in augmented reality game-based learning,” Virtual Reality, vol. 23, no. 4, pp. 329–346, 2019.
[42] F. Marcel, “Mobile augmented reality learning objects in higher education,” Research in Learning Technology, vol. 27, 2019.
[43] A. Wilson, “Analysis of current virtual reality methods to enhance learning in education,” Selected Computing Research Papers, p. 61, 2019.
[44] J. Westlake, “Exploring the potential of using augmented reality and virtual reality for STEM education,” in Learning Technology for Education Challenges: 8th International Workshop, LTEC 2019, Zamora, Spain, July 15–18, 2019, Proceedings. Springer, 2019, p. 36.
[45] R. P. Bhatia, “Features and effectiveness of e-learning tools,” Global Journal of Business Management and Information Technology, vol. 1, no. 1, pp. 1–7, 2011.
[46] U. Farooq, W. Schafer, M. B. Rosson, and J. M. Carroll, “M-education: bridging the gap of mobile and desktop computing,” in Proceedings. IEEE International Workshop on Wireless and Mobile Technologies in Education. IEEE, 2002, pp. 91–94.
[47] M. Sarrab, L. Elgamel, and H. Aldabbas, “Mobile learning (m-learning) and educational environments,” International Journal of Distributed and Parallel Systems, vol. 3, no. 4, p. 31, 2012.
[48] T. White, Hadoop: The Definitive Guide. O’Reilly Media, Inc., 2012.
[49] S. Dwivedi and V. K. Roshni, “Recommender system for big data in education,” in 2017 5th National Conference on E-Learning & E-Learning Technologies (ELELTECH). IEEE, 2017, pp. 1–4.
[50] A. D. Barrachina and A. O’Driscoll, “A big data methodology for categorising technical support requests using Hadoop and Mahout,” Journal of Big Data, vol. 1, no. 1, p. 1, 2014.
[51] M. Franklin, “Making sense of big data with the Berkeley data analytics stack,” in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 2015, pp. 1–2.
[52] C.-W. Tsai, S.-J. Liu, and Y.-C. Wang, “A parallel metaheuristic data clustering framework for cloud,” Journal of Parallel and Distributed Computing, vol. 116, pp. 39–49, 2018.
[53] P. P. Nghiem and S. M. Figueira, “Towards efficient resource provisioning in MapReduce,” Journal of Parallel and Distributed Computing, vol. 95, pp. 29–41, 2016.
[54] M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, “Scalable real-time classification of data streams with concept drift,” Future Generation Computer Systems, vol. 75, pp. 187–199, 2017.
[55] J. Anuradha et al., “A brief introduction on big data 5Vs characteristics and Hadoop technology,” Procedia Computer Science, vol. 48, pp. 319–324, 2015.
[56] S. Bharti, B. Vachha, R. Pradhan, K. Babu, and S. Jena, “Sarcastic sentiment detection in tweets streamed in real time: a big data approach,” Digital Communications and Networks, vol. 2, no. 3, pp. 108–121, 2016.
[57] B. Saha, H. Shah, S. Seth, G. Vijayaraghavan, A. Murthy, and C. Curino, “Apache Tez: A unifying framework for modeling and building data processing applications,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015, pp. 1357–1369.
[58] C. P. Chen and C.-Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on big data,” Information Sciences, vol. 275, pp. 314–347, 2014.
[59] H. Wang and A. Belhassena, “Parallel trajectory search based on distributed index,” Information Sciences, vol. 388, pp. 62–83, 2017.
[60] T. Hoel and J. Mason, “Standards for smart education–towards a development framework,” Smart Learning Environments, vol. 5, no. 1, p. 3, 2018.
[61] A. De Mauro, M. Greco, M. Grimaldi, and P. Ritala, “Human resources for big data professions: A systematic classification of job roles and required skill sets,” Information Processing & Management, vol. 54, no. 5, pp. 807–817, 2018.
[62] S. Ameri, M. J. Fard, R. B. Chinnam, and C. K. Reddy, “Survival analysis based framework for early prediction of student dropouts,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 903–912.
[63] Y. Huang, M. Yudelson, S. Han, D. He, and P. Brusilovsky, “A framework for dynamic knowledge modeling in textbook-based learning,” in Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization. ACM, 2016, pp. 141–150.
[64] J. Yang, J. Ma, and S. K. Howard, “Investigating live streaming data for student behaviour modelling,” in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2017, pp. 1–6.
[65] S. S. Chaurasia and A. Frieda Rosin, “From big data to big impact: analytics for teaching and learning in higher education,” Industrial and Commercial Training, vol. 49, no. 7/8, pp. 321–328, 2017.
[66] J. S. He, S. Ji, and P. O. Bobbie, “Internet of things (IoT)-based learning framework to facilitate STEM undergraduate education,” in Proceedings of the SouthEast Conference. ACM, 2017, pp. 88–94.
[67] E. Ghaleb, M. Popa, E. Hortal, S. Asteriadis, and G. Weiss, “Towards affect recognition through interactions with learning materials,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 372–379.
[68] M. Attaran, J. Stark, and D. Stotler, “Opportunities and challenges for big data analytics in US higher education: A conceptual model for implementation,” Industry and Higher Education, vol. 32, no. 3, pp. 169–182, 2018.
[69] J. Barrow, C. Forker, A. Sands, D. O’Hare, and W. Hurst, “Augmented reality for enhancing life science education,” in VISUAL 2019 - The Fourth International Conference on Applications and Systems of Visual Paradigms, 2019.
[70] V. S. Pantelidis, “Reasons to use virtual reality in education and training courses and a model to determine when to use virtual reality,” Themes in Science and Technology Education, vol. 2, no. 1-2, pp. 59–70, 2010.
[71] Y.-c. Chen, “Effect of mobile augmented reality on learning performance, motivation, and math anxiety in a math course,” Journal of Educational Computing Research, p. 0735633119854036, 2019.
[72] N. Hockly, “Technology for the language teacher: augmented reality,” ELT Journal, 2019.
[73] G. Papanastasiou, A. Drigas, C. Skianis, M. Lytras, and E. Papanastasiou, “Virtual and augmented reality effects on K-12, higher and tertiary education students’ twenty-first century skills,” Virtual Reality, vol. 23, no. 4, pp. 425–436, 2019.
[74] S. Hollier and S. Abou-Zahra, “Internet of things (IoT) as assistive technology: Potential applications in tertiary education,” in Proceedings of the Internet of Accessible Things. ACM, 2018, p. 3.
[75] R. Garg and J. Kim, “An exploratory study for understanding reasons of (not-) using internet of things,” in Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018, p. LBW024.
[76] K. Mershad and P. Wakim, “A learning management system enhanced with internet of things applications,” Journal of Education and Learning, vol. 7, no. 3, pp. 23–40, 2018.
[77] M. Eccles, J. Grimshaw, A. Walker, M. Johnston, and N. Pitts, “Changing the behavior of healthcare professionals: the use of theory in promoting the uptake of research findings,” Journal of Clinical Epidemiology, vol. 58, no. 2, pp. 107–112, 2005.
[78] P. Davies, A. Walker, and J. Grimshaw, “Theories of behaviour change in studies of guideline implementation,” in Proceedings of the British Psychological Society, vol. 11, no. 1, 2003, p. 120.
[79] P. Nilsen, “Making sense of implementation theories, models and frameworks,” Implementation Science, vol. 10, no. 1, p. 53, 2015.
[80] S. D. Hunt, Modern Marketing Theory: Critical Issues in the Philosophy of Marketing Science. South-Western Pub, 1991.
[81] P. D. Reynolds, Primer in Theory Construction: An A&B Classics Edition. Routledge, 2015.
[82] J. G. Wacker, “A definition of theory: research guidelines for different theory-building research methods in operations management,” Journal of Operations Management, vol. 16, no. 4, pp. 361–385, 1998.

[83] C. Frankfort-Nachmias and D. Nachmias, “Research methods in the social sciences,” 1996.
