Educollab: Artificial Intelligence Powered E-Learning Platform

EduCollab
ARTIFICIAL INTELLIGENCE POWERED E-LEARNING PLATFORM - EDUCOLLAB
PROJECT REPORT
SUBMITTED TO
GURU NANAK DEV UNIVERSITY, AMRITSAR
IN PARTIAL FULFILMENT FOR THE DEGREE OF
M.B.A (FINANCE)
(2018 - 2020)
SUPERVISED BY:
Dr. Mandeep Kaur
(Professor & Head)
SUBMITTED BY:
Tavleen Singh
M.B.A (Finance), SEM IV
Roll No. 27641882912
UNIVERSITY SCHOOL OF FINANCIAL STUDIES (USFS)
GURU NANAK DEV UNIVERSITY
AMRITSAR (PUNJAB) – 143005
(2020)
DECLARATION
This is to certify that the project report entitled “Artificial Intelligence Powered E-Learning
Platform - EduCollab”, submitted by me in partial fulfilment for the Degree of Master of
Business Administration (Finance), is the result of my original and independent research work
carried out under the guidance of Dr. Mandeep Kaur, Professor & Head, University School of
Financial Studies, Guru Nanak Dev University, Amritsar (Punjab). It has not been submitted
elsewhere for the award of any other degree, diploma, fellowship or other similar title of any
University or Institution. All the ideas and references have been duly acknowledged.
Dated: July, 2020 Tavleen Singh

ACKNOWLEDGEMENT
I acknowledge my heartfelt gratitude to my supervisor, Dr. Mandeep Kaur, Professor &
Head, University School of Financial Studies, for her untiring guidance and support. Without
her, this journey would not have been possible. Her careful attention to the details of my
writing, valuable discussions and the constructive feedback enabled me to complete my
project report. I fall short of words to express how indebted I feel to her.
I’m extremely grateful to Dr. Sarabjot Singh Anand, Co-Founder Tatras Data & Sabudh
Foundation, for giving me the golden opportunity to intern at his organization. His dynamism,
vision, sincerity and motivation have deeply inspired me. It was a great privilege and honour
to work and study under his guidance.
I also wish to thank my mentors at Sabudh for their guidance and help, and for providing me
with an excellent atmosphere in which I was able to complete my internship. Interning at
Sabudh was a great opportunity. It gave me a wonderful platform to recognize my talent. The
whole team was very helpful, and the working environment, teaching techniques and guidance
which I received here were marvellous.
A very special thanks to Dr. Jaspal Singh, Professor, University School of Financial Studies,
for always being my pillar of strength. His fatherly presence and wisdom have always guided
me to the right path.
I cannot express enough thanks to Mr. Bhavneet Bhalla, Senior Vice President at Lowe
Lintas and Partners, for being my Godfather. I’m forever indebted to him for his
thoughtfulness, enthusiasm, continuous support and encouragement. His keen interest in my
daily activities has gone a long way in my overall growth.
The project could not have been completed without the respondents of the questionnaire.
I also have a deep sense of gratitude for my family for their love, prayers,
support and help.
Above all, I thank God Almighty for giving me the strength and capacity to complete this
project report and for seeing me through all the difficulties.
Dated: July, 2020 Tavleen Singh

TABLE OF CONTENTS
CHAPTER NO. PARTICULARS
I PROFILE OF THE ORGANIZATION
II PROBLEM DEFINITION
INTRODUCTION
RATIONALE
LITERATURE SUPPORT
III RESEARCH METHODOLOGY
IV ANALYSIS & RESULT REPORTING
V SUMMARY
CONCLUSION
RECOMMENDATIONS
BIBLIOGRAPHY
ANNEXURE
LIST OF TABLES
TABLE NO. TITLE
1.1 ABOUT SABUDH
1.2 ABOUT TATRAS DATA
2.1 VARIOUS E-LEARNING PLATFORMS
2.2 BUSINESS MODEL CANVAS
3.1 CONFUSION MATRIX
3.2 DEFINITION OF TERMS
3.3 SAMPLE SELECTION OF STUDENTS
4.1 CALCULATIONS
4.2 GENDER
4.3 MODEL AND MAKE OF MOBILE
4.4 INTERNET CONNECTIVITY ISSUES
4.5 USE OF E-LEARNING PLATFORM BEFORE EDUCOLLAB
4.6 PROFICIENCY IN USING E-LEARNING PLATFORMS
4.7 LIKING THE CONCEPT OF PEER TO PEER LEARNING
4.8 ENCOUNTERED CYBER-BULLYING
4.9 SATISFACTION WITH PLATFORM
4.10 LOGIN PROBLEM
4.11 BUGS ENCOUNTERED
4.12 INFORMING EDUCOLLAB TEAM
4.13 EFFICIENCY OF EDUCOLLAB TEAM
4.14 WORK UP TO EXPECTATIONS
4.15 KEEP USING EDUCOLLAB
4.16 COMPARISON OF EDUCOLLAB
4.17 ENCOURAGE FRIENDS
4.18 PREFER BOOK LEARNING TO E-LEARNING
4.19 RELIABILITY STATISTICS
4.20 KMO & BARTLETT’S TEST
4.21 COMMUNALITIES
4.22 TOTAL VARIANCE EXPLAINED
4.23 ROTATED COMPONENT MATRIX
4.24 FACTOR 1: EFFECTIVE LEARNING
4.25 FACTOR 2: INTERACTIVE LEARNING
4.26 FACTOR 3: ATTRACTIVE LEARNING
4.27 FACTOR 4: SAFE LEARNING
5.1 FACTOR SUMMARY

LIST OF FIGURES
FIGURE NO. TITLE
2.1 THE INTERNET IN INDIA BY 2020
2.2 USE OF AI IN OUR PLATFORM
2.3 ARCHITECTURE OF APPLICATION
2.4 MARKETING STRATEGY
3.1 PROFANITY DETECTION
3.2 PROCESS OF PROFANITY DETECTION
3.3 LIBRARIES USED
3.4 PHASES
3.5 DEPICTION OF LABELS AND COUNT OF COMMENTS
3.6 DEPICTION OF DATA
3.7 IMAGE LAYERS
3.9 PHASES
3.10 WORKING OF RECOMMENDER SYSTEM
3.12 PHASES
4.1 PROFANITY DETECTION ON TEXT
4.2 CALCULATIONS
4.3 RESULT OF PROFANITY DETECTION ON TEXT
4.4 PROFANITY DETECTION ON VIDEOS
4.5 RECOMMENDER SYSTEM
4.6 EXAMPLE OF WORD CLOUD
4.7 GENDER
4.8 MODEL AND MAKE OF MOBILE
4.9 INTERNET CONNECTIVITY ISSUES
4.10 USAGE OF E-LEARNING PLATFORM BEFORE EDUCOLLAB
4.11 PROFICIENCY IN USING E-LEARNING PLATFORMS
4.12 LIKING THE CONCEPT OF PEER TO PEER LEARNING
4.13 ENCOUNTERED CYBER-BULLYING
4.14 SATISFACTION WITH PLATFORM
4.15 LOGIN PROBLEM
4.16 BUGS ENCOUNTERED
4.17 INFORMING EDUCOLLAB TEAM
4.18 EFFICIENCY OF EDUCOLLAB TEAM
4.19 WORK UP TO EXPECTATIONS
4.20 KEEP USING EDUCOLLAB
4.21 COMPARISON OF EDUCOLLAB
4.22 ENCOURAGE FRIENDS
4.23 PREFER BOOK LEARNING TO E-LEARNING

ABBREVIATIONS USED
AI Artificial Intelligence
ITMS Intelligent Traffic Management System
ANPR Automated Number Plate Recognition
WHO World Health Organization
NLP Natural Language Processing
IoT Internet of Things
DSU Daily Stand Up
ML Machine Learning
IT Information Technology
VR Virtual Reality
BJF Bhai Jaitaji Foundation
NOFN National Optical Fibre Network
IGNOU Indira Gandhi National Open University
CBT Computer Based Training
WBT Web Based Training
CSP-ICT Content(s) Service Provider- Information and Communications Technology
LMS Learning Management System
CBSE Central Board of Secondary Education
NCERT National Council of Educational Research and Training

ICSE Indian Certificate of Secondary Education
CAT Common Admission Test
IAS Indian Administrative Service
JEE Joint Entrance Examination
NEET National Eligibility cum Entrance Test
WAVE Whiteboard Audio Video Environment
IISER Indian Institute of Science Education and Research
TAM Technology Acceptance Model
PU Perceived Usefulness
PEOU Perceived Ease of Use
ICT Information and Communications Technology
ELAM E-Learning Acceptance Model
M-Learning Mobile Learning
GUI Graphical User Interface
NLTK Natural Language Toolkit
CSV Comma Separated Values
TF Term Frequency
IDF Inverse Document Frequency
CNN Convolutional Neural Network
LSA Latent Semantic Analysis
PLSA Probabilistic Latent Semantic Analysis
LDA Latent Dirichlet Allocation

LSVM Linear Support Vector Machine
NBSVM Naïve Bayes Support Vector Machine
LSTM Long Short-Term Memory
SPSS Statistical Package for the Social Sciences

CHAPTER I
PROFILE OF THE ORGANIZATION
Sabudh Foundation, an offshoot of Tatras Data, is a data science company that leverages data
science for social good. It was founded on the idea that if data science applications can help
businesses achieve their objectives and increase revenues, they can equally be used to create
social impact.
Therefore, Sabudh Foundation was formed by the leading data scientists in the industry in
association with the Punjab government. The objective was to bring together data and young
data scientists to work on focused, collaborative projects for social benefit. It aims to enable
the youth to use powerful AI technologies for the greater good of society by working on real-
world problems. In partnership with the Punjab Government, Sabudh Foundation has formed
the Centre for Data Science for Social Good.
The organization aims to work with the government and citizens on a data-driven approach to
bringing about change for social good, through specific projects concerning problems in the areas of:
• HEALTHCARE - Medical diagnosis and image analysis in rural hospitals.
• AGRICULTURE - Helping farmers to increase crop yields and address the issue of
crop diversification.
• GOVERNANCE - Creating a crime map to identify the prime areas of concern based on
crowdsourcing, using social network analysis to support better town planning, etc.
• EDUCATION - Training students to work on live projects and developing interaction
for youth with global role models.
TABLE 1.1
ABOUT SABUDH
Founded: 2018
Parent Company: Tatras Data
Industry: Higher Education
Company Size: 11-50 employees
Mentors: Dr. Sarabjot Singh Anand, Dr. Satman Singh, Dr. Vikas Agrawal, Prof. Bhiksha Raj, etc.
Partners: Tatras Data, Innosential, BJFI, PEC, Punjab Police, Punjab University Patiala, Punjab Engineering College, etc.
Headquarters: Mohali, Punjab
Type: Non-profit
Website: http://sabudh.org/
HOW IS DATA SCIENCE APPLIED TO SOCIAL GOOD?
Sabudh believes that technological advances can be used not only to make businesses better
but also in areas such as education, public policy and health, where they can help society
function more efficiently. The various projects being carried out for social good are
described below:
TRAFFIC MANAGEMENT:
Sabudh has signed an MoU with the Punjab Traffic Police to work together on innovative
solutions for traffic control and management. It has a database of accidents across Punjab,
CCTV footage from across Ludhiana, and audio recordings from the emergency call
centre in Punjab for detecting stress from speech.
With the analysis of this data and the use of machine learning and AI, the aim is to develop
intelligent solutions to issues of road safety and traffic control, leading to efficient policing of
our roads.
The project “TRAFFIC COPS” deals with embedding AI in the traffic management system as a
part of our infrastructure. The main idea behind it is to reduce manual intervention to a minimum.
It sets up an Intelligent Traffic Management System (ITMS), which includes radar-based
monitoring, automated traffic signals, CCTV cameras to capture motorists and commuters
breaking laws, Automated Number Plate Recognition (ANPR) to send challans directly to
offenders’ homes, tracing of emergency calls, and protection of traffic police personnel from
health-related problems. It is an initiative to build an improved version of “CITY BRAIN”.
PRECISION FARMING:
AI is being used to detect early signs of crop disease, using drones that capture aerial images
and coordinate with a hand-held device. The work involves equipping drones with their own
multispectral sensors, as well as developing tools to train a computer program to analyse the
images and classify them based on disease progression. Pesticides are then sprayed on the
affected crops using the drones to prevent further harm, leading to higher crop yields.
For example, Agrobots and drones are now being used in agriculture to gauge the health
of the harvest, which can help farmers improve their crop yield and reduce costs. With the help
of advanced technologies, we are able to save 90% of the spraying costs. These technologies
can help states like Punjab, which has always been the food basket of India, to restore
food security while improving crop health. The project “SAT SRI AKAL” caters to this and
aims at bringing prosperity to the bottom of the pyramid in Punjab.
DIMINISHING LANGUAGE BARRIERS:
Owing to increasing western influence, Gurmukhi is losing its identity day by day, so there
is a need to preserve its purity. It is widely known that translators are helpful in
understanding a foreign language, but language translators can also aid in promoting and
preserving cultural heritage. The project “LINGUA FRANCA” is an attempt to safeguard our
cultural heritage by diminishing language barriers.
INCLUSION OF SPECIALLY-ABLED:
According to the World Health Organization (WHO), 466 million people across the world have
disabling hearing loss (over 5% of the world population). The project “ABILITY INFINITE” –
“We choose not to put ‘Dis’ in your ability” – is therefore being carried out in an attempt to
recognise Indian Sign Language and support their communication. It aims at using AI for the
social inclusion of specially-abled people.
MEDICAL AID:
Medicine is another field where artificial intelligence has progressed, helping to make the
right diagnosis and detect disease in time for it to be cured. Punjab has the highest rate
of cancer in India; 18 people succumb to the disease every day, according to a recent report
published by the state government. Using AI and machine learning algorithms to diagnose
the disease at an early stage can significantly decrease the mortality rate.
TRAINING PROGRAM AT SABUDH
The centre offers aspiring data scientists a six-month internship through which they can
become a SABUDH FELLOW, with potential employment offered after completion of
the internship. Interns enrolling in this program get exposure to live projects having real
social impact, and to fellows, staff and partners. They are mentored by leading data scientists in
industry and academia. During this period, topics such as Core Python Concepts,
Introduction to Machine Learning, Data Exploration and Pre-processing, Supervised
Learning and Unsupervised Learning are covered. Tests are conducted periodically,
along with weekly assignments, to make sure that students are well versed in the
concepts. Each student participates in projects involving Text Analysis, Natural Language
Processing (NLP), etc. The interns also participate in various Kaggle competitions, which
involves forming 5-6 teams of students and conducting weekly progress presentations.
Students are expected to work closely and collaboratively with team members onsite for the
duration of the program. The program thus aims to make the interns ‘Masters’ in Data Science.
There are many benefits in store for the students at Sabudh:
• Training in advanced technologies such as Machine Learning, AI, Cyber Security and IoT.
• Learning how to compete and win on Kaggle, with mentorship from Kaggle Grandmasters.
• An extensive network of worldwide academics and industry leaders who interact with students.
• Access to top companies in this space for student employment.
• A conducive environment for technical and personal growth.
Each batch has around 30 students, all of them B.Tech students. This was the first time that
M.B.A students were given an opportunity to take part, and six students from our M.B.A
(Finance) batch, including me, were fortunate enough to join. Each of us was allotted a project
to work on as project manager and was given 5-6 B.Tech students to work with as a team to
carry out the project. I was also given access to Jira and a Trello board to manage the project.
Our day started with a DSU (Daily Stand Up) at 10 a.m., where the M.B.A students and the
mentors stood up one by one and listed what had been done the previous day and the agenda
for the day. The first half of the day was devoted to a study session of two and a half hours,
in which the mentors took up various topics of Data Science. In the latter half, the
project work was carried out. Different mentors were allotted to each project according to
their domain knowledge. They guided us on the requirements of the project and took daily
updates on the work done by us and our teams. Dr. Sarabjot Singh Anand, the founder,
visited weekly to enlighten us on various important aspects of Data Science, and we
presented weekly updates on the projects to him.
My project was to help in developing an interactive educational platform, “EDUCOLLAB”,
which provides a platform for collaborating for Happy Learning. It is based on the idea that
the best way to learn is to teach, and hence it promotes peer-to-peer learning among children
aged 6-16 years. Positioned as India’s first AI-driven learning app, it provides cyber-bullying
identification, which distinguishes it from its competitors. Since access to teachers is limited
and there is increasing acceptance and adoption of digital learning in India, such a platform
will come in handy.
This platform is a product of Tatras Data, Sabudh’s parent company, but its team also had
members from Sabudh. Our Sabudh EduCollab team, consisting of five B.Tech students and
me, helped the Tatras team in this project.
Tatras Data is a lab-based data science consultancy. It focuses on the development of tools
and applications to disrupt business domains and create value through intellectual property
and transformative solutions leveraging the power of Machine Learning (ML) and Artificial
Intelligence (AI). The company’s DNA is a melting pot of academics, data practitioners and
business professionals with decades of experience in the data mining and data science
disciplines, delivering robust and reliable data science solutions across various domains such
as retail, healthcare, media, IT and education.
In the year 2012, Dr. Sarabjot Singh Anand and Noah Gresham came together to form Tatras
to help technology firms implement data science related initiatives and create a conducive
environment where Data Science can flourish. They understood the complexity of business
and the data science solutions that answer the big questions. They are excited about how
disruptive technologies like AI, blockchain and VR are quickly becoming the core of how
businesses operate across the globe and how the application of Data Science will fundamentally
change business from this day forward.
The company is committed to providing the highest level of “cradle to grave” services,
ranging from the Need Assessment Stage to the Final Implementation Stage, and value to its
clients, while promoting a work environment that fosters learning, cooperation and respect
among team members. It:
• Delivers accelerated project implementation.
• Provides ML-based business solutions.
• Develops AI-powered products and platforms.
• Generates intellectual property.
• Solves complex Data Science challenges.
THEIR APPROACH:
The company believes in becoming a true partner to its clients. Its teams always start by
understanding the business needs and challenges, and then provide expertise that takes both
technology and people into consideration, resulting in seamlessly integrated solutions.
TABLE 1.2
ABOUT TATRAS DATA
Founded: 2012
Offices: India (Delhi, Mohali); USA
Management Team:
Co-Founder & Chief Data Scientist: Sarabjot Singh Anand
Co-Founder & CEO: Noah Gresham
Head of Strategy: Atishi Pradhan
Differentiators: Deep Domain Expertise, Data Science & Business Expertise, Powerful Data Science Innovations, Client-First Approach, High Quality Implementation, Seamless Workflow Integration, Unparalleled Support, Flexible Engagement Models.
Skill Sets:
Data Science and Analytics: MATLAB, PLOTLY, R, SAS, SPSS, SISENSE.
Machine Learning and Artificial Intelligence: TensorFlow, Dialogflow, Keras, Mahout, Azure, OpenCV.
Data Tools and Deployment: Kettle, MySQL, AWS, Kafka.
Web Technologies & Tools: GitHub, mongoDB, IONIC, Magento.
Clients: BOLD, Compro, Dainik Bhaskar, GoodUnited, Golf locker, Delhivery, CSSL, Foot Locker, myrefers, etc.
Professionals: 50+
Global Clients: 20+
Domains: 10+
Global Offices: 2
OTHER ASSOCIATIONS:
Tatras and Innosential Labs have joined hands to provide quality training in data science. In
August 2017, Tatras and Innosential Labs, in association with Nasscom, conducted a 5-day
Data Science Masterclass in Bengaluru. The masterclass was delivered by leading lights of the
industry from across the globe. It was a huge success, with 200+ attendees and 500+
companies.
Bhai Jaitaji Foundation India (BJF India) helps empower rural youth who are constrained by
their socio-economic condition to realize their potential through specialized coaching and
continuous mentoring. It ensures that eligible beneficiaries get all the available financial and
other support from the government and other partner organizations. Tatras has joined hands
with BJFI to foster altruistic values and commitment to social change among volunteers.
CHAPTER II
PROBLEM DEFINITION
INTRODUCTION
Digital transformation is the buzzword today. Propelled by the falling costs and rising
availability of smartphones and high-speed connectivity, India is already home to one of the
world’s largest and fastest-growing bases of digital consumers and is digitizing faster than
many mature and emerging economies. These technological advancements have an impact on
almost every aspect of our lives on an ongoing basis.
In September 2015, the Indian Government launched the revolutionary reform Digitization in
India, under which it envisioned increased internet connectivity and making India a digitally
empowered nation. For the government, the shift to digital has primarily been about
transparency and reach: taking services and resources in the healthcare, education and
financial sectors to the rural population of India. Thus far, digital has been characterised by
four aspects, namely social media, mobility, analytics and cloud, commonly called SMAC
(PwC and IACC Report, 2017).
We often hear about how technologies like Artificial Intelligence, Machine Learning and Cloud
Computing have made our lives easier. But what is the one thing that underpins all these
revolutionary technologies? Is it the service? No! It’s DATA.
Data has also evolved dramatically in recent years in type, volume and velocity, and has
become the new business currency.
However, data on its own is not enough to grow a business. We must know how to derive
useful insights from it. This is where Data Science comes into the picture. Data Science is the
process of using data to find solutions and predict outcomes for a problem statement. For
example, Apple uses Data Science to build a watch that monitors the health of an individual.
Similarly, Uber uses algorithms to vary its cab fares according to demand and peak times.
Banks analyse customer-related information such as payment history, products held and
credit history to determine the creditworthiness of a customer, and digitally embed it in
various channels to offer pre-approved loans and instant disbursements.
In addition, analytics and Artificial Intelligence (AI) are being integrated with digital
channels to provide instant recommendations on products that will best suit the customer. For
example, a college graduate can be offered an education loan while a married couple with
young children can be offered child investment plans and life insurance policies along with
relevant loan products.
While banks are still trying to achieve this effectively, many organizations have already taken
the lead and set awe-inspiring benchmarks, a case in point being the recommendations provided
by Netflix and Amazon based on the customer’s viewing and purchase/browsing history.
In addition, social information is providing rich data with hitherto unknown insights
about customers, such as interests and hobbies, which further helps in offering relevant
services.
Organizations can now self-determine the needs of customers through behavioural analytics.
Analytics can help them understand not only what customers say they want vis-a-vis what
they don’t say they want but also what they really need. It can help organizations understand
the profile of customers and their behaviour across digital touch points. These insights can
then be used for driving sales and services and for personalizing the digital channel experience
(Purohit Anup, 2017).
Data science is enabling the next generation of enterprise software, resulting in solutions that
tell users what is going to happen and what they should do about it today. It is the only
sure-fire way of creating and validating solutions to improve decision-making across the board.
For modern, forward-thinking businesses that find themselves with more data than they know
what to do with, the application of data science will be the difference between sinking and
swimming (Nejmeldeen Ziad, 2016).
Indeed, no segment of business or society is untouched by data science interventions. The
impact of data science is also visible in the field of education, where it has effected major
changes in how education is imparted and consumed. Rote learning and reliance on printed
material or book-based learning are fast becoming a characteristic of the past.
Until the end of the last century, the education system in India relied on traditional
classroom-based learning, where students did not get the opportunity to participate in
interactive sessions. To face the challenges of changing times, it became necessary to
make concepts clearer and students competent enough to compete globally. Hence, the
concept of Digital Learning evolved in 2002-2003. With technology spreading its wings to
the education sector, the typical classroom, once characterized by boring hour-long
sessions, was transformed into an interesting, fun-filled environment. Digital education made
life easier for both students and educators (Malhotra Monika, 2018).
E-learning platforms are making a measurable difference in students' engagement and
performance. They are reducing gaps in the delivery of education and giving a new dimension
to the education space. Online education goes beyond the realms of secondary, post-secondary
and tertiary education. It also includes courses and modules for competitive exam
preparation, professional skill enhancement, and other non-academic subjects.
These educational platforms act as a great asset to one’s learning, as they provide a
combination of innovative learning and core learning. Many apps include animated videos to
enhance the learning experience. No matter what one’s learning goals are, there
are apps for almost every subject, exam or interest.
Their importance has increased all the more due to the coronavirus pandemic across the
world. It is only through these platforms that students are still able to
pursue their education and develop various skills during lockdown.
The e-learning industry in India is a prolific one, witnessing a steady growth rate of 25 per
cent year-on-year, and is projected to be a $1.96 billion industry by 2021. With a network of
more than 1.5 million schools and 18,000 higher education institutes, the market for digital
education in India is enormous. Today, digital learning is no longer a luxury; the
implementation of digital learning tools has become a necessity in schools
(Malhotra Monika, 2018).
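The growth figures quoted above can be made concrete with a small compound-growth calculation. The sketch below is illustrative only: it assumes that the 25 per cent rate holds constant every year and works backwards from the projected $1.96 billion in 2021. The function name `implied_market_size` and the resulting values for earlier years are assumptions of this sketch, not figures taken from the cited source.

```python
def implied_market_size(target_size, target_year, year, annual_growth):
    """Market size in `year` implied by a constant annual growth rate.

    Uses the compound-growth relation
        size_year = target_size / (1 + annual_growth) ** (target_year - year)
    so it works backwards from a known (projected) size in `target_year`.
    """
    return target_size / (1 + annual_growth) ** (target_year - year)


# Figures quoted in the text: 25% year-on-year growth, $1.96 billion by 2021.
PROJECTED_2021_USD_BN = 1.96
GROWTH_RATE = 0.25  # assumed constant across years, for illustration only

if __name__ == "__main__":
    for year in range(2017, 2022):
        size = implied_market_size(PROJECTED_2021_USD_BN, 2021, year, GROWTH_RATE)
        print(f"{year}: ${size:.2f} billion (implied)")
```

Under this constant-rate assumption the market roughly doubles about every three years, since 1.25 cubed is approximately 1.95.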
The key drivers for the growth of e-learning are:
1. Growth in internet and smartphone penetration:
The number of internet users is expected to reach 730 million by 2020. India may replace
China as the country with the second-largest number of users after the US.
FIGURE 2.1
THE INTERNET IN INDIA BY 2020
Source: http://www.aurumequity.com/the-online-education-industry-in-india-present-and-future/
The young demographic (15-40 years), who are the most active consumers of smartphones
and the internet, look for online learning modules to fulfil their educational requirements at
low cost without having to move out of home, office or city. The internet makes it far easier
to enrol in distance courses, degrees and certifications from around the world, for urban as
well as rural populations and for people with mental or physical constraints.
2. Cost of online education is low:
Online education providers can reach out to the masses without setting up a physical
infrastructure or incurring administrative costs such as staff salaries, stationery, books, etc.
Hence, the cost savings are passed to the users. Also, students do not have costs associated
with commuting to a campus, living expenses, etc.
3. Traditional model unable to fulfil the additional capacity:
The aim of the government is to raise its current gross enrolment ratio to 30% by 2020. India
will have the world’s largest tertiary-age population and second largest graduate talent
pipeline globally by the end of 2020.
However, the existing educational infrastructure is unable to provide the additional capacity.
E-learning can supplement the conventional model and bridge the gap to a considerable
extent.
4. Digital-friendly government policies:
Several programmes have been launched under initiatives such as ‘Digital India’ and ‘Skill
India’ to spread digital literacy, create a knowledge-based society in India, and implement the
three principles of the Education Policy: ‘access, equity and quality’. These will be helpful
in transforming our nation and creating opportunities for all citizens by harnessing digital
technologies.
In order to establish digital infrastructure, the government has also launched the National
Optical Fibre Network (NOFN), which aims to expand broadband connectivity and provide a
faster network.
5. Demand among working professionals and job-seekers:
The Indian job scenario is currently reeling under the twin pressures of layoffs and job
paucity, especially due to automation and the slow-down in the global economy. According to a
World Bank report, automation is threatening 69% of jobs in India. There have been massive
layoffs in the IT, BFSI, Telecom and Manufacturing sectors, and people are being replaced by
technology driven by machine learning and artificial intelligence.
Owing to all these factors, both job-seekers and working professionals feel a need to gain,
refresh or enhance skills through career advancement courses, which could increase their
chances of landing better jobs, switching jobs, getting promotions, negotiating better pay
packages and staying industry-relevant. Online career courses are affordable, give hands-on
knowledge, can be completed in one-fourth of the time of an offline course, and offer
flexibility in terms of personal schedule. They can be done anywhere, anytime, at one’s
convenience.
EVOLUTION
Open and distance learning in India dates back to the 1960s. By the 1980s there were 34
universities offering correspondence education through departments designed for that
purpose. The first single-mode Open University was established in Andhra Pradesh in 1982,
followed by the Indira Gandhi National Open University (IGNOU), and subsequently by open
universities in Bihar, Rajasthan, Maharashtra, Madhya Pradesh, Gujarat, Karnataka, West
Bengal and Uttar Pradesh (established throughout the 1980s and 1990s).
According to the Education World Special Report (2018), the first teaching-learning
innovation of the IT (information technology) industry, which flowered in the mid-1980s, was
computer-based training (CBT), which enabled learners to use study materials stored on
CD-ROMs. As internet penetration grew and broadband connectivity arrived at the turn of
the century, web-based training (WBT) utilising digital content stored on CD-ROMs and
giant servers facilitated interactive online learning.
Gonella L. and Panto E., researchers at CSP-ICT Innovation, Italy, in their paper on
e-learning (2008), traced the following four stages in the evolution of online education,
which promises the most dramatic global knowledge explosion in world history since the
invention of the printing press by Johannes Gutenberg in 1440.
E-Learning 2.0.
Online Education
E-Learning 1.0.
Web-Based Training
E-LEARNING PLATFORMS IN INDIA
Today, there is an enormous number of e-learning platforms in India alone, providing a wide
variety of learning experiences and products ranging from personalised learning, course-work,
practice papers and entrance exam preparation to career counselling and parent connect
facilities. The list is endless and so is its evolution and competition. These platforms are in
the form of websites and apps. Some of these are shown in the table below.
TABLE 2.1
VARIOUS E-LEARNING PLATFORMS
BYJU'S Excelling in the market is Byju’s - The Learning App. It is a
Bangalore-based educational technology (EdTech) and online tutoring
firm founded in 2011 by Byju Raveendran.
It provides study materials for CBSE, NCERT Solutions, ICSE, CAT,
IAS, JEE, NEET, Commerce, State Boards, Government Exams, Kids
Learning, Academic Questions and Test Preparation, along with free
live classes, a buy-a-course option, free counselling, etc.
BRAINLY Brainly is a peer-to-peer learning community and educational
technology company based in New York City, New York, United
States and Krakow, Poland.
Brainly provides questions and answers for students and parents
looking for help with homework-related tasks relating to Hindi, Math,
History, English, Geography, Biology, Physics, Chemistry, Social
Sciences, Music, Business Studies, Psychology, Accountancy, CBSE
board X, CBSE board XII, etc.
QUORA Quora is an American question-and-answer website where questions
are asked, answered, and edited by Internet users, either factually or in
the form of opinions.
It relates to Q/A-Feed, Television Series, Fashion and Style, Pizza,
Recipes, Photography, Visiting and Travel, Cooking, Health, Music,
Technology, Sessions.
VEDANTU Vedantu is an interactive online tutoring platform where teachers
provide school tuitions to students over the internet, using a real-time
virtual learning environment named WAVE (Whiteboard Audio Video
Environment), a technology built in-house. It is said to operate on a
marketplace model for teachers, where students can browse, discover
and choose to learn from an online tutor of their choice. Currently the
company's primary business is live online tutoring in Science, English,
Mathematics, Hindi, Sanskrit, German, Test Preparation, etc.
TOPPR Toppr is an e-learning program which brings Toppr's top-notch
guidance to the digital platform. It is a platform designed to help
students ace the cut-throat competition of today. It deals in subjects
for classes 5-12; entrance exam preparation for Engineering, Medical and
Commerce; scholarships; etc.
CHEGG Chegg, Inc., known as Chegg, is an American education technology
company based in Santa Clara, California, with over three million
subscribers. It provides digital and physical textbook rentals, online
tutoring, and other student services like textbook solutions, Expert
Q&A, writing tools (plagiarism checker, grammar checker), Flashcards,
Math Solver, Tutors, Internships, Test Preparation and scholarships.
KHAN ACADEMY Khan Academy is a non-profit educational organization created in
2008 by Salman Khan with the goal of creating a set of online tools
that help educate students. The organization produces short lessons in
the form of videos on subjects like General Math, Science, Science &
Engineering, Computing, Arts & Humanities, Economics, test prep, etc.
MERITNATION Meritnation is an online education platform that provides live classes,
study materials and animated videos for school students.
It covers Classes 1-12, entrance exam preparation for Engineering and
Medical, other online tuitions, NCERT Solutions, Board Paper
Solutions and Textbook Solutions.
QUIZLET Quizlet is a mobile and web-based study application that allows
students to study information via learning tools like study material and
games, etc
Today, schools have been shut all across the world due to COVID-19. Globally, over
1.2 billion children are out of the classroom. As a result, education has changed
dramatically, with the distinctive rise of e-learning, whereby teaching is undertaken remotely
and on digital platforms. According to a study by Velocity MR, a leading market research
and analysis company, 72 per cent of Indians prefer online or e-learning to
traditional classroom training. Indian demography is ideal for online learning since many of
the learners come from rural or semi-rural areas where educational facilities, be it at school,
college or entrance examination level, are below par.
Despite the presence of multiple e-learning platforms, their demand far exceeds their supply.
This is due to the gap in the education technology market, where there is still enough room for
innovation and advancement. The project EDUCOLLAB was initiated with this objective.
ABOUT EDUCOLLAB
EduCollab is an interactive educational platform which provides
a forum for collaborating for Happy Learning. It is based on the
idea that the best way to learn is to teach. Positioned as India’s first
AI-driven learning app for children aged 6-16 years, it provides
cyberbullying identification, which distinguishes it from its
competitors. It was a finalist project at the Young Founder’s
Summit in Beijing last year. The project development is currently
in progress.
WORKING/FUNCTIONALITY:
• EduCollab is easy to use and easy to get results with. The basic idea is that a student can
ask a question, answer a question asked by another student and personalise their learning.
Various courses are also available to the students.
• The answers are screen-recorded and hence it serves as an interactive platform for the
students.
• The recommender system deployed helps to organize the data and provide the user with
the most appropriate content.
• AI is used to identify inappropriate content and signs of cyberbullying, translate content
between languages and direct questions to those students most likely to give the best
answers.
FIGURE 2.2
USE OF AI IN OUR PLATFORM
[Figure: AI is used to (1) identify inappropriate content and signs of cyberbullying,
(2) translate content between languages, and (3) direct questions to those students most
likely to give the best answers.]
• The platform has a leaderboard which shows the student of the week/month. This
student is chosen using criteria such as maximum coins earned, maximum answers given,
etc.
• It has a coin system and a rating system for the content put up.
• Built in Flutter, the app uses backend services written in Python and Flask to access data
stored in MongoDB, the database. Videos are recorded using YouTube, and Google helps in
language translation.
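The leaderboard criteria mentioned above can be illustrated with a minimal Python sketch. The report does not state how the criteria are combined, so the ordering used here (coins earned first, answers given as a tie-breaker) is an assumption, and the names `StudentStats`, `student_of_the_period` and `leaderboard` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudentStats:
    """Activity totals for one student over a week or month (hypothetical fields)."""
    name: str
    coins_earned: int
    answers_given: int


def _rank_key(stats):
    # Assumption: coins earned is the primary criterion, answers given breaks ties.
    return (stats.coins_earned, stats.answers_given)


def student_of_the_period(all_stats):
    """Return the top-ranked student, or None when there is no activity."""
    if not all_stats:
        return None
    return max(all_stats, key=_rank_key)


def leaderboard(all_stats, top_n=3):
    """Return the top_n students in descending rank order."""
    return sorted(all_stats, key=_rank_key, reverse=True)[:top_n]
```

In the actual platform such totals would presumably be read from the database; here they are passed in as plain objects to keep the sketch self-contained.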
FIGURE 2.3
ARCHITECTURE OF THE APPLICATION
Source: EduCollab Pdf
MISSION: To provide a flexible and safe learning platform through open dialogues and enquiry.
VISION: Nurturing and empowering young minds through technology and innovation.
TABLE 2.2
BUSINESS MODEL CANVAS
KEY PARTNERS: Schools; Governments; UN Organizations
KEY ACTIVITIES: Interactive Educational APP/P
KEY RESOURCES: AI Engineers and Software Developers
VALUE PROPOSITIONS: Kid to Kid; Teach to Learn; Complement to School Learning; Security;
Accessibility; Simplicity
CUSTOMER RELATIONSHIPS: Community; Rewards
CHANNELS: App Stores; Direct Download
CUSTOMER SEGMENTS: School Students between 6-16 years
COST STRUCTURE: Maintenance and development of the application; marketing cost (advertising
and merchandise); hosting cost (Amazon Web Services)
REVENUE STREAMS: Monthly Fees; Sponsoring and Publicity
Source: EduCollab Pdf
PROPOSED GO TO MARKET STRATEGY:
A marketing strategy refers to a business's overall game plan for reaching prospective
consumers and turning them into customers of the products or services the business provides.
The aim of the platform is to create a hub where the ‘Manmohans’ and the ‘Kalams’ of the
present generation can educate their peers who have limited access to quality education.
Therefore, a well-planned market strategy has been proposed to reach out to the masses and
have a deep penetration of the platform in the entire country.
FIGURE 2.4
MARKET STRATEGY
Product: to provide a platform where students can engage in peer-to-peer
learning under the supervision of their instructors.
Price: to charge nominal prices so that all students can have access to
unlimited quality education.
Place: to reach every nook and corner of the country through direct
download, the app store or the homepage of the website.
Promotion: to reach out to the masses, various promotional strategies have
been used, such as:
• Awarding merchandise to the top contributing students to motivate them
to learn and participate more and more.
• Attending top EdTech conferences to promote EduCollab as well as to know
what else is happening in the market so as to work upon it.
• Partnering with global education publishing houses, etc.
People: to provide a team who can meet the interests of the customers in the
best possible manner, be it customer care, after-sales service or anything else.
Process: to make the platform user-friendly so that students can operate
the platform without the help of their parents.
Physical Evidence: to provide access to the students through an
interactive application and website, which is easy to use and easy to get
results with.
• The app and the website will be launched in three phases: the Sabudh release, the Beta
release and the General release. The Sabudh and Beta releases have already been done and
only the General release remains. In the Sabudh release, the platform was made
available to the Sabudh students, and in the Beta release the platform was launched in:
  • Heritage Schools, New Delhi and Gurgaon
  • Khaitan Schools, New Delhi, Noida and Ghaziabad
  • The British School, New Delhi
  • IISER, Mohali
In the General release, it will be made available to anyone who wants to enjoy the
platform.
The launch has been planned in these phases so that the platform is tried and tested at every
phase and, by the time it reaches the public, EduCollab is beyond perfect.
• It has also been planned to recruit further schools from the advisors’ networks
of over 1500 schools in India.
REWARD SYSTEM
A little thoughtfulness can go a long way.
Rewards are proven to be an effective method to arouse interest among students and
motivate them to take part more actively. They create a feeling of pride and achievement
among them.
Through EduCollab, we seek to do this by using our own version of a reward system, which is
in the form of coins. It will not only help the students to stay on track and work hard to
achieve their goals but will also promote positive and appropriate behaviour among students.
Coins provide a forum for gamification. Each student allots a particular number of coins to
the question asked; these coins are earned by another student who answers it correctly, once
the student who asked the question approves the answer. Hence, the
students compete with one another to give the most appropriate answer to the question put by
another student and earn coins, QUENCHING EACH OTHER’S CURIOSITY in the
process.
Our coin system is based on three basic functionalities: giving coins to a new user, deducting
coins on asking a question and adding coins on getting an answer approved.
• Coins to new user: students get 50 coins on sign-up.
• Coins on questioning: the student asking a question has to assign some coins (out of the
coins he or she holds) to the question.
• Coins on answering: when the student who has asked a question approves the answer
given by another student or students, the latter receive coins.
  • If the student asking the question approves only one answer, all the coins
  assigned to that question go to the student who gave that answer.
  • If the student asking the question approves answers given by more than
  one student, the assigned coins are divided amongst them equally.
Coins on answering are allocated only after the question has been marked ‘DONE’ by
the student asking the question.
A proposal at hand is that these coins could be made redeemable, e.g. for buying Starbucks
coffee, the Fortnite game, etc.
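The coin rules described in this section can be sketched in Python as follows. This is an illustrative sketch rather than the platform's actual code: the class and method names are hypothetical, and two behaviours that the report does not specify are filled in as labelled assumptions (coins are refunded if no answer is approved, and any remainder from an uneven split is returned to the asker).

```python
SIGNUP_COINS = 50  # from the report: every student gets 50 coins on sign-up


class CoinLedger:
    """Tracks balances and the coins held against open questions."""

    def __init__(self):
        self.balances = {}  # student -> coins
        self.escrow = {}    # question_id -> (asker, coins assigned)

    def sign_up(self, student):
        self.balances[student] = SIGNUP_COINS

    def ask(self, question_id, asker, coins):
        """Deduct the coins the asker assigns to a new question."""
        if coins <= 0 or coins > self.balances[asker]:
            raise ValueError("coins must be positive and within the asker's balance")
        self.balances[asker] -= coins
        self.escrow[question_id] = (asker, coins)

    def mark_done(self, question_id, approved_answerers):
        """Pay out only once the asker marks the question DONE."""
        asker, coins = self.escrow.pop(question_id)
        winners = list(dict.fromkeys(approved_answerers))  # drop duplicates, keep order
        if not winners:
            # Assumption: with no approved answer the coins go back to the asker.
            self.balances[asker] += coins
            return
        share, remainder = divmod(coins, len(winners))
        for student in winners:
            self.balances[student] += share
        # Assumption: coins that cannot be split equally return to the asker.
        self.balances[asker] += remainder
```

For example, if a student assigns 10 coins to a question and approves answers from two classmates, each classmate receives 5 coins when the question is marked DONE.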
TASKS ASSIGNED FOR EDUCOLLAB
I was assigned the work of project management. As mentioned earlier, I had a team of 5
B.Tech students for this project. As project manager, I reported to and discussed matters with
the Sabudh mentors and the Tatras team working on EduCollab on a daily basis. We worked on
various pieces of the platform, both at the backend and the frontend. The major tasks included
profanity detection on text, profanity detection on videos and some inputs on the recommender
system.
• Profanity detection on text: this task was taken up to detect the use of cuss words, or any
other words that could hurt the feelings of other students or have a negative impact on them,
in the questions, answers and comments put up by students, so that appropriate action
can be taken.
• Profanity detection on videos: this was done to detect and take action against the use of
images, signs, words, etc. which could hurt the students’ feelings and take the form of
cyberbullying.
• Recommender system: it was worked upon so that the most relevant content was directed
towards each user as per their taste.
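To make the text profanity task concrete, a simple word-list baseline is sketched below in Python. It is not the model used in the project (this section does not say which algorithm was used), only an illustration of the kind of check involved. The blocklist entries are mild placeholders, and the function names and the character-substitution table are assumptions.

```python
import re

# Hypothetical placeholder list; a real deployment would need a curated lexicon.
BLOCKLIST = {"idiot", "stupid", "loser"}

# Undo common character substitutions used to disguise words (an assumption).
_SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalise(text):
    """Lower-case, undo substitutions and collapse runs of 3+ repeated characters."""
    text = text.lower().translate(_SUBSTITUTIONS)
    return re.sub(r"(.)\1{2,}", r"\1", text)


def flagged_words(text):
    """Return the blocklisted words found in a question, answer or comment."""
    tokens = re.findall(r"[a-z]+", normalise(text))
    return [token for token in tokens if token in BLOCKLIST]


def is_inappropriate(text):
    return bool(flagged_words(text))
```

A word-list check of this kind misses context-dependent bullying and can be evaded by new spellings, which is one reason a learned model is usually preferred for the task.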
The minor tasks were:
• Comparative analysis of the various e-learning platforms: it included various fields like
their investors (funding rounds, funding amount, total number of investors, number of
lead investors), the products and services they deal in, their cost and revenue streams, etc.
• In-depth study of Brainly.
• Tracking and listing out bugs and recommendations as and when the updated link came.
• Figuring out the basic functionality needed.
• Listing out the various ways the students can be cyberbullied.
• Planning out the coin system and creating documentation on it.
• Proposing various layouts for the homepage of the website.
• Creating content for the homepage.
• Suggesting logo design.
• Communicating the issues and recommendations given by the students after the Sabudh
release of the app.
• Conducting a survey to gauge the students’ perception towards EduCollab.
For communication with the Tatras team, I was made a member of their Slack workspace
and also of Jira, the Scrum board used for project management, where I posted epics and
stories for the tasks. I also used a Trello board for managing the project with my B.Tech team.
CURRENT STATUS
It has been released at Sabudh, where students are using it every day to go through the daily
lecture sessions and to clear each other’s doubts by asking and answering questions.
It has become all the more useful since the students can carry on with their regular studies
in this time of lockdown.
Students also report the bugs they come across, if any, and the recommendations they have for
feature enhancements. All these reports were tracked by me and forwarded to the Tatras
team.
Soon there will be a beta release of the app for further testing before the general release.
SOME SCREENSHOTS OF THE APPLICATION AND THE WEBSITE
Source: EduCollab application and website
RATIONALE
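Today, there is an enormous number of e-learning platforms in India, providing a wide variety
of learning experiences and products ranging from personalised learning, course-work,
practice papers and exam preparation to career counselling and parent connect facilities. The
list is endless and so is its evolution and competition. Despite this, there seems to be a gap in
the education technology market, because there is still enough room for innovation
and advancement.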
It has been observed that students who teach in peer-to-peer sessions in schools themselves
learn better, as teaching helps to fill the gaps in their own understanding of the topic. Quora
is widely used by adults for peer-to-peer learning, but it is an open system and hence an unsafe
environment for students.
Unfortunately, the advancement in technology has exposed people to a serious risk of abuse
online. Not all people on the Internet are interested in participating nicely; some perceive it as
an avenue to vent their rage, insecurity and prejudices. As a result, cases of cyberbullying
are an area of concern. Each case of cyberbullying has the potential to cause damage,
and its severity and impact are highly dependent upon the vulnerability of the individual
victim. Bullying, whether traditional bullying or cyberbullying, causes
significant emotional and psychological distress, leading to anxiety, fear, depression and low
self-esteem.
As many as 37% of parents in India report that their children are subject to cyberbullying.
(Source) Just like any other victims of bullying, cyberbullied children experience anxiety,
fear, depression and low self-esteem, leading to significant emotional and psychological
distress.
Taking a cue from this, a need was felt to create a platform which not only promotes
peer to peer learning but also protects the students from the crimes/signs of
cyberbullying and inappropriate content. The project EDUCOLLAB was initiated with
this objective.
The need for a safe platform has increased all the more amidst this pandemic situation due
to the fact that the entire world has resorted to online channels, be it for education or
work.
OBJECTIVES:
The main purpose of the present study is to reduce the gap in the education technology
market (in India). The objectives to be achieved are:
1. To develop a model to protect the students from cyberbullying and inappropriate content.
2. To create a Recommender System that provides the students with the questions and
answers of their preference.
3. To understand the perception of the students towards EduCollab after its Beta release.
LIMITATIONS OF THE STUDY
1. Finding out the right source of data for both profanity on text and images.
2. Finding the right algorithm for profanity on text.
3. Checking whether the data we collected for images was biased or not.
4. Finding the right architecture for images task.
5. The time constraints limited the scope of study.
LITERATURE SUPPORT
A review of literature is the backbone of every research study. It is important to review the
literature to have an overview of the kinds of studies that have been conducted and the
gaps in the literature. Here, an attempt has been made to present an evaluative
report of studies found in the literature related to the researcher’s selected area, the evolution
and growth of e-learning and the upcoming trends therein. This review will be helpful in
identifying the research gaps underlying the need for the present study.
The major aspects reviewed under this study include:
• To develop a model to protect the students from cyberbullying and inappropriate content.
• To create a Recommender System that provides the students with the questions and
answers of their preference.
• To understand the perception of the students towards EduCollab.
O'Donnell & King (1999) claimed that peer-learning strategies are valuable tools.
They argued that the outcomes of peer learning ultimately depended on learning design
strategy, course outcomes or objectives, teachers’ facilitating skills, and the commitment of
students and teachers. Importantly, the teacher must consciously orchestrate the learning
activity and choose the appropriate method for undertaking peer learning. Only then will
students in fact engage in peer learning and reap the benefits.
Söderlund (2000) pointed out that an important supporting structure for learners is the
social interaction with other learners, in which they are able to form and give expression for
their thoughts, exchange ideas and share these with others, and jointly reflect on various
phenomena. This in turn establishes a ground for processes within the individual learner, and
deepens the understanding of the learning process. In their learning processes, learners use
different resources that are only partly created or offered by the teacher. Learners also use
resources available in their close environment, at work or at home. To this one can add the
ever-increasing use of computers and different communication technologies as yet other
learning resources.
Boud (2001) claimed that students in a peer learning situation construct their own
meaning and understanding of what they need to learn. Essentially, students would be
involved in searching for, collecting, analysing, evaluating, integrating and applying
information to complete an assignment or solve a problem. They thus engage themselves
intellectually, emotionally and socially in “constructive conversation”, learning by talking,
questioning each other’s views and reaching consensus or dissent.
Reed et al. (2001) conducted research focused on the frequency and scheduling of
rewards and their relation to motivation and results. Rewards may be given in a continuous
fashion (on a pre-determined schedule) or on a variable schedule (Skinner, 1938). The
frequency with which rewards are given has been found in a number of studies to affect the
probability that the desired behaviour will be repeated.
Keller Christina & Cernerud Lars (2002) conducted a study with students at
Jönköping University in Sweden as an example. The students had experience from two
years of e-learning on campus. Students (n = 150) filled in a questionnaire with closed as well
as open-ended questions. The answers were analysed in a multiple regression analysis,
putting the students’ perceptions in relation to gender, age, previous knowledge of computers,
attitudes to new technology, learning styles and the way of implementing e-learning at the
university. Advantages and disadvantages of e-learning were categorized in a qualitative
content analysis. The main conclusion from the study was that the strategy of implementing
the e-learning system at the university was more important in influencing students’
perceptions than the individual background variables. Students did not regard access to
e-learning on campus as a benefit. Male students, students with previous knowledge of
computers and students with positive attitudes to new technologies were all less positive to
e-learning on campus than other students. Another aspect that must be considered is that of
gender. It is of great importance, especially when attracting students to a university.
Johansson (2003) stressed that the conditions for socialisation and learning change
with the introduction of new media technology. Learning and education can also be affected
as students and teachers become dependent on the technology.
Liu et al. (2005) developed a theoretical framework in order to predict a user's
acceptance behaviour of e-learning. This was done to explain students’ intentions to use an
e-learning system using TAM and flow theory. Additional variables that were investigated are
different presentation types (text-audio, audio-video, text-audio-video) and concentration.
They found the difference in presentation types as well as concentration to have a significant
impact on usage intentions.
Landry et al. (2006) made use of TAM (Technology Acceptance Model) to measure
students' acceptance of web-based e-learning tools. In both studies TAM was found to perform
well, with the main hypotheses being supported and a little less than 40% of the total variance
in usage intentions explained. The relationship between university students'
perceptions of ease of use and usage of Blackboard elements was fully supported but varied
at different levels. As originally hypothesized by Davis (1989), the findings of Landry et al.
(2006) suggest that if students perceive Blackboard to be easy to use, they would also
perceive Blackboard to be useful. Usefulness turned out to be the strongest determinant of
usage intentions.
Robert Agnew (2006) in his “General Strain Theory,” hypothesized that the strain and
stress exerted on an individual as a result of bullying “can manifest itself in problematic
emotions that lead to deviant behavior,” possibly leading to delinquency. This theory stresses
the vicious cycle that many teens may go through while being victimized. The cyclical
repercussions of this process are particularly alarming if it leads a victim to antisocial
behaviors when they try to find an outlet for their emotions.
Roca et al. (2006) investigated students' intention to continue using an e-learning
system. As the focus was on continued use, a satisfaction construct was proposed. They
suggested that the impact of the two TAM variables PU (Perceived Usefulness) and PEOU
(Perceived Ease of Use) on continued use is mediated by satisfaction. They broke down
the component perceived performance into perceived quality and perceived usability and
further proposed the constructs information quality, confirmation, service quality, system
quality and cognitive absorption as antecedents of satisfaction. They found support for their
proposed model; yet again, PU turned out to be the strongest determinant.
Manochehr Nick (2007) attempted to compare the effects of e-learning versus those of
traditional instructor-based learning on student learning, based on student learning styles.
Another goal was to determine whether e-learning is more effective for those with a particular
learning style. The study examined the dependent variable of student knowledge based on the
learning style of each subject and the learning method to which each was exposed. The
results revealed that for the instructor-based learning class (traditional), the learning style was
irrelevant, but for the web-based learning class (e-learning), the learning style was
significantly important. The results of this research paper revealed that students’ learning
styles were statistically significant for knowledge performance.
Lim et al. (2008) used questionnaires adapted from the research instruments used by
Poon, Low, and Yong (2004) to measure distance learners’ acceptance of e-learning. They
measured learners’ acceptance by students’ characteristics, instructors’ characteristics,
technology support and system, institutional support, course content and knowledge
management, and online tasks and discussion groups. They highlighted that well-designed
course content provided students with better learning experiences and helped students
access information easily. In their study, the results indicated that students had a moderate
level of e-learning acceptance for the factor of technology and system. It was also stated that
an e-learning system or a web page with a harmonious configuration of colour and background
enhanced students’ interest in studying. Attractive combinations of colours with appropriate
graphics and animations on web sites were useful in delivering information in a user-friendly
way.
Roberts (2008) stated that peer learning can lead to development of self-directed
learning skills; critical and creative thinking and problem-solving skills; communication,
interpersonal and teamwork skills; learning through self, peer assessment and critical
reflection; and increased understanding of concepts, skills and enhancing self-image.
Banerjee, J. & Bose, I. (2011) conducted a study entitled “Higher Education
Through Mobile Learning: An Analysis of Students From Kolkata”. The objectives of the
study were to find the percentage of respondents who were interested in the M-learning mode
of management education and the reasons for preferring mobile-based education to the
traditional method. It was found that 80% of the respondents were aware of the M-learning
platform and that 78% of the respondents were willing to opt for M-learning courses. Also,
56% of the respondents were willing to take management courses in M-learning mode, if
offered. Findings revealed that the awareness level regarding M-learning and the number of
people willing to take courses through M-learning mode were quite high. The results also
indicated a tendency of high deviation regarding the choice of course through M-learning, and
that the mean preference for management courses offered through M-learning was above
average.
Kakoty Sangeeta et al. (2011) analysed current e-learning and the recent market for
e-learning procedures. This study shows that globalization of education, cross-culture aspects
and culturally complex student support systems in distance education as well as in e-learning
environments are a prospective research area. Improvements in these areas could be made by
integrating new technologies and ICT tools. The ELAM (E-learning Acceptance Model)
identifies four determinants of e-learning acceptance: (1) performance expectancy, (2)
effort expectancy, (3) social influence and (4) facilitating conditions. The main contribution
of the paper is that it presents a framework to understand e-learning acceptance as governed
by teacher, student and institutional factors.
Makoe, M. (2012) conducted a study entitled “The Pedagogy of Mobile
Learning in Supporting Distance Learners.” Its objective was to investigate the pedagogic
approach that best supports effective use of cell phones in the distance education context.
Findings reveal that cell phones can be used as a tool to facilitate interaction through
synchronous and asynchronous learning. Students could be encouraged to use cell phone
social networks such as WhatsApp and BBM to form study groups and work collaboratively on
projects, to enhance interaction through weekly self-assessment quizzes where students can
test themselves on basic factual information, and to pace themselves as they go through their
study material.
Sarrab, M., Elgamel, L. & Aldabbas, H. (2012) conducted a study entitled
“Mobile Learning (M-Learning) and Educational Environments.” The objectives of the study
were to discuss the background of mobile learning and how it could be used to enhance the
whole e-learning system, and to highlight the benefits and future challenges of mobile
learning in our educational environments. It was found that M-learning makes the merger and
connection between technology and education possible. M-learning could be used to solve the
problems of the traditional learning system and could complement the learning process in our
schools and universities. Both teachers and students need a proper and handy system to
interact with each other, and this could be facilitated through M-learning.
Rueckert, D., Kim, J.D. & Seo, D. (2013) conducted a study entitled
“Students’ Perceptions and Experiences of Mobile Learning” with the objective of finding out
how students perceive the use of mobile devices to create a personalized learning
experience outside the classroom. The findings of this study suggested that mobile
technologies have the potential to provide new learning experiences. The fact that the
students’ TACI scores dropped significantly after participating in these activities indicates
that the use of mobile technologies in these classes opens up new avenues for interaction and
learning. The participants became more willing to adopt new technologies into their own
lives, which revolve around teaching English as a profession. The t-test results indicated
statistically significant changes in their views towards mobile technology.
Ceobanu and Boncu (2014) investigated in a theoretical manner the challenges
associated with the use of mobile technology in adult education. They argued that mobile
learning (mLearning) can be placed at the intersection of eLearning and mobile computing,
and is differentiated by the capability to access learning resources anywhere, anytime,
through high capabilities of search, high interaction, high support for effective learning and
ongoing assessment based on performance. mLearning is also considered to be an extension
of eLearning, but characterized by its independence from a location in space and time.
Furthermore, mLearning comprises the use of mobile technology in the service of the
processes related to teaching and learning. It can be considered as the point
where mobile computing and eLearning meet to create a learning experience that can be
commenced anytime and anywhere.
Saxena, A. & Saxena, A. (2015) conducted a study entitled “A Viewpoint and
Attitudes of Students’ towards Future of Mobile Learning in Education Industry of India”.
The aim was to compare the viewpoints of students towards the future of mobile learning in
the education industry of India with respect to gender, and to gauge the general attitude of
students towards the future of mobile learning in the education industry of India. The results
indicated no significant difference in the students’ viewpoints with respect to gender. It was
also found that the majority of students showed positive attitudes towards M-learning
and that there was general agreement among the students who saw the bright side of
M-learning.
Konwar (2017) found that college students have positive attitudes towards e-learning
and that there is no significant difference in attitude towards e-learning between male and female
students on the basis of their locality.
CHAPTER III
RESEARCH METHODOLOGY
Based on the conceptual and theoretical framework, this chapter describes the
database, data collection, data pre-processing, modelling and the
statistical and data science tools used in the present study. The research framework used to
achieve each objective, together with the research techniques and approaches applied, is
discussed below.
OBJECTIVE 1: TO BUILD A MODEL TO PROTECT THE
STUDENTS FROM CYBERBULLYING AND INAPPROPRIATE
CONTENT
This is the most important objective of the present study, and catering to it gives an edge to
our platform. With the advancement of technology, cases of cyberbullying have been on
the rise. As mentioned earlier, cyberbullying causes significant emotional and psychological
distress among students; hence, students should be provided with a platform where
appropriate measures of profanity detection are in place. Profanity here means the use of any
language, act or content that can hurt the sentiments of the students.
To fulfil this objective, our platform uses Artificial Intelligence (AI). The profanity detection
task is divided into two sub-tasks: profanity detection on text and profanity detection on
videos.
FIGURE 3.1
PROFANITY DETECTION
Sub-tasks: Text, Videos
PROFANITY DETECTION ON TEXT
This task was taken up to detect the use of cuss words, or any other words, in the questions,
answers and comments posted by students that could hurt the feelings of other students or
have a negative impact on them, so that appropriate action can be taken. This action
consists of not showing the offending word or sentence on the screen and
warning the student. If the student ignores the warning and
continues posting such words, his or her coins are reduced and, finally, the student is banned
from using the account for a certain time. This is where toxic text classification comes to the rescue.
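The escalation described above (hide and warn, then reduce coins, then ban) can be sketched as a small state machine. This is only an illustration: the names `UserState` and `moderate`, the starting balance, the penalty of 10 coins and the ban after the third offence are all hypothetical, since the report does not state the actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    coins: int = 100       # hypothetical starting balance
    offences: int = 0      # number of toxic comments posted so far
    banned: bool = False


def moderate(user: UserState, is_toxic: bool,
             coin_penalty: int = 10, ban_after: int = 3) -> str:
    """Return the action taken for one comment (assumed policy, not the platform's)."""
    if user.banned:
        return "rejected"
    if not is_toxic:
        return "displayed"
    user.offences += 1
    if user.offences == 1:
        return "hidden+warned"             # first offence: hide and warn
    if user.offences < ban_after:
        user.coins = max(0, user.coins - coin_penalty)
        return "hidden+coins_reduced"      # repeated offence: deduct coins
    user.banned = True
    return "hidden+banned"                 # persistent offender: temporary ban
```

The `is_toxic` flag is assumed to come from the toxic text classifier described in this section.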
FIGURE 3.2
PROCESS
Comment
Toxic: will not be displayed on the screen; students using such words would be warned and, if the action persists, they can be banned from using their account
Non-toxic: will be displayed on the screen
● LANGUAGE USED
Python 3: Python is a general-purpose, high-level programming language that can be
used for developing desktop GUI applications, websites and web applications. It is open
source, which means it is free to use, even for commercial applications, and it is often considered a
scripting language. For our study, Python 3 has been used.
● TECHNIQUE USED
NLP: NATURAL LANGUAGE PROCESSING
The objective of profanity detection on text can be fulfilled only when the machine is
able to derive the meaning underlying the words as well as the sentiments of the students writing
them. Hence, the technique of NLP has been used.
Natural Language Processing or NLP is a field of Artificial Intelligence that gives the
machines the ability to read, understand and derive meaning from human languages. By
utilizing NLP, developers can organize and structure knowledge to perform tasks such as
automatic summarization, translation, named entity recognition, relationship
extraction, sentiment analysis, speech recognition, and topic segmentation.
Data generated from conversations, declarations or even tweets are examples of unstructured
data. Unstructured data does not fit neatly into the traditional row and column structure of
relational databases, and represents the vast majority of data available in the actual world. It is
messy and hard to manipulate. Nowadays it is no longer about trying to interpret a text or
speech based on its keywords (the old-fashioned mechanical way), but about understanding
the meaning behind those words (the cognitive way). This way it is possible to detect figures
of speech like irony, or even perform sentiment analysis (Lopez Diego, 2019).
The various use cases of NLP are: recognition and prediction of disease, sentiment analysis,
financial trading, identifying fake news, talent recruitment, etc.
In the present task NLP has been used to distinguish between positive and negative words
and sentences as well as the context in which they have been used. It also helps in finding out
the sentiments underlying those words and sentences.
TEXT CLASSIFICATION
Text classification (also known as text categorization or text tagging) is one of the
fundamental tasks in Natural Language Processing (NLP) with broad applications such as
sentiment analysis, topic labelling, spam detection, and intent detection. It is the process of
assigning tags or categories to text according to its content. Text classifiers can be used to
organize, structure, and categorize pretty much anything.
Toxic comment classification is a method of classifying a comment as toxic or
non-toxic depending upon its inherent features.
● LIBRARIES USED
FIGURE 3.3
LIBRARIES USED
NumPy: NumPy is a module for Python. The name is an acronym for "Numeric Python" or
"Numerical Python". It has been used for scientific computing and for performing different
operations. NumPy enriches the programming language Python with powerful data structures,
implementing multi-dimensional arrays and matrices.
Pandas: The “Python Data Analysis Library” is a fast, powerful, flexible and easy-to-use open
source data analysis and manipulation tool, built on top of the Python programming language.
It has been used to hold the data for machine learning in the form of data-frames and to import
data from various file formats such as csv, excel, etc.
Pickle: Python pickle module is used for serializing and de-serializing a Python object
structure. This process is also called marshalling or flattening. Objects are pickled so that
they can be saved on disk, and loaded in a program again later on.
Sklearn: The sklearn library contains a lot of efficient tools for machine learning and
statistical modelling including classification, regression, clustering and dimensionality
reduction. Therefore, sklearn has been used to build machine learning models.
NLTK: The Natural Language Toolkit (NLTK) is one of the most powerful NLP libraries;
it contains packages that help machines understand human language and reply to it with
an appropriate response. It is one of the most widely used and oldest NLP libraries. It
contains text processing libraries and hence has been used to perform tokenization, parsing,
classification, stemming, tagging and semantic reasoning. NLTK was preferred over spaCy
and TextBlob, which are other NLP libraries, since it showed better results.
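As a small illustration of the Pickle library described above, the sketch below serializes an object and restores it. The dictionary `model` is only a stand-in for a trained model; its contents are hypothetical.

```python
import pickle

# Stand-in for a trained model: any picklable Python object works the same way.
model = {"vocabulary": ["good", "bad"], "weights": [0.4, -1.2]}

blob = pickle.dumps(model)      # serialize to bytes (pickle.dump(obj, file) writes to disk)
restored = pickle.loads(blob)   # de-serialize back into an equal but separate object
```

Pickled data should only be loaded from trusted sources, because unpickling can execute arbitrary code.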
This task was catered to in five phases: data collection, data pre-processing, feature extraction,
modelling and evaluation.
FIGURE 3.4
PHASES
Data Collection → Data Pre-processing → Feature Extraction → Modelling → Evaluation
1. DATA COLLECTION:
Data collection is the first phase of the profanity task. Here, the data is in the form of comments
and has been collected from various sources like Google, Kaggle competitions, etc. It has
been collected manually and also through web scraping, and is stored in csv (comma separated
value) format. The total number of comments is 1,58,000. The data consists of both toxic and
non-toxic comments; at the beginning there were 16,000 toxic comments and
1,42,000 non-toxic comments. These proportions were later changed because of the data
imbalance. The toxic comments could
be placed under any one or more of the six labels: hate, insult, obscene, threat, severe toxic
and toxic.
FIGURE 3.5
DEPICTION OF LABELS AND COUNT OF COMMENTS
Source: Kaggle
We wanted to focus on detecting toxicity rather than identifying the sub-class of
toxicity. Therefore, we converted our problem into binary classification: all
comments which had at least 1 of the 6 tags were merged into the toxic class, and comments
which had no tags were added to the non-toxic class. After this, the non-toxic class had almost
1.5 lakh comments and the toxic class had 16 thousand comments.
In short, what was originally a multi-label classification problem was converted into a binary
classification problem by merging all 6 labels into 1, in an attempt to reduce data
imbalance. The two classes that emerge are therefore toxic and non-toxic.
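The merging of the six labels into one binary label can be sketched as follows. This is a minimal plain-Python illustration; the column names in `LABELS` and the sample rows are assumptions, and in practice the same rule would be applied to the data-frame holding the comments.

```python
# Assumed column names for the six labels; the real csv header may differ.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "hate"]


def to_binary(row: dict) -> int:
    """Return 1 (toxic) if any of the six labels is set, else 0 (non-toxic)."""
    return int(any(row.get(label, 0) == 1 for label in LABELS))


# Hypothetical rows standing in for records read from the csv file.
rows = [
    {"comment": "thanks for the clear answer", "toxic": 0, "insult": 0},
    {"comment": "<an insulting comment>", "toxic": 1, "insult": 1},
    {"comment": "<a threatening comment>", "threat": 1},
]
binary = [to_binary(r) for r in rows]
```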
FIGURE 3.6
DEPICTION OF DATA
Source: Kaggle
2. DATA PRE-PROCESSING:
In this phase the data is cleaned so that the actual target is more easily achieved. Data
pre-processing is a data mining technique used to transform raw data into a useful and
efficient format. Raw (real-world) data is often incomplete or inconsistent, and sending it
through a model as it is would cause errors. That is why we need to pre-process the data
before sending it through a model.
In any Machine Learning process, Data Pre-processing is that step in which the data gets
transformed, or Encoded, to bring it to such a state that now the machine can easily parse it.
In other words, the features of the data can now be easily interpreted by the algorithm.
Data goes through a series of steps during pre-processing:
1. Data Cleaning: Data is cleansed through processes such as filling in missing values,
smoothing the noisy data, or resolving the inconsistencies in the data.
2. Data Integration: Data with different representations are put together and conflicts
within the data are resolved.
3. Data Transformation: Data is normalized, aggregated and generalized.
4. Data Reduction: This step aims to present a reduced representation of the data in a
data warehouse.
5. Data Discretization: Involves reducing the number of values of a continuous
attribute by dividing the range of the attribute into intervals.
Data pre-processing varies according to the data at our disposal and need of the study. And
so, for pre-processing of the comments data that we had, we only had to perform the
following:
● Tokenization: Tokenization is the process of breaking up a string into
pieces such as words, keywords, phrases, symbols and other elements called tokens.
Certain characters, such as punctuation, may be thrown away at the same time or in a later step.
Input = "Hello everyone. Welcome to Earth."
Output= ['Hello', 'everyone', '.', 'Welcome', 'to', 'Earth', '.']
● Stop words removal: A stop word is a commonly used word that a search engine has
been programmed to ignore. Hence, stop words are filtered out before or after processing of
natural language data. “The”, “a”, “an”, “in”, etc. are examples of stop words. They
are removed from the given text so that more focus can be given to those
words which define the meaning of the text.
● Lemmatization: It is the process of grouping together the different inflected forms of a
word so they can be analysed as a single item. Stemming and lemmatization are the
two processes used for this in natural language processing. Stemming reduces
words to a compact form called the word stem, typically by stripping suffixes. Lemmatization
pursues the same goal but uses a vocabulary and morphological analysis to return the
dictionary form (lemma) of each word.
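The three steps above can be sketched end to end as follows. This is a simplified stand-in, not the NLTK pipeline used in the study: the regular-expression tokenizer, the tiny `STOP_WORDS` set and the `LEMMAS` lookup table are illustrative assumptions that replace NLTK's tokenizer, stop-word corpus and lemmatizer so that the example runs without any downloads.

```python
import re
from typing import List

# Tiny illustrative stop-word list; NLTK's stop-word corpus is far larger.
STOP_WORDS = {"the", "a", "an", "in", "is", "are", "to", "of", "and"}

# Toy lemma lookup standing in for a real lemmatizer.
LEMMAS = {"answers": "answer", "questions": "question", "students": "student",
          "asked": "ask", "asking": "ask"}


def preprocess(text: str) -> List[str]:
    # Tokenize: lowercase the text and keep word-like pieces, dropping punctuation.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatize: map each token to its dictionary form when the lookup knows it.
    return [LEMMAS.get(t, t) for t in tokens]


tokens = preprocess("The students asked questions in the forum.")
# tokens == ['student', 'ask', 'question', 'forum']
```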
3. FEATURE EXTRACTION:
When the input data to an algorithm is too large to be processed and is suspected to be
redundant, it can be transformed into a reduced set of features (also named a feature
vector). Feature extraction is a process of dimensionality reduction by which an initial set of
raw data is reduced to more manageable groups for processing. This new reduced set of
features should then be able to summarize most of the information contained in the original
set of features. For the purpose of this study, the TF-IDF technique of feature
extraction has been used.
TF-IDF: It stands for “Term Frequency – Inverse Document Frequency”. This is a
technique to quantify a word in documents by giving each word a weight which signifies
its importance in the document and corpus. In other words, it is a very common
algorithm to transform text into a meaningful numerical representation. It is used because
a machine learning model can work with data only in numerical form; therefore, we
vectorize all of the text. TF-IDF is widely used in Information Retrieval and Text Mining
to extract features across various NLP applications.
TF-IDF = Term Frequency (TF) * Inverse Document Frequency (IDF)
TF measures the frequency of a word in a document.
DF measures the importance of a term across the whole corpus and is very similar to TF.
The only difference is that TF is the frequency counter for a term t in document d, whereas DF is
the number of documents, out of the document set N, in which term t is present.
IDF is the inverse of DF, usually taken on a log scale as IDF(t) = log(N / DF(t)), so that words
which appear in almost every document receive a low weight.
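A minimal sketch of the TF-IDF calculation, assuming the common textbook form TF = count / document length and IDF = log(N / DF). The function name `tf_idf` and the three toy documents are illustrative; library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalisation, so their numbers differ from this plain version.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # DF: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["you", "are", "stupid"], ["you", "are", "kind"], ["thank", "you"]]
w = tf_idf(docs)
```

In this toy corpus "you" appears in every document, so its weight is zero, while "stupid" appears in only one document and gets the highest weight in the first comment.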
4. MODELING:
Data modeling is an essential part of the data science pipeline. It is the one that often receives
the most attention among data science learners. A big part of data science modeling involves
evaluating a model, that is, making sure that it is robust and therefore reliable. Also, it is
closely linked to creating an information rich feature set. Moreover, it entails a variety of
other processes that ensure that the data at hand is harnessed as much as possible.
Classification is a data mining technique that assigns categories to a collection of data in
order to aid in more accurate predictions and analysis. It is, in some ways, the simplest of the
several types of predictive analytics models. It puts data in categories based on what it learns from
historical data. Classification, for which decision trees are one well-known method, is one of several
methods intended to make the analysis of very large datasets effective. Two major
classification techniques that stand out are Logistic Regression and Discriminant Analysis.
Logistic Regression is the appropriate regression analysis to be conducted when the
dependent variable is binary. It is a predictive analysis, like all regression analyses. It is used
to describe data and to explain the relationship between one dependent binary variable and
one or more nominal, ordinal, interval or ratio-level independent variables.
Logistic regression sounds similar to linear regression but is actually focused on problems
involving categorization instead of quantitative forecasting. In other words, the goal of
logistic regression is to decide whether an instance of an input variable fits within a
category or not. The output of logistic regression is a probability between 0 and 1, which is
converted into a class label by applying a threshold (commonly 0.5). Hence, the
predicted classes are discrete and finite, rather than continuous with infinitely many possible
values as in the case of linear regression.
Results closer to 1 indicate that the input variable more clearly fits within the category.
Results closer to 0 indicate that the input variable likely does not fit within the category.
Therefore, logistic regression is often used to answer clearly defined yes or no questions, for
example, whether a customer will buy again. To meet our objective, we have used Logistic
Regression to categorize whether a comment falls within the toxic or the non-toxic
category.
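The idea can be illustrated with a very small logistic regression trained by gradient descent. This is a sketch for intuition only, not the model built in the study (which used sklearn on TF-IDF features): the single input feature, the toy data, the learning rate and the number of epochs are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(x, w, b, threshold=0.5):
    """Turn the probability into a class label: 1 = toxic, 0 = non-toxic."""
    return int(sigmoid(w * x + b) >= threshold)


# Hypothetical one-dimensional feature per comment; label 1 = toxic.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
```

Calling `predict(3.0, w, b)` then labels a high-scoring comment as toxic, while `predict(0.0, w, b)` labels a low-scoring one as non-toxic.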
5. EVALUATION:
Once the data has been cleaned, pre-processed and wrangled, we feed it to the
model and get the output as probabilities. We then measure the effectiveness of our
model, because the more effective the model, the better its performance. This is where
the confusion matrix comes into the limelight. A confusion matrix is a
performance measurement for machine learning classification.
A confusion matrix is a table that is often used to describe the performance or accuracy of a
classification model (or “classifier”) on a set of test data for which the true values are known.
It is also known as an error matrix.
It allows the visualization of the performance of an algorithm and easy identification of
confusion between classes, e.g. one class being commonly mis-labelled as the other.
TABLE 3.1
CONFUSION MATRIX
                  Class 1 Predicted     Class 2 Predicted
Class 1 Actual    TP                    FN (Type 2 error)
Class 2 Actual    FP (Type 1 error)     TN
Here, Class 1: Positive and Class 2: Negative
TABLE 3.2
DEFINITION OF TERMS
Term                   Definition
Positive (P)           Observation is positive (for example: is an app)
Negative (N)           Observation is not positive (for example: is not an app)
True Positive (TP)     Observation is positive, and is predicted to be positive
False Negative (FN)    Observation is positive, but is predicted negative
True Negative (TN)     Observation is negative, and is predicted to be negative
False Positive (FP)    Observation is negative, but is predicted positive
Classification Rate or Accuracy: Classification Rate or Accuracy is given by the relation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
However, there are problems with accuracy. It assumes equal costs for both kinds of errors. A
99% accuracy can be excellent, good, mediocre, poor or terrible depending upon the problem.
Recall: Recall can be defined as the ratio of the number of correctly classified positive
examples to the total number of positive examples. In other words, out of all the
actual positives, how many we predicted correctly. It should be as high as possible.
Recall = TP / (TP + FN)
High Recall indicates the class is correctly recognized (a small number of FN).
Precision: To get the value of precision we divide the total number of correctly classified
positive examples by the total number of predicted positive examples. In other words, out of
all the examples we have predicted as positive, how many are actually positive.
Precision = TP / (TP + FP)
High Precision indicates an example labelled as positive is indeed positive (a small number of
FP).
High recall, low precision: This means that most of the positive examples are correctly
recognized (low FN) but there are a lot of false positives.
Low recall, high precision: This shows that we miss a lot of positive examples (high FN) but
those we predict as positive are indeed positive (low FP).
F-measure: Since we have two measures (Precision and Recall) it helps to have a
measurement that represents both of them. We calculate an F-measure using the Harmonic Mean
in place of the Arithmetic Mean as it punishes the extreme values more.
F-measure = (2 * Recall * Precision) / (Recall + Precision)
The F-measure will always be nearer to the smaller value of Precision or Recall.
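The four measures can be computed directly from the cells of the confusion matrix, as in the sketch below. The function name `classification_metrics` and the counts passed to it are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure


# Hypothetical counts for illustration only.
accuracy, precision, recall, f_measure = classification_metrics(
    tp=80, fn=20, fp=10, tn=890)
```

With these counts the accuracy is 0.97 even though one in five toxic comments is missed (recall 0.8), which illustrates why accuracy alone can be misleading on imbalanced data.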
PROFANITY DETECTION ON VIDEOS
To give an edge to our platform, the profanity detection task is performed not only on text but
also on the video content. Since the answers on EduCollab are mostly in the form of screen
recordings, there is a high chance of inappropriate and profane content appearing on the platform;
hence the profanity detection task becomes all the more important. The task is a big one and
requires a lot of research and study, since little work has been done in this field so far.
To work on this, the task was subdivided into various sub-tasks, the first sub-task being
profanity detection on images. During my internship, we were able to work
only on this sub-task.
For profanity detection on images, various image layers were framed to be tackled one by one.
FIGURE 3.7
Image:
● Nudity
● Obscene and offensive hand gestures and body language
● Wine, beer, liquor, spirits, and other alcoholic beverages
● Offensive symbols like Nazi swastikas and ISIS flags
● Weapons like guns, rifles, knives, swords, missiles, and more
● Explicit images of surgery, diseases, or body parts such as graphic photographs of open w
● Explicit Patterns
● Racism/Meme
● Blurriness
● Drugs
Text/Embedded text:
● Sweary abbreviations like OMFG, WTF, STFU, etc.
● Personal information like Email, Phone & URL filter
● Text that has been artificially added to images (such as %&$#@)
● LANGUAGE USED
PYTHON 3
● TECHNIQUE USED
CNN (Convolutional Neural Networks): Convolutional Neural Networks
(ConvNets or CNNs) are a type of neural network used in areas such as image
recognition and classification. ConvNets have been successful in identifying faces, objects,
traffic signs, etc. A CNN has one or more convolutional layers and is used mainly for image
processing, classification and segmentation, and also for other auto-correlated data.
There are four main operations in the ConvNet:
1. Convolution
2. Non-Linearity (ReLU)
3. Pooling or Sub Sampling
4. Classification (Fully Connected Layer)
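The first three operations can be illustrated on a tiny image with plain Python lists. This is a toy sketch for intuition, not the Keras model used in the study: the 5x5 image and the 2x2 edge filter are made up, real filters are learned during training, and the fourth operation (the fully connected classification layer) is left out.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (computed as cross-correlation, as deep learning libraries do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def relu(fmap):
    """Non-linearity: replace negative values with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Sub-sampling: keep the largest value in each size x size block."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# Toy image with a vertical edge between the dark (0) and bright (1) columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]   # responds to dark-to-bright vertical edges

features = max_pool(relu(conv2d(image, kernel)))
# features == [[2, 0], [2, 0]]: the edge is detected in the left half of the image
```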
● LIBRARIES USED
FIGURE 3.8
LIBRARIES USED
Os: It is a library in Python that provides functions for interacting with the operating system. It
comes under Python’s standard utility modules. This module provides a portable way of
using operating system dependent functionality. In other words, it allows you to interface
with the underlying operating system that Python is running on – be that Windows, Mac or
Linux.
Matplotlib: Matplotlib is a plotting library used for creating static, animated, and interactive
visualizations in Python. It is an amazing visualization library in Python for 2D plots of
arrays. One of the greatest benefits of visualization is that it allows us visual access to huge
amounts of data in easily digestible visuals. Matplotlib consists of several plots like line, bar,
scatter, histogram etc.
Cv2: It is the Python module of OpenCV, a highly optimized library with a focus on real-time
applications. It is designed to
solve computer vision problems. It is basically an image and video processing library.
Keras: It is an open-source neural-network library written in Python. It is high-level in nature,
which makes it extremely simple and intuitive to use. It works as a wrapper to low-level
libraries like TensorFlow or Theano. Keras doesn't handle low-level computation. Instead, it
uses another library to do it, called the "Backend". It was developed to make implementing
deep learning models as fast and easy as possible for research and development.
Pandas: Please refer to the libraries described under profanity detection on text.
Sklearn: Please refer to the libraries described under profanity detection on text.
This task was performed in three phases: data collection, data pre-processing and modelling.
FIGURE 3.9
PHASES
Data Collection → Data Pre-processing → Modelling
1. DATA COLLECTION: Data was collected in the form of labelled images as per the
image layer selected. For each layer, a sample of 450 images of each category was taken.
The dataset consisted of safe-for-work images as well as not-safe-for-work images. The
images were collected manually from Google.
2. DATA PRE-PROCESSING: For pre-processing of the image data, image re-sizing was
done to convert the images into suitable and identical sizes.
3. MODELLING: The ResNet50 CNN model has been used for this task. ResNet, short for
Residual Network, is a classic neural network used as a backbone for many computer vision
tasks.
ResNet-50 is a deep residual network. The “50” refers to the number of layers it has. It is a
type of convolutional neural network, and ResNet is most popularly used for image
classification.
OBJECTIVE 2: TO CREATE A RECOMMENDER SYSTEM THAT
PROVIDES THE STUDENTS WITH THE QUESTIONS AND
ANSWERS OF THEIR PREFERENCE
Analytics and Artificial Intelligence (AI) are being integrated with digital channels to provide
instant recommendations on products that will best suit the customer. Many organizations
have already taken the lead and set awe-inspiring benchmarks, like recommendations
provided by Netflix and Amazon based on the customer’s viewing and purchase/browsing
history.
Therefore, the objective of creating a recommender system was taken up so as to provide the
students with the learning material, questions and answers that best suit their needs, and also
to direct the questions asked to those students most likely to give the best and most appropriate
answers. Recognising that retention (defined here as a visit after the first visit) is a
huge issue for apps, there is a need to make an impact on the first visit itself. But unlike other
applications, no prior information is available or gathered about the user. All the
recommendations are based on the actions of the user while using the app.
There are three types of Recommender Systems:
1. Content-based Filtering – the user's consumption is tracked and items similar to
those consumed in the past are recommended.
2. Collaborative Filtering – the consumption of all users is tracked and the items
consumed by users who have a similar consumption pattern to the user of interest are
recommended. There is no need for item descriptions in this case.
3. Hybrid Approach – it combines both approaches to address their shortcomings.
To meet our objective, we used the hybrid model.
FIGURE 3.10
WORKING OF RECOMMENDER SYSTEM
Components: User, Screen interface, Recommender system, Item set, User’s item consumption history
BUILD STRATEGY
How would the recommender bot make recommendations so as to:
● Reduce bias in data collection (Example Bias: topics that get served often and ranked
higher, have a higher likelihood of being consumed (obtaining a clickthrough))
● Learn as much as possible about the users on their first visit
● Maximise coverage of the topic corpus
● Exploit the current user profile but still explore new user interests
To fulfil this objective, the task was divided into two parts called Bots.
BOT 1: This bot selects topics to serve to a user. The inputs to the bot are the corpus of topics
and a user profile, if available. It provides a new user with randomly selected topics such that
his or her screen shows different topics, and the user's actions are then gauged.
BOT 2: Once the user starts consuming topics, (s)he leaves behind a clickstream. This bot
extracts user interests from such data, which can then be used for further personalisation of
his or her news feed.
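The division of work between the two bots can be sketched as below. This is a hypothetical simplification, not the platform's implementation: the function names `bot1_select` and `bot2_profile`, the click-count profile, and the rule of reserving one slot for exploration are all assumptions made for illustration.

```python
import random
from collections import Counter


def bot1_select(topics, profile=None, k=3, explore=1, rng=None):
    """Pick k topics: random for a new user, otherwise the user's top
    interests plus `explore` randomly chosen other topics."""
    rng = rng or random.Random()
    if not profile:
        return rng.sample(topics, k)                 # new user: pure exploration
    ranked = [t for t, _ in profile.most_common() if t in topics]
    chosen = ranked[:k - explore]                    # exploit known interests
    others = [t for t in topics if t not in chosen]
    return chosen + rng.sample(others, k - len(chosen))   # still explore


def bot2_profile(clickstream):
    """Count clicks per topic as a simple interest profile."""
    return Counter(clickstream)


topics = ["algebra", "geometry", "physics", "chemistry", "biology"]
first_feed = bot1_select(topics, rng=random.Random(0))
profile = bot2_profile(["physics", "physics", "algebra"])
next_feed = bot1_select(topics, profile, rng=random.Random(0))
```

Reserving a slot for a random topic is one simple way to keep exploring new interests and to widen coverage of the topic corpus while still exploiting the current profile.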
● LANGUAGE USED
PYTHON 3
● TECHNIQUE USED
Topic Modelling:
Topic modelling is an unsupervised machine learning technique that is capable of scanning a
set of documents, detecting word and phrase patterns within them, and automatically
clustering word groups and similar expressions that best characterize the set of documents.
Topic models are a type of statistical language model used for uncovering hidden structure in a
collection of texts. Topic modelling can be thought of as a task of:
Dimensionality Reduction
Unsupervised Learning
Tagging
Topic models are very useful for document clustering, organizing large
blocks of textual data, information retrieval from unstructured text and feature selection. For
example, the New York Times uses topic models to boost its user-article recommendation engines.
There are several existing algorithms that can be used to perform topic modelling. The
most common of them are Latent Semantic Analysis (LSA/LSI), Probabilistic Latent
Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA).
● LIBRARIES USED
FIGURE 3.11
LIBRARIES USED
Seaborn: It is a library for making statistical graphics in Python. Built on top of matplotlib
and closely integrated with pandas data structures, it aims to make visualization a central part
of exploring and understanding data. Seaborn provides a high-level interface for drawing
attractive and informative statistical graphics.
String: It’s a built-in module and we have to import it before using any of the constants and
classes. The string module contains a number of useful constants and classes, as well as some
deprecated legacy functions that are also available as methods on strings.
Gensim: It is an open source Python library for topic modelling, document
indexing and similarity retrieval with large corpora. Its target audience is the natural language
processing (NLP) and information retrieval (IR) community. Gensim is designed to handle
large text collections using data streaming and incremental online algorithms, which
differentiates it from most other machine learning software packages that target only
in-memory processing.
NLTK, Pandas: please refer to the libraries described under profanity detection on text.
Matplotlib: please refer to the libraries described under profanity detection on videos.
The three phases are explained as under:
FIGURE 3.12
PHASES
Data Collection → Data Pre-processing → Modelling
DATA COLLECTION: The data required for this task was in the form of questions and
answers. Therefore, a corpus of questions and answers was created to train and test the
model. The corpus was built using data from Google and also by manually adding entries. It
took around 2 weeks to collect the data.
DATA PRE-PROCESSING: Data pre-processing is absolutely crucial for generating a
useful topic model: as the saying goes, “garbage in, garbage out.” For the purpose of
cleaning this particular dataset we have used:
● Tokenization: converting a document to its atomic elements.
● Stop-word removal: removing words that carry little meaning.
● Lemmatization: merging words that are equivalent in meaning.
● Punctuation removal.
● Lowercasing: converting text into lowercase.
MODELLING: The LDA (Latent Dirichlet Allocation) model has been used in this particular
task to perform the topic modelling.
LDA is a generative probabilistic model that assumes each topic is a mixture over an
underlying set of words, and each document is a mixture over a set of topics.
In other words, it is a topic model that generates topics based on word frequency from a set of
documents. LDA is particularly useful for finding reasonably accurate mixtures of topics
within a given document set.
PARAMETERS OF LDA
The alpha parameter is the Dirichlet prior concentration parameter that represents document-topic
density: with a higher alpha, documents are assumed to be made up of more topics, which
gives a more even topic distribution per document, whereas a lower alpha concentrates each
document on fewer topics.
The beta parameter is the corresponding prior concentration parameter that represents topic-word
density: with a high beta, topics are assumed to be made up of most of the words, which gives
a more even word distribution per topic, whereas a low beta concentrates each topic on fewer words.
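The effect of the concentration parameter can be seen by sampling topic mixtures from a symmetric Dirichlet prior, as in the sketch below. It uses only the standard library and is not part of the LDA model fitted in the study; the helper names, the five topics, the two alpha values and the random seed are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one k-dimensional sample from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(alpha, k=5, n=500, seed=0):
    """Average weight of the dominant topic over n sampled documents."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n)) / n


low = mean_largest_share(alpha=0.1)    # low alpha: one topic tends to dominate a document
high = mean_largest_share(alpha=10.0)  # high alpha: topics share a document more evenly
```

Comparing `low` and `high` shows that a small alpha yields documents dominated by one topic, while a large alpha spreads each document across many topics; the same reasoning applies to beta and the words within a topic.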
OBJECTIVE 3: TO UNDERSTAND THE PERCEPTION OF THE
STUDENTS TOWARDS EDUCOLLAB AFTER ITS BETA
RELEASE
The beta launch of the platform took place on April 27, 2020 amid the pandemic situation.
This is the time when the schools are shut all over the world and the learning is taking place
digitally. Hence, the platform needed to be perfect to fulfil all the requirements of the
students. And the only way we could do this was by analysing the perception of the beta users
and making appropriate modifications.
Taking all these points into consideration, a study was conducted to understand the
perception of the students. To achieve this objective, the research technique and approach
used are mentioned below:
DATABASE: The present objective is based primarily on primary data collected from
103 respondents (Sabudh students and associates). The respondents were interviewed
through a non-disguised structured questionnaire in the English language.
UNIVERSE OF STUDY: The universe of study is all the school students of age group 12
and above.
SAMPLE AND SAMPLING DESIGN: The present study focuses on the Sabudh students.
The survey method was used for the collection of data, and the survey was conducted during
April to May, 2020. The convenience sampling technique has been used. A questionnaire
consisting of 42 questions was prepared and shared online with the respondents. Out of 120
questionnaires distributed, 103 usable responses were received and used for the analysis.
TABLE 3.3
SAMPLE SELECTION OF STUDENTS
Category               Students contacted for     Actual complete responses
                       achieving the sample       received and used
Sabudh Students        50                         40
Non-Sabudh Students    70                         63
Total                  120                        103
DATA COLLECTION: Both primary and secondary data were collected for the
completion of this objective. The questionnaire was shared with the respondents through
social media platforms such as WhatsApp, Instagram, etc.
RESEARCH INSTRUMENT (Questionnaire): For the purpose of data collection, a research
instrument, i.e. a questionnaire, was used. The questionnaire was prepared with the help of
insights from past research, as described in Chapter 2.
The first version of the questionnaire was pre-tested on 20 students. Slight modifications
were then made to the content of the questionnaire before the actual data collection
took place. The responses collected during the pre-testing of the questionnaire were not
considered for the analysis of the data. The description of the questionnaire is given in the
following paragraphs:
PART 1
This part of the questionnaire consisted of personal information about the respondents
regarding their name, gender, mobile model and make, android version, connectivity issues
faced by them, their proficiency in using the e-learning platforms, etc.
PART 2
This part of the questionnaire consisted of questions to gauge the perception of the students
about EduCollab. It consisted of questions like whether EduCollab is user friendly,
interactive, attractive, useful to the students, etc. The responses were based on 5-point Likert-
scale, where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree, 5 = strongly disagree.
METHODOLOGY OR STATISTICAL TECHNIQUES USED
After collecting the data using questionnaires, it was analysed by transferring it to SPSS
software. The first step in the analysis involved ensuring that all questions were answered and
rectifying any missing or wrong entries. This was done to ensure accuracy of the data and to
make it more appropriate for analysis.
Data collected from the survey was analysed using different statistical, mathematical and data
interpretation techniques. The analysis techniques used in this study are:
• Descriptive statistics: Descriptive statistics are quantitative summaries of data which
are used to present data in a short, simple and informative way. They
are used to describe the basic features of the data in the research. They involve measures
of central tendency (i.e. mean, mode and median) and measures of variability and position (i.e.
mean deviation, percentiles, quartiles, range, standard deviation and variance) of the
data. In this study, descriptive statistics are used to analyse the data collected from
questions related to the demographics of the respondents. This involves analysis in the form of
number, frequency and percentage distribution of the different categories of the
demographic variables and other nominal variables. The same frequency distribution
is depicted in simple graphics, i.e. pie charts and bar graphs, for visual analysis of the data.
• Reliability analysis: Reliability analysis is used to measure the reliability
of the data in order to obtain high-quality research results. Reliability means that a measure (in
this case, the questionnaire) should consistently reflect the construct that it is measuring.
In statistical terms, the usual way to look at reliability is based on the idea that
individual items (or sets of items) should produce results consistent with the overall
questionnaire. In this study, Cronbach’s Alpha, a reliability analysis test, is
conducted within SPSS to measure the internal consistency of the items in the
questionnaire. This test is most commonly used when the questionnaire is developed
using multiple Likert-scale statements. It is used to determine whether the scale is
reliable or not. Cronbach’s Alpha reliability coefficient normally ranges between 0
and 1. The closer the value of Cronbach’s Alpha is to 1, the greater the
internal consistency of the items in the scale. According to the general decision rule for the
reliability test, if the value of Cronbach’s Alpha is greater than 0.7, the
questionnaire items are deemed reliable. If the value of Cronbach’s Alpha is less than
0.7, the questionnaire items are considered unreliable.
George and Mallery (2003) provide the following rules of thumb to analyse reliability of
questionnaire with the help of Cronbach’s Alpha reliability coefficient value:
o Cronbach’s Alpha > 0.9 – Excellent
o Cronbach’s Alpha > 0.8 – Good
o Cronbach’s Alpha > 0.7 – Acceptable
o Cronbach’s Alpha > 0.6 – Questionable
o Cronbach’s Alpha > 0.5 – Poor
o Cronbach’s Alpha < 0.5 – Unacceptable
According to this rule, if the value of Cronbach’s Alpha is less than 0.5, the data is unreliable
and unacceptable for further analysis. If the value lies between 0.5 and 0.7, the analyst can
decide whether to accept the data or to modify it to make it more reliable. If the value is greater
than 0.7, the data is acceptable. A value of Cronbach’s Alpha greater than 0.8 indicates that
the data is good, and a value greater than 0.9 indicates that the data is excellent for
further analysis.
• Factor Analysis: In this research, factor analysis is used to analyse 21 statements related to
students’ perception towards EduCollab.
Factor analysis is a technique that is used to reduce a large number of variables into
a smaller number of factors. It extracts the maximum common variance from all variables and puts
it into a common score. As an index of all variables, this score can be used for further
analysis. In other words, factor analysis is a way to take a mass of data and shrink it to a
smaller data set that is more manageable and more understandable. It is a way to find hidden
patterns, show how those patterns overlap and show which characteristics are seen in multiple
patterns. It groups variables with similar characteristics together, i.e. it allocates
correlated statements to a single factor. The reduced factors can also be used for further
analysis.
Assumptions of Factor Analysis:
• No missing value or unengaged response.
• Metric data: Factor analysis is conducted on a metric scale. The term metric scale
covers interval scales, ratio scales and absolute scales. In this study, each
question is a statement followed by a Likert scale, which is treated as an interval scale.
• Sample size: The sample size should be adequate to run the factor analysis on the
data. The reliability of factor analysis depends on sample size because correlation
coefficients fluctuate much more in small samples than in large ones. Much has been
written about the necessary sample size for factor analysis, resulting in many ‘rules of
thumb’. The common rule is to suggest that a researcher has at least 10–15
participants per variable.
In SPSS, the KMO (Kaiser-Meyer-Olkin) statistic measures the sampling adequacy. The KMO
can be calculated for individual and multiple variables. It compares the squared
correlations between variables with the squared partial correlations between variables:
it is the sum of the squared correlations divided by the sum of the squared correlations
plus the sum of the squared partial correlations.
The KMO statistic varies between 0 and 1. It is generally agreed that factor
analysis is inappropriate when the value of KMO is below 0.50. Kaiser (1974) recommends
accepting values greater than 0.5 as barely acceptable (values below this should lead the
researcher either to collect more data or to reconsider which variables to include). Furthermore,
KMO values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values
between 0.8 and 0.9 are great and values above 0.9 are superb (Hutcheson & Sofroniou, 1999).
• Correlations between variables: SPSS will always find a factor solution to a set of
variables. However, the solution is unlikely to have any real meaning if the variables
analysed are not sensible. So before conducting a factor analysis, it is important to
check the inter-correlation between variables.
In factor analysis, Bartlett’s test is used to check that the original variables are sufficiently
correlated. It tests the null hypothesis that there is no correlation between the variables, i.e. that the
correlation matrix is an identity matrix. An identity matrix is a matrix in which all of the
diagonal elements are 1 and all off-diagonal elements are 0. The test should come out
significant, i.e. reject this null hypothesis: p < 0.05 means that there is correlation
between the variables and the correlation matrix is not an identity matrix. If the significance value is
not less than 0.05, then factor analysis will not be appropriate.
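The statistic behind Bartlett's test can be sketched as follows. This is an illustrative sketch and not the SPSS implementation; the function names and the toy 2 x 2 correlation matrix are assumptions. It uses the standard form of the statistic, chi-square = -[(n - 1) - (2p + 5)/6] ln|R|, with p(p - 1)/2 degrees of freedom, where R is the correlation matrix, p the number of variables and n the number of respondents. The p-value (not computed here) would come from the chi-square distribution.

```python
import math


def log_determinant(matrix):
    """Natural log of the determinant of a positive-definite matrix."""
    a = [row[:] for row in matrix]  # work on a copy
    size = len(a)
    log_det = 0.0
    for col in range(size):
        pivot = a[col][col]
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        log_det += math.log(pivot)
        # Eliminate the entries below the pivot (Gaussian elimination).
        for row in range(col + 1, size):
            factor = a[row][col] / pivot
            for j in range(col, size):
                a[row][j] -= factor * a[col][j]
    return log_det


def bartlett_sphericity(corr, n_respondents):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    statistic = -((n_respondents - 1) - (2 * p + 5) / 6) * log_determinant(corr)
    df = p * (p - 1) // 2
    return statistic, df


# Hypothetical example: two variables correlated at 0.5, 103 respondents.
toy_corr = [[1.0, 0.5], [0.5, 1.0]]
chi_square, df = bartlett_sphericity(toy_corr, n_respondents=103)
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

For an identity matrix the determinant is 1, so the statistic is 0 and the null hypothesis cannot be rejected. For 21 variables the degrees of freedom are 21 x 20 / 2 = 210.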
CHAPTER IV
ANALYSIS AND RESULT REPORTING
OBJECTIVE 1:
PROFANITY DETECTION ON TEXT
FIGURE 4.1
• Data has been collected in the form of comments (1.58 lakh comments). For the purpose
of our study, these comments are classified into two categories: toxic and non-toxic.
• The data has 16,000 toxic comments and 1.42 lakh non-toxic comments.
The data imbalance was ignored in the beginning.
• If a comment has one or more of the six tags (hate, insult, obscene, severe-toxic,
threat and toxic), it is categorised as toxic. If a comment has none of the six tags,
it is categorised as non-toxic.
• After importing the necessary libraries and reading the data, we split the data into
Train (80%) and Test (20%).
• The data is pre-processed so that the actual target is easily achieved. We
performed stop word removal, tokenization and lemmatization for this purpose.
For modelling:
• Using TF-IDF, we converted the text into vectors.
• Then Logistic Regression (Sklearn’s inbuilt model) was used.
• Then the predict function was defined. Using the predict function, we send the vectors
one by one and it tells whether each belongs to the toxic or the non-toxic category.
• To check the effectiveness of the model, we built a confusion matrix.
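The TF-IDF step can be illustrated with a small standard-library sketch. It is not the Sklearn code used in the study; the function name, the toy comments and the smoothed IDF formula, idf = ln((1 + N) / (1 + df)) + 1, are assumptions made for illustration. Each comment becomes a vector in which words that are frequent in the comment but rare across the corpus receive the highest weights; such vectors are what a classifier like Logistic Regression would then be trained on.

```python
import math
from collections import Counter


def tfidf_vectors(tokenised_docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    vocab = sorted({tok for doc in tokenised_docs for tok in doc})
    n_docs = len(tokenised_docs)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in tokenised_docs for tok in set(doc))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in tokenised_docs:
        counts = Counter(doc)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical, already tokenised comments.
docs = [["you", "are", "stupid"], ["you", "are", "kind"], ["thank", "you"]]
vocab, vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, [round(v, 2) for v in vec])
```

In the first toy comment, "stupid" appears in only one document and therefore outweighs "you", which appears in all three.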

The f1 score came out to be:
From the above confusion matrix, we can clearly see that a lot of Toxic comments
are being classified as Non-Toxic. One of the reasons for this could be that our
dataset has more than 1.4 lakh samples of Non-toxic Class and only 16K samples of
Toxic Class.
Therefore, we tried to improve our results by decreasing the samples of Non-Toxic
Class from 1.42 lakh to 20,000. These are selected randomly and the model is run
again. The result came out to be:
As expected, our predictions have improved.
Then we further decreased the sample size of Non-Toxic Class to 17,000 random comments. And ran
Our score has improved by 1% but chances of our model classifying a non-toxic comment as tox
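The random reduction of the non-toxic class described above is a form of random undersampling. A minimal sketch is given below; the function name, the (text, label) row format, the seed and the toy data are assumptions and do not reproduce the project's own code.

```python
import random


def undersample_majority(rows, majority_label, target_size, seed=42):
    """Keep all minority rows and a random subset of the majority rows."""
    majority = [row for row in rows if row[1] == majority_label]
    minority = [row for row in rows if row[1] != majority_label]
    rng = random.Random(seed)
    kept = rng.sample(majority, min(target_size, len(majority)))
    balanced = kept + minority
    rng.shuffle(balanced)  # avoid all majority rows coming first
    return balanced


# Hypothetical toy data: label 0 = non-toxic, label 1 = toxic.
rows = [(f"fine comment {i}", 0) for i in range(100)]
rows += [(f"toxic comment {i}", 1) for i in range(10)]
balanced = undersample_majority(rows, majority_label=0, target_size=15)
print(len(balanced), "rows after undersampling")
```

Only the majority class is reduced; every toxic comment is retained, so the classifier sees a less skewed class ratio during training.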
CALCULATIONS
In this study, toxic comments are taken as positive whereas the non-toxic comments are taken
as negative.
FIGURE 4.2
Confusion Matrix 1, Confusion Matrix 2 and Confusion Matrix 3
TABLE 4.1
                          Confusion Matrix 1     Confusion Matrix 2     Confusion Matrix 3
TN                        28558                  3838                   3179
FN                        150                    181                    211
FP                        1191                   516                    458
TP                        2016                   2710                   2797
Recall = TP/(TP+FN)       2016/2166 = 0.931      2710/2891 = 0.937      2797/3008 = 0.930
Precision = TP/(TP+FP)    2016/3207 = 0.629      2710/3226 = 0.840      2797/3255 = 0.859
F1 score = 2PR/(P+R)      0.750                  0.886                  0.893
Here P denotes precision and R denotes recall.
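The calculations in Table 4.1 can be reproduced with a few lines of code. The sketch below is illustrative only; the function and variable names are assumptions, while the TP, FP and FN counts are those reported in Table 4.1.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for the positive (toxic) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from Table 4.1.
matrices = {
    "Confusion Matrix 1": {"tp": 2016, "fp": 1191, "fn": 150},
    "Confusion Matrix 2": {"tp": 2710, "fp": 516, "fn": 181},
    "Confusion Matrix 3": {"tp": 2797, "fp": 458, "fn": 211},
}

results = {name: classification_metrics(**counts) for name, counts in matrices.items()}
for name, (precision, recall, f1) in results.items():
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Computing F1 from the unrounded precision and recall avoids the small errors that arise when the two-decimal values are multiplied by hand.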
RESULT: OBJECTIVE 1
FIGURE 4.3
We can clearly see that whenever a comment is put into the system, predict function
tells us whether the comment is toxic or not.
FUTURE SCOPE OF STUDY

Although the model performed better than the Naïve Bayes model, there are some other
advanced models that are yet to be explored, including LSVM, NB-SVM and LSTM.
DEPLOYMENT
RESPONSE
PROFANITY DETECTION ON VIDEOS
For building image profanity detection, we used Deep Convolutional Neural Networks. For
this purpose, we provided labelled profane images to our CNN model and trained the
model. After training, we got a test accuracy of 84 percent. The steps are as follows:
FIGURE 4.4
ed Images from classes such as Nude Images, Weapon Images, Images of Alcohol and other alcoholic beverages, etc and Im
• Libraries were imported and the data were read into the system.
• Pre-trained CNN model weights were collected, to be used for the transfer learning technique.
• The last layer of the pre-trained model being used (ResNet50) was removed.
• The training images were input to the selected model.
• The model was evaluated using a confusion matrix (i.e. accuracy, precision, recall).
The test accuracy came out to be 82.4%.
RESULT: Whenever an image is put into the system, it will identify whether the
image is safe for work or not.
FUTURE SCOPE OF STUDY: Other sub-tasks under profanity detection on videos, such as
profanity detection on speech, shall be taken up.
OBJECTIVE 2
FIGURE 4.5
• After importing the required libraries, the data was loaded into the system.
• Since the goal of this analysis was to perform topic modeling, we focused solely on the text
data from each paper and therefore dropped the other metadata columns. Then
tokenization, stop-word and punctuation removal, lemmatization and lower casing were
performed. A regular expression was used to remove punctuation, and the text was lowercased.
• Next, exploratory analysis was performed to verify whether the preprocessing happened
correctly. For this, a word cloud was made using the wordcloud package to get a
visual representation of the most common words. The word cloud is key to understanding
the data, ensuring that we are on the right track, and deciding whether any more
preprocessing is necessary before training the model.
• To prepare the data for LDA analysis, we transformed the textual data into a format that
serves as an input for training the LDA model. This was done by converting the text to
vector form using bag of words.
• After training the model, the results were analysed to visualize the topics for
interpretability. The accuracy came out to be 91%.
RESULT: Ten random topics were suggested to the new user. Hence, our BOT 1
was ready.
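The preprocessing and bag-of-words steps described above can be sketched with the Python standard library. This is a simplified illustration and not the project's code: the stop-word list, the function names and the two toy sentences are assumptions, and lemmatization is omitted because it needs an external library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to", "for"}


def preprocess(text):
    """Lowercase, strip punctuation with a regular expression, drop stop words."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(token_lists):
    """Return (vocabulary, document-term count matrix)."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for tok, count in Counter(tokens).items():
            row[index[tok]] = count
        matrix.append(row)
    return vocab, matrix


papers = [
    "Neural networks for image recognition.",
    "Topic models: inference in topic models!",
]
tokens = [preprocess(p) for p in papers]
vocab, matrix = bag_of_words(tokens)
print(vocab)
print(matrix)
```

Each row of the matrix counts how often every vocabulary word occurs in one document; a matrix of this kind is the input on which an LDA model is trained.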
FIGURE 4.6
EXAMPLE OF WORD CLOUD
Source: https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0
OBJECTIVE 3
GENERAL ANALYSIS OF DEMOGRAPHIC STATUS: In order to know about the
perception of students towards EduCollab, a sample of 103 respondents was taken. The
respondents are segmented by gender, model and make of their phones, proficiency in using
e-learning platforms, etc. This section presents the analysis of all these demographic
segments in the form of tables and figures.
 GENDER
TABLE 4.2
Gender Number of respondents Percentage (%)
Female 57 55
Male 46 45
Total 103 100
Interpretation: The table shows that out of 103 respondents, 57 are females and 46 are males.
FIGURE 4.7
GENDER (pie chart): Female 55%, Male 45%
Interpretation: The figure shows that 55% of the respondents are females and 45% are males.
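The frequency and percentage distributions reported in this section follow from simple counting. A minimal sketch is shown below, using the gender counts from Table 4.2; the function name and the way the responses are stored are assumptions.

```python
from collections import Counter


def frequency_table(values):
    """Map each category to (count, percentage rounded to a whole number)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total)) for cat, n in counts.items()}


# Gender counts from Table 4.2, expanded into one entry per respondent.
responses = ["Female"] * 57 + ["Male"] * 46
table = frequency_table(responses)
for category, (count, percent) in table.items():
    print(f"{category}: {count} respondents ({percent}%)")
```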
 MODEL AND MAKE OF MOBILE
TABLE 4.3
Model and make Number of responses
Samsung 14
Samsung Galaxy s9 11
Samsung M 20 9
Samsung J6 5
Samsung J6 Plus 2
Samsung A20 2
Samsung galaxy M31 1
Xiaomi Mi A1 12
Nokia 6.1+ 3
Redmi note 4 5
Redmi note 8 5
Redmi 6A , M1804C3CI 1
Real me xt 2
Real me 2 1
Mi A3 2
Google pixel 2 3
Vivo v5 6
Vivo v9 1
Lenovo 8K plus 1
Iphone 14
Iphone xs 2
Iphone 6 plus 1
FIGURE 4.8
Bar chart (Series 1): number of responses by model and make, vertical axis from 0 to 16
Interpretation: It can clearly be seen that the respondents use a large variety of mobile
models and makes. Samsung constitutes the major portion, followed by iPhone, Xiaomi,
Redmi, Vivo, etc.
 DO YOU FACE INTERNET CONNECTIVITY ISSUES?
TABLE 4.4
          Number of respondents   Percentage (%)
Yes       32                      31
No        71                      69
Total     103                     100
FIGURE 4.9
Pie chart: Yes 31%, No 69%
Interpretation: The table and the figure show that only 32 respondents out of 103 face
internet connectivity issues. These 32 respondents constitute 31% of the total. The
majority, 71 respondents constituting 69%, do not face internet connectivity issues.
This is probably because India is becoming technologically equipped.
 HAVE YOU USED ANY E-LEARNING PLATFORM BEFORE EDUCOLLAB?
          Number of respondents   Percentage (%)
Yes       78                      76
No        25                      24
Total     103                     100
Pie chart: Yes 76%, No 24%
Interpretation: The table and the figure above show that out of 103 respondents, the
majority, i.e. 78 respondents constituting 76%, have used e-learning platforms before
EduCollab. On the other hand, only 25 respondents, constituting 24%, have not used any
e-learning platform before EduCollab. This indicates that many people have already resorted to
e-learning.
• DO YOU THINK YOU ARE FULLY PROFICIENT IN USING THE E-LEARNING PLATFORMS?
          Number of respondents   Percentage (%)
Yes       84                      82
No        19                      18
Total     103                     100
Pie chart: Yes 82%, No 18%
Interpretation: The table and the figure show that out of 103, the majority of the
respondents, i.e. 84 respondents, think that they are proficient in using e-learning
platforms. They represent 82% of the total respondents. On the other hand, only 19
respondents out of 103 think they are not proficient in using e-learning platforms. They
represent only 18% of the total respondents. This indicates that most of the respondents are
confident in using e-learning platforms.
 DO YOU LIKE THE CONCEPT OF PEER TO PEER LEARNING?
          Number of respondents   Percentage (%)
Yes       82                      80
No        21                      20
Total     103                     100
Pie chart: Yes 80%, No 20%
Interpretation: The table and the figure show that the majority of the respondents, i.e. 82
out of 103, like the concept of peer to peer learning, while only 21 out of 103 respondents do
not like it. The Yes category is represented by 80% and the No
category by 20%, which indicates that most of the respondents like the concept of
peer to peer learning.
 HAVE YOU EVER ENCOUNTERED CYBERBULLYING?
          Number of respondents   Percentage (%)
Yes       25                      24
No        78                      76
Total     103                     100
Pie chart: Yes 24%, No 76%
Interpretation: From the table and the figure we can see that 25 respondents out
of 103 have encountered cyber-bullying. Although the cases are few in number, they do
exist. 78 out of 103 have not encountered cyber-bullying. They constitute 76% of the total. This
indicates that cases of cyber-bullying do exist.
 ARE YOU SATISFIED WITH THE PLATFORM?
          Number of respondents   Percentage (%)
Yes       90                      87
No        13                      13
Total     103                     100
Pie chart: Yes 87%, No 13%
Interpretation: The table and the figure show that the majority of the respondents, i.e. 90
out of 103, are satisfied with the platform, while only 13 out of 103 respondents are not. The
Yes category is represented by 87% and the No category by 13%. This indicates that we are
successful in fulfilling our objective.
 DID YOU ENCOUNTER ANY LOGIN PROBLEM?
          Number of respondents   Percentage (%)
Yes       22                      21
No        81                      79
Total     103                     100
Pie chart: Yes 21%, No 79%
Interpretation: The table and the figure show that out of 103 respondents, the majority,
i.e. 81, did not encounter any login problem. Only 22 respondents did, indicating that only
a few respondents faced problems while logging in.
 DID YOU ENCOUNTER ANY BUG?
          Number of respondents   Percentage (%)
Yes       34                      33
No        69                      67
Total     103                     100
Pie chart: Yes 33%, No 67%
Interpretation: The table and the figure show that out of 103 respondents, the majority, i.e. 69,
did not encounter any bug. Only 34 respondents did. Those who did face bugs
probably faced them because it was the first release of the app.
 IF YES, DID YOU INFORM THE TEAM?
          Number of respondents   Percentage (%)
Yes       69                      67
No        34                      33
Total     103                     100
Pie chart: Yes 67%, No 33%
Interpretation: From the above it can be seen that out of 103 respondents, the majority,
constituting 67%, informed the team about the bug. Only 33% of the respondents did not.
• IF YES, WAS THE TEAM EFFICIENT IN RESOLVING THE ISSUES?
          Number of respondents   Percentage (%)
Yes       90                      87
No        13                      13
Total     103                     100
Pie chart: Yes 87%, No 13%
Interpretation: It can clearly be seen that most of the respondents, i.e. 87%, think that the
team was efficient in resolving the issues and only 13% think it was not. This
indicates that the team has been efficient in resolving the issues.
 DID THE EDUCOLLAB TEAM WORK UP TO YOUR EXPECTATIONS?
          Number of respondents   Percentage (%)
Yes       97                      94
No        6                       6
Total     103                     100
Pie chart: Yes 94%, No 6%
Interpretation: The table and the figure show that out of 103 respondents, almost all,
i.e. 97, think that the team worked up to their expectations and only 6 respondents think
that it did not, indicating that the team has done a good job.
 WILL YOU KEEP USING THE PLATFORM?

          Number of respondents   Percentage (%)
Yes       100                     97
No        3                       3
Total     103                     100
Pie chart: Yes 97%, No 3%
Interpretation: It can be seen that out of 103, all the respondents except 3 think that they
will keep using the platform.
• DO YOU THINK EDUCOLLAB IS BETTER THAN OTHER E-LEARNING PLATFORMS?
          Number of respondents   Percentage (%)
Yes       83                      81
No        20                      19
Total     103                     100
Pie chart: Yes 81%, No 19%
Interpretation: The table and the figure show that out of 103 respondents, the majority,
i.e. 83, think that EduCollab is better than other e-learning platforms. Only 20 respondents
do not think so. These 20 respondents might have a liking for other e-learning platforms or
might have been using those for years.
 PLEASE STATE YOUR REASON FOR THE ABOVE
The majority of the students stated the following points in favour of EduCollab:
• It is better because it provides interactive learning sessions.
• Screen recording for answers is a good feature.
• Because of the peer to peer concept.
• The courses are easily accessible.
• The platform is user-friendly.
• It provides a safe platform.
• It is interactive and has more meaningful content.
• No advertisements.
• Efficient coin system.
• It has a better doubt solving mechanism.
The points most commonly stated against EduCollab were:
• It lacks many under-the-hood features.
• It needs improvement to gain that status in the market.
• It needs to add more content.
• WOULD YOU ENCOURAGE YOUR FRIENDS/RELATIVES TO USE THIS PLATFORM?
          Number of respondents   Percentage (%)
Yes       93                      90
No        10                      10
Total     103                     100
Pie chart: Yes 90%, No 10%
Interpretation: The table and the figure show that out of 103 respondents, 90% would
encourage their friends/relatives to use this platform, while the remaining 10% would not.
• WOULD YOU PREFER BOOK LEARNING TO LEARNING FROM THIS PLATFORM?
          Number of respondents   Percentage (%)
Yes       59                      57
No        44                      43
Total     103                     100
Pie chart: Yes 57%, No 43%
Interpretation: We can see from the above that 57% of the respondents would prefer book
learning to learning from this platform, while the remaining 43% would prefer learning
from this platform to book learning. Although the difference is not large, people still
prefer book learning to digital learning.
 YOUR RECOMMENDATION TO IMPROVE THE PLATFORM
Given below are some of the recommendations of the students as a future plan of action for
EduCollab:
• Launch an iOS version of the app.
• Allow answers in the form of text.
• Allow users to answer the questions on desktop.
• Make the user experience more interactive via some sort of notification when a user's
question is answered, an answer is accepted, there is a comment on the user's answer,
etc.
• Integrate a feature for redeeming coins.
• Introduce a dark mode.
• Add more content.
• Make it easier to operate.
• The colour scheme can be improved.
• Courses should be segregated nicely.
RELIABILITY ANALYSIS
In this research, Cronbach’s alpha test is used for the reliability analysis of the questionnaire, i.e. to
assess the internal consistency of a questionnaire (or survey) that contains multiple Likert-
type scales and items. The total number of questions or items in the questionnaire that
are measured on a Likert-scale is 21. All these items are responded to on a 5-point Likert-scale,
where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree and 5 = strongly disagree. To
check reliability, Cronbach's Alpha is run in SPSS on the 21 statements or items collectively. The
output of the analysis is discussed as follows:
TABLE 4.19
RELIABILITY STATISTICS
Cronbach's Alpha Number of Items
.941 21
Interpretation: Table 4.19, labelled Reliability Statistics, provides the value of
Cronbach’s alpha coefficient. The value of Cronbach’s alpha varies between zero and one.
The closer the value is to one, the greater the internal consistency of the items for the specific
sample. N of items means the number of items (statements) that are tested. In this research,
21 items are tested for reliability.
In this study, all 21 items are scaled in the same way on a 5-point Likert-scale. So, the value of simple
Cronbach's Alpha is reported as the final result of the reliability analysis.
According to the reliability statistics, Cronbach’s alpha coefficient (α) = .941, which
shows that the questionnaire is very highly reliable. In other words, Cronbach’s alpha is greater
than 0.9 (i.e. 0.941), which indicates excellent reliability. It also indicates a high level of
internal consistency of the scale across all items in the questionnaire.
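For reference, Cronbach's alpha is computed as α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below illustrates this formula together with the George and Mallery rules of thumb. It is not the SPSS procedure used in the study; the function names and the small response matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one row per respondent, each row a list of item scores."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def george_mallery_label(alpha):
    """Rule-of-thumb label for a Cronbach's alpha value."""
    if alpha > 0.9:
        return "Excellent"
    if alpha > 0.8:
        return "Good"
    if alpha > 0.7:
        return "Acceptable"
    if alpha > 0.6:
        return "Questionable"
    if alpha > 0.5:
        return "Poor"
    return "Unacceptable"


# Hypothetical answers of five respondents to three Likert items.
toy_responses = [[1, 2, 1], [2, 2, 2], [3, 3, 4], [4, 5, 4], [5, 4, 5]]
alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.3f} ({george_mallery_label(alpha)})")
```

Applied to the value reported in Table 4.19, `george_mallery_label(0.941)` returns "Excellent".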
FACTOR ANALYSIS
In this research work, the statements relating to the perception of students towards
EduCollab after its beta release are analysed with the help of factor analysis in SPSS.
Factor analysis is used to transform a set of variables (21 statements) into a new set of
composite variables (factors) that are not correlated with each other.
Initially, there are 21 statements relating to students’ perception towards EduCollab, on
which factor analysis is run. All statements are coded on a 5-point Likert-scale where
1 = Strongly agree, 2 = Agree, 3 = Neutral, 4 = Disagree and 5 = Strongly disagree. There is no
missing value in the data. The data does not have any unengaged response by any respondent.
An unengaged response is a response in which the respondent selects the same option for all
statements, indicating that the respondent filled in the questionnaire without reading the
statements. Therefore, the data set is adequate to run factor analysis.
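The check for unengaged responses can be sketched as follows. The function name and the toy responses are assumptions; the idea is simply that a respondent who chose the same option for every statement has only one distinct value in their row.

```python
def find_unengaged(responses):
    """Return the indices of respondents who chose one option for every statement."""
    return [i for i, row in enumerate(responses) if len(set(row)) == 1]


# Hypothetical answers of four respondents to five Likert statements.
toy_responses = [
    [1, 2, 2, 1, 3],
    [3, 3, 3, 3, 3],  # same option throughout: unengaged
    [2, 1, 2, 4, 2],
    [5, 5, 5, 5, 5],  # same option throughout: unengaged
]
print(find_unengaged(toy_responses))
```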
The final result of the factor analysis on the perception of students is summarized as follows with
the help of the various output tables of the analysis:
TABLE 4.20
KMO AND BARTLETT'S TEST
Kaiser-Meyer-Olkin Measure of Sampling Adequacy           .908
Bartlett's Test of Sphericity      Approx. Chi-Square     1576.943
                                   df                     210
                                   Sig.                   .000
In the analysis, the Kaiser-Meyer-Olkin test is used to check the assumption of adequate
sample size and Bartlett’s Test of Sphericity is used to check the correlation between
variables. According to the output of the KMO and Bartlett's Test (Table 4.20),
the Kaiser-Meyer-Olkin value is 0.908, which is more than 0.5, meaning that the sample size is
adequate for factor analysis. According to the recommendation of Hutcheson & Sofroniou (1999),
a KMO value above 0.9 means that the sampling adequacy is superb. Bartlett’s Test of
Sphericity is significant if its associated probability is less than 0.05 (i.e. sig. < 0.05). In the
analysis it is reported as .000, i.e. the significance level is small enough to reject the null
hypothesis. This means that the correlation matrix is not an identity matrix, i.e. there is
correlation between the statements of the analysis. The data fulfils both assumptions of
factor analysis. Thus, it is concluded that the data is fit for factor analysis.
TABLE 4.21
COMMUNALITIES
Initial Extraction
S1 1.000 .784
S2 1.000 .836
S3 1.000 .793
S4 1.000 .751
S5 1.000 .845
S6 1.000 .679
S7 1.000 .749
S8 1.000 .652
S9 1.000 .559
S10 1.000 .750
S11 1.000 .633
S12 1.000 .727
S13 1.000 .828
S14 1.000 .702
S15 1.000 .703
S16 1.000 .485
S17 1.000 .512
S18 1.000 .734
S19 1.000 .710
S20 1.000 .784
S21 1.000 .487
Extraction Method: Principal Component Analysis.
Interpretation: Table 4.21 shows the communalities of all statements before and after
extraction. In this survey, a principal component analysis (PCA) was conducted on the
21 items with orthogonal rotation (Varimax). Principal component analysis works on the initial
assumption that all variance is common; therefore, before extraction (Initial), all
communalities are 1. The values in the Extraction column reflect the
common variance shared by each item (variable). For example, the communality after
extraction for statement 1 is 0.784, which means that 78.4% of the variance associated with
statement 1 is common or shared variance.
TABLE 4.22
TOTAL VARIANCE EXPLAINED
Component   Initial Eigenvalues                     Extraction Sums of Squared Loadings     Rotation Sums of Squared Loadings
            Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %
1 10.417 49.602 49.602 10.417 49.602 49.602 5.584 26.589 26.589
2 1.939 9.233 58.835 1.939 9.233 58.835 3.796 18.078 44.667
3 1.286 6.125 64.960 1.286 6.125 64.960 2.711 12.910 57.577
4 1.061 5.052 70.012 1.061 5.052 70.012 2.611 12.435 70.012
5 .943 4.489 74.500
6 .683 3.255 77.755
7 .583 2.775 80.530
8 .555 2.644 83.174
9 .519 2.473 85.647
10 .415 1.978 87.625
11 .397 1.891 89.516
12 .360 1.715 91.231
13 .341 1.624 92.855
14 .293 1.396 94.250
15 .277 1.320 95.571
16 .212 1.007 96.578
17 .186 .885 97.463
18 .172 .818 98.281
19 .157 .748 99.029
20 .114 .545 99.574
21 .090 .426 100.000
Extraction Method: Principal Component Analysis.
Interpretation: Table 4.22 shows the total variance explained by the various components
(factors) along with their eigenvalues. By default, SPSS uses Kaiser’s criterion of retaining
factors with eigenvalues greater than 1. This analysis yields four factors with
eigenvalues above one (i.e. 10.417 for the first, 1.939 for the second, 1.286 for the third and
1.061 for the fourth factor). The table also displays each eigenvalue in terms of the percentage
of variance explained: the first factor explains 49.602% of the variance, the second 9.233%, the
third 6.125%, the fourth 5.052% and so on. The middle part of the table, labelled
Extraction Sums of Squared Loadings, displays the same values as before extraction, except
that the values for the discarded factors are omitted (hence, the table is blank after the fourth
factor). In the final part of the table (labelled Rotation Sums of Squared Loadings), the
eigenvalues of the factors after rotation are displayed. Rotation is used for optimizing the
factor structure and equalizing the
relative importance of the four factors. Before rotation, factor 1 accounted for more variance
than the remaining three (i.e. 49.602% as compared to 9.233%, 6.125% and 5.052%), but
after rotation it accounts for only 26.589% of the variance (compared to 18.078%, 12.910% and
12.435% for the remaining three factors respectively). Together, these four factors
explain 70.012% of the total variance for the entire set of variables.
So, it can be concluded that from the 21 statements, SPSS extracted the 4 factors having
eigenvalues greater than one, and these four factors explain a total of 70.012% of the variance
for all statements in the analysis.
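Kaiser's criterion and the percentage of variance can be checked directly from the eigenvalues. The sketch below is illustrative only and is not SPSS output; the variable names are assumptions, while the eigenvalues are those in the Initial Eigenvalues column of Table 4.22. For a principal component analysis of a correlation matrix, the eigenvalues sum to the number of variables, so each eigenvalue divided by 21 gives that component's share of the variance.

```python
# Initial eigenvalues from Table 4.22 (21 components).
eigenvalues = [
    10.417, 1.939, 1.286, 1.061, 0.943, 0.683, 0.583,
    0.555, 0.519, 0.415, 0.397, 0.360, 0.341, 0.293,
    0.277, 0.212, 0.186, 0.172, 0.157, 0.114, 0.090,
]

n_variables = len(eigenvalues)

# Kaiser's criterion: retain components with an eigenvalue greater than 1.
retained = [value for value in eigenvalues if value > 1.0]

# Percentage of total variance explained by each retained component.
percent_variance = [100 * value / n_variables for value in retained]
cumulative = sum(percent_variance)

print(f"components retained: {len(retained)}")
print("percent of variance:", [round(p, 2) for p in percent_variance])
print(f"cumulative percent: {cumulative:.2f}")
```

Small differences in the third decimal place relative to Table 4.22 arise because the eigenvalues in the table are themselves rounded.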
TABLE 4.23
ROTATED COMPONENT MATRIX (a)
VARIABLES              Component
              1        2        3        4
S5          .858
S4          .813
S20         .770
S13         .709
S14         .702
S12         .691
S15         .650
S10                  .700
S7                   .690
S8                   .646
S6                   .628
S11                  .616
S21                  .603
S9                   .570
S2                            .901
S3                            .800
S1                            .719
S18                                    .764
S19                                    .757
S16                                    .650
S17                                    .555
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 6 iterations.

Interpretation: Table 4.23 shows the loadings of the 21 variables on the 4 extracted factors
after rotation. The principal component analysis method and the Varimax rotation
method are used in the analysis. The rotation converged in 6 iterations.
TABLE 4.24
FACTOR LABELS AND STATEMENTS
FACTOR 1: Effective Learning
Sr. No.   STATEMENTS                                                            FACTOR LOADING
1         S5   EduCollab has made teaching and learning more effective.        .858
2         S4   It provides a great medium for interaction.                     .813
3         S20  Its coin system promotes user participation and learning.       .770
4         S13  It provides an easy feedback mechanism.                         .709
5         S14  It promotes personalized learning for the students.             .702
6         S12  There can be interactive peer to peer learning                  .691
               sessions through this forum.
7         S15  You are able to get all your doubts cleared.                    .650
TABLE 4.25
FACTOR 2: Interactive Learning
Sr. No.  STATEMENT                                                                       FACTOR LOADING
1        S10  It has appropriate tools to ask and answer the questions.                  .700
2        S7   Students get easy access to learning material.                             .690
3        S8   Appropriate answers are provided within limited time.                      .646
4        S6   Course content is organized and well planned.                              .628
5        S11  There is an interactive communication between the instructor and the students. .616
6        S21  The app size is big.                                                       .603
7        S9   Answers through screen recording are clear.                                .570
TABLE 4.26
FACTOR 3: Attractive Learning
Sr. No.  STATEMENT                                                                       FACTOR LOADING
1        S2   It has an attractive colour scheme.                                        .901
2        S3   Logo is appropriate.                                                       .800
3        S1   EduCollab is user-friendly.                                                .719
TABLE 4.27
FACTOR 4: Safe Learning
Sr. No.  STATEMENT                                                                       FACTOR LOADING
1        S18  It is safe from cyber-bullying.                                            .764
2        S19  Students are recommended questions as per their preference.                .757
3        S16  Various technical issues are faced while using EduCollab.                  .650
4        S17  It has an appropriate rating system.                                       .555
Interpretation: The four tables above show how the statements are divided among the factors.
Each factor is named on the basis of the common meaning of the statements that load on it.
Factor 1 is named Effective Learning because it comprises statements
related to students’ perception of the effective learning provided by EduCollab. Factor 2 is named
Interactive Learning because it comprises statements about students’ perception of the interactive
learning provided on EduCollab. Factor 3 is named Attractive Learning since it covers the
statements related to how attractive the app is. Factor 4 is named Safe Learning because it
comprises statements about students’ perception of the safety provided by EduCollab.
CHAPTER V
SUMMARY, CONCLUSION & RECOMMENDATIONS
SUMMARY
E-learning platforms are making a measurable difference to students' engagement and
performance. They are reducing gaps in the delivery of education and giving a new dimension to
the education space.
The e-learning industry in India is a prolific one, witnessing a steady growth rate of 25 per
cent year-on-year, and is projected to be a $1.96 billion industry by 2021. With a network of
more than 1.5 million schools and 18,000 higher education institutes, the market for digital
education in India is enormous. Today, digital learning is no longer a luxury; the
implementation of digital learning tools has become a necessity in schools.
The key drivers of growth in e-learning are:
• Growth in internet and smartphone penetration
• Low cost of online education
• Inability of the traditional model to provide additional capacity
• Digital-friendly government policies
• Demand among working professionals and job-seekers
Moreover, schools have been shut all across the world due to COVID-19. Globally,
over 1.2 billion children are out of the classroom. As a result, education has changed
dramatically, with a distinctive rise in e-learning, whereby teaching is undertaken remotely
and on digital platforms.
work, practice papers, entrance exam preparations, career counselling to parent connect
facilities. The list is endless and so is its evolution and competition. Despite this, there
seems to be a gap in the education technology market. This is so because there is still
enough room for innovation and advancement.
So, a need was felt to create a platform which not only promotes peer to peer learning
but also protects students from cyberbullying and inappropriate
content. The EduCollab project was initiated with this objective.
EduCollab is an interactive educational platform which provides a forum for collaborating
for Happy Learning. It is based on the idea that the best way to learn is to teach. As
India’s first AI-driven learning app for the 6-16 age group, it provides cyberbullying
identification, which distinguishes it from its competitors. It was a finalist project at the
Young Founder’s Summit in Beijing last year. Project development is currently under way.
market (in India). The objectives to be achieved are:
1. To develop a model to protect the students from cyberbullying and inappropriate
content.
2. To create a Recommender System that provides the students with the questions and
3. To understand the perception of the students towards the E-learning platforms amid
the current pandemic situation.
OBJECTIVE 1: The first objective was divided into two parts: profanity detection on text
and profanity detection on videos.
Profanity detection on text was taken up to detect the use of cuss words, or any other words that
could hurt the feelings of other students or have a negative impact on them, in the questions,
answers and comments posted by students, so that appropriate action can be taken. To meet
this objective, the text classification technique of NLP was used. Here, an attempt was made to
classify a comment as toxic or non-toxic using a Logistic Regression model. The accuracy
came out to be 88%. Therefore, whenever a comment was put into the system, it identified
whether the text was toxic or not.
The model performed better than the Naïve Bayes model, but there are some other, more advanced
models that are yet to be explored as future scope of study, including LSVM, NB-SVM
and LSTM.
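To make the idea of classifying a comment as toxic or non-toxic with logistic regression concrete, here is a minimal sketch in plain Python. It is not the project's model: the eight training comments are invented for illustration, the features are a simple binary bag-of-words, and the weights are fitted with basic stochastic gradient descent rather than a library implementation. All function names are hypothetical.

```python
import math

# Toy labelled comments (1 = toxic, 0 = non-toxic); invented for illustration only.
TRAIN = [
    ("you are stupid and useless", 1),
    ("shut up you idiot", 1),
    ("nobody likes you loser", 1),
    ("what a dumb stupid answer", 1),
    ("thanks for the clear answer", 0),
    ("can you explain this step please", 0),
    ("great explanation very helpful", 0),
    ("i like this answer thanks", 0),
]

def tokenize(text):
    return text.lower().split()

def build_vocab(samples):
    """Map every training word to a feature index."""
    vocab = {}
    for text, _ in samples:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Binary bag-of-words vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] = 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, vocab, epochs=200, lr=0.5):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    data = [(vectorize(text, vocab), label) for text, label in samples]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the linear score
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

def predict_label(text, vocab, weights, bias, threshold=0.5):
    x = vectorize(text, vocab)
    p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
    return "toxic" if p >= threshold else "non-toxic"

vocab = build_vocab(TRAIN)
weights, bias = train(TRAIN, vocab)

for comment in ["you stupid idiot", "thanks very helpful"]:
    print(comment, "->", predict_label(comment, vocab, weights, bias))
```

In practice the accuracy of such a classifier would be measured on held-out comments rather than on the training set, and a weighting scheme such as TF-IDF could replace the binary features.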
To give our platform an edge, profanity detection is performed not only on text but
also on video content. For profanity detection on videos, the sub-task of image
detection was taken up. In this, various image layers were identified and passed through the
pre-trained ResNet-50 model after removing its last layer. The test accuracy came out
to be 84%.
Therefore, whenever an image was put into the system, it identified whether the image was
safe for work or not.
As future scope of study, other sub-tasks under profanity detection on videos, such as profanity
detection on speech, shall be taken up.
OBJECTIVE 2: The objective of creating a recommender system was taken up so as to
provide the students with the learning material, questions and answers that best suit their
needs, and also to direct the questions asked to those students most likely to give the best
and most appropriate answers. To fulfil the second objective, a hybrid approach was used since it
addresses the shortcomings of both content-based filtering and collaborative filtering. The
topic modelling technique was used to recommend topics randomly to users on their first
visit. For this, the LDA model was used. The accuracy came out to be 91%. Therefore, whenever
a new user uses the app, 10 random topics are suggested to them.
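The logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not say how the content-based and collaborative scores were combined, so a simple weighted sum is assumed here, and the topic labels, scores and function names are all hypothetical stand-ins for the output of a topic model.

```python
import random

def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend content-based and collaborative scores per item (assumed weighted sum)."""
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }

def recommend(is_first_visit, topics, content_scores, collab_scores, k=10, seed=None):
    """Return k topics: random for a first-time user, otherwise top-k by hybrid score."""
    if is_first_visit:
        rng = random.Random(seed)
        return rng.sample(topics, k)  # k distinct topics chosen at random
    scores = hybrid_scores(content_scores, collab_scores)
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]

# Hypothetical topic labels standing in for topics produced by a topic model.
TOPICS = [f"topic_{i}" for i in range(1, 21)]

# New user: no history yet, so 10 random topics are suggested.
first_visit = recommend(True, TOPICS, {}, {}, k=10, seed=42)
print(first_visit)

# Returning user: hypothetical scores from the two filtering approaches.
content = {"topic_1": 0.9, "topic_2": 0.2, "topic_3": 0.3}
collab = {"topic_2": 0.8, "topic_3": 0.5, "topic_4": 0.6}
returning = recommend(False, TOPICS, content, collab, k=3)
print(returning)
```

The random fallback handles the cold-start case in which neither filtering approach has any data about the user; once interactions accumulate, the blended score takes over.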
As future scope of study, BOT 2 shall be taken up, which extracts user interests from
clickstream data that can then be used to further personalise the user's news feed.
OBJECTIVE 3: This objective was taken up to understand the students’ perception of
EduCollab after its Beta release. A sample of 103 respondents was taken for conducting the
survey. The survey was conducted through questionnaires, which were filled in online by all
respondents. Data was analysed using the Statistical Package for the Social Sciences (SPSS).
Descriptive statistics, factor analysis and Cronbach’s alpha were used in the analysis.
FINDINGS:
Demographics: In Part 1 of the questionnaire, it was found that out of 103 respondents:
• 55% were female and 45% were male.
• The majority of the respondents had Samsung phones, followed by users of iPhone, Xiaomi,
Redmi, Vivo, etc.
• The majority of the respondents (69%) did not face internet connectivity issues, while only
31% faced them, probably because India is becoming technologically
equipped.
• A major portion of the respondents (76%) had used e-learning platforms before
EduCollab, while 24% had not. This indicates that many people have already taken to
e-learning.
• 82% of respondents think that they are proficient in using e-learning platforms, whereas
only 18% think they are not, indicating that most
people are confident in using e-learning platforms.
• 80% of respondents like the concept of peer to peer learning, whereas only 20%
do not, which indicates that most people like the concept.
• 76% have not encountered cyber-bullying, while 24% have.
This indicates that cases of cyber-bullying do exist.
• 87% of respondents were satisfied with the platform, while 13% were not. This indicates that
we have been successful in fulfilling our objective.
• Only 22 respondents encountered a login problem, while 81 respondents did not.
• Only 33% of respondents encountered a bug, while 67% did not.
Those who did probably faced it because this was the first release of the app.
• 69 respondents informed the team about the bugs, but 34 respondents did not.
• 90 respondents think that the team was efficient in resolving the issues, while only 13
think otherwise. This indicates that the team has been efficient in resolving the issues.
• Almost all the respondents (97) think that the team worked up to their expectations, and
only 6 think that it did not, indicating that the team has done a good job.
• All the respondents except 3 say that they will keep using the platform.
• The majority (83) think that EduCollab is better than other e-learning platforms; only 20
respondents did not think so. These 20 respondents might prefer
other e-learning platforms or might have been using them for years.
• Various reasons were stated for the previous response, such as: it provides interactive learning
sessions, the screen recording feature is good, the peer to peer feature is great, and the app is
more accessible, easy to use and a new concept.
• 90% of the respondents would encourage their friends/relatives to use this platform, while
the remaining 10% would not.
• 57% of respondents would prefer book learning to learning from this platform, while the
remaining 43% would prefer learning from this platform. Although the difference
is not large, people still prefer face to face learning to digital learning.
• Some of the students' recommendations for improving the app were to add more
courses and to improve the colour scheme, logo, etc. Most of the recommendations were
directed towards improving the user interface.
FACTORS OF STUDENTS’ PERCEPTION TOWARDS EDUCOLLAB AFTER ITS BETA RELEASE:
Here, 21 statements (items) related to students’ perception of EduCollab were
analysed in SPSS to identify the underlying factors. First, the reliability of these 21 statements
was checked with Cronbach’s alpha test, in which the value of Cronbach’s alpha
coefficient came out to be 0.941. This indicates a high level of internal consistency
among the statements, which means that the data is reliable for further analysis. The 21
statements were then analysed using a data reduction technique, i.e. Factor Analysis: Principal
Component Analysis (PCA) was conducted on these statements with the Varimax rotation
method. In this analysis, SPSS retained the four factors with eigenvalues greater
than 1. Together, these four factors explained 70.012% of the variance for
all statements. These factors describe the perception of students towards EduCollab. The
statements under each factor measured the respondents' level of agreement or disagreement
on a 5-point scale, where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree and 5 =
strongly disagree. Each factor was given a name on the basis of the common
meaning extracted from the statements underlying it.
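For readers unfamiliar with the reliability statistic used above, Cronbach's alpha for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The sketch below computes it in plain Python. The response matrix is a small made-up example of 5 respondents and 4 items, not the survey data, and the function name is hypothetical; the report's value of 0.941 came from SPSS on the actual 103 responses.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent, one column per item)."""
    k = len(responses[0])
    items = list(zip(*responses))                      # transpose: one tuple per item
    item_variances = [variance(col) for col in items]  # sample variance of each item
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical 5-point Likert responses (5 respondents x 4 items), for illustration only.
toy_responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]

alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Values close to 1 indicate that the items move together across respondents, which is the sense in which a coefficient of 0.941 points to high internal consistency.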
The following four factors emerged from the study:
TABLE 5.1
FACTOR   NAME OF FACTOR         ITEMS   % OF VARIANCE   RANGE OF FACTOR LOADING
1        Effective Learning     7       26.589%         .650 to .858
2        Interactive Learning   7       18.078%         .570 to .700
3        Attractive Learning    3       12.910%         .719 to .901
4        Safe Learning          4       12.435%         .555 to .764
TOTAL                           21      70.012%         .555 to .901
• Factor 1, named Effective Learning, comprises 7 statements reported on a 5-point
Likert scale and explains 26.589% of the variance, with factor loadings from .650 to
.858.
• Factor 2, named Interactive Learning, consists of 7 statements reported on a 5-point
Likert scale and explains 18.078% of the variance, with factor loadings from .570 to
.700.
• Factor 3, named Attractive Learning, consists of 3 statements reported on a 5-point
Likert scale and explains 12.910% of the variance, with factor loadings from .719 to
.901.
• Factor 4, named Safe Learning, comprises 4 statements reported on a 5-point Likert
scale and explains 12.435% of the variance, with factor loadings from .555 to .764.
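The item counts and loading ranges summarised above can be re-derived from the rotated loadings listed in Tables 4.24 to 4.27. The following Python sketch does this as a cross-check; the dictionary layout and the function name are illustrative choices, while the loadings themselves are copied from those tables.

```python
# Rotated loadings as listed in Tables 4.24 to 4.27 (statement -> loading).
FACTORS = {
    "Effective Learning": {"S5": .858, "S4": .813, "S20": .770, "S13": .709,
                           "S14": .702, "S12": .691, "S15": .650},
    "Interactive Learning": {"S10": .700, "S7": .690, "S8": .646, "S6": .628,
                             "S11": .616, "S21": .603, "S9": .570},
    "Attractive Learning": {"S2": .901, "S3": .800, "S1": .719},
    "Safe Learning": {"S18": .764, "S19": .757, "S16": .650, "S17": .555},
}

def summarize(factors):
    """Return (name, number of items, lowest loading, highest loading) per factor."""
    rows = []
    for name, loadings in factors.items():
        values = list(loadings.values())
        rows.append((name, len(values), min(values), max(values)))
    return rows

summary = summarize(FACTORS)
for name, n_items, low, high in summary:
    print(f"{name:<22}{n_items:>3}   {low:.3f} to {high:.3f}")

total_items = sum(row[1] for row in summary)
overall_low = min(row[2] for row in summary)
overall_high = max(row[3] for row in summary)
print(f"{'TOTAL':<22}{total_items:>3}   {overall_low:.3f} to {overall_high:.3f}")
```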
CONCLUSION
To conclude, educational institutions have always considered
educational apps or digital learning as a supplementary tool and may have had difficulty in
mainstreaming it, mostly because its efficacy was not fully understood. However, the
current situation has given us a fillip to accelerate the adoption of technology, experiment
with online learning and measure its success. No doubt, digital learning can never replace
the teacher-student interface, and it has various other limitations, such as lack of broadband or
the required infrastructure at home, lack of skills in using it, and lack of supervision and
concentration, as well as evils like cyber-bullying; but it is the only solution today. Our project
was taken up keeping all these things in mind. Although our initial aim was to fill the gap in the
educational technology market by popularising the concept of peer to peer learning through a
platform that is safe from the evils of cyber-bullying, the project was later given impetus by
the pandemic situation.
From the study we can conclude that the majority of EduCollab users liked the app and are
satisfied with it. They think that the team has been efficient and has worked up to their
expectations. They intend to continue using it in future and to suggest it to their
relatives as well.
The respondents have provided some recommendations, which will be worked upon before
the next release.
RECOMMENDATIONS
These recommendations can be put as:
• Launch an iOS version of the app.
• Allow answers in the form of text.
• Allow users to answer questions on desktop.
• Make the user experience more interactive via notifications when a user's
question is answered, an answer is accepted, there is a comment on their answer, etc.
• Integrate a feature for redeeming coins.
• Introduce a dark mode.
• Add more content.
• Make it easier to operate.
• Improve the colour scheme.
• Improve the segregation of the topics.
FUTURE SCOPE OF STUDY
As far as the first objective is concerned, the model used for profanity detection on text, i.e.
Logistic Regression, performed better than the Naïve Bayes model, but there are some other, more
advanced models that are yet to be explored, including LSVM, NB-SVM and LSTM.
In the case of profanity detection on videos, other sub-tasks such as profanity detection on speech
shall be taken up after the completion of profanity detection on images.
In the case of the recommender system, BOT 2 shall be built, which extracts user interests from
clickstream data that can then be used to further personalise the user's news feed.
Lastly, the recommendations provided by the respondents/users of the app shall be worked
upon before the next release of the app.
ANNEXURE
QUESTIONNAIRE
This questionnaire is aimed at soliciting information on students' perception of
EduCollab after its Beta release. This analysis is exclusively for research purposes.
Your co-operation will be highly appreciated.
PART 1 – GENERAL INFORMATION
1. What is your name?
2. What is your gender?
☐ Male
☐ Female
3. What is the model and make of your phone?
4. What is the Android version of your mobile?
5. Do you face internet connectivity issues?
☐ Yes
☐ No
6. Have you used any E-learning platform before EduCollab?
☐ Yes
☐ No
7. Do you think you are fully proficient in using the E-learning platforms?
☐ Yes
☐ No
8. Do you like the concept of peer to peer learning?
☐ Yes
☐ No
9. Have you ever encountered cyber-bullying?
☐ Yes
☐ No
PART 2 -
On a five-point scale ranging from strongly agree to strongly disagree (1=strongly agree,
2=agree, 3=neutral, 4=disagree, 5=strongly disagree), please rate the following statements:
ABOUT EDUCOLLAB                                                      1 2 3 4 5
EduCollab is user friendly
It has an attractive colour scheme
Logo is appropriate
It provides a great medium for interaction
EduCollab has made teaching and learning more effective because it integrates all forms of media: print, audio, video and animation
Course content is organized and well-planned
Students get easy access to learning material
Appropriate answers are provided within limited time
Answers through screen recording are clear
It has appropriate tools to ask and answer the questions
There is an interactive communication between the instructor and the students
There can be interactive peer to peer learning sessions through this forum
It provides an easy feedback mechanism
It provides personalized learning for the students
You are able to get all your doubts cleared
Various technical issues are faced while using EduCollab
It has an appropriate rating system
It is safe from cyber-bullying
Students are recommended courses as per their preference
Its coin system promotes user participation and learning
The app size is big
Are you satisfied with the platform?
☐ Yes
☐ No
Did you encounter any login problem?
☐ Yes
☐ No
Did you encounter any bug?
☐ Yes
☐ No
If yes, did you inform the team?
☐ Yes
☐ No
If yes, was the team efficient in resolving the issues?
☐ Yes
☐ No
Did the EduCollab team work up to your expectations?
☐ Yes
☐ No
Will you keep using the platform?
☐ Yes
☐ No
Do you think EduCollab is better than other E-learning platforms?
☐ Yes
☐ No
Please state the reason for the above
Would you encourage your friends/relatives to use this platform?
☐ Yes
☐ No
Would you prefer book learning to learning from this platform?
☐ Yes
☐ No
Your recommendation to improve this platform
THANK YOU AND GOD BLESS!
THANK YOU