PROJECT REPORT
SUBMITTED TO
GURU NANAK DEV UNIVERSITY, AMRITSAR
IN PARTIAL FULFILMENT FOR THE DEGREE OF
M.B.A (FINANCE)
(2018 - 2020)
DECLARATION
This is to certify that the project report entitled “Artificial Intelligence Powered E-Learning
Platform - EduCollab”, submitted by me in partial fulfilment of the Degree of Master of
Business Administration (Finance), is the result of my original and independent research work
carried out under the guidance of Dr. Mandeep Kaur, Professor & Head, University School of
Financial Studies, Guru Nanak Dev University, Amritsar (Punjab). It has not been submitted
elsewhere for the award of any other degree, diploma, fellowship or other similar title of any
University or Institution. All the ideas and references have been duly acknowledged.
ACKNOWLEDGEMENT
I acknowledge my heartfelt gratitude to my supervisor, Dr. Mandeep Kaur, Professor &
Head, University School of Financial Studies, for her untiring guidance and support. Without
her, this journey would not have been possible. Her careful attention to the details of my
writing, valuable discussions and the constructive feedback enabled me to complete my
project report. I fall short of words to express how indebted I feel to her.
I am extremely grateful to Dr. Sarabjot Singh Anand, Co-Founder of Tatras Data & Sabudh
Foundation, for giving me the golden opportunity to intern at his organization. His dynamism,
vision, sincerity and motivation have deeply inspired me. It was a great privilege and honour
to work and study under his guidance.
I also wish to thank my mentors at Sabudh for their guidance and help, and for providing an
excellent atmosphere in which I was able to complete my internship. Interning at
Sabudh was a great opportunity. It gave me a wonderful platform to recognize my talent. The
whole team was very helpful, and the working environment, teaching techniques and guidance
I received were marvellous.
A very special thanks to Dr. Jaspal Singh, Professor, University School of Financial Studies,
for always being my pillar of strength. His fatherly presence and wisdom have always guided
me to the right path.
I cannot express enough thanks to Mr. Bhavneet Bhalla, Senior Vice President at Lowe
Lintas and Partners, for being my Godfather. I’m forever indebted to him for his
thoughtfulness, enthusiasm, continuous support and encouragement. His keen interest in my
daily activities has gone a long way in my overall growth.
The completion of the project could not have been accomplished without the respondents of
the questionnaire. I also have a deep sense of gratitude for my family for their love, prayers,
support and help.
Above all, I thank God Almighty for giving me the strength and capacity to complete this
project report and for seeing me through all the difficulties.
TABLE OF CONTENTS
II PROBLEM DEFINITION
INTRODUCTION
RATIONALE
LITERATURE SUPPORT
V SUMMARY
CONCLUSION
RECOMMENDATIONS
BIBLIOGRAPHY
ANNEXURE
EduCollab
LIST OF TABLES
4.1 CALCULATIONS
4.2 GENDER
4.21 COMMUNALITIES
LIST OF FIGURES
3.4 PHASES
3.9 PHASES
3.12 PHASES
4.2 CALCULATIONS
4.7 GENDER
ABBREVIATIONS USED
AI Artificial Intelligence
ML Machine Learning
IT Information Technology
VR Virtual Reality
PU Perceived Usefulness
TF Term Frequency
CHAPTER I
PROFILE OF THE ORGANIZATION
The organization’s aim is to work with the government and citizens on a data-driven approach to
bringing about change for social good, through specific projects addressing problems in the areas of:
TABLE 1.1
ABOUT SABUDH
Founded: 2018
Parent Company: Tatras Data
Industry: Higher Education
Company Size: 11-50 employees
Mentors: Dr. Sarabjot Singh Anand, Dr. Satman Singh, Dr. Vikas Agrawal, Prof. Bhiksha Raj, etc.
Partners: Tatras Data, Innosential, BJFI, PEC, Punjab Police, Punjab University Patiala, Punjab Engineering College, etc.
Headquarters: Mohali, Punjab
Type: Non-profit
Website: http://sabudh.org/
Sabudh believes that technological advances can be used not only to make businesses
better but also in areas such as education, public policy and health, where they can help
society function more efficiently. The various projects being carried out for social good are
described below:
TRAFFIC MANAGEMENT:
Sabudh has signed an MoU with the Punjab Traffic Police to work together on innovative
solutions for traffic control and management. It has a database of accidents across Punjab,
CCTV footage from across Ludhiana, and audio recordings from the emergency call
centre in Punjab for detecting stress from speech.
By analysing this data with machine learning and AI, the aim is to develop
intelligent solutions to issues of road safety and traffic control, leading to efficient policing of
our roads.
The project "TRAFFIC COPS" deals with building AI into the traffic management system as
part of our infrastructure. The main idea behind it is to reduce manual intervention to a minimum.
It sets up an Intelligent Traffic Management System (ITMS) that includes radar-based
monitoring, automated traffic signals, CCTV cameras to capture motorists and
commuters breaking laws, Automated Number Plate Recognition (ANPR) to send
challans directly to their homes, tracing of emergency calls, and protection of traffic police
personnel from health-related problems. It is an initiative to build an improved version of "CITY
BRAIN".
PRECISION FARMING:
AI is being used to detect early signs of crop disease, using a drone that captures aerial images and
coordinates with a hand-held device. The work involves drones with their own multispectral sensors, as well
as developing tools to train a computer program to analyse the images and classify them
based on disease progression. Pesticides are then sprayed on the affected crops using the
drones to prevent further harm, leading to higher crop yield.
For example, in agriculture, Agrobots and drones are now being used to gauge the health
of the harvest, which can help farmers improve their crop yield and reduce costs. With the help
of advanced technologies, we are able to save 90% of the spraying costs. These technologies
can help states like Punjab, which has always been the food basket of India, to restore
food security while improving crop health. The project “SAT SRI AKAL” caters to this and
aims at bringing prosperity to the bottom of the pyramid in Punjab.
Owing to increasing western influence, Gurmukhi is losing its identity day by day, so there
is a need to preserve its purity. It is widely known that translators are helpful in
understanding a foreign language. Language translators can also aid in
promoting and preserving cultural heritage. The project “LINGUA FRANCA” is an
attempt to safeguard our cultural heritage by diminishing language barriers.
INCLUSION OF SPECIALLY-ABLED:
According to the World Health Organization (WHO), 466 million people across the world have
disabling hearing loss (over 5% of the world population). The project “ABILITY
INFINITE”, with the motto “We choose not to put ‘Dis’ in your ability”, is therefore being
carried out in an attempt to recognise Indian Sign Language and support their
communication. It aims at using AI for the social inclusion of
specially-abled people.
MEDICAL AID:
Medicine is another field where artificial intelligence has progressed, helping to make the right
diagnosis and detect disease in time for it to be cured. Punjab has the highest rate
of cancer in India: 18 people succumb to the disease every day, according to a recent report
published by the state government. Using AI and machine learning algorithms to diagnose
this fatal disease at an early stage can significantly decrease the mortality rate.
The centre offers aspiring data scientists a six-month internship to
become a SABUDH FELLOW, with potential employment offered after completion of
the internship. Interns enrolling in this program get exposure to live projects having real
social impact, and to fellows, staff and partners. They are mentored by leading data scientists in
industry and academia. During this period, topics such as Core Python Concepts,
Introduction to Machine Learning, Data Exploration and Pre-processing, Supervised
Learning and Unsupervised Learning are covered. Tests are conducted periodically,
along with weekly assignments, to make sure that students are well versed in the
concepts. Each student participates in projects involving Text Analysis, Natural Language
Processing (NLP), etc. The interns also participate in various
Kaggle competitions, which involves forming 5-6 teams of students and conducting weekly
progress presentations. Students are expected to work closely and collaboratively with team
members onsite for the duration of the program, with the aim of making the interns ‘Masters’ in Data
Science.
• Training in advanced technologies such as Machine Learning, AI, Cyber Security and IoT.
• Learn how to compete and win on Kaggle, with mentorship from Kaggle Grandmasters.
• An extensive network of worldwide academics and industry leaders to interact with students.
• Access to top companies in this space for student employment.
• A conducive environment for technical and personal growth.
Each batch has around 30 students, all of them B. Tech students. This was the
first time that M.B.A students were given an opportunity to take part. Six students
from our M.B.A (Finance) batch, including me, were fortunate enough to join. Each of us
was allotted a project to work on as project manager, with a team of 5-6 B. Tech students
to carry out the project. I was also given access to Jira and a Trello board
to manage the project.
Our day started with a DSU (Daily Stand-Up) at 10 a.m., where the M.B.A
students and the mentors stood up one by one and listed what we had done the previous day
and our agenda for the day. The first half of the day was devoted to a study session of two and a half hours,
in which the mentors took up various Data Science topics. In the latter half,
project work was carried out. Mentors were allotted to each project according to
their domain knowledge. They guided us on the requirements of
the project and took daily updates on the work done by us and our teams. Dr. Sarabjot Singh
Anand, the founder, visited weekly to enlighten us on various important aspects
of Data Science. We also presented weekly updates on the projects to him.
learning in India such platform will come in handy.
This platform is a product of Tatras Data, Sabudh’s Parent Company, but its team also had
members from Sabudh. Our Sabudh EduCollab team consisting of 5 B. Tech students and
myself helped the Tatras team in this project.
In 2012, Dr. Sarabjot Singh Anand and Noah Gresham came together to form Tatras
to help technology firms implement data science initiatives and create a conducive
environment where data science can flourish. They understood the complexity of business
and the data science solutions that answer the big questions. They are excited about how
disruptive technologies like AI, blockchain and VR are quickly becoming the core of how
businesses operate across the globe, and how the application of data science will fundamentally
change business from this day forward.
The company is committed to providing the highest level of “cradle to grave” services
ranging from the Need Assessment Stage to the Final Implementation Stage and value to their
clients while promoting a work environment that fosters learning, cooperation and respect
among team members. It
THEIR APPROACH:
TABLE 1.2
ABOUT TATRAS
Founded: 2012
Professionals: 50+
Global Clients: 20+
Domains: 10+
Global Offices: 2
OTHER ASSOCIATIONS:
Tatras and Innosential Labs have joined hands for providing quality
training in data science. In August 2017, Tatras and Innosential Labs,
in association with Nasscom conducted a 5-day Data science
Masterclass at Bengaluru. The masterclass was delivered by the
leading lights in the industry from across the globe. The Data Science
Masterclass was a huge success with 200+ attendees and 500+
companies.
Bhai Jaitaji Foundation India (BJF India) helps empower the rural
youth, who are constrained due to their socio-economic condition, to
realize their potential through provision of specialized coaching and
continuous mentoring. They ensure eligible beneficiaries get all the
available financial and other support from the government and
other partner organizations. Tatras has joined hands with BJFI to foster altruistic
values and commitment to social change among volunteers.
CHAPTER II
PROBLEM DEFINITION
INTRODUCTION
Digital transformation is the buzzword today. Propelled by the falling costs and rising
availability of smartphones and high-speed connectivity, India is already home to one of the
world’s largest and fastest-growing bases of digital consumers and is digitizing faster than
many mature and emerging economies. These technological advancements have an impact on
almost every aspect of our lives on an ongoing basis.
We often hear about how technologies like Artificial Intelligence, Machine Learning and Cloud
Computing have made our lives easier. But what is the one thing that underpins all these
revolutionary technologies? Is it the service? No, it is data.
Data has also evolved dramatically in recent years, in type, volume, and velocity and has
become the new business currency.
However, data on its own is not enough to grow a business. We must know how to derive
useful insights from it. This is where Data Science comes into the picture. Data Science is the
process of using data to find solutions and predict outcomes for a problem statement. For
example, Apple uses Data Science to build a watch that monitors the health of an individual.
Similarly, Uber uses algorithms to vary its cab fares according to
demand and peak times. Banks analyse customer information
such as payment history, products held and credit history to determine the
creditworthiness of a customer, and digitally embed it in various channels to offer
pre-approved loans and instant disbursements.
In addition, analytics and Artificial Intelligence (AI) are being integrated with digital
channels to provide instant recommendations on products that will best suit the customer. For
example, a college graduate can be offered an education loan while a married couple with
young children can be offered child investment plans and life insurance policies along with
relevant loan products.
While banks are still trying to achieve this effectively, many organizations have already taken
the lead and set awe-inspiring benchmarks, a case in point being the recommendations provided
by Netflix and Amazon based on the customer’s viewing and purchase/browsing history.
In addition, social information is providing rich data with hitherto unknown insights
about customers, such as interests and hobbies, which further helps in offering relevant
services.
Organizations can now determine the needs of customers themselves through behavioural analytics.
Analytics can help them understand not only what customers say they want, as against what
they do not say they want, but also what they really need. It can help organizations understand
the profile of customers and their behaviour across digital touch points. These insights can
then be used for driving sales and services and for personalizing the digital channel experience
(Purohit Anup, 2017).
Data science is enabling the next generation of enterprise software, resulting in solutions that
tell users what is going to happen and what they should do about it today. It is the only
sure-fire way of creating and validating solutions to improve decision-making across the board.
For modern, forward-thinking businesses that find themselves with more data than they know
what to do with, the application of data science will be the difference between sinking and swimming
(Nejmeldeen Ziad, 2016).
No segment of business or society is untouched by data
science interventions. The impact of data science is also visible in the field of education,
where it has brought major changes in how education is imparted and consumed. Rote
learning and reliance on printed material or book-based learning are fast becoming
a thing of the past.
Until the end of the last century, the education system in India relied on traditional
classroom-based learning, where students did not get the opportunity to participate in
interactive sessions. To face the challenges of changing times, it became necessary to
make concepts clearer and students competent enough to compete globally. Hence, the
concept of Digital Learning evolved in 2002 - 2003. With technology spreading its wings to
the education sector, the typical classroom, once characterized by boring hour-long
sessions, was transformed into an interesting, fun-filled environment. Digital education made
life easier for both students and educators (Malhotra Monika, 2018).
These educational platforms are a great asset to one’s learning, as they provide a
combination of innovative learning and core learning. Many apps include animated videos to
enhance the learning experience. No matter what one’s learning goals are, there
are apps for almost every subject, exam or interest.
Their importance has increased all the more due to the COVID-19 pandemic
across the world. It is only through these platforms that students are still able to
pursue their education and develop various skills during lockdown.
The E-learning industry in India is a prolific one, witnessing a steady growth rate of 25 per
cent year-on-year and is projected to be a $1.96 billion industry by 2021. With a network of
more than 1.5 million schools and 18,000 higher education institutes, the market for digital
education in India is enormous. Today, digital learning is no longer a luxury;
the implementation of digital learning tools has become a necessity in schools (Malhotra
Monika, 2018).
The number of internet users is expected to reach 730 million by 2020. India may replace
China to have the second-largest number of users after the US.
FIGURE 2.1
Source: http://www.aurumequity.com/the-online-education-industry-in-india-present-and-future/
The young demographic (15-40 years), who are the most active consumers of smartphones
and the internet, look for online learning modules to fulfil their educational requirements at low
cost without having to move out of home, office or city. The internet makes it widely accessible
to enrol for distance courses, degrees and certifications from around the world, for urban as
well as rural populations and for people with mental or physical constraints.
Online education providers can reach out to the masses without setting up a physical
infrastructure or incurring administrative costs such as staff salaries, stationery, books, etc.
Hence, the cost savings are passed to the users. Also, students do not have costs associated
with commuting to a campus, living expenses, etc.
The aim of the government is to raise its current gross enrolment ratio to 30% by 2020. India
will have the world’s largest tertiary-age population and second-largest graduate talent
pipeline globally by the end of 2020.
However, the existing educational infrastructure is unable to meet the additional demand.
E-learning can supplement the conventional model and bridge the gap to a considerable
extent.
Several programmes have been launched under initiatives such as ‘Digital India’ and ‘Skill India’ to spread
digital literacy, create a knowledge-based society in India, and implement the three principles
of the Education Policy: ‘access, equity and quality’. This will be helpful
in transforming our nation and creating opportunities for all citizens by harnessing digital
technologies.
In order to establish digital infrastructure, the government has also launched the National Optical
Fibre Network (NOFN), which aims to expand broadband connectivity and provide faster networks.
The Indian job scenario is currently reeling under the twin pressure of layoffs and job
paucity, especially due to automation and slow-down in the global economy. According to a
World Bank report, automation is threatening 69% of jobs in India. There have been massive
layoffs in IT, BFSI, Telecom and Manufacturing sectors, and people are being replaced by
technology driven by machine learning and artificial intelligence.
Owing to all these factors, both job-seekers and working professionals feel a need to gain,
refresh or enhance skills through career advancement courses, which could increase their
chances of landing better jobs, switching jobs, getting promotions, negotiating better pay packages and
staying industry-relevant. Online career courses are affordable, give hands-on knowledge, can be
completed in one-fourth of the time of an offline course, and offer flexibility in terms of
personal schedule. They can be taken anywhere, anytime, at one’s convenience.
EVOLUTION
Open and distance learning in India dates back to the 1960s. By the 1980s there were 34
universities offering correspondence education through departments designed for that
purpose. The first single-mode Open University was established in Andhra Pradesh in 1982,
followed by the Indira Gandhi National Open University (IGNOU), and subsequently by open universities in
Bihar, Rajasthan, Maharashtra, Madhya Pradesh, Gujarat, Karnataka, West Bengal and
Uttar Pradesh (established throughout the 1980s and 1990s).
[Figure: stages in the evolution of e-learning; labels include Web-Based Training, E-Learning 1.0, Online Education and E-Learning 2.0]
Today, there is an enormous number of e-learning platforms in India itself, providing a wide
variety of learning experiences and products, ranging from personalised learning, coursework,
practice papers, entrance exam preparation and career counselling to parent-connect
facilities. The list is endless, and so is its evolution and competition. These platforms take
the form of websites and apps. Some of these are shown in the table below.
TABLE 2.1
VARIOUS E-LEARNING PLATFORMS
CHEGG
Chegg, Inc., known as Chegg, is an American education technology
company based in Santa Clara, California, with over three million
subscribers. It provides digital and physical textbook rentals, online
tutoring, and other student services such as textbook solutions, expert Q&A,
writing tools (plagiarism checker, grammar checker), flashcards, a math
solver, tutors, internships, test preparation and scholarships.
Today, schools have been shut all across the world due to COVID-19. Globally, over
1.2 billion children are out of the classroom. As a result, education has changed
dramatically, with the distinctive rise of e-learning, whereby teaching is undertaken remotely
and on digital platforms. According to a study by Velocity MR, a leading market research
and analysis company, 72 per cent of Indians prefer online or e-learning to
traditional classroom training. Indian demography is ideal for online learning, since many
learners come from rural or semi-rural areas where educational facilities, be it at school,
college or entrance examination level, are below par.
Despite the presence of multiple e-learning platforms, demand for them far exceeds supply.
This points to a gap in the education technology market, where there is still enough room for
innovation and advancement. The project EDUCOLLAB was initiated with this objective.
ABOUT EDUCOLLAB
WORKING/FUNCTIONALITY:
• EduCollab is easy to use and easy to get results with. The basic idea is that a student can
ask a question, answer a question asked by another student and personalise their learning.
Various courses are also available to the students.
• The answers are screen-recorded and hence it serves as an interactive platform for the
students.
• The recommender system deployed helps to organize the data and provide the user with
the most appropriate content.
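
The idea of matching a student with "the most appropriate content" can be illustrated with a minimal content-based sketch. This section of the report does not describe the algorithm actually deployed, so everything below is an assumption made for illustration: the TF-IDF weighting with cosine similarity, the function names (`tokenize`, `tf_idf_vectors`, `cosine`, `recommend`) and the sample questions are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tf_idf_vectors(docs):
    """Return one {term: weight} dictionary per document (smoothed TF-IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_history, questions, top_k=2):
    """Rank questions by similarity to the text the user has engaged with (hypothetical)."""
    vectors = tf_idf_vectors(questions + [" ".join(user_history)])
    profile = vectors[-1]
    scored = [(cosine(profile, vec), q) for vec, q in zip(vectors, questions)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Questions with no overlap at all are left out rather than padded in.
    return [q for score, q in scored[:top_k] if score > 0]


if __name__ == "__main__":
    sample_questions = [
        "How do I solve quadratic equations?",
        "What is photosynthesis in plants?",
        "Tips for factoring quadratic expressions",
    ]
    print(recommend(["quadratic formula help"], sample_questions))
```

A real system would presumably also use ratings, coins and answer history as signals; the sketch only shows the text-similarity step.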
FIGURE 2.2
USE OF AI IN OUR PLATFORM
• The platform has a leaderboard which shows the student of the week/month. This
student is chosen using criteria such as maximum coins earned, maximum answers given,
etc.
• It has an appropriate coin system and a rating system for the content put up.
• Built in Flutter, the app uses backend services written in Python and Flask to access data stored
in MongoDB, the database. Videos are recorded using YouTube, and Google helps with
language translation.
FIGURE 2.3
ARCHITECTURE OF THE APPLICATION
Source: EduCollab Pdf
TABLE 2.2
AI Engineers and Software Developers
App Stores
Direct Download
Security
Accessibility
Simplicity
A marketing strategy refers to a business's overall game plan for reaching prospective
consumers and turning them into customers of the products or services the business provides.
The aim of the platform is to create a hub where the ‘Manmohans’ and the ‘Kalams’ of the
present generation can educate their peers who have limited access to quality education.
Therefore, a well-planned market strategy has been proposed to reach out to the masses and
have a deep penetration of the platform in the entire country.
FIGURE 2.4
MARKET STRATEGY
Product: to provide a platform where students can engage in peer-to-peer
learning under the supervision of their instructors.
Price: to charge nominal prices so that all students can have access to
unlimited quality education.
Place: to reach every nook and corner of the country through direct
download, the app store or the homepage of the website.
People: to provide a team that can serve the interests of customers in the
best possible manner, be it customer care, after-sales service or anything else.
Process: to make the platform user-friendly so that students can operate
it without the help of their parents.
• The app and the website will be launched in three phases: the Sabudh release, the Beta
release and the General release. The Sabudh and Beta releases have already been done and
only the General release remains. In the Sabudh release, the platform was made
available to the Sabudh students, and in the Beta release the platform was launched in:
In the General release, it will be made available to anyone who wants to use the
platform.
The launch has been planned in these phases so that the platform is tried and tested at every
phase and, by the time it reaches the public, EduCollab is thoroughly refined.
• It has also been planned to recruit further schools from the advisors’ networks.
REWARD SYSTEM
Our coin system is based on three basic functions: granting coins to a new user, deducting coins
when a question is asked, and adding coins when an answer is approved.
• Coins for answering are allocated only after the question has been marked ‘DONE’ by
the student who asked it.
• A proposal at hand is that these coins could be redeemable, e.g. for buying Starbucks coffee,
the Fortnite game, etc.
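
The three coin rules described above can be sketched as a small in-memory ledger. This is only an illustration under stated assumptions: the coin amounts (`SIGNUP_BONUS`, `ASK_COST`, `ANSWER_REWARD`), the class and method names, and the in-memory storage are hypothetical and are not taken from the EduCollab code base.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical amounts; the report does not state the actual values.
SIGNUP_BONUS = 50
ASK_COST = 5
ANSWER_REWARD = 10


class InsufficientCoins(Exception):
    """Raised when a user tries to ask a question without enough coins."""


@dataclass
class Question:
    asker: str
    done: bool = False
    approved_answerer: Optional[str] = None


class CoinLedger:
    """In-memory sketch of the coin rules: sign-up bonus, cost to ask, reward on approval."""

    def __init__(self):
        self.balances = {}
        self.questions = {}
        self._next_id = 1

    def register(self, user):
        # Rule 1: coins are granted to a new user.
        self.balances[user] = SIGNUP_BONUS

    def ask(self, user):
        # Rule 2: coins are deducted when a question is asked.
        if self.balances[user] < ASK_COST:
            raise InsufficientCoins(user)
        self.balances[user] -= ASK_COST
        question_id = self._next_id
        self._next_id += 1
        self.questions[question_id] = Question(asker=user)
        return question_id

    def approve_answer(self, question_id, answerer):
        # Approval alone does not pay out; the asker must still mark the question DONE.
        self.questions[question_id].approved_answerer = answerer

    def mark_done(self, question_id, user):
        # Rule 3: the reward is paid only once the asker marks the question DONE.
        question = self.questions[question_id]
        if user != question.asker:
            raise PermissionError("only the asker can mark a question DONE")
        if question.done:
            return  # already settled, so the reward is never paid twice
        question.done = True
        if question.approved_answerer is not None:
            self.balances[question.approved_answerer] += ANSWER_REWARD


if __name__ == "__main__":
    ledger = CoinLedger()
    ledger.register("asha")
    ledger.register("ravi")
    qid = ledger.ask("asha")
    ledger.approve_answer(qid, "ravi")
    ledger.mark_done(qid, "asha")
    print(ledger.balances)
```

Paying out only inside `mark_done` mirrors the rule that coins for answering are allocated after the asker closes the question; a redemption step for the proposed rewards is left out because the report presents it only as a proposal.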
I was assigned the work of project management. As mentioned earlier, I had a team of 5 B.
Tech students for this project. As project manager, I reported to and discussed matters with the
Sabudh mentors and the Tatras team working on EduCollab on a daily basis. We worked on
various pieces of the platform, at both the backend and the frontend. The major tasks included
profanity detection on text, profanity detection on videos, and some inputs on the recommender
system.
• Profanity detection on text: this task was taken up to detect the use of cuss words, or any
similar words, in the questions, answers and comments posted by students that could hurt
the feelings of other students or have a negative impact on them, so that appropriate action
can be taken.
• Profanity detection on videos: this was done to detect and take action against the use of
images, signs, words, etc. which could hurt students’ feelings and take the form of
cyberbullying.
• Recommender system: this was worked upon so that the most relevant content is directed
towards each user as per their preferences.
• Comparative analysis of the various e-learning platforms: this covered fields such as
their investors (funding rounds, funding amount, total number of investors, number of
lead investors), the products and services they deal in, their cost and revenue streams, etc.
• In-depth study of Brainly.
• Tracking and listing bugs and recommendations as and when the updated link came.
• Figuring out the basic functionality needed.
• Listing the various ways in which students can be cyberbullied.
• Planning the coin system and creating documentation on it.
• Proposing various layouts for the homepage of the website.
• Creating content for the homepage.
• Suggesting a logo design.
• Communicating the issues and recommendations given by the students after the Sabudh
release of the app.
• Conducting a survey to gauge the students’ perception of EduCollab.
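
The text profanity check listed among the major tasks above can be illustrated with a simple blocklist filter. This is a sketch under assumptions, not the model built for EduCollab: the blocklist contents, the character-substitution table and the function names (`normalize`, `find_profanity`, `moderate`) are hypothetical, and a production system would more likely use a trained classifier.

```python
import re

# Hypothetical blocklist with mild placeholder terms; a real list would be far larger.
BLOCKLIST = {"idiot", "stupid", "loser"}

# Common character substitutions used to disguise words (an illustrative subset).
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(text):
    """Lowercase, undo simple substitutions, and collapse stretched letters."""
    text = text.lower().translate(SUBSTITUTIONS)
    # Collapse runs of three or more identical characters: "stuuupid" becomes "stupid".
    return re.sub(r"(.)\1{2,}", r"\1", text)


def find_profanity(text):
    """Return the sorted list of blocklisted words found in the text."""
    tokens = re.findall(r"[a-z]+", normalize(text))
    return sorted({token for token in tokens if token in BLOCKLIST})


def moderate(text):
    """Decide whether a question, answer or comment can be posted as-is."""
    hits = find_profanity(text)
    return {"allowed": not hits, "flagged_terms": hits}


if __name__ == "__main__":
    for comment in ["Great explanation, thanks!", "You are such an 1d10t"]:
        print(comment, "->", moderate(comment))
```

A flagged post would then be held back or sent for review so that "appropriate action can be taken"; the form that action takes is not specified in this section.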
For communication with the Tatras team, I was made a member of their Slack
workspace and of Jira, the Scrum board used for project management, where I posted epics and stories for
the tasks. I also used a Trello board for managing the project with my B. Tech team.
CURRENT STATUS
It has been released at Sabudh, where students are using it every day to go through the daily
lecture sessions and to clear each other’s doubts by asking and answering questions.
It has become all the more useful since students can carry on their regular studies during
lockdown.
Students also report any bugs they come across and any recommendations they have for
feature enhancements. All these reports were tracked by me and forwarded to the Tatras
team.
Soon there will be a beta release of the app, for further testing before the general release.
RATIONALE
Today, there is an enormous number of e-learning platforms in India itself, providing a wide
variety of learning experiences and products, ranging from personalised learning, coursework,
practice papers, exam preparation and career counselling to parent-connect facilities. The
list is endless, and so is its evolution and competition. Despite this, there seems to be a gap in
the education technology market, because there is still enough room for innovation
and advancement.
It has been observed that students who teach in peer-to-peer sessions in school
learn better themselves, as teaching helps to fill the gaps in their own understanding of the topic. Quora is widely
used by adults for peer-to-peer learning, but it is an open system and hence an unsafe environment
for students.
Unfortunately, the advancement in technology has exposed people to serious risk of abuse
online. Not everyone on the Internet is interested in participating nicely; some perceive it as
an avenue to vent their rage, insecurity and prejudices. As a result, cases of cyberbullying
are an area of concern. Each case of cyberbullying has the potential to cause damage
and their severity and impact are highly dependent upon the vulnerability of individual
victims. Bullying, no matter whether it is traditional bullying or cyberbullying, causes
significant emotional and psychological distress, leading to anxiety, fear, depression, and low
self-esteem.
As many as 37% of parents in India report that their children are subject to cyberbullying
(Source). Just like any other victims of bullying, cyberbullied children experience anxiety,
fear, depression and low self-esteem, leading to significant emotional and psychological
distress.
Taking a cue from this, a need was felt to create a platform which not only promotes
peer to peer learning but also protects the students from the crimes/signs of
cyberbullying and inappropriate content. The project EDUCOLLAB was initiated with
this objective.
The need for a safe platform has increased all the more amidst the pandemic, due to
the fact that the entire world has resorted to online channels, whether for education or
work.
OBJECTIVES:
The main purpose of the present study is to reduce the gap in the education technology
market (in India). The objectives to be achieved are:
1. To develop a model to protect the students from cyberbullying and inappropriate content.
2. To create a Recommender System that provides the students with the questions and
answers of their preference.
3. To understand the perception of the students towards EduCollab after its Beta release.
1. Finding out the right source of data for both profanity on text and images.
3. Checking whether the data we collected for images was biased or not.
LITERATURE SUPPORT
Review of Literature is the backbone of every research study. It is important to review the
literature to have an overview of what kinds of studies have been conducted and what are the
gaps in the literature. Here, an attempt has been made to present an evaluative
report of studies found in the literature related to the researcher’s selected area, the evolution and
growth of e-learning and the upcoming trends therein. This review will be helpful in
identifying the research gaps, underlining the need for the present study.
To develop a model to protect the students from cyberbullying and inappropriate content.
To create a Recommender System that provides the students with the questions and
answers of their preference.
To understand the perception of the students towards EduCollab.
O'Donnell & King (1999) claimed that peer-learning strategies are valuable tools.
They argued that the outcomes of peer learning ultimately depended on learning design
strategy, course outcomes or objectives, teachers’ facilitating skills, and the commitment of
students and teachers. Importantly, the teacher must consciously orchestrate the learning
activity and choose the appropriate method for undertaking peer learning. Only then will
students in fact engage in peer learning and reap the benefits.
Söderlund (2000) pointed out that an important supporting structure for learners is the
social interaction with other learners, in which they are able to form and give expression to
their thoughts, exchange ideas and share these with others, and jointly reflect on various
phenomena. This in turn establishes a ground for processes within the individual learner, and
deepens the understanding of the learning process. In their learning processes, learners use
different resources that are only partly created or offered by the teacher. Learners also use
resources available in their close environment, at work or at home. To this one can add the
ever-increasing use of computers and different communication technologies as yet other
learning resources.
Boud (2001) claimed that students in a peer learning situation construct their own
meaning and understanding of what they need to learn. Essentially, students would be
involved in searching for, collecting, analysing, evaluating, integrating and applying
information to complete an assignment or solve a problem. They thus engage
intellectually, emotionally and socially in “constructive conversation”, learning by talking,
questioning each other’s views and reaching consensus or dissent.
Reed et al. (2001) conducted research focused on the frequency and scheduling of
rewards and their relation to motivation and results. Rewards may be given in a continuous
fashion (on a pre-determined schedule) or on a variable schedule (Skinner, 1938). The
frequency with which rewards are given has been found in a number of studies to affect the
probability that desired behaviour would be repeated.
Keller Christina & Cernerud Lars (2002) conducted a study with students at
Jönköping University in Sweden as an example. The students had experience of two
years of e-learning on campus. Students (n = 150) filled in a questionnaire with closed as well
as open-ended questions. The answers were analysed in a multiple regression analysis,
relating the students’ perceptions to gender, age, previous knowledge of computers,
attitudes to new technology, learning styles and the way of implementing e-learning at the
university. Advantages and disadvantages of e-learning were categorized in a qualitative
content analysis. The main conclusion from the study was that the strategy of implementing
the e-learning system at the university was more important in influencing students’
perceptions than the individual background variables. Students did not regard access to
e-learning on campus as a benefit. Male students, students with previous knowledge of
computers and students with positive attitudes to new technologies were all less positive towards
e-learning on campus than other students. Another aspect that must be considered is that of
gender, which is of great importance especially when attracting students to a university.
Johansson (2003) stressed that the conditions for socialisation and learning change
with the introduction of new media technology. Learning and education can also be affected
as students and teachers become dependent on the technology.
They found the difference in presentation types as well as concentration to have a significant
impact on usage intentions.
Landry et al. (2006) made use of TAM (Technology Acceptance Model) to measure
students' acceptance of web-based e-learning tools. In both studies TAM is found to perform
well, with the main hypotheses being supported and a little less than 40% of the total variance
in usage intentions explained. The relationship between university students'
perceptions of ease of use and usage of Blackboard elements was fully supported but varied
at different levels. As originally hypothesized by Davis (1989), the findings of Landry et al. (2006)
suggest that if students perceive Blackboard to be easy to use, they would also
perceive Blackboard to be useful. Usefulness turned out to be the strongest determinant of
usage intentions.
Robert Agnew (2006) in his “General Strain Theory,” hypothesized that the strain and
stress exerted on an individual as a result of bullying “can manifest itself in problematic
emotions that lead to deviant behavior,” possibly leading to delinquency. This theory stresses
the vicious cycle that many teens may go through while being victimized. The cyclical
repercussions of this process are particularly alarming if it leads a victim to antisocial
behaviors when they try to find an outlet for their emotions.
Manochehr Nick (2007) attempted to compare the effects of e-learning versus those of
traditional instructor-based learning, on student learning, based on student learning styles.
Another goal was to determine if e-learning is more effective for those with a particular
learning style. They examined the dependent variable of student knowledge based on the
learning style of each subject and the learning method to which each was exposed. The
results revealed that for the instructor-based learning class (traditional), the learning style was
irrelevant, but for the web-based learning class (e-learning), the learning style was
significantly important. The results of this research paper revealed that students’ learning styles were
statistically significant for knowledge performance.
Lim et al. (2008) used questionnaires adapted from the research instruments used by
Poon, Low, and Yong (2004) to measure distance learners’ acceptance of e-learning. They
measured learners’ acceptance by students’ characteristics, instructors’ characteristics,
technology support and system, institutional support, course content and knowledge
management, and online tasks and discussion groups. They highlighted that well-designed
course content provided students with better learning experiences and helped students with
easy access to information. In their study, the results indicated that students had a moderate level
of e-learning acceptance for the factor of technology and system. It was also stated that
an e-learning system or a web page with a harmonious configuration of colour and background
enhanced students’ interest in studying. An attractive combination of colours with appropriate
graphics and animations on websites was useful in delivering information in a user-friendly
way.
Roberts (2008) stated that peer learning can lead to development of self-directed
learning skills; critical and creative thinking and problem-solving skills; communication,
interpersonal and teamwork skills; learning through self, peer assessment and critical
reflection; and increased understanding of concepts, skills and enhancing self-image.
Banerjee, J. & Bose, I. (2011) conducted a study entitled “Higher Education
Through Mobile Learning: An Analysis of Students From Kolkata”. The objectives of the
study were to find the percentage of respondents who were interested in the M-learning mode of
management education and the reasons for preferring mobile-based education to the
traditional method. It was found that 80% of the respondents were aware of the M-learning
platform and that 78% of the respondents were willing to opt for M-learning courses. Also,
56% of the respondents were willing to take management courses in M-learning mode, if
offered. The findings revealed that the awareness level regarding M-learning and the number of
people willing to take courses through M-learning mode were quite high. They also indicated
a tendency towards high deviation in the choice of course through M-learning, and that the mean
preference for management courses offered through M-learning was above average.
Kakoty Sangeeta et al. (2011) analysed current e-learning procedures and the recent
e-learning market. This study shows that globalization of education, cross-cultural aspects
and culturally complex student support systems in distance education as well as in e-learning
environments are a prospective research area. Improvements in these areas could be made by
integrating new technologies and ICT tools. The ELAM (E-learning Acceptance Model)
identifies four determinants of e-learning acceptance: (1) performance expectancy, (2)
effort expectancy, (3) social influence and (4) facilitating conditions. The main contribution
of the paper is that it presents a framework to understand e-learning acceptance as governed
by teacher, student and institutional factors.
Rueckert, D., Kim, J.D. & Seo, D. (2013) conducted a study entitled
“Students’ Perceptions and Experiences of Mobile Learning” with the objective of finding out
how students perceive the use of mobile devices to create a personalized learning
experience outside the classroom. The findings of this study suggested that mobile
technologies have the potential to provide new learning experiences. The fact that the
students’ TACI scores dropped significantly after participating in these activities indicates
that the use of mobile technologies in these classes opens up new avenues for interaction and
learning. The participants became more willing to adopt new technologies into their own
lives, which revolve around teaching English as a profession. The t-test results indicated
statistically significant changes in their views towards mobile technology.
Konwar (2017) found that college students have positive attitudes towards e-learning
and that there is no significant difference in attitude towards e-learning between male and female
students on the basis of their locality.
CHAPTER III
RESEARCH METHODOLOGY
The main purpose of the present study is to reduce the gap in the education technology
market (in India).
Based on the conceptual and theoretical framework, this chapter focuses on describing the
database, data collection, data pre-processing, modelling and the detailed elucidation of the
Statistical and Data Science tools used in the present study. The research framework to
achieve the objectives using different research techniques and approaches has been
discussed below.
This is the most important objective of the present study, and catering to it gives an edge to
our platform. With the advancement of technology, cases of cyberbullying have been on
the rise. As mentioned earlier, cyberbullying causes significant emotional and psychological
distress among students; hence, students should be provided with a platform where
appropriate measures of profanity detection are taken. Profanity here means the use of any
language, act or anything else that can hurt the sentiments of the students.
To fulfil this objective, our platform uses Artificial Intelligence (AI). The profanity detection
task is divided into two sub-tasks: profanity detection on text and profanity detection on
videos.
FIGURE 3.1
Profanity Detection
(Sub-tasks: Text, Videos)
This task was taken up to detect the use of cuss words, or any words that could hurt the
feelings of other students or have a negative impact on them, in the questions,
answers and comments posted by students, so that appropriate action can be taken. The
appropriate action consists of not showing the offending word or sentence on the screen and
warning the student. If the student ignores the warning and
continues posting such words, his or her coins are reduced and, finally, he or she is banned
from using the account for a certain time. This is where toxic text classification comes to the rescue.
FIGURE 3.2
Process
(Non-toxic comment. Toxic comment: will not be displayed on the screen; students using such
words would be warned, and if the action persists they can be banned from using their account.)
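The escalation described above (hide the comment, warn, reduce coins, then ban) can be sketched as follows. This is only a minimal illustration: the thresholds, the coin penalty and the names `Student` and `handle_comment` are hypothetical and are not taken from the EduCollab code, and `is_toxic` stands in for the output of the text classifier.

```python
from dataclasses import dataclass

# Hypothetical policy values; the report does not state the real ones.
WARNINGS_BEFORE_PENALTY = 1
COIN_PENALTY = 10
STRIKES_BEFORE_BAN = 3


@dataclass
class Student:
    name: str
    coins: int = 100
    strikes: int = 0
    banned: bool = False


def handle_comment(student: Student, is_toxic: bool) -> str:
    """Return the action taken for one comment posted by `student`."""
    if student.banned:
        return "rejected"
    if not is_toxic:
        return "displayed"
    # A toxic comment is never shown on the screen; it only adds a strike.
    student.strikes += 1
    if student.strikes >= STRIKES_BEFORE_BAN:
        student.banned = True
        return "banned"
    if student.strikes > WARNINGS_BEFORE_PENALTY:
        student.coins = max(0, student.coins - COIN_PENALTY)
        return "coins_reduced"
    return "warned"


student = Student("demo")
actions = [handle_comment(student, toxic) for toxic in (False, True, True, True, False)]
print(actions)
```

Keeping the policy values as named constants makes it easy to tune how quickly a warning turns into a penalty without touching the classifier.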
LANGUAGE USED
Python is used for developing desktop GUI applications, websites and web applications. It is open
source, which means it is free to use, even for commercial applications, and it is considered a
scripting language. For our study, Python 3 has been used.
TECHNIQUE USED
The objective of profanity detection on text can be fulfilled only when the machine is
able to derive the meaning underlying the words as well as the sentiments of the students writing
such words or sentences; hence, the technique of NLP has been used.
Natural Language Processing or NLP is a field of Artificial Intelligence that gives the
machines the ability to read, understand and derive meaning from human languages. By
utilizing NLP, developers can organize and structure knowledge to perform tasks such as
automatic summarization, translation, named entity recognition, relationship
extraction, sentiment analysis, speech recognition, and topic segmentation.
Data generated from conversations, declarations or even tweets are examples of unstructured
data. Unstructured data doesn’t fit neatly into the traditional row and column structure of
relational databases, and represents the vast majority of data available in the actual world. It is
messy and hard to manipulate. Nowadays it is no longer about trying to interpret a text or
speech based on its keywords (the old-fashioned mechanical way), but about understanding
the meaning behind those words (the cognitive way). This way it is possible to detect figures
of speech like irony, or even perform sentiment analysis (Lopez Diego, 2019).
The various use cases of NLP are: recognition and prediction of disease, sentiment analysis,
financial trading, identifying fake news, talent recruitment, etc.
In the present task NLP has been used to distinguish between positive and negative words
and sentences as well as the context in which they have been used. It also helps in finding out
the sentiments underlying those words and sentences.
TEXT CLASSIFICATION
Text classification (also known as text categorization or text tagging) is one of the
fundamental tasks in Natural Language Processing (NLP) with broad applications such as
sentiment analysis, topic labelling, spam detection, and intent detection. It is the process of
assigning tags or categories to text according to its content. Text classifiers can be used to
organize, structure, and categorize pretty much anything.
LIBRARIES USED
FIGURE 3.3
LIBRARIES USED
NumPy: NumPy is a module for Python. The name is an acronym for "Numeric Python" or
"Numerical Python". It has been used for scientific computing and for performing different
operations. NumPy enriches the programming language Python with powerful data structures,
implementing multi-dimensional arrays and matrices.
Pandas: The “Python Data Analysis Library” is a fast, powerful, flexible and easy-to-use open
source data analysis and manipulation tool, built on top of the Python programming language.
It has been used to hold the data for machine learning in the form of data frames and to import
data from various file formats such as CSV, Excel, etc.
Pickle: Python pickle module is used for serializing and de-serializing a Python object
structure. This process is also called marshalling or flattening. Objects are pickled so that
they can be saved on disk, and loaded in a program again later on.
Sklearn: The sklearn library contains a lot of efficient tools for machine learning and
statistical modelling including classification, regression, clustering and dimensionality
reduction. Therefore, sklearn has been used to build machine learning models.
NLTK: The Natural Language Toolkit (NLTK) is one of the most powerful NLP libraries. It
contains packages that help machines understand human language and reply to it with
an appropriate response. It is one of the most widely used NLP libraries. It
contains text processing libraries and hence has been used to perform tokenization, parsing,
classification, stemming, tagging and semantic reasoning. NLTK was preferred over spaCy
and TextBlob, which are other NLP libraries, since it showed better results.
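As a small illustration of the pickling step described in the list above, the sketch below saves a Python object to disk and loads it back. The dictionary `model_artifacts` and the file name are placeholders assumed for this example; in the project the pickled object would be the trained model.

```python
import os
import pickle
import tempfile

# Placeholder object standing in for a trained model (assumption for illustration).
model_artifacts = {"labels": ["non-toxic", "toxic"], "threshold": 0.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Serialize (pickle) the object to disk.
    with open(path, "wb") as fh:
        pickle.dump(model_artifacts, fh)

    # De-serialize (unpickle) it again, as a later program run would do.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == model_artifacts)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.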
This task was catered to in five phases: data collection, data pre-processing, feature extraction,
modelling and evaluation.
FIGURE 3.4
PHASES
1. DATA COLLECTION:
Data collection is the first phase of the profanity task. Here, the data is in the form of comments
and has been collected from various sources like Google, Kaggle competitions, etc. It has
been collected manually and also through web scraping. It is stored in CSV (comma-separated
values) format. There are 1,58,000 comments in total, comprising both toxic and
non-toxic comments. At the beginning there were 16,000 toxic comments and
1,42,000 non-toxic comments; these numbers were later changed to address data imbalance.
The toxic comments could be placed under any one or more of the six labels: hate, insult,
obscene, threat, severe toxic and toxic.
FIGURE 3.5
Source: Kaggle
We wanted to focus more on detecting toxicity than on identifying the sub-class of
toxicity. Therefore, we converted our problem into binary classification by merging all the
comments which had at least 1 of the 6 tags into the toxic class, while comments which had no tags
were added to the non-toxic class. After this, the non-toxic class had almost 1.5 lakh comments
and the toxic class had 16 thousand comments.
In other words, a multi-label classification problem was converted into a binary
classification problem by merging all the 6 labels into 1, in an attempt to reduce data
imbalance. The two classes that emerge are therefore toxic and non-toxic.
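The merging of the six labels into one binary label can be sketched as below. The column names and the three toy rows are assumptions made for illustration (the comment texts are placeholders); the actual data was read from CSV files.

```python
# Assumed column names for the six labels mentioned in the text.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "hate"]

# Toy rows invented for illustration only.
rows = [
    {"comment": "<helpful comment>", "toxic": 0, "severe_toxic": 0,
     "obscene": 0, "threat": 0, "insult": 0, "hate": 0},
    {"comment": "<insulting comment>", "toxic": 1, "severe_toxic": 0,
     "obscene": 0, "threat": 0, "insult": 1, "hate": 0},
    {"comment": "<threatening comment>", "toxic": 0, "severe_toxic": 0,
     "obscene": 0, "threat": 1, "insult": 0, "hate": 0},
]


def to_binary(row):
    """Return 1 (toxic) if at least one of the six labels is set, else 0."""
    return int(any(row[label] for label in LABELS))


binary_labels = [to_binary(row) for row in rows]
class_counts = {"non-toxic": binary_labels.count(0), "toxic": binary_labels.count(1)}
print(binary_labels, class_counts)
```

Printing the class counts after the merge is a quick way to see how imbalanced the two resulting classes are.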
FIGURE 3.6
DEPICTION OF DATA
Source: Kaggle
2. DATA PRE-PROCESSING:
In this phase the data is cleaned so that the actual target is easily achieved. Data
pre-processing is a data mining technique which is used to transform raw data into a useful and
efficient format. Raw (real-world) data is often incomplete and cannot be sent
through a model as it is, since that would cause errors. That is why we need to pre-process
the data before sending it through a model.
In any Machine Learning process, Data Pre-processing is that step in which the data gets
transformed, or Encoded, to bring it to such a state that now the machine can easily parse it.
In other words, the features of the data can now be easily interpreted by the algorithm.
1. Data Cleaning: Data is cleansed through processes such as filling in missing values,
smoothing the noisy data, or resolving the inconsistencies in the data.
2. Data Integration: Data with different representations are put together and conflicts
within the data are resolved.
4. Data Reduction: This step aims to present a reduced representation of the data in a
data warehouse.
Data pre-processing varies according to the data at our disposal and need of the study. And
so, for pre-processing of the comments data that we had, we only had to perform the
following:
3. FEATURE EXTRACTION:
When the input data to an algorithm is too large to be processed and is suspected to be
redundant, it can be transformed into a reduced set of features (also named a feature
vector). Feature extraction is a process of dimensionality reduction by which an initial set of
raw data is reduced to more manageable groups for processing. This new reduced set of
features should then be able to summarize most of the information contained in the original
set of features. For the purpose of my study I have used the TF-IDF technique of feature
extraction.
DF measures how widely a term occurs across the whole corpus, and is very similar to TF.
The only difference is that TF is a frequency counter for a term t in a document d, whereas DF is
the count of occurrences of term t in the document set N. In other words, DF is the number of
documents in which the word is present.
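To make the relation between TF, DF and the final TF-IDF weight concrete, here is a minimal pure-Python sketch. It assumes the common textbook weighting tf(t, d) × log(N / df(t)); libraries may use smoothed variants, and the three toy documents are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus invented for illustration.
docs = [
    "you are a great teacher",
    "you are a fool",
    "great answer thank you",
]
tokenized = [doc.split() for doc in docs]
N = len(tokenized)

# DF: the number of documents in which each term is present.
df = Counter(term for doc in tokenized for term in set(doc))


def tf_idf(doc):
    """Weight each term by tf(t, d) * log(N / df(t)) (assumed textbook variant)."""
    counts = Counter(doc)
    return {term: (count / len(doc)) * math.log(N / df[term])
            for term, count in counts.items()}


weights = [tf_idf(doc) for doc in tokenized]
print(df["you"], df["fool"])
print(weights[1])
```

A word such as "you", which is present in every document, gets a weight of zero, while a word that occurs in only one document gets the highest weight there.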
4. MODELING:
Data modeling is an essential part of the data science pipeline. It is the one that often receives
the most attention among data science learners. A big part of data science modeling involves
evaluating a model, that is, making sure that it is robust and therefore reliable. Also, it is
closely linked to creating an information rich feature set. Moreover, it entails a variety of
other processes that ensure that the data at hand is harnessed as much as possible.
Classification is one of several
types of predictive analytics models. It puts data in categories based on what it learns from
historical data. Classification, of which the decision tree is one well-known form, is one of several
methods intended to make the analysis of very large datasets effective. Two major
classification techniques that stand out are Logistic Regression and Discriminant Analysis.
Logistic regression sounds similar to linear regression but is actually focused on problems
involving categorization instead of quantitative forecasting. In other words, the goal of
logistic regression is to categorize whether an instance of an input variable fits within a
category or not. The output of logistic regression is a probability between 0 and 1, which is
then mapped to a class. Hence, the predicted classes are discrete and finite rather than
continuous and unbounded, as in the case of linear regression.
Results closer to 1 indicate that the input variable more clearly fits within the category.
Results closer to 0 indicate that the input variable likely does not fit within the category.
Therefore, logistic regression is often used to answer clearly defined yes or no questions, for
example, whether a customer will buy again. To meet our objective, we have used Logistic
Regression to categorize whether a comment fits within the toxic or the non-toxic
category.
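The way logistic regression turns a weighted sum of features into a value between 0 and 1, and then into a class, can be sketched as follows. The weights, bias and two-feature input are hypothetical values chosen for illustration, not the parameters learned in this study.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability that the comment belongs to the toxic class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(features, weights, bias, threshold=0.5):
    """Results closer to 1 are labelled toxic, results closer to 0 non-toxic."""
    probability = predict_proba(features, weights, bias)
    return "toxic" if probability >= threshold else "non-toxic"


# Hypothetical model: feature 0 = weight of an abusive term, feature 1 = weight of a polite term.
weights, bias = [2.5, -1.0], -1.0

abusive = [1.0, 0.0]
polite = [0.0, 1.0]
print(round(predict_proba(abusive, weights, bias), 3), classify(abusive, weights, bias))
print(round(predict_proba(polite, weights, bias), 3), classify(polite, weights, bias))
```

The 0.5 threshold is only a default; lowering it would flag more comments as toxic at the cost of more false alarms.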
5. EVALUATION:
After data cleaning, pre-processing and wrangling, we feed the data to a
model and get the output as probabilities. We then measure the effectiveness of our
model, because the more effective the model, the better its performance, and that is exactly
what we want. This is where the confusion matrix comes into the limelight. The confusion matrix is a
performance measurement for machine learning classification.
A confusion matrix is a table that is often used to describe the performance or accuracy of a
classification model (or “classifier”) on a set of test data for which the true values are known.
It is also known as an error matrix.
TABLE 3.1
CONFUSION MATRIX
TABLE 3.2
DEFINITION OF TERMS
Term                  Definition
Positive (P)          Observation is positive (for example: is an app)
Negative (N)          Observation is not positive (for example: is not an app)
True Positive (TP)    Observation is positive, and is predicted to be positive
False Negative (FN)   Observation is positive, but is predicted negative
True Negative (TN)    Observation is negative, and is predicted to be negative
False Positive (FP)   Observation is negative, but is predicted positive
However, there are problems with accuracy. It assumes equal costs for both kinds of errors. A
99% accuracy can be excellent, good, mediocre, poor or terrible depending upon the problem.
Recall: Recall can be defined as the ratio of the total number of correctly classified positive
examples to the total number of positive examples. In other words, out of all the
actual positive examples, how many we predicted correctly. It should be as high as possible.
High recall indicates that the class is correctly recognized (a small number of FN).
Precision: To get the value of precision we divide the total number of correctly classified
positive examples by the total number of predicted positive examples. In other words, out of
all the examples we have predicted as positive, how many are actually positive.
High precision indicates that an example labelled as positive is indeed positive (a small number of
FP).
High recall, low precision: This means that most of the positive examples are correctly
recognized (low FN) but there are a lot of false positives.
Low recall, high precision: This shows that we miss a lot of positive examples (high FN) but
those we predict as positive are indeed positive (low FP).
F-measure: Since we have two measures (Precision and Recall), it helps to have a
measurement that represents both of them. We calculate the F-measure using the harmonic mean
in place of the arithmetic mean, as it punishes extreme values
more. The F-measure will always be nearer to the smaller value of Precision or Recall.
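The measures defined above can be computed directly from the four confusion matrix counts. The sketch below uses ten invented labels (1 = toxic, 0 = non-toxic) purely for illustration; they are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f(tp, fp, fn):
    """Precision, recall and their harmonic mean (the F-measure)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f_measure = precision_recall_f(tp, fp, fn)
print(tp, fp, fn, tn)
print(accuracy, round(precision, 3), recall, round(f_measure, 3))
```

In this toy case the accuracy is 0.7 even though half of the toxic comments are missed, which is why recall and precision are reported alongside accuracy on imbalanced data.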
To give an edge to our platform, the profanity detection task is performed not only on text but
also on video content. Since the answers on EduCollab are mostly in the form of screen
recordings, there is a high chance of inappropriate and profane content being used on the platform;
hence, the profanity detection task becomes all the more important. The task is a big one and
requires a lot of research and study, since not much work has been done in this field so far.
To work on this, the task was subdivided into various sub-tasks, the first being
profanity detection on images. During my internship, we were able to work
only on this sub-task.
For profanity detection on images, various image layers were framed to be tackled one by one.
FIGURE 3.7
Nudity
Explicit images of surgery, diseases, or body parts such as graphic photographs of open w
Racism/Meme
Blurriness
Drugs
LANGUAGE USED
PYTHON 3
TECHNIQUE USED
1. Convolution
2. Non-Linearity (ReLU)
LIBRARIES USED
FIGURE 3.8
LIBRARIES USED
os: It is a module in Python that provides functions for interacting with the operating system. It
comes under Python’s standard utility modules. This module provides a portable way of
using operating system dependent functionality. In other words, it allows you to interface
with the underlying operating system that Python is running on, be that Windows, Mac or
Linux.
Matplotlib: Matplotlib is a plotting library used for creating static, animated, and interactive
visualizations in Python. It is an amazing visualization library in Python for 2D plots of
arrays. One of the greatest benefits of visualization is that it allows us visual access to huge
amounts of data in easily digestible visuals. Matplotlib consists of several plots like line, bar,
scatter, histogram etc.
Like the text task, this task was performed in phases, here three: data collection, data
pre-processing and modelling.
FIGURE 3.9
1. DATA COLLECTION: Data was collected in the form of labelled images as per the
image layer selected. A sample of 450 images of each category was taken for each
layer. The dataset consisted of safe-for-work images as well as not-safe-for-work images. The
images were collected manually from Google.
2. DATA PRE-PROCESSING: For pre-processing of the image data, image re-sizing was
done to convert the images into suitable and identical sizes.
3. MODELING: The ResNet50 CNN model has been used for this task. ResNet, short for
Residual Networks, is a classic neural network used as a backbone for many computer vision
tasks.
ResNet-50 is a deep residual network. The “50” refers to the number of layers it has. It is a
type of convolutional neural network, with ResNet most popularly used for image
classification.
Analytics and Artificial Intelligence (AI) are being integrated with digital channels to provide
instant recommendations on products that will best suit the customer. Many organizations
have already taken the lead and set awe-inspiring benchmarks, like recommendations
provided by Netflix and Amazon based on the customer’s viewing and purchase/browsing
history.
Therefore, the objective of creating a recommender system was taken up so as to provide the
students with the learning material, questions and answers that best suit their needs, and also
to direct the questions asked to those students most likely to give the best and most appropriate
answers. Recognising the fact that retention (defined here as a visit after the first visit) is a
huge issue for apps, there is a need to make an impact on the first visit itself. But unlike other
applications, no prior information is available/gathered about the user. All the
recommendations are based on the actions of the user while using the app.
1. Content based Filtering – in this, the user’s consumption is tracked and items similar to
those consumed in the past are recommended.
2. Collaborative Filtering – in this, the consumption of all users is tracked and the items
consumed by users who have a similar consumption pattern to the user of interest are
recommended. There is no need for item descriptions in this case.
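The difference between the two approaches can be sketched on a toy example. The items, tags, user names and the choice of Jaccard similarity are all assumptions made for illustration; they are not the data or the similarity measure used on the platform.

```python
# Toy item descriptions (needed only for content-based filtering).
items = {
    "q1": {"algebra", "equations"},
    "q2": {"algebra", "polynomials"},
    "q3": {"photosynthesis", "plants"},
    "q4": {"cells", "plants"},
}
# Toy consumption history of three hypothetical users.
history = {
    "user_a": {"q1"},
    "user_b": {"q1", "q3"},
    "user_c": {"q3", "q4"},
}


def jaccard(a, b):
    """Overlap between two sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def content_based(user):
    """Recommend the unseen item whose description is most similar to a consumed item."""
    seen = history[user]
    scores = {item: max(jaccard(tags, items[s]) for s in seen)
              for item, tags in items.items() if item not in seen}
    return max(scores, key=scores.get)


def collaborative(user):
    """Recommend the unseen item consumed by the most similar users; no descriptions used."""
    seen = history[user]
    scores = {}
    for other, consumed in history.items():
        if other == user:
            continue
        similarity = jaccard(seen, consumed)
        for item in consumed - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    return max(scores, key=scores.get)


print(content_based("user_a"), collaborative("user_a"))
```

For the same user the two approaches give different answers here: content-based filtering stays within algebra, while collaborative filtering suggests what a user with an overlapping history went on to consume.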
FIGURE 3.10
Recommender system
(Components: screen interface, user, item set, user’s item consumption history)
BUILD STRATEGY
● Reduce bias in data collection (Example Bias: topics that get served often and ranked
higher, have a higher likelihood of being consumed (obtaining a clickthrough))
● Exploit the current user profile but still explore new user interests
To fulfil this objective, the task was divided into two parts called Bots.
BOT 1: This bot selects topics to serve to a user. The inputs to the bot are the corpus of topics and a
user profile, if available. It provides a new user with randomly selected topics so that
his or her screen shows different topics, and his or her actions are then gauged.
BOT 2: Once the user starts consuming topics, he or she leaves behind a clickstream. This bot
extracts user interests from such data, which can then be used for further personalisation of
his or her news feed.
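The interplay of the two bots can be sketched as below. The function names, the topic list, the feed size `k` and the simple "share of clicks" profile are hypothetical choices made for illustration; the actual bots rely on the topic model described later in this chapter.

```python
import random
from collections import Counter

# Hypothetical topic corpus.
TOPICS = ["algebra", "geometry", "physics", "chemistry", "biology", "history"]


def bot1_select_topics(topics, profile=None, k=3, rng=None):
    """Serve k topics: random ones for a new user, otherwise exploit the profile and explore."""
    rng = rng or random.Random()
    if not profile:
        return rng.sample(topics, k)
    # Exploit: the strongest known interests fill all but one slot.
    ranked = sorted(profile, key=profile.get, reverse=True)[: k - 1]
    # Explore: the remaining slot(s) go to topics outside the exploited ones.
    remaining = [topic for topic in topics if topic not in ranked]
    return ranked + rng.sample(remaining, k - len(ranked))


def bot2_extract_interests(clickstream):
    """Turn a clickstream (list of clicked topics) into a profile of click shares."""
    counts = Counter(clickstream)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


rng = random.Random(0)
first_screen = bot1_select_topics(TOPICS, rng=rng)
profile = bot2_extract_interests(["physics", "physics", "algebra", "physics"])
next_screen = bot1_select_topics(TOPICS, profile, rng=rng)
print(first_screen, profile, next_screen)
```

Reserving one slot for a topic outside the current profile follows the build strategy of exploiting the profile while still exploring new interests.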
LANGUAGE USED
PYTHON 3
TECHNIQUE
Dimensionality Reduction
Unsupervised Learning
Tagging
Topic models are very useful for document clustering, organizing large
blocks of textual data, information retrieval from unstructured text and feature selection. For
example, the New York Times uses topic models to boost its user-article recommendation engines.
There are several existing algorithms that can be used to perform topic modelling. The
most common are Latent Semantic Analysis (LSA/LSI), Probabilistic Latent
Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA).
LIBRARIES USED
FIGURE 3.11
LIBRARIES USED
Seaborn: It is a library for making statistical graphics in Python. Built on top of matplotlib
and closely integrated with pandas data structures, it aims to make visualization a central part
of exploring and understanding data. Seaborn provides a high-level interface for drawing
attractive and informative statistical graphics.
String: It’s a built-in module and we have to import it before using any of the constants and
classes. The string module contains a number of useful constants and classes, as well as some
deprecated legacy functions that are also available as methods on strings.
FIGURE 3.12
DATA COLLECTION: The data required for this task was in the form of questions and
answers. Therefore, a corpus of questions and answers was created to train and test the
model. The corpus was compiled from material found through Google and supplemented with
manually added entries. It took around 2 weeks to collect the data.
Punctuation removal.
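As an illustration of this step, the sketch below removes punctuation with the built-in string module described earlier. The function name and the sample sentence are hypothetical, and the report's actual preprocessing code may differ.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def remove_punctuation(text):
    """Return text with ASCII punctuation removed and whitespace tidied."""
    return " ".join(text.translate(_STRIP_PUNCT).split())


print(remove_punctuation("What is a prime number? It's divisible by 1, and itself!"))
```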
MODELLING: The LDA (Latent Dirichlet Allocation) model has been used in this
task to perform the topic modelling.
LDA is a generative probabilistic model that assumes each topic is a mixture over an
underlying set of words, and each document is a mixture over a set of topics.
In other words, it is a topic model that generates topics based on word frequency from a set of
documents. LDA is particularly useful for finding reasonably accurate mixtures of topics
within a given document set.
PARAMETERS OF LDA
Beta is the prior concentration parameter that represents topic-word density:
with a high beta, each topic is assumed to be made up of most of the words in the vocabulary,
which gives a broader, less specific word distribution per topic, whereas a low beta
concentrates each topic on a few words.
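The effect of beta can be checked numerically. The sketch below draws topic-word distributions from a symmetric Dirichlet prior using only the standard library. The vocabulary size, the two beta values and the helper names are assumptions chosen for illustration, and it samples the prior only; it does not fit an LDA model.

```python
import random


def sample_topic(vocab_size, beta, rng):
    """Draw one topic-word distribution from a symmetric Dirichlet(beta)."""
    draws = [rng.gammavariate(beta, 1.0) for _ in range(vocab_size)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_peak(vocab_size, beta, trials=200, seed=0):
    """Average probability of the single most likely word in a topic."""
    rng = random.Random(seed)
    peaks = (max(sample_topic(vocab_size, beta, rng)) for _ in range(trials))
    return sum(peaks) / trials


low = mean_peak(vocab_size=50, beta=0.1)    # mass piles onto a few words
high = mean_peak(vocab_size=50, beta=10.0)  # mass spreads over most words
print(f"beta=0.1 -> mean peak {low:.2f}; beta=10 -> mean peak {high:.2f}")
```

A larger mean peak means the topic is dominated by a few words; a smaller one means the probability is spread across most of the vocabulary.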
The beta launch of the platform took place on April 27, 2020, amid the pandemic.
At that time schools were shut all over the world and learning was taking place
digitally. Hence, the platform needed to meet all the requirements of the
students, and the only way to achieve this was to analyse the perception of the beta users
and make appropriate modifications.
Taking all these points into consideration, a study was conducted to understand the
perception of the students. The research technique and approach
used to achieve this objective are described below:
DATABASE: This objective is based mainly on primary data collected from
103 respondents (Sabudh students and associates). The respondents were surveyed
through a non-disguised structured questionnaire in English.
UNIVERSE OF STUDY: The universe of study is all the school students of age group 12
and above.
SAMPLE AND SAMPLING DESIGN: The present study focuses on the Sabudh students.
The survey method was used for the collection of data. The survey was conducted during
April and May 2020, using the convenience sampling technique. A questionnaire consisting of
42 questions was prepared and shared online with the respondents. Out of 120
questionnaires distributed, 103 usable responses were retained for analysis.
TABLE 3.3
DATA COLLECTION: Both primary and secondary data were collected for the
completion of this objective. The questionnaire was shared with the respondents through
social media platforms such as WhatsApp, Instagram, etc.
The first version of the questionnaire was pre-tested on 20 students. Slight modifications
were then made to the content of the questionnaire before the actual data collection
process took place. The responses collected in the pre-test were not
considered for the analysis of the data. The description of the questionnaire is given in the
following paragraphs:
PART 1
This part of the questionnaire consisted of personal information about the respondents
regarding their name, gender, mobile model and make, android version, connectivity issues
faced by them, their proficiency in using the e-learning platforms, etc.
PART 2
This part of the questionnaire consisted of questions to gauge the perception of the students
about EduCollab. It consisted of questions like whether EduCollab is user friendly,
interactive, attractive, useful to the students, etc. The responses were based on 5-point Likert-
scale, where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree, 5 = strongly disagree.
After collecting the data using questionnaires, it was analysed by transferring it to SPSS
software. The first step in the analysis involved ensuring that all questions were answered and
rectifying any missing or wrong entries. This was done to ensure accuracy of the data and to
make it more appropriate for analysis.
Data collected from the survey was analysed using different statistical, mathematical and data
interpretation techniques. The analysis techniques used in this study are:
Reliability analysis: Reliability analysis is used to assess the reliability
of data in order to obtain high-quality research results. Reliability means that a measure (in
this case the questionnaire) should consistently reflect the construct that it is measuring.
In statistical terms, the usual way to look at reliability is based on the idea that
individual items (or sets of items) should produce results consistent with the overall
questionnaire. In this study, Cronbach’s Alpha, a reliability test, is
computed in SPSS to measure the internal consistency of the items in the
questionnaire. This test is most commonly used when the questionnaire is developed
using multiple Likert-scale statements, to determine whether the scale is
reliable. Cronbach’s Alpha normally ranges between 0
and 1. The closer the coefficient is to 1, the greater the
internal consistency of the items in the scale. As a general rule,
if the value of Cronbach’s Alpha is greater than 0.7 the
questionnaire items are deemed reliable; if it is less than
0.7 they are considered unreliable.
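For readers who want to see what the coefficient measures, the sketch below computes Cronbach’s Alpha from raw scores using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The study itself used SPSS; the function name and the five sample respondents are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical answers from five respondents to three Likert items (1-5).
responses = [
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 4],
    [4, 5, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(responses), 3))
```

When respondents answer all items in a similar way, the variance of the total score is large relative to the item variances and alpha approaches 1.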
George and Mallery (2003) provide the following rules of thumb to analyse reliability of
questionnaire with the help of Cronbach’s Alpha reliability coefficient value:
According to this rule, if the value of Cronbach’s Alpha is less than 0.5 the data is unreliable
and unacceptable for further analysis; if the value lies between 0.5 and 0.7 the analyst can
decide whether to accept the data or to modify it to make it more reliable. If the value is greater
than 0.7 the data is acceptable. A value greater than 0.8 indicates that
the data is good, and a value greater than 0.9 indicates that the data is excellent for
further analysis.
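The bands described above can be written as a small helper. The function name and the label strings are hypothetical; the thresholds follow the wording of this section rather than the original table.

```python
def reliability_label(alpha):
    """Map a Cronbach's alpha value to the bands described above."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    if alpha >= 0.5:
        return "analyst's discretion"
    return "unacceptable"


print(reliability_label(0.85))
```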
Factor Analysis: In this research, factor analysis is used to analyse 21 statements related to
students’ perception towards EduCollab.
Factor Analysis: It is a technique that is used to reduce a large number of variables into
a smaller number of factors. It extracts the maximum common variance from all variables and puts
it into a common score. As an index of all variables, this score can be used for further
analysis. In other words, factor analysis is a way to take a mass of data and shrink it to a
smaller data set that is more manageable and more understandable. It is a way to find hidden
patterns, show how those patterns overlap and show which characteristics are seen in multiple
patterns. It groups variables with similar characteristics together and allocates
correlated statements to a single factor. The reduced factors can also be used for further
analysis.
Metric data: Factor analysis is conducted on metric scale. The term metric scale
summarizes interval scales, ratio scales and absolute scales. In other words, each
question is a statement followed by a Likert scale.
Sample size: The sample size should be adequate to run the factor analysis on the
data. The reliability of factor analysis is dependent on sample size because correlation
coefficients fluctuate much more so in small samples than in large. Much has been
written about the necessary sample size for factor analysis resulting in many ‘rules of
thumb’. The common rule is to suggest that a researcher has at least 10–15
participants per variable.
In SPSS, the KMO (Kaiser-Meyer-Olkin) statistic measures sampling adequacy. The KMO
can be calculated for individual and multiple variables and represents the ratio of the
squared correlations between variables to the sum of the squared correlations and the
squared partial correlations between variables.
The KMO statistic varies between 0 and 1. It is generally agreed that factor
analysis is inappropriate when the KMO value is below 0.50. Kaiser (1974) recommends
accepting values greater than 0.5 as barely acceptable (values below this should lead the
researcher either to collect more data or to reconsider which variables to include).
Furthermore, KMO values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are
good, values between 0.8 and
0.9 are great and values above 0.9 are superb (Hutcheson & Sofroniou, 1999).
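For reference, the overall KMO statistic is usually written as follows, where r_ij is the correlation between variables i and j and p_ij is their partial correlation (the symbols are introduced here for illustration and do not appear in the SPSS output):

```latex
\mathrm{KMO} \;=\; \frac{\sum_{i \neq j} r_{ij}^{2}}
                        {\sum_{i \neq j} r_{ij}^{2} \;+\; \sum_{i \neq j} p_{ij}^{2}}
```

When the partial correlations are small relative to the ordinary correlations, the ratio approaches 1, which indicates that the variables share enough common variance for factor analysis.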
Correlations between variables: SPSS will always find a factor solution to a set of
variables. However, the solution is unlikely to have any real meaning if the variables
analysed are not sensible. So before conducting a factor analysis, it is important to
check the inter-correlation between variables.
In factor analysis, Bartlett’s test is used to check that the original variables are sufficiently
correlated. It tests the null hypothesis that there is no correlation between the variables, i.e. that the
correlation matrix is an identity matrix. An identity matrix is a matrix in which all of the
diagonal elements are 1 and all off-diagonal elements are 0. The test should come out
significant (p < 0.05), rejecting this null hypothesis, which means that there is correlation
between the variables and the correlation matrix is not an identity matrix. If the significance value is
not less than 0.05, then factor analysis is not appropriate.
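The statistic behind Bartlett’s test can be sketched in a few lines. The version below uses the standard formula chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix, p the number of variables and n the number of observations. The function names and the 2 x 2 example matrix are hypothetical; SPSS additionally reports the p-value, which is not computed here.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2


# Hypothetical correlation matrix for two strongly correlated items.
corr = [[1.0, 0.8], [0.8, 1.0]]
chi2, df = bartlett_sphericity(corr, n_obs=103)
print(round(chi2, 2), df)
```

For an identity matrix the determinant is 1, its logarithm is 0 and the statistic is 0, which is why a large statistic is evidence against the null hypothesis.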
CHAPTER IV
ANALYSIS AND RESULT REPORTING
FIGURE 4.1
Data has been collected in the form of comments (1.58 lakh comments). For the purpose
of our study, these comments are classified into two categories: toxic and non-toxic.
The data contains 16,000 toxic comments and 1.42 lakh non-toxic comments.
The data imbalance was ignored in the beginning.
If a comment has any one or more of the six tags (hate, insult, obscene, severe-toxic,
threat and toxic), it is categorised as toxic.
If a comment doesn’t have any one of the six tags (hate, insult, obscene, severe-toxic,
threat and toxic), it is categorised as non-toxic.
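The labelling rule above can be sketched as follows. The tag names are taken from the text; the row layout (one 0/1 column per tag) and the sample rows are assumptions for illustration.

```python
TAGS = ("hate", "insult", "obscene", "severe-toxic", "threat", "toxic")


def is_toxic(row):
    """True when at least one of the six tags is set on the comment."""
    return any(row.get(tag, 0) for tag in TAGS)


# Hypothetical rows: each tag column holds 0 or 1.
rows = [
    {"comment": "thanks, that helped", "toxic": 0, "insult": 0},
    {"comment": "(an abusive remark)", "toxic": 1, "insult": 1},
]
labels = ["toxic" if is_toxic(r) else "non-toxic" for r in rows]
print(labels)
```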
For modelling:
Using TF-IDF, we converted the text into vectors.
Then scikit-learn’s built-in Logistic Regression model
was used.
Then, the predict function was defined. Using the predict function, we send the vectors
one by one and it tells whether each belongs to the toxic or the non-toxic category.
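To show what the TF-IDF step does, the sketch below builds the vectors by hand with the standard library. It is only an illustration: the study used scikit-learn, whose output differs in details such as normalisation, and the function name, the smoothed IDF variant and the two sample comments are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised, non-empty documents into TF-IDF vectors."""
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    # Smoothed inverse document frequency, so no weight is ever zero.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, vectors


# Hypothetical tokenised comments.
docs = [["you", "are", "kind"], ["you", "are", "stupid"]]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print([round(x, 3) for x in vectors[0]])
```

Words that occur in every comment receive the lowest weight, while words that distinguish one comment from the others receive a higher weight, which is what makes the vectors useful as classifier input.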
From the above confusion matrix, we can clearly see that a lot of Toxic comments
are being classified as Non-Toxic. One of the reasons for this could be that our
dataset has more than 1.4 lakh samples of Non-toxic Class and only 16K samples of
Toxic Class.
Therefore, we tried to improve our results by decreasing the samples of Non-Toxic
Class from 1.42 lakh to 20,000. These are selected randomly and the model is run
again. The result came out to be:
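The random reduction of the non-toxic class described above is a form of undersampling. A minimal sketch follows; the function name, the fixed seed and the toy data are assumptions, and the sizes are scaled down from the real dataset.

```python
import random


def undersample(samples, majority_label, keep, seed=42):
    """Keep every minority sample and a random subset of the majority class."""
    majority = [s for s in samples if s[1] == majority_label]
    minority = [s for s in samples if s[1] != majority_label]
    rng = random.Random(seed)
    kept = rng.sample(majority, min(keep, len(majority)))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Toy data: 100 non-toxic and 16 toxic (comment, label) pairs.
samples = ([("c%d" % i, "non-toxic") for i in range(100)]
           + [("t%d" % i, "toxic") for i in range(16)])
balanced = undersample(samples, "non-toxic", keep=20)
print(len(balanced))
```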
Then we further decreased the sample size of Non-Toxic Class to 17,000 random comments. And ran
Our score has improved by 1% but chances of our model classifying a non-toxic comment as tox
CALCULATIONS
In this study, toxic comments are taken as positive whereas the non-toxic comments are taken
as negative.
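With toxic as the positive class, the usual measures follow directly from the four cells of the confusion matrix. The sketch below uses hypothetical counts, not the values from this study, and the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 with toxic as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: tp = toxic predicted toxic, fn = toxic predicted non-toxic, etc.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print({name: round(value, 3) for name, value in metrics.items()})
```

Recall is the measure that drops when many toxic comments are classified as non-toxic, which is the problem the class imbalance caused.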
FIGURE 4.2
TABLE 4.1
RESULT: OBJECTIVE 1
FIGURE 4.3
We can clearly see that whenever a comment is put into the system, predict function
tells us whether the comment is toxic or not.
DEPLOYMENT
RESPONSE
For building image profanity detection, we used deep convolutional neural networks (CNNs). For
this purpose, we provided labelled profane images to our CNN model and trained the
model. After training we got a test accuracy of 84 percent. The steps are as follows:
FIGURE 4.4
ed Images from classes such as Nude Images, Weapon Images, Images of Alcohol and other alcoholic beverages, etc and Im
Libraries were imported and data were read into the system.
Collected Pre-trained CNN Model weights, to be used for Transfer Learning technique.
Removed the last layer of the Pre-Trained Model (resnet50) that was
being used.
RESULT: Whenever an image is put into the system, it will identify whether the
image is safe for work or not.
FUTURE SCOPE OF STUDY: Other sub-tasks under profanity detection on videos like
profanity detection on speech, etc shall be taken up.
OBJECTIVE 2
FIGURE 4.5
RESULT: 10 Random topics were being suggested to the new user. Hence, our BOT 1
was ready.
FIGURE 4.6
Source: https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0
OBJECTIVE 3
GENERAL ANALYSIS OF DEMOGRAPHIC STATUS: In order to know about the
perception of students towards EduCollab, a sample of 103 respondents was taken. The
respondents are segmented by gender, model and make of their phones, proficiency in using
e-learning platforms, etc. This section involves the analysis of all these demographic
segments in the form of tables and figures.
GENDER
TABLE 4.2
Gender      Number of respondents    Percentage (%)
Female      57                       55
Male        46                       45
Interpretation: Table shows that out of 103 respondents, 57 are females and 46 are males.
FIGURE 4.7: GENDER (pie chart: Female 55%, Male 45%)
Interpretation: Figure represents that 55% are females and 45% respondents are males.
TABLE 4.3
Model and make            Number of respondents
Samsung                   14
Samsung Galaxy s9         11
Samsung M 20              9
Samsung J6                5
Samsung J6 Plus           2
Samsung A20               2
Xiaomi Mi A1              12
Nokia 6.1+                3
Redmi note 4              5
Redmi note 8              5
Redmi 6A , M1804C3CI      1
Real me xt                2
Real me 2                 1
Mi A3                     2
Google pixel 2            3
Vivo v5                   6
Vivo v9                   1
Lenovo 8K plus            1
Iphone                    14
Iphone xs                 2
Iphone 6 plus             1
FIGURE 4.8 (bar chart; vertical axis from 0 to 16; single data series)
Interpretation: it can clearly be seen that the respondents use a large variety of mobile
models and makes. Samsung constitutes the major portion, followed by iPhones, Xiaomi,
Redmi, Vivo, etc.
            Number of respondents    Percentage (%)
Yes         32                       31
No          71                       69
Total       103                      100
(Pie chart: Yes 31%, No 69%)
Interpretation: the table and the figure show that only 32 respondents out of 103 face
internet connectivity issues. These 32 respondents constitute 31% of the total, while the
majority, 71 respondents (69%), do not face
internet connectivity issues, probably because India is becoming better equipped
technologically.
            Number of respondents    Percentage (%)
Yes         78                       76
No          25                       24
Total       103                      100
(Pie chart: Yes 76%, No 24%)
Interpretation: the majority of respondents, i.e. 78 (76%), have used e-learning platforms before
EduCollab, whereas only 25 respondents (24%) have not used any e-learning
platform before EduCollab. This indicates that many people have already resorted to
e-learning.
            Number of respondents    Percentage (%)
Yes         84                       82
No          19                       18
(Pie chart: Yes 82%, No 18%)
Interpretation: the table and the figure show that out of 103, the majority of the
respondents, i.e. 84, think that they are proficient in using e-learning
platforms. They represent 82% of the total respondents. On the other hand, only 19
respondents out of 103 think they are not proficient in using e-learning platforms. They
represent only 18% of the total respondents. This indicates that most people are
confident in using e-learning platforms.
            Number of respondents    Percentage (%)
Yes         82                       80
No          21                       20
Total       103                      100
(Pie chart: Yes 80%, No 20%)
Interpretation: the table and the figure show that the majority of the respondents, i.e. 82
out of 103, like the concept of peer-to-peer learning while only 21 out of 103 respondents do
not. The Yes category represents 80% and the No
category 20%, which indicates that most people like the concept of
peer-to-peer learning.
            Number of respondents    Percentage (%)
Yes         25                       24
No          78                       76
Total       103                      100
(Pie chart: Yes 24%, No 76%)
Interpretation: from the table and the figure we can see that 25 respondents out
of 103 have encountered cyber-bullying. Although the cases are few in number, they do
exist. 78 out of 103 have not encountered cyber-bullying; they constitute 76% of the total. This
indicates that cases of cyber-bullying do exist.
            Number of respondents    Percentage (%)
Yes         90                       87
No          13                       13
(Pie chart: Yes 87%, No 13%)
            Number of respondents    Percentage (%)
Yes         22                       21
No          81                       79
Interpretation: the table and the figure
show that out of 103 respondents, the majority, i.e. 81, did not encounter any login problem.
Only 22 respondents did, indicating that only a few respondents faced problems while logging
in.
            Number of respondents    Percentage (%)
Yes         34                       33
No          69                       67
Total       103                      100
(Pie chart: Yes 33%, No 67%)
Interpretation: the table and the figure show that out of 103 respondents, the majority, i.e. 69,
did not encounter any bug. Only 34 respondents did. Those who did encounter bugs
probably did so because
it was the first release of the app.
            Number of respondents    Percentage (%)
Yes         69                       67
No          34                       33
Total       103                      100
(Pie chart: Yes 67%, No 33%)
Interpretation: from the above it can be seen that out of 103 respondents, the majority,
constituting 67%, informed the team about the bug. Only 33% of respondents did not.
            Number of respondents    Percentage (%)
Yes         90                       87
No          13                       13
(Pie chart: Yes 87%, No 13%)
Interpretation: it can clearly be seen that most of the respondents, i.e. 90 (87%), think that the
team was efficient in resolving the issues and only 13 (13%) think it was not. This
indicates that the team has been efficient in resolving the issues.
            Number of respondents    Percentage (%)
Yes         97                       94
No          6                        6
(Pie chart: Yes 94%, No 6%)

            Number of respondents    Percentage (%)
Yes         100                      97
No          3                        3
Interpretation: it can be seen that out of 103, all the respondents except for 3 think that they
will keep using the platform.
            Number of respondents    Percentage (%)
Yes         83                       81
No          20                       19
Interpretation: the table and the figure
show that out of 103 respondents, the majority, i.e. 83, think that EduCollab is better than
other e-learning platforms. Only 20 respondents did not think so. These 20 respondents might
prefer other e-learning platforms or might have been using them for years.
            Number of respondents    Percentage (%)
Yes         93                       90
No          10                       10
(Pie chart: Yes 90%, No 10%)

            Number of respondents    Percentage (%)
Yes         59                       57
No          44                       43
(Pie chart: Yes 57%, No 43%)
Interpretation: we can see from the above
Given below are some of the recommendations of the students as future plan of action for
EduCollab:
RELIABILITY ANALYSIS
In this research, Cronbach’s alpha test is used to
assess the internal consistency of the questionnaire, which contains multiple Likert-type
scales and items. The questionnaire contains 21 items
measured on a Likert scale. All these items are answered on a 5-point Likert scale,
where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree and 5 = strongly disagree. To
check reliability, Cronbach's Alpha is run in SPSS on the 21 statements collectively. The
output of the analysis is discussed as follows:
TABLE 4.19
RELIABILITY STATISTICS
Cronbach's Alpha    N of Items
.941                21
Interpretation: Table 4.19, labelled Reliability Statistics, provides the value of
Cronbach’s alpha coefficient. The value of Cronbach’s alpha varies between zero and one.
The closer the value is to one, the greater the internal consistency of the items for the given
sample. N of items is the number of items (statements) that are tested. In this research,
21 items are tested for reliability.
In this study, all 21 items are scaled in the same way on a 5-point Likert scale, so the simple
Cronbach's Alpha is reported as the final result of the reliability analysis.
According to the reliability statistics, Cronbach’s alpha coefficient (α) = .941, which
shows that the questionnaire is very highly reliable. In other words, Cronbach’s alpha is greater
than 0.9 (i.e. 0.941), which indicates excellent reliability. It also indicates a high level of
internal consistency across all items in the questionnaire.
FACTOR ANALYSIS
In this research work, the statements relating to the perception of students towards
EduCollab after its Beta release are analysed with the help of factor analysis in SPSS.
Factor analysis is used to transform a set of variables (21 statements) into a new set of
composite variables (factors) that are uncorrelated with each other.
The final result of the factor analysis on the perception of students is summarized as follows with
the help of the various output tables of the analysis:
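TABLE 4.20
KMO AND BARTLETT'S TEST
Kaiser-Meyer-Olkin Measure of Sampling Adequacy    .908
Bartlett's Test of Sphericity    Sig.              .000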
In the analysis, the Kaiser-Meyer-Olkin test is used to check the assumption of adequate
sample size and Bartlett’s Test of Sphericity is used to check the correlation between
variables. According to the output of the KMO and Bartlett's Test (Table 4.20),
the Kaiser-Meyer-Olkin value is 0.908, which is more than 0.5, so the sample size is adequate for
factor analysis. According to the bands of Hutcheson & Sofroniou (1999), a
KMO value above 0.9 is superb. Bartlett’s Test of
Sphericity is significant if its associated probability is less than 0.05 (i.e. sig. < 0.05). In this
analysis it is reported as 0.000,
i.e. the significance level is small enough to reject the null hypothesis. This means that the
correlation matrix is not an identity matrix, i.e. there is correlation between the statements in
the analysis. Both assumptions of factor analysis are fulfilled. Thus, it is concluded that the data
are fit for factor analysis.
TABLE 4.21
COMMUNALITIES
Initial Extraction
S1 1.000 .784
S2 1.000 .836
S3 1.000 .793
S4 1.000 .751
S5 1.000 .845
S6 1.000 .679
S7 1.000 .749
S8 1.000 .652
S9 1.000 .559
S10 1.000 .750
S11 1.000 .633
S12 1.000 .727
S13 1.000 .828
S14 1.000 .702
S15 1.000 .703
S16 1.000 .485
S17 1.000 .512
S18 1.000 .734
S19 1.000 .710
S20 1.000 .784
S21 1.000 .487
Extraction Method:
Principal Component Analysis.
Interpretation: Table 4.21 shows the communalities of all statements before and after
extraction. In this survey, a principal component analysis (PCA) was conducted on the
21 items with orthogonal rotation (Varimax). Principal component analysis works on the initial
assumption that all variance is common; therefore, before extraction (Initial) all
communalities are 1. The communalities in the column labelled Extraction reflect the
common variance shared by each item (variable). For example, the communality after
extraction for statement 1 is 0.784, which means that 78.4% of the variance associated with
statement 1 is common or shared variance.
TABLE 4.22
TOTAL VARIANCE EXPLAINED
Columns: Component; Initial Eigenvalues (Total, % of Variance, Cumulative %); Extraction Sums of
Squared Loadings (Total, % of Variance, Cumulative %); Rotation Sums of Squared Loadings
(Total, % of Variance, Cumulative %)
1 10.417 49.602 49.602 10.417 49.602 49.602 5.584 26.589 26.589
2 1.939 9.233 58.835 1.939 9.233 58.835 3.796 18.078 44.667
3 1.286 6.125 64.960 1.286 6.125 64.960 2.711 12.910 57.577
4 1.061 5.052 70.012 1.061 5.052 70.012 2.611 12.435 70.012
5 .943 4.489 74.500
6 .683 3.255 77.755
7 .583 2.775 80.530
8 .555 2.644 83.174
9 .519 2.473 85.647
10 .415 1.978 87.625
11 .397 1.891 89.516
12 .360 1.715 91.231
13 .341 1.624 92.855
14 .293 1.396 94.250
15 .277 1.320 95.571
16 .212 1.007 96.578
17 .186 .885 97.463
18 .172 .818 98.281
19 .157 .748 99.029
20 .114 .545 99.574
21 .090 .426 100.000
Extraction Method: Principal Component Analysis.
Interpretation: Table 4.22 shows the total variance explained by the various components
(factors) along with their eigenvalues. By default, SPSS uses Kaiser’s criterion of retaining
factors with eigenvalues greater than 1. This analysis yields four factors with
eigenvalues above one (10.417 for the first, 1.939 for the second, 1.286 for the third and 1.061
for the fourth). The table also expresses each eigenvalue as the percentage of variance explained:
the first factor explains 49.602% of the variance, the second 9.233%, the
third 6.125%, the fourth 5.052%, and so on. The middle part of the table, labelled
Extraction Sums of Squared Loadings, displays the same values as before extraction, except
that the values for the discarded factors are omitted (hence, the table is blank after the fourth
factor). The final part of the table, labelled Rotation Sums of Squared Loadings, displays the
eigenvalues of the factors after rotation. Rotation optimizes the
factor structure and equalizes the
relative importance of the four factors. Before rotation, factor 1 accounted for far more variance
than the remaining three (49.602% compared with 9.233%, 6.125% and 5.052%), but
after rotation it accounts for only 26.589% of the variance (compared with 18.078%, 12.910% and
12.435% for the remaining three factors respectively). Together, the four factors
explain 70.012% of the total variance for the entire set of variables.
So, it can be concluded that from the 21 statements SPSS extracted 4 factors with
eigenvalues greater than one, and that these four factors explain 70.012% of the variance
for all statements in the analysis.
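Kaiser’s criterion and the percentage columns can be reproduced from the initial eigenvalues in Table 4.22. The sketch below is an illustration rather than the SPSS procedure; the function name is hypothetical, and small differences from the table arise because the eigenvalues are rounded to three decimals.

```python
# Initial eigenvalues copied from Table 4.22 (21 components).
eigenvalues = [10.417, 1.939, 1.286, 1.061, 0.943, 0.683, 0.583, 0.555,
               0.519, 0.415, 0.397, 0.360, 0.341, 0.293, 0.277, 0.212,
               0.186, 0.172, 0.157, 0.114, 0.090]


def kaiser_retained(values):
    """Return (eigenvalue, % of variance, cumulative %) for eigenvalues > 1."""
    n_vars = len(values)   # each standardised variable contributes variance 1
    rows, cumulative = [], 0.0
    for value in values:
        if value <= 1:
            continue
        share = 100 * value / n_vars
        cumulative += share
        rows.append((value, round(share, 2), round(cumulative, 2)))
    return rows


for row in kaiser_retained(eigenvalues):
    print(row)
```

Each eigenvalue is divided by the number of variables (21) because, with standardised variables, the total variance equals the number of variables.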
TABLE 4.23
ROTATED COMPONENT MATRIX(a)
VARIABLES    Component
             1        2        3        4
S5           .858
S4           .813
S20          .770
S13          .709
S14          .702
S12          .691
S15          .650
S10                   .700
S7                    .690
S8                    .646
S6                    .628
S11                   .616
S21                   .603
S9                    .570
S2                             .901
S3                             .800
S1                             .719
S18                                     .764
S19                                     .757
S16                                     .650
S17                                     .555
Table 4.24
3 S20 Its coin system promotes user participation and learning. .770
7 S15 You are able to get all your doubts cleared. .650
TABLE 4.25
TABLE 4.26
TABLE 4.27
Interpretation: the above 4 tables show how the statements are divided among the factors. This
step involves naming each factor on the basis of the common meaning
of the statements underlying it.
Factor 1 is named Effective Learning because it involves statements
related to students’ perception of the effective learning provided by EduCollab. Factor 2 is
named Interactive Learning because it involves statements about students’ perception of the
interactive learning provided on EduCollab. Factor 3 is named Attractive Learning, since it
covers the statements related to how attractive the app is. Factor 4 is named Safe Learning
because it involves statements about students’ perception of the safety provided by EduCollab.
CHAPTER V
SUMMARY, CONCLUSION & RECOMM
SUMMARY
E-learning platforms are making a measurable difference to students' engagement and
performance. They are reducing gaps in the delivery of education and giving a new dimension to
the education space.
The E-learning industry in India is a prolific one, witnessing a steady growth rate of 25 per
cent year-on-year and is projected to be a $1.96 billion industry by 2021. With a network of
more than 1.5 million schools and 18,000 higher education institutes, the market for digital
education in India is enormous. Today, digital learning is no longer a luxury but the
implementation of digital tools of learning has become a necessity in schools.
Moreover, schools have been shut all across the world due to COVID-19. Globally,
over 1.2 billion children are out of the classroom. As a result, education has changed
dramatically, with the distinctive rise of e-learning, whereby teaching is undertaken remotely
and on digital platforms.
Today, there is an enormous number of e-learning platforms in India alone, providing a wide
variety of learning experiences and products ranging from personalised learning, course-work,
practice papers, entrance exam preparation and career counselling to parent connect
facilities. The list is long, and so are its evolution and competition. Despite this, there
seems to be a gap in the education technology market, because there is still
enough room for innovation and advancement.
So, a need was felt to create a platform which not only promotes peer-to-peer learning
but also protects the students from cyberbullying and inappropriate
content. The project EDUCOLLAB was initiated with this objective.
The main purpose of the present study is to reduce the gap in the education technology
market (in India). The objectives to be achieved are:
OBJECTIVE 1: The first objective was divided into two parts: profanity detection on text
and profanity detection on videos.
Profanity detection on text was taken up to detect the use of cuss words, or any other words
that could hurt the feelings of other students or have a negative impact on them, in
the questions, answers and comments posted by students, so that appropriate action can be taken. To meet
this objective, the text classification technique of NLP was used. Here, an attempt was made to
classify a comment as toxic or non-toxic using a Logistic Regression model. The accuracy
came out to be 88%. Therefore, whenever a comment was put into the system, it identified
whether the text was toxic or not.
The model performed better than the Naïve Bayes model, but some other advanced
models are yet to be explored as a future scope of study, including LSVM, NB-SVM
and LSTM.
To give an edge to our platform, the profanity detection task is performed not only on text but
also on video content. For profanity detection on videos, the sub-task of image
detection was taken up. In this, various image layers were identified and passed through the
pre-trained ResNet50 model, after removing its last layer. The test accuracy came out
to be 84%.
Therefore, whenever an image was put into the system, it identified whether the image was
safe for work or not.
As a future scope of study, other sub-tasks under profanity detection on videos like profanity
detection on speech, etc shall be taken up.
For the future scope of study, the BOT 2 shall be taken up which extracts user interests from
the clickstream data that can then be used for further personalisation for (her)his news feed.
OBJECTIVE 3: This objective was taken up to learn the students’ perception of
EduCollab after its Beta release. A sample of 103 respondents was taken for the
survey. The survey was conducted through questionnaires which were filled in online by all
respondents. Data was analysed using the Statistical Package for Social Sciences (SPSS).
Descriptive statistics, factor analysis and Cronbach’s alpha have been used in the analysis.
FINDINGS:
Demographics: in PART 1 of the questionnaire it was found that out of 103 respondents:
The largest group of respondents had Samsung phones, followed by users of iPhone, Xiaomi,
Redmi, Vivo, etc.
The majority of the respondents, i.e. 69%, did not face internet connectivity issues while only
31% faced them, probably because India is becoming better equipped
technologically.
A major portion of the respondents, constituting 76%, had used e-learning platforms before
EduCollab while 24% had not. This indicates that many people have already resorted to
e-learning.
82% of respondents think that they are proficient in using e-learning platforms whereas
only 18% think they are not, indicating that most
people are confident in using e-learning platforms.
80% of respondents like the concept of peer-to-peer learning whereas only 20% of the
respondents do not, which indicates that most people like the concept of
peer-to-peer learning.
76% have not encountered cyber-bullying while 24% have encountered it.
This indicates that cases of cyber-bullying do exist.
87% of respondents were satisfied with the platform and 13% were not. This indicates that
we have been successful in fulfilling our objective.
Only 22 respondents encountered a login problem while 81 respondents did not face any
login problem, indicating that only a few respondents faced problems while logging in.
Only 33% of respondents encountered a bug while 67% did not.
Those who did encounter bugs
probably did so because it was the first release of the app.
69 respondents informed the team about the bugs but 34 respondents did not inform the
team.
90 respondents think that the team was efficient in resolving the issues while only 13
think otherwise. This indicates that the team has been efficient in resolving the issues.
Almost all the respondents, i.e. 97, think that the team worked up to their expectations and
only 6 respondents think that it did not, indicating that the team has done a good job.
All the respondents except for 3 say that they will keep using the platform.
The majority, i.e. 83, think that EduCollab is better than other e-learning platforms. Only 20
respondents did not think so. These 20 respondents might prefer
other e-learning platforms or might have been using them for years.
Various reasons were given for this preference: it provides interactive learning
sessions, the screen recording feature is good, the peer-to-peer feature is great, the app is more
accessible, it is easy to use, it is a new concept, etc.
90% of respondents would encourage their friends/relatives to use this platform, while
the remaining 10% would not.
57% of respondents would prefer book learning to learning from this platform, while the
remaining 43% would prefer learning from this platform. Although the difference is
not large, a majority still prefers book learning to digital learning.
Some of the students' recommendations to improve the app were to add more
courses and to improve the colour scheme, logo, etc. Most of the recommendations were
directed towards improving the user interface.
In PART 2 of the questionnaire, 21 statements (items) on students' perception of EduCollab
were analysed in SPSS to identify the underlying factors. First, the reliability of these 21
statements was checked with Cronbach's alpha, and the coefficient came out to be 0.941.
This indicated a high level of internal consistency
between the statements, which meant that the data was reliable for further analysis. Then, the 21
statements were analysed using a data reduction technique, i.e. factor analysis: Principal
Component Analysis (PCA) was conducted on these statements with the Varimax rotation
method. In this analysis, SPSS retained four factors with an eigenvalue greater
than 1. Together, these four factors explained 70.012% of the total variance across
all statements. These factors describe the perception of students towards EduCollab. The
statements under each factor measured the respondents' level of agreement or disagreement
on a 5-point scale, where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree and
5 = strongly disagree. Each factor was named on the basis of the common
meaning of the statements underlying it.
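The reliability check described above can be illustrated outside SPSS. The following is a minimal sketch in Python, assuming the responses are held as a respondents-by-items matrix; the function name `cronbach_alpha` and the small `demo` matrix are hypothetical and are not the survey data.

```python
import numpy as np

def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scale scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical responses: 5 respondents x 3 items on the 1-5 agreement scale
demo = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 4, 5],
])
alpha = cronbach_alpha(demo)
print(round(alpha, 3))
```

Values close to 1 indicate that the items move together; applied to the 21 statements of this study, the same formula is what yields the reported coefficient of 0.941 in SPSS.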
TABLE 5.1
FACTOR   NAME OF FACTOR         ITEMS   % OF VARIANCE   RANGE OF FACTOR LOADING
1        Effective Learning     7       26.589%         .650 to .858
2        Interactive Learning   7       18.078%         .570 to .700
3        Attractive Learning    3       12.910%         .719 to .901
4        Safe Learning          4       12.435%         .555 to .764
TOTAL                           21      70.012%         .555 to .901
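The retention rule behind Table 5.1 (keep components whose eigenvalue exceeds 1) can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the helper `kaiser_retained` is hypothetical, and the percentages it produces are unrotated, whereas the figures in Table 5.1 are those reported by SPSS after Varimax rotation. The last lines simply confirm that the four reported percentages add up to the stated total.

```python
import numpy as np

def kaiser_retained(scores):
    """Eigenvalues of the item correlation matrix and the count exceeding 1."""
    corr = np.corrcoef(scores, rowvar=False)            # items are columns
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending order
    explained = 100.0 * eigvals / eigvals.sum()         # % of variance per component
    n_keep = int((eigvals > 1.0).sum())                 # eigenvalue-greater-than-1 rule
    return eigvals, explained, n_keep

# Synthetic data (NOT the survey): 103 respondents, 6 items, 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(103, 2))
loadings = np.array([[0.90, 0.0], [0.80, 0.0], [0.85, 0.0],
                     [0.0, 0.90], [0.0, 0.80], [0.0, 0.85]])
scores = latent @ loadings.T + 0.4 * rng.normal(size=(103, 6))

eigvals, explained, n_keep = kaiser_retained(scores)
print(n_keep, np.round(explained[:n_keep].sum(), 1))

# Cross-check of Table 5.1: the four factor percentages sum to the reported total
table_pct = [26.589, 18.078, 12.910, 12.435]
print(round(sum(table_pct), 3))
```

Because the eigenvalues of a correlation matrix sum to the number of items, each eigenvalue divided by that number gives the share of variance a component explains, which is why the cut-off of 1 means "explains more than a single item would".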
CONCLUSION
To conclude, educational institutions have long treated
educational apps and digital learning as a supplementary tool and may have had difficulty in
mainstreaming them, mostly because their efficacy was not fully understood. However, the
current situation has given a fillip to the adoption of technology and to experiments
with online learning and the measurement of its success. No doubt, digital learning can never replace
teacher-student interaction, and it has various other limitations, such as lack of broadband or the required
infrastructure at home, lack of skills in using it, and lack of supervision and concentration. It also has
evils such as cyber-bullying, but it is the only solution today. Our project was taken up with all these
things in mind. Initially, our aim was to fill a gap in the
educational technology market by popularising the concept of peer-to-peer learning through a
platform that was safe from the evils of cyber-bullying; later on, the project was given impetus by
the pandemic situation.
From the study we can conclude that the majority of EduCollab users liked the app and are
satisfied with it. They think that the team has been efficient and has worked up to their
expectations. They intend to continue using it in future and to recommend it to their
friends and relatives as well.
The respondents have provided some recommendations, which will be worked upon before
the next release.
RECOMMENDATIONS
The recommendations can be stated as follows:
For profanity detection on videos, other sub-tasks such as profanity detection on speech
shall be taken up after the completion of profanity detection on images.
For Recommender Systems, the building of BOT 2 shall be taken up; it extracts user interests
from clickstream data, which can then be used to further personalise the user's
news feed.
Lastly, the recommendations provided by the respondents/users of the app shall be worked
upon before the next release of the app.
BIBLIOGRAPHY
Algorithmia (2016). Everything you need to know about natural language processing.
Available online at <https://algorithmia.com/blog/introduction-natural-language-processing-
nlp> Accessed on April 22, 2020.
Banerjee, J. & Bose, I. (2011). Higher Education Through Mobile Learning: An Analysis of
Students from Kolkata. Indian Journal of Commerce and Management Studies 2 (1), 123-
134.
Bansal, S. (2017). How India’s ed-tech sector can grow and the challenges it must
overcome. Available online at <https://www.vccircle.com/the-present-and-future-of-indias-
online-education-industry/> Accessed on April 19, 2020.
Boud (2001). Peer Learning in Higher Education: Learning from and with each other,
London.
Bronshtein, A. (2017). A quick introduction to the “Pandas” python library. Available online
at <https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-
f1b678f34673> Accessed on April 22, 2020.
Brownlee, J. (2019). Introduction to Python Deep Learning with Keras. Available online at
<https://machinelearningmastery.com/introduction-python-deep-learning-library-keras/>
Accessed on May 4, 2020.
Ceobanu and Boncu (2014). The Challenges of the Mobile Technology in the Young Adult
Education. Procedia - Social and Behavioral Sciences 142.
Education World Special Report (2018). The e-learning evolution. Available online at
<https://www.educationworld.in/the-e-learning-evolution/> Accessed on April 6, 2020.
Hiltbrand, T. (2018). 5 Advanced Analytics Algorithms for Your Big Data Initiatives.
Available online at <https://tdwi.org/articles/2018/07/02/adv-all-5-algorithms-for-big-
data.aspx> Accessed on April 25, 2020.
IMS Proschool (2018). Digitization in India: Several opportunities for growth &
transformation. Available online at <https://www.proschoolonline.com/blog/digitization-in-
india-several-opportunities-for-growth-transformation> Accessed on April 19, 2020.
Jain, K. (2015). Scikit-learn(sklearn) in Python – the most important Machine Learning tool
I learnt last year! Available online at <https://www.analyticsvidhya.com/blog/2015/01/scikit-
learn-python-machine-learning-tool/> Accessed on April 23, 2020.
Kaka, N., Madgavkar, A., Kshirsagar, A., Gupta, R., Manyika, J., Bahl, K. and Gupta, S.
(2019). Digital India: Technology to transform a connected nation. Available online at
<https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-india-
technology-to-transform-a-connected-nation> Accessed on April 18, 2020.
Keller, Christina and Cernerud, Lars (2002). Students’ Perceptions of E-Learning in University
Education. Journal of Educational Media 55:67.
Landry et al. (2006). Measuring Student Perceptions of Blackboard Using the Technology
Acceptance Model. Wiley Online Library 4(1).
Lee et al. (2007). The Influence of Learning Styles on Learners in E-Learning Environments:
An Empirical Study. Information Systems Department, Qatar University.
Lopez, D. (2019). Your guide to natural language processing (NLP). Available online at
<https://towardsdatascience.com/your-guide-to-natural-language-processing-nlp-
48ea2511f6e1> Accessed on April 22, 2020.
Makoe, M. (2012). Teaching Digital Natives: Identifying competencies for mobile learning
facilitators in distance education. South African Journal of Higher Education, 26(1), 91-104.
Rueckert, D., Kim, J.D. & Seo, D. (2013). Students’ perceptions and experiences of mobile
learning. University of Hawaii National Foreign Language Resource Center; Michigan State
University Center for Language Education and Research.
Sarrab, M., Elgamel, L. & Aldabbas, H. (2012). Mobile Learning (M-Learning) and Educational
Environments. International Journal of Parallel Emergent and Distributed Systems 3(4):31-38.
Scott, W. (2019). TF-IDF from scratch in python on real world dataset. Available online at
<https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-
real-world-dataset-796d339a4089> Accessed on April 27, 2020.
Ziad, N. (2016). How data science is the driving force behind successful digital
transformation. Available online at <https://www.information-age.com/data-science-driving-
force-behind-successful-digital-transformation-123462527/> Accessed on April 6, 2020.
SITES
https://sabudh.org/
http://tatrasdata.com/
https://www.mckinsey.com/
https://byjus.com/
https://brainly.in/
https://www.quora.com/
https://www.vedantu.com/
https://www.toppr.com/
https://www.chegg.com/
https://www.khanacademy.org/
https://www.meritnation.com/
https://quizlet.com/
https://www.edu-collab.com/
https://techterms.com/definition/python
https://www.geeksforgeeks.org/numpy-in-python-set-1-introduction/
https://www.w3schools.com/python/numpy_intro.asp
https://www.geeksforgeeks.org/understanding-python-pickling-example/
https://www.guru99.com/nltk-tutorial.html
https://en.wikipedia.org/wiki/Data_pre-processing
https://www.techopedia.com/definition/14650/data-preprocessing
https://www.techopedia.com/definition/13698/tokenization
https://en.wikipedia.org/wiki/Stop_words
https://www.geeksforgeeks.org/removing-stop-words-nltk-python/
https://www.geeksforgeeks.org/python-lemmatization-with-nltk/
https://en.wikipedia.org/wiki/Feature_extraction
https://www.geeksforgeeks.org/confusion-matrix-machine-learning/
https://en.wikipedia.org/wiki/Convolutional_neural_network
https://www.geeksforgeeks.org/os-module-python-examples/
https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_gui/py_image_display/py_image_display.html
https://monkeylearn.com/blog/introduction-to-topic-modeling/
https://en.wikipedia.org/wiki/Topic_model
https://seaborn.pydata.org/introduction.html
https://docs.python.org/2/library/string.html
https://pypi.org/project/gensim/
https://en.wikipedia.org/wiki/Gensim
https://monkeylearn.com/text-classification/
https://en.wikipedia.org/wiki/Descriptive_statistics
https://www.statisticshowto.com/cronbachs-alpha-spss/
https://www.statisticssolutions.com/factor-analysis-sem-factor-analysis/
https://www.statisticshowto.com/factor-analysis/
ANNEXURE
QUESTIONNAIRE
This questionnaire is aimed at soliciting information on students' perception of
EduCollab after its Beta release. This analysis is exclusively for research purposes.
Your co-operation will be highly appreciated.
Male
Female
3. What is the model and make of your phone?
4. What is the android version of your mobile?
5. Do you face internet connectivity issues?
Yes
No
6. Have you used any E-learning platform before EduCollab?
Yes
No
7. Do you think you are fully proficient in using the E-learning platforms?
Yes
No
8. Do you like the concept of peer to peer learning?
Yes
No
9. Have you ever encountered cyber-bullying?
Yes
No
PART 2 -
On a five-point scale ranging from strongly agree to strongly disagree (1=strongly agree,
2=agree, 3=neutral, 4=disagree, 5=strongly disagree) please select the following:
ABOUT EDUCOLLAB 1 2 3 4 5
EduCollab is user friendly
It has an attractive colour scheme
Logo is appropriate
It provides a great medium for interaction
EduCollab has made teaching and learning more
effective because it integrates all forms of media:
print, audio, video and animation
Course content is organized and well-planned
Students get easy access to learning material
Yes
No
Would you prefer book learning to learning from this platform?
Yes
No
Your recommendation to improve this platform
THANK YOU