AYASKANTA PARIDA - Report

Figure 24 : File Structure
MOVIE RECOMMENDER SYSTEM
USING
MACHINE LEARNING LIBRARIES
Ayaskanta Parida
Regd No. – 1901289071
Department of Computer Science and Engineering

TRIDENT ACADEMY OF TECHNOLOGY
Bhubaneswar- 751024, Odisha, India
June 2023
Major Project Report on
MOVIE RECOMMENDER SYSTEM
USING
MACHINE LEARNING LIBRARIES
Submitted in partial fulfillment of the

requirement for the Award of the Degree of
Bachelor of Technology
in
Computer Science and Engineering
Submitted by
AYASKANTA PARIDA
Regd no.- 1901289071
Under the Guidance of
Dr. Subhra Swetanisha
Asst. Professor, Dept. of CSE

Bhubaneswar- 751024, Odisha, India
June 2023
TRIDENT ACADEMY OF TECHNOLOGY
Bhubaneswar- 751024, Odisha, India
CERTIFICATE OF APPROVAL
This B.Tech Viva-Voce Examination of the Major Project work
submitted by the candidate Ayaskanta Parida bearing BPUT Regd. No.
1901289071 was held on 25th June 2023 and is accepted in partial
fulfillment of the requirement for the award of the Bachelor of Technology
in Computer Science & Engineering of Biju Patnaik University of
Technology, Odisha.
MENTOR
(Dr. Subhra Swetanisha)
Associate Professor, Dept. of CSE
Trident Academy of Technology
Bhubaneswar, Odisha
Date: 25th June 2023

HEAD OF THE DEPARTMENT
(Mrs. Padmabati Chand)
Professor & HOD of CSE
Trident Academy of Technology
Bhubaneswar, Odisha
Date: 25th June 2023
DECLARATION
I, Ayaskanta Parida, declare that the Major Project presented through
this report was carried out by me in accordance with the requirements and in
compliance with the Academic Regulations of the Biju Patnaik University of
Technology for the Bachelor of Technology (B.Tech.) Degree Programme
in Computer Science & Engineering, and that it has not been submitted for any
other academic award. Except where indicated by specific reference in the text,
the work is solely my own work. Work done in collaboration with, or with the
assistance of, others, has been acknowledged and is indicated as such. Any
views expressed in the report are those of the author.
Place: Bhubaneswar Ayaskanta Parida
Date : 25th June 2023 Regd No : 1901289071
ACKNOWLEDGEMENT
I take this opportunity to express my gratitude to the people who
have been instrumental in the successful completion of this project. I am, in the
first place, obliged and grateful to my parents, without whose support and care
I could not have completed this project. I express my deep gratitude towards
my guide, Dr. Subhra Swetanisha, Associate Professor, Dept. of CSE,
Trident Academy of Technology, Bhubaneswar, for his tremendous support,
encouragement and help.
I convey my sincere thanks to our HOD, Department of CSE, and the
Principal of Trident Academy of Technology, Bhubaneswar, for their
permission and cooperation, which allowed the project to be completed without
any hurdles. I would like to extend my gratitude to the
Department of Information Technology, Trident Academy of Technology,
Bhubaneswar, for their support and cooperation. Finally, I extend my
appreciation to all my friends and the teaching and non-teaching staff, who directly
or indirectly helped me in this endeavor.
Place: Bhubaneswar
Ayaskanta Parida
Date: 25th June 2023 Regd No:1901289071
ABSTRACT
The movie industry has experienced exponential growth,
resulting in a vast collection of films available to viewers. However, the sheer
number of movies makes it challenging for users to discover films that align
with their preferences. This report presents a movie recommender
system that incorporates content-based filtering with IMDb ratings to deliver
accurate and relevant movie recommendations. The proposed movie
recommender system employs a content-based filtering approach that considers
movie features such as genre, director, and actors to generate recommendations.
The system leverages the IMDb rating, a widely used metric to measure the
quality of movies, as a key component in the recommendation process. By
integrating IMDb ratings, the system ensures that the recommended movies not
only match the user's selected movies but also possess high-quality ratings. To
implement the movie recommender system, a comprehensive dataset is
collected, comprising movie metadata, including genre, director, actors, and
IMDb ratings. The system uses this data to build a profile for each user,
capturing their movie preferences based on the features and IMDb ratings of
movies they have rated highly in the past. The recommendation process begins
by analyzing the user's profile and identifying movies with similar features and
high IMDb ratings.
Place : Bhubaneswar Ayaskanta Parida
Date: 25th June 2023 Regd No : 1901289071
TRIDENT ACADEMY OF TECHNOLOGY
Bhubaneswar- 751024, Odisha, India
CERTIFICATE OF APPROVAL
This B.Tech Viva-Voce Examination of the Major Project work
submitted by the candidate Ayaskanta Parida bearing BPUT Regd. No.
1901289071 was held on 25th June 2023 and is accepted in partial
fulfillment of the requirement for the award of the Bachelor of Technology
in Computer Science & Engineering of Biju Patnaik University of
Technology, Odisha.
Place: Bhubaneswar Salanti Sachi padma

Date: June 25, 2023 (External Signature)
Place: Bhubaneswar
Date: June 25, 2023 Dr. Subhra Swetanisha
Dept. of CSE, (Project Guide)

Place: Bhubaneswar
Date: June 25, 2023 HOD, Dept. of CSE

CONTENTS
Sl. No. Topic Page no.
1 INTRODUCTION 1-3
2 WHAT IS MACHINE LEARNING (ML) ? 3-5
3 COMMON MACHINE LEARNING ALGORITHMS 5-6
4 REAL WORLD MACHINE LEARNING USE CASES 6-7
5 CHALLENGES OF MACHINE LEARNING 7-9
6 WHAT IS LEARNING ? 10-11
7 AI VS ML VS DEEP LEARNING 12-14
8 IMPORTANCE OF MACHINE LEARNING 14-19
9 FEATURES OF MACHINE LEARNING 19-24
10 TYPES OF MACHINE LEARNING 24-31

a. Supervised Learning
b. Unsupervised Learning
c. Reinforcement Learning
11 APPLICATIONS OF MACHINE LEARNING 31-32
12 MOVIE RECOMMENDER SYSTEM USING ML LIBRARIES 33-34
a. Problem Statement
b. Objectives
13 RECOMMENDER SYSTEM APPROACH 35-41
14 LIBRARIES REQUIRED 41-82

a. Numpy
b. Pandas
c. NLTK
d. Pickle
e. Streamlit
15 ARCHITECTURE DIAGRAM 82
16 WORKING (including Code with Output) 83-100
17 WEBPAGE 101-103
18 APPLICATIONS 103-104
19 ADVANTAGES 104-105
20 LIMITATIONS OF THE SYSTEM 106

21 CONCLUSION 106
22 REFERENCES 107
MOVIE RECOMMENDER SYSTEM USING MACHINE LEARNING LIBRARIES
INTRODUCTION:
In today's digital age, the proliferation of streaming platforms and online
content has transformed the entertainment industry and made an overwhelming number of
movies and TV shows accessible to viewers. This abundance of choice makes it challenging
for users to find content that aligns with their personal preferences and interests. Movie
recommender systems address this problem by leveraging data analysis and machine learning
techniques to provide personalized movie recommendations to users. They have gained
tremendous importance in the streaming industry, offering numerous benefits for both users
and content providers. For users, these systems enhance the overall viewing experience by
helping them discover movies that cater to their unique tastes and preferences.
The primary goal of a movie recommender system is to bridge the gap between users and the
ever-expanding world of movies. By understanding user preferences, these systems aim to
curate a tailored selection of movies that align with individual tastes, ensuring that users
discover relevant content they are likely to enjoy. This personalized approach not only saves
users time and effort in searching for movies but also introduces them to new titles, genres, and
filmmakers they may not have previously considered. The development of a movie
recommender system involves several key components and methodologies. These include data
collection, where user preferences, movie metadata, and historical interactions are gathered;
preprocessing, which involves cleaning and transforming the data into a suitable format for
analysis; feature extraction, where relevant movie and user features are identified; and the
utilization of various recommendation algorithms, such as collaborative filtering,
content-based filtering, and hybrid approaches, to generate accurate and diverse movie
recommendations. Despite their undeniable benefits, movie recommender systems also face
challenges. Scalability is one concern, as recommender systems must handle vast amounts
of data and ensure real-time recommendations even as the user base grows. Additionally, the
sparsity of user ratings and interactions for many movies can hinder the system's ability to
provide precise recommendations. Furthermore, maintaining diversity and avoiding the
over-recommendation of popular choices pose additional challenges for these systems.
Personalized recommendations let users explore new genres, discover hidden
gems, and avoid spending time on content that doesn't resonate with them. This personalized
approach leads to increased user satisfaction, engagement, and retention on streaming
platforms. From the perspective of content providers, movie recommender systems offer a
powerful tool to promote and showcase their offerings to a broader audience. By understanding
users' viewing habits, preferences, and interests, content providers can strategically recommend
movies to enhance discoverability and drive user engagement.
These systems help optimize the content discovery process, ensuring
that users find movies they are likely to enjoy, thereby increasing their likelihood of returning
to the platform and boosting revenue for the providers. In this report, we will delve into the key
components and methodologies employed in developing an effective movie recommender
system. We will explore the data collection process, preprocessing techniques, and feature
extraction methods utilized to understand user preferences and movie characteristics.
Furthermore, we will discuss various recommendation algorithms, such as collaborative
filtering, content-based filtering, and hybrid approaches, that are employed to generate
personalized movie recommendations.
Finally, we will examine popular implementations of movie recommender
systems in the industry, showcasing examples from streaming giants like Netflix, Amazon
Prime Video, and YouTube. By understanding real-world implementations, we can gain
insights into the practical application of recommendation algorithms and their impact on user
experience and business success. Overall, movie recommender systems have become
indispensable tools for streaming platforms, addressing the challenge of content overload and
enhancing user engagement. By leveraging advanced algorithms and user data, these systems
provide tailored movie recommendations, revolutionizing the way viewers discover and enjoy
movies.
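The pipeline outlined in this introduction (feature extraction followed by a content-based recommendation step) can be sketched in a few lines of Python. This is only an illustrative sketch using NumPy, not the implementation developed later in this report: the movie titles, the tag strings, and the function names `build_vectors` and `recommend` are hypothetical and chosen purely for the example.

```python
import numpy as np

# Hypothetical toy catalogue: title -> descriptive tags (genre, director, ...).
movies = {
    "Space Quest": "scifi adventure nolan",
    "Star Voyage": "scifi adventure nolan",
    "Love in Paris": "romance drama",
    "Galaxy Wars": "scifi action",
}

def build_vectors(catalogue):
    """Turn each tag string into a bag-of-words count vector."""
    vocab = sorted({word for tags in catalogue.values() for word in tags.split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = np.zeros((len(catalogue), len(vocab)))
    for row, tags in enumerate(catalogue.values()):
        for word in tags.split():
            vectors[row, index[word]] += 1
    return vectors

def recommend(title, catalogue, top_n=2):
    """Return the top_n titles most similar to `title` by cosine similarity."""
    titles = list(catalogue)
    vectors = build_vectors(catalogue)
    # Normalise each row so that a dot product equals the cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ unit[titles.index(title)]
    ranked = np.argsort(-scores, kind="stable")
    return [titles[i] for i in ranked if titles[i] != title][:top_n]

print(recommend("Space Quest", movies))
```

Here "Star Voyage" shares every tag with "Space Quest" and is ranked first, while "Galaxy Wars" shares only one tag and comes second. A real system would replace the toy tags with features extracted from movie metadata.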
Page | 2
Figure 1: Movies Poster
WHAT IS MACHINE LEARNING ?
Machine learning is programming computers to optimize a performance
criterion using example data or past experience. We have a model defined up to some
parameters, and learning is the execution of a computer program to optimize the
parameters of the model using the training data or past experience. The model may
be predictive to make predictions in the future, or descriptive to gain knowledge from
data, or both. Arthur Samuel, an early American leader in the field of computer gaming
and artificial intelligence, coined the term “Machine Learning” in 1959 while at IBM. He
defined machine learning as “the field of study that gives computers the ability to learn
without being explicitly programmed.” However, there is no universally accepted
definition for machine learning. Different authors define the term differently.
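The idea of "a model defined up to some parameters" whose parameters are optimized from training data can be illustrated with a very small sketch. The example below assumes a one-parameter model y = w * x and a mean squared error criterion; the data, the learning rate, and the function names are assumptions made only for this illustration.

```python
# Training data produced by the hidden rule y = 3x; the learner only sees the pairs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def mean_squared_error(w):
    """Performance criterion: average squared gap between prediction and target."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(steps=200, learning_rate=0.01):
    """Learn the parameter w by gradient descent on the mean squared error."""
    w = 0.0  # initial guess before any learning
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

w = fit()
print(round(w, 3), mean_squared_error(w))
```

After training, the learned parameter is very close to 3, and the error is far lower than it was for the initial guess w = 0.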
Page | 3
Figure 2: Machine Learning
Machine learning (ML) is a field of inquiry devoted to understanding and
building methods that 'learn', that is, methods that leverage data to improve
performance on some set of tasks. It is seen as a part of artificial intelligence. Machine
learning algorithms build a model based on sample data, known as training data, in order
to make predictions or decisions without being explicitly programmed to do so.
Machine learning algorithms are used in a wide variety of applications, such as in
medicine, email filtering, speech recognition, and computer vision, where it is difficult
or unfeasible to develop conventional algorithms to perform the needed tasks. A
subset of machine learning is closely related to computational statistics, which
focuses on making predictions using computers, but not all machine learning is
statistical learning. The study of mathematical optimization delivers methods, theory
and application domains to the field of machine learning.
Page | 4
Figure 3: Machine Learning Working
COMMON MACHINE LEARNING ALGORITHMS
A number of machine learning algorithms are commonly used. These include:
• Neural networks: Neural networks simulate the way the human brain works, with a
huge number of linked processing nodes. Neural networks are good at recognizing
patterns and play an important role in applications including natural language translation,
image recognition, speech recognition, and image creation.
• Linear regression: This algorithm is used to predict numerical values, based on a
linear relationship between different values. For example, the technique could be used to
predict house prices based on historical data for the area.
• Logistic regression: This supervised learning algorithm makes predictions for
categorical response variables, such as “yes/no” answers to questions. It can be used for
applications such as classifying spam and quality control on a production line.
• Clustering: Using unsupervised learning, clustering algorithms can identify patterns
in data so that it can be grouped. Computers can help data scientists by identifying
differences between data items that humans have overlooked.
• Decision trees: Decision trees can be used for both predicting numerical values
(regression) and classifying data into categories. Decision trees use a branching sequence
of linked decisions that can be represented with a tree diagram. One of the advantages of
decision trees is that they are easy to validate and audit, unlike the black box of the neural
network.
• Random forests: In a random forest, the machine learning algorithm predicts a value
or category by combining the results from a number of decision trees.
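As one concrete instance of the algorithms listed above, the house-price example given for linear regression can be sketched with NumPy's least-squares solver. The figures below are made up for illustration (floor area in square metres, price in thousands), and the helper name `predict` is hypothetical.

```python
import numpy as np

# Hypothetical historical data: floor area and the corresponding sale price.
area = np.array([50.0, 80.0, 100.0, 120.0])
price = np.array([150.0, 240.0, 300.0, 360.0])

# Design matrix with a column of ones, so the model is
# price = slope * area + intercept.
X = np.column_stack([area, np.ones_like(area)])
(slope, intercept), *_ = np.linalg.lstsq(X, price, rcond=None)

def predict(a):
    """Predict the price of a house with floor area `a` from the fitted line."""
    return slope * a + intercept

print(round(float(predict(90.0)), 2))
```

Because the toy data lie exactly on a straight line, the fitted slope is 3 and the prediction for an area of 90 is 270. Real data would be noisy and the line would only approximate it.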
REAL-WORLD MACHINE LEARNING USE CASES
Here are just a few examples of machine learning you might encounter every day:
• Speech recognition: It is also known as automatic speech recognition (ASR),
computer speech recognition, or speech-to-text, and it is a capability which uses natural
language processing (NLP) to translate human speech into a written format. Many mobile
devices incorporate speech recognition into their systems to conduct voice search—e.g.
Siri—or improve accessibility for texting.
• Customer service: Online chatbots are replacing human agents
along the customer journey, changing the way we think about customer engagement
across websites and social media platforms. Chatbots answer frequently asked questions
(FAQs) about topics such as shipping, or provide personalized advice, cross-selling
products or suggesting sizes for users. Examples include virtual agents on e-commerce
sites; messaging bots, using Slack and Facebook Messenger; and tasks usually done by
virtual assistants and voice assistants.
• Computer vision: This AI technology enables computers to derive meaningful
information from digital images, videos, and other visual inputs, and then take the
appropriate action. Powered by convolutional neural networks, computer vision has
applications in photo tagging on social media, radiology imaging in healthcare, and self-
driving cars in the automotive industry.
• Recommendation engines: Using past consumption behavior data, AI algorithms
can help to discover data trends that can be used to develop more effective cross-selling
strategies. This approach is used by online retailers to make relevant product
recommendations to customers during the checkout process.
• Automated stock trading: Designed to optimize stock portfolios, AI-driven high-
frequency trading platforms make thousands or even millions of trades per day without
human intervention.
• Fraud detection: Banks and other financial institutions can use machine learning to
spot suspicious transactions. Supervised learning can train a model using information
about known fraudulent transactions. Anomaly detection can identify transactions that
look atypical and deserve further investigation.
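The anomaly detection idea mentioned for fraud detection can be illustrated with a simple z-score rule: a transaction is flagged when it lies far from the mean of the others. This is only a sketch of one basic technique, not how any particular bank does it; the threshold, the transaction amounts, and the function name `flag_anomalies` are assumptions made for the example.

```python
from statistics import mean, stdev

def flag_anomalies(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    # If every amount is identical, sigma is 0 and nothing is flagged.
    return [a for a in amounts if sigma > 0 and abs(a - mu) / sigma > threshold]

# Hypothetical transaction amounts with one obvious outlier.
transactions = [20, 25, 22, 19, 24, 21, 23, 500]
print(flag_anomalies(transactions))
```

Only the transaction of 500 is flagged here. A flagged transaction is a candidate for further investigation, not proof of fraud.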
CHALLENGES OF MACHINE LEARNING
As machine learning technology has developed, it has certainly made our lives easier.
However, implementing machine learning in businesses has also raised a number of ethical
concerns about AI technologies. Some of these include:
• Technological singularity
While this topic garners a lot of public attention, many researchers are not concerned
with the idea of AI surpassing human intelligence in the near future. Technological singularity
is also referred to as strong AI or superintelligence. Philosopher Nick Bostrom defines
superintelligence as “any intellect that vastly outperforms the best human brains in practically
every field, including scientific creativity, general wisdom, and social skills.” Despite the fact
that superintelligence is not imminent in society, the idea of it raises some interesting
questions as we consider the use of autonomous systems, like self-driving cars. It’s unrealistic
to think that a driverless car would never have an accident, but who is responsible and liable
under those circumstances? Should we still develop autonomous vehicles, or do we limit this
technology to semi-autonomous vehicles which help people drive safely? The jury is still out
on this, but these are the types of ethical debates that are occurring as new, innovative AI
technology develops.
• AI impact on jobs
While a lot of public perception of artificial intelligence centers around job losses, this
concern should probably be reframed. With every disruptive, new technology, we see that the
market demand for specific job roles shifts. For example, when we look at the automotive
industry, many manufacturers, like GM, are shifting to focus on electric vehicle production
to align with green initiatives. The energy industry isn’t going away, but the source of energy
is shifting from a fuel economy to an electric one. In a similar way, artificial intelligence will
shift the demand for jobs to other areas. There will need to be individuals to help manage AI
systems. There will still need to be people to address more complex problems within the
industries that are most likely to be affected by job demand shifts, such as customer service.
The biggest challenge with artificial intelligence and its effect on the job market will be
helping people to transition to new roles that are in demand.
• Privacy
Privacy tends to be discussed in the context of data privacy, data protection, and
data security. These concerns have allowed policymakers to make more strides in recent
years. For example, in 2016, GDPR legislation was created to protect the personal data of
people in the European Union and European Economic Area, giving individuals more control
of their data. In the United States, individual states are developing policies, such as the
California Consumer Privacy Act (CCPA), which was introduced in 2018 and requires
businesses to inform consumers about the collection of their data. Legislation such as this has
forced companies to rethink how they store and use personally identifiable information (PII).
As a result, investments in security have become an increasing priority for businesses as they
seek to eliminate any vulnerabilities and opportunities for surveillance, hacking, and
cyberattacks.
• Bias and discrimination
Instances of bias and discrimination across a number of machine learning systems
have raised many ethical questions regarding the use of artificial intelligence. How can we
safeguard against bias and discrimination when the training data itself may be generated by
biased human processes? While companies typically have good intentions for their
automation efforts, Reuters highlights some of the unforeseen
consequences of incorporating AI into hiring practices. In their effort to automate and simplify
a process, Amazon unintentionally discriminated against job candidates by gender for
technical roles, and the company ultimately had to scrap the project. Harvard Business
Review has raised other pointed questions about the use of AI in
hiring practices, such as what data you should be able to use when evaluating a candidate for
a role.
Bias and discrimination aren’t limited to the human resources function either; they
can be found in a number of applications from facial recognition software to social media
algorithms. As businesses become more aware of the risks with AI, they’ve also become
more active in this discussion around AI ethics and values. For example, IBM has sunset its
general purpose facial recognition and analysis products. IBM CEO Arvind Krishna wrote:
“IBM firmly opposes and will not condone uses of any technology, including facial
recognition technology offered by other vendors, for mass surveillance, racial profiling,
violations of basic human rights and freedoms, or any purpose which is not consistent with
our values and Principles of Trust and Transparency.”
• Accountability
Since there isn’t significant legislation to regulate AI practices, there is no real
enforcement mechanism to ensure that ethical AI is practiced. The current incentives for
companies to be ethical are the negative repercussions of an unethical AI system on the
bottom line. To fill the gap, ethical frameworks have emerged as part of a collaboration
between ethicists and researchers to govern the construction and distribution of AI models
within society. However, at the moment, these only serve to guide. Some research
shows that the combination of distributed responsibility
and a lack of foresight into potential consequences isn’t conducive to preventing harm to
society.
Page | 9
DEFINITION OF LEARNING:
Definition:
A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks T, as measured by P,
improves with experience E.
Another Definition
A computer program which learns from experience is called a Machine Learning
Program or simply a Learning Program. Such a program is sometimes also referred to
as a Learner.
• Learning is any process by which a system improves performance from
experience. It is the acquisition of information, knowledge and skills.
• Machine learning is a growing technology which enables computers to learn
automatically from past data. Machine learning uses various algorithms for
building mathematical models and making predictions using historical data
or information.
• Currently, it is being used for various tasks such as image recognition, speech
recognition, email filtering, Facebook auto-tagging, recommender systems,
and many more. It gives the machine the ability to learn and perform certain tasks
without being explicitly programmed.
Examples
a) A Handwriting Recognition Problem
✓ Task T: Recognising and classifying handwritten words within images.
✓ Performance measure P: Percent of words correctly classified.
✓ Training experience E: A dataset of handwritten words with given
classifications.
b) A Robot Learning Problem
✓ Task T: Driving on highways using vision sensors.
✓ Performance measure P: Average distance traveled before an error.
✓ Training experience E: A sequence of images and steering commands
recorded while observing a human driver.
c) A Chess Learning Problem
✓ Task T: Playing chess
✓ Performance measure P: Percent of games won against opponents.
✓ Training experience E: Playing practice games against itself.
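The definition in terms of T, P and E can be made concrete with a toy sketch in which performance is measured as experience grows. Here the task T is labelling a number "low" or "high", the performance measure P is accuracy on a fixed test set, and the experience E is a list of labelled examples. The 1-nearest-neighbour learner, the data, and the function names are all assumptions chosen for illustration.

```python
def predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training example."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Performance measure P: fraction of test items classified correctly."""
    correct = sum(predict(train, x) == label for x, label in test)
    return correct / len(test)

# Experience E: labelled examples, revealed to the learner one at a time.
experience = [(10, "low"), (60, "high"), (45, "low")]
# Fixed test set used to measure performance on task T.
test_set = [(5, "low"), (30, "low"), (45, "low"),
            (55, "high"), (70, "high"), (95, "high")]

for n in (1, 2, 3):
    print(n, round(accuracy(experience[:n], test_set), 3))
```

With one example the learner labels everything "low" and scores 0.5; with two examples it scores 5 out of 6; with all three it classifies the whole test set correctly. In the sense of the definition above, its performance at T, as measured by P, improves with experience E.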
Figure 4: Human VS Machine

Page | 11
AI VS ML VS DEEP LEARNING:
Each numbered row below compares Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL).

1. AI: AI stands for Artificial Intelligence, and is basically the study/process which enables machines to mimic human behaviour through a particular algorithm.
   ML: ML stands for Machine Learning, and is the study that uses statistical methods enabling machines to improve with experience.
   DL: DL stands for Deep Learning, and is the study that makes use of neural networks (similar to neurons present in the human brain) to imitate functionality just like a human brain.

2. AI: AI is the broader family consisting of ML and DL as its components.
   ML: ML is the subset of AI.
   DL: DL is the subset of ML.

3. AI: AI is a computer algorithm which exhibits intelligence through decision making.
   ML: ML is an AI algorithm which allows a system to learn from data.
   DL: DL is an ML algorithm that uses deep (more than one layer) neural networks to analyze data and provide output accordingly.

4. AI: Search trees and much complex math are involved in AI.
   ML: If you have a clear idea about the logic (math) involved and can visualize complex functionalities like K-Means, Support Vector Machines, etc., then it defines the ML aspect.
   DL: If you are clear about the math involved but don't have an idea about the features, so you break the complex functionalities into linear/lower-dimension features by adding more layers, then it defines the DL aspect.

5. AI: The aim is basically to increase the chances of success and not accuracy.
   ML: The aim is to increase accuracy, not caring much about the success ratio.
   DL: It attains the highest rank in terms of accuracy when it is trained with a large amount of data.

6. AI: Examples of AI applications include Google's AI-powered predictions, ridesharing apps like Uber and Lyft, commercial flights using an AI autopilot, etc.
   ML: Examples of ML applications include virtual personal assistants (Siri, Alexa, Google, etc.) and email spam and malware filtering.
   DL: Examples of DL applications include sentiment-based news aggregation, image analysis and caption generation, etc.

7. AI: Three broad categories/types of AI are Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI).
   ML: Three broad categories/types of ML are a) Supervised Learning, b) Unsupervised Learning and c) Reinforcement Learning.
   DL: DL can be considered as neural networks with a large number of parameters and layers, lying in one of the four fundamental network architectures: Unsupervised Pre-trained Networks, Convolutional Neural Networks, Recurrent Neural Networks and Recursive Neural Networks.

8. AI: The efficiency of AI is basically the efficiency provided by ML and DL respectively.
   ML: Less efficient than DL, as it can't work for longer dimensions or a higher amount of data.
   DL: More powerful than ML, as it can easily work for larger sets of data.
IMPORTANCE OF MACHINE LEARNING:
Machine Learning is one of the most popular sub-fields of Artificial
Intelligence. Machine learning concepts are used almost everywhere, such as healthcare,
finance, infrastructure, marketing, self-driving cars, recommendation systems,
chatbots, social sites, gaming, cyber security, and many more. It is important because
it gives enterprises a view of trends in customer behaviour and business operational
patterns, and supports the development of systems that can automatically adapt and
customize themselves to individual users.
Page | 14
The nearly limitless quantity of available data, affordable data storage, and
growth of less expensive and more powerful processing have propelled the growth of
ML. Now many industries are developing more robust models capable of analyzing
bigger and more complex data while delivering faster, more accurate results on vast
scales.
ML tools enable organizations to more quickly identify profitable opportunities
and potential risks.
Figure 5: Importance of Machine Learning
1. Machine learning improves video games
Machine learning could transform gaming. With advanced algorithms, elements
of a game – including objects, non-player characters, and the game world itself –
could react and change based on a player’s actions. A player’s experience would be
unique based on their choices, making gameplay more engaging. Some video games
(like versions of chess) already use machine learning a bit, but there’s still lots of room
for advancement.
Page | 15
2. Machine learning is essential for self-driving cars
While many are rightfully wary of self-driving cars right now, these cars will
become more common. The secret is machine learning. The algorithms collect data via
sensors and cameras, analyze the data, and decide what the car should do. One team at
Boston University recently created a “watch and learn” algorithm that taught self-
driving cars to drive by watching other cars. In a test set in two virtual towns, the self-
driving neural networks got into very few accidents and reached their destinations
92% of the time.
Figure 6: Machine Learning in the field of Self -Driving Cars
Page | 16
3. Machine learning could take over dangerous jobs:
Many jobs put human life at risk. Nuclear cleanup is a big one. In 2021, scientists
participated in a consortium focused on using AI and robotics in nuclear
environments. Using machine learning, robots can be trained to recognize the
differences between radioactive waste types. This would help humans safely identify
and get rid of nuclear waste. Machine learning could also make robots very effective
at jobs involving dangerous chemicals or extremely heavy lifting.
4. Machine learning could help with environmental protections:
Environmental monitoring is essential to protecting animals, humans, and the
environment in general. When storms and other natural disasters strike, toxic materials
from a variety of facilities can mix with waterways, including the systems people
depend on for drinking. With machine learning algorithms, regulators can collect data
by industry, location, material usage, and more. With this information, regulators can
identify high-risk areas and prevent future problems.
5. Machine learning can improve elder care
Many people struggle with transitioning into older age. Machine learning and AI
could help. Remote patient monitoring (RPM) is just one example. From wearable
devices, RPM collects information like heart rate, oxygen levels, blood pressure, and
more. It’s a great way for clinicians to monitor patients with chronic diseases without
them needing to come in for constant in-person visits. RPM can also help predict future
health issues.
Page | 17
6. Machine learning can help hospitals

Managing hospital patient flow is one of the biggest issues hospitals and other
healthcare systems deal with. Overcrowded emergency rooms, delays, cancellations, and
more all affect patient outcomes. Machine learning can help reduce many of these issues
by creating predictive models based on real-time data. It can play a part in scheduling
overtime, improving unloading management, reducing waiting times, and so on. This
saves money and lets hospitals provide better care.
Figure 7: Machine Learning in the field of Healthcare
7. Machine learning improves cancer treatment

Because cancer is so complex, it’s hard to predict drug responses. A machine-
learning model could help predict the chances of a patient responding to first-line
therapies. If the model found that they wouldn’t respond, it could make good
predictions about which drug to try instead. In 2021, the Georgia Institute of
Technology and Ovarian Cancer Institute used machine learning algorithms to
create predictive models for 15 distinct cancer types. When compared to a clinical
dataset, the model ended up showing an overall predictive accuracy of 91%.
8. Machine learning can improve banking
The banking industry is complicated. Can machine learning streamline anything? It has
many uses, but fraud detection is a noteworthy one. Hackers are becoming more
advanced and banks are struggling. Thanks to their ability to process huge volumes of
data very quickly, machine learning algorithms designed for fraud detection can
identify malicious activity, verify user identity and respond immediately to attacks.
For banks, this reduces the risk of data breaches and cyberattacks.
9. Machine learning can both threaten and improve cybersecurity
When hackers use machine learning, each attack – successful or not – becomes a
learning experience. The AI gathers more and more information, making each attack
smarter and more effective. It’s a common problem with technological advancements:
there are always malicious actors. To combat these advanced threats and more outdated
(but still dangerous) attacks, organizations need defences that are just as effective.
Machine learning can analyze past attacks, respond to activity in real-time, automate
tasks, and help save money.
FEATURES OF MACHINE LEARNING
In order to understand the actual power of machine learning, you have to

consider the characteristics of this technology. There are lots of examples that echo the
characteristics of machine learning in today’s data-rich world. Here are seven key
characteristics of machine learning for which companies should prefer it over other
technologies.
1. The ability to perform automated data visualization
A massive amount of data is being generated by businesses and common people on a

regular basis. By visualizing notable relationships in data, businesses can not only make
better decisions but build confidence as well. Machine learning offers a number of
tools that provide rich snippets of data which can be applied to both unstructured
and structured data. With the help of user-friendly automated data visualization
platforms in machine learning, businesses can obtain a wealth of new insights in an
effort to increase productivity in their processes.
Figure 8: Machine Learning ability to perform data visualisation

2. Automation at its best
One of the biggest characteristics of machine learning is its ability to
automate repetitive tasks and thus increase productivity. A huge number of
organizations are already using machine learning-powered paperwork and email
automation. In the financial sector, for example, a huge number of repetitive,
data-heavy and predictable tasks need to be performed.
Because of this, the sector uses different types of machine learning solutions to
a great extent. They make accounting tasks faster, more insightful, and more accurate.
Some aspects that have already been addressed by machine learning include addressing
financial queries with the help of chatbots, making predictions, managing expenses,
simplifying invoicing, and automating bank reconciliations.
Figure 9: Machine Learning Workflow
3. Customer engagement like never before

For any business, one of the most crucial ways to drive
engagement, promote brand loyalty and establish long-lasting
customer relationships is by triggering meaningful
conversations with its target customer base. Machine
learning plays a critical role in enabling businesses and
brands to spark more valuable conversations in terms of
customer engagement.
Figure 10: Customer Engagement
4. The ability to take efficiency to the next level when merged with IoT
Thanks to the huge hype surrounding the IoT, machine learning has experienced
a great rise in popularity. IoT is being designated as a strategically significant area by
many companies, yet many of these businesses have failed to address it. In this scenario,
machine learning is probably the best technology that can be used to attain higher
levels of efficiency. By merging machine learning with IoT, businesses can boost the
efficiency of their entire production processes.
5. The ability to change the mortgage market
It’s a fact that building a positive credit score usually takes discipline, time, and
lots of financial planning for many consumers. For lenders, the consumer credit score
is one of the biggest measures of creditworthiness; it involves a number of factors,
including payment history, total debt, and length of credit history. With machine
learning, lenders can now predict whether a customer is a low spender or a high spender
and understand his or her tipping point of spending. Apart from mortgage lending,
financial institutions are using the same techniques for other types of consumer loans.
Figure 11: Machine Learning Changed Mortgage Market

6. Accurate data analysis

Traditionally, data analysis has relied on the trial-and-error method,
an approach which becomes impossible when we are working with large and
heterogeneous datasets. Machine learning comes as the best solution to all these issues
by offering effective alternatives for analyzing massive volumes of data. By developing
efficient and fast algorithms, as well as data-driven models for processing data in
real-time, machine learning is able to generate accurate analysis and results.
Figure 12: Machine Learning in data analysis
7. Business intelligence at its best

Machine learning characteristics, when merged with big data analytical work, can
generate extreme levels of business intelligence with the help of which several
different industries are making strategic initiatives. From retail to financial services
to healthcare, and many more — machine learning has already become one of the
most effective technologies to boost business operations.
Figure 13: Machine Learning in Business
TYPES OF MACHINE LEARNING
Machine learning is a subset of AI, which enables the machine to automatically

learn from data, improve performance from past experiences, and make predictions.
Machine learning contains a set of algorithms that work on a huge amount of data. Data
is fed to these algorithms to train them, and on the basis of training, they build the
model & perform a specific task.
These ML algorithms help to solve different business problems such as Regression,
Classification, Forecasting, Clustering, and Association.
Based on the methods and way of learning, machine learning is divided mainly into
three types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Reinforcement Learning
Figure 14: Types of Machine Learning
1. SUPERVISED LEARNING:
As the name suggests, supervised machine learning is based on supervision. In
the supervised learning technique, we train the machines using a "labelled"
dataset, and based on the training, the machine predicts the output. Here, labelled
data means that some of the inputs are already mapped to the output.
The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y). Some real-world applications of supervised
learning are Risk Assessment, Fraud Detection, Spam filtering, etc.
CATEGORIES OF SUPERVISED MACHINE LEARNING
Supervised machine learning can be classified into two types of problems, which
are given below:
i. CLASSIFICATION
ii. REGRESSION
Figure 15 : Supervised Learning
a) CLASSIFICATION
Classification algorithms are used to solve the classification problems in which the
output variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue,
etc. The classification algorithms predict the categories present in the dataset. Some
real-world examples of classification algorithms are Spam Detection, Email filtering,
etc.
Some popular classification algorithms are given below:
✓ Random Forest Algorithm
✓ Decision Tree Algorithm
✓ Logistic Regression Algorithm
✓ Support Vector Machine Algorithm
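To make the idea of predicting a category concrete, the following is a minimal sketch of logistic regression (one of the algorithms listed above) trained by gradient descent with NumPy. The toy data, the variable names `hours_watched` and `liked`, and the learning-rate and step settings are assumptions chosen only for illustration; they are not taken from this project.

```python
import numpy as np

def train_logistic_regression(x, y, lr=0.1, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # predicted probabilities
        w -= lr * np.mean((p - y) * x)           # gradient of the log-loss w.r.t. w
        b -= lr * np.mean(p - y)                 # gradient of the log-loss w.r.t. b
    return w, b

def predict(x, w, b):
    """Return the predicted class label (0 or 1) for each input."""
    return (w * np.asarray(x, dtype=float) + b > 0).astype(int)

# Hypothetical toy data: one input feature, two categories (0 = "No", 1 = "Yes").
hours_watched = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
liked = np.array([0, 0, 0, 1, 1, 1])

w, b = train_logistic_regression(hours_watched, liked)
labels = predict([1.0, 4.0], w, b)
print(labels)
```

In practice a library implementation would normally be preferred; the hand-written loop is shown only to expose how the labelled examples drive the model parameters.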
b) REGRESSION
Regression algorithms are used to solve regression problems in which the output
variable is continuous and is modelled as a function of the input variables (a linear
relationship in the simplest case). These are used to predict continuous output
variables, such as market trends, weather prediction, etc.
Some popular Regression algorithms are given below:
✓ Simple Linear Regression Algorithm
✓ Multivariate Regression Algorithm
✓ Decision Tree Algorithm
✓ Lasso Regression
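As an illustration of the first algorithm in the list, the sketch below fits a simple linear regression with NumPy's least-squares polynomial fit. The data are a hypothetical noise-free example generated from y = 2x + 1, chosen so that the fitted line is easy to check by hand.

```python
import numpy as np

# Hypothetical toy data generated from y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Predict a continuous output for an unseen input.
y_new = slope * 5.0 + intercept
print(round(slope, 3), round(intercept, 3), round(y_new, 3))
```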
2. UNSUPERVISED MACHINE LEARNING
In unsupervised learning, the models are trained with data that is neither
classified nor labelled, and the model acts on that data without any supervision. The
main aim of the unsupervised learning algorithm is to group or categorize the
unsorted dataset according to similarities, patterns, and differences. Machines are
instructed to find the hidden patterns in the input dataset.
Figure 16 : Unsupervised Learning
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given below:
i. Clustering
ii. Association
a) Clustering
The clustering technique is used when we want to find the inherent groups from
the data. It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of
other groups. An example of the clustering algorithm is grouping the customers by
their purchasing behaviour.
Some of the popular clustering algorithms are given below:
✓ K-Means Clustering algorithm
✓ Mean-shift algorithm
✓ DBSCAN Algorithm
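✓ Principal Component Analysis (a dimensionality-reduction technique, often applied before clustering)
✓ Independent Component Analysis (a signal-separation technique, often applied before clustering)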
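The customer-grouping example above can be sketched with a small K-means implementation in NumPy. The customer data and the hand-picked initial centroids are assumptions made to keep the run deterministic; real implementations usually choose the starting centroids randomly or with a seeding heuristic.

```python
import numpy as np

def k_means(points, centroids, iterations=10):
    """Plain K-means: alternate between assigning points and updating centroids."""
    centroids = centroids.astype(float).copy()
    for _ in range(iterations):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)          # index of the nearest centroid
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:                   # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical customers described by (visits per month, average spend).
customers = np.array([[1.0, 10.0], [2.0, 12.0], [1.5, 11.0],
                      [9.0, 95.0], [10.0, 100.0], [9.5, 98.0]])

# Initial centroids are chosen by hand here to keep the run deterministic.
labels, centroids = k_means(customers, centroids=customers[[0, 3]])
print(labels)
```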
b) Association
Association rule learning is an unsupervised learning technique, which finds

interesting relations among variables within a large dataset. The main aim of this
learning algorithm is to find the dependency of one data item on another data item and
map those variables accordingly so that it can generate maximum profit. This algorithm
is mainly applied in Market Basket analysis, Web usage mining, continuous
production, etc.
Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat,
FP-growth algorithm.
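Algorithms such as Apriori rank candidate rules by measures like support and confidence. The sketch below computes these two measures directly for a handful of hypothetical shopping baskets; it illustrates the quantities only and is not an implementation of Apriori, Eclat or FP-growth.

```python
# Hypothetical shopping baskets; each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated probability of `consequent` given `antecedent` (rule A -> B)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"bread", "butter"}, baskets)          # 2 of 4 baskets
rule_confidence = confidence({"butter"}, {"bread"}, baskets)  # rule: butter -> bread
print(rule_support, rule_confidence)
```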
3. REINFORCEMENT LEARNING
Reinforcement learning works on a feedback-based process, in which an AI agent
(a software component) automatically explores its surroundings by trial and error, taking
actions, learning from experience, and improving its performance.
In reinforcement learning, there is no labelled data as in supervised learning, and
agents learn from their experiences only. This process is similar to how a human being learns;
for example, a child learns various things through experience in day-to-day life. An
example of reinforcement learning is playing a game, where the game is the
environment, the moves of the agent at each step define states, and the goal of the agent is to
get a high score. The agent receives feedback in terms of punishments and rewards.
Due to its way of working, reinforcement learning is employed in different fields such
as game theory, operations research, information theory, and multi-agent systems.
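The feedback loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the report does not name a specific one). The five-cell corridor environment, the reward of 1 at the goal, and the constants `ALPHA` and `GAMMA` are all assumptions made for illustration. For simplicity the agent acts purely at random while learning, which Q-learning tolerates because it learns the value of the best action regardless of the behaviour followed.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the game
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
ALPHA, GAMMA = 0.5, 0.9

def step(state, action):
    """Environment: move, clamp to the corridor, reward 1 only at the goal."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

random.seed(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]

for _ in range(500):                         # episodes
    state = 0
    for _ in range(100):                     # cap on steps per episode
        action = random.randrange(2)         # explore by pure trial and error
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best future value.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state
        if state == N_STATES - 1:
            break

# Greedy policy learned from the feedback: best action in each non-goal cell.
policy = [row.index(max(row)) for row in q[:-1]]
print(policy)
```

Here the reward plays the role of the positive feedback mentioned in the text; a penalty could be modelled by returning a negative reward from `step`.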
Categories of Reinforcement Learning
Reinforcement learning is categorized mainly into two types of methods/algorithms:
❖ Positive Reinforcement Learning: Positive reinforcement learning means
increasing the tendency that the required behaviour will occur again by
adding something. It enhances the strength of the behaviour of the agent and
positively impacts it.
❖ Negative Reinforcement Learning: Negative reinforcement learning works
exactly opposite to positive RL. It increases the tendency that the specific
behaviour will occur again by avoiding the negative condition.
Figure 17 : Reinforcement Learning
APPLICATIONS OF MACHINE LEARNING
Machine learning is a buzzword in today's technology, and it is growing very
rapidly day by day. We use machine learning in our daily life even without
knowing it, for example in Google Maps, Google Assistant, and Alexa. Below are some of the
most popular real-world applications of machine learning:
1. Image Recognition
2. Sign Language Recognition
3. Automatic language Translation
4. Medical Diagnosis
5. Stock Market Trading
6. Speech Recognition
7. Hotel Recommender System
8. Movie Recommender System
9. Product Recommendations
10. Email Spam and Malware Filtering
11. Self-Driving Cars
12. Virtual Personal Assistant and many more.
Figure 18 : Applications of Machine Learning
MOVIE RECOMMENDER SYSTEM USING MACHINE
LEARNING & ITS LIBRARIES
Problem Statement :
A problem arises when users find it difficult to locate movies of their preferred type, in line
with the movies they have previously watched and with their interests and viewing habits.
Searching manually consumes a lot of time and energy. With a vast number
of movies available, users often struggle to find movies that align with their preferences. They
may spend a significant amount of time searching for suitable movies, leading to frustration
and decision fatigue. Traditional movie recommendations, such as generic genre-based
suggestions, may not cater to individual preferences. Users desire personalized
recommendations that consider their specific tastes, leading to a more engaging and satisfying
movie-watching experience. A Movie Recommender System can leverage algorithms and
techniques to analyze user data, including preferences, ratings, and viewing history. By
understanding user preferences and employing content-based filtering techniques, the system
can suggest movies similar to those the user has previously watched and that match
their interests and viewing habits. These recommendations make it easier for users to find
movies they are likely to enjoy.
Objectives :
The objective is to create a movie recommender system that assists users in
discovering new movies aligned with their interests and viewing habits. The system should
employ advanced algorithms to analyze user data, including preferences, past ratings, and
viewing history, in order to generate accurate and tailored movie recommendations.
▪ The system aims to enhance the movie-watching experience by:
• Improving Movie Discovery: Recommender systems help users discover new movies
that align with their interests and preferences. By analyzing user data and employing
recommendation algorithms, the system presents users with a curated list of movies
they are likely to enjoy but may not have discovered otherwise.
• Increasing User Satisfaction: By offering tailored movie recommendations, the system
strives to enhance user satisfaction. Users can find movies that cater to their specific
tastes, leading to a more enjoyable and engaging movie-watching experience.
• Enhancing Engagement and Retention: Providing accurate and relevant
recommendations increases user engagement with the movie platform or service. Users
are more likely to spend time exploring and watching movies, leading to higher retention
rates and increased user loyalty.
• Facilitating Decision-Making: The vast selection of movies available can be
overwhelming for users. A movie recommender system simplifies the decision-making
process by suggesting movies that match the user's preferences, reducing the time and
effort required to find suitable content.
• Increasing Revenue and Business Value: Movie recommender systems have a direct
impact on revenue generation for movie platforms and services. By delivering
personalized recommendations, users are more likely to engage with recommended
movies, leading to increased rentals, purchases, or subscriptions. Additionally, the system
provides valuable insights into user preferences and behavior, enabling data-driven
decisions for content acquisition, marketing strategies, and overall business growth.
Recommendation System :
Recommender systems help users search through large volumes of dynamically
generated information to provide them with personalized content and services. This has
increased the demand for recommender systems more than ever before. They are
information filtering systems that deal with the problem of information overload by
filtering vital information fragments out of a large amount of dynamically generated
information according to the user’s preferences, interests, or observed behaviour about items
[1]. Recommender systems have been widely used for product recommendations such as
books and movies, and they are also gaining ground in service recommendations such
as hotels, restaurants and travel attractions [2]. This project proposes a recommender
system to help users select a hotel without searching blogs or asking family and friends
for opinions.
Figure 19 : Recommendation System
Movie Recommendation System

A movie recommender system is an application or algorithm that
suggests personalized movie recommendations to users based on their preferences,
viewing history, and other relevant data. It utilizes various techniques and algorithms to
analyze user data and generate accurate and relevant movie suggestions. The system aims
to assist users in discovering new movies they are likely to enjoy and enhance their movie-
watching experience.
Recommender system approach

There are three popular types of recommender system approach:
collaborative recommender systems, content-based recommender systems and
knowledge-based recommender systems.
Collaborative Filtering approach

Recommendation is based on the items liked in the past by other people with
similar tastes and preferences. Collaborative filtering is based on the
neighbourhood of like-minded customers and the similarity between items. Pearson
correlation or cosine similarity is used for the neighbourhood formation scheme.
As a collaborative filtering system collects more ratings from more users, the probability
increases that someone in the system will be a good match for any given new user.
However, a collaborative filtering system must be initialized with a large amount of data,
because a system with a small base of ratings is unlikely to be very useful.
Collaborative filtering generates suggestions from the similarity of the
ratings that different registered users give to particular movies. A collaborative
filtering system proposes items based on similarity measures between users or between
items. Collaborative filtering, also known as social filtering, filters information
using other people's recommendations. Users who agreed in their evaluation of certain
items in the past are likely to agree again in the future. For instance, a person who
wants to see a film can ask friends for recommendations. The recommendations of friends
with similar interests are more reliable than those of others, and this information is
used to decide which film to watch. Collaborative filtering needs nothing except users'
past preferences on a set of items. As it is based on historical data, the core
assumption is that users who agreed in the past will also tend to agree in the future.
The system recommends items that were preferred by similar users. Collaborative
filtering has a few advantages:
1. Users rate the items themselves, so the estimation and prediction for an item are
based on genuine opinions.
2. It gives accurate predictions and recommendations because they depend on the
similarity between users rather than on the similarity of item content.
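The neighbourhood idea can be sketched as follows: a missing rating is predicted as the similarity-weighted average of the ratings given by other users, with cosine similarity as the weight. The ratings matrix and the function names are hypothetical, and treating unrated movies as 0 inside the similarity is a simplification made for brevity rather than a recommended practice.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are movies, 0 = not rated.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # target user (has not rated movie 2)
    [5.0, 5.0, 4.0, 1.0],   # similar taste to the target user
    [1.0, 2.0, 1.0, 5.0],   # different taste
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weights, values = [], []
    for other in range(len(ratings)):
        if other != user and ratings[other, movie] > 0:
            weights.append(cosine_similarity(ratings[user], ratings[other]))
            values.append(ratings[other, movie])
    return float(np.dot(weights, values) / np.sum(weights))

predicted = predict_rating(ratings, user=0, movie=2)
print(round(predicted, 2))
```

Because the second user is more similar to the target user than the third, the prediction is pulled toward the second user's rating.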
Figure 20 : Collaborative filtering vs Content Based filtering of Movie Recommender System
Content-Based approach
Recommendation is based on items that have similar content and
characteristics to those the user liked before. A dataset that contains past user transactions is
split into training and testing sets. It is all about theories of consumer buying behaviour.
Content-based recommendations are independent of the characteristics of other users [1]. The
system does not recommend types of item that differ from anything the user has
already chosen. Thus, a problem occurs when the user wants to try something new, because the
system would never suggest it.
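A minimal sketch of content-based recommendation, assuming each movie is described by a binary genre vector and that similarity between movies is measured with cosine similarity. The titles, genres and the `recommend` helper are hypothetical; a real system would derive the feature vectors from the movie metadata.

```python
import numpy as np

# Hypothetical catalogue: each movie is a binary genre vector
# in the order (action, comedy, romance, sci-fi).
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
features = np.array([
    [1, 0, 0, 1],   # Movie A: action, sci-fi
    [1, 0, 0, 1],   # Movie B: action, sci-fi
    [0, 1, 1, 0],   # Movie C: comedy, romance
    [1, 1, 0, 0],   # Movie D: action, comedy
], dtype=float)

def recommend(liked_index, features, top_n=2):
    """Rank the other movies by cosine similarity to the liked movie."""
    norms = np.linalg.norm(features, axis=1)
    scores = features @ features[liked_index] / (norms * norms[liked_index])
    scores[liked_index] = -1.0                      # never recommend the movie itself
    ranked = np.argsort(-scores, kind="stable")     # highest similarity first
    return [int(i) for i in ranked[:top_n]]

recommended = recommend(0, features)
print([titles[i] for i in recommended])
```

The example also shows the limitation noted above: a user who liked only Movie A is never offered Movie C, because the two share no genre.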
Content-based filtering builds a profile of the user's preferences from the initial
database data. To predict items accurately, we use the ratings that users have recorded
for movies or TV series, together with their stated likes and dislikes for those shows.
In the background, the software uses these estimates, along with the collaborative
filtering method, to recommend items similar to those the user preferred before. It scores
new shows as well as previously predicted ones and proposes the best movies or shows
based on the user's likes and dislikes. It uses different strategies and projection
methods in different application areas, and it is mostly used in hybrid recommender
systems. Earlier approaches to predicting movies from the MOVIEGEN datasets had different
implementations; for example, they tracked users' requests, and what had been searched in
the past was saved in the history or in the database. Learning from the shortcomings of
these approaches, we developed a Movie Recommendation System using the Pearson
correlation method, an advanced method that predicts and outputs movies to users based
on the data they provided in the past and in the present session. A user can pick
choices from a range of attributes, such as the number of ratings and the rating of each
movie. We update the user's choices in the database and compute a new set of results from
the new data and from the user's past viewing history. The software is developed using
Python with a simple user interface.
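The Pearson correlation step can be sketched with pandas, assuming the ratings are arranged as a table with one row per user and one column per movie. The table below is hypothetical and is not the dataset used in this project; `DataFrame.corrwith` computes the Pearson correlation of each column with the chosen movie's column by default.

```python
import pandas as pd

# Hypothetical user-by-movie ratings table (rows: users, columns: movies).
ratings = pd.DataFrame(
    {
        "Movie A": [5, 4, 1, 2],
        "Movie B": [4, 5, 2, 1],
        "Movie C": [1, 2, 5, 4],
    },
    index=["user1", "user2", "user3", "user4"],
)

# Pearson correlation of every movie's rating column with the chosen movie.
chosen = "Movie A"
similarity = ratings.corrwith(ratings[chosen])

# Drop the chosen movie itself and list the rest, most correlated first.
recommendations = similarity.drop(chosen).sort_values(ascending=False)
print(recommendations)
```

With real data, movies that have very few ratings are usually filtered out first, since a correlation computed from a handful of users is unreliable.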
Knowledge-Based approach (KBS)

Recommendation is based on inferences about a user's needs
and preferences. An example of an existing system using this approach is the restaurant
recommender Entree (Burke, Hammond & Cooper, 1996; Burke, Hammond & Young,
1997), which makes its recommendations by finding restaurants in a new city similar to
restaurants the user knows and likes. The system allows users to navigate by stating their
preferences with respect to a given restaurant, thereby refining their search criteria. A
knowledge-based recommender system avoids some of the drawbacks of the other approaches.
It does not have a ramp-up problem since its recommendations do not depend on a base of
user ratings. It does not have to gather information about a particular user because its
judgements are independent of individual tastes. Figure 21 below shows the development of a
knowledge-based system.
Figure 21 : Development of a Knowledge-Based System
The knowledge acquisition process incorporates typical fact-finding methods
like interviews, questionnaires, record reviews and observation to acquire factual and
explicit knowledge. For this, techniques like concept sorting, concept mapping, and
protocol analysis are used. The acquired knowledge should be immediately
documented in a knowledge representation scheme. Rules, frames,
scripts and semantic networks are typical examples of knowledge representation
schemes. This project will use a knowledge-based system (KBS) as the recommender system
approach, with rules for knowledge representation.
Rule-Based
One of the most popular approaches to knowledge representation is to use
production rules in the system. Rule-based systems, also known as expert systems, were
invented in the early 1970s and are still in use today. The definition of a rule-based
system depends almost entirely on expert systems, which are systems that capture the
reasoning of a human expert in solving a knowledge-intensive problem. Instead of
representing knowledge in a declarative, static way as a set of things which are true, a
rule-based system represents knowledge as a set of rules that tell it what to do or what
to conclude. A rule-based system can be created simply by using a set of assertions
and a set of rules that specify how to act on the assertion set. Rules are expressed as a set
of if-then statements, sometimes called IF-THEN rules. Rule reasoning is the process of
deriving a value for a conclusion. There are two means of deriving conclusions. First, start
with all the known data and progress toward the conclusion; this is called data-driven
reasoning, forward chaining or forward reasoning. Second, select a possible conclusion and
try to prove its validity by looking for supporting evidence; this is called goal-driven
reasoning, backward chaining, or backward reasoning. Figure 22 below shows an example of
rule reasoning:
Figure 22 : Example of Rule Reasoning

Rule 1
• IF Surface = smooth AND Fruitclass= vine AND Color = green
• THEN Fruit = honeydew
Each rule describes some characteristic of the different fruits through a series of parameters,
for example:
• Fruit
• Fruitclass
• Seedclass
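Forward chaining over IF-THEN rules of this kind can be sketched in a few lines of Python. Rule 1 below is the rule quoted above; Rule 2, together with the parameters `Shape` and `Diameter`, is a hypothetical addition included only so that one rule's conclusion feeds another rule's condition.

```python
# Each rule is (conditions, conclusion): IF all conditions hold THEN add the conclusion.
# Rule 1 is taken from the text; Rule 2 is a hypothetical addition to show chaining.
RULES = [
    ({"Surface": "smooth", "Fruitclass": "vine", "Color": "green"}, ("Fruit", "honeydew")),
    ({"Shape": "round", "Diameter": "large"}, ("Fruitclass", "vine")),
]

def forward_chain(facts, rules):
    """Data-driven reasoning: keep firing rules until no new fact can be derived."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for conditions, (name, value) in rules:
            holds = all(facts.get(k) == v for k, v in conditions.items())
            if holds and facts.get(name) != value:
                facts[name] = value      # assert the rule's conclusion as a new fact
                changed = True
    return facts

known = {"Shape": "round", "Diameter": "large", "Surface": "smooth", "Color": "green"}
result = forward_chain(known, RULES)
print(result.get("Fruit"))
```

Backward chaining would instead start from the goal `Fruit` and look for rules whose conclusion sets it, then try to establish their conditions in turn.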
Parameters that represent the final answer are conclusions or goals. The Hotel
Recommender System will use rules to describe characteristics of the different hotels
through a series of parameters. In a nutshell, the hotel recommendation system will use
rule-based knowledge representation to provide the problem-solving for this
project.
LIBRARIES REQUIRED :
➢ NUMPY:
NumPy (Numerical Python) is an open source Python library that’s used in
almost every field of science and engineering. The NumPy API is used
extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most
other data science and scientific Python packages. It is a Python library that
provides a multidimensional array object, various derived objects (such as
masked arrays and matrices), and an assortment of routines for fast operations on
arrays, including mathematical, logical, shape manipulation, sorting, selecting,
I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations,
random simulation and much more.
The NumPy library contains multidimensional array and matrix data
structures (you’ll find more information about this in later sections). It
provides ndarray, a homogeneous n-dimensional array object, with methods to

efficiently operate on it. NumPy can be used to perform a wide variety of
mathematical operations on arrays. It adds powerful data structures to Python that
guarantee efficient calculations with arrays and matrices and it supplies an
enormous library of high-level mathematical functions that operate on these
arrays and matrices.
How to import NumPy

To access NumPy and its functions import it in your Python code like this:
import numpy as np
We shorten the imported name to np for better readability of code using NumPy. This
is a widely adopted convention that you should follow so that anyone working with your code
can easily understand it.
Reading the example code

If you aren’t already comfortable with reading tutorials that contain a lot of code,
you might not know how to interpret a code block that looks like this:
>>> a = np.arange(6)
>>> a2 = a[np.newaxis, :]
>>> a2.shape
OUTPUT: (1, 6)
Figure 20 : Example of a Numpy Program
Figure 23 : Uses of Numpy
➢ PANDAS
Pandas is an open-source library that is made mainly for working with relational
or labeled data both easily and intuitively. It provides various data structures and
operations for manipulating numerical data and time series. This library is built on
top of the NumPy library. Pandas is fast and it has high performance & productivity
for users.
Pandas generally provide two data structures for manipulating data,
They are:
1. Series
2. DataFrame
Example:
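import pandas as pd
import numpy as np

# Create an empty Series (an explicit dtype avoids a warning on some pandas versions)
ser = pd.Series(dtype="object")
print(ser)

# Create a Series from a NumPy array of characters
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)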
Output :
Figure 24 : Output of above Pandas code
➢ NLTK
NLTK is a leading platform for building Python programs to work with human
language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such
as WordNet, along with a suite of text processing libraries for classification, tokenization,
stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP
libraries, and an active discussion forum. Natural language processing (NLP) is a field
that focuses on making natural human language usable by computer programs. NLTK, or
Natural Language Toolkit, is a Python package that you can use for NLP.
A lot of the data that you could be analyzing is unstructured data and contains human-readable
text. Before you can analyze that data programmatically, you first need to preprocess it. In this
tutorial, you’ll take your first look at the kinds of text preprocessing tasks you can do with
NLTK so that you’ll be ready to apply them in future projects. You’ll also see how to do some
basic text analysis and create visualizations.
Thanks to a hands-on guide introducing programming fundamentals alongside

topics in computational linguistics, plus comprehensive API documentation, NLTK is
suitable for linguists, engineers, students, educators, researchers, and industry users alike.
NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open
source, community-driven project. NLTK has been called “a wonderful tool for teaching,
and working in, computational linguistics using Python,” and “an amazing library to play
with natural language.”
Natural Language Processing with Python provides a practical introduction to

programming for language processing. Written by the creators of NLTK, it guides the
reader through the fundamentals of writing Python programs, working with corpora,
categorizing text, analyzing linguistic structure, and more. The online version of the book
has been updated for Python 3 and NLTK 3. (The original Python 2 version is still
available at https://www.nltk.org/book_1ed.)
If you’re familiar with the basics of using Python and would like to get your feet wet with
some NLP, then you’ve come to the right place.
By the end of this tutorial, you’ll know how to:
• Find text to analyze

• Preprocess your text for analysis
• Analyze your text
• Create visualizations based on your analysis
We will be using the Python library NLTK (Natural Language Toolkit) for text analysis in
the English language. The Natural Language Toolkit (NLTK) is a collection of Python libraries
designed especially for identifying and tagging parts of speech found in natural-language text
such as English.
Installing NLTK
Before starting to use NLTK, we need to install it. With the help of following command, we can
install it in our Python environment −
pip install nltk
If we are using Anaconda, then the Conda package for NLTK can be installed by using the following
command −
conda install -c anaconda nltk
Downloading NLTK’s Data
After installing NLTK, another important task is to download its preset text repositories so that
it can be easily used. However, before that we need to import NLTK the way we import any
other Python module. The following command will help us in importing NLTK −
import nltk
Now, download NLTK data with the help of the following command −
nltk.download()
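Called without arguments, nltk.download() opens the NLTK downloader, where packages can be
selected. It will take some time to install all available packages of NLTK.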
Other Necessary Packages
Some other Python packages like gensim and pattern are also useful for text analysis
as well as for building natural language processing applications with NLTK. The packages can
be installed as shown below −
gensim
gensim is a robust semantic modeling library which can be used for many applications. We can
install it by following command −
pip install gensim

pattern
It can be used to make gensim package work properly. The following command helps in
installing pattern −
pip install pattern
Lemmatization
Lemmatization is a way to extract the base form of words, normally aiming to remove inflectional
endings by using vocabulary and morphological analysis. The base form that lemmatization
produces for a word is called its lemma.
NLTK module provides the following package for lemmatization −
WordNetLemmatizer package
This package will extract the base form of the word depending upon whether it is used as a noun
or as a verb. The following command can be used to import this package −
from nltk.stem import WordNetLemmatizer
Counting POS Tags–Chunking
Chunking identifies short phrases in a sentence from its tokens and their part-of-speech (POS)
tags. It is one of the important processes in natural language processing. Where tokenization
creates the tokens, chunking groups those tokens and labels the groups. In other words, we can
get the structure of the sentence with the help of the chunking process.
Example
In the following example, we will implement Noun-Phrase chunking, a category of chunking
which will find the noun phrase chunks in the sentence, by using the NLTK Python module.
Consider the following steps to implement noun-phrase chunking −
Step 1: Chunk grammar definition
In this step, we need to define the grammar for chunking. It would consist of the rules, which
we need to follow.
Step 2: Chunk parser creation
Next, we need to create a chunk parser. It would parse the grammar and give the output.
Step 3: The Output
In this step, we will get the output in a tree format.
Running the NLP Script
Start by importing the NLTK package −
import nltk
Now, we need to define the sentence.
Here,
• DT is the determiner
• VBP is the verb
• JJ is the adjective
• IN is the preposition
• NN is the noun
sentence = [("a", "DT"),("clever","JJ"),("fox","NN"),("was","VBP"),
("jumping","VBP"),("over","IN"),("the","DT"),("wall","NN")]
Next, the grammar should be given in the form of a regular expression.
grammar = "NP:{<DT>?<JJ>*<NN>}"
Now, we need to define a parser for parsing the grammar.
parser_chunking = nltk.RegexpParser(grammar)
Now, the parser will parse the sentence, and the output will be stored in a variable as follows −
output = parser_chunking.parse(sentence)
Now, the following code will help you draw your output in the form of a tree.
output.draw()
Figure 25 : Output of above nltk code
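The parsed tree can also be inspected without opening a drawing window. The following is a minimal sketch that reuses the sentence and grammar from the example above and collects the words of every NP chunk. The helper name noun_phrases is an assumption introduced here for illustration; it is not part of NLTK.

```python
import nltk

# Same POS-tagged sentence and chunk grammar as in the example above
sentence = [("a", "DT"), ("clever", "JJ"), ("fox", "NN"), ("was", "VBP"),
            ("jumping", "VBP"), ("over", "IN"), ("the", "DT"), ("wall", "NN")]
grammar = "NP:{<DT>?<JJ>*<NN>}"


def noun_phrases(tagged_sentence, chunk_grammar):
    """Return the text of every NP chunk found by a RegexpParser."""
    tree = nltk.RegexpParser(chunk_grammar).parse(tagged_sentence)
    phrases = []
    # subtrees() yields every nested Tree; keep only those labelled NP
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
        # leaves() returns the (word, tag) pairs inside the chunk
        phrases.append(" ".join(word for word, tag in subtree.leaves()))
    return phrases


chunks = noun_phrases(sentence, grammar)
print(chunks)  # ['a clever fox', 'the wall']
```

Because the grammar only mentions DT, JJ and NN, the verbs and the preposition stay outside any chunk.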
Tokenizing
Tokenization may be defined as the process of breaking the given text into smaller units called
tokens. Words, numbers or punctuation marks can be tokens. It may also be called word
segmentation.
Example
Input − Bed and chair are types of furniture.
We have different packages for tokenization provided by NLTK. We can use these packages
based on our requirements. The packages and the details of their installation are as follows −
sent_tokenize package
This package can be used to divide the input text into sentences. We can import it by using the
following command −
from nltk.tokenize import sent_tokenize

word_tokenize package
This package can be used to divide the input text into words. We can import it by using the
following command −
from nltk.tokenize import word_tokenize

WordPunctTokenizer package
This package can be used to divide the input text into words and punctuation marks. We can
import it by using the following command −
from nltk.tokenize import WordPunctTokenizer
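As a short illustration, the sketch below applies WordPunctTokenizer to the input sentence from the example above. This tokenizer is based on a regular expression, so it should not need any downloaded NLTK data. The second call uses a sample contraction chosen here for illustration.

```python
from nltk.tokenize import WordPunctTokenizer

# WordPunctTokenizer splits text into runs of word characters and runs of punctuation
tokenizer = WordPunctTokenizer()

tokens = tokenizer.tokenize("Bed and chair are types of furniture.")
print(tokens)  # ['Bed', 'and', 'chair', 'are', 'types', 'of', 'furniture', '.']

# Punctuation inside a word becomes a token of its own
contraction = tokenizer.tokenize("It's")
print(contraction)  # ['It', "'", 's']
```

Note that the apostrophe is split off as a separate token here, which differs from how word_tokenize treats contractions.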

By tokenizing, you can conveniently split up text by word or by sentence. This will allow you
to work with smaller pieces of text that are still relatively coherent and meaningful even outside
of the context of the rest of the text. It’s your first step in turning unstructured data into structured
data, which is easier to analyze.
When you’re analyzing text, you’ll be tokenizing by word and tokenizing by sentence. Here’s
what both types of tokenization bring to the table:
• Tokenizing by word: Words are like the atoms of natural language. They’re the smallest
unit of meaning that still makes sense on its own. Tokenizing your text by word allows
you to identify words that come up particularly often. For example, if you were analyzing
a group of job ads, then you might find that the word “Python” comes up often. That
could suggest high demand for Python knowledge, but you’d need to look deeper to know
more.
• Tokenizing by sentence: When you tokenize by sentence, you can analyze how those
words relate to one another and see more context. Are there a lot of negative words around
the word “Python” because the hiring manager doesn’t like Python? Are there more terms
from the domain of herpetology than the domain of software development, suggesting
that you may be dealing with an entirely different kind of python than you were
expecting?
Here’s how to import the relevant parts of NLTK so you can tokenize by word and by sentence:
>>> from nltk.tokenize import sent_tokenize, word_tokenize

Now that you’ve imported what you need, you can create a string to tokenize. Here’s a quote
from Dune that you can use:
>>> example_string = """
... Muad'Dib learned rapidly because his first training was in how to learn.
... And the first lesson of all was the basic trust that he could learn.
... It's shocking to find how many people do not believe they can learn,
... and how many more believe learning to be difficult."""
You can use sent_tokenize() to split up example_string into sentences:
>>> sent_tokenize(example_string)
["Muad'Dib learned rapidly because his first training was in how to learn.",
'And the first lesson of all was the basic trust that he could learn.',
"It's shocking to find how many people do not believe they can learn, and how many more
believe learning to be difficult."]
Tokenizing example_string by sentence gives you a list of three strings that are sentences:
1. "Muad'Dib learned rapidly because his first training was in how to learn."
2. 'And the first lesson of all was the basic trust that he could learn.'
3. "It's shocking to find how many people do not believe they can learn, and how many more
believe learning to be difficult."
Now try tokenizing example_string by word:
>>> word_tokenize(example_string)
["Muad'Dib",
'learned',
'rapidly',
'because',
'his',
'first',
'training',
'was',
'in',
'how',
'to',
'learn',
'.',
'And',
'the',
'first',
'lesson',
'of',
'all',
'was',
'the',
'basic',
'trust',
'that',
'he',
'could',
'learn',
'.',
'It',
"'s",
'shocking',
'to',
'find',
'how',
'many',
'people',
'do',
'not',
'believe',
'they',
'can',
'learn',
',',
'and',
'how',
'many',
'more',
'believe',
'learning',
'to',
'be',
'difficult',
'.']
You got a list of strings that NLTK considers to be words, such as:
• "Muad'Dib"
• 'training'
• 'how'
But the following strings were also considered to be words:
• "'s"
• ','
• '.'
See how "It's" was split at the apostrophe to give you 'It' and "'s", but "Muad'Dib" was left
whole? This happened because NLTK knows that 'It' and "'s" (a contraction of “is”) are two
distinct words, so it counted them separately. But "Muad'Dib" isn’t an accepted contraction
like "It's", so it wasn’t read as two separate words and was left intact.
Filtering Stop Words

Stop words are words that you want to ignore, so you filter them out of your text when you’re
processing it. Very common words like 'in', 'is', and 'an' are often used as stop words since they
don’t add a lot of meaning to a text in and of themselves.
Here’s how to import the relevant parts of NLTK in order to filter out stop words:
>>> import nltk
>>> nltk.download("stopwords")
>>> from nltk.corpus import stopwords
>>> from nltk.tokenize import word_tokenize
Here’s a quote from Worf that you can filter:
>>> worf_quote = "Sir, I protest. I am not a merry man!"
Now tokenize worf_quote by word and store the resulting list in words_in_quote:
>>> words_in_quote = word_tokenize(worf_quote)
>>> words_in_quote
['Sir', ',', 'I', 'protest', '.', 'I', 'am', 'not', 'a', 'merry', 'man', '!']
You have a list of the words in worf_quote, so the next step is to create a set of stop words to
filter words_in_quote. For this example, you’ll need to focus on stop words in "english":
>>> stop_words = set(stopwords.words("english"))
Next, create an empty list to hold the words that make it past the filter:
>>> filtered_list = []
You created an empty list, filtered_list, to hold all the words in words_in_quote that aren’t stop
words. Now you can use stop_words to filter words_in_quote:
>>> for word in words_in_quote:
...     if word.casefold() not in stop_words:
...         filtered_list.append(word)
You iterated over words_in_quote with a for loop and added all the words that weren’t stop
words to filtered_list. You used .casefold() on word so you could ignore whether the letters
in word were uppercase or lowercase. This is worth doing
because stopwords.words('english') includes only lowercase versions of stop words.
Alternatively, you could use a list comprehension to make a list of all the words in your text that
aren’t stop words:
>>> filtered_list = [
...     word for word in words_in_quote if word.casefold() not in stop_words
... ]
When you use a list comprehension, you don’t create an empty list and then add items to the end
of it. Instead, you define the list and its contents at the same time. Using a list comprehension is
often seen as more Pythonic.
Take a look at the words that ended up in filtered_list:
>>> filtered_list
['Sir', ',', 'protest', '.', 'merry', 'man', '!']
You filtered out a few words like 'am' and 'a', but you also filtered out 'not', which does affect
the overall meaning of the sentence. (Worf won’t be happy about this.)
Words like 'I' and 'not' may seem too important to filter out, and depending on what kind of
analysis you want to do, they can be. Here’s why:
• 'I' is a pronoun, and pronouns are context words rather than content words:
o Content words give you information about the topics covered in the text or the
sentiment that the author has about those topics.
o Context words give you information about writing style. You can observe
patterns in how authors use context words in order to quantify their writing style.
Once you’ve quantified their writing style, you can analyze a text written by an
unknown author to see how closely it follows a particular writing style so you can
try to identify who the author is.
• 'not' is technically an adverb but has still been included in NLTK’s list of stop words for
English. If you want to edit the list of stop words to exclude 'not' or make other changes,
then you can copy the list into your own set and modify it.
So, 'I' and 'not' can be important parts of a sentence, but it depends on what you’re trying to learn
from that sentence.
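One way to keep negations is to remove them from the stop-word set before filtering. The sketch below is only an illustration: base_stop_words is a small hypothetical stand-in for stopwords.words("english"), so that the example runs without downloaded NLTK data, and the real list is much longer.

```python
# Hypothetical stand-in for stopwords.words("english"); the real list is longer
base_stop_words = {"i", "am", "a", "an", "in", "is", "not"}

# Keep negations by removing them from the stop-word set
custom_stop_words = base_stop_words - {"not"}

words_in_quote = ["Sir", ",", "I", "protest", ".", "I", "am", "not",
                  "a", "merry", "man", "!"]

# Same filter as before, but using the customised set
filtered = [w for w in words_in_quote if w.casefold() not in custom_stop_words]
print(filtered)  # ['Sir', ',', 'protest', '.', 'not', 'merry', 'man', '!']
```

With 'not' preserved, the filtered tokens still carry the negative meaning of the quote.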
Stemming
Stemming is a text processing task in which you reduce words to their root, which is the core
part of a word. For example, the words “helping” and “helper” share the root “help.” Stemming
allows you to zero in on the basic meaning of a word rather than all the details of how it’s being
used. NLTK has more than one stemmer, but you’ll be using the Porter stemmer.
For grammatical reasons, English, like other languages, uses different forms of a word, for
example democracy, democratic, and democratization. For machine learning projects, it
is very important for machines to understand that such different words have the
same base form. That is why it is very useful to extract the base forms of the words while
analyzing the text.
Stemming is a heuristic process that helps in extracting the base forms of the words by chopping
off their ends.
The different packages for stemming provided by NLTK module are as follows −
PorterStemmer package
Porter’s algorithm is used by this stemming package to extract the base form of the words. With
the help of the following command, we can import this package −
from nltk.stem.porter import PorterStemmer
For example, ‘write’ would be the output of the word ‘writing’ given as the input to this
stemmer.
LancasterStemmer package
Lancaster’s algorithm is used by this stemming package to extract the base form of the words.
With the help of following command, we can import this package −
from nltk.stem.lancaster import LancasterStemmer
For example, ‘writ’ would be the output of the word ‘writing’ given as the input to this
stemmer.
SnowballStemmer package
Snowball’s algorithm is used by this stemming package to extract the base form of the words.
With the help of following command, we can import this package −
from nltk.stem.snowball import SnowballStemmer
For example, ‘write’ would be the output of the word ‘writing’ given as the input to this
stemmer.
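The three stemmers can be compared side by side on the same word. This is a minimal sketch; the dictionary keys are labels chosen here for illustration, and SnowballStemmer is given the language name "english".

```python
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

# SnowballStemmer needs a language name; the other two take no arguments
stemmers = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    "snowball": SnowballStemmer("english"),
}

# Stem the same word with each algorithm and collect the results
results = {name: stemmer.stem("writing") for name, stemmer in stemmers.items()}
for name, stem in results.items():
    print(name, "->", stem)
```

Running the same comparison on a few domain words is a simple way to choose a stemmer for a project.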
Here’s how to import the relevant parts of NLTK in order to start stemming:
>>> from nltk.stem import PorterStemmer
>>> from nltk.tokenize import word_tokenize
Now that you’re done importing, you can create a stemmer with PorterStemmer():
>>> stemmer = PorterStemmer()
The next step is for you to create a string to stem. Here’s one you can use:
>>> string_for_stemming = """
... The crew of the USS Discovery discovered many discoveries.
... Discovering is what explorers do."""
Before you can stem the words in that string, you need to separate all the words in it:
>>> words = word_tokenize(string_for_stemming)
Now that you have a list of all the tokenized words from the string, take a look at what’s
in words:
>>> words
['The',
'crew',
'of',
'the',
'USS',
'Discovery',
'discovered',
'many',
'discoveries',
'.',
'Discovering',
'is',
'what',
'explorers',
'do',
'.']
Create a list of the stemmed versions of the words in words by using stemmer.stem() in a list
comprehension:
>>> stemmed_words = [stemmer.stem(word) for word in words]
Take a look at what’s in stemmed_words:
>>> stemmed_words
['the',
'crew',
'of',
'the',
'uss',
'discoveri',
'discov',
'mani',
'discoveri',
'.',
'discov',
'is',
'what',
'explor',
'do',
'.']
Here’s what happened to all the words that started with 'discov' or 'Discov':
Original word    Stemmed version
'Discovery'      'discoveri'
'discovered'     'discov'
'discoveries'    'discoveri'
'Discovering'    'discov'
Those results look a little inconsistent. Why would 'Discovery' give
you 'discoveri' when 'Discovering' gives you 'discov'?
Understemming and overstemming are two ways stemming can go wrong:
1. Understemming happens when two related words should be reduced to the same stem
but aren’t. This is a false negative.
2. Overstemming happens when two unrelated words are reduced to the same stem even
though they shouldn’t be. This is a false positive.
The Porter stemming algorithm dates from 1979, so it’s a little on the older side. The Snowball
stemmer, which is also called Porter2, is an improvement on the original and is also available
through NLTK, so you can use that one in your own projects. It’s also worth noting that the
purpose of the Porter stemmer is not to produce complete words but to find variant forms of a
word.
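Grouping words by their stem makes understemming easy to spot. The sketch below uses the four 'discov' words from the table above: related words that land in different groups were not reduced to a common stem. The variable names are chosen here for illustration.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["Discovery", "discovered", "discoveries", "Discovering"]

# Map each stem to the original words that produced it
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

for stem, originals in groups.items():
    print(stem, originals)
```

Two groups appear for what is arguably one word family, which is the false negative described above.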
➢ PICKLE
The Python pickle module is used for serializing and de-serializing Python object structures. The
process of converting any kind of Python object (list, dict, etc.) into a byte stream is
called pickling, serialization, flattening or marshalling. We can convert the byte stream
(generated through pickling) back into Python objects by a process called unpickling.
In real-world scenarios, pickling and unpickling are widespread as they allow us to easily
store data in a file or database and to transfer it from one server/system to another.
Precaution: It is advisable not to unpickle data received from an untrusted source, as it may
pose a security threat: unpickling can execute code chosen by whoever created the data, and the
pickle module has no way of detecting or raising an alarm about malicious data.
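The reason for this precaution can be shown with a harmless sketch. A class may define __reduce__ to tell pickle which callable to run when the data is loaded. The Payload class below is hypothetical and calls only len(), but the same mechanism would run any callable named by the creator of the data.

```python
import pickle


class Payload:
    """Illustrative class: __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # A harmless call is used here; an attacker could name any callable
        return (len, ("payload",))


data = pickle.dumps(Payload())

# Loading runs len("payload") instead of rebuilding a Payload instance
restored = pickle.loads(data)
print(restored)  # 7
```

The loaded value is the result of the call, not a Payload object, which shows that pickle.loads() executed code described by the byte stream.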
Only after importing pickle module we can do pickling and unpickling. Importing pickle can be
done using the following command −
import pickle
Pickle examples:
Below is a simple program on how to pickle a list:
Pickle a simple list: Pickle_list1.py
import pickle
mylist = ['a', 'b', 'c', 'd']
with open('datafile.txt', 'wb') as fh:
    pickle.dump(mylist, fh)
In the above code, the list “mylist” contains four elements (‘a’, ‘b’, ‘c’, ‘d’). We open the file in
“wb” mode instead of “w” because pickle writes bytes, not text. A new file named “datafile.txt” is
created in the current working directory, and it stores the mylist data as a byte stream.
Unpickle a simple list: unpickle_list1.py
import pickle
# open the file in binary read mode; the with block closes it afterwards
with open("datafile.txt", "rb") as pickle_off:
    emp = pickle.load(pickle_off)
print(emp)
Output: On running above scripts, you can see your mylist data again as output.
['a', 'b', 'c', 'd']
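The same round trip can be done in memory, which makes it visible why the files are opened in binary mode. This is a minimal sketch using pickle.dumps() and pickle.loads() in place of the file-based dump() and load() shown above.

```python
import pickle

mylist = ['a', 'b', 'c', 'd']

# dumps() returns the pickled data as a bytes object instead of writing a file
data = pickle.dumps(mylist)
print(type(data))  # <class 'bytes'>

# loads() rebuilds an equal, but separate, list from those bytes
restored = pickle.loads(data)
print(restored == mylist, restored is mylist)  # True False
```

The restored list has the same contents as the original but is a new object.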
Pickle a simple dictionary −
import pickle
EmpID = {1:"Zack",2:"53050",3:"IT",4:"38",5:"Flipkart"}
# the with block closes the file once the dictionary has been written
with open("EmpID.pickle","wb") as pickling_on:
    pickle.dump(EmpID, pickling_on)
Unpickle a dictionary −
import pickle
with open("EmpID.pickle", 'rb') as pickle_off:
    EmpID = pickle.load(pickle_off)
print(EmpID)
On running above script(unpickle) we get our dictionary back as we initialized earlier. Also,
please note because we are reading bytes here, we have used “rb” instead of “r”.
Output
{1: 'Zack', 2: '53050', 3: 'IT', 4: '38', 5: 'Flipkart'}
Advantages of using Pickle Module:

❖ Recursive objects (objects containing references to themselves): Pickle keeps track of
the objects it has already serialized, so later references to the same object won’t be
serialized again. (The marshal module breaks for this.)
❖ Object sharing (references to the same object in different places): This is similar to self-
referencing objects; pickle stores the object once, and ensures that all other references
point to the master copy. Shared objects remain shared, which can be very important for
mutable objects.
❖ User-defined classes and their instances: Marshal does not support these at all, but pickle
can save and restore class instances transparently. The class definition must be
importable and live in the same module as when the object was stored.
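The first two advantages can be checked with a short sketch. The objects below are made up for illustration: one list contains itself, and one list is referenced from two dictionary keys.

```python
import pickle

# Recursive object: a list that contains a reference to itself
recursive = ["root"]
recursive.append(recursive)

# Shared object: the same list referenced from two places
shared = [1, 2]
pair = {"first": shared, "second": shared}

restored_recursive = pickle.loads(pickle.dumps(recursive))
restored_pair = pickle.loads(pickle.dumps(pair))

# The self-reference survives the round trip
print(restored_recursive[1] is restored_recursive)  # True

# Both keys still point at one list, so a change through one is seen by both
restored_pair["first"].append(3)
print(restored_pair["second"])  # [1, 2, 3]
```

If sharing were lost, the second print would still show [1, 2].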
➢ STREAMLIT
Streamlit is a free and open-source framework to rapidly build and share
machine learning and data science web apps. It is a Python-based library specifically
designed for machine learning engineers. Data scientists or machine learning engineers
are not web developers and they're not interested in spending weeks learning to use web
frameworks to build web apps. Instead, they want a tool that is easier to learn and to use,
as long as it can display data and collect needed parameters for modeling. Streamlit
allows you to create an application with only a few lines of code.
With Streamlit, you can create a web application directly from your
Python script without the need for HTML, CSS, or JavaScript knowledge. It provides a
range of built-in components and widgets that you can use to create interactive elements
such as sliders, dropdowns, charts, and tables. You can also leverage popular Python
libraries like Matplotlib, Pandas, and Plotly to visualize data within your app.
To use Streamlit, you typically follow these steps:

❖ Install Streamlit by running pip install streamlit in your Python environment.
❖ Create a new Python script and import the necessary libraries, such as Streamlit and the
ones you need for data processing and visualization.
❖ Write the code for your app, including data loading, processing, and any visualizations
or interactive elements you want to include.
❖ Use Streamlit's API to define the layout and behavior of your app. You can use functions
like st.sidebar for sidebar elements, st.write to display text or data, and st.pyplot to show
plots generated with other libraries.
❖ Run the Streamlit development server by executing streamlit run your_script.py in your
terminal. This will start the server and launch your app in a web browser.
❖ Streamlit watches your Python script for changes: when you save the file, the running app
offers to rerun with the new code (or reruns automatically if you choose that option),
allowing you to iterate quickly.
❖ Streamlit also offers additional features like caching, sharing apps, and deploying them
to various platforms. You can explore the Streamlit documentation and community
resources for more information and examples to get started with building your own
interactive web applications.
Let’s install streamlit. Type the following command in the command prompt.
pip install streamlit
Once the command finishes, run the Python code given below. If you do not get an
error, then streamlit is successfully installed and you can now work with streamlit.
How to run Streamlit file?
Open command prompt or Anaconda shell and type
streamlit run filename.py
Here my filename is ‘sample.py’. Open the local URL in the web browser.
Figure 26 : Run Streamlit file
Understanding the Streamlit basic functions
1. Title:
# import module
import streamlit as st
# Title
st.title("Hello GeeksForGeeks !!!")
Output:
Figure 27 : title
2. Header and Subheader:
# Header
st.header("This is a header")
# Subheader
st.subheader("This is a subheader")
Output:
Figure 28: Header/Subheader
3. Text:
# Text
st.text("Hello GeeksForGeeks!!!")
Output:
Figure 29 : Text
4. Markdown:
# Markdown
st.markdown("### This is a markdown")
Output:
Figure 30 : Markdown
5. Success, Info, Warning, Error, Exception:
# success
st.success("Success")
# info
st.info("Information")
# warning
st.warning("Warning")
# error
st.error("Error")
# Exception - This has been added later
exp = ZeroDivisionError("Trying to divide by Zero")
st.exception(exp)
Output:
Figure 31 :Success, Information, Warning, Error and Exception
6. Write:
Using the write function, we can also display code in coding format. This is not possible using
st.text().
# Write text
st.write("Text with write")
# Writing python inbuilt function range()
st.write(range(10))
Output:
Figure 32 :write() function
7. Display Images:
# Display Images
# import Image from pillow to open images
from PIL import Image
img = Image.open("streamlit.png")
# display image using streamlit
# width is used to set the width of an image
st.image(img, width=200)
Output:
Figure 33: Display image using streamlit
8. Checkbox:
A checkbox returns a boolean value. When the box is checked, it returns a True value else
returns a False value.
# checkbox
# check if the checkbox is checked
# title of the checkbox is 'Show/Hide'
if st.checkbox("Show/Hide"):
    # display the text if the checkbox returns True value
    st.text("Showing the widget")
Output:
Figure 34: Checkbox is not checked
Figure 35: The text is displayed when the box is checked
9. Radio Button:
# radio button
# first argument is the title of the radio button
# second argument is the options for the radio button
status = st.radio("Select Gender: ", ('Male', 'Female'))
# conditional statement to print
# Male if male is selected else print female
# show the result using the success function
if (status == 'Male'):
    st.success("Male")
else:
    st.success("Female")
Output:
Figure 36: Success shows Male when male option is selected
Figure 37: Success shows Female when Female option is selected
10. Selection Box:

You can select any one option from the select box.
# Selection box
# first argument takes the title of the selection box
# second argument takes options
hobby = st.selectbox("Hobbies: ",
                     ['Dancing', 'Reading', 'Sports'])
# print the selected hobby
st.write("Your hobby is: ", hobby)
Output:
Figure 38: Selectbox showing options to select from
Figure 39: Selected option is printed
11. Multi-Selectbox:
The multi-select box returns the output in the form of a list. You can select multiple options.
# multi select box
# first argument takes the box title
# second argument takes the options to show
hobbies = st.multiselect("Hobbies: ",
                         ['Dancing', 'Reading', 'Sports'])
# write the selected options
st.write("You selected", len(hobbies), 'hobbies')
Output:
Figure 40: Multi-SelectBox
Figure 41: Selected 2 options
12. Button:
st.button() returns a boolean value. It returns a True value when clicked else returns False.
# Create a simple button that does nothing
st.button("Click me for no reason")
# Create a button, that when clicked, shows a text
if(st.button("About")):
    st.text("Welcome To GeeksForGeeks!!!")
Output:
Figure 42: Click the first button
Figure 43: Click the About button
13. Text Input:
# Text Input
# save the input text in the variable 'name'
# first argument shows the title of the text input box
# second argument displays a default text inside the text input area
name = st.text_input("Enter Your name", "Type Here ...")
# display the name when the submit button is clicked
# .title() converts the input text string to title case
if(st.button('Submit')):
    result = name.title()
    st.success(result)
Output:
Text Input
Figure 44: Display success message when the Submit button is clicked
14. Slider:
# slider
# first argument takes the title of the slider
# second argument takes the starting of the slider
# last argument takes the end number
level = st.slider("Select the level", 1, 5)
# print the level
# format() is used to print value
# of a variable at a specific position
st.text('Selected: {}'.format(level))
Output:
Figure 45: Slider
Mini Project:
Let us recollect everything that we learned above and create a BMI Calculator web app.
The formula of BMI Index when weight is in Kgs and height is in meters is:
BMI = weight / (height ** 2)
# import the streamlit library
import streamlit as st
# give a title to our app
st.title('Welcome to BMI Calculator')
# TAKE WEIGHT INPUT in kgs
weight = st.number_input("Enter your weight (in kgs)")
# TAKE HEIGHT INPUT
# radio button to choose height format
status = st.radio('Select your height format: ',
                  ('cms', 'meters', 'feet'))
# bmi stays None until a valid height has been entered
bmi = None
# compare status value
if(status == 'cms'):
    # take height input in centimeters
    height = st.number_input('Centimeters')
    try:
        bmi = weight / ((height/100)**2)
    except ZeroDivisionError:
        st.text("Enter some value of height")
elif(status == 'meters'):
    # take height input in meters
    height = st.number_input('Meters')
    try:
        bmi = weight / (height ** 2)
    except ZeroDivisionError:
        st.text("Enter some value of height")
else:
    # take height input in feet
    height = st.number_input('Feet')
    # 1 meter = 3.28 feet
    try:
        bmi = weight / (((height/3.28))**2)
    except ZeroDivisionError:
        st.text("Enter some value of height")
# check if the button is pressed and a BMI could be computed
if(st.button('Calculate BMI')) and bmi is not None:
    # print the BMI INDEX
    st.text("Your BMI Index is {}.".format(bmi))
    # give the interpretation of BMI index
    if(bmi < 16):
        st.error("You are Extremely Underweight")
    elif(bmi >= 16 and bmi < 18.5):
        st.warning("You are Underweight")
    elif(bmi >= 18.5 and bmi < 25):
        st.success("Healthy")
    elif(bmi >= 25 and bmi < 30):
        st.warning("Overweight")
    elif(bmi >= 30):
        st.error("Extremely Overweight")
Output:
Figure 46: Calculate BMI Index, Scenario 1
Height in Meters:
Figure 47: Calculate BMI Index Scenario 2
Height in Feet:
Figure 48: Calculate BMI Index, Scenario 3
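The arithmetic of the mini project can be checked separately from the Streamlit widgets. The sketch below is an illustration only: the function names bmi_from_metric and bmi_category are assumptions introduced here, and the sample weight and heights are made up. It uses the same formula, unit conversions and category limits as the app code above.

```python
def bmi_from_metric(weight_kg, height_m):
    """Return weight / height**2, or None when the height is not positive."""
    if height_m <= 0:
        return None
    return weight_kg / (height_m ** 2)


def bmi_category(bmi):
    """Map a BMI value to the labels used in the app above."""
    if bmi < 16:
        return "Extremely Underweight"
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Healthy"
    if bmi < 30:
        return "Overweight"
    return "Extremely Overweight"


# Height entered in centimetres or feet is converted to metres first
height_from_cm = 175 / 100
height_from_feet = 5.74 / 3.28

value = bmi_from_metric(70, height_from_cm)
print(round(value, 2), bmi_category(value))  # 22.86 Healthy
```

Keeping the calculation in plain functions like these makes it possible to test the logic without starting the web app.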
ARCHITECTURE DIAGRAM
Figure 49 : Architecture diagram of Movie Recommender System

WORKING
Before going to the code part, we will download the dataset, and then we will use it in the code.
DATASET
In this project we are going to use a dataset from Kaggle. We will use the
following files:
1. tmdb_5000_credits
2. tmdb_5000_movies
CODE WITH OUTPUT
Importing Libraries & Reading Datasets
Merging two datasets
TO FIND MISSING DATA
CONVERTS STRING OF LISTS INTO LISTS
VECTORIZATION
FROM NLTK IMPORT PORTERSTEMMER
MAIN FUNCTION OF RECOMMEND MOVIE
IMPORTING PICKLE
FRONTEND PART USING STREAMLIT WITH IMDB RATING
To show the IMDb rating in the frontend, we have to import imdb. To access IMDb
(Internet Movie Database) data using Python, you can make use of the IMDbPY library.
IMDbPY is a Python package that provides an interface to the IMDb database, allowing you
to retrieve information about movies, TV shows, actors, and more.
To get started, you'll need to install the IMDbPY library. You can do this by running
the following command in your Python environment:
pip install IMDbPY
Once this library is installed, we can use it in the movie recommender system.
import pickle
import pandas as pd
import requests
import streamlit as st
import imdb
# Initialize the IMDb access
ia = imdb.IMDb()
# Function to fetch IMDb rating for a given movie title
def fetch_imdb_rating(movie_title):
    # Search for the movie using the title
    movies = ia.search_movie(movie_title)
    # No search results means there is no rating to return
    if not movies:
        return None
    # Get the first movie from the search results
    movie = movies[0]
    # Retrieve the IMDb information for the movie
    ia.update(movie)
    # Check if the IMDb rating is available
    if 'rating' in movie:
        return movie['rating']
    return None
# Function to fetch poster image from TMDb
def fetch_poster(movie_id):
    response = requests.get(
        'https://api.themoviedb.org/3/movie/{}?api_key=8265bd16'
        '79663a7ea12ac168da84d2e8&language=en-US'.format(movie_id))
    data = response.json()
    # poster_path can be missing or empty when TMDb has no poster
    if data.get('poster_path'):
        return "https://image.tmdb.org/t/p/w500/" + data['poster_path']
    else:
        return None
# Streamlit app code
st.title('MOVIE RECOMMENDER SYSTEM ')
name = st.text_input("Enter your name")
if st.button("Next"):
    st.session_state["name"] = name
    st.experimental_rerun()
if "name" in st.session_state:
    st.subheader("Hello, " + st.session_state["name"])
def recommend(movie):
    movie_index = movies[movies['title'] == movie].index[0]
    distances = similarity[movie_index]
    movies_list = sorted(list(enumerate(distances)),
                         reverse=True, key=lambda x: x[1])[1:6]
    recommended_movies = []
    recommended_movies_posters = []
    recommended_movies_ratings = []
    for i in movies_list:
        movie_id = movies.iloc[i[0]].movie_id
        movie_title = movies.iloc[i[0]].title
        recommended_movies.append(movie_title)
        # Fetch poster from TMDb
        recommended_movies_posters.append(fetch_poster(movie_id))
        # Fetch IMDb rating
        recommended_movies_ratings.append(fetch_imdb_rating(movie_title))
    return recommended_movies, recommended_movies_posters, recommended_movies_ratings
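The ranking step inside recommend() can be illustrated on its own with plain Python lists. This is a sketch under stated assumptions: the helper top_matches and the sample scores are hypothetical, and unlike the [1:6] slice above, which relies on the selected movie ranking first, the sketch excludes the movie by its index.

```python
def top_matches(similarity_row, own_index, k=5):
    """Return indices of the k most similar items, excluding the item itself."""
    # Pair every index with its score, as enumerate(distances) does above
    scored = [(i, s) for i, s in enumerate(similarity_row) if i != own_index]
    # Highest score first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored[:k]]


# Hypothetical similarity scores of movie 0 against four movies (itself included)
row = [1.0, 0.10, 0.80, 0.45]
print(top_matches(row, own_index=0, k=2))  # [2, 3]
```

The returned indices play the role of i[0] in the loop of recommend(), where they are used to look up the title and movie_id.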
# load the pickled movie data and similarity matrix; the with blocks close the files
with open('movie_dict.pkl', 'rb') as f:
    movies_dict = pickle.load(f)
movies = pd.DataFrame(movies_dict)
with open('similarity.pkl', 'rb') as f:
    similarity = pickle.load(f)
selected_movie_name = st.selectbox('Enter the Movie Name',
                                   movies['title'].values)
if st.button('Recommend Movie'):
    names, posters, ratings = recommend(selected_movie_name)
    st.subheader('Below are some similar movies:')
    cols = st.columns(5)
    # each column shows one recommendation: title, poster and rating
    for col, title, poster, rating in zip(cols, names, posters, ratings):
        with col:
            st.subheader(title)
            st.image(poster)
            st.write('IMDb Rating:', rating)
RUN STREAMLIT
To run Streamlit, we have to run the command
streamlit run <file_name>
Then a Local and a Network URL will be generated, and the app will open in your browser.
WEBPAGE
APPLICATIONS:
Here are some common applications of movie recommender systems:
❖ Streaming Platforms: Movie recommender systems are extensively used by streaming
platforms like Netflix, Amazon Prime Video, and Hulu to provide personalized
recommendations to their users.
❖ E-commerce Websites: Online marketplaces that sell movies, such as Amazon, often
employ recommender systems to suggest relevant movies to their customers based on their
browsing history, purchase behavior, and ratings.
❖ Social Media Platforms: Social media platforms like Facebook and Twitter can
integrate movie recommender systems to enhance user engagement. By analyzing user
profiles, interests, and interactions, such systems recommend movies that align with users'
preferences and let users share and discuss their favorite films with friends.
❖ Movie Review Websites: Platforms like IMDb and Rotten Tomatoes use movie
recommender systems to suggest films to their users based on their ratings, reviews, and
browsing behavior. This helps users discover movies that are likely to appeal to their tastes
and preferences.
❖ Personalized Advertising: Movie recommender systems can also be used for targeted
advertising. By analyzing user preferences and viewing habits, advertisers can deliver
personalized movie recommendations as part of their marketing campaigns. This allows
them to promote relevant movies and increase the likelihood of user engagement and
conversions.
ADVANTAGES:
Movie recommender systems offer several advantages to both users and
businesses. Here are some key advantages:
❖ Personalized Recommendations: Movie recommender systems provide personalized
movie suggestions based on individual preferences, viewing history, and behavior. This
helps users discover new movies that align with their interests, saving them time and
effort in searching for content.
❖ Increased User Engagement: Recommender systems play a crucial role in keeping users
engaged with a platform or service. By suggesting relevant movies, these systems
encourage users to explore and consume more content. This leads to increased user
satisfaction, longer session durations, and a higher likelihood of repeat visits.
❖ Discovery of New Movies: Movie recommender systems expose users to a wider range
of movies and genres that they may not have otherwise discovered. By analyzing user
preferences and behaviors, these systems recommend movies that have a higher chance
of being well-received by the user.
❖ Enhanced User Retention: Personalized movie recommendations contribute to improved
user retention. When users consistently receive relevant suggestions that align with their
preferences, they are more likely to remain engaged and loyal to the platform or service.
❖ Increased Revenue and Sales: Recommender systems can have a positive impact on
revenue and sales for businesses. By suggesting movies that align with users' preferences,
these systems can increase the likelihood of users renting or purchasing movies.
❖ Improved User Satisfaction: When users receive relevant movie recommendations that
cater to their preferences, they experience a higher level of satisfaction with the platform
or service. By delivering a personalized and curated experience, recommender systems
show that the platform understands and values the users' tastes and interests.
❖ Efficient Content Curation: Movie recommender systems automate the process of content
curation by analyzing vast amounts of user data and applying algorithms to generate
personalized recommendations. This reduces the manual effort required for users to
search and browse through a vast catalog of movies.
LIMITATIONS OF THE SYSTEM:
➢ The system can only recommend movies that are in the dataset downloaded from
Kaggle; any other title results in an error.
➢ It may give inaccurate results if the data is entered incorrectly.
➢ It cannot produce recommendations for recent movies, because these movies are not
in the dataset provided here.
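The first two limitations can be softened by checking a title before looking it up. The sketch below is one possible approach and not part of the project: find_title is a hypothetical helper, the three titles are placeholders, and the fuzzy matching uses difflib from the Python standard library. An unknown title returns suggestions instead of raising an error.

```python
import difflib


def find_title(query, titles, max_suggestions=3):
    """Return (match, suggestions).

    match is the dataset title equal to the query ignoring case and
    surrounding spaces, or None. suggestions lists close titles and is
    only filled when there is no exact match.
    """
    by_lower = {title.lower(): title for title in titles}
    key = query.strip().lower()
    if key in by_lower:
        return by_lower[key], []
    close = difflib.get_close_matches(
        key, list(by_lower), n=max_suggestions, cutoff=0.6
    )
    return None, [by_lower[c] for c in close]


# Placeholder titles standing in for movies['title'].values
titles = ["Avatar", "Spectre", "Tangled"]

print(find_title(" avatar ", titles))  # ('Avatar', [])
print(find_title("Avatr", titles))     # (None, ['Avatar'])
```

In the app itself the title comes from a select box, so a guard like this matters mainly if free-text input is added later.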
CONCLUSION:
Movie recommender systems have transformed the way we discover and engage
with movies. They offer personalized recommendations tailored to individual
preferences, leading to increased user satisfaction, engagement, and retention. By
suggesting movies that align with users' tastes, these systems save time and effort in the
search for content and help users explore new genres and hidden gems. Incorporating user
feedback and ratings, considering contextual factors, adopting hybrid approaches, and
providing explanations for recommendations can enhance the accuracy and relevance of
movie suggestions. Additionally, allowing user control and customization empowers
users to have a more active role in their movie selection process. As technology advances,
it is important for movie recommender systems to continue evolving to meet the ever-
changing needs and preferences of users. By addressing the challenges and incorporating
user-centric approaches, these systems can continue to provide valuable
recommendations and contribute to a more enriching and enjoyable movie-watching
experience for users worldwide. Moving forward, it is essential for movie recommender
systems to focus on providing diverse and contextually relevant recommendations, while
also respecting user privacy and offering transparency in their recommendation
processes. Striking a balance between personalization and serendipity, along with
considering user preferences and evolving tastes, will contribute to the continued
improvement and effectiveness of movie recommender systems.
REFRENCES:
[1] Li H, Cai F, and Liao Z. Content-Based Filtering Recommendation Algorithm using HMM.
IEEE Transactions on Computational and Information Sciences, 8(1):275-277, 2012.
[2] Hirdesh Shivhare, Anshul Gupta and Shalki Sharma (2015), “Recommender system using
fuzzy c-means clustering and genetic algorithm based weighted similarity measure”, IEEE
International Conference on Computer, Communication and Control.
[3] Manoj Kumar, D.K. Yadav, Ankur Singh and Vijay Kr. Gupta (2015), “A Movie
Recommender System: MOVREC”, International Journal of Computer Applications (0975
– 8887) Volume 124 – No.3.
[4] Nitasha Soni; Krishan Kumar; Ashish Sharma; Satyam Kukreja; Aman Yadav Machine
Learning Based Movie Recommendation System | IEEE Conference Publication | IEEE
Xplore
[5] Shourya Chawla; Sumita Gupta; Rana Majumdar Movie Recommendation Models Using
Machine Learning | IEEE Conference Publication | IEEE Xplore