
A

Mini Project Report


On
Title of Project
For
Partial fulfillment of award of the
B.Tech. Degree
in
Information Technology

2021-22
Name of Supervisor
Guide Name
Assistant Professor/Associate Professor/Professor
Name Team Member(s)
Student Name (Roll No)

Department of Information Technology


G. L. Bajaj Institute of Technology and Management
Plot No 2, Knowledge Park-III, Greater Noida-201306
2023
Department of Information Technology

Declaration

I/We hereby declare that the project work presented in this report entitled
“……………………………………………”, in partial fulfillment of the requirements for
the award of the degree of Bachelor of Technology in Information Technology,
submitted to Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, is an
authentic record of my/our own work carried out in the Department of Information
Technology & Engineering, G.L. Bajaj Institute of Technology & Management, Greater
Noida. It contains no material previously published or written by another person except
where due acknowledgement has been made in the text. The project work
reported in this report has not been submitted by me/us for the award of any other degree
or certification.

Signature: Signature:

Name: Name:

Roll No : Roll No :

Signature:

Name:

Roll No :

Date:

Place: Greater Noida


Department of Information Technology

Certificate

This is to certify that the Project Report entitled
“…………………………………………….”, which is submitted by Name of students in
partial fulfillment of the requirements for the award of the degree of B. Tech. in the Department of
Information Technology of Dr. A.P.J. Abdul Kalam Technical University, is a record of the
candidates' own work carried out by them under my/our supervision. The matter embodied
in this report is original and has not been submitted for the award of any other
degree.

Date:

Name of Supervisor Dr. P C Vashist

(Designation) Head of Department


Department of Information Technology

Acknowledgement

We would like to express our sincere thanks to our project supervisor Guide Name and
our Head of Department Dr. P.C. Vashist for their invaluable guidance and suggestions.
This project helped us to understand the concepts of machine learning and IoT. It also
enriched our knowledge and experience of working in a team and on a live project.
We would also like to express our gratitude to Faculty Name for his/her help in the
preparation and review of our project.

Lastly, we would like to thank all the faculty members for giving their valuable time
whenever it was needed to help us carry on with our project.
TABLE OF CONTENTS (Sample)

ABSTRACT 8

CHAPTER-1: INTRODUCTION TO IMAGE PROCESSING 9

1.1 INTRODUCTION 9

1.2 HISTORY 10

1.3 OPENCV 11

1.4 METHODS OF IMAGE PROCESSING 12

1.5 STAGES OF PREPROCESSING 13

1.5.1 ACQUISITION OF IMAGE 13

1.5.2 PREPROCESSING 13

1.5.3 SEGMENTATION 14

1.6 APPLICATIONS OF IMAGE PROCESSING 14

1.7 ADVANTAGES 15

CHAPTER-2: EXISTING TECHNOLOGIES 18

2.1 VEO 18

2.2 SOLOSHOT 21

CHAPTER-3: IMAGE RECOGNITION IN REAL TIME 23

3.1 INTRODUCTION 23

3.2 REAL TIME FOOTBALL DETECTION 23

3.3 WHAT IS YOLO OBJECT DETECTION 24

3.4 APPLICATIONS OF YOLO 25


3.5 PREREQUISITES OF YOLO ALGORITHM 25

3.6 CONVOLUTIONAL NETWORK IN YOLO 27

3.7 LOSS FUNCTIONS & REDUCTION 30

3.8 YOLO ALGORITHM PROCESS 33

3.8.1 WORKING 33

3.9 ABNORMAL BOUNDARY CASES 35

3.9.1 INTERSECTION OVER UNION 35

3.9.2 NON-MAX SUPPRESSION 37


3.9.3 ANCHOR BOXES 39
3.10 CHALLENGES 41

CHAPTER-4: INTERNET OF THINGS(IOT) 43

4.1 WHAT IS IOT? 43

4.2 HISTORY OF IOT 43

4.3 HOW IOT WORKS 44

4.4 USE OF IOT 45

4.5 RASPBERRY PI 47

4.6 SERVO MOTOR 50

4.6.1 INSIDE OF A SERVO MOTOR 51

4.6.2 SERVO MOTOR WORKING MECHANISM 52

4.6.3 SERVO MOTOR WORKING PRINCIPLE 52

4.6.4 HOW DO SERVO MOTOR WORK 53

4.6.5 CONTROLLING A SERVO MOTOR 53

4.7 MOVING CAMERA MODULE 58


CHAPTER-5: INTEGRATING THE CONCEPT 60

5.1 AGENDA 60

5.2 THE IDEA AND EXECUTION 60

5.3 DATASET AND TRAINING 63

5.4 REQUIREMENTS 63

5.5 SETUP 64

5.6 DETECTION CODE WALKTHROUGH 65

5.7 TRAINING CODE WALKTHROUGH 70

5.8 TRAINING PROCESS IN DEPTH 72

5.9 GOOGLE COLAB 72

5.10 LABELLING TOOL 73

5.11 STEPS(YOLO) 73

5.12 SAMPLES OF LABELLING DATASET USING LABELIMG 74

5.13 SAMPLE OUTPUT OF COLAB 78

CHAPTER-6: CONCLUSION AND FUTURE SCOPE 79

6.1 FUTURE SCOPE 79

6.2 CONCLUSION 79

CHAPTER-7: ENVIRONMENT AND SUSTAINABILITY 80

REFERENCES 82
Remark:
Instructions for Formatting the Project Report:
1. All text should be in Times New Roman.
2. Headings should be Font Size 14, Bold.
3. Body text should be Font Size 12.
4. There should be 1.5 line spacing between lines of text.
5. Figures and their captions should be center justified with font size 10.
6. Tables and their captions should be center justified with font size 10.
7. All the text in the Project should be Justified (Select text -> Ctrl+J).
8. The Project Report should be plagiarism free (less than 10% similarity).
9. The Project Report should not be less than 60 pages and should be printed on bond
paper.
10. Hard Binding should be blue with golden print (contact the
supervisor).

CHAPTER-1
INTRODUCTION
Introduction
News has been a source of information for centuries. Traditionally, news agencies were the
primary source of news, so responsibility for reliability and confidentiality remained with
the official organizations themselves. In recent times, internet access has spread rapidly across
both urban and rural areas. As a result, more users all over the world can access the internet and
spread information in their own way [1].

According to an Economic Times report of 2019, there are 627 million internet users in India, which
means India is home to the world’s second-largest internet user base [2]. However, with the
increasing popularity of social media, the internet has become an ideal breeding ground for fake news.
Research by the BBC shows that nearly 72% of Indians struggled to distinguish between fake and
real news [3]. Websites like The Onion[4], News Thump[5], The Poke News[6], and The Mash
News[7] are among the top-ranked propagators of ‘fake’ or ‘misleading’ news [8]. In response, many
online fact-checking resources such as Snopes[9], FactCheck.org[10], Factmata.com[11],
PolitiFact.com[12] and many more have grown rapidly. Platforms such as Facebook,
WhatsApp, and Google have tried to address this concern, but their efforts have contributed little to
solving the issue.

Approaches to detect Fake News:

1.1 Detection approaches based on machine learning: Support Vector Machines (SVMs),
random forests, logistic regression models, Conditional Random Field (CRF) classifiers, and
Hidden Markov Models (HMMs) [13].

1.2 Detection approaches based on deep learning: The two most widely implemented
paradigms in modern artificial neural networks are Recurrent Neural Networks (RNN) and
Convolutional Neural Networks (CNN) [13].

This model will detect fake news by checking the credibility of the news provider, the sentiment of
comments, and the content of the news itself. We use Natural Language Processing to pre-process
the dataset and a machine learning approach to classify the news.

Figure 1: Fact Checker [14]


BACKGROUND
There are many models for fact checking and detecting fake news. PolitiFact [12] is a
fact-checking website operated by the Poynter Institute in St. Petersburg, Florida, which uses its
Truth-O-Meter to rate the truthfulness of a statement, article, event, image, or video. However, its
fact checking is limited to political news and hence fails to cover the broad spectrum of news.
According to a survey paper, fake news sources on Facebook can be flagged using BS Detector [15].
Another fact-checking website, Factmata [11], scores content on nine signals, including hate speech
and political bias, to give a deeper understanding of the credibility and safety of any content on
the web. Flock, a messenger for businesses, has launched a fake news detector that aims to stop
false and misleading information from being introduced into its environment [16].

In India, fact-checking services have recently been launched by India Today, Times of India, and
AFP India, but these resources do not provide a platform for users to check whether the news article
they are viewing is fake or real. AltNews [17] has been successful in India in providing a platform
for users to clear their doubts, though it has yet to become more efficient and reliable.

Models like Fact Finder only check whether the news is fake or real. On the other hand, the
AltNews website and app investigate fake news and publish reports on viral fake news articles. Our
model performs both tasks simultaneously.

PROPOSED WORK
In this paper, a model is built by pre-processing data with the NLTK library: stopwords such as
“the”, “is”, and “are” are removed, and only those words which are distinctive and provide
relevant information are kept. We also removed punctuation and numbers and converted our
dataset to lowercase. We then used a Count Vectorizer or TF-IDF matrix, which tallies how
often each word is used in a given article in our dataset. Figure 2 depicts the process from
collecting the News Articles Dataset to applying the News Classification Algorithm. Since the
problem concerns text classification and information extraction, we have used a Naïve Bayes
classifier for text-based classification. For training and testing, we have used Multinomial NB and
the Passive Aggressive Classifier, with 33% of the dataset used for training. We also remove rare
words occurring in our corpus with the help of the Count Vectorizer [18-20].
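
The cleaning steps described above (lowercasing, removing punctuation and numbers, dropping stopwords) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the `preprocess` function and the abbreviated `STOPWORDS` set are assumptions made for this example, and NLTK's English stopword list is considerably longer.

```python
import re

# Hypothetical, abbreviated stopword set used only for this sketch;
# NLTK's English stopword list contains many more entries.
STOPWORDS = {"the", "is", "are", "and", "a", "an", "of", "in", "to"}


def preprocess(text):
    """Lowercase the text, strip punctuation and digits, and drop stopwords."""
    text = text.lower()
    # Replace every character that is not a lowercase letter or whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOPWORDS]


tokens = preprocess("The Senate is voting on 3 new bills, and the President agrees!")
print(tokens)
```

Only the words that survive this filter are passed on to the vectorizer.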

The goal of the project is to build a website and app so that whenever a user selects a piece of
text, the app responds with a floating window showing the percentage likelihood that the selected
text is fake or real news. The advantage of this design is that the app can detect fake news
without the user opening the app or uploading any content to it.
Figure 2: Process Flow Diagram

METHODOLOGY
In this section, the methodology of the proposed model is described. Figure 3 represents the
workflow of the methods involved in creating the model. The major steps involved in building the
model are:
1. Corpus of Text Document

2. Text wrangling and pre-processing

3. Parsing and Basic Exploratory Data Analysis

4. Text representation using relevant feature engineering techniques

5. Modeling

6. Evaluation and Deployment

Figure 3: Methodology

Scraping News Articles for Data Retrieval

Currently, the model has been trained using a dataset from Kaggle [21] with 6335 rows and 4
columns. News articles will also be scraped from inshorts [22] with the help of Python libraries,
along with NLTK and spaCy. The text of a typical news article sits within specific sections of the
page's HTML, as depicted in the following image:
Figure 4: The landing page for technology news articles and its corresponding HTML structure [23]

The specific HTML tags that contain the textual content can then be targeted [24]. Hence, with
the help of libraries such as BeautifulSoup and requests, the useful content will be scraped.
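
The idea of pulling article text out of specific HTML tags can be sketched as follows. To keep the example self-contained and offline, it uses the standard library's `html.parser` as a stand-in for BeautifulSoup and parses an inline string instead of fetching a page with requests. The markup in `SAMPLE_HTML` and the class name `article-body` are hypothetical and do not describe the real structure of any news site.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collects the text of every <div> that carries a target CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # nesting depth inside a matching <div>
        self._buffer = []    # text fragments of the article being read
        self.articles = []   # one cleaned string per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0 or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join fragments and collapse runs of whitespace.
                self.articles.append(" ".join("".join(self._buffer).split()))
                self._buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<html><body>
  <div class="news-card">
    <span class="headline">Example headline</span>
    <div class="article-body">First example article text.</div>
  </div>
  <div class="news-card">
    <div class="article-body">Second example <b>article</b> text.</div>
  </div>
</body></html>
"""

parser = ArticleTextParser("article-body")
parser.feed(SAMPLE_HTML)
articles = parser.articles
print(articles)
```

With BeautifulSoup the same selection is usually a single lookup by tag and class; the hand-written parser above only makes the mechanism visible.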

The collected dataset contains 6335 rows and 4 columns; the head of the dataset is depicted
in Figure 5:

Figure 5: Dataset of real and fake news articles

Text Wrangling, Cleaning and Pre-processing

Here, both the nltk and spacy packages have been leveraged to process the data. A stopword list
is used to remove the most common words in our dataset, such as “and”, “the”, and “is”. Along
with stop words, HTML tags, accented characters, punctuation, numbers, and special characters
also need to be removed, and contractions need to be expanded, since they do not
provide relevant information. Lemmatization and stemming are done with the help of
functions such as lemmatize_text() and simple_stemmer() respectively.

With the help of the TF-IDF vectorizer, the importance of each word in a given article relative to
the entire corpus is determined [25].
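
To make the TF-IDF weighting concrete, the sketch below computes it by hand using the textbook form, term frequency multiplied by log(N / document frequency). This is an illustration under stated assumptions rather than the project's code: the function name `tf_idf` and the toy documents are made up for the example, and library vectorizers such as scikit-learn's typically apply smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Weight = (count of term in doc / doc length) * log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus for illustration only.
docs = [
    ["senate", "passes", "budget", "bill"],
    ["aliens", "endorse", "budget", "bill"],
    ["senate", "votes", "today"],
]
weights = tf_idf(docs)
print(weights[0])
```

A word that appears in only one article ("passes") receives a higher weight than one shared across articles ("budget"), and a word present in every article would receive a weight of zero.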

Data Visualization and Feature Extraction

For a better understanding of the dataset, we use the matplotlib and seaborn libraries for
visualization and plotting graphs. Using the stripplot() method of the seaborn library, the
statistical plot depicted in Figure 6 was produced; it shows that entries at indices 0~5000 are
REAL while those at 5000~10000 are FAKE. CountVectorizer was also imported to remove
rare words.
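
Rare-word removal amounts to a minimum document-frequency filter. The sketch below shows the idea in plain Python; the helper `drop_rare_words` and the toy corpus are assumptions for illustration. In scikit-learn, CountVectorizer exposes the same idea through its `min_df` parameter.

```python
from collections import Counter


def drop_rare_words(documents, min_df=2):
    """Keep only terms that occur in at least `min_df` documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    vocabulary = {term for term, df in doc_freq.items() if df >= min_df}
    filtered = [[term for term in doc if term in vocabulary] for doc in documents]
    return filtered, vocabulary


# Toy corpus for illustration only.
corpus = [
    ["senate", "passes", "budget", "bill"],
    ["aliens", "endorse", "budget", "bill"],
    ["senate", "votes", "today"],
]
filtered, vocabulary = drop_rare_words(corpus, min_df=2)
print(sorted(vocabulary))
print(filtered)
```

Raising `min_df` shrinks the vocabulary, which reduces noise from one-off words at the cost of discarding some potentially informative terms.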

Figure 6: Dataset Visualization of Fake news and Real news using Seaborn
The x-axis represents the label (fake or real); the y-axis represents the index.

Modeling and Grid Search

Multinomial NB and the Passive Aggressive Classifier were trained on 33% of the dataset and
tested on the remaining 67%. Using the confusion matrix, the model with the highest accuracy is
selected [26].
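
For readers unfamiliar with Multinomial Naive Bayes, the following is a minimal from-scratch sketch of the classifier with Laplace (add-one) smoothing. It is not the scikit-learn implementation used in the project; the class `TinyMultinomialNB` and the four training documents are hypothetical and exist only to show how word counts per class turn into a prediction.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (1.0 = Laplace smoothing)

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.vocab = {token for doc in documents for token in doc}
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc)
        self.n_docs = len(labels)
        return self

    def log_scores(self, doc):
        """Return log P(class) + sum of log P(token | class) for each class."""
        scores = {}
        vocab_size = len(self.vocab)
        for cls in self.classes:
            total_words = sum(self.word_counts[cls].values())
            score = math.log(self.class_counts[cls] / self.n_docs)
            for token in doc:
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                numerator = self.word_counts[cls][token] + self.alpha
                denominator = total_words + self.alpha * vocab_size
                score += math.log(numerator / denominator)
            scores[cls] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training data.
train_docs = [
    ["shocking", "miracle", "cure", "revealed"],
    ["aliens", "endorse", "candidate"],
    ["senate", "passes", "budget", "bill"],
    ["court", "upholds", "budget", "ruling"],
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["miracle", "aliens", "cure"]))
print(model.predict(["senate", "budget"]))
```

The smoothing constant keeps a word that was never seen in one class from driving that class's probability to zero.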

Experimental and Result Analysis

Let’s consider the result as positive when the classifier classifies a news article as fake:

● The number of True Positives is the number of news articles correctly classified as Fake
News;
● The number of False Positives is the number of news articles incorrectly classified as
Fake News;
● The number of True Negatives is the number of news articles correctly classified as Real
News;
● The number of False Negatives is the number of news articles incorrectly classified as
Real News.

The precision of a classifier is calculated as follows:

Precision = tp / (tp + fp)

where:
tp – number of true positive examples;
fp – number of false positive examples.

The recall of a classifier is calculated as follows:

Recall = tp / (tp + fn) [27]

where fn is the number of false negative examples.
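
The two formulas above translate directly into code. In this small sketch the counts are hypothetical values chosen to make the arithmetic easy to follow; they are not the results reported in this project.

```python
def precision(tp, fp):
    """Fraction of articles flagged as fake that really are fake."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of all fake articles that the classifier flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn = 45, 5, 15
p = precision(tp, fp)   # 45 / 50
r = recall(tp, fn)      # 45 / 60
print(p, r)
```

A classifier can score high on precision and low on recall at the same time: it is rarely wrong when it flags an article as fake, but it misses many fake articles.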


As depicted in Figure 7, the confusion matrix helps in evaluating the quality of the output of a
classifier, in this case Multinomial NB and the Passive Aggressive Classifier, on the fake or
real news dataset. The diagonal elements of the matrix represent the number of points where the
predicted label is equal to the true label, while the off-diagonal elements represent the number of
points where the prediction of the model fails.

The figure shows the matrices without normalization. The results change as the
classification models or vectorizers are changed.
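
How the four cells of a confusion matrix are tallied from true and predicted labels can be sketched as below. The function `confusion_counts` and the six example labels are assumptions made for illustration; "FAKE" is treated as the positive class, as in the definitions given earlier.

```python
def confusion_counts(y_true, y_pred, positive="FAKE"):
    """Tally true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Hypothetical labels for illustration only.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "FAKE", "REAL"]
y_pred = ["FAKE", "REAL", "REAL", "FAKE", "FAKE", "REAL"]

counts = confusion_counts(y_true, y_pred)
correct = counts["tp"] + counts["tn"]   # diagonal of the matrix
wrong = counts["fp"] + counts["fn"]     # off-diagonal of the matrix
print(counts, correct, wrong)
```

The diagonal sum divided by the total number of articles gives the accuracy, which is a different quantity from precision and recall.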

Matrix 1: Multinomial NB with TF-IDF Vectorizer

Matrix 2: Multinomial NB with Count Vectorizer

Matrix 3: Passive Aggressive Classifier with TF-IDF Vectorizer

Matrix 4: Passive Aggressive Classifier with Hashing Vectorizer

Figure 7: Confusion Matrix, without normalization

The precision of the classifying model is 0.902; the recall, on the other hand, is 0.486.
Precision is the fraction of relevant instances among the retrieved instances,
while recall is the fraction of all relevant instances that were actually retrieved.

CONCLUSION AND FUTURE SCOPE


In this project, the proposed Fake News Detection (FND) model uses text classification
algorithms to tell whether a news article is ‘fake’ or ‘real’. For training, 33% of the dataset
has been used, and 67% has been used for testing the FND model. The model predicted
fake and real news with a precision of 90.2%.

In future, VADER, which is a more efficient algorithm, can be used for sentiment analysis, along
with a text classification model that provides the highest accuracy. Also, existing Fake News
Detection models have worked on news and politics only; there is still scope for applying them to
stock markets, where share prices rise and fall very frequently.

REFERENCES
1. Kuriakose, Ammu, et al. "ALIKAH-A Clickbait and Fake News Detection System using Natural
Language Processing." 2019 3rd International Conference on Trends in Electronics and Informatics
(ICOEI). IEEE, 2019.
2. “India has second highest number of Internet users after China” – economictimes.com, 2019 [Online].
Available: https://economictimes.indiatimes.com
3. “Ordinary Indians are fueling the country’s fake-news crisis” – qz.com, 2018 [Online]. Available:
https://qz.com/india
4. “The Onion” – theonion.com [Online]. Available: https://www.theonion.com/
5. “News Thump” – newsthump.com [Online]. Available: https://newsthump.com/
6. “Poke News” – pokenews.com [Online]. Available:
https://thepoke.co.uk/category/news/
7. “Mash News” – mashnews.com [Online].
Available: https://www.thedailymash.co.uk/news
8. “Top 50 Fake News Websites And Blogs on the Web in 2019” – blog.feedspot.com, 2019[Online].
Available: https://blog.feedspot.com/fake_news_blogs/
9. “Snopes” – snopes.com [Online]. Available: https://www.snopes.com/
10. “FACTCHECK.ORG” – factcheck.org [Online]. Available: https://www.factcheck.org/
11. “FACTMATA” – factmata.com [Online]. Available: https://factmata.com/
12. “Fact Checking U.S. Politics | PolitiFact ” – politifact.com [Online].
Available: https://politifact.com/
13. Bondielli, Alessandro, and Francesco Marcelloni. "A survey on fake news and rumour detection
techniques." Information Sciences 497 (2019): 38-55.
14. “Protecting the EU Elections From Misinformation and Expanding Our Fact-Checking Program to New
Languages” – aboutfb.com[Online]. Available: https://about.fb.com/news

15. "B.S. Detector - Browser extension to identify fake news sites", Bsdetector.tech, 2018. [Online].
Available: http://bsdetector.tech/.
16. “Messenger platform Flock launches feature to identify fake news”, economictimes.com, 2019 [Online].
Available: https://m.economictimes.com/small-biz
17. “Alt News”, altnews.in [Online]. Available: https://www.altnews.in/
18.  N. J. Conroy, V. L. Rubin, and Y. Chen, “Automatic deception detection: Methods for finding fake
news,” Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, pp. 1–4,
2015.
19. S. Feng, R. Banerjee, and Y. Choi, “Syntactic stylometry for deception detection,” in Proceedings of the
50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2,
Association for Computational Linguistics, 2012, pp. 171–175.
20. S. Gilda, “Evaluating Machine Learning Algorithms for Fake News Detection,” in 2017 IEEE 15th
Student Conference on Research and Development (SCOReD), 2017.
21. “Kaggle”, kaggle.com [Online]. Available: https://kaggle.com
22. “inshorts - stay informed”, inshorts.com [Online]. Available: https://inshorts.com
23. “A Practitioner's Guide to Natural Language Processing (Part I) — Processing & Understanding Text”,
towardsdatascience.com, 2019 [Online]. Available: https://towardsdatascience.com
24. M. Pagliardini, P. Gupta, and M. Jaggi, “Unsupervised learning of sentence embeddings using
compositional n-gram features,” arXiv preprint arXiv:1703.02507, 2017.
25. H. Rashkin, E. Choi, J. Y. Jang, S. Volkova, Y. Choi, and P. G. Allen, “Truth of Varying Shades:
Analyzing Language in Fake News and Political Fact-Checking,” in Proceedings of the 2017
Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2931–2937.
26. M. Balmas, “When Fake News Becomes Real: Combined Exposure to Multiple News Sources and
Political Attitudes of Inefficacy, Alienation, and Cynicism,” Communic. Res., vol. 41, no. 3, pp. 430–
454, 2014.
27. Naive Bayes classifier. (n.d.) Wikipedia. [Online]. Available:
https://en.wikipedia.org/wiki/Naive_Bayes_classifier. Accessed Feb. 6, 2017.
