ACKNOWLEDGEMENT
We would like to give our thanks to our university, Visvesvaraya Technological Institute,
for giving us the opportunity to design and implement this project.
We would also like to thank our Principal, Dr. Suryaprasad J, for graciously granting us
access to the college resources.
Additionally, we would like to thank our Head of Department Dr. Annapurna D. for
graciously allowing us to use and access the resources of the Information Science and
Engineering Department.
We would like to thank our Project Guide Professor Bharathi R, for her untiring guidance
and aid in the course of this project, and for her help in selecting and confirming our project
topic and problem statement.
We would also like to thank our Project Coordinators, Professors Sreenath M. V. and
Kakoli Bora for their aid in preparing templates for our project reports and presentations.
Favourable Candidate Prediction
ABSTRACT
Favourability predictions are a major aspect of any election and can have a substantial
impact on the election as a whole, since elections are tied to verticals such as the economy,
media, corporate policies, etc. In the recent past, favourability prediction has become a
point of interest for the media, political entities and the general public. However, current
models use only sample polls to estimate the popularity of candidates, which is often
inaccurate. Therefore, there is a need to extend this approach with additional data points
such as previously elected candidates, GDP, average education levels, infrastructure levels, etc.
This data must be extracted via scraping and wrangling, which transform data from various
sources into a structured, high-quality dataset. The scope of the project is the Bangalore
region in the Karnataka State Elections. A model is constructed using an appropriate Deep
Learning neural network algorithm to predict election results, and the predictions are
verified against the results of previous elections.
CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
3.4. Machine Learning Pipeline
3.5. Data Flow Diagrams
3.5.1. Data Flow Diagram Level 0
3.5.2. Data Flow Diagram Level 1
3.5.3. Data Flow Diagram Level 2
3.6. Activity Diagrams
3.6.1. Activity Diagram for the Overall Process
3.6.2. Activity Diagram for the Prediction Process
3.7. Summary
REFERENCES
LIST OF FIGURES
Chapter 1
Introduction
1.1. Introduction
Elections are a major aspect of our modern democratic republic. They greatly
influence all aspects of our lives, including our economy and financial systems,
local and national policies, education, taxes, etc. Hence, predicting the results of an election
is an important activity in which many entities participate, including the political parties
involved, media outlets, and data analysis companies.
However, current systems for predicting election results perform poorly, with
mechanisms such as exit polls widely regarded as inaccurate. Additionally, most
systems try to predict overall numbers rather than which candidate is favoured to win.
Despite the importance of elections, prediction strategies remain restricted to a few
aspects of the election and overlook its finer details.
Thus, there is a need to develop a superior system, which not only better predicts
the results of an election, but also goes down to finer details of the results.
The existing systems are inaccurate and fail to quantify all aspects of the
election. Hence, we propose a system which can predict the most favourable
candidate in an election for any particular voting constituency. We use a wide
variety of data, including the GDP, education, and infrastructure levels (including water
access, electricity access, road and rail infrastructure, etc.) of the concerned constituency,
to classify a candidate as the most favoured candidate or not.
Due to the lack of easily available data, there is also a need to find this data, scrape
and wrangle it, and compile it into a unified, high-quality, structured, and comprehensive
dataset which can provide all the information needed to perform the prediction.
We implement a Deep Neural Network to perform the prediction process: given
the dataset as input, we obtain a list of constituencies and the corresponding favoured
candidates as our outputs.
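As a rough illustration of what such a network computes, the sketch below runs a forward pass through a small fully connected network and picks the candidate with the highest score. The report does not specify the architecture, so the layer sizes, the ReLU and sigmoid activations, the weights, and the names `favourability` and `most_favoured` are all assumptions made for this example.

```python
import math

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def favourability(features, layers):
    """Forward pass: ReLU on hidden layers, sigmoid on the single output unit."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    (logit,) = dense(activations, weights, biases)
    return sigmoid(logit)

def most_favoured(candidates, layers):
    """Return the candidate whose feature vector receives the highest score."""
    return max(candidates, key=lambda name: favourability(candidates[name], layers))

# Hypothetical, untrained weights: one hidden layer with two units, one output unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
# Hypothetical feature vectors for two candidates in one constituency.
candidates = {"Candidate X": [2.0, 1.0], "Candidate Y": [1.0, 2.0]}
winner = most_favoured(candidates, layers)
```

In a real system the weights would be learned from the training data rather than written by hand.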
1.1.2. Scope
In the current era, the Information Age, we have access to a large amount of data
from numerous sources. With the aid of this data, we perform the prediction process
using a Deep Learning algorithm.
• A correlation is assumed between data such as GDP, education, and amenities such as
access to water, electricity, healthcare, etc., and which candidate is most favourable.
• For testing purposes, we use the 2008 elections to predict the 2013 election’s
favourable candidate, and similarly use the 2013 elections to predict the 2018
election’s favourable candidate.
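The testing scheme in the second bullet amounts to pairing each election with the one that follows it. A minimal sketch is given below; the helper name `election_pairs` is hypothetical and not part of the FCP System.

```python
def election_pairs(years):
    """Pair each election year with the following one as (train_year, target_year)."""
    ordered = sorted(years)
    return list(zip(ordered, ordered[1:]))

# Election years named in the scope above.
pairs = election_pairs([2008, 2013, 2018])
for train_year, target_year in pairs:
    print(f"train on {train_year}, predict {target_year}")
```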
The name of the system developed for this project is the Favourable
Candidate Prediction System, or the FCP System.
B. Deep Learning:
example, in image processing, lower layers may identify edges, while higher
layers may identify the concepts relevant to a human such as digits or letters
or faces.
An IMF publication states that, "GDP measures the monetary value of final
goods and services—that are bought by the final user—produced in a region
in a given period of time (say a quarter or a year)."
Total GDP can also be broken down into the contribution of each industry
or sector of the economy. The ratio of GDP to the total population of the
region is the per capita GDP, which is also referred to as the mean standard of living.
GDP is considered the "world's most powerful statistical indicator of
national development and progress."
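The per capita ratio described above is a single division. The short sketch below makes it explicit; the function name and the figures are made up for illustration and are not real statistics.

```python
def per_capita_gdp(total_gdp, population):
    """Return GDP divided by population for one region."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_gdp / population

# Made-up numbers, used only to show the calculation.
example = per_capita_gdp(1_000_000.0, 250)
```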
E. Infrastructure
• Healthcare access, such as hospitals, dispensaries, welfare systems,
etc.
❏ Election Prediction using Big Data Analytics - A Survey: This paper surveys
resources for real-time Twitter analysis and its role in elections. As
elections are increasingly fought digitally, this makes it a useful framework for
predicting elections [1].
❏ The Machine Learning in the Prediction of Election: Attribute mining in this paper
involves several preprocessing steps, and the prediction model uses a supervised
learning pipeline based on SVM [4].
The existing systems rely almost entirely on random sampling. However, due to
the biases of samplers and analysts, and because samples are often unrepresentative,
polls such as exit polls are frequently poor in their predictions. Hence, the approach
needs a complete overhaul in order to predict with a useful degree of accuracy.
1.4. Proposed System
We propose a much more comprehensive system that takes in a larger and more
varied dataset. By using a much wider dataset, we aim to predict the results of an
election more accurately and at a finer granularity. Our system has the following
characteristics:
• The dataset comprises the district-wise GDP, education, amenities and previous
election results.
• The model is trained using previous election results, and then the model is used to
predict the next election’s results.
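One way to picture the combined dataset is as a join of several district-wise tables on the district name. The sketch below assumes each source is a dictionary keyed by district; the function name `merge_by_district`, the attribute names and the values are placeholders, not the project's actual schema.

```python
def merge_by_district(*sources):
    """Combine several {district: {attribute: value}} tables into one row per district.

    Only districts present in every source are kept, so each merged row is complete.
    """
    if not sources:
        return {}
    common = set(sources[0])
    for table in sources[1:]:
        common &= set(table)
    merged = {}
    for district in sorted(common):
        row = {}
        for table in sources:
            row.update(table[district])
        merged[district] = row
    return merged

# Placeholder tables with made-up values.
gdp = {"District A": {"gdp": 1.0}, "District B": {"gdp": 2.0}}
education = {"District A": {"literacy": 0.8}}
dataset = merge_by_district(gdp, education)
```

Keeping only districts found in every source is a design choice for this sketch; a real pipeline might instead keep partial rows and handle the gaps during pre-processing.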
Favourability predictions are a major aspect of any election and can have a substantial
impact on the election as a whole. Thus, having a good and reliable prediction system is a
must. However, current models use only convenience sampling to estimate the popularity
of candidates, which, while simple and easy, is inefficient and prone to bias. We
propose using ML algorithms to solve this problem.
Here we make use of various data points such as previously elected candidates,
GDP, average education levels, infrastructure levels, etc. (which are all potentially major
influences) to obtain more accurate results. For the scope of this project, we restrict
ourselves to the Karnataka State Elections. We use data scraping and wrangling to transform
data from various sources into a structured and high-quality dataset. Then, a Deep Neural
Network is used to predict election results, and we verify the predictions against
previous election data.
1.6. Summary
Chapter 2
Here, we discuss the requirements of the FCP System. We discuss the operating
environment, the functional requirements, and the non-functional requirements of the
system.
This section gives a brief overview of the operating environment in which the
system will run, and the hardware and the software prerequisites of the project.
• RAM - 8 GB or higher
• OS - Windows 7/8/10/Linux
• Libraries
• Data Scraping: Not all the data needed for the process is available directly as
processed datasets. Much of the data is published as news or in other publications.
Hence, we need to scrape the data from various sources. Data scraping helps us
tackle this obstacle.
• Data Wrangling: Data wrangling, sometimes referred to as data munging, is the
process of transforming and mapping data from one "raw" data form into another
format with the intent of making it more appropriate and valuable for a variety of
downstream purposes such as analytics. The original data needed for the project is
not necessarily in machine readable format, and needs to be transformed into the
correct format for the model’s use.
• Data Pre-processing: Data pre-processing is a data mining technique that involves
transforming raw data into an understandable format. Real-world data is often
incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely
to contain many errors. Raw data usually has many missing values and unwanted
attributes with low correlation. These attributes need to be dropped, and the
appropriate rows need to be selected. This reduces the load on the Deep Neural
Network model created.
• Model Generation: A model for the prediction of the Favourable Candidate needs
to be created. We use an appropriate Deep Learning algorithm to generate and train
the model.
• Validation: Once created, the model generates a list of Favourable Candidates,
which needs to be checked for accuracy. To do so, we generate predictions for
previous elections and compare them with the known results to validate the FCP System.
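The pre-processing requirement above can be sketched as two filters over a list of records: drop attributes that are mostly missing, then drop rows that are still incomplete. The function names, the 50% threshold and the sample records are assumptions for illustration, and filtering by low correlation is left out of this sketch.

```python
def drop_sparse_columns(rows, max_missing=0.5):
    """Remove attributes that are missing (None) in more than max_missing of the rows."""
    if not rows:
        return []
    keep = [
        column for column in rows[0]
        if sum(row.get(column) is None for row in rows) / len(rows) <= max_missing
    ]
    return [{column: row.get(column) for column in keep} for row in rows]

def drop_incomplete_rows(rows):
    """Keep only rows in which every remaining attribute has a value."""
    return [row for row in rows if all(value is not None for value in row.values())]

# Made-up records: "notes" is mostly missing, and row B lacks a GDP value.
raw = [
    {"constituency": "A", "gdp": 1.0, "notes": None},
    {"constituency": "B", "gdp": None, "notes": None},
    {"constituency": "C", "gdp": 3.0, "notes": "x"},
]
clean = drop_incomplete_rows(drop_sparse_columns(raw))
```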
In systems engineering and requirements engineering, a non-functional requirement
(NFR) is a requirement that specifies criteria that can be used to judge the operation of a
system, rather than specific behaviours. They are contrasted with functional requirements
that define specific behaviour or functions. The plan for implementing functional
requirements is detailed in the system design. The plan for implementing non-functional
requirements is detailed in the system architecture, because they are usually architecturally
significant requirements.
• Usability: Usability is the ease of use and learnability of a human-made object such
as a tool or device. The FCP System is designed to generate a list of constituencies
and the corresponding Favourable Candidates in a .csv format. Hence, it is available
in a highly usable format for any end user.
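A minimal sketch of producing such a .csv output with Python's standard `csv` module is shown below. The column names, the helper `write_predictions` and the sample rows are placeholders, since the report does not define the exact file layout.

```python
import csv
import io

def write_predictions(predictions, out):
    """Write one row per constituency with its favourable candidate."""
    writer = csv.writer(out)
    writer.writerow(["constituency", "favourable_candidate"])
    for constituency, candidate in sorted(predictions.items()):
        writer.writerow([constituency, candidate])

# Placeholder predictions written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_predictions(
    {"Constituency B": "Candidate Y", "Constituency A": "Candidate X"},
    buffer,
)
csv_text = buffer.getvalue()
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="")`.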
2.5. Summary
Chapter 3
High-level design (HLD) explains the architecture that would be used for
developing a software product. The architecture diagram provides an overview of an entire
system, identifying the main components that would be developed for the product and their
interfaces. The HLD uses possibly nontechnical to mildly technical terms that should be
understandable to the administrators of the system. In contrast, low-level design further
exposes the logical detailed design of each of these elements for programmers. High-level
design usually includes a high level architecture diagram depicting the major modules and
interfaces in the system.
As per Figure 3.1., the proposed system consists of the following components:
• Finally, we feed the data into the production model to obtain the final desired results.
This section describes the issues which need to be addressed or resolved before
attempting to devise a complete design solution.
Our proposed system will be able to predict the most favourable candidate
in an election for a voting constituency. We will implement a Deep Neural Network to
perform the prediction process: given the dataset as input, we obtain a list of
constituencies and the corresponding favoured candidates as our outputs. However, due to
constraints on data availability and time, we restrict ourselves to predicting
favourable candidates for the Bangalore district in the Karnataka Legislative Assembly
elections.
2. The datasets are used for training, and a subset of the data is held out for
testing the system.
3. We cyclically generate candidate models and then evaluate them for accuracy.
Candidate models are generated using the training data, whereas they are
evaluated using the testing data.
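The generate-and-evaluate cycle in step 3 can be sketched as a loop that fits one candidate model at a time on the training data and keeps the one that scores best on the testing data. To stay short, the sketch swaps the Deep Neural Network for a deliberately simple threshold classifier; that stand-in, the function names and the sample data are all assumptions for illustration.

```python
def accuracy(model, examples):
    """Fraction of (features, label) examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

def fit_threshold_model(train, feature_index):
    """Stand-in for training: threshold one feature midway between the class means.

    Assumes both classes (labels 0 and 1) appear in the training data.
    """
    positives = [x[feature_index] for x, y in train if y == 1]
    negatives = [x[feature_index] for x, y in train if y == 0]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    threshold = (mean_pos + mean_neg) / 2
    higher_is_positive = mean_pos >= mean_neg

    def model(x):
        return int((x[feature_index] >= threshold) == higher_is_positive)

    return model

def select_best_model(train, test, n_features):
    """Generate one candidate per feature and keep the most accurate on test data."""
    best_model, best_score = None, -1.0
    for feature_index in range(n_features):
        candidate = fit_threshold_model(train, feature_index)
        score = accuracy(candidate, test)
        if score > best_score:
            best_model, best_score = candidate, score
    return best_model, best_score

# Made-up examples: (features, label), where label 1 means "most favoured".
train = [((1.0, 5.0), 0), ((2.0, 1.0), 0), ((8.0, 4.0), 1), ((9.0, 2.0), 1)]
test = [((1.5, 3.0), 0), ((8.5, 3.0), 1)]
best_model, best_score = select_best_model(train, test, n_features=2)
```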
1. Problem Definition: This defines what data needs to be obtained, and what
algorithm is to be used.
2. Data Ingestion: Here, all relevant datasets are obtained. This includes the state GDP
statistics, the census data, election results, etc.
3. Data Preparation and Preprocessing: Here, data is transformed so that the final
dataset ready for use in the algorithm is generated. Unneeded attributes are dropped,
relevant rows are selected, and the datasets are combined into one dataset.
4. Splitting the Data: Here, data needs to be separated into two parts, where one part
is used to construct the model in Step 5, while the other part is used to evaluate the
model in Step 6.
5. Model Training: Here, the chosen Deep Learning algorithm is used to construct the
prediction model for the FCP System.
7. Model Deployment: Once the model is generated and validated, we can now
produce the final list of Favourable Candidates for the election.
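Step 4 above can be sketched as a shuffled split of the prepared rows. The helper name `split_dataset`, the 80/20 ratio and the fixed seed are assumptions for illustration; the report does not state how the split is performed, and its scope section instead describes using one election to predict the next.

```python
import random

def split_dataset(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (training_rows, testing_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Ten placeholder rows standing in for the prepared dataset.
rows = list(range(10))
train_rows, test_rows = split_dataset(rows)
```

Fixing the seed makes the split repeatable, so that successive candidate models are compared on the same testing rows.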
Data flow diagrams are used to graphically represent the flow of data in a business
information system. DFD describes the processes that are involved in a system to transfer
data from the input to the file storage and report generation.
Data flow diagrams can be divided into logical and physical. The logical data flow
diagram describes the flow of data through a system to perform certain functionality of a
business. The physical data flow diagram describes the implementation of the logical data
flow. Figures 3.5.1., 3.5.2., and 3.5.3. describe the level 0, 1 and 2 data flow diagrams for
the FCP System.
An activity diagram is a diagram that focuses on the execution and flow of
the behaviour of a system rather than on its implementation. Activity diagrams consist of
activities that are made up of actions which apply to behavioural modelling technology.
Activity diagrams are used to model processes and workflows. The essence of a
useful activity diagram is focused on communicating a specific aspect of a system's
dynamic behaviour. Activity diagrams capture the dynamic elements of a system.
3.6.2. Activity Diagram for the Prediction Process
3.7. Summary
This chapter gives us the detailed description of the user interaction with the system.
To describe this we have used interaction models such as use case diagrams, data flow
diagrams and activity diagrams. It also provides the overall system architecture along with
the assumptions to be made while developing the system and its constraints.
REFERENCES
[1] Naiknaware, Bharat R., and Seema S. Kawathekar. "Prediction of 2019 Indian
Election Using Sentiment Analysis." In 2018 2nd International Conference on
I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 660-665.
IEEE, 2018.
[2] Zolghadr, Mohammad, Seyed Armin Akhavan Niaki, and S. T. A. Niaki. "Modeling
and forecasting US presidential election using learning algorithms." Journal of
Industrial Engineering International 14, no. 3 (2018): 491-500.
[3] Safiullah, Md, Pramod Pathak, Saumya Singh, and Ankita Anshul. "Social media
in managing political advertising: A study of India." Polish Journal of Management
Studies 13 (2016).
[4] Franch, Fabio. "(Wisdom of the Crowds) 2: 2010 UK election prediction with social
media." Journal of Information Technology & Politics 10, no. 1 (2013): 57-71.
[7] Sentiment Analysis Using Deep Learning Techniques with India Elections:
https://towardsdatascience.com/sentiment-analysis-using-deep-learning-techniques-with-india-elections-2019-a-case-study-451549c8eb46