Favourable Candidate Prediction

ACKNOWLEDGEMENT
We would like to give our thanks to our university, Visvesvaraya Technological University,
for giving us the opportunity to design and implement this project.

We would also like to thank our Principal, Dr. Suryaprasad J, for graciously granting us
access to the college resources.

Additionally, we would like to thank our Head of Department Dr. Annapurna D. for
graciously allowing us to use and access the resources of the Information Science and
Engineering Department.

We would like to thank our Project Guide Professor Bharathi R, for her untiring guidance
and aid in the course of this project, and for her help in selecting and confirming our project
topic and problem statement.

We would also like to thank our Project Coordinators, Professors Sreenath M. V. and
Kakoli Bora for their aid in preparing templates for our project reports and presentations.


ABSTRACT
Favourability predictions are a major aspect of any election and can have a huge impact
on the election as a whole, touching various verticals such as the economy, the media,
and corporate policy. In the recent past, favourability prediction has become a point of
interest for the media, political entities, and the general public. However, current models
only use sample polls to estimate the popularity of candidates, which is highly inaccurate.
Therefore, there is a need to expand prediction by using various data points such as
previously elected candidates, GDP, average education levels, infrastructure levels, etc.

Data must be extracted as per the above requirements via scraping and wrangling, to
transform data from various sources into a structured, high-quality dataset. The scope of
the project is the Bangalore region in the Karnataka State Elections. A model is constructed
using an appropriate Deep Learning neural network algorithm to predict results, which are
verified against elections of previous years.


CONTENTS

ACKNOWLEDGEMENT ……………………………………………………………… i

ABSTRACT …………………………………………………………………………….. ii

LIST OF FIGURES ……………………………………………………………………... v

1. Chapter 1: Introduction ………………………………………………………….. 1


1.1. Introduction ……………………………………………………………… 1
1.1.1. Purpose of the Project …………………………………………… 1
1.1.2. Scope …………………………………………………………….. 2
1.1.3. Definitions, Acronyms and Abbreviations ………………………. 2
1.2. Literature Survey ………………………………………………………… 4
1.3. Existing System ………………………………………………………….. 5
1.4. Proposed System ………………………………………………………… 5
1.5. Statement of Problem ……………………………………………………. 5
1.6. Summary ………………………………………………………………… 6

2. Chapter 2: Software Requirement Specification ………………………………… 7


2.1. Software Requirements Specification …………………………………… 7
2.2. Operating Environment ………………………………………………….. 8
2.2.1. Hardware Requirements …………………………………………. 8
2.2.2. Software Requirements ………………………………………….. 8
2.3. Functional Requirements ………………………………………………… 8
2.4. Non-functional Requirements …………………………………………… 9
2.5. Summary ……………………………………………………………….. 11

3. Chapter 3: High Level Design ………………………………………………….. 11


3.1. High Level Design ……………………………………………………… 11
3.2. Design Considerations ………………………………………………….. 12
3.2.1. Assumptions and Dependencies ………………………………... 12
3.2.2. Goals and Constraints …………………………………………... 12
3.3. System Architecture ……………………………………………………. 12

3.4. Machine Learning Pipeline ……………………………………………... 13
3.5. Data Flow Diagrams ……………………………………………………. 15
3.5.1. Data Flow Diagram Level 0 ……………………………………. 15
3.5.2. Data Flow Diagram Level 1 ……………………………………. 16
3.5.3. Data Flow Diagram Level 2 ……………………………………. 16
3.6. Activity Diagrams ……………………………………………………… 17
3.6.1. Activity Diagram for the Overall Process ……………………… 17
3.6.2. Activity Diagram for the Prediction Process …………………... 18
3.7. Summary ……………………………………………………………….. 18

REFERENCES ………………………………………………………………………….. a


LIST OF FIGURES

Figure Number Figure Name Page Number

Figure 2.1. Requirements of the System 7

Figure 3.1. General High-Level Design 11

Figure 3.3. System Architecture of the Project 13

Figure 3.4. The Architecture of the ML Pipeline 14

Figure 3.5.1. Data Flow Diagram Level 0 15

Figure 3.5.2. Data Flow Diagram Level 1 16

Figure 3.5.3. Data Flow Diagram Level 2 16

Figure 3.6.1. Activity Diagram for the Overall Process 17

Figure 3.6.2. Activity Diagram for the Prediction Process 18

Chapter 1

Introduction

1.1. Introduction

Elections are a major aspect of our modern, democratic society. They greatly influence
all aspects of our lives, including our economy and financial systems, local and national
policies, education, taxes, etc. Hence, predicting the results of an election is an important
activity that many entities, including the involved political parties, media outlets, and data
analysis companies, all participate in.

However, current systems for election result prediction perform poorly, with mechanisms
such as exit polls widely regarded as inaccurate. Additionally, most systems try to predict
overall numbers rather than which candidate is favoured to win. Despite the importance of
elections, the prediction strategies for them remain weak and are restricted to only a few
aspects of the election rather than the finer details.

Thus, there is a need to develop a superior system, which not only better predicts
the results of an election, but also goes down to finer details of the results.

1.1.1. Purpose of the Project

The existing systems are highly inaccurate and fail to quantify all aspects of the election.
Hence, we propose a system which can predict the most favourable candidate in an
election for any particular voting constituency. We use a wide variety of sources, including
the GDP, education, and infrastructure levels (including water access, electricity access,
road and rail infrastructure, etc.) of the concerned constituency to classify each candidate
as the most favoured candidate or not.

Due to the lack of easily available data, there is also a need to find this data, scrape and
wrangle it, and compile it into a unified, high-quality, structured, and comprehensive
dataset which can provide all the information needed to perform the prediction.

We implement a Deep Neural Network to perform the prediction process: given the
dataset as input, we obtain a list of constituencies and the corresponding favoured
candidates as our outputs.

1.1.2. Scope

In the current era, the Information Age, we have access to a large amount of data from
numerous sources. With the aid of this data, we perform the prediction process using a
Deep Learning algorithm.

For the scope of the project, we work on the following ideas:

• It is possible to predict the most favourable candidate for a constituency using a
suitable Machine Learning algorithm.

• There is a correlation between data such as GDP, education, amenities such as access
to water, electricity, healthcare, etc., and which candidate is most favourable.

• Due to constraints on data availability and time, we restrict ourselves to predicting
favourable candidates for the Bangalore district in the Karnataka Legislative
Assembly elections.

• For testing purposes, we use the 2008 elections to predict the 2013 election’s
favourable candidates, and similarly use the 2013 elections for prediction in the
2018 elections.

• A classification is accurate if the predicted favoured candidate is actually elected.
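As an illustration of this train-on-one-cycle, validate-on-the-next scheme, the following is a minimal sketch using scikit-learn's MLPClassifier as one possible Deep Neural Network implementation. The function name, feature columns, and network settings are illustrative assumptions, not the project's actual code.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

def train_and_validate(train_X, train_y, test_X, test_y):
    """Fit a small feed-forward network on one election cycle's data and
    measure accuracy against the candidates actually elected in the next.

    train_X / test_X: constituency-wise feature matrices (GDP, education,
    infrastructure, etc.); train_y / test_y: elected-candidate labels.
    """
    scaler = StandardScaler().fit(train_X)
    model = MLPClassifier(hidden_layer_sizes=(32, 16),  # assumed layer sizes
                          max_iter=1000, random_state=0)
    model.fit(scaler.transform(train_X), train_y)
    preds = model.predict(scaler.transform(test_X))
    # A prediction counts as accurate only if the predicted favoured
    # candidate was actually elected, matching the definition above.
    return (preds == test_y).mean()
```

In this scheme, the 2008 features and results would be passed as the training arguments and the 2013 data as the test arguments.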

1.1.3. Definitions, Acronyms and Abbreviations

A. Favourable Candidate Prediction System (FCP System):

The name of the system developed for this project is the Favourable
Candidate Prediction System, or the FCP System.

B. Deep Learning:

Deep learning is a class of machine learning algorithms that uses multiple
layers to progressively extract higher-level features from the raw input. For
example, in image processing, lower layers may identify edges, while higher
layers may identify concepts relevant to a human, such as digits, letters,
or faces.

C. Deep Neural Network (DNN):

A deep neural network (DNN) is an artificial neural network (ANN) with
multiple layers between the input and output layers. The DNN finds the
correct mathematical manipulation to turn the input into the output, whether
it be a linear or a non-linear relationship. The network moves through the
layers, calculating the probability of each output. For example, a DNN that
is trained to recognize dog breeds will go over a given image and calculate
the probability that the dog in the image is a certain breed. The user can
review the results and select which probabilities the network should display
(above a certain threshold, etc.) to return the proposed label. Each
mathematical manipulation as such is considered a layer, and complex DNNs
have many layers, hence the name "deep" networks.

D. Gross Domestic Product (GDP):

Gross domestic product (GDP) is a monetary measure of the market value
of all final goods and services produced in a specific time period, often
annually.

An IMF publication states that, "GDP measures the monetary value of final
goods and services—that are bought by the final user—produced in a region
in a given period of time (say a quarter or a year)."

Total GDP can also be broken down into the contribution of each industry
or sector of the economy. The ratio of GDP to the total population of the
region is the per capita GDP, also called the mean standard of living.
GDP is considered the "world's most powerful statistical indicator of
national development and progress."

E. Infrastructure:

Infrastructure in this context refers to the amenities available in a region and
to how many residents, where amenities include:

• Healthcare access, such as hospitals, dispensaries, welfare systems,
etc.

• Transportation infrastructure, such as buses, railways, pucca roads,
kutcha roads, etc.

• Access to water through various sources, electricity access, etc.

1.2. Literature Survey

❏ Election Prediction using Big Data Analytics - A Survey: This paper surveys some
very good resources for real-time Twitter analysis and its role in elections. As
elections are fought more and more digitally, this is an ideal framework for
predicting elections [1].

❏ Election Result Prediction using Deep Learning Techniques: A deep learning
methodology has been used to gather and analyse data from old elections to predict
the result of new elections [2].

❏ Modelling and Forecasting US Presidential Election using Learning Algorithm: The
technique used in this paper implements a robust big data pipeline, whereby several
parameters have been used as part of the Machine Learning techniques needed to
mine and analyse data [3].

❏ The Machine Learning in the Prediction of Election: Attribute mining in this paper
involves several preprocessing steps, and the prediction model uses a supervised
learning pipeline based on SVM [4].

1.3. Existing System

The existing systems rely almost entirely on random sampling. However, due to
biases of the samplers, analysts and the inaccuracy of the representation of the sample, polls
such as exit polls are almost always very poor in their predictions. Hence, the system needs
a complete overhaul in order to function with any degree of accuracy.

1.4. Proposed System

We propose a much more comprehensive system that takes in a larger and more
varied dataset. By using a much wider dataset, we are able to predict the results of an
election more accurately, and at a finer granularity as well. Our system has the following
characteristics:

• It uses a Deep Learning algorithm to perform the prediction.

• The dataset comprises the district-wise GDP, education, amenities, and previous
election results.

• Its output is a list of constituencies and the corresponding list of candidates in
order of their favourability.

• The model is trained using previous election results, and the model is then used to
predict the next election’s results.

1.5. Statement of Problem

Favourability predictions are a major aspect of any election and can have a huge
impact on the election as a whole. Thus, having a good and reliable prediction system is a
must. However, current models only use convenience sampling to estimate the popularity
of candidates, which, while simple and easy, is highly inaccurate and prone to bias. We
propose using ML algorithms to solve this problem.

Here we will make use of various data points such as previously elected candidates,
GDP, average education levels, infrastructure levels, etc. (which are all potentially major
influences) to obtain more accurate results. For the scope of this project, we restrict
ourselves to the Karnataka State Elections. We use data scraping and wrangling to
transform data from various sources into a structured and high-quality dataset. Then, a
Deep Neural Network algorithm is used to predict election results, and we verify the
results with previous elections’ data.

1.6. Summary

In this chapter, we introduced and discussed what the Favourable Candidate
Prediction System is, and how it more comprehensively predicts the results of an election.
We discussed why the current systems are poorly suited and how our system is more
suitable. We discussed the composition of the needed dataset and the mechanism used to
perform the prediction process. A literature survey was performed to find and refer to
similar papers, and the scope and necessity of the project were discussed in detail.

Chapter 2

Software Requirements Specification

2.1. Software Requirements Specification

A software requirements specification (SRS) is a description of a software system
to be developed. It is modelled after the business requirements specification, also known
as a Stakeholder Requirements Specification (StRS). The software requirements
specification lays out the functional and non-functional requirements of the system. Used
appropriately, software requirements specifications can help prevent software project
failure. To derive the requirements, the developer needs to have a clear and thorough
understanding of the products under development. This is achieved through detailed and
continuous communication between the project team and stakeholders throughout the
software development process.

Here, we discuss the requirements of the FCP System. We discuss the operating
environment, the functional requirements, and the non-functional requirements of the
system.

Figure 2.1. Requirements of the System



2.2. Operating Environment

This section gives a brief overview of the operating environment in which the
system will run, and the hardware and the software prerequisites of the project.

2.2.1. Hardware Requirements

• Processor - Intel I7 or higher / Xeon v4 or higher

• RAM - 8 GB or higher

• Hard disk - 256 GB or higher

• Dedicated RAM - minimum 4GB.

2.2.2. Software Requirements

• OS - Windows 7/8/10/Linux

• Programming Language - Python 3.7.5

• Libraries

o Beautiful Soup 4.8.1 for Data Scraping

o Pandas 0.25.3, NumPy 1.17.3, and SciKit-learn 0.21.3 for Data Wrangling

o Pandas 0.25.3, SciPy 1.3.1, and Matplotlib 3.1.1 for Machine Learning
Algorithms

o SciKit-learn 0.21.3 for Validation

• Documentation Tools - Microsoft Word and Google Docs

2.3. Functional Requirements

Functional requirements are a formal quantification of what the stakeholders expect
from the system. In software engineering, a functional requirement defines a function of a
system or its component, where a function is described as a specification of behaviour
between outputs and inputs. Functional requirements may involve calculations, technical
details, data manipulation and processing, and other specific functionality that defines what
a system is supposed to accomplish.

The functional requirements of the project are:

• Data Scraping: Not all data needed for the process is available directly as
processed datasets. Most of the data is posted as news or in some other publication.
Hence, we need to scrape the data from various sources; data scraping helps us
tackle this obstacle.
• Data Wrangling: Data wrangling, sometimes referred to as data munging, is the
process of transforming and mapping data from one "raw" form into another
format with the intent of making it more appropriate and valuable for a variety of
downstream purposes such as analytics. The original data needed for the project is
not necessarily in a machine-readable format, and needs to be transformed into the
correct format for the model’s use.
• Data Pre-processing: Data pre-processing is a data mining technique that involves
transforming raw data into an understandable format. Real-world data is often
incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely
to contain many errors. Raw data usually has many missing values and unwanted
attributes with low correlation. These need to be dropped, and the appropriate rows
need to be selected. This reduces the load on the Deep Neural Network model
created.
• Model Generation: A model for the prediction of the favourable candidate needs
to be created. We use an appropriate Deep Learning algorithm to generate and train
the model.
• Validation: Once created, the model generates a list of favourable candidates,
which needs to be checked for accuracy. To do so, we use predictions for previous
elections to validate the FCP System.
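The scraping and wrangling requirements above can be sketched as follows, using Beautiful Soup and pandas as listed in the software requirements. The HTML layout, column names, and helper functions are illustrative assumptions, since the real source pages are not shown here.

```python
import pandas as pd
from bs4 import BeautifulSoup

def scrape_results(html):
    """Data Scraping: extract a constituency-results table from raw HTML.

    Assumes a simple <table> whose first row is a header and whose data
    rows hold constituency, candidate, and a comma-grouped vote count.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append({"constituency": cells[0],
                     "candidate": cells[1],
                     "votes": int(cells[2].replace(",", ""))})
    return pd.DataFrame(rows)

def favoured(df):
    """Data Wrangling: keep only the top-voted candidate per constituency."""
    idx = df.groupby("constituency")["votes"].idxmax()
    return df.loc[idx].reset_index(drop=True)
```

A real pipeline would fetch the pages first (e.g. with `urllib`) and merge the result with the GDP, education, and infrastructure tables; only the parsing and reduction steps are shown.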

2.4. Non-functional Requirements

Non-functional requirements are the various capabilities offered by the system
which do not directly affect the expected inputs, behaviours, and outputs of the system, but
instead focus on how the results are achieved and the quality of the software developed.

In systems engineering and requirements engineering, a non-functional requirement
(NFR) is a requirement that specifies criteria that can be used to judge the operation of a
system, rather than specific behaviours. They are contrasted with functional requirements
that define specific behaviour or functions. The plan for implementing functional
requirements is detailed in the system design. The plan for implementing non-functional
requirements is detailed in the system architecture, because they are usually architecturally
significant requirements.

The non-functional requirements of the system are:

• Usability: Usability is the ease of use and learnability of a human-made object such
as a tool or device. The FCP System is designed to generate a list of constituencies
and the corresponding favourable candidates in .csv format. Hence, it is available
in a highly usable format for any end user.

• Compliance: In general, compliance means conforming to a rule, such as a
specification, policy, standard, or law. Regulatory compliance describes the goal that
organizations aspire to achieve in their efforts to ensure that they are aware of and
take steps to comply with relevant laws, policies, and regulations. The FCP System
uses legally available datasets, from various government and media sources, that are
free for public use.

• Extensibility: Extensibility is a software engineering and systems design principle
that provides for future growth. Extensibility is a measure of the ability to extend a
system and the level of effort required to implement the extension. The FCP System
in its current scope is limited to the Bangalore region for the Karnataka State
Elections. In the future, we hope that the system can be extended to other regions
in the state, as well as to other elections.

• Testability: Software testability is the degree to which a software artefact (i.e. a
software system, software module, or requirements or design document) supports
testing in a given test context. Validation is a vital component of this system, and
hence we try to make the system and its results as easily testable as possible. It is
necessary to ensure that the model is as accurate as possible, and this may require
extensive testing.

2.5. Summary

In this chapter, we discussed the operating environment, functional requirements,
non-functional requirements, and the user characteristics of the FCP System. This is an
important aspect of the system which quantifies what is expected of the system and the
conditions needed to run it. We clarify the inputs and outputs of the system, as well as how
the system is constructed. This demonstrates all the expected capabilities of the system for
the stakeholders.

Chapter 3

High Level Design

3.1. High Level Design

High-level design (HLD) explains the architecture that would be used for
developing a software product. The architecture diagram provides an overview of an entire
system, identifying the main components that would be developed for the product and their
interfaces. The HLD uses possibly nontechnical to mildly technical terms that should be
understandable to the administrators of the system. In contrast, low-level design further
exposes the logical detailed design of each of these elements for programmers. High-level
design usually includes a high level architecture diagram depicting the major modules and
interfaces in the system.

As per Figure 3.1., the proposed system consists of the following components:

• Data is collected, trimmed, and cleaned to ready it for use.

• We then preprocess the data, using correlation to determine which attributes
can help predict the results.

• We then use an appropriate Deep Learning algorithm to generate a prediction
model for our project. (This step includes analysis and evaluation of results to
verify accuracy.)

• Finally, we feed data into the production model to obtain the final desired results.

Figure 3.1. General High-Level Design
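The correlation-based preprocessing component above can be sketched as follows: attributes whose absolute correlation with the target falls below a threshold are dropped. The function name, the 0.1 cut-off, and the numeric-only dataset are illustrative assumptions.

```python
import pandas as pd

def select_by_correlation(df, target, threshold=0.1):
    """Return the feature columns whose absolute Pearson correlation with
    the `target` column meets the threshold; assumes numeric columns only.
    """
    corr = df.corr()[target].drop(target)   # correlation of each feature with the target
    return corr[corr.abs() >= threshold].index.tolist()
```

The surviving columns would then form the dataset handed to the model-generation step.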


3.2. Design Considerations

This section describes the issues which need to be addressed or resolved before
attempting to devise a complete design solution.

3.2.1. Assumptions and Dependencies

We assume that a wide variety of sources, including the GDP, education, and
infrastructure levels (including water access, electricity access, road and rail infrastructure,
etc.) of the concerned constituency help classify a candidate as the most favoured candidate
or not.

3.2.2. Goals and Constraints

Our proposed system will be able to predict the most favourable candidate in an
election for a voting constituency. We will implement a Deep Neural Network to perform
the prediction process, where, given the dataset as input, we obtain a list of constituencies
and the corresponding favoured candidates as our outputs. Due to constraints on data
availability and time, we restrict ourselves to predicting favourable candidates for the
Bangalore district in the Karnataka Legislative Assembly elections.

3.3. System Architecture

The architecture diagram provides an overview of an entire system, identifying the
main components that could be developed for the product and their interfaces. In Figure
3.3., the system architecture, based on a basic ML algorithm, consists of the following
stages:

1. It starts with data collection and the selection of features.

2. Datasets are used for training purposes, and a portion of the dataset is also reserved
for testing the system.

3. We cyclically generate candidate models and then evaluate them for accuracy;
candidate model generation uses the training data, whereas evaluation uses the
testing data.

4. We check the accuracy of the classification result of the final prediction.

Figure 3.3. System Architecture of the Project
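The staged flow of Section 3.3, holding out part of the data, cyclically generating candidate models, and evaluating each on the held-out set, can be sketched as below. The hyperparameter grid and use of scikit-learn are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def best_candidate_model(X, y, hidden_sizes=((16,), (32, 16))):
    """Cyclically generate candidate models on the training split and keep
    the one scoring best on the reserved testing split (stages 2-4)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)   # stage 2
    best, best_acc = None, -1.0
    for sizes in hidden_sizes:                                  # stage 3: generate
        model = MLPClassifier(hidden_layer_sizes=sizes, max_iter=1000,
                              random_state=0).fit(X_tr, y_tr)
        acc = model.score(X_te, y_te)                           # stage 3: evaluate
        if acc > best_acc:
            best, best_acc = model, acc
    return best, best_acc                                       # stage 4: final accuracy
```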

3.4. Machine Learning Pipeline

A Machine Learning Pipeline is a way of representing machine learning algorithms
which divides the process of constructing an ML project into distinct steps. In Figure 3.4.,
each step contributes one separate part of what needs to be done for the process, and hence
each step can represent a module or a set of modules. (Note: some steps are abstract and
do not form a module.)

For our project, the Pipeline elements are:

1. Problem Definition: This defines what data needs to be obtained, and what
algorithm is to be used.

2. Data Ingestion: Here, all relevant datasets are obtained. This includes the state GDP
statistics, the census data, election results, etc.


Figure 3.4. The Architecture of the ML Pipeline

3. Data Preparation and Preprocessing: Here, data is transformed so that the final
dataset ready for use in the algorithm is generated. Unneeded attributes are dropped,
relevant rows are selected, and the datasets are combined into one dataset.

4. Splitting the Data: Here, data needs to be separated into two parts, where one part
is used to construct the model in Step 5, while the other part is used to evaluate the
model in Step 6.

5. Model Training: Here, the chosen Deep Learning algorithm is used to construct the
prediction model for the FCP System.

6. Candidate Model Evaluation: Here, the predicted favourable candidates list is
checked to see how accurate it is.

7. Model Deployment: Once the model is generated and validated, we can now
produce the final list of Favourable Candidates for the election.

8. Performance Analysis: Our model’s outputs can be used to numerically describe
its accuracy.
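The middle steps of this pipeline (preprocessing, training, and evaluation) can be chained into one object with scikit-learn's Pipeline, a minimal sketch under the assumption that scaling is the only preprocessing retained at this stage; step names and estimator settings are illustrative.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def run_pipeline(X, y):
    """Steps 4-6 of the ML pipeline: split, train a scaled DNN, evaluate."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)       # step 4
    pipe = Pipeline([
        ("scale", StandardScaler()),                                # step 3 (preprocess)
        ("dnn", MLPClassifier(hidden_layer_sizes=(32, 16),
                              max_iter=1000, random_state=0)),      # step 5 (train)
    ])
    pipe.fit(X_tr, y_tr)
    return pipe, pipe.score(X_te, y_te)                             # steps 6/8 (evaluate)
```

The fitted pipeline object is then what step 7 would deploy to produce the final favourable-candidates list.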

3.5. Data Flow Diagrams

Data flow diagrams are used to graphically represent the flow of data in a business
information system. DFD describes the processes that are involved in a system to transfer
data from the input to the file storage and report generation.

Data flow diagrams can be divided into logical and physical. The logical data flow
diagram describes the flow of data through a system to perform certain functionality of a
business. The physical data flow diagram describes the implementation of the logical data
flow. Figures 3.5.1., 3.5.2., and 3.5.3. describe the level 0, 1 and 2 data flow diagrams for
the FCP System.

3.5.1. Data Flow Diagram - Level 0

Figure 3.5.1. Data Flow Diagram Level 0


3.5.2. Data Flow Diagram - Level 1

Figure 3.5.2. Data Flow Diagram Level 1

3.5.3. Data Flow Diagram - Level 2

Figure 3.5.3. Data Flow Diagram Level 2


3.6. Activity Diagrams

An activity diagram is a diagram that focuses on the execution and flow of the
behaviour of a system rather than its implementation. Activity diagrams consist of activities
that are made up of actions, which apply to behavioural modelling technology.

Activity diagrams are used to model processes and workflows. The essence of a
useful activity diagram is a focus on communicating a specific aspect of a system's
dynamic behaviour. Activity diagrams capture the dynamic elements of a system.

3.6.1. Activity Diagram for the Overall Process

Figure 3.6.1. Activity Diagram for the Overall Process

3.6.2. Activity Diagram for the Prediction Process

Figure 3.6.2. Activity Diagram for the Prediction Process

3.7. Summary

This chapter gives a detailed description of the high-level design of the system.
To describe this, we have used models such as data flow diagrams and activity diagrams.
It also provides the overall system architecture, along with the assumptions made while
developing the system and its constraints.

REFERENCES
[1] Naiknaware, Bharat R., and Seema S. Kawathekar. "Prediction of 2019 Indian
Election Using Sentiment Analysis." In 2018 2nd International Conference on
I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 660-665.
IEEE, 2018.

[2] Zolghadr, Mohammad, Seyed Armin Akhavan Niaki, and S. T. A. Niaki. "Modeling
and forecasting US presidential election using learning algorithms." Journal of
Industrial Engineering International 14, no. 3 (2018): 491-500.

[3] Safiullah, Md, Pramod Pathak, Saumya Singh, and Ankita Anshul. "Social media
in managing political advertising: A study of India." Polish Journal of Management
Studies 13 (2016).

[4] Franch, Fabio. "(Wisdom of the Crowds)2: 2010 UK election prediction with social
media." Journal of Information Technology & Politics 10, no. 1 (2013): 57-71.

[5] Architecting a Machine Learning Pipeline:
https://towardsdatascience.com/architecting-a-machine-learning-pipeline-a847f094d1c7

[6] Workflow of a Machine Learning project:
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94

[7] Sentiment Analysis Using Deep Learning Techniques with India Elections:
https://towardsdatascience.com/sentiment-analysis-using-deep-learning-techniques-with-india-elections-2019-a-case-study-451549c8eb46

[8] How Reliable are Exit Polls:
https://www.deccanherald.com/lok-sabha-election-2019/how-reliable-are-exit-polls-data-says-not-much-734739.html

[9] Data Analytics and Predicting Election Results:
https://datascience.foundation/sciencewhitepaper/big-data-analytics-and-predicting-election-results
