You are on page 1of 10

INTRODUCTION

Introduction
Population projections are an estimation of birth rate, demise rate and migration of
population from one place to another used for future prediction. The prediction and
forecasting are very 11 | Pagedifficult tasks. With the help of Machine learning prediction
and forecasting can be done efficiently. Machine learning consists of different computer
algorithms, from which it got the ability to learn. Prediction can be done with two
different ways of machine learning techniques such as supervised learning and
unsupervised learning. The algorithm is trained on a predefined set of training examples
which permit the algorithm to get a prediction from a dataset in supervised learning. In
the unsupervised algorithm, the algorithm is given a collection of unlabeled data and it
must find pattern relationships and attempt to label the data.

Background
A key component of successful countries is the ability to plan flexibly for the short and
long term. Planning must be based on good information about the present situation, how
important variables will change in the future, and how much faith is placed in future
predictions. Members of a population rely on the same resources and are subject to
similar environmental constraints, and depend on other members' availability to persist
over time. Scientists study populations by examining how individuals in that population
interact with their environment. The information obtained from the population studies
can be a determining factor in future planning.

Problem Statement
Forecasting the trends in the human population is a complex problem. There are a
number of uncertainties associated with the population of any country. Traditionally,
statistical approaches have been used to forecast the demography of a country. But these
approaches are not very suitable for predicting a chaotic system like the population.
Firstly, the approaches make some assumptions, which are sometimes found unrealistic.
Secondly, statistical population forecast procedures cannot deal with intrinsic chaos.
Additionally, population growth is difficult to forecast because of some events that can
alter a location's population profile within the shortest period. Events like birth rate, death
rates, and migration. Migration can speedily alter a population, especially when it is a
result of life-changing outcomes like war. In this case, an area's demographic setting is
affected while another place's population is increased. Some mathematical models like
the geometrical and arithmetical increase models may accurately forecast the population
of either new cities or old cities over a decade using historical data while assuming some
factors remain constant. However, they are widely accepted but are limited in their areas
of application. Optimizing these models or adopting a more effective model is crucial so
that a more accurate result can be achieved at the lowest cost.

Objective
The major objectives of the project are listed below:

● To develop a simple population prediction model that automatically predicts the


population for the future.
● To compare different algorithm for creating the model and calculate their accuracy.

Report Organization
This report consists of five chapters where each chapter gives information about different
related topics. The first chapter is the introduction where the entire project is introduced
and defined. This chapter has several parts such as an introduction, background, problem
statement, objective, and report organization. The second chapter deals with the
requirement analysis and feasibility analysis. The third chapter discusses the model
design. It consists of data collection and algorithm used. The fourth chapter deals with
the implementation and testing process. The fifth chapter concludes the project.
Literature Review
Machine learning approaches to the health of social determinants [1] describes a linear
regression of age and gender. The three attributes Prediction, fit, and interpretability were
made for comparison with different machine learning algorithms. Missing Population [2],
effectively predict the population in a region. Linear Regression is used to make
predictions on the data and its results are easy to understand.

Gavril Ognjanovski makes a prediction about the Swedish population growth [3]. The
data may be of any size it forecast the accuracy model [4] and estimates the statistical
relationship. Other papers also highlight the various problems faced by overpopulation.
Variation in the population rate is discussed in the current research and the importance to
provide the necessary sources to solve the problem. To find the accuracy of the
regression analysis several alternative functions are used in the regression model.

3.1 Methodology
Forecasting is a technique that predicts the future value based on past value. Usually, the
information about the products and services is not predictable and most of them are uncertain.
Forecasting mainly focuses on long-range planning, budgets and cost controls, future sales,
new products in markets, and Production of various products and operations. The types of
forecasting methods include Qualitative methods and Quantitative methods. The qualitative
method is based on subjective opinions got from one or more experts and the quantitative
method is based on data and analytical techniques.

3.1.1 Data Collection


The dataset used here has been collected from the Census report of a certain country’s
government website link containing the Decadal population census variation which covers
the data of the sample population from the year 1965 to 2018.
Figure 1: Sample Population from the year 1960 to 2018

3.1.2 Preprocessing
The dataset load into the Weka machine learning tool and then normalize the data to ease out
the prediction of growth rate with machine learning. The given columns (Year, Country
Name, Country Code, Indicator Name, Indicator Code & Population) from the dataset have
been taken and saved as CSV file format which will be used for the Weka tool.

There were some missing values in the data set. Based on the variables we used the mean and
mode of all the values of the variables. The missing values in the given dataset were ignored
to predict without using them.

Figure 2: Missing values of variables


Next, the distribution of variables was studied. For this histogram was used. A study of the
distribution of data gave a general idea about the variables related to the populations.

Figure 3: Histogram of populations of countries


The data is then processed with different machine learning algorithms like Linear Regression,
Support Vector Regression, Multi-Layer perceptron and Decision tree classifier techniques
and find the Mean error value to find the approximation in the prediction.

System Design

3.2.1 Decision Tree Algorithm


Decision Tree algorithm is one of the techniques for supervised learning. Using a decision
tree algorithm, regression and classification problems can be solved. A decision tree is used
to build a training model which is used to predict the target value. In a decision tree, the
prediction starts from the root node and follows the branch node which has the corresponding
value and moves towards the next node. This way prediction is done and the results are
obtained.

3.2.2 Naïve Bayes


Naïve Bayes methods are a set of supervised learning algorithms based on applying Bayes’
theorem with the “naïve” assumption of conditional independence between every pair of
features
given the value of the class variable. Bayes’ theorem states the following relationship, given
class
variable y and dependent feature vector x1 through xn:

Implementation
A population prediction model was developed which can effectively predict the population
growth of the country by observing the past population growth rate. To develop a working
system, implementation was done in a single phase where we preprocessed the data and
created a model to predict population growth using Decision Tree and Naïve Bayes
Algorithms.

Tools Used
1. Python
Python was used as a core programming language to develop the model.
2. NumPy

NumPy library was used to work with a multidimensional arrays, linear


algebra and matrices.

3. Sklearn

Sklearn was used to create a machine learning and classification models.


4. Pandas
Pandas was used for data manipulation.
5. Matplotlib
Matplotlib was used for statistical data visualization.

Testing
Accuracy, Precision, Recall and F-measure are taken to validate the performance of the
model. The overall test ensured the validity and reliability of the system. The accuracy
achieved using the decision tree was 62.60%, precision was 0.83, recall was 0.611 and f-
measure was 0.71. The accuracy for Naïve Bayes was 83.74%, precision was 0.83, recall
was 0.97 and f-measure was 0.90.
Linear Regression

Linear regression is one of the most basic types of machine learning, in which we train a
model that predicts the data’s actions on certain variables. The two variables on the x-axis
and y-axis should be linearly associated in linear regression.
Assume that ‘x’ and ‘y’ are two variables on the regression line. The value will be linearly
upward, that is whenever ‘x’ increases ‘y’ will also increase, or if ‘x’ decreases the value of
‘y’ is also decreases. Mathematically, a linear regression equation can be expressed as:
y=a+bx
Where a is y-intercept of the line, b is the slope of the line, ‘x’ is independent variable and ‘y’
is dependent variable.

Results and Discussion


In the dataset, the year, growth of the population, male and female, is used to predict the
future growth in value. Prediction output results of four machine learning techniques Linear
regression, Support Vector Regression, Multi-layer perceptron and Decision tree were
compared with mean square error value. The technique which gives less mean square error
will be the best technique.
The forecasting of population of male and female using Linear regression, Support Vector
Regression, Multi-layer perceptron and Decision tree.

Figure 3: Prediction of Population in 2021

Figure 4: Population Prediction using Machine learning Techniques.


Population prediction results obtained for future decadal variation is shown in figure2. With
the help of Machine learning prediction and forecasting techniques the results got different
values for the population. The four algorithms, Linear regression, Support Vector Regression,
Multi-Layer perceptron and Decision tree were used for the prediction techniques and got the
above result.

Figure 5: MSE value obtained from Prediction Techniques

The Root Mean Squared Error values are obtained at Cross-validation folds value is 10 while
executing Machine Learning. Among these techniques Linear Regression has the minimum
mean square error which proves that this is the best way to predict the population growth.
This research gives the actual population growth prediction when the country is not affected
by any natural disasters.

Conclusion

Compared to other methods, our current understanding of population growth


modelled quantitatively by regression does perform well. Future work, investing
other machine learning approaches, with other set of data as well as exploring the
models. Finding new ways of analyzing and understanding population growth
should be pursued.

Recommendations

The model can be further enhanced by again training the predicted output from
more datasets and adding more features. Adding some more features to this model
will surely make the model better.

You might also like