Modelling India's Population

Downloaded from www.clastify.com by Shreshth

HL Maths Exploration
Table of contents
Introduction
Malthusian (Exponential) Model
Verhulst’s (Logistics) Model
Gompertz Function
Comparison of observed and predicted populations
Pearson Correlation Tests
Predicting the Future
Conclusion
Bibliography
Introduction
Rationale
At the time of writing, the Indian population stands at a staggering 1.389 billion (United Nations, 2019). This makes India the second-most populous country in the world after China. Statistically, this means that roughly 1 in every 6 people in the world is an Indian resident (Trading Economics, 2020). After India gained independence in 1947, its population surged during the second half of the 20th century and crossed the 1 billion mark in 1997. What’s more, the Indian population is expected to surpass the Chinese population by 2027 (United Nations, 2019). During my time in school, I became aware of concerns about the sustainability of rapid population growth. Problems such as the greenhouse effect can only get worse with a world population that is expected to reach 9.8 billion in 2050, and 1.64 billion of these people are expected to live in India (United Nations, 2017). I wanted to understand how population predictions are made, and how much India is expected to contribute to the world population over the next few decades.
Aim
My aim was to use past population data for India to produce a model that fits previous trends and can also predict future trends. I hope to understand various population modelling techniques by applying them to the Indian population since 1950. I then choose the model that best fits the observed values and extrapolate it up to 2065, the year in which the Indian population is expected to peak. For clarity, I use the following symbols consistently:
• $R$ represents the growth rate of a population
• $t_n$ represents the year number (e.g. $t_0$ is year 0, the base year)
• $P_n$ represents the population size at any given time $t_n$ (e.g. $P_0$ is the population in the base year)
• $P_e$ is the population size estimated by a model at any given time $t_n$
• $K$ represents the carrying capacity of a population (explained further later)
Table 1 shows the Indian population at the start of each year from 1950 to 2019 (World Bank, 2019):
Table 1: Indian observed population from 1950 to 2019 (to the nearest million)
Yr. 𝒕𝒏 𝑷𝒏 Yr. 𝒕𝒏 𝑷𝒏 Yr. 𝒕𝒏 𝑷𝒏 Yr. 𝒕𝒏 𝑷𝒏 Yr. 𝒕𝒏 𝑷𝒏
1950 0 376 1964 14 489 1978 28 668 1992 42 909 2006 56 1165
1951 1 382 1965 15 499 1979 29 683 1993 43 927 2007 57 1183
1952 2 389 1966 16 510 1980 30 699 1994 44 946 2008 58 1201
1953 3 396 1967 17 520 1981 31 715 1995 45 964 2009 59 1218
1954 4 403 1968 18 532 1982 32 732 1996 46 982 2010 60 1234
1955 5 410 1969 19 543 1983 33 749 1997 47 1001 2011 61 1250
1956 6 417 1970 20 555 1984 34 767 1998 48 1019 2012 62 1266
1957 7 425 1971 21 568 1985 35 784 1999 49 1038 2013 63 1281
1958 8 433 1972 22 581 1986 36 802 2000 50 1057 2014 64 1296
1959 9 442 1973 23 595 1987 37 820 2001 51 1075 2015 65 1310
1960 10 451 1974 24 609 1988 38 837 2002 52 1093 2016 66 1325
1961 11 460 1975 25 623 1989 39 855 2003 53 1112 2017 67 1339
1962 12 469 1976 26 638 1990 40 873 2004 54 1130 2018 68 1353
1963 13 479 1977 27 652 1991 41 891 2005 55 1148 2019 69 1370
Figure 1 illustrates the change in the Indian population between 1950 and 2019. The scatter plot shows a strong positive, non-linear trend, so I decided to explore the exponential, logistics and Gompertz models. I then calculate Pearson correlation coefficients for each model to determine the best model, and use this model to predict future populations. Note: to be consistent with the raw data obtained from the World Bank, all population data displayed throughout this exploration is given to the nearest million.
Models
Malthusian (Exponential) model
In An Essay on the Principle of Population, written in 1798, Thomas Robert Malthus proposed a model of population growth. When written as a differential equation, this model makes $P_n$ increase without bound as $t_n$ increases, on the assumption that $R > 0$ at all times. This assumption holds because the birth rate tends to be greater than the death rate, except in unforeseen circumstances such as plagues or wars. Another assumption is that $P(t_0) = P_0 > 0$. The differential equation can be solved by separation of variables. After separating the variables, we integrate both sides with limits $t_0$ to $t_n$, as this represents the time period over which we are modelling population growth (Mahaffy, 2004):
The derivation of the Malthusian model is based on the assumption that the rate at which the population grows is directly proportional to the population size $P_n$.
$$\frac{dP_n}{dt_n} = RP_n$$
$$\frac{dP_n}{P_n} = R\,dt_n$$
$$\int_{P_0}^{P_n}\frac{dP_n}{P_n} = \int_{t_0}^{t_n}R\,dt_n$$
$$\Big[\ln|P_n|\Big]_{P_0}^{P_n} = \Big[Rt_n\Big]_{t_0}^{t_n}$$
$$\ln|P_n| - \ln|P_0| = Rt_n - Rt_0$$
$$\ln\left|\frac{P_n}{P_0}\right| = Rt_n - Rt_0$$
$\frac{P_n}{P_0} \ge 1$, as $P_n \ge P_0$ (assuming the population is always increasing), so the equation can be written without the modulus sign:
$$\ln\frac{P_n}{P_0} = Rt_n - Rt_0$$
$$\frac{P_n}{P_0} = e^{Rt_n - Rt_0}$$
$$P_n = P_0e^{Rt_n}\qquad\text{as } t_0 = 0 \text{ and } P_0 > 0$$
To find R, I first calculated the percentage change in population for each year using the following equation:
$$\text{Percentage change} = \frac{\text{Population in the current year} - \text{Population in the previous year}}{\text{Population in the previous year}} \times 100\%$$
Figure 2: Screenshot of first 5 rows of raw data (Author’s Own)
Figure 2 is a screenshot of the first 5 rows of the Microsoft Excel spreadsheet used to calculate the percentage changes in population between years.
I had to choose between using the mean and the median percentage change. Because the growth rate changes constantly over the period, the mean can be pulled up or down by unusually high or low years, so it may misrepresent typical growth. I therefore chose to use the median value in my models. According to Microsoft Excel, the median is 1.96%. Expressed as a decimal, R is equal to 0.0196 (3 s.f.), which gives us our first model:
$$P_n = 376e^{0.0196t_n}$$
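The percentage-change and median steps can be reproduced with a minimal Python sketch. It assumes the 70 values of Table 1 are entered exactly as printed, which are already rounded to the nearest million. The yearly changes may therefore differ slightly from those computed on unrounded data. The function name `yearly_changes` is illustrative.

```python
import statistics

# Table 1: population of India in millions, one row per decade (1950-1959, ..., 2010-2019)
POPULATION = [
    376, 382, 389, 396, 403, 410, 417, 425, 433, 442,
    451, 460, 469, 479, 489, 499, 510, 520, 532, 543,
    555, 568, 581, 595, 609, 623, 638, 652, 668, 683,
    699, 715, 732, 749, 767, 784, 802, 820, 837, 855,
    873, 891, 909, 927, 946, 964, 982, 1001, 1019, 1038,
    1057, 1075, 1093, 1112, 1130, 1148, 1165, 1183, 1201, 1218,
    1234, 1250, 1266, 1281, 1296, 1310, 1325, 1339, 1353, 1370,
]


def yearly_changes(pop):
    """Fractional change between consecutive years (0.0196 means 1.96%)."""
    return [(curr - prev) / prev for prev, curr in zip(pop, pop[1:])]


changes = yearly_changes(POPULATION)
R_mean = statistics.mean(changes)
R_median = statistics.median(changes)

print(len(changes), "yearly changes")
print(f"mean change   = {R_mean:.3%}")
print(f"median change = {R_median:.3%}")
print(f"R (3 s.f.)    = {R_median:.3g}")
```

With these values, the median of the 69 yearly changes is the 1966 to 1967 change, 10/510 ≈ 1.961%. This rounds to R = 0.0196.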
While the model appears to follow the trend in the data quite closely at first, it has some disadvantages. Because $\frac{dP_n}{dt_n} = RP_n$, the model predicts that the yearly increase in the Indian population will keep growing indefinitely. This is unrealistic, as we know from our study of evolution that populations eventually become stagnant or decline due to the scarcity of resources. Factors such as food supplies and land for living cannot keep pace with the geometric increase of the population, so eventually we expect scarcity and a slower rate of growth. The model does not take this into account and therefore cannot accurately model population growth beyond a certain point in time.
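A minimal Python sketch of this model illustrates both points. It takes $P_0 = 376$ and $R = 0.0196$ from the text. Like the text, it uses the median yearly percentage change directly as the continuous rate in $e^{Rt_n}$. Strictly, a 1.96% yearly increase corresponds to a continuous rate of $\ln(1.0196) \approx 0.0194$, so this is an approximation. The predictions for 1951, 1985 and 2019 match the exponential column of Table 2, while the yearly increase $P(t+1) - P(t)$ never stops growing.

```python
import math

P0 = 376     # population in 1950, millions (t_n = 0)
R = 0.0196   # median yearly growth rate, used as the continuous rate


def exponential(t):
    """Malthusian model P_n = P0 * e^(R * t_n), with t_n = years since 1950."""
    return P0 * math.exp(R * t)


for year in (1951, 1985, 2019):
    print(year, round(exponential(year - 1950)))
# 1951 383
# 1985 747
# 2019 1454

# Yearly increases keep growing because dP/dt = R * P grows with P
increases = [exponential(t + 1) - exponential(t) for t in range(116)]
print(all(later > earlier for earlier, later in zip(increases, increases[1:])))  # True
```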
Verhulst’s (Logistics) Model

In trying to develop Malthus’ model, Pierre François Verhulst suggested that the rate of population growth depends on the population density of the region under consideration. At a low population density, population growth is fast, and growth slows down as $P_n$ approaches $K$, the carrying capacity. As such, the differential equation for the Verhulst population model relates the rate of change of population to the growth rate $R$ and the carrying capacity $K$. Carrying capacity is the estimated upper limit of a population, given the resources available in the country. Once $P_n \ge K$, the population is not sustainable. Modifying the Malthusian differential equation to include these ideas gives (Sharov, 1997):
$$\frac{dP_n}{dt_n} = RP_n\left(1 - \frac{P_n}{K}\right)$$
When the value of $P_n$ is small, the value of this additional multiplier is close to 1, so the growth curve mimics the exponential curve in the initial stages. However, as $P_n$ increases, the value of the multiplier approaches 0, so the growth curve tends to $K$.
$$\frac{dP_n}{P_n\left(1 - \frac{P_n}{K}\right)} = R\,dt_n$$
$$\int\frac{dP_n}{P_n\left(1 - \frac{P_n}{K}\right)} = \int R\,dt_n$$
To solve the integral, I simplify the left-hand side of the equation. This allows us to use partial fractions:
$$\text{LHS} = \frac{1}{P_n\left(1 - \frac{P_n}{K}\right)} = \frac{K}{P_n(K - P_n)}$$
$$\frac{K}{P_n(K - P_n)} = \frac{A}{P_n} + \frac{B}{K - P_n}$$
After writing the right-hand side over the common denominator $P_n(K - P_n)$, the numerators must be equal to each other:
$$K = A(K - P_n) + B(P_n)$$
If we now take $P_n = K$,
$$K = A(K - K) + B(K)$$
$$K = B(K)$$
Therefore, $B = 1$, and so
$$K = A(K - P_n) + B(P_n)$$
$$K = A(K - P_n) + (P_n)$$
$$K - P_n = A(K - P_n)$$
Therefore, $A = 1$.
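The original fraction can now be rewritten:
$$\frac{K}{P_n(K - P_n)} = \frac{1}{P_n} + \frac{1}{K - P_n}$$
This allows us to solve our integral:
$$\int\frac{1}{P_n}\,dP_n + \int\frac{1}{K - P_n}\,dP_n = \int R\,dt_n$$
$$\ln|P_n| - \ln|K - P_n| = Rt_n + C$$
$$\ln\left|\frac{K - P_n}{P_n}\right| = -Rt_n - C$$
$$\left|\frac{K - P_n}{P_n}\right| = e^{-Rt_n - C}$$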
Letting $A = \pm e^{-C}$:
$$\frac{K - P_n}{P_n} = Ae^{-Rt_n}$$
$$Ae^{-Rt_n}P_n + P_n = K$$
$$P_n\left(Ae^{-Rt_n} + 1\right) = K$$
This leaves us with:
$$P_n = \frac{K}{1 + Ae^{-Rt_n}}$$
Substituting $P_n$ with $P_0$ and $t_n$ with $t_0$ gives our expression for the constant $A$ (as $t_0 = 0$):
$$A = \frac{K - P_0}{P_0}$$
India is the 7th largest country in the world by area (Worldometer, n.d.), with a diverse population, cultures and terrains, along with other such factors. This makes it difficult to predict the maximum number of people that the country can hold. However, various sources suggest that India’s carrying capacity is around 1.65 billion, and that the country is expected to reach that number in the early 2060s (World Bank, 2019) (Chandrashekar, 2019). For the purposes of this exploration, we can take this to be our carrying capacity.
As such, A can now be written as:
$$A = \frac{1650 - 376}{376} = 3.3883\ (5\text{ s.f.})$$
Note: the carrying capacity has been converted from billions to millions.
Since our value for K is only an estimate, predictions may not be reliable. Another problem with the logistics model is that it is a deterministic model: it does not take into account fluctuations in the value of K (Hillen, n.d.). If India works towards accommodating a larger population by, for example, reclaiming land on the coastlines, the value of K would increase. Conversely, events ranging from natural disasters to factors such as the exportation of fuel would lead to a decrease in the value of K. Overall, however, including a maximum holding capacity in the model allows for a more realistic model of growth than the exponential model explored previously. Our logistics model can therefore be written as:
$$P_n = \frac{1650}{1 + 3.3883e^{-0.0196t_n}}$$
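Before any transformation, the logistics model can be checked with a short Python sketch. It assumes $K = 1650$, $P_0 = 376$ and $R = 0.0196$ from the text. The sketch does three things:

• computes $A$;
• confirms that the model passes through $(0, 376)$;
• compares a central-difference slope of the closed form with the right-hand side $RP_n(1 - P_n/K)$ of the differential equation.

The helper names are illustrative.

```python
import math

K = 1650.0   # assumed carrying capacity, millions
P0 = 376.0   # population in 1950, millions
R = 0.0196   # median growth rate

A = (K - P0) / P0  # constant of integration, A = (K - P0) / P0


def logistic(t):
    """Untransformed logistics model P_n = K / (1 + A e^(-R t_n))."""
    return K / (1 + A * math.exp(-R * t))


def logistic_rhs(p):
    """Right-hand side of the logistic equation, dP/dt = R P (1 - P / K)."""
    return R * p * (1 - p / K)


print(f"A = {A:.5g}")                                  # A = 3.3883
print(f"P(0) = {logistic(0):.1f}")                     # P(0) = 376.0
print(f"P(69) = {logistic(69):.0f} (observed 1370)")   # P(69) = 879 (observed 1370)

h = 1e-4  # step for the central difference
for t in (0, 35, 69, 200):
    slope = (logistic(t + h) - logistic(t - h)) / (2 * h)
    print(t, f"{slope:.6f}", f"{logistic_rhs(logistic(t)):.6f}")
```

The two slope columns agree, which supports the closed form. The value for 2019 ($t_n = 69$), however, is far below the observed 1370 million.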
This initial model fitted the data very poorly. This was possibly because I chose to use the median growth rate as the value of R. I realised I would have to consider other factors influencing population growth in order to find a more accurate model. This would involve transformations (stretches and translations) of the current model based on these other factors. Using the online graphing tool Desmos, I applied transformations to produce the following modified logistics model:
$$P_n = \frac{1650}{1.05\times\left(1 + 3.3883\times e^{-0.0196\times\left((2.3\times t_n) - 45\right)}\right)} + 204.917$$
Figures 3 and 4 compare my logistics model before and after undergoing transformations:
Figure 3: Initial logistics model (Author’s Own, 2020) Figure 4: Transformed logistics model (Author’s Own, 2020)
Multiplying our value of $t_n$ by 2.3 causes a dilation of the model with the y-axis invariant. Multiplying the denominator of the first term in the model by 1.05 causes a dilation of the model with the x-axis invariant. The need for these dilations suggests that the Indian population grew far faster than expected; the reasons for this are discussed below. Additionally, after being dilated, the model had to be translated so that it passed through the point (0, 376), since $P_0 = 376$. The explanations for why these transformations are required are explored after the next model, as it also required similar adjustments.
To be able to compare the rates of population change for the initial and transformed functions, I found the first derivatives of the
two functions. This would be the first step in helping me to compare the maximum rates of change, or in my context, the maximum
rates of population growth of the initial and transformed models. Finding this difference in actual and predicted growth rates would
allow me to see the relative difference between the two models:
Initial Logistics Function:
$$\frac{d}{dt_n}\left[\frac{1650}{1 + 3.3883e^{-0.0196t_n}}\right] = \frac{10957762200e^{0.0196t_n}}{\left(10000e^{0.0196t_n} + 33883\right)^2}$$
Transformed Logistics Function:
$$\frac{d}{dt_n}\left[\frac{1650}{1.05\times\left(1 + 3.3883e^{-0.0196(2.3t_n - 45)}\right)} + 204.917\right] = \frac{24002717200e^{0.0196(2.3t_n - 45)}}{\left(10000e^{0.0196(2.3t_n - 45)} + 33883\right)^2}$$
I then had to decide which year to find growth rates for. One option was to find the rate of change at the point of inflection (POI) of the logistics function, which is symmetrical about its POI. Alternatively, I considered finding the rate of change at the midpoint of my data, which is the year 1985. I chose to find the rate of change in 1985 to allow for a more controlled comparison. Because of the various transformations that have been applied, the POIs of the two graphs occur at different times, and comparing rates from two different years would be unreliable. Comparing the rates given by the two models in one fixed year is more meaningful. I therefore found the rate of population growth in 1985 (note: the year 1985 corresponds to $t_n = 35$):
Initial Logistics Function:
$$\frac{10957762200e^{0.0196\times35}}{\left(10000e^{0.0196\times35} + 33883\right)^2} = 7.53$$
Transformed Logistics Function:
$$\frac{24002717200e^{0.0196(2.3\times35 - 45)}}{\left(10000e^{0.0196(2.3\times35 - 45)} + 33883\right)^2} = 16.55$$
The transformed logistics function closely follows the observed data. It gives a growth rate in 1985 of 16.55 million people per year, which is 2.20 times (16.55/7.53) the 7.53 million people per year predicted by the initial logistics function. This ratio is evaluated after the exploration of the next model.
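These two rates can be cross-checked numerically. The Python sketch below uses the constants exactly as stated in the text, and its function names are illustrative. It approximates each slope at $t_n = 35$ with a central difference rather than with the symbolic derivatives. Agreement with 7.53 and 16.55 is therefore a consistency check on the algebra, not an independent proof.

```python
import math

K, A, R = 1650.0, 3.3883, 0.0196


def logistic_initial(t):
    return K / (1 + A * math.exp(-R * t))


def logistic_transformed(t):
    # Time stretched and shifted (2.3 t - 45), scaled by 1/1.05, shifted up by 204.917
    return K / (1.05 * (1 + A * math.exp(-R * (2.3 * t - 45)))) + 204.917


def derivative(f, t, h=1e-5):
    """Central-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


t_1985 = 35
rate_initial = derivative(logistic_initial, t_1985)
rate_transformed = derivative(logistic_transformed, t_1985)
print(f"initial:     {rate_initial:.2f}")                          # initial:     7.53
print(f"transformed: {rate_transformed:.2f}")                      # transformed: 16.55
print(f"ratio:       {rate_transformed / rate_initial:.2f}")       # ratio:       2.20
```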
Limitations of the Logistics model

Before discussing the limitations of this model, we can acknowledge that the logistics model takes the carrying capacity of a country into consideration. As such, the model does not predict that population growth will increase beyond control. However, there are limitations in using the World Bank’s estimate of K. While the World Bank could be considered a reliable and accurate source of information, it is still very difficult to make an accurate prediction of the true carrying capacity of a country. Carrying capacity can depend on a large array of factors. For the purposes of this exploration, however, we assume that the World Bank’s figure of 1.65 billion is an accurate estimate of India’s carrying capacity. Because a number of factors could alter a country’s carrying capacity, we must also consider that our value of K may not be constant. As a result of human or natural processes, the number of people a country can support can change. These processes include the discovery of new resources or land reclamation, which increase capacity. They also include events with a negative impact, such as disease, war and crime. The value of K is only an estimate. Changes in the conditions of a country between the time when the prediction is made and the day when the country reaches its peak capacity can change the value of K, perhaps drastically. If this exploration were repeated with further data on such factors, it would be necessary to add parameters that take them into consideration.
Gompertz function
Another extension of Malthus’ model is the Gompertz function, introduced by Benjamin Gompertz in 1825. It describes growth that is slowest at the start, faster in the middle stages, and slows again as $P_n$ approaches $K$ (that is, $\lim_{t_n\to\infty} P_n = K$). Like the logistics model, the Gompertz function adds a parameter to the initial Malthusian model that combines the growth rate with the carrying capacity. One visual difference between the two is that the period of fastest population change is far shorter for the Gompertz function than for the logistics function (Tjørve & Tjørve, 2017):
$$\frac{dP_n}{dt_n} = R\ln\left(\frac{K}{P_n}\right)P_n$$
The addition of the Gompertz parameter has a similar effect to that of the logistics parameter. At small values of $P_n$, the value of the natural logarithm is large, so the growth curve mimics the exponential model. As $P_n$ approaches $K$, the ratio $\frac{K}{P_n}$ approaches 1, so the logarithm approaches 0, making the growth curve asymptotic to $K$.
$$\frac{dP_n}{P_n\ln\left(\frac{K}{P_n}\right)} = R\,dt_n$$
$$\int\frac{dP_n}{P_n\ln\left(\frac{K}{P_n}\right)} = \int R\,dt_n$$
We can use the substitution method to help us integrate the above equation. Let
$$u = \ln\left(\frac{K}{P_n}\right)$$
$$du = \frac{1}{\frac{K}{P_n}}\times\left(\frac{-K}{P_n^2}\right)dP_n = \frac{P_n}{K}\times\left(\frac{-K}{P_n^2}\right)dP_n = -\frac{1}{P_n}\,dP_n$$
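Using this substitution:
$$-\int\frac{-\frac{1}{P_n}\,dP_n}{\ln\left(\frac{K}{P_n}\right)} = \int R\,dt_n$$
$$-\int\frac{1}{u}\,du = \int R\,dt_n$$
$$\int\frac{1}{u}\,du = -\int R\,dt_n$$
$$\ln|u| = -\left[Rt_n + C_1\right]$$
$$\ln|u| = -Rt_n + C_2$$
$$|u| = e^{-Rt_n + C_2}$$
$$|u| = e^{-Rt_n}\times e^{C_2}$$
$$|u| = C_3e^{-Rt_n}\qquad\left[C_3 = e^{C_2}\right]$$
$$u = \pm C_3e^{-Rt_n}$$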

Since $K \ge P_n$, we have $\frac{K}{P_n} \ge 1$. As such, $\ln\left(\frac{K}{P_n}\right)$ cannot be negative, so we only take the positive value of $u$:
$$u = C_3e^{-Rt_n}$$
$$\ln\left(\frac{K}{P_n}\right) = C_3e^{-Rt_n}$$
We know that at $t_n = 0$, our population $P_n = P_0$, so:
$$\ln\left(\frac{K}{P_0}\right) = C_3e^{-R\cdot0}$$
$$\ln\left(\frac{K}{P_0}\right) = C_3$$
This is our value of the constant of integration.
Substituting our constant of integration back into the equation leaves us with:
$$\ln\left(\frac{K}{P_n}\right) = \ln\left(\frac{K}{P_0}\right)e^{-Rt_n}$$
$$\frac{K}{P_n} = e^{\ln\left(\frac{K}{P_0}\right)e^{-Rt_n}}$$
$$P_n = Ke^{-\ln\left(\frac{K}{P_0}\right)e^{-Rt_n}}$$
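We can further prove that our limit (carrying capacity) is, in fact, equal to $K$. Since $R > 0$, $e^{-Rt_n} \to 0$ as $t_n \to \infty$, so:
$$\lim_{t_n\to\infty}P(t_n) = \lim_{t_n\to\infty}Ke^{-\ln\left(\frac{K}{P_0}\right)e^{-Rt_n}} = Ke^{-\ln\left(\frac{K}{P_0}\right)\times0} = Ke^0 = K$$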
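To check the algebra in this derivation, the closed form can be compared with a direct numerical solution of the Gompertz differential equation. The Python sketch below uses the classical fourth-order Runge-Kutta method, a standard numerical technique that is not part of this exploration. It takes the untransformed values $K = 1650$, $P_0 = 376$ and $R = 0.0196$. If the derivation is correct, the two printed columns should agree closely. The function names are illustrative.

```python
import math

K, P0, R = 1650.0, 376.0, 0.0196


def gompertz_closed(t):
    """Closed form derived above: P = K e^(-ln(K/P0) e^(-R t))."""
    return K * math.exp(-math.log(K / P0) * math.exp(-R * t))


def gompertz_rhs(p):
    """Gompertz differential equation: dP/dt = R ln(K/P) P."""
    return R * math.log(K / p) * p


def rk4(f, p, t_end, steps=1000):
    """Solve dP/dt = f(P) from P(0) = p up to t_end with classical fourth-order Runge-Kutta."""
    h = t_end / steps
    for _ in range(steps):
        k1 = f(p)
        k2 = f(p + h * k1 / 2)
        k3 = f(p + h * k2 / 2)
        k4 = f(p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p


for t in (35, 69, 500):
    print(t, f"{gompertz_closed(t):.4f}", f"{rk4(gompertz_rhs, P0, t):.4f}")
```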
So, our Gompertz function before any modifications is given by:
$$P_n = 1650e^{-\ln\left(\frac{1650}{376}\right)e^{-0.0196t_n}}$$
Like the logistics model, the Gompertz model does not consider other factors influencing population growth, so it, too, needed to be modified. Our modified Gompertz function is:
$$P_n = 346 + 1650e^{-2.7\ln\left(\frac{1650}{376}\right)e^{-0.0196\times1.6t_n}}$$
Figures 5 and 6 display the Gompertz model before and after undergoing transformations:
Figure 5: Initial Gompertz function (Author’s Own, 2020) Figure 6: Transformed Gompertz function (Author’s Own, 2020)
Given that population growth has occurred faster than expected, the Gompertz model also needs to undergo dilations similar to those applied to the logistics model, and the reasons for considering other factors are the same. The actual transformations to the Gompertz model are, however, different from those for the logistics model. One reason for this is the nature of the model itself: different models require different transformations to produce the same effect. Like the logistics model, however, the Gompertz model is dilated in both directions and then translated so that it passes through $(t_0, P_0)$. Following the same approach as for the initial and transformed logistics functions, I then found the first derivatives of the initial and transformed Gompertz functions:
Initial Gompertz Function:
$$\frac{d}{dt_n}\left[1650e^{-\ln\left(\frac{1650}{376}\right)e^{-0.0196t_n}}\right] = \frac{1617\ln\left(\frac{1650}{376}\right)e^{-\ln\left(\frac{1650}{376}\right)e^{-0.0196t_n} - 0.0196t_n}}{50}$$
Transformed Gompertz Function:
$$\frac{d}{dt_n}\left[346 + 1650e^{-2.7\ln\left(\frac{1650}{376}\right)e^{-0.0196\times1.6t_n}}\right] = \frac{87318\ln\left(\frac{1650}{376}\right)e^{-2.7\ln\left(\frac{1650}{376}\right)e^{-0.03136t_n} - 0.03136t_n}}{625}$$
Next, I find the rate of population growth at $t_n = 35$, as for the logistics functions:
Initial Gompertz Function:
$$\frac{1617\ln\left(\frac{1650}{376}\right)e^{-\ln\left(\frac{1650}{376}\right)e^{-0.0196\times35} - 0.0196\times35}}{50} = 11.44$$
Transformed Gompertz Function:
$$\frac{87318\ln\left(\frac{1650}{376}\right)e^{-2.7\ln\left(\frac{1650}{376}\right)e^{-0.03136\times35} - 0.03136\times35}}{625} = 18.19$$
The transformed Gompertz function gives a rate of population growth in 1985 that is 1.59 times (18.19/11.44) the rate predicted by the initial Gompertz function.
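The Gompertz derivatives and the ratio can be checked numerically. In this Python sketch, whose names are illustrative, `initial_slope` and `transformed_slope` encode the symbolic derivatives from the text. A central difference of each original function then provides an independent numerical check at $t_n = 35$.

```python
import math

K, P0, R = 1650.0, 376.0, 0.0196
C = math.log(K / P0)  # ln(1650/376)
k = 0.0196 * 1.6      # stretched rate in the transformed model, 0.03136


def gompertz_initial(t):
    return K * math.exp(-C * math.exp(-R * t))


def gompertz_transformed(t):
    return 346 + K * math.exp(-2.7 * C * math.exp(-k * t))


def initial_slope(t):
    """Derivative from the text: (1617/50) ln(1650/376) e^(-C e^(-0.0196 t) - 0.0196 t)."""
    return 1617 / 50 * C * math.exp(-C * math.exp(-R * t) - R * t)


def transformed_slope(t):
    """Derivative from the text: (87318/625) ln(1650/376) e^(-2.7 C e^(-0.03136 t) - 0.03136 t)."""
    return 87318 / 625 * C * math.exp(-2.7 * C * math.exp(-k * t) - k * t)


def central_difference(f, t, h=1e-5):
    """Numerical slope of f at t, used as an independent check."""
    return (f(t + h) - f(t - h)) / (2 * h)


t_1985 = 35
print(f"initial:     {initial_slope(t_1985):.2f} vs {central_difference(gompertz_initial, t_1985):.2f}")
print(f"transformed: {transformed_slope(t_1985):.2f} vs {central_difference(gompertz_transformed, t_1985):.2f}")
print(f"ratio:       {transformed_slope(t_1985) / initial_slope(t_1985):.2f}")  # ratio:       1.59
```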
For the logistics functions, the transformed rate in 1985 was 2.20 times the initially predicted rate.
For the Gompertz functions, the transformed rate in 1985 was 1.59 times the initially predicted rate.
On average, using the less-rounded ratios:
$$\frac{1}{2}\left(\frac{16.55}{7.53} + \frac{18.19}{11.44}\right) \approx \frac{2.198 + 1.590}{2} \approx 1.89$$
The transformed models’ rate of population growth is therefore approximately 1.89 times the rate predicted by the initial models. This suggests that the Indian population grew at nearly twice the predicted rate. Comparing the rates given by the initial and transformed models allows us to quantify how far the initial models underestimate growth. Several factors may explain this accelerated growth. Following independence from Britain in 1947, living standards in India improved, which meant that fewer people were dying from diseases related to sanitation. This, combined with a low literacy rate in the early years of Indian independence, meant that there was a significant lack of awareness of family planning. During the 1970s and 1980s, India’s growth rate was at its peak. In Indian culture, it is almost a compulsion for individuals to get married and have children at some point in their lives, and some religions even condemn the use of contraception. Gender bias in more rural communities means that a couple may keep having children until there is a boy in the family. Those living in poverty grew up with the ideology of having as many children as possible so as to maximise the family’s income. From a young age, these children work for the family instead of going to school, which means they too remain uneducated, and the poverty cycle continues. Improvements in healthcare around the country led to an increase in life expectancy, so the death rate fell far below the birth rate. The government did try to limit growth through policies such as “Hum Do Hamare Do” (Hindi for “Two of us, two of ours”), but these failed. Illegal migration from Bangladesh and other countries in the Indian subcontinent led to a surge in numbers. In recent years, however, media access has allowed the government to spread awareness of concerns about rapid population growth and its effects on the economy. Increased awareness of contraception and family planning has reduced the number of children born per family. As a result, population growth in India has indeed started to slow.
Given the vast number of events in Indian history that could have influenced population change, it is difficult to connect each transformation precisely to specific past events. However, the need for these transformations in the first place raises some questions, as it points to possible inaccuracies in the data collection. These functions have successfully modelled many population data sets in the past, so we would expect the mathematics to produce a model that fits the data well without any transformations. However, this is not the case. This could suggest that the producers of this population data used conflicting metrics when generating the data. It is also possible that the inaccurate presentation of data is a government attempt to censor the true data. If this is true, the constants used to generate these models, including the growth rate and the carrying capacity, cannot produce a model that fits the data well. As such, while the transformations cannot be connected to specific past events, they highlight the difference between the actual and predicted trends in population. They also act as the corrections required for the model to conform to the data. This is reinforced by the fact that the transformations have allowed the functions to follow the data far more closely.
Comparison of observed and predicted populations

Using the three models, I was able to compare the predicted populations with the observed populations between 1950 and 2019:
Table 2: Comparison of observed and model predicted populations (to the nearest million) (Author’s Own, 2020)
Year tn Pn Pe(Exp.) Pe(Log.) Pe(Gom.) Year tn Pn Pe(Exp.) Pe(Log.) Pe(Gom.)
1950 0 376 376 376 376 1985 35 784 747 789 781
1951 1 382 383 383 380 1986 36 802 761 806 800
1952 2 389 391 390 385 1987 37 820 776 823 818
1953 3 396 399 398 390 1988 38 837 792 840 837
1954 4 403 407 405 395 1989 39 855 808 857 855
1955 5 410 415 414 400 1990 40 873 824 874 874
1956 6 417 423 422 406 1991 41 891 840 891 893
1957 7 425 431 430 413 1992 42 909 856 909 912
1958 8 433 440 439 420 1993 43 927 873 926 931
1959 9 442 449 448 427 1994 44 946 891 944 950
1960 10 451 457 458 435 1995 45 964 908 962 969
1961 11 460 466 467 444 1996 46 982 926 979 988
1962 12 469 476 477 452 1997 47 1001 945 997 1007
1963 13 479 485 488 462 1998 48 1019 963 1015 1026
1964 14 489 495 498 472 1999 49 1038 982 1032 1045
1965 15 499 505 509 482 2000 50 1057 1002 1050 1064
1966 16 510 514 521 493 2001 51 1075 1022 1068 1082
1967 17 520 525 532 504 2002 52 1093 1042 1085 1101
1968 18 532 535 544 516 2003 53 1112 1063 1103 1119
1969 19 543 546 556 529 2004 54 1130 1084 1120 1138
1970 20 555 556 568 542 2005 55 1148 1105 1137 1156
1971 21 568 567 581 555 2006 56 1165 1127 1154 1174
1972 22 581 579 594 569 2007 57 1183 1149 1171 1192
1973 23 595 590 608 583 2008 58 1201 1172 1188 1209
1974 24 609 602 621 597 2009 59 1218 1195 1204 1227
1975 25 623 614 635 612 2010 60 1234 1219 1220 1244
1976 26 638 626 649 628 2011 61 1250 1243 1236 1261
1977 27 652 638 664 644 2012 62 1266 1267 1252 1278
1978 28 668 651 679 660 2013 63 1281 1293 1268 1294
1979 29 683 664 694 676 2014 64 1296 1318 1283 1311
1980 30 699 677 709 693 2015 65 1310 1344 1298 1327
1981 31 715 690 725 710 2016 66 1325 1371 1313 1343
1982 32 732 704 740 728 2017 67 1339 1398 1328 1358
1983 33 749 718 756 745 2018 68 1353 1426 1342 1374
1984 34 767 732 773 763 2019 69 1370 1454 1356 1389
Sample calculations for the year 1951 (all to the nearest million):
Exponential
$$P_n = 376e^{0.0196t_n} = 376e^{0.0196\times1} = 383.442 \approx 383$$
Logistics
$$P_n = \frac{1650}{1.05\left(1 + 3.3883e^{-0.0196(2.3t_n - 45)}\right)} + 204.917 = \frac{1650}{1.05\left(1 + 3.3883e^{-0.0196(2.3(1) - 45)}\right)} + 204.917 = 382.99 \approx 383$$
Gompertz
$$P_n = 346 + 1650e^{-2.7\ln\left(\frac{1650}{376}\right)e^{-0.0196\times1.6t_n}} = 346 + 1650e^{-2.7\ln\left(\frac{1650}{376}\right)e^{-0.0196\times1.6\times1}} = 380.421 \approx 380$$
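The three sample calculations can be checked with a short Python sketch. It evaluates each model formula exactly as written above at $t_n = 1$, the year 1951. The function names are illustrative.

```python
import math

C = math.log(1650 / 376)  # ln(K / P0) used in the Gompertz model


def exponential(t):
    return 376 * math.exp(0.0196 * t)


def logistic(t):
    return 1650 / (1.05 * (1 + 3.3883 * math.exp(-0.0196 * (2.3 * t - 45)))) + 204.917


def gompertz(t):
    return 346 + 1650 * math.exp(-2.7 * C * math.exp(-0.0196 * 1.6 * t))


t = 1  # the year 1951
for name, model in (("Exponential", exponential), ("Logistics", logistic), ("Gompertz", gompertz)):
    value = model(t)
    print(f"{name:<11} {value:.3f} -> {round(value)}")
```

Changing `t` to any value from 0 to 69 reproduces the corresponding row of the model columns in Table 2.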
When the three models are plotted together, the Gompertz model appears to fit the data best, as shown in Figure 7:
Figure 7: Plot of Malthusian (Black), Logistics (Blue) and Gompertz (Green) Models
against the population data on Desmos online graphing tool (Author’s Own, 2020)
Pearson’s correlation coefficient (r) test

To determine which model fits the data most closely, I used a Pearson correlation test. In my sample calculation of Pearson’s correlation coefficient I use the Gompertz model, as it followed the trend best out of all the models, with $P_e$ values most closely following $P_n$ values. Correlation coefficients are determined using the following equation:
$$r = \frac{\sum P_nP_e - \frac{\left(\sum P_n\right)\left(\sum P_e\right)}{N}}{\sqrt{\left(\sum P_n^2 - \frac{\left(\sum P_n\right)^2}{N}\right)\left(\sum P_e^2 - \frac{\left(\sum P_e\right)^2}{N}\right)}}$$
N is 70, the number of years from 1950 to 2019 inclusive. Table 3 displays the sum of each term in the Pearson equation:
Table 3: Sum of values in Pearson’s equation for the Gompertz model (calculated from population values in millions) (Author’s Own, 2020)
Term Sum
$\sum P_n$ 56913
$\sum P_e$ 56813
$\sum P_nP_e$ 53244567
$\sum P_n^2$ 53107237
$\sum P_e^2$ 53390067
We can now substitute these values into our equation to obtain a value for the coefficient:
$$r = \frac{53244567 - \frac{56913\times56813}{70}}{\sqrt{\left(53107237 - \frac{56913^2}{70}\right)\left(53390067 - \frac{56813^2}{70}\right)}} = 0.99993$$
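The substitution can be checked with a minimal Python sketch that evaluates the same formula from the five sums in Table 3, with $N = 70$. A second helper, `pearson`, builds those sums from two lists of values, such as the $P_n$ and $P_e$ columns of Table 2. Both function names are illustrative.

```python
import math


def pearson_from_sums(n, sx, sy, sxy, sxx, syy):
    """Pearson's r from the five sums, using the formula above."""
    numerator = sxy - sx * sy / n
    denominator = math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return numerator / denominator


def pearson(xs, ys):
    """Pearson's r computed directly from two equal-length lists."""
    return pearson_from_sums(
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
    )


# Sums for the Gompertz model from Table 3 (N = 70)
r = pearson_from_sums(70, 56913, 56813, 53244567, 53107237, 53390067)
print(f"r = {r:.5f}")  # r = 0.99993
```

Applying `pearson` to the observed column and to the exponential and logistics columns of Table 2 would be one way to recheck the other two coefficients.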
The correlation coefficient of 0.99993 suggests that the modified Gompertz function fits the data very closely. Similarly, I calculated coefficients for the exponential and logistics models, obtaining values of 0.99496 and 0.99989 respectively. These correlation coefficients are also very high, but the Gompertz function gave the best model for our data, so I used it to predict the Indian population going forward. While the Gompertz function gives the best correlation, the logistics model is also a good option for predicting growth. The exponential model, however, is not suitable for modelling populations despite its high correlation. This is because it predicts unlimited growth, and no population can grow in this way.
Predicting the future

Using the modified Gompertz model developed in this exploration, I extrapolated the model to predict the Indian population for every year up to 2065. This is shown in Table 4:
Table 4: Expected population until 2065, using the Gompertz function (to the nearest million) (Author’s Own, 2020)
Yr. 𝒕𝒏 𝑷𝒏 Yr. 𝒕𝒏 𝑷𝒏 Yr. 𝒕𝒏 𝑷𝒏
2020 70 1404 2036 86 1607 2052 102 1748
2021 71 1418 2037 87 1617 2053 103 1755
2022 72 1433 2038 88 1627 2054 104 1762
2023 73 1447 2039 89 1637 2055 105 1769
2024 74 1461 2040 90 1647 2056 106 1775
2025 75 1474 2041 91 1657 2057 107 1781
2026 76 1488 2042 92 1666 2058 108 1788
2027 77 1501 2043 93 1675 2059 109 1794
2028 78 1513 2044 94 1684 2060 110 1799
2029 79 1526 2045 95 1693 2061 111 1805
2030 80 1538 2046 96 1701 2062 112 1811
2031 81 1550 2047 97 1710 2063 113 1816
2032 82 1562 2048 98 1718 2064 114 1821
2033 83 1574 2049 99 1725 2065 115 1826
2034 84 1585 2050 100 1733
2035 85 1596 2051 101 1741
The prediction process highlighted a significant flaw in the Gompertz model: it predicts that the population will exceed the World Bank’s predicted carrying capacity of 1.65 billion. This happens because the vertical translation of 346 raises the model’s upper limit from 1650 to 346 + 1650 = 1996 million. However, the transformed model was fitted to the observed data, so this also leads us to question the viability, reliability and accuracy of the World Bank’s prediction. In calculating the carrying capacity of India, the World Bank would need to consider a large array of factors. Given that these factors can themselves change over time, it is difficult to consider any population prediction completely accurate. In a country with more than a sixth of the world’s population, it is hard to keep accurate track of births, deaths, resource availability and living conditions at the same time, even though rough estimates can be generated.
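The extrapolation in Table 4 can be reproduced with a minimal Python sketch of the transformed Gompertz model, using the constants exactly as stated in the text. It prints a few of the predicted values, the model’s upper asymptote, and the first year in which the prediction exceeds 1650 million. The function name `gompertz` is illustrative.

```python
import math

C = math.log(1650 / 376)  # ln(K / P0) with K = 1650 and P0 = 376


def gompertz(t):
    """Transformed Gompertz model from the text, t = years since 1950."""
    return 346 + 1650 * math.exp(-2.7 * C * math.exp(-0.0196 * 1.6 * t))


for year in (2020, 2040, 2041, 2065):
    print(year, round(gompertz(year - 1950)))
# 2020 1404
# 2040 1647
# 2041 1657
# 2065 1826

# The inner exponential tends to 0, so the model levels off at 346 + 1650
print("upper asymptote:", 346 + 1650)  # upper asymptote: 1996
first_year_above_k = next(1950 + t for t in range(200) if gompertz(t) > 1650)
print("first year above 1650 million:", first_year_above_k)  # first year above 1650 million: 2041
```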
Conclusion
The exponential model appears to fit the data well in the first two decades. Over time, however, the function increases too quickly and fails to model the decreasing rate of growth in the later stages of the data. As such, the exponential model may be used for interpolation up to the 1970s, but after this it becomes an unreliable model of India’s population, as it fails to consider that the country’s resources are finite. I therefore believe that the Gompertz function and the logistics model are far better models of Indian population growth. These models also support the claim that Indian population growth has already started to slow down as the country approaches its expected peak capacity. As an extension, I would like to explore how to form a model that considers the likely possibility that the Indian population will start to decline at some point. While it is currently difficult to predict accurately when this will occur, it is still possible to extrapolate from the current trend. This is discussed further in the evaluation.
Limitations
This exploration was limited for several reasons. It was difficult to identify which factors influenced population growth in India at different periods of its history. While discussing the modifications to each of the models, it was difficult to identify the extent to which a given factor influenced growth, so it was difficult to quantify its impact. For example, if information had been available on whether the growth rate doubled or tripled due to a specific factor, it would have been easier to adjust our parameters to fit the data more closely. Most of the data analysed in this exploration was obtained from the official World Bank website, which we might expect to be reliable, but as discussed earlier, there might have been errors in data collection. The sources that discussed significant events and other factors that could have affected Indian population growth were all works of established researchers, universities and organisations.
While the logistic and Gompertz models are well-established models of population growth, I discovered that there is a
significant limitation in using them on their own. Before I transformed these two models, the plots I obtained did not fit the data
well. As discussed earlier, this could be due to two reasons. To summarise, these models might require transformations because:
• They do not take into account factors other than the mean/median growth rate and carrying capacity, such as improvements
in healthcare facilities, education, migration and war. Additionally, the carrying capacity is only an estimate.
• The data found online may have been deliberately altered by governments to hide rapid population changes. False data,
combined with inaccurate values for carrying capacity, would produce inaccurate models, which would then require transformations.
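Because the carrying capacity is only an estimate, it helps to see how strongly a logistic prediction depends on it. The Python sketch below holds the starting population and growth rate fixed and varies K, using the closed-form Verhulst solution. Every numerical value here (P0 = 376 million, r = 0.03 per year, and the three candidate values of K) is a hypothetical assumption for illustration, not a parameter fitted in this exploration.

```python
import math

# Hypothetical illustrative values (assumptions, not fitted parameters):
P0 = 376.0  # starting population in millions at t = 0
r = 0.03    # assumed intrinsic growth rate per year


def logistic(t, K):
    """Closed-form Verhulst solution for carrying capacity K (millions)."""
    return K / (1 + ((K - P0) / P0) * math.exp(-r * t))


def sensitivity(t, capacities):
    """Return the logistic prediction at time t for each candidate K."""
    return {K: logistic(t, K) for K in capacities}


if __name__ == "__main__":
    t = 100  # years after the starting year
    for K, P in sensitivity(t, (1500.0, 1650.0, 1800.0)).items():
        print(f"K = {K:6.0f} million -> predicted P({t}) = {P:7.1f} million")
```

All three curves start at the same P0, but their long-run predictions diverge towards their own K, so an error in the assumed carrying capacity feeds directly into any extrapolation.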
This exploration could have been developed further with more advanced software, which is difficult to obtain, let alone use, at this
stage. A greater understanding of the economic development and history of India, used together with such technology, would
also allow me to consider other reasons why populations change in such a way.
Evaluation
Our study of evolution has taught us that most populations tend to follow a trend: in the initial stages of growth, there are a small
number of individuals, so the population grows slowly. However, with abundant resources and favourable living conditions, the rate
at which new individuals join the population rapidly increases until the population reaches its peak. After this peak, the population
drops rapidly. This trend is commonly seen in experiments involving bacterial growth. However, such a drop in population would be
largely unprecedented in modern human history. Even though humans are the most intelligent species on Earth, conditions such as
famine, drought, serious diseases and toxic air quality could trigger a rapid drop in population. In India's case, it is difficult to
predict which of these factors will trigger a rapid decline in population, and whether such a decline is even possible.
Scientific developments could extend the period before populations start to decline rapidly, but a large-scale event such as
a life-threatening, contagious disease could cause this decline to occur far earlier than expected. My models only consider the initial
rapid growth and peak stages; they do not consider the possibility of this rapid decline in numbers. As such, if I were to develop this
exploration further, I would keep this in mind while searching for an even more appropriate model of Indian population growth.
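The point that the logistic and Gompertz models cannot describe a decline follows from their differential equations: for 0 < P < K, both rP(1 − P/K) and rP ln(K/P) are positive, so the modelled population can only rise towards K. The Python sketch below integrates both equations numerically with a fixed-step Euler scheme (a simple method chosen here for illustration, not the method used in this exploration) and checks that neither trajectory ever decreases. The values P0 = 376 million, K = 1650 million and r = 0.03 per year are hypothetical assumptions.

```python
import math

# Hypothetical illustrative values (assumptions, not fitted parameters):
P0, K, r = 376.0, 1650.0, 0.03  # millions, millions, per year


def logistic_rate(P):
    """dP/dt for the Verhulst model: r P (1 - P / K)."""
    return r * P * (1 - P / K)


def gompertz_rate(P):
    """dP/dt for the Gompertz model: r P ln(K / P)."""
    return r * P * math.log(K / P)


def euler(rate, years=300, dt=0.1):
    """Integrate dP/dt = rate(P) from P0 with a fixed-step Euler scheme."""
    P, path = P0, [P0]
    for _ in range(int(round(years / dt))):
        P += rate(P) * dt
        path.append(P)
    return path


if __name__ == "__main__":
    for name, rate in (("logistic", logistic_rate), ("Gompertz", gompertz_rate)):
        path = euler(rate)
        never_falls = all(b >= a for a, b in zip(path, path[1:]))
        print(f"{name}: final P = {path[-1]:.1f}, never decreases: {never_falls}")
```

Any model intended to capture a later decline would therefore need an extra term or a time-varying parameter, since neither equation as written can produce a falling population.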
Bibliography
United Nations. (2019). World Population Prospects 2019. Retrieved January 2020, from United Nations Department of
Economic and Social Affairs - Population dynamics:
https://population.un.org/wpp/Publications/Files/WPP2019_Highlights.pdf
Trading Economics. (2020). India Population. Retrieved January 2020, from Trading Economics:
https://tradingeconomics.com/india/population
United Nations. (2017, June 21). World population projected to reach 9.8 billion in 2050, and 11.2 billion in 2100. Retrieved
January 2020, from United Nations Department of Economic and Social Affairs:
https://www.un.org/development/desa/en/news/population/world-population-prospects-2017.html
World Bank. (2019). Population, total - India. Retrieved January 2020, from The World Bank:
https://data.worldbank.org/indicator/SP.POP.TOTL?end=2018&locations=IN&start=1960&view=chart
Mahaffy, J. (2004, March 23). Separable Differential Equations. Retrieved January 2020, from San Diego State University:
https://jmahaffy.sdsu.edu/courses/f00/math122/lectures/sep_diffequations/sepdiffeq.html
Sharov, A. (1997, February 3). Logistics model. Retrieved January 2020, from University of Texas:
https://web.ma.utexas.edu/users/davis/375/popecol/lec5/logist.html
Worldometer. (n.d.). Largest Countries in the World (by Area). Retrieved January 2020, from Worldometer:
https://www.worldometers.info/geography/largest-countries-in-the-world/
Chandrashekar, V. (2019, December 12). Why India Is Making Progress in Slowing Its Population Growth. Retrieved January
2020, from Yale School of Forestry and Environmental Studies:
https://e360.yale.edu/features/why-india-is-making-progress-in-slowing-its-population-growth
Hillen, T. (n.d.). Applications and Limitations of the Verhulst Model for Populations. Retrieved January 2020, from University of
Alberta: https://www.math.ualberta.ca/pi/issue6/page19-20.pdf
Tjørve, K., & Tjørve, E. (2017, June 5). The use of Gompertz models in growth analyses, and new Gompertz-model approach: An
addition to the Unified-Richards family. Retrieved January 2020, from National Center for Biotechnology Information:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5459448/