Identifying Affluent Tehsils in Rural India

Approach to the project

Presented by: Sruthi Vempa, Pratyusha Ghosh
Date: 18-05-2020

Project Mentor: Ann Mathew

Magic9 Media & Consumer Knowledge Private Limited 1


Introduction
▪ The rural landscape in India is hurtling ahead of the popular clichés in the urban imagination.
▪ Rural India accounts for 70% of India’s population, 56% of income, 64% of expenditure, and 33% of savings.
▪ Yet it is not the large homogeneous mass of popular presumption.
▪ It is a vast untapped market that amounts to 12% of the world’s population.
▪ It contributes 50% of India’s GDP and 17%+ market growth.
▪ Much of this optimism around rural India stems from a new segment called the “Rurban”.
▪ This entirely new segment displays hybrid characteristics.
▪ There is keen marketing interest in decoding this new consumer.

▪ Furthermore, size is not everything; location often matters more.
▪ There are distinct regional and state-level differences: what holds in one part of the country may not be relevant at all a few thousand kilometres down the road.
▪ Villages and small towns also show such variations.
▪ Hence regional affluence differs based on factors like urban population, employment status, education, migration, etc.
▪ This difference in affluence is in turn reflected in the GDP.
▪ In this study, we aim to find the affluent tehsils in the country based on the variables that determine affluence.
▪ This in turn identifies the budding markets in the country.

Problem Statement
▪ Identify wealthy tehsils in rural parts of India that are marked by high living standards, decent jobs, advanced education and high economic growth.
▪ This raises the following questions:
1. What are the variables that determine affluence?
2. How do these variables affect affluence?
3. Are the variables correlated with one another?
4. Is data on the explanatory variables available?
5. If available, at what level and from which sources?
6. How can a model be formulated to obtain the affluence score?
7. Is it possible to obtain an affluence score?

Affluence

Definition
Affluence is the state of having a lot of money or a high standard of
living
or
Affluence is the state of society characterized by plenteous
commodities and foodstuffs, high use of energy and considerable
leisure, all somewhat broadly distributed through the population.

Literature search on affluence in rural India

Nielsen study: Planning the route to growth in rural markets
▪ The Nielsen report is titled “Planning the route to growth in rural markets”.
▪ The study was conducted across India in 2017-18 and the report was published in June 2019.
▪ It says that a large part of this burgeoning opportunity lies in the “rurban” part of India, the urban part of rural areas.
▪ The real challenge is being able to identify the affluent clusters.
▪ The holy grail would be a precise identification of the Rurban clusters (including villages) that account for the bulk of the affluence.
▪ For instance, Nielsen’s advanced analytics and macroeconomic data reveal that just 33% of villages, numbering about 200,000, account for 80% of all rural fast moving consumer goods (FMCG) sales.
▪ Rural markets still account for half of India’s gross domestic product (GDP).
▪ This market activity makes it necessary for marketers to continue focusing on them.

Factors considered
▪ Transportation infrastructure (ease of reaching these markets) is one of the most important considerations in selecting high-potential clusters.
▪ Health infrastructure
▪ Market infrastructure
▪ Mobile connectivity
▪ These are some of the other important considerations that indicate prosperity.

Source: https://www.nielsen.com/in/en/insights/report/2018/planning-the-route-to-growth-in-rural-markets/

NFHS study: Health and Living Conditions in Eight Indian Cities
▪ The Ministry of Housing and Urban Poverty Alleviation (MOHUPA), with the support of the United Nations Development Programme (UNDP), recently published a report on urban poverty in India.
▪ The study is based on the 2001 Census.
▪ According to this report, urban poverty in India remains high (over 25 percent).
▪ The MOHUPA report accepts the NSSO estimate that over 80 million poor people live in the cities and towns of India.
▪ The size of the urban poor population in India is almost twice the size of the slum population estimated in the 2001 Census (42.6 million).
▪ Even if we include all the houseless urban population in India (around 780,000) in the category of urban poor, the estimates of the urban poor are much higher than the estimates of the slum and houseless population.
▪ Thus, urban poverty is not indicated by a person's place of residence (slum/non-slum), although slums remain the most visible manifestation of poverty.
Factors considered
1. Number of consumer goods they own such as:
▪ Television
▪ Bicycle or car
2. Housing characteristics such as:
▪ Source of drinking water
▪ Sanitation facilities
▪ Flooring, wall and ceiling materials

Source: https://dhsprogram.com/pubs/pdf/FR339/FR339.pdf

Economic Times study: India’s most affluent pin codes
▪ In this study, affluence is defined as a large proportion of households with incomes above Rs 10 lakh.
▪ Typically, areas that have the highest proportion of households above such a threshold also have very high per capita incomes.
▪ However, low- and high-income households exist side by side.
▪ This creates a measurement problem.
▪ When rich and not-so-rich households exist in close proximity to each other, their incomes get averaged out.
▪ Moreover, since poor households tend to be more concentrated within an area (the affluent prefer greater living space in terms of square feet per person), they tend to outweigh the impact of affluence.

Source: https://economictimes.indiatimes.com/wealth/personal-finance-news/where-money-lives-indias-most-affluent-pin-codes/articleshow/40862425.cms?from=mdr

Factors considered
▪ Because per capita income gives misleading estimates, it is better to consider income per household.
▪ In most metros, the central parts of cities, generally marked by better road infrastructure, better urban planning and proximity to business centres and government, are among the most affluent.
▪ Only pin codes with a minimum of 1,000 households were included.
▪ The study is based on Census 2011, geographic information system (GIS) data and satellite imagery; the report was published in August 2015.

Source: https://economictimes.indiatimes.com/wealth/personal-finance-news/where-money-lives-indias-most-affluent-pin-codes/articleshow/40862425.cms?from=mdr

World Economic Forum report: Shaping the Future of Consumption

▪ The report was published by the World Economic Forum’s System Initiative on Shaping the Future of Consumption and prepared in collaboration with Bain & Company.
▪ The study was conducted in 2018 with a focus on India, and the report was published in January 2019.
▪ The report builds on in-depth consumer surveys conducted across 5,100 households in 30 cities and towns in India, covering all key demographic segments.
▪ It also draws on over 40 in-depth interviews with private and public-sector leaders.
▪ The factors considered are monthly per capita consumption expenditure (MPCE) and mean household income.
Source: https://www.bain.com/contentassets/7d6fe244898540f8b5b1a2fa09be3cdc/wef_future_of_consumption_fast-growth_consumers_markets_india_report_2019.pdf
Harvard Business Review report: Unlocking the Wealth in Rural Markets
▪ The report, “Unlocking the Wealth in Rural Markets”, draws on a survey conducted by the Accenture Institute for High Performance.
▪ The survey was conducted from 2009 to 2012 and the report was published in June 2014.
▪ The factors considered are disposable income, transportation infrastructure, telecommunications and electricity services, distribution networks, and banking services.
▪ According to the study, from 2009 to 2012 spending by India’s 800+ million rural residents reached $69 billion, some 25% more than their urban counterparts.
▪ Consumption in rural areas is growing at 1.5 times the rate in urban areas, and today’s $12 billion consumer goods market in rural India is expected to hit $100 billion by 2025.
Source: https://hbr.org/2014/06/unlocking-the-wealth-in-rural-markets

McKinsey report: Next big spenders
▪ The report is about India’s middle class and how its spending will change the global consumer market.
▪ The research was conducted by the McKinsey Global Institute (MGI) across India and the report was published in May 2019.
▪ The factor considered in the study is household earnings.
▪ MGI's research portrays a dramatic transformation that will touch Indians up and down the income pyramid, from the poorest rural farmer to the wealthiest IT entrepreneur.
▪ The study says that most of these former poor will move into the class called the aspirers, households earning between 90,000 and 200,000 rupees ($1,969-$4,376) per year.

Contd...
▪ The next two groups, seekers earning between 200,000 and 500,000 rupees ($4,376-$10,941) and strivers with incomes of between 500,000 and 1 million rupees ($10,941-$21,882), will become India's huge new middle class.
▪ Companies that fail to understand the unique desires and tastes of the new Indian consumer will miss out on a half-billion-strong market that, along with China, ranks as one of the most important growth opportunities of the next two decades.

Source: https://www.mckinsey.com/mgi/overview/in-the-news/next-big-spenders-indian-middle-class

National Sample Survey Office report: Where do the richest 10% Indians live
▪ The Livemint article “Where do the richest 10% Indians live” is based on a survey conducted by the NSSO.
▪ The report was published in November 2013.
▪ The survey looked at the rate of growth in monthly per capita expenditure from 2004-05 to 2011-12.
▪ The factor considered is monthly per capita expenditure.
▪ The NSSO underestimates the consumption expenditure of Indians, especially of high-income groups.
▪ Nonetheless, as long as the extent of underestimation does not vary sharply across states, there is value in looking at the inter-state distribution of the richest.
Source: https://www.livemint.com/Specials/h5nVhT68gGmeEurVdgtmwL/Where-do-the-richest-10-Indians-live.html
Organisation for Economic Co-operation and Development report: Rural Policy 3.0

▪ The study was conducted by the OECD to develop a framework that helps national governments support rural economic development.
▪ It introduces Rural Policy 3.0, which provides guidance to governments on how to leverage opportunities and position rural regions for prosperity and wellbeing.
▪ The report was published in 2013.
▪ The factors considered are increasing population, labour productivity, workforce participation, and proximity to a city.
Source: https://www.oecd.org/cfe/regional-policy/Rural-3.0-Policy-Note.pdf

G. Scott Thomas study
▪ G. Scott Thomas is a journalist whose On Numbers study set out to identify the most affluent communities in the United States of America.
▪ The study’s objective was to find municipalities that have the earmarks of affluence, notably upscale incomes, expensive houses, high-end jobs and advanced education.
▪ The study is based on 2009 federal estimates.

Source: https://www.bizjournals.com/bizjournals/on-numbers/scott-thomas/2011/09/how-the-affluence-rankings-were.html

Factors considered

▪ Median household income
▪ Per capita income
▪ Percentage of households with annual incomes of $150,000 or more
▪ Percentage of households that derive income from interest payments, dividends or rental property
▪ Poverty rate
▪ Upper quartile house value
▪ Median house value
▪ Percentage of houses that have nine or more rooms
▪ Employment rate
▪ Percentage of adults having a professional degree

Wharton University report: An Increasingly Affluent Middle India
▪ The report is titled “An Increasingly Affluent Middle India Is Harder to Ignore”.
▪ The study was conducted by the National Council of Applied Economic Research (NCAER) across India from 2004 to 2008.
▪ The report was published in July 2008.
▪ The factors considered in the study are:
a. Purchasing power
b. Time spent on media
c. Product consumption
d. Rising disposable incomes
Source: https://knowledge.wharton.upenn.edu/article/an-increasingly-affluent-middle-india-is-harder-to-ignore/
GeoJournal: Wealth and poverty in rural India
▪ The article “Wealth and poverty in rural India” was published in the journal GeoJournal.
▪ The study covers the whole of India.
▪ It was published in December 1985.
▪ The factors considered are size of holdings and income.
▪ The main objectives of the paper are to identify and account for the regional variations in rural household incomes and the inequalities in income distribution between and within regions in India.
Source: https://www.jstor.org/stable/41143579?seq=1

Shodhganga report: Rise of rural middle class in India

▪ The study is on the “Rise of rural middle class in India”.
▪ The study is hosted on Shodhganga.
▪ The factors considered are land transactions and urbanization.
▪ In village society, where peasants have traditionally been considered the characteristic feature, urbanization and land transactions mean that villagers have not remained peasants, either socio-culturally or economically.
▪ After selling their land they are adopting middle-class occupations, and even if they have small plots of agricultural land they are not interested in farming.
▪ Source: https://shodhganga.inflibnet.ac.in/bitstream/10603/130737/7/7%20chapter.pdf

HDI and its relation to affluence
▪ According to UNDP (2013), the human development index (HDI) is a comparative measurement of life expectancy, literacy, education and living standards for all countries around the world.
▪ HDI is used to classify a country as developed, developing or underdeveloped.
▪ It also measures the influence of economic policy on quality of life.
▪ Because suitable data is unavailable, we could not include this factor in our model.
▪ HDI data is available only at district level, and for different years in different states (for example, the HDI for Telangana and Andhra Pradesh is available for 2005-06 and for Maharashtra for 2010-11).

Approach to the project

Project Design
▪ Our main objective is to identify the affluent tehsils in rural India.
▪ To begin with, we first have to work out the factors that determine affluence.
▪ The purpose of the literature review is to identify additional factors used by other significant studies to determine affluence.
▪ We then list the availability and granularity of the data for each factor considered.
▪ We observe that the response variables GDP, per capita income and MPCE are not available at tehsil level.
▪ We therefore fit the regression model at the district level and derive the formula that determines the affluence score.
▪ The ranking of the districts by calculated affluence score is checked against their rankings by GDP, per capita income and MPCE.
▪ The fitted model is then applied to tehsil-level data to calculate the affluence score of each tehsil.
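One way to check one ranking against another is a rank correlation. The sketch below uses Spearman's coefficient, which is our choice for illustration and is not named in the project design; the district values are invented, and the simple formula assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (highest) downwards; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical values for five districts (illustrative numbers only).
affluence_score = [0.82, 0.40, 0.65, 0.10, 0.55]
gdp = [910.0, 300.0, 700.0, 450.0, 500.0]

agreement = spearman(affluence_score, gdp)
print(agreement)
```

A value close to 1 would indicate that the two rankings largely agree; the same check can be repeated for per capita income and MPCE.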
Removing Multicollinear Factors
▪ Since we have a large number of explanatory variables, we can expect multicollinearity among them.
▪ From the correlation matrix we identify the variables that are highly correlated and drop them accordingly to obtain a better-fitted model.
▪ Our aim is to use the financial factors, namely GDP, per capita income and MPCE, as separate response variables.
▪ Here too we cannot rule out high correlation among these three financial factors.
▪ For that reason, we will compute the correlation among them.
▪ If any of these factors is highly correlated with another, we will drop it.
▪ We then work with the dataset that remains after dropping the highly correlated variables (both dependent and independent).
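The dropping step can be sketched as follows. This is a minimal illustration, assuming NumPy is available: the column names and numbers are invented, and the rule for deciding which variable of a correlated pair to keep (the one that comes first) is our assumption, since the slides do not state one. The 0.70 threshold is the one used later in this study.

```python
import numpy as np


def drop_correlated(data, names, threshold=0.70):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    kept = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Hypothetical district-level percentages (rows = districts).
names = ["urbanization", "literacy", "banking"]
data = np.array([
    [10.0, 52.0, 40.0],
    [20.0, 61.0, 35.0],
    [30.0, 69.0, 50.0],
    [40.0, 82.0, 30.0],
    [50.0, 90.0, 45.0],
])

remaining = drop_correlated(data, names)
print(remaining)
```

In this invented example literacy is almost perfectly correlated with urbanization and is dropped, while banking is kept.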
Fitting the data into regression model

▪ To get an idea of how the independent variables are associated with the dependent variables, we will plot the distribution of each of them.
▪ Our next target is to identify patterns in the data, if any.
▪ We then fit a regression model that can explain the dataset at hand.
▪ The regression model should be accurate enough to be used to obtain affluence at the tehsil level.
▪ The model thus obtained gives the predicted responses for every district.
▪ Using this model, we then obtain the predicted values for the tehsils.
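The fit-on-districts, predict-on-tehsils idea can be sketched with ordinary least squares. This is only an illustration under stated assumptions: NumPy is assumed to be available, the helper names `fit_ols` and `predict` are ours, and the data is invented so that the response is an exact linear function of two factors.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to new rows with the same columns."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Hypothetical district-level factors (two columns) and response.
district_X = np.array([[10.0, 40.0], [20.0, 35.0], [30.0, 50.0],
                       [40.0, 30.0], [50.0, 45.0]])
district_y = 2.0 + 3.0 * district_X[:, 0] + 0.5 * district_X[:, 1]

coef = fit_ols(district_X, district_y)

# Hypothetical tehsil-level rows carrying the same two factors.
tehsil_X = np.array([[15.0, 42.0], [35.0, 38.0]])
tehsil_pred = predict(coef, tehsil_X)
print(coef, tehsil_pred)
```

The sketch relies on every factor in the district model also being available at tehsil level, which is why the availability tables matter.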

Establishing explicit groups of affluent tehsils

▪ The regression model will give us predicted response values for all the tehsils in India.
▪ With the range of predicted values at hand, we will form explicit groups and assign affluence scores to the tehsils on the basis of their predicted response value.
▪ Based on the affluence scores assigned, we can classify the tehsils as rich, upper middle class, lower middle class, poor, and below poverty level.
▪ At this point in the study we will have the tehsils that fall in the categories “rich” and “upper middle class”.
▪ Among these tehsils we will pinpoint the ones that satisfy all the necessary requirements to be an affluent tehsil.
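A minimal sketch of the grouping step is given below. The slides do not say how the range of predicted values is divided, so equal-width bins are assumed here purely for illustration; the predicted values are invented.

```python
LABELS = ["below poverty level", "poor", "lower middle class",
          "upper middle class", "rich"]


def classify(predicted):
    """Assign each predicted value to one of five equal-width bins."""
    lo, hi = min(predicted), max(predicted)
    if hi == lo:
        raise ValueError("predicted values must not all be equal")
    width = (hi - lo) / len(LABELS)
    groups = []
    for value in predicted:
        # The maximum would fall just past the last bin, so clamp it.
        index = min(int((value - lo) / width), len(LABELS) - 1)
        groups.append(LABELS[index])
    return groups


# Hypothetical predicted response values for five tehsils.
predicted = [10.0, 25.0, 48.0, 71.0, 110.0]
groups = classify(predicted)
print(groups)
```

Quantile-based bins or externally defined income thresholds would be equally valid choices and would give different groupings.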

Backward Elimination
▪ It is possible that the regression approach will give predicted values that differ greatly from the observed values.
▪ In that case we will apply the backward elimination technique.
▪ With this method, we keep only the variables that contribute significantly to explaining the variation in the response.
▪ We obtain the p-values for all the variables.
▪ Using a significance level of 0.05, we drop the variable with the highest p-value above 0.05.
▪ This is repeated until all the variables remaining in the data have p-values below 0.05 (i.e., these variables are statistically significant).
▪ The dropped variables are not used in the computation.
▪ The only drawback of backward elimination is that it is cumbersome, because it is repetitive.
▪ The model thus obtained will then be applied to the data at the tehsil level.
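The elimination loop described above can be sketched as follows, assuming NumPy is available. To keep the sketch dependency-free, the p-values use a normal approximation to the t distribution, which is our simplification and is rough for a small number of districts; a statistics package would normally supply exact p-values. The data and variable names are invented.

```python
import math

import numpy as np


def slope_p_values(X, y):
    """Approximate two-sided p-values for the OLS slopes (intercept excluded)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    residuals = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = residuals @ residuals / dof
    t_stats = beta / np.sqrt(sigma2 * np.diag(xtx_inv))
    # Assumption: normal approximation to the t distribution.
    return [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats[1:]]


def backward_eliminate(X, y, names, alpha=0.05):
    """Refit repeatedly, dropping the least significant variable each time."""
    cols = list(range(X.shape[1]))
    while cols:
        p = slope_p_values(X[:, cols], y)
        worst = max(range(len(cols)), key=lambda i: p[i])
        if p[worst] < alpha:
            break  # every remaining variable is significant
        del cols[worst]
    return [names[c] for c in cols]


# Hypothetical data: y depends on x1 only; x2 is unrelated to y.
x1 = np.arange(1.0, 9.0)
x2 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
noise = 0.5 * np.array([1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0])
y = 2.0 + 3.0 * x1 + noise
X = np.column_stack([x1, x2])

selected = backward_eliminate(X, y, ["x1", "x2"])
print(selected)
```

Note that the model is refitted after each removal, so the p-values of the remaining variables can change from one pass to the next.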
Alternate method (PCA approach)
▪ As already mentioned, backward elimination is a cumbersome and repetitive method.
▪ A more efficient way would be to use principal component analysis (PCA).
▪ To reduce multicollinearity and obtain a better-fitted model, we will apply the PCA approach.
▪ PCA reduces the dimensionality of the dataset, i.e. it reduces the number of columns.
▪ It condenses the columns into the required number of principal components.
▪ We will keep the number of principal components that explains 90 percent of the variation in the data.
▪ We can then fit a regression model for the entire dataset based on the principal components alone.
▪ This could give us a more accurate set of predicted values and a better-fitted model.
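Choosing the number of components that explains 90 percent of the variation can be sketched as below. This assumes NumPy is available and standardizes each column first, which is our assumption because the factors are on different scales; the function name and data are hypothetical.

```python
import numpy as np


def pca_scores(X, variance_target=0.90):
    """Return (scores, k) for the first k components reaching variance_target."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), variance_target)) + 1
    k = min(k, len(s))
    return Z @ Vt[:k].T, k


# Hypothetical factors: the second column duplicates the information in
# the first, the third is unrelated to both.
a = np.arange(1.0, 9.0)
b = 2.0 * a + 1.0
c = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
X = np.column_stack([a, b, c])

scores, k = pca_scores(X)
print(k, scores.shape)
```

The resulting score columns are uncorrelated with one another and can take the place of the original factors as regressors.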

GDP as a response variable

▪ GDP is taken as one of the response variables in this study.
▪ GDP is considered as a response variable because a country’s level of economic development is strongly related to the composition of its national wealth.
▪ Wealth includes all assets, which means human capital (the value of earnings over a person’s lifetime), natural capital (energy, minerals, agricultural land), produced capital (machinery, buildings, urban land), and net foreign assets.
▪ As stated earlier in the project design, the GDP rankings are checked against the rankings of the districts by affluence score.
▪ Thus, GDP is used to validate the affluence score in this project.

Factors determining affluence
▪ We considered the factors from the Census that can determine affluence.
▪ These are urbanization, migration, literacy rate, household assets and working status.
▪ We did secondary research and collated the factors from existing findings and reports.
▪ We then added more factors that affect affluence to our study, such as health infrastructure and income factors like monthly per capita consumption expenditure and mean household income.
▪ The factors are tabulated in the next two slides.

Factors taken from Census(2011)

1. Urbanization
2. Literacy Rate
3. Migration
4. Working Status
5. Availing banking services
6. LPG/PNG Users
7. Drinking Water Supply
8. Concrete Roof (Census houses)
9. Concrete Wall (Census houses)
10. Concrete Floor (Census houses)
11. Sanitation Within Premises
12. Owning Television
13. Owning Computer/Laptop
14. Owning Mobile/landline
15. Owning Vehicles
Factors considered from literature search
The factors below were identified in the literature search and have data available:
▪ Caste
▪ Monthly per capita expenditure (consumption expenditure)
▪ Health infrastructure (PHCs, sub-centres, community health centres, district hospitals)
▪ House
▪ Income (mean household income)
▪ Internet connection
▪ Per capita income
▪ Size of households
▪ Land ownership

Availability of data and granularity
▪ The data was collected from the Census, official state government websites, Data.gov.in, Economic Survey - Directorate of Economics and Statistics, India.gov.in and Secc.gov.in.
▪ Dependent variables

Factor | Source | Data available at district level | Data available at tehsil level | Year
GDP | Data.gov.in, Directorate of Economics & Statistics | Yes | No | 2011
Monthly per capita expenditure | Directorate of Economics & Statistics | Yes | No | 2011-12
Per capita income | District Hand Book | Yes | No | 2010-11

▪ Independent variables

Factor | Source | Data available at district level | Data available at tehsil level | Year
Urbanization | Census | Yes | Yes | 2011
Literacy Rate | Census | Yes | Yes | 2011
Migration | Census | Yes | No | 2011
Working Status | Census | Yes | Yes | 2011
Household Amenities | Census | Yes | Yes | 2011
Availing banking services | Census | Yes | Yes | 2011
LPG/PNG Users | Census | Yes | Yes | 2011
Drinking Water Supply | Census | Yes | Yes | 2011
Concrete Roof (Census houses) | Census | Yes | Yes | 2011
Concrete Wall (Census houses) | Census | Yes | Yes | 2011
Concrete Floor (Census houses) | Census | Yes | Yes | 2011
Sanitation Within Premises | Census | Yes | Yes | 2011
Owning Television | Census | Yes | Yes | 2011
Owning Computer/Laptop | Census | Yes | Yes | 2011
Owning Mobile/landline | Census | Yes | Yes | 2011
Owning Vehicles | Census | Yes | Yes | 2011
Caste | Census | Yes | Yes | 2011
Health infrastructure | data.gov.in | Yes | Yes | 2010-11
House | Census | Yes | Yes | 2010-11
Income (Mean household income) | Secc.gov.in | Yes | Yes | 2010-11
Internet connection | Census | Yes | Yes | 2010-11
Size of households | Census | Yes | Yes | 2010-11
Land Ownership | Secc.gov.in | Yes | Yes | 2010-11

Challenges

▪ There is a chance that the regression model will not give a good fit because of high correlation among the variables, even after a few highly correlated variables have been dropped.
▪ This will result in predicted values that deviate from the observed data at hand.
▪ A common problem that arises in such situations is overfitting.
▪ The two simplest ways to reduce overfitting are to:
▪ Reduce the number of independent variables based on their significance (i.e., apply backward elimination)
▪ Add more data so as to reduce variation in the dataset
▪ Fitting the model on the dataset for the whole of India will help us reduce variation.

Andhra Pradesh: A trial model

About

▪ Andhra Pradesh is one of the 28 states, with a total area of 272,282 km².
▪ It consists of 23 districts with a total population of 84,580,777.
▪ At about 974 km, it has the second-longest coastline after Gujarat.
▪ The state was formerly divided into three major regions: Telangana, Rayalaseema and Coastal Andhra.
▪ In 2014, the state was bifurcated into Andhra Pradesh and Telangana.
▪ In this project, the data for Andhra Pradesh follows the 2011 Census, i.e. the undivided state.

Data Collection

▪ We collected data on all 24 factors that determine affluence for the state of Andhra Pradesh.
▪ The data was collected at district level.
▪ The values for each district were collected and percentages were calculated.
▪ Data was collected from the Census digital library and from other websites such as data.gov.in, secc.gov.in, india.gov.in, official state portals and the Directorate of Economics and Statistics portal.
▪ We listed each factor in a separate sheet for all the districts of Andhra Pradesh.

Data Computation
▪ We calculated:
▪ Percentage of urban and rural population
▪ Percentage of literates and illiterates
▪ Percentage of illiterates in the 0 to 6 age group
▪ Percentage of working population (main + marginal workers)
▪ Working age is taken as 15 to 59 years
▪ Percentage of migrants
▪ Percentage of households availing banking services
▪ Percentage of households with the source of drinking water in premises
▪ The sources of drinking water recorded are tap water (treated and untreated), covered and uncovered wells, hand pump, borehole/tubewell, spring, river/canal, tank/pond/lake, and other sources
▪ Tap water (treated and untreated), covered and uncovered wells, hand pump and borehole/tubewell are counted when calculating the percentage of households with the source of drinking water in premises.
▪ Percentage of households having a roof made of concrete
▪ Here we took the raw data on households with different roof materials: Grass/Thatch/Bamboo/Wood/Mud, Plastic/Polythene, Hand made Tiles, Machine made Tiles, Burnt Brick, Stone/Slate, G.I./Metal/Asbestos sheets, Concrete, and any other material.
▪ Only households with concrete roofs were counted when calculating the percentage.
▪ In the same way, from the raw data we calculated the percentage of households with walls of stone (with or without mortar), burnt brick or concrete, and with floors of burnt brick, stone, cement or mosaic/floor tiles.
▪ Percentage of households using LPG/PNG or electricity as fuel for cooking
▪ The raw data on types of cooking fuel covers Fire-wood, Crop residue, Cow dung cake, Coal/Lignite/Charcoal, Kerosene, LPG/PNG, Electricity, Biogas, Any other, and no cooking.

▪ Percentage of households using electricity or solar energy as the source of lighting
▪ Percentage of households having sanitation facilities within premises
▪ Percentage of households having a television
▪ Percentage of households with a computer/laptop
▪ Percentage of households having an internet connection
▪ Percentage of households having a landline/mobile
▪ Percentage of health infrastructure, i.e. the total of Primary Health Care centres, Community Health Centres, sub-divisional hospitals and district hospitals
▪ Percentage of households having vehicles (two-wheelers, four-wheelers and bicycles)
▪ Percentage of population other than SC and ST
▪ Percentage of population having houses

▪ Percentage of households with 1 to 4 members (size of households)
▪ Percentage of rural households owning land
▪ Percentage of rural households with an income of 10,000 or more

Data Analysis

▪ We also tabulated the Gross District Domestic Product (GDDP), Per Capita Income (PCI) and Monthly Per Capita Income (MPCI) of the districts of Andhra Pradesh.
▪ We calculated the correlation among these income factors.

▪ We also found the mean, the median, and the top 30 and bottom 30 percentiles of each factor across all the districts of Andhra Pradesh.
▪ We sorted the districts into the top 30, middle 40 and bottom 30 percentiles for all the factors.
▪ We applied conditional formatting to the data:
▪ Values greater than or equal to the top 30 percentile are shown in green.
▪ Values less than or equal to the bottom 30 percentile are shown in red.
▪ Values in the middle 40 percentile are shown in yellow, and the median value is highlighted in red and bold.
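The colour bands can be reproduced in code as a rough sketch, assuming NumPy is available. The top 30 percentile cut-off is taken as the 70th percentile and the bottom 30 as the 30th; NumPy's default linear interpolation between data points is an assumption and may differ slightly from the spreadsheet's rule. The values are invented.

```python
import numpy as np


def colour_bands(values):
    """Label each value green (top 30), red (bottom 30) or yellow (middle 40)."""
    low, high = np.percentile(values, [30, 70])
    bands = []
    for v in values:
        if v >= high:
            bands.append("green")
        elif v <= low:
            bands.append("red")
        else:
            bands.append("yellow")
    return bands


# Hypothetical values of one factor for ten districts.
values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
bands = colour_bands(values)
print(bands)
```

The same function can be applied column by column to band every factor.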
Multicollinearity

▪ From the correlation matrix we identify the variables that are highly correlated with one another.
▪ High correlation among the explanatory variables is called multicollinearity.
▪ Under multicollinearity, the coefficient estimates become very sensitive to minor changes in the model.
▪ It reduces the precision of the estimates and weakens the statistical power of our regression.
▪ The results obtained will therefore be unstable and difficult to interpret.
▪ Hence dropping such variables will be beneficial for our model.

Interpretation of Pearson correlation coefficient:
▪ This raises the question of what range of correlation counts as high enough for a variable to be dropped.
▪ The table below shows how ranges of the correlation coefficient are interpreted:

Correlation coefficient | Interpretation
±1 | Perfect correlation
±0.9 to ±1 | Very high correlation
±0.7 to ±0.9 | High correlation
±0.5 to ±0.7 | Moderate correlation
±0.3 to ±0.5 | Weak correlation
Below ±0.3 | Very weak correlation
0 | No correlation

Sources:
https://www.andrews.edu/~calkins/math/edrm611/edrm05.htm
http://www.dummies.com/education/math/statistics/how-to-interpret-a-correlation-coefficient-r/
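The interpretation ranges can be written as a small lookup function. How the shared boundary values (0.3, 0.5, 0.7, 0.9) are assigned is not stated in the table, so placing each boundary in the higher band is our assumption.

```python
def interpret_correlation(r):
    """Map a Pearson coefficient to the verbal label from the table."""
    a = abs(r)
    if a > 1:
        raise ValueError("a correlation coefficient lies between -1 and 1")
    if a == 1:
        return "Perfect correlation"
    if a == 0:
        return "No correlation"
    if a >= 0.9:
        return "Very high correlation"
    if a >= 0.7:
        return "High correlation"
    if a >= 0.5:
        return "Moderate correlation"
    if a >= 0.3:
        return "Weak correlation"
    return "Very weak correlation"


print(interpret_correlation(-0.85))
```

Because the sign is ignored, a strong negative correlation is treated the same way as a strong positive one when deciding whether a variable is redundant.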
Correlation matrix of all the independent variables

Variables dropped
▪ In our study we have dropped the variables that showed a correlation of more
than 0.70 (i.e., variables with high or very high correlation).
▪ In the case of Andhra Pradesh these variables are:
1. Literacy Rate
2. Working Status
3. Concrete Floor
4. LPG users
5. Electricity
6. Sanitation within premises
7. Owning Television
8. Owning Computer/Laptop
9. Owning Vehicles
10. Houseless
11. Land ownership
12. Internet connection
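The 0.70 screening rule can be sketched as follows. The column names and numbers are made up for illustration and are not the study's data; the helper names are hypothetical. The sketch only flags pairs whose absolute Pearson correlation exceeds the threshold, and which variable of a pair to drop remains a judgment call.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up district-level values, for illustration only.
toy = {
    "literacy_rate": [60.0, 65.0, 70.0, 75.0, 80.0],
    "internet_connection": [5.0, 7.0, 9.0, 12.0, 15.0],
    "migration": [12.0, 3.0, 9.0, 4.0, 10.0],
}
pairs = highly_correlated_pairs(toy)
```

In this toy data only the literacy and internet columns move together strongly enough to be flagged.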



Correlation matrix of significant independent variables
▪ The table below shows the correlation matrix after dropping the
variables with high correlation (i.e., above 0.70).



Splitting the dataset into training set and test set
▪ We divided the dataset into a training set and a test set.
▪ A training dataset is a set of examples used for learning, that is, to fit the parameters
of the model.
▪ Approaches that search through training data for empirical relationships tend to
overfit the data.
▪ They can identify and exploit apparent relationships in the training data that do not hold in
general.
▪ A test dataset is independent of the training dataset but follows the same probability
distribution.
▪ If a model fitted to the training dataset also fits the test dataset well, minimal overfitting has
taken place.
▪ A markedly better fit on the training dataset than on the test dataset usually points to
overfitting.
▪ A test set is therefore a set of examples used only to assess performance.



contd...
▪ Typically, when we separate a dataset into a training set and a test
set, most of the data is used for training and a smaller portion is
used for testing.
▪ In this study the test set is 20% of the dataset, so that 80% of the
data is in the training set.
▪ However, the split is a choice; a different share of the dataset could
be held out as the test set.
▪ The values in the test set are picked at random by the program.
▪ Random sampling helps ensure that the test and training sets are
similar.
▪ By using similar data for training and testing, we can minimize the
effects of data discrepancies and better understand the characteristics
of the model.
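A minimal sketch of the random 80/20 split described above, using only the Python standard library. The function name, the fixed seed and the placeholder district labels are assumptions for illustration; the slides do not say which tool performed the split.

```python
import random


def train_test_split(rows, test_fraction=0.20, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                       # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)       # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


# Placeholder labels, not real district names.
districts = [f"district_{i}" for i in range(1, 21)]
train, test = train_test_split(districts)
```

With 20 rows this holds out 4 for testing and keeps 16 for training; every row lands in exactly one of the two sets.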



Regression model
▪ We obtained a regression model based on the independent variables that
remained after dropping the highly correlated ones.
▪ We considered GDP to be our response variable.
▪ After the model has been fitted on the training set, we test it by making
predictions on the test set.
▪ The test set already contains known values of the attribute that we want
to predict, so it is easy to check how close the model's predictions are.
▪ The regression model obtained is not a good fit in the case of our study.
▪ This is a case of overfitting.
▪ Overfitting arises when a model has over-trained itself on the data that
is fed to train it.
▪ It could be because there are too many independent variables or
because we have not supplied enough data.
Prediction of training set



Prediction of test set



Need of backward elimination

▪ Our model has evidently shown overfitting.
▪ As a remedy we have to both reduce the number of features and increase
the data fed to the model.
▪ Unnecessary features increase the complexity of the model.
▪ Hence it is good to keep only the most significant features so that the
model stays simple and gives a better result.
▪ So, in order to optimize the performance of the model, we will use the
Backward Elimination method.
▪ It keeps the features that affect the response the most and removes those
that affect it the least.



Backward Elimination
▪ With backward elimination we take into consideration only the
variables that contribute significantly in our study.
▪ We began with all the variables at hand (all 23 variables).
▪ Since we observed in the regression model that splitting the dataset
led to overfitting, this time we predict the results for the entire
dataset.
▪ We obtained the p-values for all the variables.
▪ At each step the variable with the highest p-value is removed and the
model is refitted.
▪ Repeating this, we reach a point where the predicted values are very
close to the observed values.
▪ The variables remaining at that point in the case of Andhra Pradesh are:
▪ % migration
▪ % Water Supply
▪ Castes other than SC/ST
▪ The regression model will now be fitted on the data based on these
variables only.
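The elimination loop can be sketched with NumPy as below. Two things here are assumptions rather than the study's procedure: the data are synthetic, and the stopping rule uses an absolute t-statistic cutoff of 2.0 as a stand-in for a p-value cutoff (roughly the 5% level when there are many more observations than variables), because the slides do not state the significance level that was used.

```python
import numpy as np


def t_statistics(X, y):
    """Fit OLS with an intercept and return the t-statistics of the slopes."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (n - k - 1)        # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)     # coefficient covariance
    return (beta / np.sqrt(np.diag(cov)))[1:]           # skip the intercept


def backward_elimination(X, y, names, t_cutoff=2.0):
    """Repeatedly drop the weakest predictor until all pass the cutoff."""
    kept = list(range(X.shape[1]))
    while kept:
        t = np.abs(t_statistics(X[:, kept], y))
        weakest = int(np.argmin(t))
        if t[weakest] >= t_cutoff:
            break                                        # every remaining variable passes
        del kept[weakest]                                # drop it and refit
    return [names[i] for i in kept]


# Synthetic data: only x0 and x2 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = backward_elimination(X, y, ["x0", "x1", "x2", "x3"])
```

The two variables that drive the synthetic response are always retained; whether a pure-noise column survives depends on the sample, which is one reason the cutoff should be chosen deliberately.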
Results (applying backward elimination):



Results:

▪ The results obtained by applying backward elimination are very close
to the observed GDP.
▪ This shows that the model gives a good fit.
▪ The ranks of districts obtained from the observed values of GDP
are very similar to the ones obtained from the predicted GDP values.
▪ Hence, we prefer to fit the model after backward elimination rather
than by simple regression.
▪ When fitting the model for all 640 districts we will split the dataset
into training and test sets, because we will then have ample data and
the chance of overfitting will be much lower.



Alternate approach: Using PCA

▪ Backward elimination is a cumbersome and time-consuming method.
▪ A more efficient way to fit a model is to apply principal component
analysis (PCA).
▪ In the case of Andhra Pradesh it reduces the dimensionality to 6
principal components, which together explain 90% of the variation in the data.
▪ We obtain the regression model based on these 6 principal
components.
▪ The model thus obtained is much better than the simple regression
approach and is also efficient to fit.
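A sketch of how the number of principal components can be chosen so that they explain 90% of the variation, followed by a regression on the component scores. The data are synthetic (six columns driven by two hidden factors), the function name is hypothetical, and standardising the variables before PCA is an assumption; the slides do not say whether this was done.

```python
import numpy as np


def pca_components_for_variance(X, target=0.90):
    """Return (k, scores, cumulative share) for the smallest k reaching target."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardise each column
    corr = np.cov(Z, rowvar=False)                       # correlation matrix of X
    eigenvalues, eigenvectors = np.linalg.eigh(corr)     # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]                # largest variance first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    explained = np.cumsum(eigenvalues) / eigenvalues.sum()
    k = int(np.searchsorted(explained, target) + 1)      # first k with share >= target
    return k, Z @ eigenvectors[:, :k], explained


# Synthetic data: columns 0-2 follow one hidden factor, columns 3-5 another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
X = latent @ loadings + rng.normal(scale=0.1, size=(100, 6))
k, scores, explained = pca_components_for_variance(X)

# Regress a made-up response on the retained component scores.
y = 2.0 * latent[:, 0] + rng.normal(scale=0.1, size=100)
design = np.column_stack([np.ones(len(scores)), scores])
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
```

Here two components are enough to pass the 90% mark, so the regression uses two score columns plus an intercept in place of the six original variables.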



Results (applying PCA approach):


Results:

▪ The predicted values are close to the observed values.
▪ Thus it is better than plain multiple regression, but it might not
fit as well as the backward elimination approach.
▪ From the ranks we can observe that the model fits quite well.
▪ The ranks of the districts obtained by the PCA approach are also
close to the ranks obtained from the observed GDP values.
▪ As noted earlier, the PCA approach is not repetitive and cumbersome,
unlike the backward elimination approach.
▪ Thus in this study the PCA approach could be more efficient.



Choice of model
▪ We want to select the approach that gives the best result.
▪ Thus we compute Spearman’s rank correlation between the
observed and predicted ranks for all the methods we have used.
▪ We will use the method with the highest rank correlation.
▪ A higher correlation between the predicted and observed ranks
means that the model fits better.
▪ Applying Spearman’s rank correlation to the ranks from all three
methods, we obtained:

Method                                           Spearman’s rank correlation
Multiple regression on training set              0.84
Multiple regression on test set                  0.77
Multiple regression with backward elimination    0.79
Multiple regression with PCA approach            0.72
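Spearman's rank correlation, as used in the comparison above, is the Pearson correlation computed on ranks. The sketch below uses made-up observed and predicted values, not the study's GDP figures, and the helper names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (largest) downward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1                                  # extend over a run of ties
        average = (pos + end) / 2 + 1                 # average of the tied positions
        for i in order[pos:end + 1]:
            result[i] = average
        pos = end + 1
    return result


def spearman(observed, predicted):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(observed), ranks(predicted)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up values: the second and third units swap places in the prediction.
observed = [50, 40, 30, 20, 10]
predicted = [48, 30, 41, 22, 9]
rho = spearman(observed, predicted)  # 0.9 for this toy data
```

One swapped pair among five units already lowers the coefficient from 1.0 to 0.9, which gives a feel for the scale of the values in the table.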



Contd...

▪ Leaving aside the in-sample fit on the training set (0.84), multiple
regression with backward elimination gives the highest value for rank
correlation (0.79).
▪ This means that the ranks obtained from the GDP values predicted by
this approach are the closest to the ranks obtained from the observed
GDP.
▪ This in turn indicates that the predicted values obtained by this
approach are close to the observed values in the study.
▪ The model therefore fits the data quite well.
▪ Hence, we will use the backward elimination method to obtain our
regression model.



Projecting on tehsil level

▪ We will apply that model to the data at tehsil level and obtain
predicted values.
▪ Based on the predicted values we will assign an affluence score to
each tehsil.
▪ We will group the tehsils into: rich, upper middle class, middle class,
lower middle class, poor and below poverty level.
▪ Then from among the rich and upper middle class tehsils we will
pinpoint the tehsils that satisfy all the requirements of an
affluent tehsil.



Categorising the tehsils of Andhra Pradesh
▪ We applied the same model to the tehsils of Andhra Pradesh.
▪ We considered the top 5 percent of tehsils to be affluent and the
bottom 5 percent to be poor.
▪ The section in the middle is grouped into upper middle class, middle
class and lower middle class.
▪ We assigned each tehsil to exactly one category based on its
affluence score.
Affluence Score      Category
>=38629              Rich
27024 to 38482       Upper middle class
23069 to 27023       Middle class
11872 to 23068       Lower middle class
<=11871              Poor
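The category table can be applied with a simple threshold lookup. The function names are hypothetical. Two assumptions are made: each band is defined by its lower bound, so a score between 38482 and 38629 (which the table leaves unassigned) falls into "Upper middle class", and the size of the top 5 percent group is obtained by rounding up.

```python
from math import ceil

# Lower bounds taken from the table, highest band first.
# Assumption: a score in the gap between 38482 and 38629 is treated
# as "Upper middle class" because it does not reach the "Rich" floor.
CATEGORY_FLOORS = [
    (38629, "Rich"),
    (27024, "Upper middle class"),
    (23069, "Middle class"),
    (11872, "Lower middle class"),
]


def categorise(score):
    """Return the affluence category for a predicted affluence score."""
    for floor, label in CATEGORY_FLOORS:
        if score >= floor:
            return label
    return "Poor"


def top_share_count(n_units, share=0.05):
    """Number of units in the top share, rounding up (an assumption)."""
    return ceil(n_units * share)
```

With 1128 tehsils, `top_share_count(1128)` gives 57, since 5% of 1128 is 56.4.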



Results at tehsil level for Andhra Pradesh:
▪ We ranked the tehsils according to their affluence scores.
▪ This shows the tehsils with the highest affluence scores.
▪ 57 of the 1128 tehsils of Andhra Pradesh fall in the top 5 percent.
▪ These are the identified affluent tehsils of Andhra Pradesh.
▪ We will similarly find the affluent tehsils for India.



Conclusion

▪ The affluent tehsils identified offer huge scope for future expansion in
the development of rural markets.
▪ In fact, rural marketing should be recognized as developmental
marketing by big business firms.
▪ In India it has gained greater significance these days as the overall growth
of the economy has resulted in a substantial increase in the purchasing
power of the rural communities.
▪ On account of the green revolution in India, the rural areas are consuming
a large quantity of industrial and consumer products produced near the
urban areas.
▪ In this context, a special marketing strategy, namely, rural marketing has
replaced agricultural marketing which was confined merely to selling farm
machines and other inputs.
Conclusion contd...
▪ Growth in the rural markets results in an overall balance among people, both
socially and economically.
▪ The rural market is growing at a faster pace than the urban market.
▪ The growth and development of rural marketing contribute to overall prosperity
and welfare.
▪ The growth of rural marketing leads to increased business operations,
professional activities, and services that can generate a lot of employment
opportunities.
▪ Better marketing improves the standard of living in rural areas because people
can access goods at a relatively low price.
▪ Growth in the rural market will, in turn, improve transportation,
communication, banking, entertainment and other facilities; in other words,
it leads to better rural infrastructure.
▪ This concludes why and how the identified tehsils will be extremely beneficial
for marketing strategies.
Thank you
