

Census Analysis and Data Mining


Himanshu Mishra
1847227
PG Scholar
Department of Computer Science
CHRIST (Deemed to be University), Bengaluru, Karnataka, India

Abstract—Census data is an important source of information for decision makers operating in many different fields. However, census collection is a time-consuming and resource-intensive task. This is especially the case in rural areas, where the communication and transportation infrastructure is not as robust as in urban areas. In this paper the authors propose the use of satellite imagery for census collection. The proposed method is not as accurate as "on the ground" census collection, but it requires very few resources. The proposed method is founded on the idea of collecting census data using classification techniques applied to relevant satellite imagery. The objective is to build a classifier that can label households according to "family" size. More specifically, the idea is to segment satellite images so as to obtain pixel collections describing individual households, and to represent these collections using some appropriate representation to which a classifier generator can be applied. Population and development is a complex area for research that includes issues related to many variables, such as demography, economics, urbanization, gender, religion, politics, food and nutrition, health and human rights. The Indian census collects extensive data at household level. Wide-ranging information about sources of income, demographic variables, household belongings, possession of various durable household goods, etc. was collected in Census 2011. The present study is an attempt to analyze population and development in U.P. (Uttar Pradesh), a highly populated state in India. It provides the position of all districts of U.P. with respect to a few selected socio-demographic variables. This study concludes that some districts of U.P. are performing better than many states in the country, but some districts are far behind when compared with some developed states and with India as a whole. For compiling a census report of India, satellite imagery can help, because it can be used to estimate population, growth, development, literacy level and many other indicators. This paper is an effort towards harnessing the power of data-mining techniques to develop a mining model, applicable to the analysis of census data, that could uncover hidden patterns and reveal their geo-spatial distribution. This could support better-informed business decisions and provide government with the intelligence for strategic planning, tactical decision-making and better policy formulation.
Index Terms—Population; Development; Demographic and Social Variables; Census

I. INTRODUCTION
A census is a collection of information about the nature of a population of a given area. Census collection tends to be undertaken using a questionnaire format. Questionnaires are usually either distributed by post or electronically for self-completion, or completed by field staff. There are many problems associated with the collection of census data, especially in the case of national censuses. The first problem is the census budget: the collection of census data requires considerable resources in terms of money and "manpower". Another problem is the cost of processing the data after it has been collected. A third issue is that there is often a lack of goodwill on the part of a population to participate in a census, even if they are legally required to do so, because people are often suspicious of the motivation behind censuses (especially when collected by government organisations). These problems are compounded in areas where there are poor communication and transport infrastructures and/or extensive, but sparsely populated, hinterlands.

The solution proposed in this paper is founded on the idea of constructing classifiers that can predict census information according to the nature of satellite views of households. In some areas, such as inner city areas, this is unlikely to be appropriate; however, in sparsely populated rural areas, as will be demonstrated, this can provide an effective and efficient mechanism for collecting census data. The fundamental idea of the proposed research is that, given a set of training data describing households and their geographical locations, we can obtain satellite images of these households and use image classification techniques to construct a classifier that can be applied to the entire region, and consequently collect census data automatically at very low cost. Of course, this will be an approximation: there is always a trade-off between resource reduction and accuracy, and, as already noted, the approach is likely to be more effective in rural and suburban areas than in city centres and commercial areas.

Census data is often not analyzed critically enough to bring out its basic and important attributes and to give the geo-spatial distribution of the population. This is due to the non-availability of the required tools for carrying out such analysis. Data mining is the process of discovering previously unknown, actionable and profitable information from large consolidated databases and using it to support tactical and strategic decisions (Gajendra, 2008). It is also the extraction of hidden predictive information from large databases: a powerful new technology with great potential to help companies, industries, institutions, governments, etc. focus on the most important information in their data warehouses.
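The pipeline outlined in the abstract and introduction has three steps: segment a satellite image into per-household pixel collections, represent each collection, then train a classifier on households whose family size is known. The sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The image is a tiny hypothetical binary grid in which 1 marks a roof pixel. Households are taken to be 4-connected components of roof pixels, and each household is represented by two hypothetical features (pixel area and bounding-box area). A nearest-centroid classifier stands in for whatever classifier generator the paper actually uses. The training labels are invented.

```python
from collections import deque
from math import dist

# Hypothetical binary "satellite" tile (assumption): 1 = roof pixel, 0 = background.
TILE = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]


def segment(tile):
    """Return the 4-connected components of roof pixels as lists of (row, col)."""
    rows, cols = len(tile), len(tile[0])
    seen, components = set(), []
    for r in range(rows):
        for c in range(cols):
            if tile[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited roof pixel.
            queue, pixels = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and tile[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def features(pixels):
    """Represent a household as (pixel area, bounding-box area): hypothetical features."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return (len(pixels), (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1))


def train(examples):
    """Nearest-centroid training: mean feature vector for each family-size label."""
    sums = {}
    for vec, label in examples:
        total, n = sums.get(label, ((0.0,) * len(vec), 0))
        sums[label] = (tuple(a + b for a, b in zip(total, vec)), n + 1)
    return {label: tuple(v / n for v in total) for label, (total, n) in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


if __name__ == "__main__":
    # Invented training households with known family-size classes.
    training = [((3, 4), "small"), ((4, 4), "small"), ((8, 8), "large"), ((10, 12), "large")]
    model = train(training)
    for household in segment(TILE):
        print(features(household), predict(model, features(household)))
```

On this tile, segmentation yields two households: a 2×2 roof and a 4×2 roof. They are labelled "small" and "large" respectively. A real system would need noise filtering, richer image features and a properly validated classifier.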

II. LITERATURE REVIEW


Census data is often not analyzed critically enough to bring out its basic and important attributes and to give the geo-spatial distribution of the population. This is due to the non-availability of the required tools for carrying out such analysis. This paper suggests the use of a data-mining technique (the decision tree algorithm) to extract hidden information from a large census data warehouse. It also suggests a geographical information system (GIS) as an integrating technology that gives the geo-spatial distribution of the population.
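A decision tree is grown by repeatedly choosing the attribute that best separates the records at each node. The sketch below computes that choice for the root node. It assumes the ID3 criterion (information gain, based on Shannon entropy); the paper does not say which decision tree variant it uses. The records are invented toy data, not census figures. The names RECORDS, entropy, information_gain and best_split are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy records (NOT real census data): (sex, area, literate)
RECORDS = [
    ("male", "urban", "yes"),
    ("male", "rural", "yes"),
    ("female", "urban", "yes"),
    ("female", "rural", "no"),
    ("male", "rural", "no"),
    ("female", "rural", "no"),
]
ATTRIBUTES = {"sex": 0, "area": 1}  # column index of each candidate attribute
LABEL = 2  # column index of the class label


def entropy(rows):
    """Shannon entropy (in bits) of the class label over the given rows."""
    counts = Counter(r[LABEL] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, col):
    """Reduction in entropy obtained by splitting the rows on column `col`."""
    total = len(rows)
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder


def best_split(rows):
    """Choose the attribute with the highest information gain (ID3 root choice)."""
    return max(ATTRIBUTES, key=lambda name: information_gain(rows, ATTRIBUTES[name]))


if __name__ == "__main__":
    for name, col in ATTRIBUTES.items():
        print(name, round(information_gain(RECORDS, col), 3))
    print("root split:", best_split(RECORDS))
```

On these toy records, splitting on area gains about 0.459 bits, while splitting on sex gains about 0.082 bits. Area therefore becomes the root. The same procedure would then be applied recursively to each branch.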
The United Nations (UN) defines a census as the total process of collecting, compiling, analyzing, evaluating, publishing and disseminating demographic, economic, social and housing data pertaining, at a specified time, to all persons and all buildings in a country or in a well-delineated part of it. A population and housing census is of great relevance to the economic, political and socio-cultural planning of a country. Reliable and detailed data on the size, structure, distribution and socio-economic and demographic characteristics of a country's population are required for planning, policy intervention and monitoring of development goals.

Figure 1: Terms included in census analysis.

Within the masses of information in the census database lies hidden information of strategic importance. Data mining is a key element in finding the particular patterns and relationships that can help governments, organizations and businesses. The data-mining model that an algorithm creates can draw on various attributes, including the number of males, the number of females, sex, literacy, employment and illiteracy, to give the geo-spatial distribution of the population. Data mining extracts these attributes from the pool of census data and gives their geo-spatial distribution.

Figure 2: Satellite image.

In its simplest form, a Geographic Information System (GIS) is a computer-based data management system for storing, editing, manipulating, analyzing and displaying geographically referenced information.
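The step of extracting attributes such as the number of males, the number of females and literacy, and then giving their geo-spatial distribution, can be illustrated with a small aggregation. The sketch below makes several assumptions. It works on person-level records with hypothetical district names and placeholder coordinates. Its output is GeoJSON (a standard format that GIS tools can load), chosen here purely for illustration; the paper does not name its GIS or data format. The literacy rate is a simple literate/total ratio, and it ignores any age cut-off that an official literacy rate would apply.

```python
import json
from collections import defaultdict

# Hypothetical person-level records (NOT real census data): (district, sex, literate)
PERSONS = [
    ("District A", "male", True),
    ("District A", "female", False),
    ("District A", "female", True),
    ("District B", "male", True),
    ("District B", "male", False),
]

# Placeholder (longitude, latitude) per district; real work would take these
# from a GIS boundary layer.
LOCATIONS = {"District A": (80.0, 26.0), "District B": (81.0, 27.0)}


def summarize(persons):
    """Aggregate person records into per-district counts."""
    stats = defaultdict(lambda: {"males": 0, "females": 0, "literate": 0, "total": 0})
    for district, sex, literate in persons:
        s = stats[district]
        s["total"] += 1
        s["males" if sex == "male" else "females"] += 1
        s["literate"] += int(literate)
    return dict(stats)


def to_geojson(stats, locations):
    """Attach each district summary to a point feature so a GIS tool can map it."""
    features = []
    for district, s in sorted(stats.items()):
        lon, lat = locations[district]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON order: lon, lat
            "properties": {
                "district": district,
                **s,
                "literacy_rate": round(100 * s["literate"] / s["total"], 1),
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    print(json.dumps(to_geojson(summarize(PERSONS), LOCATIONS), indent=2))
```

Keeping aggregation (`summarize`) separate from map output (`to_geojson`) means the same district counts could feed a chart, a table or a GIS layer.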

III. DATA SET
Figure 1 shows which terms are included in the 2011 census analysis. These attributes are important for our survey because they let us see how each of them has changed over the previous 5 to 10 years, or between earlier census years. From these comparisons we can assess how growth, development, literacy level and the many other indicators derived from census analysis have changed. The census report is important because it is the official record of the growth and development of our country, our state and our city. For this reason, census analysis is very important for our country.

Figure 3: Growth in education.

Figure 3 shows how the level of education has changed year by year, indicating an improvement compared with previous years. Figure 4 compares our economic growth with that of other countries, both in previous years and in the current year. Figure 5 shows how our population has changed over the previous 10 or 20 years. This graph covers population only up to 2016, not up to 2018.
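Comparing an indicator across census years usually means computing a relative change, such as the decadal growth rate of population. That rate is defined as (P_current − P_previous) / P_previous × 100. Rates such as literacy are compared in percentage points instead. The minimal sketch below uses hypothetical function names and invented district figures, not values from Census 2001 or 2011.

```python
def decadal_growth_rate(previous, current):
    """Percentage change in population between two censuses ten years apart."""
    if previous <= 0:
        raise ValueError("previous census population must be positive")
    return (current - previous) / previous * 100


def change_in_points(previous_rate, current_rate):
    """Change in a rate (e.g. literacy %) expressed in percentage points."""
    return current_rate - previous_rate


if __name__ == "__main__":
    # Hypothetical district figures, not taken from any census.
    pop_2001, pop_2011 = 1_000_000, 1_200_000
    lit_2001, lit_2011 = 55.0, 68.5
    print(f"decadal growth: {decadal_growth_rate(pop_2001, pop_2011):.1f}%")
    print(f"literacy change: {change_in_points(lit_2001, lit_2011):+.1f} points")
```

Keeping percentage change and percentage-point change as separate functions avoids a common mistake. A literacy rate rising from 55% to 68.5% is a 13.5-point increase, not a 13.5% increase.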

Figure 4: Our economic growth compared with other countries.

Figure 5: Change in population over the previous ten years.

Figure 6: Mean, median and mode of the data set.

Figure 7: Correlation and regression.

Figure 8: Pie chart of female births in past years.

Figure 9: Pie chart of the change in population after 10 years.

Figure 10: Use of mobile phones in India across our population.

Figure 11: Change in the literacy level in India over the previous 10 to 20 years.

TABLE I
LITERACY LEVEL AND GROWTH BY GENDER

Group    Literacy Level    Growth
Men      70p               60p
Women    60p               75p

IV. METHODOLOGY
The Census of India is conducted once in a decade, following an extended de facto canvasser method. Under this approach, data is collected from every individual by visiting the household and canvassing the questionnaire all over the country over a period of three weeks. The count is then updated to the reference date and time by conducting a Revisional Round. In the Revisional Round, changes in the entries that arise on account of births, deaths and migration between the time of the enumerator's visit and the reference date/time are noted down and the record is updated. In Census 2011, for Madhya Pradesh, the first phase of House listing
Operations, or Housing Census, was completed between 7th May and 22nd June 2010. The second phase, canvassing the questionnaire for Population Enumeration, was conducted from 9th to 28th February 2011. Enumeration of the houseless population was done on the night of 28th February. The Revisional Round was then conducted from 1st to 5th March 2011, and the count was updated to the reference moment of 00:00 hours on 1st March 2011. The present study follows a district-wise analysis approach to assess the development of districts. The district-wise analysis has been done by forming different groups of districts. For this study we have taken selected demographic and social variables and analyzed them at district level. The selected variables are given below.
1) Literacy rate
2) Sex ratio
3) Population decadal growth rate
4) Percentage of households with electricity supply
5) Percentage of households without toilet facility
6) Percentage of households availing banking facility

V. RESULT AND DISCUSSION
Having carried out this analysis, we are able to find out more easily the growth and population of our country, along with many other indicators, through census analysis. This is very important to the government for official purposes, because without comparing the previous census report with the current one, the government cannot take better steps towards the development and growth of our country.

VI. CONCLUSION AND FUTURE SCOPE
If the entire census data, along with the data generated day by day on the internet, were collected, we would have a vast store of knowledge. This treasure of knowledge would help the planning commission, policy makers, law makers, scheme drafters, etc. gain insight into the actual needs of the country, its trends, present status and prevalent problems. We can also employ this knowledge in healthcare, disaster management, space research, etc. For example, if the government wants to implement a law related to female foeticide, we can obtain the trend of lower sex ratio in particular parts of the country, so that laws can be drafted according to the social and cultural background of those areas. Satellite images of areas where a natural disaster has occurred can reveal parameters that may appear before such a disaster. By mining such historical data, we may be able to predict some natural calamities. We can find out the needs of the nation state-wise, region-wise, social-class-wise, culture-wise, religion-wise, gender-wise, age-wise, etc., which will be very useful in amending policies and laws. All these graphs help in census analysis and show different types of predictions, developments or changes in growth and development compared with previous years. With the help of census analysis, the government can find out whether our country's growth or literacy level is going up, and whether its economic level is increasing, compared with previous years or with previous census analyses. Various data-mining techniques have been applied to look at relationships. Clustering is done through the EM clustering method, whose results show that the accuracy is quite high, and such exercises can be done for analyzing census datasets. Gender inequality is observed at all levels and in all parts of the population in the dataset. The future scope of data-mining techniques is multifold. For example, in the medical field these techniques can be used for early identification of breast cancers, tumors and other major health problems, which will benefit everyone involved in the health sector. Data mining, if used to its true potential, may lead to path-breaking knowledge discoveries which will help not only the present generation but also future ones.

VII. REFERENCES
1. Otto Bretscher. Linear Algebra with Applications (3rd Edition). Prentice Hall, July 2004.
2. John Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679–698, Nov. 1986.
3. Jin-Song Deng, Ke Wang, Jun Li, and Yan-Hua Deng. Urban land use change detection using multisensor satellite images. Pedosphere, 19(1):96–103, 2009.
4. Richard O. Duda and Peter E. Hart. Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM, 15(1):11–15, January 1972.
5. J. Faichney and R. Gonzalez. Combined colour and contour representation using anti-aliased histograms. In Signal Processing, 2002 6th International Conference on, volume 1, pages 735–739, Aug. 2002.
6. Juan A. X. Fanoe. Lessons from census taking in South Africa: Budgeting and accounting experiences. The African Statistical, 13(3):82–109, 2011.
7. Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing (3rd Edition). Prentice Hall, August 2007.
8. E. L. Hall. Almost uniform distributions for computer image enhancement. IEEE Trans. Comput., 23(2):207–208, February 1974.
9. Gajendra, S. (2008). Data-mining, Data-warehousing and OLAP. Kataria and Sons: New Delhi.
10. Koh, Chye, H. and Kee, C. L. (2004). Going concern prediction using data-mining techniques. Managerial Auditing Journal, 19:3.
11. Kwedlo, W. and Kretowski, M. (2001). Learning Decision Rules using a Distributed Evolutionary Algorithm. Gdansk Press: Poland.
12. Mena, K. C. (2005). Data Mining and Statistics. Guild Form Press: New York.
13. Pregibon, D. (1997). Data mining. Statistical Computing and Graphics, pp. 7–8.
14. Redlands, C. A. (1990). Understanding GIS. Environmental System Research Institute. Oxford University Press: New York.
15. Rambaldi, G. and Callosa, J. (2000). Manual on Participatory 3-Dimensional Modeling for Natural Resource Management.
16. Rhind (2001). Review of activities at the Experimental Cartographic Unit in the United Kingdom.
17. Sieber, R. (2000). Conforming (to) the Opposition: the Social Construction of Geographical Information Systems in Social
18. J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd ed., 2011.
19. B. Liu. Web Data Mining. Springer, 2006.
20. R. Akerkar and P. Lingras. Building an Intelligent Web: Theory and Practice. Jones and Bartlett Publishers, 2008.
