


Studies on E Governance in India using Data Mining Perspective
Ms. Sonali Agarwal, and Prof. G.N. Pandey

Abstract— The rapid expansion, adoption and spread of innovative Information and Communication Technologies (ICTs) create new opportunities for growth and development. Data Mining is a well-established approach to discovering knowledge from databases for the purpose of Knowledge Management. Governments at every level generate and collect large volumes of data and information, and sound decision making is essential for the better utilization of all resources. Data Mining can help administrators extract valuable knowledge and practices from this voluminous data, which can then be used to reduce costs strategically, increase opportunities for organizational expansion, and detect fraud, waste and abuse. The present investigation uses data on primary education to analyze the status of primary education in Allahabad and in Uttar Pradesh, India. Clustering and Classification methods are used to find similarities and dissimilarities among the districts of Uttar Pradesh. Clustering groups the districts so that the districts in a group may be treated together under one policy. The Classification method is based on the reported Gross Enrollment Ratio (GER). Some unusual classifications of districts suggest that Data Mining could also establish the impact of migration from one district to another once all students are given a unique identification through a social security number.

Index Terms—Information and Communication Technologies, Knowledge Management, Data Mining, Clustering, Classification

——————————  ——————————
• Ms. Sonali Agarwal is with the Indian Institute of Information Technology, Allahabad, U.P., India.
• Prof. G.N. Pandey is with the Indian Institute of Information Technology, Allahabad, U.P., India.

1 INTRODUCTION

Data Mining is a process of Knowledge Discovery that includes methods used to recognize, generate, represent and distribute knowledge for better utilisation of any system. Governments at every level generate and collect large volumes of data and information, and sound decision making is essential for the better utilization of all resources. Data Mining can help administrators extract valuable knowledge and practices from this voluminous data, which can then be used to reduce costs strategically, increase opportunities for organizational expansion, and detect fraud, waste and exploitation.

This research aims to demonstrate the potential of data mining in the context of smart techniques of E Governance. Data Mining provides efficient techniques for government agencies to analyze data quickly and at lower cost [1]. The data extraction process reveals interesting hidden patterns. The discovered patterns enable government systems to make better decisions and to plan more effectively in serving citizens [8]. Here we present an E Governance Model based on Data Mining and Data Warehousing to facilitate:
(i) Efficient methods for capturing, storing and handling government data collected from various resources over a period of time.
(ii) Efficient Knowledge Management for improved internal processes, government policies and programs on the basis of historical data stored in its databases.

The present work proposes an E Governance model framework based on Data Mining and Data Warehousing techniques which may be used efficiently by the government at all its administrative levels (National/State/District/Block). The proposed Model serves all possible aspects of E Governance with the help of four basic building blocks:
• Administrative Block
• Technical Know How Block
• Service Block
• Stakeholder Block

There is an extensive range of Data Warehousing and Data Mining applications in the government's regulatory, developmental and social welfare organizations. The following are some examples reported in the literature.

The Total Information Awareness (TIA) project was launched by the US government after the terrorist attacks of 9/11. The objective of TIA was to search large volumes of data and determine associations and patterns related to terrorist activities. The project sought associations among transactions such as work permits, credit cards, airline tickets, passports, visas, rental cars, gun purchases and driver's licenses, and events such as arrests or suspicious activities [17][15].

CAPPS, the Computer Assisted Passenger Pre-Screening System, is a prescreening system initiated by the US Department of Homeland Security. It is implemented to check all airline passengers against a database of commercially available information. After checking, it assigns a risk color or status to each passenger. CAPPS collects information provided by the passenger, for example the passenger's name, permanent address and contact number. These records are then given to commercial data providers to assess the validity of the passenger's information and the passenger's correlation with other events. The commercial data provider assigns a numerical score back to the owning system, indicating a particular risk level. Passengers with a “green” score are considered normal and safe. Passengers with a “yellow” score have to face a second-level screening. Passengers with a “red” score are considered high risk; they may not be allowed to travel and must be questioned further about their identity and purpose of travel [9].

A May 2004 report on federal data mining activities indicates that US government agencies have widely adopted data mining practices in E Governance. It reports 199 data mining projects ongoing in various stages. Studies indicate that the government is also running some undisclosed data mining projects, for example the National Security Agency's eavesdropping project and the state-level security project MATRIX [10].

Several research works have been published on the model-building phase of the Data Mining process. A paper on a Data Mining application for an income tax department discusses how to build a Data Mining algorithm-centered application for the regulation of different government activities [13]. Its main concerns include the architecture of a Data Mining based application, the working methodology and the integration of the knowledge of domain experts.

A Data Mining tool, iHealth, was developed by CSIRO. It provides a web-based interface for Data Mining and data analysis tools for large health-related databases. The tool provides various clustering and classification methods to identify patients having certain specific profiles. The patients' profiles can be visualized using various visualization techniques [6].

A paper presented a Data Mining based approach to study student performance and dropout rate [11]. The method used Clustering and Decision Rule Data Mining techniques to identify a collection of clusters, which helped in understanding the nature of the data. Another Data Mining based approach classifies selected customers into clusters using the Recency, Frequency, Monetary value (RFM) model to identify high-profit, gold customers. Association rules may then establish the similarities and differences between customers' behaviors [6].

3 GOVERNANCE MODEL
The proposed E Governance model covers all important aspects of E Governance in a single model. The proposed model has four basic building blocks. The lowest block is the Administration Block, which regulates the overall functioning of a country through efficient government.

Fig. 1: Basic Building Blocks of the Proposed E Governance Model

The overall regulation of government bodies may be carried out by using appropriate technical know-how. The Technical Know How Block includes computerization of manual processes, commonly agreed technological standards, database-related applications and easy access to information. The third block is the Service Block, which includes all available operations of E Governance. It provides an interface between the user and the government system. The uppermost block is the Stakeholder Block, which covers the various categories of users working with the system. A user may be a citizen, a business organization or any government organization.

3.1 Module 1: Administration
Administration is the management of any working system supervised by an administrator. In a democratic system the administration is governed by a structured body known as the government. Governance is the responsibility of a government and includes every process performed by the government body. The main activity of the government is to control the working of different departments, for example Finance, Health, Education, Agriculture and Employment. All these activities are now maintained efficiently by using ICT. The transformation from conventional working methods to modern methods of Information Technology (IT) is now known as E Government. The use of ICT in government activities has given rise to a new idea of governance known as E Governance.
3.1.1 Salient features of the proposed model
The purpose of E Governance is to establish good governance and to have seamless coordination between government authorities, the public and business parties. The utilization of ICT may join all three sectors and support development and management. The salient features of the proposed model are therefore as follows.

1. To provide proper information and awareness to the citizen about the political practices and choices available.
2. To provide online services and active participation for different citizen services.
3. To utilize ICT in government functions, providing quick and well-organized communication with the people, business and other agencies.
4. To provide better decision-making through greater decentralization of governance [4].

The proposed model is based on ICT, which may reform organizational structures in both a centralized and a decentralized manner. These approaches to E Government have their own sets of advantages and disadvantages.

Centralized Model
Centralized government initiatives are favorable as portals and services to reduce cost and integration issues. Centralized government initiatives may share technical, financial and human resources. Single portal access is very useful for any end user because all the information may be centrally available there. The features of the Centralized E Governance model are as follows.
• All government processes based on ICTs are centralized in one organizational unit.
• Generally limited infrastructural and set-up costs, but less effective.
• Centralized E Governance models have a single interface for their different users, and these models can be easily enforced.

Decentralized Model
A decentralized model is required at the lower level so that various projects can be handled separately from initiation to execution [3]. The features of the Decentralized E Governance model are as follows.
• All government functions can be distributed among various divisions or organizations.
• Generally has a high coordination cost.

3.1.2 State level Model of E Governance

The State level model is based on a combination of both centralized and decentralized approaches. At the State level, the State government becomes the main coordinator of the project, and the lower government offices with their departments become partners in that project. Figure 2 describes the horizontal and vertical interconnections of E Governance.

Fig. 2: Horizontal and Vertical Interconnection for E Governance

• Certain important decisions are jointly made and then standardized across the various levels.
• Responsibilities as well as capabilities are decentralized to different government departments/levels, with infrastructure and output sharing across the State as a system.
• Generally, high E Governance set-up costs but more responsive to stakeholder needs. Higher-level committees are formed to manage various government activities. These committees have the authority to control the functioning of a large area.

Intra-department, or horizontal and vertical, collaborations are essential for the success of any E Governance project. They are necessary to perform governance functions, share information and deliver services to all stakeholders. These collaborations depend on issues like what are the different
types of intra-department collaborations exist in E Governance and why intra-department collaborations are important [4].

3.2 Module 2: Technical Know How
For E Governance, many applications need to be automated. Various departments seek computerization and other technological transformation of their working strategies. It is now necessary to conceptualize the whole approach and develop a standard framework and protocols for the regulation of all E Governance activities. The proposed Model uses Data Mining and Data Warehousing to improve the service performance of the E Governance system.

3.2.1 Case Study: Data Mining in Department of

Education-related organizations are a major application area for Data Mining since they collect large amounts of data on student enrollment, courses taught, students' academic records and so on. Data collection is also increasing because of the availability and popularity of the courses taught. Today many institutions also have websites where students may study online. Educational Data Mining may help identify student academic performance and discover students' behavior regarding the selection of subjects. These patterns and trends may further improve the quality of education, achieve better student admission and satisfaction, and enhance good academic practice and policies.

Data Mining algorithms are used to distinguish different sets of data by using test data. For example, an algorithm identifies the characteristics that distinguish students who took out a particular kind of study loan from those who did not. It then derives rules regarding the issuance of study loans. The rules are based on the attributes of previous good students who successfully repaid their loans. These rules are then used to recognize such students in the remainder of the database.

In the same way, various algorithms are implemented to convert the database into clusters of students with several similar attributes, which may reveal interesting and unexpected patterns. The patterns of the clusters are then interpreted by experts, in collaboration with the institution's personnel.

3.3 Module 3: Service Block

In the Service Block, the services of E Governance are provided to citizens as end results for the betterment of their lives. It also provides an interface so that a common citizen may participate in decision-making processes. The Service Block also helps to simplify complex government processes that require too many offices and too much manpower. The final focus is on efficient and well-organized delivery of government services [14]. The commonly used services are information access, making payments, submitting complaints and downloading forms.

3.4 Module 4: Stakeholder Block
A stakeholder is an individual, a group of persons or a community having a common area of interest and commonly affected by a system. E Governance has a wide range of stakeholders. The main groups fall into three categories.

Fig. 3: Data Mining in different Government Departments by using Distributed Databases

3.4.1 Citizen
The citizen is associated with E Governance through the Government to Citizen (G2C) interface, an online interaction between government and private individuals.

3.4.2 Business
Business is associated with E Governance through the Government to Business (G2B) interface. The G2B interface is important because various trade and business-related transactions are required by the government for regulatory purposes.

3.4.3 Government
Various government departments are associated with one another through E Governance by using the Government to Government (G2G) interface. It provides online interaction between different levels of government. The objective of G2G is to build new relationships between different departments of government. These relationships help collaboration between levels of government, and reform state and local governments to deliver better services to the citizen.

4 Data Mining Tool
To test the framework, it is necessary to provide at least one data mining tool to work with. The present
investigation adopted WEKA as its Data Mining tool [3]. It contains tools for a whole range of data mining tasks such as data pre-processing, Classification, Clustering, Association and Visualization [4]. It is open source software, has stable releases and is well documented. It is experimental in nature and can be extended. It provides an excellent graphical user interface. It accepts data in ARFF or CSV formats [5].

Fig. 4: Different Views of WEKA Tool

4.1 Data Mining by using Data Visualization

Data mining by using Data Visualization is a method in which various trends in databases are visualized using graphs and charts [18]. The following issues are analyzed by using Data Visualization.

The analysis indicates that there is a large difference between the number of primary schools and the number of upper primary schools. There should be one upper primary school for every two primary schools, but this is not actually the case. It is therefore obvious that, to maintain the ratio of primary schools to upper primary schools at two, more upper primary schools will have to be opened. The data further indicates that the drop out after primary school is more than the expected range. It is apparent from the data mining that the growth in the number of primary schools has not been uniform. The main reason may be the duplication of records. So, to remove any possibility of duplication of data, the allotment of a social security number to each citizen or student is very important.

4.2 Data Mining by using Clustering
Clustering is a Data Mining approach which creates clusters of data items within a data set. A cluster is a set of data items that occur close together with respect to certain parameters [19]. These clusters represent similar groups. In this study, raw education data for Uttar Pradesh, India has been taken. The database has 70 instances, which represent all 70 districts of Uttar Pradesh. In the proposed approach the districts are clustered according to their similarity. These groups of districts, as clusters, may further be treated together under one policy.

Fig. 5: Comparison between number of Primary and Upper Primary

Fig. 6: Clusters based on number of Enrollment in Govt. and Private

The Data Mining approach based on clustering clearly indicates significant variations between one cluster of districts and another. However, the clustering could be sharpened if the data for each district were broken down by area (rural, urban), by category (general, OBC, SC, ST) and by disability (visually impaired, hearing impaired, mentally retarded) and classified on the basis of social security number, giving a qualitative approach to the entire planning and implementation of the “Education for all” program.

Decision trees and IF-THEN rules are used for Classification [78]. In this study the districts are classified according to their Literacy Rate, Growth Rate and available resources. The above classification is based on the reported Gross Enrollment Ratio (GER). However, Mahoba, Ambedkar Nagar, Lalitpur, Pratapgarh and Barabanki are placed in the very good class, where GER is between 101 and 118.99, while Lucknow, Varanasi, Meerut, Gaziabad, Allahabad and Gautam Buddha Nagar find a place in the very poor category, where GER is between 45 and 60.99. It appears that this position is due to the migration of learners from one district to another, where they find better educational facilities. Data Mining could also establish the impact of migration from one district to another.
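The GER-based classification described above can be illustrated with a small set of IF-THEN rules. The following Python sketch is only an illustration under stated assumptions: the two bands (very good: 101 to 118.99; very poor: 45 to 60.99) are the ones reported in the text, while the label `unclassified` for every other value, the function names `classify_ger` and `classify_districts`, and the sample GER figures are hypothetical, since the intermediate bands and the per-district values are not given here. It is not the decision tree produced by WEKA in the study.

```python
# IF-THEN rules for the two GER bands reported in the text.
# Bands outside these two are not given in the paper, so they are
# labelled "unclassified" here (an assumption of this sketch).


def classify_ger(ger: float) -> str:
    """Return the class label for a Gross Enrollment Ratio value."""
    if 101.0 <= ger <= 118.99:
        return "very good"
    if 45.0 <= ger <= 60.99:
        return "very poor"
    return "unclassified"


def classify_districts(ger_by_district: dict[str, float]) -> dict[str, str]:
    """Apply the rules to a mapping of district name -> GER."""
    return {name: classify_ger(ger) for name, ger in ger_by_district.items()}


if __name__ == "__main__":
    # Hypothetical GER values, chosen only to exercise each rule.
    sample = {"District A": 110.5, "District B": 52.0, "District C": 85.0}
    for district, label in classify_districts(sample).items():
        print(f"{district}: {label}")
```

Replacing the hard-coded bands with thresholds learned by a decision tree learner would bring the sketch closer to a tool-based workflow; the fixed rules are used here only because they mirror the bands quoted in the text.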
Such migration analysis requires that all students be given a unique identification through a social security number. Data in a Data Warehouse keyed on social security number will eliminate any scope for duplication, and a Data Warehouse developed on that basis will be more reliable and dependable for strategic planning to improve the percentage of education in the primary sector through the “Education for All” scheme.

Fig. 7: Categorization of districts according to Gross Enrollment Ratio by using Decision Tree

5 Conclusion
The Indian scenario is now transforming into an efficient, accountable and transparent society. It is essential that all government functions use ICTs to provide better interfaces or interactions for the public at state and central level. This means that appropriate software has to be developed which includes common practices related to government functions. Data Warehousing and Data Mining have been established as an excellent option for speeding up reporting and integrating data from the various departments of any government.

The use of Data Mining in government departments presents several potential advantages for better administration, including timely access to data for evaluation. Departments may quickly identify troublesome trends in their functions and evaluate why they are occurring. The departments may associate this information with trends in their future policies.

The use of Knowledge Discovery in Databases allows an individual department to use this information in making appropriate decisions and enhancing its working methodologies. This, unquestionably, translates into increased efficiency, higher progress rates and an economical society.

Along with the development of the relatively new E Governance Model based on Data Mining and Data Warehousing, it is also important to determine multiple rules and policies for future implementation and better administration, based on experience as well as quicker data analysis methods.

The study shows that among the top 20 competitive nations in education, Sweden, Japan, USA, Norway and Canada are in very good positions. All these countries are using Data Mining techniques for studying, monitoring and evaluating different ongoing projects for the development of future strategic planning. Previously it was understood that countries with a better education level also had a better GDP factor. But recent studies have found that increases in educational achievement are not linked to economic growth. It is also found that the primary level of education does not affect the economic growth of the country.

The importance of Data Warehousing and effective Data Mining should be obvious, especially when there is delay in practically all developmental activities, which generally fail to achieve their targets on schedule. Data Warehouse and Data Mining techniques will have to be rooted through a dynamic process to ensure implementation as per schedule. They will also ensure the efficacy of monitoring, control and evaluation, as an integrating tool to achieve the target. The frequency, intensity and sensitivity of monitoring and control will have to be in dynamic mode all the time to ensure completion of the task as per the targeted schedule.

In fact, Data Mining with Data Warehousing should be an ongoing process. It should be integrated with the strategic futuristic planning of the entire government. Analysis through Data Mining would clearly establish the strong and weak areas of planning and implementation of the whole government process. However, it would take some time to develop an appropriate Data Warehouse of past data to carry out qualitative analysis on the basis of Data Mining techniques.

The entire process of Data Warehouse development for any application may be based on the unique identification of the critical species, i.e., the citizen of the nation, with no duplication of the process. Similarly, since the district is the center of implementation, all development actions, regulatory functions of the various departments and social welfare activities should be quantitatively associated with the unique identification for each development activity, so that all developmental activities are completed by the targeted date for utilization by their stakeholders.

REFERENCES
[1]. Junfeng Pan, et al., “Cost-Sensitive Data Preprocessing for Mining Customer Relationship Management Databases”, Intelligent Systems, Jan.-Feb. 2007, Volume 22, Issue 1, pp. 46-51.
[2]. “WEKA 3: Data Mining Software in Java”, Retrieved March 2007.
[3]. Usman Muhammad Anwar, et al., “Multi-Agent Based Semantic E-Government Web Service Architecture”, IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops (2006), pp. 599-604.
[4]. Gregory B. White et al., “Introduction to the 2006 Minitrack on E-Government Security”, Proceedings of the 39th Hawaii International Conference on System Sciences, 0-7695-2507-5/06/$20.00 (C) 2006 IEEE.
[5]. Graham Williams, Data Mining Desktop Survival Guide.
[6]. Ruey-Chyi Wu, Ruey-Shun Chen, Chen, C., “Data mining application in customer relationship management of credit card business”, Computer Software and Applications Conference 2005 (COMPSAC 2005), 29th Annual International, Volume 2, 26-28 July 2005, pp. 39-40.
[7]. “About Kiosk”, E Governance of Government of West Bengal, Retrieved December 2006.
[8]. U.S. General Accounting Office (GAO), “Data Mining: Federal Efforts Cover a Wide Range of Uses”, GAO-04-548.
[9]. United States General Accounting Office, Report to Congressional Committees, “Aviation Security: Computer-Assisted Passenger Prescreening Faces Significant Implementation Challenges”.
[10]. Krouse, William J., CRS Report for Congress, Received through the CRS Web, Order Code RL32536, “The Multi-State Anti-Terrorism Information Exchange (MATRIX) Pilot Project”.
[11]. Salazar, A., Gosalbez, J., Bosch, I., Miralles, R., Vergara, “A case study of knowledge discovery on academic achievement, student desertion and student retention”, Information Technology: Research and Education, 2004 (ITRE 2004), 2nd International Conference on, 28 June-1 July 2004, pp. 150-154.
[12]. Thomas Zwahr and Matthias Finger, “Enhancing the e-Governance model: Enterprise Architecture as a potential methodology to build a holistic framework”, Proceedings of the International Conference on Politics and Information System: Technologies and Applications, Orlando, Florida, USA.
[13]. Riley, Thomas B., International Tracking Survey Report '03 Number Two, “Knowledge Management and Technology”, /intlrackingRpt June03no2.pdf
[14]. Dunham, M.H., “Data mining introductory and advanced topics”, Upper Saddle River, NJ: Pearson Education, Inc.
[15]. Report to Congress, “Terrorism Information Awareness Program”, in response to Consolidated Appropriations Resolution, 2003, Pub. L. No. 108-7, Division M, § 111(b).
[16]. Goharian and Grossman, “Data Mining Classification”, Illinois Institute of Technology.
[17]. Mack, Gregory, “Total Information Awareness program (TIA)”, System Description Document Version 1.1.
[18]. Bob Mann, et al., “Scientific Data Mining, Integration, and Visualization”, UK e-Science Technical Report Series, ISSN 1751-5971.
[19]. Jain A.K., Murty M.N., Flynn P.J., “Data Clustering: A Review”, ACM Computing Surveys, 31, 3:264-323.
[20]. Apte C. & Weiss S.M., “Data Mining with Decision Trees and Decision Rules”, T.J. Watson Research Center.

Sonali Agarwal is a lecturer at the Indian Institute of Information Technology, Allahabad, India. She received her Bachelor's degree in Electrical Engineering in 1997 from Bhilai Institute of Technology, India and her Master's degree in Computer Science from the Motilal Nehru National Institute of Technology, Allahabad, India in 2000. Her research interests include Data Mining, Data Warehousing, E Governance, Knowledge Management and Support Vector Machines.

G. N. Pandey is an Adjunct Professor at the Indian Institute of Information Technology, Allahabad, India. He received his Bachelor's degree in Chemical Engineering in 1962 from Banaras Hindu University, Varanasi, India and his Master's degree from the Indian Institute of Technology, Kharagpur, India in 1963. He received his Doctoral degree in 1966 from Banaras Hindu University, Varanasi, and a Post Doctoral degree at the University of Michigan, USA. He worked as a Reader/Lecturer in Chemical Engineering at Banaras Hindu University, Varanasi, India; as Director, Institute of Engineering & Technology, Lucknow, India; and as Founder Vice-Chancellor, JRH University, Chitrakoot, India. His research interests include ERP, E Governance, Data Mining, and Environmental Science and Engineering. G.N. Pandey is the author of 12 books and more than 200 research papers.

WWW.JOURNALOFCOMPUTING.ORG 40