Studies on E Governance in India using Data Mining Perspective
Ms. Sonali Agarwal, and Prof. G.N. Pandey

Abstract— The rapid expansion, adoption and spread of innovative and promising Information and Communication Technologies (ICTs) open new opportunities for growth and development. Data Mining is a well-established approach to discovering knowledge from databases for the purpose of Knowledge Management. A large volume of data and information is generated and collected by the different levels of government. In government, sound decision making is important for better utilization of all resources. Data Mining could help administrators extract valuable knowledge and practices from this voluminous data, which can be used to strategically reduce costs, increase organizational expansion opportunities and detect fraud, waste and abuse. The present investigation takes data on primary education in order to analyze the status of primary education in Allahabad and in Uttar Pradesh, India. Clustering and Classification methods are used to find similarities and dissimilarities among the districts of Uttar Pradesh. Clustering creates groups of districts so that the districts in a group may be treated together under one policy. The Classification method is based on the reported Gross Enrollment Ratio (GER). Some unusual classifications of districts produced by this method suggest that Data Mining could also establish the impact of migration from one district to another, once all students are given a unique identification through a social security number.

Index Terms—Information and Communication Technologies, Knowledge Management, Data Mining, Clustering, Classification

1 INTRODUCTION
Data Mining is a process of Knowledge Discovery in Databases. Knowledge Management includes methods used to recognize, generate, represent and distribute knowledge for better utilisation of any system. A large volume of data and information is generated and collected by the different levels of government. In government, proper decision making is important for better utilization of all resources. Data Mining could help administrators extract valuable knowledge and practices from this voluminous data, which can be used to strategically reduce costs, increase organizational expansion opportunities and detect fraud, waste and exploitation.

The research work aims to demonstrate the potential of data mining in the context of smart techniques of E Governance. Data Mining provides efficient techniques for government agencies to analyze data quickly and at lower cost [1]. The data extraction process generates interesting hidden patterns. The discovered hidden patterns enable government systems to make better decisions and to plan better for serving citizens [8]. Here we present an E Governance Model based on Data Mining and Data Warehousing to facilitate:
(i) Efficient methods for capturing, storing and handling government data collected from various resources over a period of time.
(ii) Efficient Knowledge Management for improved internal processes, government policies and programs on the basis of historical data stored in its databases.

The present work proposes an E Governance model framework based on Data Mining and Data Warehousing techniques which may be efficiently used by the government at all its administrative levels (National/State/District/Block). The proposed Model serves all possible aspects of E Governance with the help of four basic building blocks:
• Administrative Block
• Technical Know How Block
• Service Block
• Stakeholder Block

2 RELATED WORK
There is an extensive range of Data Warehousing and Data Mining applications in governments' regulatory, developmental and social welfare organizations. The following are some examples reported in the literature.


• Ms. Sonali Agarwal is with the Indian Institute of Information Technology, Allahabad, U.P., India
• Prof. G.N. Pandey is with the Indian Institute of Information Technology, Allahabad, U.P., India


The Total Information Awareness (TIA) project was launched by the US government after the terrorist attacks of 9/11. The objective of TIA was to search large volumes of data and determine associations and patterns related to terrorist activities. The project sought associations among transactions such as work permits, credit cards, airline tickets, passports, visas, rental cars, gun purchases and driver's licenses, and events such as arrests or suspicious activities [17][15].

The Computer Assisted Passenger PreScreening System (CAPPS) is a prescreening system initiated by the US Department of Homeland Security. It checks all airline passengers against a database of commercially available information and then assigns a risk color or status to each passenger. CAPPS collects information provided by the passenger, for example the passenger's name, permanent address and contact number. These records are then given to commercial data providers to assess the validity of the passenger's identity and the passenger's correlation with other events. The commercial data provider returns a numerical score to the owning system indicating a particular risk level. Passengers with a "green" score are considered normal and safe. Passengers with a "yellow" score face a second level of screening. Passengers with a "red" score are considered high risk; they may not be allowed to travel and must be questioned further about their identity and purpose of travel [9].

A May 2004 report on federal data mining activities indicates that US government agencies have widely adopted data mining practices in E Governance. Currently there are 199 data mining projects ongoing at various stages. Studies indicate that the government is also running some undisclosed data mining projects, for example the National Security Agency's eavesdropping project and the state-level security project MATRIX [10].

Several research works have been published on the model-building phase of the Data Mining process.
A paper on a Data Mining application for the income tax department discusses how to build a Data Mining algorithm-centered application for the regulation of different government activities [13]. The main concerns of this paper include the architecture of a Data Mining based application, its working methodology and the integration of the knowledge of domain experts. A Data Mining tool, iHealth, was developed by CSIRO. It provides a web-based interface to Data Mining and data analysis tools for large health-related databases. The tool provides various clustering and classification methods to identify patients having certain specific profiles. The patients' profiles can be visualized using various visualization techniques [6]. Another paper presented a Data Mining based approach to studying student performance and dropout rate [11]. The method used Clustering and Decision Rule Data Mining techniques to identify a collection of clusters, which helped in understanding the nature of the data. Another Data Mining based approach classifies selected customers into clusters using the Recency, Frequency, Monetary value (RFM) model to identify high-profit, gold customers. Association rules may then establish the similarities and differences between customers' behaviors [6].
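As an illustration of the RFM idea mentioned above, the following minimal Python sketch computes recency, frequency and monetary value per customer and labels a customer "gold" when three thresholds are met. The transaction records, the function names and all threshold values are hypothetical and chosen only for illustration; they are not taken from the cited study.

```python
from datetime import date

def rfm_scores(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)} from (customer, date, amount) rows."""
    summary = {}
    for customer, when, amount in transactions:
        last, count, total = summary.get(customer, (None, 0, 0.0))
        last = when if last is None or when > last else last
        summary[customer] = (last, count + 1, total + amount)
    return {c: ((today - last).days, count, total)
            for c, (last, count, total) in summary.items()}

def segment(rfm, max_recency_days=30, min_frequency=3, min_monetary=1000.0):
    """Label a customer 'gold' when all three (hypothetical) thresholds are met."""
    recency, frequency, monetary = rfm
    if recency <= max_recency_days and frequency >= min_frequency and monetary >= min_monetary:
        return "gold"
    return "regular"

# Made-up transactions: (customer, purchase date, amount)
transactions = [
    ("A", date(2010, 9, 1), 500.0),
    ("A", date(2010, 9, 15), 400.0),
    ("A", date(2010, 9, 28), 300.0),
    ("B", date(2010, 3, 2), 2000.0),
]
scores = rfm_scores(transactions, today=date(2010, 10, 1))
segments = {customer: segment(values) for customer, values in scores.items()}
```

In this toy data, customer A buys recently and often and customer B made a single large purchase long ago, so only A satisfies all three thresholds.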


The proposed E Governance model covers all important aspects of E Governance in a single model. There are four basic building blocks in the proposed E Governance Model. The lowest block is the Administration Block, which regulates the overall functioning of a country through efficient government.

Fig. 1: Basic Building block of the Proposed E Governance Model

The overall regulation of government bodies may be carried out by using appropriate Technical Know How. The Technical Know How Block includes computerization of manual processes, commonly agreed technological standards, database-related applications and easy access to information. The third block is the Service Block, which includes all available operations of E Governance. It provides an interface between the user and the government system. The uppermost block is the Stakeholder Block, which covers the various categories of users working with the system. A user may be a citizen, a business organization or any government organization [13].

3.1 Module 1: Administration
Administration is the management of a working system supervised by an administrator. In a democratic system, administration is carried out by a structured body known as the government. Governance is the responsibility of a government and includes every process performed by the government body. The main activity of the government is to control the working of different departments, for example Finance, Health, Education, Agriculture and Employment. All these activities are now maintained efficiently by using ICT. The transformation from conventional working methods to modern Information Technology (IT) methods is known as E Government. The use of ICT in government activities has given rise to a new idea of governance known as E Governance.



3.1.1 Salient features of the proposed model
The purpose of E Governance is to establish good governance and achieve seamless coordination between government authorities, the public and business parties. The use of ICT may join these three sectors and support development and management. The following are the salient features of the proposed model.
TABLE 1 SALIENT FEATURES OF THE PROPOSED MODEL

Centralized E Governance models have a single interface for its different users and these models could be easily enforced.

Decentralized Model
A decentralized model is required at the lower level so that various projects can be handled separately from initiation to execution [3]. The Decentralized E Governance model has the following features.
• All government functions can be distributed among various divisions or organizations.
• Generally has a high coordination cost.

3.1.2 State level Model of E Governance
The State level model combines the centralized and decentralized approaches. At the State level, the State government becomes the main coordinator of the project, and lower government offices with their departments become partners in that project. Figure 2 describes the horizontal and vertical interconnections of E Governance.

1. To provide proper information and awareness to the citizen about the political practices and choices available.
2. To provide online services and active participation for different citizen services.
3. To utilize ICT in government functions, providing quick and well-organized communication with the people, business and other agencies.
4. To provide better decision-making through greater decentralization of governance [4].
Fig.2 : Horizontal and Vertical interconnection for E Governance

The proposed model is based on ICT, which may reform organizational structures in both a centralized and a decentralized manner. Each of these approaches to E Government has its own advantages and disadvantages.
Centralized Model
Centralized government initiatives such as portals and services are favorable for reducing cost and integration issues. Centralized government initiatives may share technical, financial and human resources. Single portal access is very useful for the end user because all the information is centrally available. The Centralized E Governance model has the following features.
• All ICT-based government processes are centralized in one organizational unit.
• Generally limited infrastructure and set-up costs, but less effective.


Certain important decisions are jointly made and then standardized across the various levels. Responsibilities as well as capabilities are decentralized at different government departments/levels, with infrastructure and output sharing across the State as a system. Generally, high E Governance set up costs but more responsive to stakeholder needs. Higher level committees are formed to manage various Government activities. These committees have authority to control the functioning of large area.

Intra-department or horizontal and vertical collaborations are essential for the success of any E Governance project. They are necessary to perform governance functions, share information and deliver services to all stakeholders. These collaborations depend on issues such as which types of intra-department collaboration exist in E Governance and why intra-department collaborations are important [4].

3.2 Module 2: Technical Know How
For E Governance, many applications need to be automated. Various departments seek computerization and other technological transformation of their working strategies. It is therefore necessary to conceptualize the whole approach and develop a standard framework and protocols for the regulation of all E Governance activities. The proposed Model uses Data Mining and Data Warehousing to improve the service performance of the E Governance system.

3.4 Module 4 Stakeholder Block
A stakeholder is an individual, a group of persons or a community having a common area of interest and commonly affected by a system. E Governance has a wide range of stakeholders. The main groups fall into three categories.

3.2.1 Case Study: Data Mining in Department of Education
Education-related organizations are a major application area for Data Mining since they collect large amounts of data on student enrollment, courses taught, students' academic records and so on. Data collection is also increasing because of the availability and popularity of courses taught. Today many institutions also have websites where students may study online. Educational Data Mining may help identify student academic performance and discover student behavior regarding the selection of subjects. These patterns and trends may further improve the quality of education, achieve better student admission and satisfaction, and enhance good academic practice and policies. Data Mining algorithms are used to distinguish different sets of data by using the test data. For example, an algorithm identifies characteristics that distinguish students who took out a particular kind of study loan from those who did not. It then derives rules regarding the issuance of study loans. The rules are based on the attributes of previous good students who successfully repaid their loans. These rules are then used to recognize such students in the remainder of the database. In the same way, various algorithms are applied to partition the database into clusters of students with several similar attributes, which may reveal interesting and unexpected patterns. The patterns of the clusters are then interpreted by experts in collaboration with the institution's personnel.
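The rule-derivation step described above can be sketched in Python as follows. The sketch learns a single-threshold IF-THEN rule from labeled student records and then applies it to the remaining records. The attribute `avg_marks`, the label `repaid`, the student records and the one-attribute rule form are all assumptions made for illustration; the study does not specify which attributes or algorithm were used.

```python
def learn_threshold_rule(records, attribute, label):
    """Find the threshold on one numeric attribute that best separates the labels.

    The learned rule reads: IF record[attribute] >= threshold THEN label is True.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted({r[attribute] for r in records}):
        # Count how many labeled records this candidate threshold classifies correctly.
        correct = sum((r[attribute] >= candidate) == r[label] for r in records)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def apply_rule(records, attribute, threshold):
    """Return the records that satisfy the learned rule."""
    return [r for r in records if r[attribute] >= threshold]

# Hypothetical labeled records of past borrowers.
labeled = [
    {"student": "s1", "avg_marks": 82, "repaid": True},
    {"student": "s2", "avg_marks": 75, "repaid": True},
    {"student": "s3", "avg_marks": 58, "repaid": False},
    {"student": "s4", "avg_marks": 49, "repaid": False},
]
# Hypothetical remaining records without a known outcome.
remaining = [
    {"student": "s5", "avg_marks": 79},
    {"student": "s6", "avg_marks": 52},
]
threshold = learn_threshold_rule(labeled, "avg_marks", "repaid")
likely_good = [r["student"] for r in apply_rule(remaining, "avg_marks", threshold)]
```

A decision tree learner generalizes this idea by choosing such splits recursively over several attributes.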

Fig.3: Data Mining in different Government Department by using Distributed Databases

3.4.1 Citizen
A citizen is associated with E Governance through the Government to Citizen (G2C) interface, which is an online interaction between the government and private individuals.

3.4.2 Business
Business is associated with E Governance through the Government to Business (G2B) interface. The G2B interface is important because the government requires various trade and business related transactions for regulatory purposes.

3.3 Module 3 : Service Block
In the Service Block, the services of E Governance are provided as end results to citizens for the betterment of their lives. It also provides an interface so that a common citizen may participate in decision-making processes. The Service Block also helps to simplify complex government processes that require too many offices and too much manpower. The final focus is on efficient and well-organized delivery of government services [14]. The commonly used services are information access, making payments, submitting complaints and downloading forms.

3.4.3 Government
Various government departments are associated with one another in E Governance through the Government to Government (G2G) interface. It provides online interaction between different levels of government. The objective of G2G is to build new relationships between different departments of government. These relationships support collaboration between levels of government and help state and local governments deliver better services to the citizen.


Data Mining Tool

To test the framework, it is necessary to provide at least one data mining tool to work with. The present investigation adopted WEKA as the Data Mining tool [3]. It contains tools for a whole range of data mining tasks such as data pre-processing, Classification, Clustering, Association and Visualization [4]. It is open source software, has stable releases and is well documented. It is experimental in nature and can be extended. It provides an excellent graphical user interface. It accepts data in ARFF or CSV format [5].
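To illustrate the ARFF input format mentioned above, the following Python sketch serializes a few district records into ARFF text. The relation name, attribute names and values are hypothetical; the actual schema of the education database used in the study is not given.

```python
def to_arff(relation, attributes, rows):
    """Serialize rows into ARFF text; attributes is a list of (name, type) pairs."""
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        lines.append("@attribute " + name + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        # Values containing spaces or commas would need quoting; this sketch avoids them.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"

# Hypothetical schema and made-up values.
attributes = [
    ("district", "string"),
    ("primary_schools", "numeric"),
    ("upper_primary_schools", "numeric"),
]
rows = [
    ("DistrictA", 1200, 450),
    ("DistrictB", 900, 300),
]
arff_text = to_arff("primary_education", attributes, rows)
```

The resulting text has a header section (`@relation`, `@attribute`) followed by a `@data` section with one comma-separated instance per line.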

Fig. 4: Different Views of WEKA Tool

4.1 Data Mining by using Data Visualization
Data mining by using Data Visualization is a method in which various trends in databases are visualized using graphs and charts [18]. The following issues are analyzed by using Data Visualization. The analysis indicates that there is a large difference between the number of primary schools and the number of upper primary schools. There should be one upper primary school for every two primary schools, but this is not actually the case. It is also evident that, to maintain a ratio of two primary schools to one upper primary school, more upper primary schools will have to be opened. The data further indicates that the dropout after primary school is more than the expected range. It is apparent from the data mining that the growth in the number of primary schools has not been uniform. The main reason may be the duplication of records. So, in order to remove any possibility of duplication of data, allotment of a social security number to each citizen or student is very important.

Fig. 5: Comparison between number of Primary and Upper Primary School

4.2 Data Mining by using Clustering
Clustering is a Data Mining approach which creates clusters of data items within a data set. Clusters are groups of closely occurring data items with respect to certain parameters [19]. These clusters represent similar groups. In this study, raw education data for Uttar Pradesh, India has been taken. The database has 70 instances, which represent all 70 districts of Uttar Pradesh. In the proposed approach, districts are clustered according to their similarity. These groups of districts may further be treated together under one policy.

Fig.6: Clusters based on number of Enrollment in Govt. and Private Schools
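The grouping of districts by enrollment can be illustrated with a plain k-means sketch in Python. This is not the WEKA implementation used in the study, and the study does not state which clustering algorithm or parameters were chosen; k-means, the choice of k = 2, the seeding by the first k points and the enrollment figures (in thousands) are all assumptions for illustration.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Made-up (government enrollment, private enrollment) per district, in thousands.
districts = ["A", "B", "C", "D", "E"]
enrollment = [(120, 30), (40, 90), (115, 35), (45, 95), (118, 28)]
assignment, centroids = kmeans(enrollment, k=2)
clusters = {name: label for name, label in zip(districts, assignment)}
```

Here districts A, C and E, dominated by government-school enrollment, fall into one cluster, while B and D, dominated by private-school enrollment, form the other. Districts in the same cluster could then be considered together under one policy.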

The clustering-based Data Mining approach clearly indicates significant variations between clusters of districts. However, the cluster approach could be sharpened if the data for each district (rural, urban), by category (general, OBC, SC, ST) and for the handicapped (visually impaired, hearing impaired, mentally retarded) were classified on the basis of social security number, giving a qualitative approach to the entire planning and implementation of the “Education for All” program.

Decision tree and IF-THEN rules are used for Classification [78]. In this study, districts are classified according to their Literacy Rate, Growth Rate and available resources. The classification is based on the reported Gross Enrollment Ratio (GER). However, Mahoba, Ambedkar Nagar, Lalitpur, Pratapgarh and Barabanki are placed in the very good class, where GER is between 101 and 118.99, while Lucknow, Varanasi, Meerut, Gaziabad, Allahabad and Gautam Buddha Nagar fall in the very poor category, where GER is between 45 and 60.99. It appears that this position is due to the migration of learners from one district to another where they find better educational facilities. Data Mining could also establish the impact of migration from one district to another once all students are given a unique identification through a social security number. Data in a Data Warehouse keyed on social security number will eliminate any scope for duplication, so such a Data Warehouse will be more reliable and dependable for strategic planning to improve the percentage of education in the primary sector through the “Education for All” scheme.
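The GER-based classification can be written as explicit IF-THEN rules. The Python sketch below encodes only the two class ranges reported in the text; the boundaries of the intermediate classes are not given, so values outside the two ranges are labeled "unspecified". The GER helper uses the standard definition (total enrollment divided by the population of the official age group, times 100), and the sample values are made up.

```python
def gross_enrollment_ratio(enrolled, age_group_population):
    """GER in percent; it can exceed 100 when enrolled learners outnumber the age group."""
    return 100.0 * enrolled / age_group_population

def classify_ger(ger):
    """Apply the two GER ranges reported in the text as IF-THEN rules."""
    if 101 <= ger <= 118.99:
        return "very good"
    if 45 <= ger <= 60.99:
        return "very poor"
    # The intermediate class boundaries are not given in the text.
    return "unspecified"

# Made-up GER values for illustration.
sample = {"District X": 110.5, "District Y": 52.0, "District Z": 85.0}
labels = {name: classify_ger(value) for name, value in sample.items()}
```

A GER above 100 in a district is consistent with the migration explanation given above: learners who reside elsewhere are counted in the enrollment of the district where they study, but not in its age-group population.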

which is based on experiences as well as quicker data analysis methods. The study shows that among the top 20 competitive nations in education, Sweden, Japan, USA, Norway and Canada are in very good positions. All these countries use Data Mining techniques for studying, monitoring and evaluating ongoing projects in order to develop future strategic plans. Previously it was understood that countries with a better education level also had a better GDP. However, recent studies have found that increases in educational achievement are not linked to economic growth. It has also been found that the primary level of education does not affect the economic growth of a country.

The importance of a Data Warehouse and effective Data Mining should be obvious, especially when there is delay in practically all developmental activities, which generally fail to achieve their targets on schedule. Data Warehouse and Data Mining techniques will have to be embedded in a dynamic process to ensure implementation as per schedule. They will also ensure the efficacy of monitoring, control and evaluation as an integrating tool to achieve the target. The frequency, intensity and sensitivity of monitoring and control will have to remain in dynamic mode at all times to ensure completion of the task as per the targeted schedule.

Fig.7: Categorization of district according to Gross Enrollment Ratio by using Decision Tree



The Indian scenario is now transforming into an efficient, accountable and transparent society. It is essential that all government functions use ICTs to provide better interfaces and interactions for the public at state and central level. This means that appropriate software has to be developed which incorporates common practices related to government functions. Data Warehousing and Data Mining have been established as an excellent option for speeding up reporting and integrating data from the various departments of a government. The use of Data Mining in government departments presents several potential advantages for better administration, including timely access to data for evaluation. Departments may quickly identify troublesome trends in their functions and evaluate why they are occurring. The departments may then relate this information to trends in their future policies. The use of Knowledge Discovery in Databases allows an individual department to use this information in making appropriate decisions and enhancing its working methodologies. This translates into increased efficiency, higher progress rates and a more economical society. Along with the development of the relatively new E Governance Model based on Data Mining and Data Warehousing, it is also important to determine multiple rules and policies for future implementation and better administration.

In fact, Data Mining with Data Warehousing should be an ongoing process. It should be integrated with the strategic, forward-looking planning of the entire government. Analysis through Data Mining would clearly establish the strong and weak areas of planning and implementation across the whole government process. However, it would take some time to develop an appropriate Data Warehouse of past data on which to carry out qualitative analysis using Data Mining techniques. The entire process of Data Warehouse development for any application may be based on the unique identification of the critical entity, i.e., the citizen of the nation, with no duplication. Similarly, since the district is the center of implementation, all development actions, regulatory functions of the various departments and social welfare activities should be quantitatively associated with the unique identification of each development activity, so that all developmental activities are completed by the targeted date for use by their stakeholders.

[1]. Junfeng Pan, et al., “Cost-Sensitive Data Preprocessing for Mining Customer Relationship Management Databases”, Intelligent Systems, Jan.-Feb. 2007, Volume 22, Issue 1, pp. 46-51.
[2]. “WEKA 3: Data Mining Software in Java”, Retrieved March 2007.
[3]. Usman Muhammad Anwar, et al., “Multi-Agent Based Semantic E-Government Web Service Architecture”, IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops (2006), pp. 599-604.
[4]. Gregory B. White et al., “Introduction to the 2006 Minitrack on E-Government Security”, Proceedings of the 39th Hawaii International Conference on System Sciences, IEEE, 2006.
[5]. Graham Williams, Data Mining Desktop Survival Guide.
[6]. Ruey-Chyi Wu, Ruey-Shun Chen, Chen, C, “Data mining application in customer relationship management of credit card business”, Computer Software and Applications Conference 2005 (COMPSAC 2005), 29th Annual International, Volume 2, 26-28 July 2005, pp. 39-40.
[7]. “About Kiosk”, E Governance of Government of West Bengal, Retrieved December 2006.
[8]. U.S. General Accounting Office (GAO), “Data Mining: Federal Efforts Cover a Wide Range of Uses”, GAO-04-548.
[9]. United States General Accounting Office, Report to Congressional Committees, “Aviation Security: Computer-Assisted Passenger Prescreening Faces Significant Implementation Challenges”.
[10]. Krouse William J, CRS Report for Congress, Received through the CRS Web, Order Code RL32536, “The Multi-State Anti-Terrorism Information Exchange (MATRIX) Pilot Project”.
[11]. Salazar, A, Gosalbez, J, Bosch, I, Miralles, R, Vergara, “A case study of knowledge discovery on academic achievement, student desertion and student retention”, Information Technology: Research and Education, 2004 (ITRE 2004), 2nd International Conference on, 28 June-1 July 2004, pp. 150-154.
[12]. Thomas Zwahr and Matthias Finger, “Enhancing the e-Governance model: Enterprise Architecture as a potential methodology to build a holistic framework”, Proceedings of the International Conference on Politics and Information System: Technologies and Applications, Orlando, Florida, USA.
[13]. Riley Thomas B., International Tracking Survey Report ‘03 Number Two, “Knowledge Management and Technology”, /intlrackingRpt June03no2.pdf
[14]. Dunham, M.H., “Data mining introductory and advanced topics”, Upper Saddle River, NJ: Pearson Education, Inc.
[15]. Report to Congress, “Terrorism Information Awareness Program”, in response to Consolidated Appropriations Resolution, 2003, Pub. L. No. 108-7, Division M, § 111(b).
[16]. Goharian and Grossman, “Data Mining Classification”, Illinois Institute of Technology.
[17]. Mack Gregory, “Total Information Awareness program (TIA)”, System Description Document Version 1.1.
[18]. Bob Mann, et al., “Scientific Data Mining, Integration, and Visualization”, UK e-Science Technical Report Series, ISSN 1751-5971.
[19]. Jain A.K, Murty M.N., Flynn P.J., “Data Clustering: A Review”, ACM Computing Surveys, 31, 3:264-323.
[20]. Apte C. & Weiss S.M., “Data Mining with Decision Trees and Decision Rules”, T.J. Watson Research Center, ith_cover.pdf

Sonali Agarwal is a lecturer at the Indian Institute of Information Technology, Allahabad, India. She received her Bachelor's degree in Electrical Engineering in 1997 at Bhilai Institute of Technology, India and her Master's degree in Computer Science at the Motilal Nehru National Institute of Technology, Allahabad, India in 2000. Her research interests include Data Mining, Data Warehousing, E Governance, Knowledge Management and Support Vector Machine.

G. N. Pandey is an Adjunct Professor at the Indian Institute of Information Technology, Allahabad, India. He received his Bachelor's degree in Chemical Engineering in 1962 at Banaras Hindu University, Varanasi, India and his Master's degree at Indian Institute of Technology, Kharagpur, India in 1963. He received his Doctoral degree in 1966 at Banaras Hindu University, Varanasi, and a Post Doctoral degree at University of Michigan, USA. He worked as a Reader/Lecturer in Chemical Engineering, Banaras Hindu University, Varanasi, India, Director, Institute of Engineering & Technology, Lucknow, India and Founder Vice-Chancellor, JRH University, Chitrakoot, India. His research interests include ERP, E Governance, Data Mining and Environmental Science and Engineering. G.N. Pandey is the author of 12 books and more than 200 research papers.

JOURNAL OF COMPUTING, VOLUME 2, ISSUE 10, OCTOBER 2010, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG
