
CHAPTER 1

INTRODUCTION

This chapter provides an overview of the research project and discusses the research background, problem statement, objectives, scope and significance of the research.

1.1 Research Background

E-filing provides access to a large database that consists of a list of electronic files. According to Olson, Edwards and Monty (2003), e-filing is a highly secure and reliable method for sending, receiving and managing legal documents. Finding files manually takes time, whereas e-filing provides secure access that lets users identify the files they need without searching through large shelves. Olson et al. (2003) also stated that state courts, federal courts and law firms across the country are increasingly using e-filing to improve access to documents, maximize resources and streamline filing and service activities. With e-filing, it is much easier to check the status of a file and identify its location before retrieving the physical file.

The purpose of this research is to develop a prototype of an e-filing web-based system for Majlis Daerah Kerian. Majlis Daerah Kerian, Parit Buntar, Perak acts as a local government, which is the unit of government closest to the citizens; local governments include municipalities, local authorities, town councils and city councils. There are eight departments in Majlis Daerah Kerian: the Law and Administration Unit, Assessment Unit, Information Technology Unit, Account and Finance Unit, License and Parking Unit, Town Service Unit, Garden and Recreation Unit, and Building Unit.

Within the e-filing web-based system, staff can easily gather information about the status of files and identify the files that meet their requirements. The system is developed using a data mining technique, specifically clustering. According to Phyu (2009), data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. Data mining therefore consists of more than collecting and managing data; it also includes analysis and prediction. Garofalakis, Rastogi, Seshadri and Shim (1999) stated that there are three popular data mining techniques: association rules, classification and clustering. This research identifies a suitable searching method from among these three techniques in order to develop a prototype of the e-filing web-based system.

1.2 Problem Statement

The staff in Majlis Daerah Kerian face difficulties in managing and identifying the files that meet their requirements, because it is difficult to search for files manually. According to Mrs. Shalina, Administrative Assistant of Majlis Daerah Kerian, searching for a file manually involves many steps:

a. Searching for the number of the required file in a log book.
b. Determining the file name from the file number.
c. Checking for the file on many large shelves, which takes a long time.
d. Checking each staff member's desk or other departments in Majlis Daerah Kerian if the file is not on the shelf.

All these steps create barriers to responding promptly to each request. With the proposed system, staff can find the files that satisfy their needs within an interactive environment.

1.3 Aim

The aim of this research project is to provide a suitable searching method using data mining techniques for an e-filing web-based system.

1.4 Objective of the Research

To achieve the aim of the project above, the research is divided into four objectives:

a. To identify the requirements for e-filing from Majlis Daerah Kerian.
b. To identify the searching method based on data mining techniques.
c. To design the e-filing web-based system.
d. To demonstrate the e-filing web-based system using the identified data mining technique.

1.5 Significance of Research

The significance of this development is that the system can be used by staff in Majlis Daerah Kerian. E-filing will act as an information center where staff can gather information about the status of files. It also provides staff with an interactive environment for choosing the files that meet their requirements.

1.6 Scope of Study

The e-filing web-based system is developed using PHP with a MySQL database. The development is for Majlis Daerah Kerian, Parit Buntar, Perak and focuses on filing management only. It is a web-based application that can be accessed via a browser and will be used internally by Majlis Daerah Kerian's employees.

1.7 Limitation

The most important task in this study is gathering information from the staff in Majlis Daerah Kerian who are involved in filing management. This is done through interviews, which require arranging schedules and finding the right interviewees in order to hold proper and effective interview sessions.

Interview time is the main constraint, because the researcher has to reschedule whenever an interviewee cancels a session. This makes it difficult for the researcher to gather all of the information, and some important information may be missed. The interview sessions were conducted at Majlis Daerah Kerian, Parit Buntar, Perak.

Another limitation is that there are three different data mining techniques, and the researcher must select the one that best suits the objective. The researcher needs to study each data mining technique properly and find related journals that support the findings.

Next, a large number of data mining tools are available, but not all tools support every kind of data mining technique. The researcher therefore needs to study the tools based on their functions and their usability with the selected technique. Furthermore, the tool used in this research is new to the researcher, so time is required to become familiar with it.

The experience of the researcher is another limiting factor, as this is the researcher's first research project. However, the researcher can learn and obtain proper guidance from the research plan and from the instructions of the supervisor and examiner.

1.8 Outcomes/Deliverables

The outcome of the research project is a suitable searching method using a data mining technique for an e-filing web-based system.

1.9 Layout of Dissertation

This research project has both a theoretical and a practical part. The theoretical part describes the concepts and the literature on e-filing and data mining techniques. The practical part consists of an analysis of data gathered from the interview sessions and secondary data from the literature review.

The remaining chapters of this research are:

Chapter 2 presents the literature review on e-filing and data mining techniques. This literature acts as a reference for the research project.

Chapter 3 describes the research approach and methodology used in this research project. The choice of method, how data is gathered and the strategy used to perform an analysis of the data are explained.

Chapter 4 discusses the construction of the system's prototype.

Chapter 5 discusses the findings and the analysis from the interview sessions and secondary data.

Chapter 6 provides the conclusion and recommendations for further research.

1.10 Summary

This chapter explains the background of the problem and its proposed solution together with a brief explanation of the solution. The important aspects of the project, such as the research background, objectives, scope and significance, are included in this chapter. The methodology diagram shown in Figure 3.1 in Chapter 3 and the other contents of this chapter will be used in the following chapters as the basis for direction.

The next chapter discusses the literature review for the research project.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

This chapter describes in detail the literature that supports the research project. The literature review also clarifies the relationship between this study and previous work conducted on the topic. This chapter covers an overview of e-filing and data mining, a brief explanation of each data mining technique and the steps in selecting data mining tools.

2.2 E-Filing

2.2.1 Introduction to E-Filing

E-filing provides access to a large database that consists of a list of electronic files. According to Olson et al. (2003), e-filing is a highly secure and reliable method for sending, receiving and managing legal documents and case information. However, the rules for implementing e-filing need to be fully understood in order to achieve the best filing practice.

2.2.2 Purposes of the Rules in E-Filing

According to Olson et al. (2003), there are several reasons why rules are important for electronic filing:

To define the electronic filing system: Electronic filing and service can mean many different things, so the types of files involved must be clearly defined in order to provide guidance on where and how to access them.

To authorize electronic filing and service: Rules of procedure are very specific when it comes to defining the mechanical rules of filing. The valid methods for delivering documents into the right files need to be identified.

To clearly specify the procedural mechanics: How to file electronically, security, service and filing deadlines, and how to sign documents electronically should be specified simply in order to avoid complexity.

To encourage use of electronic filing: Electronic filing is new to some people, and training is one way to encourage them to use the system.

2.2.3 Proposed Model Rules for E-Filing

According to Olson et al. (2003), the rules below may be cited as e-filing rules:

a. Short title
b. Clear definitions of files
c. Give authority
d. Determine authorized users
e. Give effective date
f. Signature to identify responsible user

2.3 What is Data Mining?

2.3.1 Definition of Data Mining

According to Phyu (2009), data mining is the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets.

According to Chen, Han and Yu (1996), data mining which is also referred to as knowledge discovery in databases, means a process of nontrivial extraction of implicit, previously unknown and potentially useful information (such as knowledge rules, constraints, regularities) from data in databases.

Tang, Steinbach and Kumar (2006) stated that data mining is the process of automatically discovering useful information in large database repositories. Data mining techniques are deployed to scour large database in order to find novel and useful patterns that might otherwise remain unknown.

Many other terms found in articles and journals carry a similar or slightly different meaning, such as knowledge mining from databases, knowledge extraction, data archaeology, data dredging and data analysis.

2.3.2 Data Mining and Knowledge Discovery

Data mining is an integral part of knowledge discovery in database, which is the overall process of converting raw data into useful information, as shown in Figure 2.1. This process consists of a series of transformation steps, from data preprocessing to postprocessing of data mining results. (Tang et al., 2006)

Figure 2.1 : The Process of knowledge discovery in database.

Tang et al. (2006) stated that the input data can be stored in a variety of formats (flat files, spread-sheets, or relational tables) and may reside in a centralized data repository or be distributed across multiple sites. The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis. The steps involved in data preprocessing include fusing data from multiple sources, cleaning data to remove noise and duplicate observations, and selecting records and features that are relevant to the data mining task at hand. Because of the many ways data can be collected and stored, data preprocessing is perhaps the most laborious and time-consuming step in the overall knowledge discovery process.
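As an illustration only, the preprocessing steps described above (fusing sources, removing noise and duplicates, and selecting relevant features) can be sketched in Python. Python is used here purely for illustration, since the prototype itself is developed in PHP. The function name preprocess, the record fields (file_no, title, shelf) and the sample data are hypothetical and are not taken from Majlis Daerah Kerian.

```python
from typing import Dict, List

Record = Dict[str, str]


def preprocess(sources: List[List[Record]], keep_fields: List[str]) -> List[Record]:
    """Fuse several record lists, clean them, and keep only the relevant fields."""
    seen = set()
    cleaned: List[Record] = []
    for source in sources:                      # fuse data from multiple sources
        for record in source:
            file_no = record.get("file_no", "").strip().upper()
            if not file_no:                     # assumption: a record without a key is noise
                continue
            if file_no in seen:                 # drop duplicate observations
                continue
            seen.add(file_no)
            # select only the features relevant to the task at hand
            row = {field: record.get(field, "").strip() for field in keep_fields}
            row["file_no"] = file_no            # store the normalized key
            cleaned.append(row)
    return cleaned


# Hypothetical sample data from two sources.
log_book = [
    {"file_no": "mdk/01", "title": "Parking licence ", "shelf": "A1"},
    {"file_no": "", "title": "unreadable entry", "shelf": ""},
]
department_list = [
    {"file_no": "MDK/01", "title": "Parking licence", "shelf": "A1"},
    {"file_no": "MDK/02", "title": "Building plan", "shelf": "B3"},
]

records = preprocess([log_book, department_list], keep_fields=["file_no", "title"])
print(records)
```

In this sketch the file number is normalized before duplicates are compared, so the same file recorded with different letter case in two sources is kept only once.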

Closing the loop is the phrase often used to refer to the process of integrating data mining results into decision support systems. For example, in business applications, the insights offered by data mining results can be integrated with campaign management tools so that effective marketing promotions can be conducted and tested. Such integration requires a postprocessing step that ensures that only valid and useful results are incorporated into the decision support system. Statistical measures or hypothesis testing methods can also be applied during postprocessing to eliminate spurious data mining results.


According to Shyu, Chen and Haruechaiyasak (2005), data mining or knowledge discovery in databases has emerged recently as an active research area for extracting implicit, previously unknown, and potentially useful information from large databases. They apply data mining techniques in the information retrieval (IR) context, specifically as information filtering tools for the recommender system framework.

The overall process for designing and implementing a recommender system is illustrated in Figure 2.2. The process involves the following five steps.

Figure 2.2 : Process for designing and implementing a recommender system (Shyu et al., 2005)

Data Collection: This initial step involves the collection of data sets for executing the data mining algorithms. Three data components are considered: (a) textual content (i.e., index terms or keywords), (b) link structure (embedded hyperlinks within Web pages), and (c) user log records.

Data Preprocessing: This step is required to clean and transform the collected data sets into formats that are suitable for the data mining algorithms. This step includes data reduction and selection techniques to improve the efficiency of the data mining algorithms.

Information Filtering via Data Mining: This step is the core process of the recommender system framework, where the data sets are analyzed and the data mining algorithms are applied as the information filtering tools to generate and discover any useful and interesting recommended outputs.

Database Design and Implementation: This step aims to improve the efficiency of data and information access and retrieval.

User Interface Design and Implementation: The user interface acts as an intermediary between the users and the recommender system. This step involves the design and implementation of a Web (i.e., HTTP) server which receives the users' requests via the WWW, processes the requests by accessing the database, and responds by returning the results to the users. The user interface provides a recommendation function with a user personalization technique by requiring each user to log into the system in order to keep track of preferences.

2.3.3 Challenges of Data Mining

According to Tang et al. (2006), traditional data analysis techniques have often encountered practical difficulties in meeting the challenges posed by new data sets.

Chen et al. (1996) stated that it is important to examine what kind of features an applied knowledge discovery system is expected to have and what kind of challenges may be faced in the development of data mining techniques.

Chen et al. (1996) also listed the challenges faced during the development of data mining techniques:

a. Handling of different types of data. There are many kinds of data and databases used in different applications, so a knowledge discovery system should be able to perform effective data mining on different kinds of data. Since most available databases are relational, it is crucial that a data mining system performs effective knowledge discovery on relational data. In addition, many databases contain complex data types, such as structured data and complex data objects, hypertext and multimedia data, spatial and temporal data, transaction data, legacy data and so on, so a powerful system should be able to perform efficient data mining on complex types of data as well. However, to cope with the diversity of data types, a data mining system may be dedicated to specific kinds of data, such as systems for knowledge mining in relational databases, transaction databases, spatial databases, multimedia databases and so on.

b. Efficiency and scalability of data mining algorithms. In order to extract information from a large amount of data in databases, the knowledge discovery algorithms must be efficient and scalable. The running time of a data mining algorithm must be predictable and acceptable for large databases.

c. Usefulness, certainty, and expressiveness of data mining results. The discovered knowledge should accurately portray the contents of the database and be useful for certain applications. This also encourages a systematic study of measuring the quality of the discovered knowledge, including interestingness and reliability, by the construction of statistical, analytical and simulative models and tools.

d. Expression of various kinds of data mining requests and results. Different kinds of knowledge can be discovered from a large amount of data. It is important to discover knowledge from different views and to present it in different forms. This requires expressing both the data mining requests and the discovered knowledge in high-level languages or graphical user interfaces, so that the data mining task can be specified by non-experts and the discovered knowledge is understandable and directly usable by users.

e. Interactive mining of knowledge at multiple abstraction levels. A high-level data mining query should be treated as a probe which may disclose some interesting traces for further exploration. Interactive discovery allows users to interactively refine a data mining request, dynamically change data focusing, progressively deepen a data mining process and flexibly view the data and data mining results at multiple abstraction levels and from different angles.

f. Mining information from different sources of data. Many sources of data are available through local and wide-area computer networks, including the Internet. Mining knowledge from different sources, whether formatted or unformatted and with diverse data, poses a new challenge to data mining. Data mining may help by revealing regularities that simple query systems can hardly discover.


g. Protection of privacy and data security. Protecting data security and guarding against the invasion of privacy are important when data can be viewed from many different angles and at different abstraction levels. Security measures can prevent the disclosure of sensitive information.

However, these requirements may cause conflict. For example, protection of data security may conflict with the requirements of interactive mining of multiple-level knowledge from different angles.

2.4 Data Mining Techniques

2.4.1 Overview of Data Mining Techniques

According to Garofalakis et al. (1999), data mining techniques describe key data mining algorithms that have been developed for large databases.

Garofalakis et al. (1999) also stated the popular data mining techniques which are association rules, classification and clustering.

2.4.2 Classifying Data Mining Techniques

Chen et al. (1996) stated that data mining techniques can be classified according to the following schemes:

Type of databases to work on: A data mining system can be classified according to the kinds of databases on which the data mining is performed. It is important to identify the data type in order to specify the area in which the system will operate. For example, a system is a relational data miner if it discovers knowledge from relational data, or an object-oriented one if it mines knowledge from object-oriented databases. In general, a data miner can be classified according to its mining of knowledge from the following different kinds of databases: relational databases, transaction databases, object-oriented databases, deductive databases, spatial databases, temporal databases, multimedia databases, heterogeneous databases, active databases, legacy databases, and the Internet information-base.

Type of knowledge to be mined: Data miners can discover several kinds of knowledge, including association rules, characteristic rules, classification rules, clustering and deviation analysis. However, this knowledge depends on the abstraction level of the databases.

Type of techniques to be utilized: Data miners can be categorized according to the underlying data mining techniques and approach. For example, they can be categorized according to the driven method into autonomous knowledge miner, data-driven miner, query-driven miner, and interactive data miner. They can also be categorized according to the underlying data mining approach into generalization-based mining, pattern-based mining, mining based on statistics or mathematical theories, and integrated approaches.


2.4.3 Association Rules

Association rules provide a useful mechanism for discovering correlations among items belonging to customer transactions in a market basket database (Garofalakis et al., 1999). For example : given a database of sales transactions, it is desirable to discover the important associations among items such that the presence of some items in a transaction will imply the presence of other items in the same transaction.

Chen et al. (1996) stated that the problem of mining association rules can be decomposed into the following two steps:

a. Discover the large item sets.
b. Use the large item sets to generate the association rules for the database.

It is noted that the overall performance of mining association rules is determined by the first step. After the large item sets are identified, the corresponding association rules can be derived in a straightforward manner.
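The two steps can be illustrated with a minimal Python sketch. It is a brute-force illustration rather than the algorithm of any cited author: it enumerates every candidate item set instead of pruning candidates. The support and confidence thresholds, the function names and the sample transactions are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def large_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Step 1: find every item set whose support reaches min_support."""
    if not transactions:
        return {}
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result: Dict[FrozenSet[str], float] = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # support = fraction of transactions that contain the candidate
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:   # no large set of this size, so no larger one exists
            break
    return result


def association_rules(itemsets: Dict[FrozenSet[str], float],
                      min_confidence: float) -> List[Tuple[Set[str], Set[str], float]]:
    """Step 2: derive rules lhs -> rhs from the large item sets."""
    rules = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                lhs = frozenset(antecedent)
                # confidence = support(lhs and rhs together) / support(lhs)
                confidence = support / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


# Hypothetical market basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemsets = large_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.8)
print(rules)
```

In this sketch the first step scans the transactions once per candidate while the second step only divides stored support values, which is consistent with the observation that the first step determines the overall performance.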

Figure 2.3 : The general architecture of Mining Association Rule model (Defit & Md Sap, 2001)

Figure 2.3 represents the general architecture of the Mining Association Rule (MAR) model. The MAR model consists of two main modules: pre-processing and processing. The pre-processing module is used to transform data and to identify and remove inconsistent data from databases. The processing module is then executed to generate rules and evaluate the generated rules.

2.4.4 Classification

Data classification is the process which finds the common properties among a set of objects in a database and classifies them into different classes, according to a classification model. (Chen et al., 1996)

Chen et al. (1996) also stated that the objective of classification is to:

a. Analyze the training data.
b. Develop an accurate description or a model for each class using the features available in the data.

Garofalakis et al. (1999) stated that classification is useful in the Web context to build taxonomies and topic hierarchies on Web pages, and subsequently perform context-based searches for Web pages relating to a specific topic. Decision tree classifiers are popular since they are easily interpreted by humans and are efficient to build.
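The idea of analyzing training data and building a model for each class can be sketched in Python as follows. For brevity the sketch uses a nearest-centroid classifier rather than a decision tree, so it should be read as a stand-in technique and not as the method of the cited authors. The features (number of pages, number of attachments), the class labels and the sample values are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple


def train(samples: List[Tuple[Sequence[float], str]]) -> Dict[str, List[float]]:
    """Analyze the training data and describe each class by its mean feature vector."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        # column-wise mean of all training rows that carry this label
        model[label] = [sum(column) / len(rows) for column in zip(*rows)]
    return model


def classify(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign a new object to the class whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (pages, attachments) -> document class.
training_data = [
    ((2.0, 0.0), "memo"),
    ((3.0, 1.0), "memo"),
    ((40.0, 5.0), "report"),
    ((50.0, 7.0), "report"),
]
model = train(training_data)
print(classify(model, (4.0, 1.0)))
print(classify(model, (38.0, 4.0)))
```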


Figure 2.4 : Hierarchical Classification Process (Khodra & Widyantoro, 2007)

Figure 2.4 shows the hierarchical classification process, which consists of two stages: an offline stage and an online stage. The offline stage encodes classification scheme metadata for each web page. In the online stage, all search results are hierarchically categorized using the classification scheme provided in the metadata of the retrieved documents.

The classification scheme is a total ordering of classes from the most general (i.e. the root of the ontology) to the most specific class (i.e. a leaf of the ontology). Khodra and Widyantoro (2007) use Lucene as the search engine and combine it with an interactive navigation interface generator that uses this hierarchical structure to present the list of search results hierarchically.
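The online stage can be illustrated with a small Python sketch that groups search results into a nested structure by following each result's class path from the most general to the most specific class. The sketch does not use Lucene; the function name, the "_documents" key and the sample results are hypothetical and serve only to show the grouping idea.

```python
from typing import Dict, List, Tuple


def group_by_class_path(results: List[Tuple[str, List[str]]]) -> Dict:
    """Build a nested dictionary of classes from (title, class path) pairs."""
    tree: Dict = {}
    for title, class_path in results:
        node = tree
        for class_name in class_path:       # walk from the root class to the leaf class
            node = node.setdefault(class_name, {})
        # documents are stored under a reserved key at their most specific class
        node.setdefault("_documents", []).append(title)
    return tree


# Hypothetical search results with the class path stored in their metadata.
search_results = [
    ("Parking permit form", ["Licence", "Parking"]),
    ("Hawker licence renewal", ["Licence", "Business"]),
    ("Parking fines 2010", ["Licence", "Parking"]),
]
tree = group_by_class_path(search_results)
print(tree)
```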


2.4.5 Clustering

Visnick (2003) defined clustering as a technique to achieve high data density. She classifies clustering into different techniques, namely isolate index, object pooling and object modeling, each of which serves a different function.

Chen et al. (1996) defined clustering as the process of grouping physical or abstract objects into classes of similar objects. It helps the data miner to construct a meaningful partitioning of a large set of objects based on a divide-and-conquer methodology, which decomposes a large-scale system into smaller components to simplify design and implementation.

Garofalakis et al. (1999) defined clustering as a useful technique for discovering interesting data distributions and patterns in the underlying data.

Qiu, Davis and Ikem (2004) stated that clustering techniques are heuristic in nature. Almost all techniques have a number of arbitrary parameters that can be adjusted to improve results.

Clustering techniques fall into the following broad categories :

a. Hierarchical vs partitional: Hierarchical techniques produce a nested sequence of partitions, with a single, all-inclusive cluster at the top and singleton clusters of individual instances at the bottom. Each intermediate level can be viewed as a combination of two clusters from the next lower level, or a split of a cluster from the next higher level into two. Partitional (or non-nested) techniques create a one-level partitioning of the data instances. After the user specifies the desired number of clusters, a partitional approach typically finds all clusters at once. This is in contrast to traditional hierarchical schemes, which bisect a cluster to get two clusters or merge two clusters to get one.

b. Divisive vs agglomerative: Hierarchical clustering techniques proceed either from the top to the bottom or from the bottom to the top, i.e. clustering starts with one large cluster and splits it, or starts with clusters each containing a point and then merges them.

c. Incremental vs non-incremental: Some clustering techniques work with one instance at a time and decide how to place it into an appropriate cluster, but most clustering techniques are non-incremental, using information about all the instances at once to form clusters.
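As an illustration of the agglomerative (bottom-up) case, the following Python sketch starts with one cluster per point and repeatedly merges the two closest clusters until the requested number remains. Single linkage with Euclidean distance is assumed here as the merging criterion; the categories above do not prescribe one, and the sample points are hypothetical.

```python
from math import dist
from typing import List, Sequence

Point = Sequence[float]


def agglomerative(points: List[Point], num_clusters: int) -> List[List[Point]]:
    """Merge singleton clusters bottom-up until num_clusters remain."""
    num_clusters = max(1, num_clusters)
    clusters = [[p] for p in points]            # start with one cluster per point
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the two closest members
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])         # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical two-dimensional points forming two visible groups.
points = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.5, 8.0), (1.0, 1.5)]
clusters = agglomerative(points, num_clusters=2)
print(clusters)
```

Recording the sequence of merges instead of stopping at a fixed number of clusters would give the nested sequence of partitions that characterizes hierarchical techniques.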

Typical pattern clustering activity involves the following steps (Jain, Murty and Flynn, 2000):

a. Pattern representation (optionally including feature extraction and/or selection)
b. Definition of a pattern proximity measure appropriate to the data domain
c. Clustering or grouping
d. Data abstraction (if needed)
e. Assessment of output (if needed)

Figure 2.5 : Stages in clustering (Jain et al., 2000)


Figure 2.5 depicts a typical sequencing of the first three of these steps, including a feedback path where the grouping process output could affect subsequent feature extraction and similarity computations. Pattern representation refers to the number of classes, the number of available patterns, and the number, type, and scale of the features available to the clustering algorithm. Some of this information may not be controllable by the practitioner. Feature selection is the process of identifying the most effective subset of the original features to use in clustering. Feature extraction is the use of one or more transformations of the input features to produce new salient features. Either or both of these techniques can be used to obtain an appropriate set of features to use in clustering. Pattern proximity is usually measured by a distance function defined on pairs of patterns. A variety of distance measures are in use in the various communities. A simple distance measure like Euclidean distance can often be used to reflect dissimilarity between two patterns, whereas other similarity measures can be used to characterize the conceptual similarity between patterns. The grouping step can be performed in a number of ways. The output clustering (or clusterings) can be hard (a partition of the data into groups) or fuzzy (where each pattern has a variable degree of membership in each of the output clusters).
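The difference between a hard and a fuzzy output can be sketched in Python using Euclidean distance as the proximity measure. The inverse-distance weighting used for the fuzzy memberships is an assumption chosen for simplicity and is not the formula of any particular fuzzy clustering algorithm; the cluster centers and the pattern are hypothetical.

```python
from math import dist
from typing import List, Sequence


def hard_assignment(pattern: Sequence[float], centers: List[Sequence[float]]) -> int:
    """Hard clustering: the pattern belongs to exactly one cluster (the nearest)."""
    return min(range(len(centers)), key=lambda k: dist(pattern, centers[k]))


def fuzzy_membership(pattern: Sequence[float], centers: List[Sequence[float]]) -> List[float]:
    """Fuzzy clustering: a degree of membership in every cluster, summing to 1."""
    distances = [dist(pattern, c) for c in centers]
    zeros = [d == 0.0 for d in distances]
    if any(zeros):                              # pattern coincides with a center
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    weights = [1.0 / d for d in distances]      # closer centers get larger weights
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical cluster centers and one pattern.
centers = [(0.0, 0.0), (4.0, 0.0)]
pattern = (1.0, 0.0)
print(hard_assignment(pattern, centers))
print(fuzzy_membership(pattern, centers))
```

For this pattern the hard assignment selects the first cluster, while the fuzzy memberships are approximately 0.75 and 0.25.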

Hierarchical clustering algorithms produce a nested series of partitions based on a criterion for merging or splitting clusters based on similarity. Partitional clustering algorithms identify the partition that optimizes (usually locally) a clustering criterion.
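A partitional algorithm can be sketched in Python with k-means, which is named here as one common example and is not specified in the text above. It alternates between assigning points to the nearest center and recomputing each center, and it stops at a partition that is locally optimal for the squared-distance criterion. The starting centers are passed in explicitly so that the example is deterministic; the points and starting centers are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: List[Point], centers: List[Sequence[float]],
            max_iterations: int = 100) -> Tuple[List[Point], List[List[Point]]]:
    """Return the final centers and the one-level partition of the points."""
    centers = [tuple(c) for c in centers]
    groups: List[List[Point]] = [[] for _ in centers]
    for _ in range(max_iterations):
        groups = [[] for _ in centers]
        for p in points:                        # assignment step
            nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            groups[nearest].append(p)
        new_centers = [                         # update step: mean of each group
            tuple(sum(column) / len(g) for column in zip(*g)) if g else centers[k]
            for k, g in enumerate(groups)
        ]
        if new_centers == centers:              # no change: a local optimum is reached
            break
        centers = new_centers
    return centers, groups


# Hypothetical points and starting centers.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
final_centers, groups = k_means(points, centers=[(0.0, 0.0), (5.0, 5.0)])
print(final_centers)
print(groups)
```

A different choice of starting centers can lead to a different partition, which is why the optimum is described as local.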

2.5 Selecting Data Mining Techniques

It is important for the researcher to select a suitable searching method using data mining techniques in order to accomplish the objective. The researcher decided to review three main data mining techniques: classification, association and clustering. These techniques serve the same overall objective of data mining, but differ in their functions and their suitability for the system.

According to Tang et al. (2006), data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data. It has also opened up exciting opportunities for exploring and analyzing new types of data and for analyzing old types of data in new ways.

Classification, which is the task of assigning objects to one of several predefined categories, is a pervasive problem that encompasses many diverse applications. Examples include detecting spam email messages based upon the message header and content, categorizing cells as malignant or benign based upon the results of MRI scans and classifying galaxies based upon their shapes. (Tang et al., 2006)

Association is useful for discovering interesting relationships hidden in large data sets. The uncovered relationships can be represented in the form of association rules or sets of frequent items. Many business enterprises accumulate large quantities of data from their day-to-day operations. For example, huge amounts of customer purchase data are collected daily at the checkout counters of grocery stores. Retailers are interested in analyzing the data to learn about the purchasing behavior of their customers. Such valuable information can be used to support a variety of business-related applications such as marketing promotions, inventory management and customer relationship management. Association techniques discover patterns from large transaction data sets and evaluate the discovered patterns in order to prevent the generation of spurious results. (Tang et al., 2006)

Clustering divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should capture the natural structure of the data. The concept of clustering has been around for a long time. It has several applications, particularly in the context of information retrieval and in organizing web resources. The main purpose of clustering is to locate information and, in the present-day context, to locate the most relevant electronic resources. In database management, data clustering is a technique in which information that is logically similar is physically stored together. In order to increase the efficiency of search and retrieval in database management, the number of disk accesses is to be minimized. In clustering, since objects with similar properties are placed in one class of objects, a single access to the disk can retrieve the entire class. If the clustering takes place in some abstract algorithmic space, a population may be grouped into subsets with similar characteristics, and the problem space can then be reduced by acting on only a representative from each subset. Clustering is ultimately a process of reducing a mountain of data to manageable piles. For example, clustering can be used to analyze the large amounts of genetic information that are now available, group search results into a small number of clusters, identify different types of depression and segment customers into a small number of groups for additional analysis and marketing activities. (Ravichandra, 2003)
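The idea of acting on one representative per cluster can be sketched in Python for a file search. A query is compared only with the representative of each cluster, and the whole best-matching cluster is then returned. The Jaccard word-overlap measure, the cluster names, the representative titles and the file numbers are all hypothetical choices made for this illustration.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Split a title or query into a set of lower-case words."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query: str, clusters: Dict[str, dict]) -> List[str]:
    """Compare the query with one representative per cluster, then return that cluster."""
    if not clusters:
        return []
    query_tokens = tokens(query)
    scores = {
        name: jaccard(query_tokens, tokens(info["representative"]))
        for name, info in clusters.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:                     # nothing resembles the query
        return []
    return clusters[best]["files"]


# Hypothetical clusters of files, each summarized by a representative title.
file_clusters = {
    "licence": {
        "representative": "business licence application",
        "files": ["MDK/L/01", "MDK/L/02"],
    },
    "building": {
        "representative": "building plan approval",
        "files": ["MDK/B/07"],
    },
}
print(search("renew business licence", file_clusters))
```

Only two comparisons are made in this example regardless of how many files each cluster holds, which is the reduction of the problem space described above.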

However, it is important for the researcher to identify the suitability of each technique in order to implement a good searching method. The researcher reviewed the techniques based on their definition, concept, functions, suitability and examples given in several journals. (Refer to Table 2.1)


Table 2.1 : Differences of Classification, Association and Clustering techniques


Differences | Classification | Association | Clustering

Definition | Data classification is the process which finds the common properties among a set of objects in a database and classifies them into different classes, according to a classification model. (Chen et al., 1996) | Association rules provide a useful mechanism for discovering correlations among items belonging to customer transactions in a market basket database (Garofalakis et al., 1999) | Clustering as the process of grouping physical or abstract objects into classes of similar objects. (Chen et al., 1996)

Concept | Classification, which is the task of assigning objects to one of several predefined categories, is a pervasive problem that encompasses many diverse applications. (Tang et al., 2006) | Association is useful for discovering interesting relationships hidden in large data sets. (Tang et al., 2006) | Cluster divides data into groups (clusters) that are meaningful, useful, or both. (Ravichandra, 2003)

Functions | Classification is useful in the Web context to build taxonomies and topic hierarchies on Web pages, and subsequently perform context-based searches for Web pages relating to a specific topic. (Garofalakis et al., 1999) | It will discover the patterns from a large transaction data and evaluating the discovered patterns in order to prevent the generation of spurious results. (Tang et al., 2006) | It helps data miner to construct meaningful partitioning of a large set of objects based on a divide and conquer methodology which decomposes a large scale system into smaller components to simplify design and implementation. (Chen et al., 1996)

Suitability | Develop an accurate description or a model for each class using the features available in the data. (Chen et al., 1996) | Discovering correlations among items belonging to customer transactions in a market basket database for market analysis (Garofalakis et al., 1999). | Increase the efficiency of search and the retrieval in database management. (Ravichandra, 2003)

Examples | Detecting spam email messages based upon the message header and content, categorizing cells as malignant or benign based upon the results of MRI scans and classifying galaxies based upon their shapes. (Tang et al., 2006) | Huge amounts of customer purchase data are collected daily at the checkout counters of grocery stores. Retailers are interested in analyzing the data to learn about the purchasing behavior of their customers. (Tang et al., 2006) | Analyze the large amounts of genetic information that are now available, group the search result into a small number of clusters, identify different types of depression and to segment customers into a small number of groups for additional analysis and marketing activities. (Ravichandra, 2003)

According to the comparison above, after reviewing each technique based on its definition, concept, functions, suitability and examples given in several journals, the researcher found that clustering is the most suitable searching method for the e-filing web-based system.


Although classification, association and clustering are similar in terms of information retrieval, they differ in how the information is retrieved, analyzed and delivered. Classification assigns objects to one of several predefined categories in order to develop a model for each class using the features available in the data. Association is useful for discovering correlations among data in order to identify interesting relationships hidden in large data sets, especially for market analysis. Clustering, however, groups physical or abstract objects into lists of similar objects to provide a simplified list of data. In other words, it divides data into groups that are similar, meaningful and useful.

This is because partitioning a large set of data by clustering decomposes a large search result into smaller components to simplify the content. It helps users to review accurate search results that fulfill their needs and expectations.

In terms of suitability, clustering increases the efficiency of search and retrieval of information in database management (Ravichandra, 2003). It analyzes the search results to identify similarities between them and provides a simplified list of results. Further information on why clustering is a suitable searching method for the e-filing web-based system is discussed in Chapter 5 (Result and Findings).


2.6

Selecting Data Mining Tools

Data mining tools are used widely to solve real-world problems in engineering, science and business. (Abbott, Matkovsky & Elder, 1998)

Nowadays, the number of data mining tools is increasing, and selecting an effective tool has become more challenging. The data mining tool market has become more crowded in recent years, with more than 50 commercial data mining tools listed on the KDnuggets website (http://www.kdnuggets.com). KDnuggets.com has been the Data Mining Community's Top Resource since 1997 for data mining and analytics news, tools, jobs, courses, data and more.

Collier, Carey, Sautter and Marjaniemi (1999) proposed four categories of criteria for selecting from among the assortment of commercially available data mining tools. Three of them are described below:

a. Performance

Performance is the ability to handle a variety of data sources in an efficient manner (Table 2.2). From a computational perspective, hardware configuration has a major impact on tool performance. In addition, some data mining algorithms are more efficient than others. However, this category focuses on the qualitative aspects of a tool's ability to easily handle data under a variety of hardware configurations. The criteria to consider are platform variety, software architecture, heterogeneous data access, data size, efficiency, interoperability and robustness.


Table 2.2 : Computational Performance Criteria (Collier et al., 1999)


Criteria | Description

Platform Variety | Does the software run on a wide-variety of computer platforms? More importantly, does it run on typical business user platforms?

Software Architecture | Does the software use client-server architecture or a stand-alone architecture? Does the user have a choice of architectures?

Heterogeneous Data Access | How well does the software interface with a variety of data sources (RDBMS, ODBC, CORBA, etc)? Does it require any auxiliary software to do so? Is the interface seamless?

Data Size | How well does the software scale to large data sets? Is performance linear or exponential?

Efficiency | Does the software produce results in a reasonable amount of time relative to the data size, the limitations of the algorithm, and other variables?

Interoperability | Does the tool interface with other KDD support tools easily? If so, does it use a standard architecture such as CORBA or some other proprietary API?

Robustness | Does the tool run consistently without crashing? If the tool cannot handle a data mining analysis, does it fail early or when the analysis appears to be nearly complete? Does the tool require monitoring and intervention or can it be left to run on its own?

b. Functionality

There is a variety of capabilities, techniques and methodologies for data mining (Table 2.3). Software functionality indicates how well the tool adapts to different data mining problems. The criteria in the functionality category are algorithm variety, prescribed methodology, model validation, data type flexibility, algorithm modifiability, data sampling, reporting, model exporting, user interface, learning curve, user types, data visualization, error reporting, action history and domain variety.

Table 2.3 : Functionality Criteria (Collier et al., 1999)


Criteria | Description

Algorithmic Variety | Does the software provide an adequate variety of mining techniques and algorithms including neural networks, rule induction, decision trees, clustering, etc.?

Prescribed Methodology | Does the software aid the user by presenting a sound, step-by-step mining methodology to help avoid spurious results?

Model Validation | Does the tool support model validation in addition to model creation? Does the tool encourage validation as part of the methodology?

Data Type Flexibility | Does the implementation of the supported algorithms handle a wide-variety of data types, continuous data without binning, etc.?

Algorithm Modifiability | Does the user have the ability to modify and fine-tune the modeling algorithms?

Data Sampling | Does the tool allow random sampling of data for predictive modeling?

Reporting | Are the results of a mining analysis reported in a variety of ways? Does the tool provide summary results as well as detailed results? Does the tool select actual data records that fit a target profile?

Model Exporting | After a model is validated does the tool provide a variety of ways to export the tool for ongoing use (e.g., C program, SQL, etc.)?

c. Usability

Usability concerns how well the tool accommodates different levels and types of users (Table 2.4). One problem with easy-to-use mining tools is their potential misuse. The criteria to consider are data cleansing, value substitution, data filtering, binning, deriving attributes, randomization, record deletion, handling blanks, metadata manipulation and result feedback.

Table 2.4 : Usability Criteria (Collier et al., 1999)

Criteria | Description

User Interface | Is the user interface easy to navigate and uncomplicated? Does the interface present results in a meaningful way?

Learning Curve | Is the tool easy to learn? Is the tool easy to use correctly?

User Types | Is the tool designed for beginning, intermediate, advanced users or a combination of user types? How well suited is the tool for its target user type? How easy is the tool for analysts to use? How easy is the tool for business (end) users to use?

Data Visualization | How well does the tool present the data? How well does the tool present the modeling results? Are there a variety of graphical methods used to communicate information?

Error Reporting | How meaningful is the error reporting? How well do error messages help the user debug problems? How well does the tool accommodate errors or spurious model building?

Action History | Does the tool maintain a history of actions taken in the mining process? Can the user modify parts of this history and re-execute the script?

Domain Variety | Can the tool be used in a variety of different industries to help solve a variety of different kinds of business problems? How well does the tool focus on one problem domain? How well does it focus on a variety of domains?

Data mining tools are costly and generally come with a moderately steep learning curve. Selecting the wrong tool is expensive in terms of both wasted money and wasted time. These categories for selecting data mining tools help practitioners avoid spending much time only to discover that a particular tool does not provide the necessary solution. (Collier et al., 1999)

Bialynicka (2008) stated that the data mining tools suited to clustering include Scatter, Grouper, Carrot and Vivisimo.

Scatter is designed for browsing and supports online clustering based on two novel clustering algorithms, buckshot and fractionation. Buckshot is fast and suited to online clustering, while fractionation is accurate and suited to offline initial clustering of the entire set. (Bialynicka, 2008)

Grouper is suitable for online purposes and operates on query result snippets. It clusters together documents with large common subphrases. (Bialynicka, 2008)
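The idea of grouping snippets by the phrases they share can be sketched as below. This is a deliberately simplified illustration that indexes two-word phrases; it is not Grouper's actual algorithm, and the snippets, the function names and the parameters n and min_docs are assumptions made for the example.

```python
from collections import defaultdict


def phrases(text, n=2):
    """Set of n-word phrases (word n-grams) that occur in a snippet."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def group_by_shared_phrase(snippets, n=2, min_docs=2):
    """Map each phrase found in at least min_docs snippets to those snippet indices."""
    index = defaultdict(list)
    for doc_id, snippet in enumerate(snippets):
        for phrase in phrases(snippet, n):
            index[phrase].append(doc_id)
    return {phrase: ids for phrase, ids in index.items() if len(ids) >= min_docs}


# Hypothetical search result snippets (illustrative only).
snippets = [
    "annual assessment tax notice",
    "assessment tax appeal form",
    "business license renewal form",
    "business license application",
]

groups = group_by_shared_phrase(snippets)
for phrase, ids in sorted(groups.items()):
    print(phrase, ids)
```

Here the shared phrase doubles as a readable label for each group, which is one reason phrase-based grouping is attractive for presenting search results to users.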

Carrot is a component framework that allows components to be substituted for input (from other search engines), filtering (stemming, distance measure and clustering) and output of the result. (Bialynicka, 2008)

Vivisimo is a commercial online clustering tool that supports hierarchical and conceptual clustering techniques. (Bialynicka, 2008)

However, for this research project, the researcher used Carrot, a free tool that is available for learning purposes. Carrot2 is an open source search results clustering engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories. (Carrot, 2010) Apart from two specialized document clustering algorithms, Carrot2 offers ready-to-use components for fetching search results from various sources including Yahoo API, Google API, MSN Live API, eTools Meta Search, Lucene, SOLR, Google Desktop and more. Carrot2 is implemented in Java, but it easily integrates with non-Java software, such as PHP, Ruby or C#. (Carrot, 2010)


2.7

Summary

This chapter provides an overview of e-filing and data mining techniques based on a literature review of several journals. Rules in e-filing, an overview of data mining and challenges in data mining are discussed. The researcher also reviewed three basic data mining techniques, which are classification, association and clustering. The researcher then compared them and selected the suitable data mining technique for the searching method in the e-filing web-based system (refer to Table 2.1). Based on the comparison in Table 2.1, the researcher found that clustering is the suitable searching method for the e-filing web-based system. In addition, after reviewing several journals regarding data mining tools, the researcher chose Carrot, a free open source search results clustering engine that is available for learning purposes, for this research project.

The next chapter discusses the research approach and the methodology for the research project.


CHAPTER 3

RESEARCH APPROACH AND METHODOLOGY

3.1

Introduction

This chapter describes the methodology and approaches that were used in the research, from problem identification to development of the system. To achieve the objectives of this project, the right approach must be applied. This research used five major steps to develop the prototype of the e-filing web-based system using data mining techniques: problem identification and planning, requirement gathering, requirement analysis, design model and develop prototype. An overview of this methodology is shown in Figure 3.1.

Figure 3.1 : Overview of Research Approach and Methodology



3.2

Problem Identification and Planning

This phase identifies the goal, scope, budget, schedule, technology and system development process, methods and tools to ensure that everything is in the right place. However, the plan depends on what the researcher decides according to the stakeholder requirements.

Before planning the project, the researcher should know the current situation and the problems of the old system. Understanding the potential problems is essential to making the development successful. After the researcher identifies the problems, the scope of the project is defined. The goal must be determined, and the objectives of the project must address the problems that have been identified. After analyzing all the problems and identifying what tasks need to be done, a measurable and achievable project plan is scheduled using Microsoft Project.

For this research, Microsoft Project is used to produce a Gantt chart (refer to Appendix A - Project Planning) as a guideline for the researcher in order to finish the project. This phase involves the following steps:

a. Discuss the current problem with staff at Majlis Daerah Kerian

The current problems need to be identified so that they can be solved in the next tasks.

b. Identify goal, objective, scope, and significance of research

The goal, objective, scope and significance of the research need to be clearly defined.

c. Plan related task

The related tasks are planned and scheduled using Microsoft Project. Time must be allocated carefully and every task must be stated to ensure the completion of the research.

3.3

Requirement Gathering

Requirement gathering is the process of gathering all the information needed to develop the system. In this analysis phase, a method of data collection was applied. This phase identifies the concepts and requirements that will be applied in developing the e-filing web-based system. For this research, there are two types of data collection, which are:

3.3.1 Primary Data


Primary data is gathered from original sources such as interviews, questionnaires and observation. For this research, the researcher used data from interviews with staff at Majlis Daerah Kerian. Interviewing is a technique used to gain detailed information regarding the subject of interest of this research. This includes the software and hardware used and the problems that arise in the current system, so that requirements can be identified.

Table 3.1 below shows information on the people involved in the interview sessions for gathering the requirements of the e-filing web-based system.

Table 3.1 : Information on people involved in the interview

Respondent Name | Department

Mr. Gobibaskaran A/L Govindaraju | Head of IT Department, Majlis Daerah Kerian.

Puan Shalina Mat Piah | Administrative Assistant, Majlis Daerah Kerian.

The main advantage of interviews is that the answers of the interviewees are more spontaneous, without extended reflection. Interviews can be conducted using a top-down approach where the interviewer starts with a general question and progresses to specific questions about tasks. Interviews should be planned in advance by defining a set of interview questions to be asked. This not only helps to ensure consistency between interviews conducted with different interviewees but also helps to focus on the purpose of the interview session.

The deliverable of this activity is the set of identified requirements needed for the e-filing web-based system.

3.3.2 Secondary Data

The secondary data for this research is collected from resources such as articles, journals, books and other related academic publications about e-filing and data mining. It is important for gaining a deeper understanding of e-filing and data mining.

3.4

Requirement Analysis

This is the next stage after all data has been collected in the requirement gathering phase. The primary data collected needs to be analyzed to define the system requirements for developing the e-filing web-based system. The collected data needs to be studied and analyzed properly in order to have accurate, reliable and relevant information during the development. These requirements helped the researcher to identify the use cases that produce the system functions, and the researcher finally produced the Software Requirement Specification (SRS) documentation.

In addition, the secondary data collected during the requirement gathering phase is useful for identifying a suitable searching method using data mining techniques. The researcher compared three popular data mining techniques (association, classification and clustering) in order to identify a suitable searching method from among them. The tool used during this phase is Rational Rose.

3.5

Design Model
The model is designed and determined before proceeding with the actual construction of the database and system. The system interface, classes, objects and their relations are designed using Rational Rose. All diagrams related to this research, including the class diagram, use case diagram and sequence diagram, are designed based on the results of the requirement analysis phase.

After all the objects and classes are illustrated clearly with their attributes and methods, the database is developed. This activity is accomplished using a MySQL database. At the end of this activity, a detailed design (database model) is produced. The deliverable of this phase is documented in the Software Design Document (SDD).

3.6

Develop Prototype

The develop prototype phase is concerned with building the application using the appropriate development technologies. In this phase, the researcher develops the prototype of the e-filing web-based system using data mining techniques. Apache is used as the web server, MySQL as the database server, and PHP as the programming language for development. Dreamweaver is used as the workspace for writing program code and Carrot as the data mining tool. At the end of this phase, the e-filing prototype system using data mining techniques is produced.

3.7

Summary

The research methodology describes the research strategy that is used in this research project. For this research project, a plan of action is laid out that shows how the problem will be investigated, what information will be collected using which method and how this information will be analyzed to come to the conclusion. It consists of problem identification and planning, requirement gathering, requirement analysis, design model and develop prototype.

The methodology stated above was followed to develop the e-filing web-based system in order to achieve the project's objectives as well as to fulfill the requirements specified by the user. With an understandable and achievable methodology, the project was carried out in a proper manner and consequently completed effectively.

The next chapter discusses the construction for the research project.


CHAPTER 4

PROTOTYPE CONSTRUCTION

4.1

Introduction

This chapter explains in depth the construction of the prototype of the e-filing web-based system. It explains the results and how the project objectives are achieved.

4.2

Software Requirements

Specified below is the list of software tools that are selected during the development process. These include operating system and other applications that are compulsory for the system to be developed and deployed.

4.2.1 Software Tools


Table 4.1 : Software Tools Specifications

No. | Software | Type

1. | Windows XP SP2 | Operating System (OS)

2. | MySQL | Database Server

3. | PHP | Programming Platform

4. | Apache | Web Server

5. | Rational Rose Enterprise Edition | Unified Modeling Language Software

6. | Adobe Photoshop CS3 | Graphics Design Software

7. | Macromedia Dreamweaver MX 2004 | Workspace Software

8. | Carrot | Open source framework for building search clustering engines

4.2.2 Software Tools Installation

Referring to Table 4.1, the tools Apache, MySQL Server version 5, Rational Rose Enterprise Edition, Adobe Photoshop CS3, Macromedia Dreamweaver MX 2004 and Carrot are explained further below.

a.

Apache

The Apache HTTP Server, commonly referred to as Apache, is web server software notable for playing a key role in the initial growth of the World Wide Web. In 2009 it became the first web server software to surpass the 100 million web site milestone. Apache was the first viable alternative to the Netscape Communications Corporation web server (currently known as Sun Java System Web Server), and has since evolved to rival other Unix-based web servers in terms of functionality and performance. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Some common language interfaces support Perl, Python, Tcl, and PHP. Apache provides a variety of Multi-Processing Modules (MPMs) which allow Apache to run in a process-based, hybrid (process and thread) or event-hybrid mode, to better match the demands of each particular infrastructure. This implies that the choice of the correct MPM and the correct configuration is important. Where compromises in performance need to be made, the design of Apache is to reduce latency and increase throughput, relative to simply handling more requests, thus ensuring consistent and reliable processing of requests within reasonable time-frames. (Apache, 2002)

b.

MySQL Version 5

MySQL is the world's most popular open source database software, with over 100 million copies of its software downloaded or distributed throughout its history. With its superior speed, reliability, and ease of use, MySQL has become the preferred choice for Web, Web 2.0, SaaS, ISV, Telecom companies and forward-thinking corporate IT Managers because it eliminates the major problems associated with downtime, maintenance and administration for modern, online applications. (MySQL, 2009)

MySQL Server is chosen as the storage for the data in the e-filing web-based system because of its consistency, fast performance, high reliability and ease of use. The researcher only needs to follow the instructions in the installation wizard until the installation process is completed. Once the installation is completed, MySQL Server Version 5 can be used in the development of the e-filing web-based system.

c.

Rational Rose Enterprise Edition

According to IBM Corporation (2006), Rational Rose enables the creation of the following types of UML-based diagrams: activity diagrams, class, component, deployment, sequence, state chart, use case, collaboration, physical storage and deployment, and physical data and tables.

The researcher used Rational Rose Enterprise Edition to create the UML models for the e-filing web-based system. These consist of the use case diagram, sequence diagram and class diagram for the e-filing web-based system.

d.

Adobe Photoshop CS3

Photoshop CS3 is part of Adobe's Creative Suite (along with a host of other products such as Illustrator). It is Adobe's flagship bitmap editor and, as a professional-level editor for fine art photography, it has no viable alternative. Photoshop is the industry standard because of its flexibility and extensibility (it supports a wide range of third-party plug-ins), its support for color management, and the robustness of its tools. (Levy, 2007)

The researcher used Adobe Photoshop CS3 to design the interface of the e-filing web-based system, which consists of the header, logo and system layout.

e.

Macromedia Dreamweaver MX 2004

Dreamweaver is a powerful web page creation and web site management tool. It offers numerous, sophisticated functions that can be used to create professional quality web sites. Because of this, it is one of the most popular web authoring tools among web designers. (San Diego State University, 2004)

The researcher used Macromedia Dreamweaver MX 2004 as the workspace software for writing the PHP code of the e-filing web-based system.

f.

Carrot

According to Carrot (2010), Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories.

Apart from two specialized document clustering algorithms, Carrot2 offers ready-to-use components for fetching search results from various sources including Yahoo API, Google API, MSN Live API, eTools Meta Search, Lucene, SOLR, Google Desktop and more. Carrot2 is implemented in Java, but it easily integrates with non-Java software, such as PHP, Ruby or C#.

The researcher used Carrot, an open source framework, to build a search results clustering engine. It organizes the search results into topics, fully automatically and without external knowledge such as taxonomies or preclassified content.

4.3

Hardware Requirements

For developing and deploying the e-filing web-based system, the minimum hardware requirement is a standard personal computer with an Intel or AMD processor, a standard motherboard, an 80 GB hard disk and 512 MB of DDR RAM. No additional external device is needed for this project.

4.4

Development Phase

Based on the research methodology depicted in Figure 3.1, the system construction process covers the last three phases of the research methodology, which are the Requirement Analysis, Design and Development phases. Each process involved in these phases is explained further below.


4.4.1 Requirement Analysis Phase

In this construction process, the researcher analyzed the requirements in more detail. The researcher drew use case diagrams in Rational Rose, which give a high-level, user-centered view of the system. These were used to derive the class diagram, which is the primary model for describing the internal structure and behavior of the system. Furthermore, each use case was described thoroughly, stating the flows involved, and sequence diagrams were produced. As a result, a summary of requirements for the development of the e-filing web-based system was fully constructed. For details on the requirements, please refer to Appendix D: Software Requirement Specification (SRS).

4.4.2 Design Phase


The design phase is concerned with specifying an e-filing web-based system that will meet the requirements. The design of this project takes place at two main levels, which are system design and detailed design.

a.

System Design

System design focuses on architectural aspects that affect the entire system (Bennett, McRobb & Farmer, 2006). The system design of the e-filing web-based system involved setting standards, such as the design of the human-computer interface and the coding standards, and selecting a suitable database management system for data storage. This project uses MySQL as the database management system and PHP as the programming language.

b.

Detailed Design

Detailed design addresses the design of classes and the detailed workings of the system. It is based on the requirements set out in the Software Requirement Specification (SRS) and follows an object-oriented design approach. In an object-oriented system, detailed design is concerned with the design of objects. Object design is mainly concerned with the specification of attribute types, how operations function, and how objects are linked to other objects (Bennett et al., 2006). For a detailed description of the class diagram, please refer to Appendix E: Software Design Document (SDD).

4.4.3 Development Phase

In this development phase, a series of development tasks were performed. These consist of constructing the database, establishing its connection and coding. The tasks are explained further below.

a.

Coding

This task was done concurrently with the enhancement of the interfaces. The necessary code was added to the programs to enable the interfaces to function correctly. Figure 4.1 shows one of the code segments constructed during development using Macromedia Dreamweaver MX 2004.


Figure 4.1 : Coding index.php

b.

Data Mining Techniques

This task was done concurrently with the enhancement of the e-filing web-based system with a searching method using data mining techniques. Clustering was selected as the suitable data mining technique for the searching method. The researcher used Carrot, an open source framework for building search clustering engines. The necessary code was added to the system to cluster search results.

c.

Interface

Figure 4.2 shows the main page of the system. This page appears after an authorized user (staff) logs into the system. It shows the list of menus for staff to operate the system. (Refer to Appendix F: Description of Interface System)

Figure 4.2 : The main page interface of e-filing

4.5

Summary

This chapter explained in detail the construction of the e-filing web-based system. The researcher reviewed the list of software tools that were selected during the development process. These include the operating system and other applications that are compulsory for the system to be developed and deployed, which are Dreamweaver MX 2004, Apache, MySQL, Rational Rose and Carrot. In addition, the researcher set out the minimum hardware requirements for developing and deploying the e-filing web-based system. In the development phase, the researcher reviewed the series of development tasks that were performed, which consist of the requirement analysis, design and development phases.

The next chapter discusses the result and findings for the research project.

48

CHAPTER 5

RESULT AND FINDINGS

5.1 Introduction

This chapter explains how the collected data was organized, analyzed and finalized for use in the development phase of the research. The results of the research are explained in depth, including the findings gathered from the interviews and discussions.

5.2 Interview Results

To generate good interview questions, the researcher followed a model for navigating interview processes in requirements elicitation (refer to Figure 5.1).

Figure 5.1 : A Model for Navigating Interview Processes in Requirements Elicitation


Correctly eliciting requirements from stakeholders is important for developing a Software Requirements Specification (SRS) of good quality. The interview sessions were conducted with Encik Gobibaskaran A/L Govindaraju, the Head of Information Technology at Majlis Daerah Kerian, and Puan Shalina Mat Piah, the Administrative Assistant at Majlis Daerah Kerian. The interview questions fall into two categories. The first category focused on the current problems faced by staff in Majlis Daerah Kerian, and all the necessary data about those problems was collected through it. The second category focused on the functional requirements for the system to be developed. Sample interview questions can be found in Appendix C.

5.2.1 Current Problems

Interviewee : Puan Shalina Mat Piah, Administrative Assistant, Majlis Daerah Kerian.

The results from the first category of interview questions are presented in Table 5.1 below.

Table 5.1 : The problems that have been identified from the interviews

PQ.1
Researcher : Is the current manual system easy and comfortable for you?
Interviewee : No

PQ.2
Researcher : Please describe the current manual system for managing and searching files.
Interviewee : It involves many steps:
a. Search for the number of the required file using the log book.
b. Determine the file name using the file number.
c. Check for the needed file on many big shelves, which takes a long time.
d. Check each staff member's table or other departments in Majlis Daerah Kerian if the file is not on the shelf.

PQ.3
Researcher : Is it easy to identify the suitable files manually according to your requirement?
Interviewee : No

PQ.4
Researcher : Why do you think it is not easy to identify the suitable files manually?
Interviewee :
a. It is difficult to search for the suitable files.
b. It is difficult to know the status of the files.
c. It takes a long time.
d. There are thousands of files on the shelves.
e. Sometimes files are exchanged between departments.

PQ.5
Researcher : In your opinion, is it important for MDK to have a web-based system that acts as an information center for staff to gather information about the status of the files?
Interviewee : Yes, of course

5.2.2 Functional Requirements

Interviewee : Encik Gobibaskaran A/L Govindaraju, Head of IT Department, Majlis Daerah Kerian.

The second category of the interview focused on the functional requirements of the system. The requirements and suggestions gathered from the interviews are presented in Table 5.2 below.

Table 5.2 : The requirement and suggestion that had been identified from the interviews

RQ.1
Researcher : How many types of user are required in the system?
Interviewee : Three types of user, which are Administrator, Manager and Staff.

RQ.2
Researcher : What do you think the E-Filing web-based system should have?
Interviewee :
a. Stored general staff information.
b. Stored files information.
c. Stored status and location of the files.
d. Automated searching to identify suitable files.

RQ.3
Researcher : What is the role of the Administrator, Manager and Staff in the system?
Interviewee :
a. Admin : handle user accounts, view and delete files.
b. Manager : handle user information, maintain files and delete staff.
c. Staff : handle user information and maintain files.

RQ.4
Researcher : What is your suggestion about the language to develop the system?
Interviewee : Use an open source language that suits any platform, such as PHP.

RQ.5
Researcher : What is your suggestion about the database to develop the system?
Interviewee : MySQL database

Based on Table 5.2, several processes for the system were identified. These requirements describe the functionality of the e-filing web-based system, and they were collected and analyzed to produce the new system.

5.3 Use Case Diagram

[Use case diagram. Actors: Admin, Manager, Staff. Use cases: Maintain User Account, View Files, Delete Files, Validate User, Maintain User Information, Maintain Files Information, Maintain Customer Information, Delete Staff.]

Figure 5.2 : Use Case Diagram for E-Filing web-based system

Figure 5.2 above shows the use case diagram for the e-filing web-based system. It illustrates the functionality available to the administrator, manager and staff. All three must register before they can use the system, and must then log in. After logging in, the admin can maintain user accounts, view files and delete files. The manager can maintain user information, files information and customer information, and can delete staff. Staff can maintain user information, files information and customer information.
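The split of use cases between the three actors can be sketched as a simple lookup. The following Python fragment is only an illustration of that mapping and is not the code of the prototype, which was written in PHP. The names `ROLE_USE_CASES` and `is_allowed` are hypothetical and were introduced for this sketch.

```python
# Hypothetical mapping of each actor to the use cases listed for Figure 5.2.
# The identifiers below are illustrative and do not come from the prototype.
ROLE_USE_CASES = {
    "admin": {"maintain_user_account", "view_files", "delete_files"},
    "manager": {
        "maintain_user_information",
        "maintain_files_information",
        "maintain_customer_information",
        "delete_staff",
    },
    "staff": {
        "maintain_user_information",
        "maintain_files_information",
        "maintain_customer_information",
    },
}


def is_allowed(role: str, use_case: str, logged_in: bool = True) -> bool:
    """Return True if a validated (logged-in) user with this role may run the use case."""
    if not logged_in:
        # Validate User comes first: nothing is allowed before login.
        return False
    return use_case in ROLE_USE_CASES.get(role.lower(), set())


if __name__ == "__main__":
    print(is_allowed("manager", "delete_staff"))
    print(is_allowed("staff", "delete_staff"))
```

Keeping the mapping in one table makes it easy to compare against the use case description when a role gains or loses a function.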

The use cases are described in Table 5.3.

Table 5.3 : Description of Use Case diagram

Maintain User Account
Description : Used by the administrator to update and delete the accounts of users of the system.

View Files
Description : Used by the administrator to view files from all departments in Majlis Daerah Kerian.

Delete Files
Description : Used by the administrator to delete files from all departments in Majlis Daerah Kerian.

Validate User
Description : Used by the administrator, manager and staff to log into the system.

Maintain User Information
Description : Used by the manager and staff to register and to update their information.

Maintain Files Information
Description : Used by the manager and staff to add new files, update files and delete files.

Maintain Customer Information
Description : Used by the manager and staff to add new customers, update customers and delete customers.

Delete Staff
Description : Used by the manager to delete staff who do not belong to their department.

5.4 Class Diagram

[Class diagram. Each entity class has a matching boundary (form) class and control class. Association labels in the diagram: validate, manage, have. Multiplicities shown: 1, 0..*, 0..n.]

advisor
<<entity>> advisor
Attributes : advisor_no <<PK>>, advisor_ic, advisor_name, advisor_hp, advisor_email, dept_name
Operations : add_advisor(), update_advisor(), display_advisor()
<<boundary>> advisor_form
Attributes : advisor_no, advisor_ic, advisor_name, advisor_hp, advisor_email
<<control>> advisor_control
Operations : set_advisor_detail(), set_advisor_update()

staff
<<entity>> staff
Attributes : staff_no <<PK>>, staff_ic, staff_name, staff_add1, staff_add2, staff_city, staff_postcode, staff_state, staff_hp, staff_email, dept_name, advisor_no
Operations : add_staff(), update_staff(), delete_staff(), display_staff()
<<boundary>> staff_form
Attributes : staff_no, staff_ic, staff_name, staff_add1, staff_add2, staff_city, staff_postcode, staff_state, staff_hp, staff_email, dept_name, advisor_no
<<control>> staff_control
Operations : search_staff(), set_staff_detail(), set_staff_update(), removeStaff()

file
<<entity>> file
Attributes : file_id <<PK>>, file_name, file_status, file_remark, open_date, update_date, staff_no, dept_name
Operations : add_files(), update_files(), delete_files(), display_files()
<<boundary>> file_form
Attributes : file_id, file_name, file_status, file_remark, open_date, update_date, staff_no, dept_name
<<control>> file_control
Operations : search_files(), set_file_detail(), set_file_update(), remove_file()

login
<<entity>> login
Attributes : user_name <<PK>>, user_password, user_id, user_level, user_dept
Operations : update_user(), delete_user(), display_user()
<<boundary>> login_form
Attributes : user_name, user_password
<<control>> login_control
Operations : set_user_update(), remove_user(), validate_user()

customer
<<entity>> customer
Attributes : cust_id <<PK>>, file_id, cust_ic, cust_name, cust_add1, cust_add2, cust_city, cust_postcode, cust_state, cust_phone, staff_no
Operations : add_cust(), update_cust(), delete_cust(), display_cust()
<<boundary>> customer_form
Attributes : file_id, cust_ic, cust_name, cust_add1, cust_add2, cust_city, cust_postcode, cust_state, cust_phone, staff_no
<<control>> customer_control
Operations : search_cust(), set_cust_detail(), set_cust_update(), remove_cust()

Figure 5.3 : Class Diagram for E-Filing web-based system

Figure 5.3 is the class diagram for the e-filing web-based system. A class diagram is a type of static structure diagram. It shows the system's classes, their attributes, and the relationships between the classes.
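To make the file entity more concrete, the sketch below stores its attributes in a table and looks files up by name, which is roughly what an operation such as search_files() would need to do. It is an assumption-laden illustration, not the prototype's code: the prototype uses PHP and MySQL, whereas this sketch uses Python's built-in sqlite3 module as a stand-in, the column types are guesses, and the function names `create_file_table`, `add_file` and `search_files` are hypothetical helpers.

```python
import sqlite3

# Column names follow the file entity in the class diagram; the types are assumed.
FILE_COLUMNS = (
    "file_id", "file_name", "file_status", "file_remark",
    "open_date", "update_date", "staff_no", "dept_name",
)


def create_file_table(conn: sqlite3.Connection) -> None:
    """Create a table mirroring the attributes of the file entity."""
    conn.execute(
        """
        CREATE TABLE file (
            file_id     TEXT PRIMARY KEY,
            file_name   TEXT NOT NULL,
            file_status TEXT,
            file_remark TEXT,
            open_date   TEXT,
            update_date TEXT,
            staff_no    TEXT,
            dept_name   TEXT
        )
        """
    )


def add_file(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one file record; attributes missing from the dict are stored as NULL."""
    columns = ", ".join(FILE_COLUMNS)
    placeholders = ", ".join(":" + name for name in FILE_COLUMNS)
    values = {name: record.get(name) for name in FILE_COLUMNS}
    conn.execute(f"INSERT INTO file ({columns}) VALUES ({placeholders})", values)


def search_files(conn: sqlite3.Connection, keyword: str) -> list:
    """Return (file_id, file_name, file_status, dept_name) for names containing the keyword."""
    cursor = conn.execute(
        "SELECT file_id, file_name, file_status, dept_name "
        "FROM file WHERE file_name LIKE ? ORDER BY file_id",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_file_table(connection)
    # Sample record invented for illustration only.
    add_file(connection, {"file_id": "F001", "file_name": "Stall licence renewal",
                          "file_status": "On shelf", "dept_name": "License and Parking Unit"})
    print(search_files(connection, "licence"))
```

Returning the status and department together with the file name reflects the requirement that staff can learn the status and location of a file before going to the shelf.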

5.5 Clustering as the Suitable Searching Method

5.5.1 Introduction

For this research project, it is important to select a suitable searching method based on data mining techniques. The researcher reviewed three main data mining techniques: classification, association and clustering. These techniques share the general objective of data mining but differ in their function and their suitability for the system.

The researcher reviewed the techniques based on their definition, concept, functions, suitability and the examples given in several journals. (Refer to Table 2.1 in Chapter 2, Literature Review, page 25.)

According to the comparison in Table 2.1, clustering is the suitable searching method for the e-filing web-based system.

5.5.2 Why Clustering Search Result

This decision is supported by several journals that describe clustering as a suitable searching method. According to Zhang, Xie and Wu (2006), clustering groups the search results into several collections, so that users can easily locate the valuable results that they really need.

Aliakbary, Khayyamian and Abolhassani (2008) stated that clustering search results helps the user to get an overview of the returned results and to focus on the desired clusters. Most search result clustering methods use the title, URL and snippets returned by a search engine as the source of information for creating the clusters.

According to Lipai (2008), clustering the results of search tools means grouping them into object classes constructed from the characteristics of the search results, with the purpose of simplifying the user's work of retrieving the needed information and helping the user find better quality results faster.

Bialynicka (2008) stated that clustering organizes search results into groups, so that different groups correspond to different user needs. This is because a ranked list is not enough and documents pertaining to different topics cannot be compared. Besides, there are relationships between the results that can be used to cluster them.

5.5.3 Examples of Clustering Search Result

Jasco (2007) gives an example of the usefulness of clustering techniques in a search result list. Figure 5.4 below shows Google's one-dimensional result list without clustering. For the keywords "clustering search result", Google returns a list of about 15,500,000 results, which is large and difficult to choose from.

Figure 5.4 : Googles One Dimensional Result List

Figure 5.5 below shows a good search result list produced with a clustering technique. For the same keywords as in Figure 5.4 above, it returns a list of only about 194 results, which is more accurate, simpler and easier to choose from.

Figure 5.5 : Good clustering result list

Figure 5.6 below shows a search result list with clustering from an engine available on the World Wide Web (http://search.carrot2.org).

Figure 5.6 : Good clustering result list from http://search.carrot2.org


5.5.4 Clustering Search Result from e-filing web-based system

Figure 5.7 below shows the clustered search result list in the e-filing web-based system.

Figure 5.7 : Good clustering result list from e-filing web-based system

Figure 5.8 below shows the data mining tool provided by Carrot, the open source framework for building search results clustering engines. The necessary code was added to the system to cluster search results.
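The general idea of clustering a search result list can be sketched in a few lines. The Python fragment below groups result titles by the most common term they share, so that each group gets a one-word label. It is a deliberately simple term-sharing heuristic written for illustration. It is not the algorithm used by Carrot and not the code of the prototype, and the names `STOP_WORDS`, `tokenize` and `cluster_results` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text: str, ignored: set) -> set:
    """Lower-case the text and keep alphabetic words that are not ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS and w not in ignored}


def cluster_results(results: list, query: str, min_size: int = 2) -> dict:
    """Group result titles under the most frequent term each shares with other results.

    Query terms are ignored because every result contains them. Results that
    share no term with at least `min_size` results fall into "(other)".
    """
    query_terms = set(re.findall(r"[a-z]+", query.lower()))
    token_sets = [tokenize(title, query_terms) for title in results]
    # Document frequency: in how many results does each term occur?
    doc_freq = Counter(term for tokens in token_sets for term in tokens)

    clusters = defaultdict(list)
    for title, tokens in zip(results, token_sets):
        shared = [t for t in tokens if doc_freq[t] >= min_size]
        if shared:
            # Most frequent shared term wins; ties are broken alphabetically.
            label = min(shared, key=lambda t: (-doc_freq[t], t))
        else:
            label = "(other)"
        clusters[label].append(title)
    return dict(clusters)


if __name__ == "__main__":
    # Sample titles invented for illustration only.
    titles = [
        "License file stall",
        "License file shop",
        "Parking file compound",
        "Parking file season pass",
        "Building file plan approval",
    ]
    for label, members in cluster_results(titles, "file").items():
        print(label, members)
```

With the sample titles, the two license files and the two parking files form their own groups and the remaining title falls under "(other)", which mirrors how a clustered result list lets staff jump to the relevant group instead of scanning one long list.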


Figure 5.8 : Data Mining Tool by Carrot

5.6 Summary

This chapter explained how the collected data was organized, analyzed and finalized for use in the development phase of the research. The researcher analyzed the results of interviews with two staff members of Majlis Daerah Kerian in terms of their current problems and the functional requirements for the e-filing web-based system. The researcher also discussed why clustering was selected as the suitable searching method, citing several journals and examples that support clustering as a method for organizing search results, and presented the clustered results from the e-filing web-based system.

The next chapter discusses the conclusion and recommendations for the research project.

CHAPTER 6

CONCLUSION AND RECOMMENDATIONS

6.1 Introduction

This chapter summarizes the work done by the researcher, from defining the objectives to obtaining the findings by developing the prototype of the e-filing web-based system using data mining techniques. It also concludes the report for this project and presents the limitations of the software and recommendations for those who wish to pursue further research on the development of the e-filing web-based system.

6.2 Conclusions

In this research project on the development of a prototype e-filing web-based system using data mining techniques, the researcher achieved all the objectives by following a defined research approach and methodology that drew on theoretical findings (secondary data) and data findings (primary data). Achieving these objectives is hoped to provide solutions to the current problems in Majlis Daerah Kerian, Parit Buntar, Perak.

The first objective of the research project was to identify the requirements for e-filing at Majlis Daerah Kerian. This objective was achieved through requirement gathering, by conducting interview sessions with staff in Majlis Daerah Kerian to learn the current problems and the functional requirements for the e-filing web-based system. The deliverable for this objective is documented in Appendix D: Software Requirement Specification (SRS).

The second objective of the research project was to identify the searching method based on data mining techniques. For this objective, the researcher reviewed many resources, such as articles, journals, books and other academic publications about e-filing and data mining, to gain a deeper understanding of both. This secondary data was useful for identifying a suitable searching method using data mining techniques. The researcher compared three popular data mining techniques (association, classification and clustering) to identify the suitable technique for the searching method in the e-filing web-based system. This objective was achieved when the researcher found that clustering is the suitable searching method for the e-filing web-based system.

After the second objective was achieved, the research proceeded with the third objective, designing the e-filing web-based system. This objective was achieved through the design stage, which consists of system design and detailed design. The system design highlighted the importance of interface design and of human computer interface characteristics through the proper choice of colors, buttons and fonts. In addition, an overall system structure was produced to illustrate how the whole system works. The detailed design addressed the design of the classes and the detailed workings of the system, describing the classes, their attributes and their operations. The deliverables for the third objective are documented in Appendix E: Software Design Document (SDD).

The fourth objective of this project was to demonstrate the e-filing web-based system using the identified data mining technique. The fourth objective builds on the three objectives already achieved. It followed the project methodology, which consists of gathering and analyzing requirements and then designing a model that follows the user requirements. Finally, the prototype was developed by translating the design into program code using the selected programming platform, database server, web server and data mining technique. Thus, the last objective has been realized.

The e-filing web-based system developed for Majlis Daerah Kerian is expected to provide staff with an interactive environment for choosing the files that meet their requirements. It is also expected to help staff identify the files they need more accurately and quickly as a result of the searching method based on the selected data mining technique. The system is further expected to become an information center where staff in Majlis Daerah Kerian can gather information about the status of files.

Although all the objectives have been achieved, the e-filing web-based system using a data mining technique is far from complete and has its own limitations. Many improvements can still be considered to enhance this project. The limitations of and recommendations for this project are discussed below.

6.3 Limitations

The project encountered a number of limitations while in progress. The limitations are as follows:

a. The interview sessions for gathering information about the current problems and functional requirements were conducted only with the Head of Information Technology and the Administrative Assistant of Majlis Daerah Kerian. Interviewing only two people provided limited information about the requirements.

b. Due to time constraints, the researcher developed only a prototype of the e-filing web-based system, which is intended for demonstration purposes.

c. There are many journals on data mining techniques, but the researcher had difficulty understanding each of them because the subject was unfamiliar.

d. There are three different data mining techniques, and the researcher had to select the one that best suits the objective. This required studying each technique properly and finding related journals that support the findings.

e. A large number of data mining tools are available, but not all of them support every kind of data mining technique. The researcher therefore had to study the tools based on their function and their usability with the selected data mining technique. Furthermore, the tool used in this research was new to the researcher, so time was needed to become familiar with it.

f. The researcher's experience is another limiting factor. This is the researcher's first research project. However, the researcher was able to learn and was properly guided by the research plan and by instruction from the supervisor and examiner.

6.4 Recommendations

Several recommendations can be considered to further enhance the development of the e-filing web-based system:

a. The project scope of the system could be expanded to cover the contents of the files, not only their status.

b. The system could be used by other local governments, not only Majlis Daerah Kerian.

c. The system could be put online through the Internet so that it can be accessed by everyone at any time and from anywhere. At present, access is limited to the Local Area Network (LAN).

Through the implementation of this system, it is hoped that further enhancements will be made in future projects.

REFERENCES

Abbott, D.W., Matkovsky, I.P., & Elder, J.F. (1998). An Evaluation of High-end Data Mining Tools for Fraud Detection. IEEE Transaction on Knowledge and Data Engineering, 2836.

Aliakbary, S., Khayyamian, M., & Abolhassani, H. (2008). Using Social Annotations for Search Result Clustering. Retrieved February 10, 2010, from http://www.springerlink.com/index/v770wm385n256p68.pdf

Apache. (2002). Retrieved February 14, 2010, from The Apache Software Foundation: http://apache.org/

Bennett, S., McRobb, S., & Farmer, R. (2006). Object-Oriented Systems Analysis and Design Using UML Third Edition. McGraw-Hill Education(UK) Limited.

Bialynicka, I. (2008). Clustering Web Search Results. Retrieved March 2, 2010, from http://medialab.di.unipi.it/web/Search+QA/Seminar/Clustering.ppt

Carrot (2010). Carrot-Open Source Search Results Clustering Engine. Retrieved March 1, 2010, from Carrot Website : http://project.carrot2.org/index.html

Chen, M., Han, J., & Yu, S.Y. (1996). Data Mining : An Overview from a Database Perspective. IEEE Transaction on Knowledge and Data Engineering, 8, 6.

Collier, K., Carey, B., Sautter, D., & Marjaniemi, C. (1999). A Methodology for Evaluating and Selecting Data Mining Software. IEEE Transaction on Knowledge and Data Engineering, 2-4.


Defit, S., & Md Sap, M. N. (2009). Mining Association Rule from Large Databases. Retrieved October 10, 2009, from http://fsksm.utm.edu.my

Garofalakis, M. N., Rastogi, R., Seshadri, S., & Shim, K. (1999). Data Mining and the Web : Past, Present and Future. Retrieved July 17, 2009, from http://www.softnet.tuc.gr/~minos/Papers/widm99.pdf

IBM Corporation. (2006). IBM Rational Rose. Retrieved March 1, 2010, from http://ftp.software.ibm.com/software/rational/web/datasheets/rose_ds.pdf

Jain, A. K., Murty, M. N., & Flynn, P. J. (2000). Data Clustering: A Review. ACM Computing.

Jasco, P. (2007). Clustering Search Result, Part 1: Web-wide Search Engines. Retrieved January 5, 2010, from http://www.emeraldinsight.com/1468-4527.htm

Khodra, M. L., & Widyantoro, D. H. (2007). An Efficient and Effective Algorithm for Hierarchical Classification of Search Results. Retrieved March 20, 2010, from http://repository.gunadarma.ac.id:8000/711/1/C-07.pdf

Lee, H. K. (2005). Inductive Clustering : A Technique for Clustering Search Results. Retrieved July 15, 2009, from http://sifaka.cs.uiuc.edu/course/598cxz05s/report-hle.pdf

Levy, P. (2007). A Review of Adobe Photoshop CS3. Retrieved February 3, 2010, from http://www.becs-wa.org/PhotoShop_CS3.pdf

Lipai, A. (2008). World Wide Web Metasearch Clustering Algorithm. Retrieved March 13, 2010, from http://revistaie.ase.ro/content/46/Adina%20Lipai.pdf

MySQL. (2009). Retrieved December 28, 2009, from MySQL Website: http://www.mysql.com/

Olson, T., Edwards, M., & Monty, H.A. (2003). A Guide to Model Rules for Electronic Filing and Service. Retrieved July 15, 2009, from http://www.ncsconline.org/WC/Publications/External_ElFileModelRulesLexisPub.pdf

Phyu, T.N. (2009). Survey of Classification Techniques in Data Mining. Retrieved August 5, 2009, from http://www.iaeng.org/publication/IMECS2009/IMECS2009pp727-731.pdf

Qiu, M., Davis, S., & Ikem, F. (2004). Evaluation of Clustering Techniques in Data Mining Tools. Retrieved January 5, 2010, from http://www.iacis.org/iis/2004_iis/PDFfiles/QiuDavisIkem.pdf

Ravichandra, R. (2003). Data Mining and Clustering Techniques. Retrieved April 1, 2010, from https://drtc.isibang.ac.in/bitstream/handle/1849/121/K_ikr_datamining.PDF?sequence=2

San Diego State University. (2004). Dreamweaver MX 2004 Introduction. San Diego, Berkeley. Academic Affairs.

Shyu, M. L., Chen, S. C., & Haruechaiyasak, C. (2005). Retrieved February 12, 2010, from http://www.hlt.nectec.or.th/Publications/Conferences/A%20Data%20Mining%20Framework%20for%20Building%20A%20WebPage%20Recommender%20System.pdf

Tang, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston : Pearson Education.

Visnick, L. (2003). Clustering Techniques. Retrieved July 30, 2009, from http://www.progress.com/realtime/docs/whitepapers/clustering_techniques.pdf

Zhang, H., Xie, K., & Wu, H. (2006). An Efficient Algorithm for Clustering Search Engine Results. Retrieved February 6, 2010, from http://www.ieee.org

APPENDICES

APPENDIX A PROJECT PLANNING

APPENDIX B PROGRESS SLIDE PRESENTATION

APPENDIX C INTERVIEW QUESTION

APPENDIX D SOFTWARE REQUIREMENT SPECIFICATION (SRS)

APPENDIX E SOFTWARE DESIGN DOCUMENT (SDD)

APPENDIX F DESCRIPTION OF SYSTEM INTERFACE

APPENDIX G IN-PROGRESS ASSESSMENT

UNIVERSITI TEKNOLOGI MARA

DEVELOPMENT OF E-FILING FOR MAJLIS DAERAH KERIAN USING DATA MINING TECHNIQUES

MOHAMED SYAHMI BIN MOHAMED ISA

BSc. (Hons) INFORMATION SYSTEM ENGINEERING

MAY 2010


Universiti Teknologi MARA

Development of E-Filing for Majlis Daerah Kerian using Data Mining Techniques

Mohamed Syahmi Bin Mohamed Isa

Thesis submitted in fulfillment of the requirements for Bachelor of Science (Hons) Information System Engineering Faculty of Computer and Mathematical Sciences

MAY 2010


DECLARATION

This declaration is to certify that this thesis and all of its submitted contents are original in its stature, excluding those in which have been acknowledged specifically in the references. The contents of this thesis are of my own endeavor and any ideas or quotations from the work of other people; published or otherwise are fully acknowledged in accordance with the standard referring practices of the discipline.

Name of Candidate : MOHAMED SYAHMI BIN MOHAMED ISA

Candidate's ID No. : 2008287242

Programme : BACHELOR OF SCIENCE (HONS) INFORMATION SYSTEM ENGINEERING (CS 226)

Faculty : FACULTY OF COMPUTER AND MATHEMATICAL SCIENCES

Project Title : DEVELOPMENT OF E-FILING FOR MAJLIS DAERAH KERIAN USING DATA MINING TECHNIQUES

Signature of candidate :

Date : 24th MAY 2010

APPROVAL

DEVELOPMENT OF E-FILING FOR MAJLIS DAERAH KERIAN USING DATA MINING TECHNIQUES

By

Mohamed Syahmi Bin Mohamed Isa 2008287242

This thesis is prepared under the direction of thesis coordinators, Assoc. Prof. Wan Nor Amalina Wan Hariri and Assoc. Prof. Rashidah Md. Rawi, Information System Engineering Program, and it has been approved by the thesis supervisor, Puan Norisan Abd Karim. It was submitted to the Faculty of Computer and Mathematical Sciences and was accepted in partial fulfillment of the requirement for the degree of Bachelor of Science.

Approved by:

__________________________ Madam Norisan Abd Karim Thesis Supervisor Date: 24th May 2010


DEDICATION

For my mother, Sadiah Binti Harun, my late father, Mohamed Isa Bin Harun, and my brothers.


ACKNOWLEDGEMENT

Praise be to Allah SWT Most Gracious, Most Beneficent

Firstly, I would like to express my gratitude to Allah S.W.T for giving me the strength to complete this project. Without His blessing and permission, this project could not have been completed.

I would like to give my sincere appreciation to my supervisor, Puan Norisan Abd Karim, for her concern, advice, support and encouragement throughout the progress of this thesis. My gratitude also goes to the coordinators of the Final Year Project (ITS690), FSKM, UiTM Shah Alam, Assoc. Prof. Wan Nor Amalina Wan Hariri and Assoc. Prof. Rashidah Md. Rawi, for their valuable guidance in the completion of this project.

Special thanks to Mr. Gobibaskaran and Puan Shalina for giving me the opportunity to conduct the interview sessions that helped me gather the requirements for this project.

Last but not least, heartfelt thanks to my parents, who gave me an appreciation of learning and taught me the value of perseverance and resolve. I would also like to thank my friends for their support, and everyone who directly or indirectly helped me in this project. Thank you for inspiring me in ways that cannot be put into words. May Allah SWT bless all of you.

TABLE OF CONTENTS

ACKNOWLEDGEMENT
TABLE OF CONTENT
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1 INTRODUCTION
1.1 Research Background
1.2 Problem Statement
1.3 Aim
1.4 Objective of the Research
1.5 Significance of Research
1.6 Scope of Study
1.7 Limitation
1.8 Outcomes/Deliverables
1.9 Layout of Dissertation
1.10 Summary

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 E-Filing
2.2.1 Introduction to E-Filing
2.2.2 Purposes of the Rules in E-Filing
2.2.3 Proposed Model Rules for E-Filing
2.3 What is Data Mining
2.3.1 Definition of Data Mining
2.3.2 Data Mining & Knowledge Discovery
2.3.3 Challenges of Data Mining
2.4 Data Mining Techniques
2.4.1 Overview of Data Mining Techniques
2.4.2 Classifying Data Mining Techniques
2.4.3 Association Rules
2.4.4 Classification
2.4.5 Clustering
2.5 Selecting Data Mining Techniques
2.6 Selecting Data Mining Tools
2.7 Summary

CHAPTER 3 RESEARCH APPROACH AND METHODOLOGY
3.1 Introduction
3.2 Problem Identification and Planning
3.3 Requirement Gathering
3.3.1 Primary Data
3.3.2 Secondary Data
3.4 Requirement Analysis
3.5 Design Model
3.6 Develop Prototype
3.7 Summary

CHAPTER 4 PROTOTYPE CONSTRUCTION
4.1 Introduction
4.2 Software Requirement
4.2.1 Software Tools
4.2.2 Software Tools Installation
4.3 Hardware Requirements
4.4 Development Phase
4.4.1 Requirement Analysis Phase
4.4.2 Design Phase
4.4.3 Development Phase
4.5 Summary

CHAPTER 5 RESULT AND FINDINGS
5.1 Introduction
5.2 Interview Results
5.2.1 Current Problems
5.2.2 Functional Requirements
5.3 Use Case Diagram
5.4 Class Diagram
5.5 Clustering as the Suitable Searching Method
5.5.1 Introduction
5.5.2 Why Clustering Search Result
5.5.3 Examples of Clustering Search Result
5.5.4 Clustering Search Result from e-filing web-based system
5.6 Summary

CHAPTER 6 CONCLUSION AND RECOMMENDATIONS
6.1 Introduction
6.2 Conclusions
6.3 Limitations
6.4 Recommendations

REFERENCES

APPENDICES
APPENDIX A : Project Planning
APPENDIX B : Progress Slide Presentation
APPENDIX C : Interview Question
APPENDIX D : Software Requirements Specification (SRS)
APPENDIX E : Software Design Document (SDD)
APPENDIX F : Description Of System Interface
APPENDIX G : In-Progress Assessment

LIST OF TABLES

Table 2.1 : Differences of Classification, Association and Clustering techniques
Table 2.2 : Computational Performance Criteria (Collier et. al, 1999)
Table 2.3 : Functionality Criteria (Collier et. al, 1999)
Table 2.4 : Usability Criteria (Collier et. al, 1999)
Table 4.1 : Software Tools Specifications
Table 5.1 : The problems that have been identified from the interviews
Table 5.2 : The requirement and suggestion that had been identified from the interviews
Table 5.3 : Description of Use Case diagram

LIST OF FIGURES

Figure 2.1 : The Process of knowledge discovery in database
Figure 2.2 : Process for designing and implementing a recommender system (Shyu et al., 2005)
Figure 2.3 : The general architecture of Mining Association Rule model (Defit & Md Sap, 2001)
Figure 2.4 : Hierarchical Classification Process (Khodra & Widyantoro, 2007)
Figure 2.5 : Stages in clustering (Jain et al., 1999)
Figure 3.1 : Overview of Research Approach and Methodology
Figure 4.1 : Coding index.php
Figure 4.2 : The main page interface of e-filing
Figure 5.1 : A Model for Navigating Interview Processes in Requirements Elicitation
Figure 5.2 : Use Case Diagram for E-Filing web-based system
Figure 5.3 : Class Diagram for E-Filing web-based system
Figure 5.4 : Googles One Dimensional Result List
Figure 5.5 : Good clustering result list
Figure 5.6 : Good clustering result list from http://search.carrot2.org
Figure 5.7 : Good clustering result list from e-filing web-based system
Figure 5.8 : Data Mining Tool by Carrot

ABSTRACT

The e-filing web-based system is a development project that uses a data mining technique called clustering. Different data mining techniques are useful depending on their functions and the conditions in which they are applied. Majlis Daerah Kerian acts as a local government, the unit of government closest to the citizens, a category that includes municipalities, local authorities, town councils and city councils. The staff in Majlis Daerah Kerian face difficulties in managing and identifying the files that meet their requirements. This is because there are thousands of files across eight departments, so searching for a needed file manually is difficult and involves many steps. This research provides a suitable searching method using a data mining technique for the e-filing web-based system. The researcher compared three data mining techniques (association, classification and clustering) to identify the suitable one for searching files, and interviewed staff in Majlis Daerah Kerian to gather detailed requirements. The e-filing web-based system is expected to help staff identify the files they need more accurately and quickly as a result of the searching method based on the selected data mining technique. It will also provide staff with an interactive environment for choosing the files that meet their requirements. It is expected that this e-filing web-based system will act as an information center for staff in Majlis Daerah Kerian to gather information about the status of files.