Hybrid Sequential Feature Selection -II

Published by: riazahmad82 on Jul 24, 2009
Copyright:Attribution Non-commercial

09/07/2009

HYBRID SEQUENTIAL FEATURE SELECTION
Assignment No. 3
Submitted By
Name: Riaz Ahmad
Registration No.: L1R08MSCS0006
University of Central Punjab
Department of Information & Technology
Lahore.
 
HYBRID SEQUENTIAL FEATURE SELECTION
Riaz Ahmad
Faculty of Information and Technology, University of Central Punjab, Lahore, Pakistan. {riaz.ahmad@ucp.edu.pk}
ABSTRACT:
The data collected for data mining may contain many irrelevant and redundant features. These features need to be removed because they do not contribute to or affect the target concept [1]. In many applications they must be removed before learning can work at all. Removing them improves the efficiency of the learning algorithm and makes the model simpler and more general. Many algorithms have been introduced for feature selection, e.g. [5, 8, 9, 16, 17, 19, 20]. This paper focuses on the Forward Sequential Selection (FSS) and Backward Sequential Selection (BSS) feature selection techniques. In these techniques a feature is added or removed by comparing accuracy rates. However, they lack a selection criterion for the case where two features have the same accuracy rate. This work adds a selection criterion for breaking a tie between two candidate features. It also proposes a hybrid technique, BF Sequential Feature Selection, which incorporates this criterion in both techniques and shows better results than either technique alone. Let F be the set of features, S1 = {F1, F3, F4} the result of applying FSS, and S2 = {F1, F3, F4, F5} the result of applying BSS, both with the selection criterion incorporated. The final result of the proposed hybrid technique is then the union of the two resulting feature sets, S = {F1, F3, F4, F5}.
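The idea summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the accuracy function `toy_accuracy`, the six feature names, and the tie-breaking rule (prefer the feature whose name sorts first) are all hypothetical, since the actual criterion is introduced later in the paper.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Evaluate = Callable[[FrozenSet[str]], int]  # accuracy (percent) of a feature subset
TieBreak = Callable[[str], str]             # hypothetical rule: the smaller key wins a tie
Candidate = Tuple[str, FrozenSet[str]]      # (feature changed, resulting subset)


def pick_best(candidates: List[Candidate], evaluate: Evaluate,
              tie_break: TieBreak) -> FrozenSet[str]:
    """Return the most accurate candidate subset; resolve ties with tie_break."""
    best_acc = max(evaluate(subset) for _, subset in candidates)
    tied = [(f, s) for f, s in candidates if evaluate(s) == best_acc]
    return min(tied, key=lambda pair: tie_break(pair[0]))[1]


def forward_selection(features: List[str], evaluate: Evaluate,
                      tie_break: TieBreak) -> Set[str]:
    """FSS: start empty, add one feature at a time while accuracy improves."""
    selected: FrozenSet[str] = frozenset()
    best = evaluate(selected)
    while len(selected) < len(features):
        candidates = [(f, selected | {f}) for f in features if f not in selected]
        subset = pick_best(candidates, evaluate, tie_break)
        if evaluate(subset) <= best:
            break  # no candidate improves accuracy
        selected, best = subset, evaluate(subset)
    return set(selected)


def backward_selection(features: List[str], evaluate: Evaluate,
                       tie_break: TieBreak) -> Set[str]:
    """BSS: start with all features, drop one at a time while accuracy does not fall."""
    selected: FrozenSet[str] = frozenset(features)
    best = evaluate(selected)
    while len(selected) > 1:
        candidates = [(f, selected - {f}) for f in selected]
        subset = pick_best(candidates, evaluate, tie_break)
        if evaluate(subset) < best:
            break  # every removal hurts accuracy
        selected, best = subset, evaluate(subset)
    return set(selected)


def toy_accuracy(subset: FrozenSet[str]) -> int:
    """Made-up accuracy: F3 and F4 tie, F2 and F5 only help together, F6 is noise."""
    acc = 50
    acc += 20 if "F1" in subset else 0
    acc += 10 if "F3" in subset else 0
    acc += 10 if "F4" in subset else 0
    acc += 5 if {"F2", "F5"} <= subset else 0
    return acc


FEATURES = ["F1", "F2", "F3", "F4", "F5", "F6"]
by_name: TieBreak = lambda feature: feature  # assumed tie-break, for illustration only

fss = forward_selection(FEATURES, toy_accuracy, by_name)
bss = backward_selection(FEATURES, toy_accuracy, by_name)
hybrid = fss | bss  # the hybrid result is the union of both selections
print(sorted(fss), sorted(bss), sorted(hybrid))
```

In this toy setting the forward pass never picks up F2 or F5, because neither helps on its own, while the backward pass keeps both; taking the union recovers them.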
1. INTRODUCTION
Data mining is an emerging technology used to find hidden, interesting and previously unknown patterns in data [1]. The data to be mined can take any form, e.g. relations, text or images. The question arises why we need to mine the data with dedicated algorithms or tools; put simply, can the powerful features provided by SQL achieve this goal? The answer is no, because SQL queries merely present the data in different forms, either detailed or summarized, and mostly return patterns that we already know. Data mining algorithms give us patterns that we do not already know (unusual, but of interest to us).
To elaborate further, just as coal miners dig into the soil and come up with coal or other minerals, which is their main objective, in data mining our objective is to dig into the data and come up with interesting, previously unknown results for our stakeholders. If we come up with the result that everyone who became pregnant is female, it is not an unknown result. There is no need to present such results because they are already known or of no interest to our stakeholders. Data mining is also known as knowledge discovery.
Nowadays, with technology developing rapidly, companies hold massive amounts of data in databases, data marts and data warehouses, which they use to run their day-to-day business tasks as well as for decision making. As discussed at the start, data mining helps to uncover the hidden patterns that lie in the data. Data mining has become practical in this era because computing power is affordable and data mining tools are readily available in the market [2].
1.1 Prerequisites for Data Mining
Data mining draws on several fields of science, including machine learning, neural networks, statistics, database systems and data warehousing. To learn data mining, we must have some knowledge of these areas [1].
1.2 Importance/Need of Data Mining
Organizations have been storing their day-to-day data for years and now hold terabytes of it. They bear the cost of keeping this data and were forced to consider what to do with it. The concept of data warehousing emerged for efficient reporting, but organizations still could not use their data efficiently. Data warehousing helps them view data efficiently and, to some extent, supports decision making. Then the concept of data mining came into existence. Data mining helps to make decisions in a validated way. With its help we can find relationships or associations between products, for example which other products a customer will buy when he buys eggs, and vice versa [2]. In this way the stakeholder can place the most commonly bought products together or offer promotions on the other products. Similarly, data mining can help the stakeholder with forecasting, that is, prediction. Data mining is used both to reduce cost and to increase revenue.
1.3 Data Mining Process Model
The following steps are involved in data mining [5]:
Data cleaning (removes or transforms noise and inconsistent data)
Data integration (combines data from multiple data sources)
Data selection (data relevant to the analysis task are retrieved from the database)
Data transformation (data are transformed or consolidated into forms appropriate for mining)
Data mining (model construction; algorithms are applied to uncover patterns)
Pattern evaluation (checking results)
Knowledge presentation (using visualization and knowledge representation techniques)
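The steps above can be sketched as a chain of small functions. This is a minimal illustration only: the two record sources, the field names and the `min_qty` threshold are invented for the example, and the "mining" step is reduced to counting units sold per item.

```python
from collections import Counter

# Two hypothetical data sources with noisy entries (missing item, stray spaces, mixed case).
store_a = [{"customer": "c1", "item": "egg", "qty": 2},
           {"customer": "c2", "item": None, "qty": 1}]
store_b = [{"customer": "c3", "item": "Egg ", "qty": 1},
           {"customer": "c4", "item": "butter", "qty": 1}]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [dict(r) for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{k: r[k] for k in fields} for r in records]


def transform(records):
    """Data transformation: normalize item names into one consistent form."""
    return [{**r, "item": r["item"].strip().lower()} for r in records]


def mine(records):
    """Data mining (toy version): total quantity sold per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals


def evaluate(patterns, min_qty):
    """Pattern evaluation: keep only items that reach the minimum quantity."""
    return {item: qty for item, qty in patterns.items() if qty >= min_qty}


records = integrate(clean(store_a), clean(store_b))
records = transform(select(records, ["item", "qty"]))
result = evaluate(mine(records), min_qty=2)
print(result)  # knowledge presentation, here simply printed
```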
1.4 Types of Data Mining
We can divide the types of data mining into two categories. The first concerns the type of data to be mined, e.g. text mining, web mining and graph mining. The second comprises the different ways of mining the data [3, 4] and consists of the following types:
Association Rule Mining
Classification
Clustering
Prediction
Regression
Let us discuss these types of data mining algorithms.
1.4.1 Association Rule Mining
This technique is used to find interesting relationships among data items. The best-known application of association rule mining is market-basket analysis, in which we try to observe the different buying trends of customers. It helps the stakeholder decide on promotions as well as on the placement of items. For example, items such as eggs and butter that are purchased together can be placed together. Association rule mining is unsupervised.
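As a small illustration of market-basket analysis, the sketch below computes the two standard measures of an association rule, support and confidence, for the rule "egg implies butter". The four transactions are made up for the example, and "milk" is added only to provide some variety.

```python
# Each transaction is the set of items in one (hypothetical) customer basket.
transactions = [
    {"egg", "butter"},
    {"egg", "butter", "milk"},
    {"egg"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"egg", "butter"}, transactions)
rule_confidence = confidence({"egg"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Here egg and butter appear together in 2 of 4 baskets, and 2 of the 3 baskets containing egg also contain butter, which is the kind of evidence that would justify placing the two items together.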
1.4.2 Classification
Classification is used to divide data items on the basis of a class attribute. A well-known example is dividing given sales data into two classes: customers who buy and customers who do not buy. Classification is supervised because the class attribute, on the basis of which the data is divided into classes, is already known. The best-known example of a classification technique is the decision tree.
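To make the buy / does-not-buy example concrete, here is a sketch of the simplest possible decision tree, a one-level "stump" that learns a single threshold on one attribute. The attribute (income, in thousands) and the labelled samples are hypothetical; a real decision tree learner would consider many attributes and split recursively.

```python
def train_stump(samples):
    """Learn a threshold t; the stump predicts 'buys' when value >= t.

    samples is a list of (value, buys) pairs, where buys is the known class label.
    """
    thresholds = sorted({value for value, _ in samples})
    best_t, best_errors = thresholds[0], len(samples) + 1
    for t in thresholds:
        # Count training samples this threshold would misclassify.
        errors = sum((value >= t) != buys for value, buys in samples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def predict(threshold, value):
    """Classify a new data item using the learned threshold."""
    return value >= threshold


# Hypothetical labelled sales data: (income in thousands, did the customer buy?)
training = [(20, False), (25, False), (30, False), (40, True), (45, True), (55, True)]
threshold = train_stump(training)
print(threshold, predict(threshold, 35), predict(threshold, 50))
```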
1.4.3 Clustering
Clustering is used to group items that have the same characteristics. For example, the students of a university can be grouped according to the similarities found among them, much as students are often divided into stronger and weaker performers. Clustering is also unsupervised, unlike classification, in which the class attribute is given.
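The student example can be sketched with a two-cluster, one-dimensional version of k-means, a common clustering algorithm that the paper itself does not name. The marks are invented, and no class labels are supplied: the two groups emerge from the data alone.

```python
def two_means(values, iterations=10):
    """Split numbers into a low and a high cluster (1-D k-means with k = 2)."""
    low_center, high_center = min(values), max(values)
    low, high = list(values), []
    for _ in range(iterations):
        # Assign every value to the nearer of the two centers.
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        if not low or not high:
            break  # all values identical; nothing to split
        # Move each center to the mean of its cluster.
        low_center = sum(low) / len(low)
        high_center = sum(high) / len(high)
    return low, high


marks = [35, 78, 40, 85, 42, 90]  # hypothetical exam marks
weaker, stronger = two_means(marks)
print(weaker, stronger)
```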
1.4.4 Prediction
Prediction means estimating future values. It is mostly used to predict product sales. This technique examines the previous history of the data items. For example, to predict whether it will rain next week, the weather conditions of the previous days and of the same days last year are examined. Similarly, one might want to predict the passing percentage of the students next year.
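One simple way to predict next year's passing percentage from previous history is to fit a straight line by least squares and extend it one year ahead. The sketch below assumes a linear trend, which the paper does not claim, and the yearly figures are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: passing percentage per year.
years = [2005, 2006, 2007, 2008]
pass_rates = [60, 62, 64, 66]

slope, intercept = fit_line(years, pass_rates)
forecast_2009 = slope * 2009 + intercept
print(slope, forecast_2009)
```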
1.5 Applications of Data Mining
Data mining has applications in almost every field of life. The main areas of application are:
Finance (Loan Application Processing, Credit Card Analysis)
Insurance (Claims, Fraud Analysis)
Telecommunications (Call History Analysis, Fraud Detection, Promotions)
Transport
Marketing & Sales
Electricity Supply Forecasting
Medical
Let us discuss examples from these areas of application. In finance and banking, data mining is used in loan application processing. It helps decide whether to accept or reject a loan application by analyzing different attributes of the applicant. The algorithm can use attributes such as region, race, salary, years of service with the current employer, years of residence at the current address, family background and previous loan history. This supports more accurate decisions. For instance, the analysis might show that applicants from certain regions or groups default more often. The same applies to credit card fraud detection, where the customer's history is analyzed to detect fraud. For example, a cardholder's history may show that he uses the card mostly at the very start of the month and never makes a transaction for a large amount. If someone else tries to use the card illegally to make an unusual transaction, it can be detected easily.
Insurance companies use data mining for claims and fraud analysis. For example, it can help an insurance company decide whether or not to insure a particular person, again by analyzing different attributes from the customer's profile. Data about the customer from external sources, that is, from other companies, can also be used alongside the data provided by the customer.
In telecommunications, a company might want to offer different packages. By analyzing the traffic, data mining helps it decide which promotions to offer to which age groups, during which hours and in which regions. Another major use of data mining in telecommunications is fraud detection, by observing the behavior of customers (voucher recharging, call durations, etc.).
In transport, a bus company or an airline can apply data mining to decide on a new route. By analyzing the travel trends of passengers and their likes and dislikes during the journey, it can decide whether to increase or decrease the number of buses or airplanes on a specific route. Data mining makes such analysis practical.
From the marketing and sales point of view, market-basket analysis has been discussed earlier. A shopkeeper might offer promotions on different items based on an analysis of the buying trends of consumers at different places. Similarly, the placement of items is decided with the help of data mining results.
Power supply forecasting is another main area of application of data mining. Power supply companies use data mining to analyze the power usage of domestic and commercial areas during different hours and months in order to forecast the power requirements for the next month, quarter or year. In Europe, for example, electricity is priced differently at different hours of the day. Decisions of this kind are made possible by data mining.
There are many other areas of application as well; put simply, data mining has become a necessity for industry. In Europe data mining has been in use for years; in Pakistan, industries have only recently started using data mining after recognizing its importance.
Section 2 discusses the FSS and BSS feature selection methods from the material studied for supervised learning. Section 3 introduces and explains the criterion added to FSS and BSS for selecting a feature among candidate features. Section 4 discusses the proposed hybrid technique. Section 5 concludes this work with key findings.
2. MATERIAL AND METHOD
Data for mining is collected from different sources. Cleansing and pre-processing are performed on the data. During pre-processing some features (attributes) are added and others removed. Before an attribute is added, it is checked whether the attribute has any worth. Similarly, when an attribute or feature is removed, the accuracy rate of the classifier must not be compromised. The purpose of removing an attribute is to simplify the data as much as possible so that the efficiency of the algorithm increases and the model becomes simpler [5]. Attributes can be divided into two types, relevant and irrelevant [6]. Feature selection has
