UNIT V: CLASSIFICATION AND PREDICTION

There are two forms of data analysis that can be used to extract models describing important classes or to predict future data trends:

- Classification
- Prediction

Classification models predict categorical class labels, while prediction models predict continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky, or a prediction model to predict the expenditures in dollars of potential customers on computer equipment, given their income and occupation.

What is Classification?

Classification is the task of identifying the category or class label of a new observation. The following are examples of cases where the data analysis task is classification:

- A bank loan officer wants to analyze the data in order to know which customers (loan applicants) are risky and which are safe.
- A marketing manager at a company needs to analyze a customer with a given profile to determine whether that customer will buy a new computer.

In both of the above examples, a model or classifier is constructed to predict the categorical labels. These labels are "risky" or "safe" for the loan application data and "yes" or "no" for the marketing data.

What is Prediction?

Prediction is used to find a numerical output. As in classification, the training dataset contains the inputs and the corresponding numerical output values. The model should produce a numerical output when new data is given. Regression is generally used for prediction. Predicting the value of a house based on facts such as the number of rooms, the total area, etc., is an example of prediction. For example, suppose the marketing manager needs to predict how much a particular customer will spend at his company during a sale.

CLASSIFICATION VS PREDICTION

Classification:
- Classification is the process of identifying which category a new observation belongs to, based on a training data set containing observations whose category membership is known.
- In classification, the accuracy depends on finding the class label correctly.
- In classification, the model is known as the classifier.
- A model or classifier is constructed to find the categorical labels.
- For example, grouping patients based on their medical records can be considered classification.

Prediction:
- Prediction is the process of identifying the missing or unavailable numerical data for a new observation.
- In prediction, the accuracy depends on how well a given predictor can guess the value of the predicted attribute for new data.
- In prediction, the model is known as the predictor.
- A model or predictor is constructed that predicts a continuous-valued function or ordered value.
- For example, we can think of prediction as predicting the correct treatment for a particular disease for a person.
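To make the contrast concrete, here is a minimal sketch in which tabular data feeds a classifier (categorical output) and a regressor (numerical output). It assumes scikit-learn is available; the tiny loan and house datasets are invented purely for illustration.

```python
# Minimal sketch contrasting classification (categorical output) with
# prediction/regression (numerical output). Toy data is illustrative only.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# --- Classification: predict a categorical label (safe / risky) ---
# Each row: [income in $1000s, years at current job]
X_loans = [[25, 1], [80, 10], [40, 2], [95, 8], [30, 1]]
y_loans = ["risky", "safe", "risky", "safe", "risky"]

clf = DecisionTreeClassifier().fit(X_loans, y_loans)
print(clf.predict([[70, 6]]))   # -> a class label, e.g. ['safe']

# --- Prediction: predict a continuous value (house price) ---
# Each row: [number of rooms, total area in square meters]
X_houses = [[2, 60], [3, 90], [4, 120], [5, 150]]
y_prices = [100_000, 150_000, 200_000, 250_000]

reg = LinearRegression().fit(X_houses, y_prices)
print(reg.predict([[3, 100]]))  # -> a number (a dollar amount)
```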
HOW DOES CLASSIFICATION WORK?

With the help of the bank loan application discussed above, let us understand the working of classification. The data classification process includes two steps:

- Building the Classifier or Model
- Using the Classifier for Classification

Building the Classifier or Model

- This step is the learning step, or the learning phase.
- In this step, the classification algorithm builds the classifier.
- The classifier is built from a training set made up of database tuples and their associated class labels.
- Each tuple that constitutes the training set is assumed to belong to a predefined class. These tuples can also be referred to as samples, objects, or data points.

[Figure: Building the classifier. Training tuples with attributes such as age (young, middle-aged, senior) and income (low, high) and their loan_decision labels are fed to a classification algorithm, which learns rules such as:
  IF age = youth THEN loan_decision = risky
  IF income = high THEN loan_decision = safe
  IF age = middle-aged AND income = low THEN loan_decision = risky]

Using the Classifier for Classification

- In this step, the classifier is used for classification. Here the test data is used to estimate the accuracy of the classification rules.
- The classification rules can be applied to new data tuples if the accuracy is considered acceptable.

[Figure: Using the classifier. The learned classification rules are applied to new data; for example, the tuple (John, middle-aged, low income) receives the loan decision "risky".]

Classification and Prediction Issues

The major issue is preparing the data for classification and prediction. Preparing the data involves the following activities:

- Data Cleaning: Data cleaning involves removing the noise and treating missing values. Noise is removed by applying smoothing techniques, and the problem of missing values is solved by replacing a missing value with the most commonly occurring value for that attribute.
- Relevance Analysis: A database may also contain irrelevant attributes. Correlation analysis is used to determine whether any two given attributes are related.
- Data Transformation and Reduction: The data can be transformed by methods such as normalization. Normalization is used to scale the data of an attribute so that it falls in a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0. It is generally useful for classification algorithms.

Need for Normalization

Normalization is generally required when we are dealing with attributes on different scales; otherwise, the effectiveness of an equally important attribute on a lower scale may be diluted because another attribute has values on a larger scale. In simple terms, when multiple attributes have values on different scales, this can lead to poor data models during data mining operations, so the attributes are normalized to bring them onto the same scale. For example, the attributes salary and years_of_experience are on different scales, so without normalization the attribute salary could take priority over years_of_experience in the model. A normalization sketch follows the comparison criteria below.

Comparison of Classification and Prediction Methods

Here are the criteria for comparing the methods of classification and prediction:

- Accuracy: The accuracy of a classifier refers to its ability to predict the class label correctly; the accuracy of a predictor refers to how well it can guess the value of the predicted attribute for new data.
- Speed: This refers to the computational cost of generating and using the classifier or predictor.
- Robustness: This refers to the ability of the classifier or predictor to make correct predictions from noisy data.
- Scalability: This refers to the ability to construct the classifier or predictor efficiently given a large amount of data.
- Interpretability: This refers to the extent to which the classifier or predictor is understood.
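Here is a minimal sketch of one common normalization method, min-max normalization, which rescales an attribute value v into [0.0, 1.0] via v' = (v - min) / (max - min). The salary and years_of_experience values are invented for illustration.

```python
# Min-max normalization: v' = (v - min) / (max - min), scaling each
# attribute into the range [0.0, 1.0]. Toy values are illustrative only.

def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

salary = [30_000, 45_000, 60_000, 90_000]       # large-scale attribute
years_of_experience = [1, 3, 5, 10]             # small-scale attribute

print(min_max_normalize(salary))                # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize(years_of_experience))   # [0.0, ~0.22, ~0.44, 1.0]
# After normalization both attributes lie in [0.0, 1.0], so salary no
# longer dominates the model simply because of its larger scale.
```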
Decision Tree Induction

A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.

The benefits of having a decision tree are as follows:
- It does not require any domain knowledge.
- It is easy to comprehend.
- The learning and classification steps of a decision tree are simple and fast.

The following decision tree is for the concept buy_computer, which indicates whether a customer at a company is likely to buy a computer or not. Each internal node represents a test on an attribute, and each leaf node represents a class.

[Figure: A decision tree for buy_computer. The root node tests age (young / middle-aged / senior); the "young" branch tests Student?, the "senior" branch tests Credit_rating?, and each branch ends in a class leaf.]

Decision Tree Induction Algorithm

A machine learning researcher named J. Ross Quinlan developed a decision tree algorithm in 1980 known as ID3 (Iterative Dichotomiser). Later, he presented C4.5, the successor of ID3. ID3 and C4.5 adopt a greedy approach: there is no backtracking, and the trees are constructed in a top-down, recursive, divide-and-conquer manner.

Tree Pruning

Tree pruning is performed in order to remove anomalies in the training data due to noise or outliers. Pruned trees are smaller and less complex.

Tree Pruning Approaches

There are two approaches to pruning a tree:
- Pre-pruning: The tree is pruned by halting its construction early.
- Post-pruning: This approach removes a sub-tree from a fully grown tree.

Data Mining: Bayesian Classification

Bayesian classification is based on Bayes' theorem. Bayesian classifiers are statistical classifiers; they can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.

Bayes' Theorem

Bayes' theorem is named after Thomas Bayes. There are two types of probabilities:

- Posterior probability, P(H|X)
- Prior probability, P(H)

where X is a data tuple and H is some hypothesis. According to Bayes' theorem,

    P(H|X) = P(X|H) P(H) / P(X)

Bayesian Belief Network

Bayesian belief networks are also known as belief networks, Bayesian networks, or probabilistic networks. A belief network allows class conditional independencies to be defined between subsets of variables. It provides a graphical model of causal relationships, on which learning can be performed. Two components define a Bayesian belief network:

- A directed acyclic graph
- A set of conditional probability tables

Directed Acyclic Graph

- Each node in a directed acyclic graph represents a random variable.
- These variables may be discrete or continuous valued.
- These variables may correspond to actual attributes given in the data.

Directed Acyclic Graph Representation

The following diagram shows a directed acyclic graph for six Boolean variables. The arcs in the diagram allow representation of causal knowledge. For example, lung cancer is influenced by a person's family history of lung cancer, as well as by whether or not the person is a smoker. It is worth noting that the variable PositiveXray is independent of whether the patient has a family history of lung cancer or is a smoker, given that we know the patient has lung cancer.

[Figure: A directed acyclic graph over six Boolean variables, including FamilyHistory, Smoker, LungCancer, and PositiveXray.]

Conditional Probability Table

The conditional probability table for the values of the variable LungCancer (LC) shows each possible combination of the values of its parent nodes, FamilyHistory (FH) and Smoker (S). Its columns correspond to the parent-value combinations (FH, S), (FH, ~S), (~FH, S), and (~FH, ~S).
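As a concrete illustration of Bayes' theorem, the sketch below computes the posterior P(H|X) from a prior and likelihoods, using the law of total probability for P(X). All probability values are invented for illustration; they are not taken from the (unshown) conditional probability table above.

```python
# A minimal sketch of Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X).
# All probability values below are invented for illustration.

def posterior(p_x_given_h, p_h, p_x):
    """Posterior probability P(H|X) via Bayes' theorem."""
    return p_x_given_h * p_h / p_x

# Hypothesis H: the tuple belongs to class "buys_computer = yes".
# Evidence X: the customer is young with medium income.
p_h = 0.6              # prior P(H): fraction of training tuples in class H
p_x_given_h = 0.2      # likelihood P(X|H)
p_x_given_not_h = 0.05
# Law of total probability: P(X) = P(X|H)P(H) + P(X|~H)P(~H)
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

print(posterior(p_x_given_h, p_h, p_x))  # P(H|X) ~= 0.857
```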
CLASSIFICATION BY BACKPROPAGATION

- Backpropagation, or backward propagation, is an algorithm designed to test for errors while working back from the output nodes to the input nodes.
- It is an important mathematical tool for improving the accuracy of predictions in data mining and machine learning.
- Backpropagation is an iterative, recursive, and efficient approach for computing updated weights to improve the network.
- Backpropagation is generally used in neural network training and computes the gradient of the loss function with respect to the weights of the network.
- It works with a multi-layer neural network and learns internal representations of the input-output mapping.

A Neural Network as a Classifier

- A neural network is a set of connected input/output units where each connection has a weight associated with it.
- Neural networks can help computers make intelligent decisions with limited human assistance, because they can learn and model relationships between input and output data that are nonlinear and complex.

Weaknesses
- Long training time.
- Requires a number of parameters that are typically best determined empirically, e.g., the network topology or "structure."
- Poor interpretability: it is difficult to interpret the symbolic meaning behind the learned weights and the "hidden units" of the network.

Strengths
- High tolerance of noisy data.
- Ability to classify untrained patterns.
- Well suited for continuous-valued inputs and outputs.
- Successful on a wide array of real-world data.
- The algorithms are inherently parallel.
- Techniques have been developed for extracting rules from trained neural networks.

PROCESS

Initialize the weights:
- The weights in the network are initialized to small random numbers, e.g., ranging from -1.0 to 1.0 or from -0.5 to 0.5.
- Each training tuple, X, is then processed by the following steps.

Propagate the inputs forward:
- First, the training tuple is fed to the input layer of the network.
- The inputs pass through the input units unchanged; that is, for an input unit j, its output Oj is equal to its input value Ij.
- Next, the net input and output of each unit in the hidden and output layers are computed.
- The net input to a unit in the hidden or output layers is computed as a linear combination of its inputs. Each such unit has a number of inputs that are, in fact, the outputs of the units connected to it in the previous layer, and each connection has a weight. To compute the net input to the unit, each input connected to the unit is multiplied by its corresponding weight, and the results are summed.

A Neuron (a Perceptron)

[Figure: A hidden or output layer unit j. Inputs (the outputs of the previous layer) are combined in a weighted sum, a bias is added, and an activation function produces the output.]

For a hidden or output layer unit j, the inputs to unit j are the outputs of the previous layer. These are multiplied by their corresponding weights to form a weighted sum, which is added to the bias associated with unit j, and a nonlinear activation function is applied to the net input. (For ease of explanation, the inputs to unit j are labeled y1, y2, ..., yn; if unit j were in the first hidden layer, these inputs would correspond to the input tuple (x1, ..., xn).) In this way, the n-dimensional input vector x is mapped into a variable y by means of a scalar product and a nonlinear function mapping. A forward-pass sketch follows below.
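The forward-propagation step above can be written in a few lines. This is a minimal sketch, not a full training loop: each unit computes its net input as the weighted sum of its inputs plus a bias and then applies a sigmoid activation, O_j = 1 / (1 + e^(-I_j)). The input tuple, weights, and biases are hypothetical values chosen for illustration.

```python
import math

def unit_output(inputs, weights, bias):
    """Forward pass for one hidden/output unit j:
    net input I_j = sum_i(w_ij * O_i) + theta_j, then sigmoid activation."""
    net_input = sum(w * o for w, o in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net_input))  # O_j = 1 / (1 + e^(-I_j))

# One training tuple fed to the input layer (input units pass their
# values through unchanged, so x is also the input layer's output).
x = [1.0, 0.0, 1.0]

# Hypothetical weights and biases for a layer of two hidden units.
hidden = [
    {"weights": [0.2, 0.4, -0.5], "bias": -0.4},
    {"weights": [-0.3, 0.1, 0.2], "bias": 0.2},
]

# Outputs of the hidden layer become the inputs of the next layer.
hidden_out = [unit_output(x, u["weights"], u["bias"]) for u in hidden]
output = unit_output(hidden_out, [-0.3, -0.2], 0.1)  # single output unit
print(hidden_out, output)  # ~[0.332, 0.525], ~0.474
```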
k-Nearest-Neighbor Classifier

- Nearest-neighbor classifiers are based on comparing a given test tuple with training tuples that are similar to it.
- The training tuples are described by n attributes, and each tuple represents a point in an n-dimensional space. In this way, all of the training tuples are stored in an n-dimensional pattern space.
- When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k nearest neighbors of the unknown tuple.

KNN Algorithm

Closeness is defined in terms of a distance metric, such as Euclidean distance. The Euclidean distance between two points or tuples, say X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), is

    dist(X1, X2) = sqrt( (x11 - x21)^2 + (x12 - x22)^2 + ... + (x1n - x2n)^2 )

In other words, for each numeric attribute we take the difference between the corresponding values of that attribute in tuple X1 and in tuple X2, square this difference, and accumulate it. The square root of the total accumulated distance is then taken.

Working of the KNN Algorithm

The K-nearest-neighbors (KNN) algorithm uses "feature similarity" to predict the values of new data points: a new data point is assigned a value based on how closely it matches the points in the training set. Its working can be understood with the help of the following steps:

- Step 1: For implementing any algorithm we need a dataset, so during the first step of KNN we load the training data as well as the test data.
- Step 2: Next, we choose the value of K, i.e., the number of nearest data points to consider. K can be any integer.
- Step 3: For each point in the test data, do the following:
  - 3.1: Calculate the distance between the test point and each row of the training data, using a method such as Euclidean, Manhattan, or Hamming distance. The most commonly used method is Euclidean distance.
  - 3.2: Sort the training rows in ascending order by distance.
  - 3.3: Choose the top K rows from the sorted array.
  - 3.4: Assign a class to the test point based on the most frequent class of these rows.
- Step 4: End.

KNN Algorithm Example

The following example illustrates the concept of K and the working of the KNN algorithm. Suppose we have a dataset of points belonging to a blue class and a red class, and we need to classify a new data point, shown as a black dot at (60, 60), as blue or red. Assuming K = 3, the algorithm finds the three nearest data points to the black dot. Among those three nearest neighbors, two lie in the red class, so the black dot is also assigned to the red class. A code sketch of this example follows below.

[Figure: A scatter plot of blue and red points with the new point at (60, 60); its three nearest neighbors are circled, two of which are red.]
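Here is a minimal sketch of steps 3.1 to 3.4 applied to the example above. The blue and red training points are invented so that the query (60, 60) ends up with two red neighbors among its three nearest, as in the diagram.

```python
import math
from collections import Counter

def knn_classify(training_data, query, k):
    """Classify `query` by majority vote among its k nearest neighbors.
    training_data: list of ((x, y), label) pairs."""
    # 3.1: Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label) for point, label in training_data
    ]
    distances.sort()                                  # 3.2: sort ascending
    top_k = [label for _, label in distances[:k]]     # 3.3: top K rows
    return Counter(top_k).most_common(1)[0][0]        # 3.4: majority class

# Hypothetical blue/red points echoing the plotted example above.
training_data = [
    ((20, 35), "blue"), ((35, 40), "blue"), ((40, 90), "blue"),
    ((55, 65), "red"),  ((65, 55), "red"),  ((80, 30), "blue"),
]

print(knn_classify(training_data, (60, 60), k=3))     # -> 'red'
```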
Pros and Cons of KNN

Pros:
- It is a very simple algorithm to understand and interpret.
- It is very useful for nonlinear data, because the algorithm makes no assumptions about the data.
- It is a versatile algorithm: we can use it for classification as well as regression.
- It has relatively high accuracy, although there are better supervised learning models than KNN.

Cons:
- It is a computationally somewhat expensive algorithm, because it stores all of the training data.
- It requires high memory storage compared to other supervised learning algorithms.
- Prediction is slow when N, the number of training points, is large.
- It is very sensitive to the scale of the data as well as to irrelevant features.

Applications of KNN

The following are some of the areas in which KNN can be applied successfully:

- Banking System: KNN can be used in a banking system to predict whether an individual is fit for loan approval, i.e., whether that individual has characteristics similar to those of defaulters.
- Calculating Credit Ratings: KNN algorithms can be used to find an individual's credit rating by comparing the individual with persons having similar traits.
- Politics: With the help of KNN algorithms, we can classify a potential voter into classes such as "will vote", "will not vote", "will vote for Party Congress", or "will vote for Party BJP".
- Other areas in which the KNN algorithm can be used are speech recognition, handwriting detection, image recognition, and video recognition.

GENETIC ALGORITHM VERSUS TRADITIONAL ALGORITHM

Genetic algorithm:
- An algorithm for solving both constrained and unconstrained optimization problems, based on genetics and natural selection.
- Provides approximate solutions for difficult problems.
- More advanced.
- Used in fields such as research, machine learning, and artificial intelligence.

Traditional algorithm:
- An unambiguous specification that defines how to solve a problem.
- A methodical procedure for solving a problem.
- Not as advanced.
- Used in fields such as programming, mathematics, etc.

What Is the Genetic Algorithm?

- The genetic algorithm is a method for solving both constrained and unconstrained optimization problems.
- The genetic algorithm repeatedly modifies a population of individual solutions.
- At each step, the genetic algorithm selects individuals at random from the current population to be parents and uses them to produce the children for the next generation.
- Over successive generations, the population "evolves" toward an optimal solution.
- The genetic algorithm is used to solve a variety of optimization problems that are not well suited to standard optimization algorithms, including problems in which the objective function is discontinuous, non-differentiable, or highly nonlinear.
- The genetic algorithm can also address mixed-integer programming problems, where some components are restricted to be integer valued.

The genetic algorithm uses three main types of rules at each step to create the next generation from the current population (a code sketch of these rules appears at the end of this section):

- Selection rules select the individuals, called parents, that contribute to the population at the next generation.
- Crossover rules combine two parents to form children for the next generation.
- Mutation rules apply random changes to individual parents to form children.

The genetic algorithm differs from a classical, derivative-based optimization algorithm in two main ways: at each iteration it generates a population of points rather than a single point, and it selects the next population by computations that use random number generators rather than by a deterministic computation.
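A minimal sketch of the three rules in action on a toy problem: evolving 8-bit strings toward all ones ("one-max"). The population size, mutation rate, tournament-style selection rule, and fitness function are all illustrative choices, not part of the lecture material.

```python
import random

random.seed(42)

GENES, POP, GENERATIONS, MUTATION_RATE = 8, 20, 30, 0.05
fitness = sum   # toy "one-max" objective: count of 1-bits in an individual

def select(population):
    """Selection rule: pick a parent, favoring fitter individuals
    (tournament between two random individuals)."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    """Crossover rule: combine two parents at a random cut point."""
    cut = random.randrange(1, GENES)
    return p1[:cut] + p2[cut:]

def mutate(child):
    """Mutation rule: randomly flip bits with a small probability."""
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in child]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]

print(max(population, key=fitness))  # evolves toward [1, 1, 1, 1, 1, 1, 1, 1]
```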
Cluster Analysis

A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster.

What is Clustering?

Clustering is the process of organizing abstract objects into classes of similar objects.

Points to Remember
- A cluster of data objects can be treated as one group.
- While doing cluster analysis, we first partition the set of data into groups based on data similarity and then assign labels to the groups.
- The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.

Applications of Cluster Analysis
- Cluster analysis is broadly used in many applications such as market research, pattern recognition, data analysis, and image processing.
- Clustering can help marketers discover distinct groups in their customer base and characterize their customer groups based on purchasing patterns.
- In the field of biology, it can be used to derive plant and animal taxonomies, categorize genes with similar functionalities, and gain insight into structures inherent in populations.
- Clustering also helps in the identification of areas of similar land use in an earth observation database, and in the identification of groups of houses in a city according to house type, value, and geographic location.
- Clustering helps in classifying documents on the web for information discovery.
- Clustering is also used in outlier detection applications such as the detection of credit card fraud.
- As a data mining function, cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster.

Requirements of Clustering in Data Mining

The following points throw light on why clustering is required in data mining:

- Scalability: We need highly scalable clustering algorithms to deal with large databases.
- Ability to deal with different kinds of attributes: Algorithms should be capable of being applied to any kind of data, such as interval-based (numerical), categorical, and binary data.
- Discovery of clusters with arbitrary shape: The clustering algorithm should be capable of detecting clusters of arbitrary shape; it should not be bounded to distance measures that tend to find only small spherical clusters.
- High dimensionality: The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
- Ability to deal with noisy data: Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
- Interpretability: The clustering results should be interpretable, comprehensible, and usable.

Clustering Methods

Clustering methods can be classified into the following categories:
- Partitioning Method
- Hierarchical Method
- Density-based Method
- Grid-based Method
- Model-based Method
- Constraint-based Method

Partitioning Method

Suppose we are given a database of n objects; the partitioning method constructs k partitions of the data (k < n). Each partition represents a cluster. That is, it classifies the data into k groups, which satisfy the following requirements:

- Each group contains at least one object.
- Each object must belong to exactly one group.

A k-means-style sketch of this method follows at the end of this section.

Hierarchical Methods

A hierarchical method creates a hierarchical decomposition of the given set of data objects. We can classify hierarchical methods on the basis of how the hierarchical decomposition is formed. This kind of method is rigid: once a merging or splitting is done, it can never be undone.

Divisive Approach

This approach, also known as the top-down approach, starts with all of the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters. This continues until each object is in its own cluster or a termination condition holds.
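As a concrete instance of the partitioning method, here is a minimal k-means-style sketch: it alternates between assigning each object to the cluster with the nearest center (so each object belongs to exactly one group) and recomputing each center as the mean of its cluster. The toy points and k = 2 are invented for illustration.

```python
import math

def kmeans(points, k, iterations=10):
    """Partition `points` into k clusters (a minimal k-means sketch)."""
    centers = points[:k]                      # naive initialization
    for _ in range(iterations):
        # Assignment step: each object belongs to exactly one group,
        # the cluster whose center is nearest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:                       # keep old center if empty
                centers[i] = tuple(sum(c) / len(cluster)
                                   for c in zip(*cluster))
    return centers, clusters

# Two visually separated toy groups of 2-D points.
points = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (8, 8)]
centers, clusters = kmeans(points, k=2)
print(centers)    # approx. [(1.33, 1.33), (8.33, 8.33)]
print(clusters)
```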
