5th Lesson

Machine Learning

1. What is Machine Learning?

Machine learning is a core sub-area of Artificial Intelligence (AI). Machine learning applications learn from experience (that is, data) like humans do, without direct programming. When exposed to new data, these applications learn, grow, change, and develop by themselves. In other words, with machine learning, computers find insightful information without being told where to look. Instead, they do this by leveraging algorithms that learn from data in an iterative process.

At a high level, machine learning is the ability to adapt to new data independently and through iterations. Basically, applications learn from previous computations and transactions and use "pattern recognition" to produce reliable and informed results.

2. Why Machine Learning?

To better understand the uses of machine learning, consider some instances where machine learning is applied: the self-driving Google car, cyber fraud detection, and online recommendation engines from Facebook, Netflix and Amazon. Machines can enable all of these things by filtering useful pieces of information and piecing them together based on patterns to get accurate results. The process flow depicted here represents how machine learning works.

2.1 Machine Learning Process

The machine learning process is as follows:

1. Data Collection: The quantity and quality of your data dictate how accurate your model is. The outcome of this step is generally a representation of data which we will use for training. Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this step.

2. Data Preparation: Wrangle data and prepare it for training. Clean that which may require it (remove duplicates, correct errors, deal with missing values, normalization, data type conversions, etc.). Randomize data, which erases the effects of the particular order in which we collected and/or otherwise prepared our data. Visualize data to help detect relevant relationships between variables or class imbalances (bias alert!), or perform other exploratory analysis. Split into training and evaluation sets.

3. Choose a Model: Different algorithms are for different tasks; choose the right one.

4. Train the Model: The goal of training is to make a prediction correctly as often as possible. Each iteration of the process is a training step.

5. Evaluate the Model: Use some metric or combination of metrics to "measure" the objective performance of the model. Test the model against previously unseen data. This unseen data is meant to be somewhat representative of model performance in the real world, but still helps tune the model. What makes a good train/evaluation split depends on domain, data availability, dataset particulars, etc.
6. Parameter Tuning: This step refers to hyperparameter tuning, which is an "art form" as opposed to a science. Tune model parameters for improved performance. Simple model hyperparameters may include: number of training steps, learning rate, initialization values and distribution, etc.

7. Make Predictions: Further (test set) data, which have until this point been withheld from the model, are used to test the model; this gives a better approximation of how the model will perform in the real world.

2.2 Different Types of Machine Learning

[Figure: Types of machine learning: supervised (task driven: regression/classification), unsupervised, and reinforcement]

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output. Supervised learning, as the name indicates, involves the presence of a supervisor as a teacher. Basically, supervised learning is learning in which we teach or train the machine using data which is well labeled, meaning some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (set of training examples) and produces a correct outcome from labeled data.

For instance, suppose you are given a basket filled with different kinds of fruits. Now the first step is to train the machine with all the different fruits one by one, like this:

i. If the shape of the object is rounded with a depression at the top and its color is Red, then it will be labeled as Apple.
ii. If the shape of the object is a long curving cylinder and its color is Green-Yellow, then it will be labeled as Banana.

Now suppose that after training, you are given a new separate fruit, say a banana, from the basket and asked to identify it. Since the machine has already learned things from previous data, this time it has to use that knowledge wisely. It will first classify the fruit by its shape and color, confirm the fruit name as BANANA, and put it in the Banana category. Thus the machine learns things from training data (the basket containing fruits) and then applies the knowledge to test data (the new fruit).

Machine learning takes data as input; this data is called training data. The training data includes both Inputs and Labels (Targets). For example, with a = 5, b = 6 and result = 11, the inputs are 5 and 6 and the target is 11.

[Figure 5.1: Training with training data, then predicting with new data]

We first train the model with lots of training data (inputs and targets); then, with new data and the logic we got before, we predict the output. This process is called Supervised Learning, which is really fast and accurate.

Supervised learning is classified into two categories of algorithms:

i. Classification: A classification problem is when the output variable is a category, such as "disease" and "no disease".
ii. Regression: A regression problem is when the output variable is a real value.

Regression

Many different models can be used; the simplest is linear regression. It tries to fit the data with the best hyper-plane which goes through the points.

[Figure: Linear regression; x-axis: independent variable (input), y-axis: dependent variable (output)]

[Figure: Types of regression models: simple and multiple, linear and non-linear]

For example: Which of the following is a regression task?

i. Predicting the age of a person
ii. Predicting the nationality of a person
iii. Predicting whether the stock price of a company will increase tomorrow
iv. Predicting whether a document is related to sightings of UFOs

Solution: Predicting the age of a person (because it is a real value; predicting nationality is categorical, whether the stock price will increase is a discrete yes/no answer, and predicting whether a document is related to UFOs is again a discrete yes/no answer).

Classification

Classification is a type of problem where we predict a categorical response value, where the data can be separated into specific "classes" (example: we predict one of the values in a set of values). Some examples are:

i. Is this mail spam or not?
ii. Will it rain today or not?
iii. Is this picture a cat or not?
Basically, such "Yes/No" type questions are called binary classification. Other examples are:

i. Is this mail spam, important, or promotion?
ii. Is this picture a cat, a dog, or a tiger?

This type is called multi-class classification.

Classification algorithms in machine learning: Logistic Regression, Decision Tree, Random Forest, Naive Bayes.

Unsupervised Learning

Unlike supervised learning, no teacher is provided, which means no training is given to the machine. The machine therefore has to find the hidden structure in unlabeled data by itself.

For instance, suppose it is given an image having both dogs and cats which it has never seen. The machine has no idea about the features of dogs and cats, so it can't categorize the image into dogs and cats. But it can categorize them according to their similarities, patterns, and differences, i.e., we can easily categorize the picture into two parts. The first part may contain all pictures having dogs in them and the second part may contain all pictures having cats in them. Here the machine didn't learn anything before, meaning there is no training data or examples.

The training data does not include targets here, so we don't tell the system where to go; the system has to understand by itself from the data we give. Here the training data is not structured (it contains noisy data, unknown data, etc.). Example: a random article from different pages.

[Figure 5.3: Unsupervised process: unstructured data; understand patterns in the data itself; conclusion/results]

There are also different types of unsupervised learning, like clustering and anomaly detection.

Clustering

Clustering is a type of problem where we group similar things together. It is a bit similar to multi-class classification, but here we don't provide the labels; the system understands from the data itself and clusters the data. Some examples are:

i. Given news articles, cluster them into different types of news.
ii. Given a set of images, cluster them into different objects.
[Figure 5.4: Clustering example; estimated number of clusters: 3]

Why Clustering?

Clustering is very important as it determines the intrinsic grouping among the unlabeled data present. There are no fixed criteria for a good clustering. It depends on the user and on which criteria satisfy their need. For instance, we could be interested in finding representatives for homogeneous groups.

Clustering methods include:

Hierarchical methods: Agglomerative (bottom-up approach) and Divisive (top-down approach). Examples: CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), etc.

Partitioning methods: These methods partition the objects into k clusters, and each partition forms one cluster. This method is used to optimize an objective criterion similarity function, such as when the distance is a major parameter. Examples: K-means, CLARANS (Clustering Large Applications based upon Randomized Search), etc.

Grid-based methods: In this method the data space is formulated into a finite number of cells that form a grid-like structure. All the clustering operations done on these grids are fast and independent of the number of data objects. Examples: STING (Statistical Information Grid), wave cluster, CLIQUE (CLustering In QUEst), etc.

Applications of Clustering in different fields:

1. Marketing: It can be used to characterize and discover customer segments for marketing purposes.
2. Biology: It can be used for classification among different species of plants and animals.
3. Libraries: It is used in clustering different books on the basis of topics and information.
4. Insurance: It is used to acknowledge the customers and their policies and to identify frauds.
5. City Planning: It is used to make groups of houses and to study their values based on their geographical locations and other factors present.
6. Earthquake studies: By learning the earthquake-affected areas we can determine the dangerous zones.

Association

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large data.
It is intended to identify strong rules discovered in data using some measures of interestingness. Based on the concept of strong rules, association rules were introduced for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} => {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities, for example, promotional pricing or product placements.

3. KNN

The K nearest neighbour algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

KNN Model Representation

The model representation for KNN is the entire training dataset. KNN has no model other than storing the entire dataset, so there is no learning required. Efficient implementations can make the look-up and matching of new patterns during prediction efficient.

3.1 Predictions with KNN

KNN makes predictions using the training dataset directly. Predictions are made for a new instance (x) by searching through the entire training set for the K most similar instances (the neighbours) and summarizing the output variable for those K instances. For regression, this is the mean output variable; in classification, it is the mode (or most common) class value.

To determine which of the K instances in the training dataset are most similar to a new input, a distance measure is used. For real-valued input variables, the most popular distance measure is Euclidean distance. Euclidean distance is calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (xi) across all input attributes j.

EuclideanDistance(x, xi) = sqrt( sum( (xj - xij)^2 ) )

Other popular distance measures include:

Hamming Distance: Calculate the distance between binary vectors.
Manhattan Distance: Calculate the distance between real vectors using the sum of their absolute difference. Also called City Block Distance.
Minkowski Distance: Generalization of Euclidean and Manhattan distance.

Euclidean is a good distance measure to use if the input variables are similar in type (for example, all measured widths and heights). Manhattan distance is a good measure to use if the input variables are not similar in type (such as age, gender, height, etc.).

The value for K can be found by algorithm tuning.

The computational complexity of KNN increases with the size of the training dataset. For very large training sets, KNN can be made stochastic by taking a sample from the training dataset from which to calculate the K most similar instances.

Different disciplines have different names for KNN, for example:

i. Instance-Based Learning: The raw training instances are used to make predictions. As such, KNN is often referred to as instance-based learning or case-based learning.
ii. Lazy Learning: No learning of the model is required and all of the work happens at the time a prediction is requested. As such, KNN is often referred to as a lazy learning algorithm.

KNN can be used for regression and classification problems.

KNN for Regression

When KNN is used for regression problems, the prediction is based on the mean or the median of the K most similar instances.

KNN for Classification

When KNN is used for classification, the output can be calculated as the class with the highest frequency from the K most similar instances. Each instance in essence votes for its class, and the class with the most votes is taken as the prediction.

Class probabilities can be calculated as the normalized frequency of samples that belong to each class in the set of K most similar instances for a new data instance. For example, in a binary classification problem (class is 0 or 1):

P(class=0) = count(class=0) / (count(class=0) + count(class=1))
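The KNN prediction rules described in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not an efficient implementation: it searches the whole training set for every query, uses Euclidean distance only, and all function names and toy data below are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(x, xi):
    """Square root of the sum of squared differences across all attributes j."""
    return math.sqrt(sum((xj - xij) ** 2 for xj, xij in zip(x, xi)))


def k_nearest(train, x, k):
    """Return the k (features, output) pairs whose features are closest to x."""
    return sorted(train, key=lambda row: euclidean_distance(x, row[0]))[:k]


def knn_regress(train, x, k):
    """Regression: mean output of the k nearest neighbours."""
    neighbours = k_nearest(train, x, k)
    return sum(output for _, output in neighbours) / len(neighbours)


def knn_classify(train, x, k):
    """Classification: most common class among the k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(train, x, k))
    return votes.most_common(1)[0][0]


def class_probabilities(train, x, k):
    """Normalized class frequencies among the k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(train, x, k))
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}


# Hypothetical toy data: (features, class) pairs.
labelled = [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0),
            ((4.0, 4.2), 1), ((4.1, 3.9), 1)]
# Hypothetical toy data: (features, real-valued output) pairs.
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

predicted_class = knn_classify(labelled, (1.1, 1.0), k=3)
probabilities = class_probabilities(labelled, (3.0, 3.0), k=3)
predicted_value = knn_regress(numeric, (2.1,), k=3)
```

Because there is no training step, all of the work happens inside `k_nearest` at prediction time, which is why the cost grows with the size of the training set.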
4. Random Forest Algorithm

In general, in the random forest classifier, the higher the number of trees in the forest, the higher the accuracy of the results.

4.1 Basic Decision Tree Concept

The decision tree concept is closer to a rule-based system. Given the training dataset with targets and features, the decision tree algorithm will come up with some set of rules. The same set of rules can be used to perform the prediction on the test dataset.

Suppose you would like to predict whether your daughter will like a newly released animated movie or not. To model the decision tree, you will use a training dataset such as the animated cartoon characters your daughter liked in past movies. You pass the dataset, with the target being whether your daughter will like the movie or not, to the decision tree classifier. The decision tree will start building the rules with the characters your daughter likes as nodes and the targets (like or not) as the leaf nodes. By considering the path from the root node to a leaf node, you can get the rules.

A simple rule could be: if some character x is playing the leading role, then your daughter will like the movie. You can think of a few more rules based on this example. Then, to predict whether your daughter will like the newly released movie or not, you just need to check the rules which are created by the decision tree.

In the decision tree algorithm, calculating these nodes and forming the rules happens using the information gain and Gini index calculations. In the random forest algorithm, instead of using information gain or the Gini index for calculating the root node, the process of finding the root node and splitting the feature nodes happens randomly.

[Figure 5.5: Mady and his friends (Friend 1, Friend 2, Friend 3, Friend 4)]

Suppose Mady somehow got 2 weeks of leave from his office. He wants to spend his 2 weeks travelling to a different place, and he wants to go to a place he may like. So he decided to ask his best friend about the places he may like. His friend started asking about his past trips, with questions like: you have visited place X, did you like it? Based on the answers given by Mady, his best friend starts recommending places Mady may like. Here his best friend formed a decision tree from the answers given by Mady. His other friends also asked some random questions, and each one recommended one place to Mady. Mady then considered the place with the most votes from his friends as the final place to visit.

Applications of Random Forest

i. Banking: In the banking sector, the random forest algorithm is widely used in two main applications: finding loyal customers and finding fraudulent customers.
ii. Medicine: In the medical field, the random forest algorithm is used to identify the correct combination of components to validate a medicine. It is also helpful for identifying a disease by analyzing the patient's medical records.
iii. Stock Market: In the stock market, the random forest algorithm is used to identify stock behavior as well as the expected loss or profit from purchasing a particular stock.
iv. E-commerce: In e-commerce, random forest is used only in a small segment of the recommendation engine, for identifying the likelihood of a customer liking the recommended products based on similar kinds of customers.

Advantages of Random Forest Algorithm

i. It will handle missing values.
ii. When we have more trees in the forest, the random forest classifier is less likely to overfit the model.

5. K-means for Clustering

K-means clustering is another basic technique often used in machine learning. The K-means clustering algorithm has been around since 1967, when it was developed by a researcher named James MacQueen. It is used on unlabelled numerical data rather than data that is already defined, making it a type of unsupervised learning. It is a popular unsupervised learning technique due to its simplicity and efficiency.
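As a rough illustration of the technique this section introduces, here is a minimal k-means sketch in plain Python. It rests on a few assumptions that the text does not fix: points are tuples of numbers, the initial centroids are k randomly sampled data points (with a fixed seed for repeatability), Euclidean distance is used, and iteration stops once the centroids no longer change. The function names and toy data are hypothetical.

```python
import math
import random


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, clusters) once the centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # convergence: centroid values no longer change
            break
        centroids = updated
    return centroids, clusters


# Hypothetical unlabelled numerical data with two visible groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(data, k=2)
```

The value of k is passed in by the caller, which mirrors the point that the user has to choose k; trying several values and comparing the resulting clusters is one way to pick it.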
The k-means clustering algorithm assigns data points to categories, or clusters, by finding the mean distance between data points. It then iterates through this technique in order to perform more accurate classifications over time. Since you must first start by classifying your data into k categories, it is essential that you understand your data well enough to do this.

Pros

i. Fast and efficient.
ii. Works on unlabeled numerical data.
iii. Iterative technique.

Cons

i. Must understand the context of your data well.
ii. Have to choose your own k value.
iii. Lots of repetition.
iv. Does not perform well when outliers are present.

5.1 Steps to Creating a K-Means Model

There are three main steps when using the k-means clustering technique.

Step 1: First, you need to choose a value for k based on your data and what makes the most sense. K is the number of categories, or clusters. If you are unsure what value to make k, it is best to try different values until you find the one that works best for your data set. Then compare the different models generated from different k values and choose the one that is most suitable.

Step 2: Second, you need to create k clusters by assigning each data point to the nearest cluster. Each cluster has a centroid, the center-most point of the cluster; centroids are chosen randomly at first and are then generated from the mean of the data points in each cluster.

Step 3: Lastly, repeat the previous steps until the convergence criterion is reached. Iteration takes place so that mean distances continue to be generated until the centroid values no longer change.

5.2 Apriori Algorithm

The Apriori algorithm is a categorization algorithm. Some algorithms are used to create binary appraisals of information or find a regression relationship. Others are used to predict trends and patterns that are originally identified. Apriori is a basic machine learning algorithm which is used to sort information into categories.
Sorting information can be incredibly helpful with any data management process. It ensures that data users are apprised of new information and can figure out the data that they are working with.

Datasets for Apriori Algorithm

Apriori has a wide variety of applicable datasets. These datasets include thousands of entries of either qualitative or quantitative data. Data is most often organized into some sort of database or table. However, such organization is not absolutely necessary for the machine learning algorithm to do its work. The data may come from an artificial neural network or another form of artificial intelligence. It must be present with guiding information such as timestamps or dates. Guiding information helps the machine learning algorithm process categories and find patterns.

5.3 How Apriori Works

Apriori works with a pre-arranged amount, known as a support. A "frequent" data characteristic is one that occurs above that pre-arranged amount. The characteristics that are frequent can then be analyzed and placed into pairs. This process helps to point out relationships between relevant data points. Other forms of data can be pruned. Pruning helps to further differentiate between categories that do and do not reach the overall support amount. Next, the data set can be analyzed by looking for triplets. These triplets show even greater frequency. Analysis can detect more and more relations throughout the body of data until the algorithm has exhausted all of the possibilities.

5.4 Apriori Algorithm Learning Types

Support Vector Machine

SVM is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems.
In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.

Support vectors are simply the co-ordinates of individual observations. The Support Vector Machine is a frontier which best segregates the two classes (hyper-plane/line).

How does it work?

Above, we got accustomed to the process of segregating the two classes with a hyper-plane. The question is: how do we identify the right hyper-plane?

i. Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify star and circle.

[Figure 5.7]

You need to remember a thumb rule to identify the right hyper-plane: "Select the hyper-plane which segregates the two classes better". In this scenario, hyper-plane "B" has excellently performed this job.

ii. Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and C) and all are segregating the classes well. Now, how can we identify the right hyper-plane?

[Figure 5.8]

The margin is the distance between the hyper-plane and the nearest data point of either class. Above, you can see that the margin for hyper-plane C is high compared to both A and B. Hence, we name the right hyper-plane as C. Another reason for selecting the hyper-plane with the higher margin is robustness. If we select a hyper-plane having a low margin, then there is a high chance of mis-classification.

iii. Identify the right hyper-plane (Scenario-3):

[Figure 5.9]

Some of you may have selected hyper-plane B as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all correctly. Therefore, the right hyper-plane is A.
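The selection rule from the three scenarios above (classify accurately first, then prefer the larger margin) can be sketched as a comparison between hand-written candidate lines. This only illustrates the rule; it is not how an SVM is actually trained. It assumes each candidate hyper-plane is given as a pair (w, b) with decision value w·x + b, that labels are +1 and -1, and all names and data are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted side of the hyper-plane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def errors(w, b, points):
    """Number of points that fall on the wrong side of the hyper-plane."""
    return sum(1 for x, label in points if label * decision_value(w, b, x) <= 0)


def margin(w, b, points):
    """Distance from the hyper-plane to the nearest data point of either class."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision_value(w, b, x)) / norm for x, _ in points)


def pick_hyperplane(candidates, points):
    """Fewest classification errors first; among ties, the largest margin."""
    return min(candidates,
               key=lambda name: (errors(*candidates[name], points),
                                 -margin(*candidates[name], points)))


# Hypothetical data: stars are +1, circles are -1.
points = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((1.0, 2.0), -1),
          ((5.0, 5.0), 1), ((6.0, 5.0), 1), ((5.0, 6.0), 1)]

# Hypothetical candidate lines, each written as (w, b).
candidates = {
    "A": ((1.0, 1.0), -4.0),   # separates the classes but passes close to the circles
    "B": ((1.0, 1.0), -10.5),  # misclassifies one star
    "C": ((1.0, 1.0), -6.5),   # separates the classes with the widest gap
}

best = pick_hyperplane(candidates, points)
```

With these toy values, C is preferred over A because both classify everything correctly and C has the larger margin; if C is removed, A is preferred over B because B makes a classification error.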
iv. Can we classify two classes (Scenario-4)? Below, we are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

[Figure 5.10]

The one star at the other end is like an outlier for the star class. SVM has a feature to ignore outliers and find the hyper-plane that has maximum margin. Hence, we can say that SVM is robust to outliers.

[Figure 5.11]

v. Find the hyper-plane to segregate two classes (Scenario-5): In this scenario, we can't have a linear hyper-plane between the two classes, so how does SVM classify these two classes? SVM can solve this problem by introducing an additional feature, i.e. z = x^2 + y^2. Now let us plot the data points on the x and z axes.

[Figure 5.12]

In the above plot, points to consider are:

i. All values for z are always positive because z is the squared sum of both x and y.
ii. In the original plot, circles appear close to the origin of the x and y axes, leading to a lower value of z, while stars lie relatively far from the origin, resulting in a higher value of z.

6. Reinforcement Learning

Reinforcement learning is about taking suitable action to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that in supervised learning the training data has the answer key with it, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer, and the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.

Main points in reinforcement learning:

i. Input: The input should be an initial state from which the model will start.
ii. Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
iii. Training: The training is based upon the input. The model will return a state and the user will decide to reward or punish the model based on its output. The model continues to learn, and the best solution is decided based on the maximum reward.

Types of Reinforcement

There are two types of reinforcement:

i. Positive: Positive reinforcement is defined as when an event, occurring due to a particular behavior, increases the strength and the frequency of the behavior. In other words, it has a positive effect on the behavior.

Advantages of positive reinforcement:
* Maximizes performance.
* Sustains change for a long period of time.

Disadvantages of positive reinforcement:
* Too much reinforcement can lead to an overload of states, which can diminish the results.

ii. Negative: Negative reinforcement is defined as the strengthening of a behavior because a negative condition is stopped or avoided.

Advantages of negative reinforcement:
* Increases behavior.
* Provides defiance to a minimum standard of performance.

Disadvantages of negative reinforcement:
* It only provides enough to meet the minimum behavior.

Various practical applications of Reinforcement Learning:

i. RL can be used in robotics for industrial automation.
ii. RL can be used in machine learning and data processing.

Differentiate between supervised, unsupervised and reinforcement learning.

| Differentiation based on | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|---|
| Features | The training set has both predictors and predictions. | The training set has only predictors. | It can establish state-of-the-art results on any task. |
| Algorithms | Regression, support vector machine, and Naive Bayes. | Clustering algorithms and dimensionality reduction algorithms. | Q-learning, state-action-reward-state-action (SARSA), and Deep Q Network (DQN). |
| Uses | Speech recognition, forecasting, etc. | Preprocessing data, pre-training supervised learning algorithms, etc. | Warehouses, inventory management, delivery management, power systems, financial systems, etc. |

Differentiate between parametric and non-parametric models.

| Differentiation based on | Parametric Model | Non-parametric Model |
|---|---|---|
| Features | A finite number of parameters to predict new data. | Unbounded number of parameters. |
| Algorithm | Logistic regression, linear discriminant analysis, perceptron, and Naive Bayes. | K-nearest neighbors, decision trees like CART and C4.5, and support vector machines. |
| Benefits | Simple, fast, and less data. | Flexibility, power, and performance. |
| Limitations | Constrained, limited complexity, and poor fit. | More data, slower, and overfitting. |

| Unsupervised (K-means Clustering) | Supervised |
|---|---|
| Clustering algorithms. | Classification or regression algorithms. |
| Minimal training model. | Exhaustive training model. |
