MACHINE LEARNING NOTES – MOST IMPORTANT QUESTIONS OF MACHINE LEARNING (AKTU) – ENGINEER BEING
MODULE 1 PART-I

Ques 1. What is learning?
Ans. Learning is the process of acquiring new understanding, knowledge, behaviours, skills, values, attitudes, and preferences. Learning is any process by which a system improves its performance from experience.

Ques 2. What is Machine Learning? 2020-21 2M
Ans. Machine learning (ML) is a discipline of artificial intelligence (AI) that gives machines the ability to automatically learn from data and past experience, identify patterns, and make predictions with minimal human intervention. "Machine learning enables a machine to automatically learn from data, improve performance from experience, and predict things without being explicitly programmed."

Ques 3. Difference between ML, AI and Deep Learning? 2020-21 2M
Ans. (The original figure shows three nested circles: Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning.)
Artificial Intelligence: AI is the broadest concept of the three; it gives a machine the ability to imitate human behaviour.
Machine Learning: ML uses algorithms and techniques that enable machines to learn from past experience/trends and predict the output based on that data; their performance improves as they are exposed to more data over time.
Deep Learning: a subset of machine learning in which multilayered neural networks learn from vast amounts of data.
The main difference between machine learning and deep learning technologies is the presentation of data: machine learning uses structured/unstructured data prepared for learning, while deep learning uses neural networks as the learning model.

Ques 4. Why is machine learning important? What are the applications of ML?
Ans. Machine learning is important because it gives enterprises a view of trends in customer behaviour and business operational patterns, and supports the development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine learning a central part of their operations; it has become a significant competitive differentiator for many companies.
Applications of ML:
1. Image recognition:
   a. Image recognition is the process of identifying and detecting an object or a feature in a digital image or video.
   b. It is used in many applications such as factory automation, toll-booth monitoring, and security surveillance.
2. Speech recognition:
   a. Speech Recognition (SR) is the translation of spoken words into text.
   b. It is also known as Automatic Speech Recognition (ASR), computer speech recognition, or Speech To Text (STT).
   c. In speech recognition, a software application recognizes spoken words.
3. Product recommendation: machine learning is widely used by e-commerce and entertainment companies such as Amazon and Netflix for product recommendation to the user. Whenever we search for a product on Amazon, we start getting advertisements for the same product while browsing the internet in the same browser, and this is because of machine learning.
4. Email spam and malware filtering: whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in the inbox with the important symbol and spam emails go to the spam box; the technology behind this is machine learning.
5. Stock market trading: machine learning is widely used in stock market trading.
In the stock market there is always a risk of ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural network is used for the prediction of stock market trends.

Ques. What are the types of Machine Learning?
Ans.
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data the machine predicts the output. Labelled data means the input data is already tagged with the correct output. Examples: risk assessment, image classification, fraud detection, spam filtering, etc.
Types of supervised learning (a Python sketch of both appears at the end of this section):
i. Classification: a classification problem is one where the output variable is a category, such as "red" or "blue", "disease" or "no disease", Yes/No, Male/Female, True/False, etc.
ii. Regression: a regression problem is one where the output variable is a real value, for example forecasting sales or weather forecasting.

Unsupervised learning is a type of machine learning in which models are trained on an unlabelled dataset and are allowed to act on that data without any supervision. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format. The output depends on the coded algorithms.
• Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.
• Association: association rule learning is an unsupervised learning technique that tests for the dependence of one data element on another data element and designs rules accordingly so that it can be more cost-effective. It tries to discover interesting relations or associations between the variables of the dataset.

Semi-supervised learning sits between the supervised and unsupervised learning families. Semi-supervised models use both labelled and unlabelled data for training.

Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action the agent gets positive feedback, and for each bad action the agent gets negative feedback or a penalty.
The main elements of an RL system are:
• the agent or learner;
• the environment the agent interacts with;
• the policy that the agent follows to take actions;
• the reward signal that the agent observes upon taking an action.
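To make the two supervised settings above concrete, here is a minimal sketch using scikit-learn (an assumption; the notes do not prescribe any library). It trains a classifier on a discrete target and a regressor on a continuous target; the toy data, feature meanings and model choices are illustrative only.

```python
# Minimal sketch of supervised classification vs. regression (assumes scikit-learn is installed).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: discrete target (0 = "no disease", 1 = "disease") -- toy data.
X_cls = np.array([[35, 120], [50, 160], [29, 115], [62, 170]])   # e.g. [age, blood pressure]
y_cls = np.array([0, 1, 0, 1])
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print("Predicted class:", clf.predict([[45, 150]]))              # -> a category label

# Regression: continuous target (e.g. a sales figure) -- toy data.
X_reg = np.array([[1], [2], [3], [4]])                           # e.g. advertising spend
y_reg = np.array([10.0, 19.5, 31.0, 39.5])
reg = LinearRegression().fit(X_reg, y_reg)
print("Predicted value:", reg.predict([[5]]))                    # -> a real number
```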
Ques. Difference between a Genetic Algorithm and a Traditional Algorithm.
Ans.
• A genetic algorithm is a search-based algorithm used for solving optimization problems in machine learning. A traditional algorithm is a general, methodical procedure used to solve a given problem; there can be several algorithms to solve the same problem.
• Genetic algorithms are more advanced; traditional algorithms are not as advanced.
• Genetic algorithms are used in ML and AI; traditional algorithms are used in programming and mathematics.

Ques. What are the main issues in Machine Learning?
Ans.
1) Process complexity of machine learning: the machine learning process is very complex, which is another major issue faced by machine learning engineers and data scientists. There is a great deal of hit-and-trial experimentation, hence the probability of error is higher than expected. It also includes analysing the data, removing data bias, training data, and applying complex mathematical calculations, making the procedure complicated and quite tedious.
2) Getting bad recommendations: a machine learning model operates under a specific context, which can result in bad recommendations and concept drift in the model. Suppose at a specific time a customer is looking for some gadgets; the customer's requirement changes over time, but the model keeps showing the same recommendations even though the customer's expectation has changed. This incident is called data drift. We can overcome it by regularly updating and monitoring the data according to expectations.
3) Overfitting and underfitting:
   Overfitting: one of the most common issues faced by machine learning engineers and data scientists. When a model is trained on a huge amount of data, it starts capturing the noise and inaccuracies present in the training data set.
   Underfitting: the opposite of overfitting. When a model is trained with too little data, it cannot learn enough from the training data, which reduces accuracy and produces unreliable predictions.
4) Inadequate training data: a major issue with machine learning algorithms is the lack of quality as well as quantity of data. Data plays a vital role in machine learning, and many data scientists report that inadequate, noisy, and unclean data severely hampers the algorithms. For example, a simple task may require thousands of samples, while an advanced task such as speech or image recognition needs millions of sample examples. Data quality is also important for the algorithms to work well, but the absence of data quality is often found in machine learning applications.
5) Monitoring and maintenance: generalized output is mandatory for any machine learning model, so regular monitoring and maintenance become compulsory. Different results for different actions require data changes; hence editing of code as well as resources for monitoring it also become necessary.

Ques. Explain Decision Tree learning.
Ans. Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome. A decision tree has two kinds of nodes: Decision Nodes and Leaf Nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the output of those decisions and do not contain further branches.
• In order to build the tree, we use the CART algorithm, which stands for Classification And Regression Tree algorithm.

Introduction to machine learning approaches (the main families covered in these notes):
• Artificial Neural Networks (ANN)
• Clustering
• Reinforcement Learning
• Decision Tree Learning
• Bayesian Networks
• Support Vector Machine (SVM)
• Genetic Algorithms

Ques. Explain Artificial Neural Networks. 2020-21 10M
Ans. The term "Artificial Neural Network" is derived from biological neural networks, which shape the structure of the human brain. Similar to the human brain, which has interconnected neurons, an artificial neural network has interconnected artificial neurons (nodes) arranged in layers. The correspondence is roughly:
• Biological network -> Artificial network
• Dendrites -> Inputs
• Synapses -> Weights (interconnections)
• Axon -> Output
In a neural network there are three essential kinds of layers:
Input layer: the input layer is the first layer of an ANN; it receives the input information in the form of texts, numbers, audio files, image pixels, etc.
Hidden layers: in the middle of the ANN model are the hidden layers. There can be a single hidden layer or many of them; they perform the intermediate computations on the inputs.
Output layer: in the output layer, we obtain the result produced by the rigorous computations performed by the middle layers.

Artificial Neural Network applications:
• Handwritten character recognition: ANNs are used for handwritten character recognition. Neural networks are trained to recognize handwritten characters, which can be in the form of letters or digits.
• Facial recognition: in order to recognize faces based on the identity of a person, we make use of neural networks. They are most commonly used in areas where users require security access.
• Speech recognition: ANNs play an important role in speech recognition. Earlier models of speech recognition were based on statistical models like Hidden Markov Models. With the advent of deep learning, various types of neural networks have become the preferred choice for obtaining an accurate classification.

Ques. Explain Support Vector Machine. 2020-21 10M (UNIT 2)
Ans. SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. According to the SVM algorithm, we find the points closest to the separating line from both classes; these points are called support vectors. We compute the distance between the line and the support vectors; this distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to place the decision boundary so that the separation between the two classes is as wide as possible.

Ques. Explain Clustering. 2020-21 10M
Ans. Clustering:
• A way of grouping data points into different clusters consisting of similar data points. Objects with possible similarities remain in a group that has little or no similarity with other groups.
• It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with an unlabelled dataset.
• After applying a clustering technique, each cluster or group is given a cluster-ID. The ML system can use this ID to simplify the processing of large and complex datasets.
• The clustering technique is commonly used for statistical data analysis.
Example – clustering with the real-world example of a mall: when we visit any shopping mall, we can observe that things with similar usage are grouped together, such as t-shirts in one section and trousers in another; similarly, in the fruit and vegetable section, apples, bananas, mangoes, etc., are grouped separately so that we can easily find things. The clustering technique works in the same way.

Ques. Difference between Classification and Regression.
Ans. Regression and classification algorithms are both supervised learning algorithms. Both are used for prediction in machine learning and work with labelled datasets; the difference between them is how they are used for different machine learning problems.
• Classification algorithms are used to predict/classify discrete values such as Male or Female, True or False, Spam or Not Spam, etc. Regression algorithms are used to predict continuous values such as price, salary, age, etc.
• The task of a classification algorithm is to map the input value (x) to a discrete output variable (y).
• The task of a regression algorithm is to map the input value (x) to a continuous output variable (y).
• Classification algorithms are used with discrete data; regression algorithms are used with continuous data.
• Classification algorithms can be divided into binary classifiers and multi-class classifiers; regression algorithms can be divided into linear and non-linear regression.
• Classification algorithms are used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc. In email spam detection, the model is trained on millions of emails with different parameters, and whenever it receives a new email it identifies whether the email is spam or not; if the email is spam, it is moved to the spam folder.
• Regression algorithms are used to solve regression problems such as weather prediction, house price prediction, etc. Suppose we want to do weather forecasting; for this we use a regression algorithm. In weather prediction the model is trained on past data, and once training is completed it can easily predict the weather for future days.

Ques. What is a well-defined learning problem? 2021-22 2M
Ans. A learning problem is said to be well defined if it has three features: the class of tasks, the measure of performance to be improved, and the source of experience.
Example – a checkers learning problem:
– Task T: playing checkers.
– Performance measure P: percent of games won against opponents.
– Training experience E: playing practice games against itself.

Ques. Difference between Data Science and Machine Learning.
Ans. "Data Science is a field of deep study of data that includes extracting useful insights from the data and processing that information using different tools, statistical models, and machine learning algorithms." Machine learning allows computers to learn from past experience on their own; it uses statistical methods to improve performance and predict the output without being explicitly programmed.

Ques. How do we design a learning system in machine learning? (Or: give the final design of a checkers learning program.) 2021-22 10M
Ans. Learning is the process of acquiring new understanding, knowledge, behaviours, skills, values, attitudes, and preferences. Learning is any process by which a system improves its performance from experience.
Designing a learning system in machine learning:
Step 1) Choosing the training experience: the first and very important task is to choose the training data or training experience which will be fed to the machine learning algorithm. Three attributes matter:
1. Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
– Direct training examples in learning to play checkers consist of individual checkers board states and the correct move for each.
– Indirect training examples in the same game consist of the move sequences and final outcomes of various games played, where information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost (the credit assignment problem).
2. The degree to which the learner controls the sequence of training examples. Example:
– The learner might rely on the teacher to select informative board states and to provide the correct move for each.
– The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
– Or the learner may have complete control over both the board states and the (indirect) classifications, as it does when it learns by playing against itself with no teacher present.
3. The representativeness of the distribution of examples over which the final system performance will be measured is the third crucial attribute. Broadly, the more diverse the set of training experiences, the better the performance can get. Example: if the training experience in playing checkers consists only of games played against itself, the learner might never encounter certain crucial board states that are very likely to be played by the human checkers champion.
Step 2) Choosing the target function: determine exactly what type of knowledge will be learned and how it will be used by the performance program. Example: in playing checkers, the program needs to learn to choose the best move among the legal moves.
Step 3) Choosing a representation for the target function: once the target function is chosen, we have to choose a representation for it. When the learning algorithm has a complete list of all permitted moves, it may represent the evaluation using any format, such as linear equations, hierarchical graph representation, tabular form, and so on. Out of these moves, the NextMove function will choose the target move that increases the success rate. For example, if a checkers program has four alternative moves, it will select the most optimal move that leads to victory.
Step 4) Choosing a function approximation algorithm: in this step we choose a learning algorithm that can approximate the chosen target function. This step further consists of two sub-steps: a. estimating the training values, and b. adjusting the weights.
The final design: the final design consists of four modules (shown in the original figure as the Performance System, Critic, Generalizer and Experiment Generator, connected by the new problem, solution trace, training examples and hypothesis).
1. The performance system: the performance system solves the given performance task.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU – ENGINEER BEING
MODULE 2 PART-I

Ques 1. Explain the Support Vector Machine algorithm and its kernels. 2020-21 10M
(Or: Discuss support vectors in SVM. 2020-21 2M)
Ans. SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. It tries to classify data with a hyperplane that maximizes the margin between the classes in the training data; hence SVM is an example of a large-margin classifier.
The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes. According to the SVM algorithm, we find the points closest to the line from both classes; these points are called support vectors. We compute the distance between the line and the support vectors; this distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to place the decision boundary so that the separation between the two classes is as wide as possible.
SVM KERNELS:
• SVM can work well on non-linear data using the kernel trick.
• The function of the kernel trick is to map the low-dimensional input space into a higher-dimensional space.
• In simple words, a kernel converts non-separable problems into separable problems by adding more dimensions.
• It makes SVM more powerful, flexible and accurate.
THREE TYPES OF KERNEL
1) Linear kernel: a linear kernel can be used as the normal dot product of any two given observations. The kernel function is:
K(x, xi) = sum(x · xi)
2) Polynomial kernel: a more generalized form of the linear kernel that can distinguish curved or non-linear input space. It is popular in image processing. The formula for the polynomial kernel is:
K(X, Xi) = (1 + sum(X · Xi))^d, where d is the degree of the polynomial.
3) Gaussian Radial Basis Function (RBF) kernel: the RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space. It is a general-purpose kernel, used when there is no prior knowledge about the data. The formula is:
K(x, xi) = exp(−gamma · sum((x − xi)²)), where gamma = 1/(2σ²).

APPLICATIONS OF KERNEL (SVM)
• Face detection – SVM classifies parts of the image as face and non-face and creates a square boundary around the face.
• Handwriting recognition – we use SVMs to recognize handwritten characters; this is widely used.
• Texture classification using SVM – in this application we use images of certain textures and use that data to classify whether a surface is smooth or not.
• Steganography detection in digital images – using SVM we can find out if an image is pure or adulterated. This can be used in security-based organisations to uncover secret messages: messages can be encrypted in high-resolution images (more pixels make the message harder to find); we can segregate the pixels, store the data in various datasets, and analyse those datasets using SVM.

PROPERTIES OF SVM:
1. Flexibility in choosing a similarity (kernel) function.
2. Sparseness of the solution when dealing with large data sets – only the support vectors are used to specify the separating hyperplane.
3. Ability to handle large feature spaces – the complexity does not depend on the dimensionality of the feature space.
4. Overfitting can be controlled by the soft-margin approach (we intentionally let some data points enter the margin).
5. It is a simple convex optimization problem which is guaranteed to converge to a single global solution.

DISADVANTAGES OF SVM:
1. The SVM algorithm is not suitable for very large data sets because the required training time is high.
2. SVM does not perform very well when the data set has more noise, i.e. when the target classes overlap.
3. In cases where the number of features for each data point exceeds the number of training samples, the SVM may underperform.
4. SVMs with the "wrong" kernel – choosing the right kernel function is key; for example, using the linear kernel when the data are not linearly separable results in the algorithm performing poorly.
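As a hedged illustration of the three kernels above, the sketch below fits scikit-learn's SVC with a linear, a polynomial and an RBF kernel on the same toy data. The library, the dataset and the hyperparameter values are assumptions chosen only for illustration; they are not part of the notes.

```python
# Sketch: comparing SVM kernels with scikit-learn (assumed to be installed).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # non-linearly separable toy data

for kernel, params in [("linear", {}),
                       ("poly",   {"degree": 3}),              # d = degree of the polynomial
                       ("rbf",    {"gamma": 0.5})]:            # gamma = 1 / (2 * sigma^2)
    model = SVC(kernel=kernel, **params).fit(X, y)
    print(kernel, "training accuracy:", model.score(X, y))
    # model.support_vectors_ holds the support vectors that define the margin.
```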
Ques 2. What is Regression? 2020-21 2M
Ans. Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time-series modelling, and determining the cause–effect relationship between variables.
Some examples of regression:
• Prediction of rain using temperature and other factors.
• Determining market trends.
• Prediction of road accidents due to rash driving.
Regression is used to find trends in data. By performing regression, we can confidently determine the most important factor, the least important factor, and how each factor affects the other factors.

Linear Regression vs Logistic Regression:
• Linear regression is a supervised regression model; logistic regression is a supervised classification model.
• In linear regression we predict the value as a continuous number; in logistic regression we predict the value as 1 or 0.
• Linear regression is based on least-squares estimation; logistic regression is based on maximum likelihood estimation.
• In linear regression, when we plot the training data, a straight line can be drawn that touches the maximum number of points. In logistic regression, any change in a coefficient changes both the direction and the steepness of the logistic function: positive slopes give an S-shaped curve and negative slopes give a Z-shaped curve.
• Linear regression is used to estimate the dependent variable when the independent variables change, for example predicting the price of houses. Logistic regression is used to calculate the probability of an event, for example classifying whether tissue is benign or malignant.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU – ENGINEER BEING
MODULE 2 PART-II

Ques 1. Difference between a Bayesian Network and a Neural Network. 2020-21 2M
Ans. The vertices and edges in a Bayesian network have meaning: the network structure itself gives you important information about the conditional dependence between the variables. With neural networks, the network structure does not tell you anything of that sort. A similarity between an ANN and a Bayesian network is that both use directed graphs.

Ques 2. State and explain Bayes theorem. 2021-22 10M
Ans. Bayes theorem is one of the most popular machine learning concepts; it helps to calculate the probability of one event occurring, under uncertain knowledge, given that another event has already occurred. Bayes theorem is a way of finding a probability when we know certain other probabilities:
P(X|Y) = P(Y|X) · P(X) / P(Y)
which tells us how often X happens given that Y happens, written P(X|Y), when we know: how often Y happens given that X happens, written P(Y|X); how likely X is on its own, written P(X); and how likely Y is on its own, written P(Y).
The above equation is called Bayes rule or Bayes theorem.
• P(X|Y) is called the posterior, which we need to calculate. It is defined as the updated probability after considering the evidence.
• P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
• P(X) is called the prior probability, the probability of the hypothesis before considering the evidence.
• P(Y) is called the marginal probability. It is defined as the probability of the evidence under any consideration.
Hence, Bayes theorem can be written as: posterior = likelihood × prior / evidence.
EXAMPLE: dangerous fires are rare (1%), but smoke is fairly common (10%) due to barbecues, and 90% of dangerous fires make smoke. We can then compute the probability of a dangerous fire when there is smoke:
P(Fire|Smoke) = P(Fire) · P(Smoke|Fire) / P(Smoke) = (1% × 90%) / 10% = 9%.
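The fire-and-smoke numbers above can be checked with a few lines of Python; this is only a sanity check of the arithmetic, not an algorithm from the syllabus.

```python
# Bayes rule on the fire/smoke example: P(Fire|Smoke) = P(Smoke|Fire) * P(Fire) / P(Smoke)
p_fire = 0.01               # prior: dangerous fires are rare (1%)
p_smoke = 0.10              # evidence: smoke is fairly common (10%)
p_smoke_given_fire = 0.90   # likelihood: 90% of dangerous fires make smoke

p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print(p_fire_given_smoke)   # 0.09, i.e. 9%
```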
Ques. Explain the Naive Bayes Classifier algorithm with an example.
Ans. The Naive Bayes algorithm is a supervised learning algorithm based on Bayes theorem and used for solving classification problems. It is mainly used in text classification, which involves high-dimensional training datasets. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Some popular applications of the Naive Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.
The distinction between Bayes theorem and Naive Bayes is that Naive Bayes assumes conditional independence, whereas Bayes theorem does not; Naive Bayes treats all input features as independent of one another given the class.
Working of the Naive Bayes classifier: the working can be understood with the help of the example below. Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes theorem to calculate the posterior probability.
Problem: if the weather is sunny, should the player play or not?
Dataset (Outlook, Play): 0 Rainy Yes; 1 Sunny Yes; 2 Overcast Yes; 3 Overcast Yes; 4 Sunny No; 5 Rainy Yes; 6 Sunny Yes; 7 Overcast Yes; 8 Rainy No; 9 Sunny No; 10 Sunny Yes; 11 Rainy No; 12 Overcast Yes; 13 Overcast Yes.
Frequency table (Weather: Yes / No): Overcast 5 / 0; Rainy 2 / 2; Sunny 3 / 2; Total 10 / 4.
Likelihood table: P(Overcast) = 5/14 = 0.35; P(Rainy) = 4/14 = 0.29; P(Sunny) = 5/14 = 0.35; P(Yes) = 10/14 = 0.71; P(No) = 4/14 = 0.29.
Applying Bayes theorem:
P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny); with P(Sunny|Yes) = 3/10 = 0.3, P(Sunny) = 0.35 and P(Yes) = 0.71, so P(Yes|Sunny) = 0.3 × 0.71 / 0.35 ≈ 0.60.
P(No|Sunny) = P(Sunny|No) · P(No) / P(Sunny); with P(Sunny|No) = 2/4 = 0.5, P(No) = 0.29 and P(Sunny) = 0.35, so P(No|Sunny) = 0.5 × 0.29 / 0.35 ≈ 0.41.
Since P(Yes|Sunny) > P(No|Sunny), on a sunny day the player can play the game.

Ques 3. What problem does the EM algorithm solve? 10M 2021-22
(Or: What is the task of the E-step in the EM algorithm? 2M 2020-21)
Ans. The Expectation-Maximization (EM) algorithm is defined as a combination of unsupervised machine learning steps used to determine the local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for models with unobservable variables. It is a technique to find maximum likelihood estimates when latent variables are present; it is also referred to as a latent variable model. A latent variable model consists of both observable and unobservable variables, where the observable variables can be measured while the unobserved variables are inferred from the observed ones. These unobservable variables are known as latent variables.
Steps in the EM algorithm (the EM algorithm is completed mainly in 4 steps: Initialization, Expectation, Maximization and Convergence):
1st step: initialize the parameter values. The system is provided with incomplete observed data, with the assumption that the data are obtained from a specific model.
2nd step: the Expectation or E-step, which is used to estimate or "guess" the values of the missing or incomplete data using the observed data and the current parameters. The E-step primarily updates the latent variables.
3rd step: the Maximization or M-step, where we use the complete data obtained from the 2nd step to update the parameter values. The M-step primarily updates the hypothesis.
4th step: check whether the values of the latent variables are converging or not. If yes, stop the process; otherwise repeat from step 2 until convergence occurs.
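In practice the EM iterations are usually hidden inside a library. The sketch below uses scikit-learn's GaussianMixture, which fits a two-component Gaussian mixture with EM; the library, the toy data and the number of components are assumptions chosen only to illustrate the E-step/M-step loop described above.

```python
# Sketch: EM in practice via a Gaussian Mixture Model (assumes numpy and scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two hidden "clusters" -- the cluster membership is the latent (unobserved) variable.
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(6.0, 1.0, 300)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, max_iter=100)   # runs E-steps and M-steps internally
gmm.fit(data)

print("Estimated means:", gmm.means_.ravel())                        # should be near 0 and 6
print("Responsibilities of first point:", gmm.predict_proba(data[:1]))  # E-step output
```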
MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU – ENGINEER BEING
MODULE 3 PART-I

Ques. Explain overfitting and underfitting in decision tree learning.
Ans. Overfitting: if we depend too much on the training data while growing the decision tree, there is a possibility that the tree will overfit. That is, a particular hypothesis will work well on the training data but will not work well on the testing or real-world data; such a tree is called an overfitted tree.
Underfitting: underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. In the case of underfitting, the model is not able to learn enough from the training data, and hence it reduces the accuracy and produces unreliable predictions.

Ques. What are the issues in decision tree learning?
a. overfitting the data
b. handling continuous-valued attributes
c. handling missing attribute values
d. handling attributes with different costs
Ans.
a. Overfitting the data: if we depend too much on the training data while growing the decision tree, the tree may overfit; a particular hypothesis works well on the training data but not on testing or real-world data. This overfitting can be addressed with two techniques: reduced-error pruning and rule post-pruning.
b. Handling continuous-valued attributes: the decision tree works well with problems that have a fixed number of attributes and a discrete number of possibilities for each attribute. If a particular attribute has continuous values, we cannot apply the decision tree directly; we first need to convert those continuous-valued attributes into discrete possibilities, and only then apply decision tree learning.
c. Handling missing attribute values: if some attribute values are missing, we need to fill those missing attributes with proper values, and only then can we use this learning. If a particular attribute has no value for an example, we need to estimate a value and fill it in.
d. Handling attributes with different costs: when we apply the decision tree algorithm, every attribute is normally given equal importance. But sometimes, in a given problem, a particular attribute may have more importance or be given more weight; in such cases we cannot use the core decision tree learning directly and need to handle the issue with an adjusted attribute-selection calculation.

Ques. Explain attribute selection measures in decision trees.
Ans. In a decision tree, the major challenge is the selection of the best attribute for the root node at each level. There are two popular attribute selection measures:
1. Information Gain
2. Gini Index
Information Gain: when we use a node in a decision tree to partition the training instances into smaller subsets, the entropy changes. Information gain is a measure of this change in entropy:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
Entropy(S) = −P(yes) · log2 P(yes) − P(no) · log2 P(no)
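The entropy and information-gain formulas above can be computed directly. The short sketch below does so for a small illustrative split; the attribute values and counts are made up for the example, not taken from the notes.

```python
# Sketch: computing Entropy(S) and Gain(S, A) for a binary target, per the formulas above.
import math

def entropy(pos, neg):
    """Entropy(S) = -p_yes*log2(p_yes) - p_no*log2(p_no); 0*log2(0) is taken as 0."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

# Toy set S: 9 "yes" and 5 "no"; attribute A splits S into two subsets.
S = (9, 5)
subsets = [(6, 2), (3, 3)]   # (yes, no) counts in each branch of A

gain = entropy(*S) - sum((p + n) / sum(S) * entropy(p, n) for p, n in subsets)
print("Entropy(S) =", round(entropy(*S), 3))
print("Gain(S, A) =", round(gain, 3))
```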
MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU – ENGINEER BEING
MODULE 3 PART-II

Ques. What is instance-based learning? How is locally weighted regression different from a radial basis function network? 2020-21 10M
Ans. Instance-based learning refers to a family of techniques for classification and regression which produce a class label/prediction based on the similarity of the query to its nearest neighbour(s) in the training set. Some instance-based learning algorithms are:
1. K-Nearest Neighbour (KNN)
2. Locally Weighted Learning (LWL)
3. Case-Based Reasoning
Locally weighted regression: locally weighted linear regression is a non-parametric algorithm; the model does not learn a fixed set of parameters as is done in ordinary linear regression. Rather, parameters are computed individually for each query point. Locally weighted regression (LWR) is a memory-based method that performs a regression around a point of interest using only training data that are "local" to that point. Locally weighted linear regression is a supervised learning algorithm; there is no training phase, and all the work is done during the testing phase, i.e. while making predictions. Locally weighted regression methods are a generalization of k-Nearest Neighbour.
Radial basis function (RBF) network: an RBF network is constructed with an input layer, a hidden layer of radial basis units, and an output layer.
• Input layer: the input layer simply feeds the data to the hidden layer; as a result, the number of neurons in the input layer should be equal to the dimensionality of the data.
• Hidden layer: each hidden unit computes a radial basis (e.g. Gaussian) function of the distance between the input and the unit's centre.
• Output layer: the output layer uses a linear activation function for regression tasks.
Unlike locally weighted regression, which delays all computation until query time, an RBF network learns a global approximation built from local kernel functions during training.

Ques. Explain case-based reasoning and the CADET system.
Ans. In general, the case-based reasoning process entails:
1. Retrieve – gathering from memory an experience closest to the current problem.
2. Reuse – suggesting a solution based on that experience and adapting it to meet the demands of the new situation.
3. Revise – evaluating the use of the solution in the new context.
4. Retain – storing this new problem-solving method in the memory system.
A CADET system employs case-based reasoning to assist in the conceptual design of simple mechanical devices such as water faucets. It uses a library containing approximately 75 previous designs and design fragments to suggest conceptual designs that meet the specifications of a new design problem.
• The function is represented in terms of qualitative relationships among the water flow levels and temperatures at its inputs and outputs.
• In the functional description, an arrow with a "+" label indicates that the variable at the arrow head increases with the variable at its tail; a "−" label indicates that the variable at the head decreases with the variable at the tail.
• Here Qc refers to the flow of cold water into the faucet, Qh to the input flow of hot water, and Qm to the single mixed flow out of the faucet.
• Tc, Th and Tm refer to the temperature of the cold water, hot water and mixed water respectively.
• The variable Ct denotes the control signal for temperature that is input to the faucet, and Cf denotes the control signal for water flow.
• The controls Ct and Cf influence the water flows Qc and Qh, thereby indirectly influencing the faucet's output flow Qm and temperature Tm.

Ques. What are the advantages of case-based learning (CBL)? 2021-22 10M
Ans.
• Ease of knowledge elicitation: lazy methods can utilise easily available cases or problem instances instead of rules that are difficult to extract.
• Absence of problem-solving bias: cases can be used for multiple problem-solving purposes because they are stored in a raw form. This is in contrast to eager methods, which can be used merely for the purpose for which the knowledge has already been compiled.
• Incremental learning: a CBL system can be put into operation with a minimal set of solved cases furnishing the case base. The case base is then filled with new cases over time, increasing the system's problem-solving ability.
• Ease of maintenance: this is particularly due to the fact that CBL systems can adapt to many changes in the problem domain and the relevant environment merely by acquiring new cases.
• Ease of explanation: the results of a CBL system can be justified based on the similarity of the current problem to the retrieved case. Since CBL results are easily traceable to precedent cases, it is also easier to analyse failures of the system.
For example, CASEY (for classification of auditory impairments) and CASCADE (for classification of software failures) are case-based learning systems.

Ques. What is inductive bias? How does inductive learning differ from analytical learning? 2021-22
Ans. The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs it has not encountered. In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output. Inductive learning methods require a certain number of training examples to generalize accurately. Analytical learning stems from the idea that when not enough training examples are provided, it may be possible to "replace" the missing examples with prior knowledge and deductive reasoning.

Ques. What is lazy learning? 2021-22
Ans. Lazy learning methods simply store the provided training data and delay generalization of the target function until a new query (request) is made; k-Nearest Neighbour and locally weighted regression are examples of lazy learning.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU – ENGINEER BEING
MODULE 4 PART-I

Ques 1. What is a perceptron?
Ans. Perceptrons are the building blocks of artificial neural networks; a perceptron is a learning algorithm for binary classifiers. The perceptron consists of 4 parts:
1. Input values (one input layer)
2. Weights and bias
3. Net sum
4. Activation function
The perceptron works as follows:
a. All the inputs x are multiplied by their weights w.
b. All the multiplied values are added together to form the weighted (net) sum.
c. That weighted sum is applied to the chosen activation function.
Weights show the strength of a particular node. A bias value allows you to shift the activation function curve up or down. In short, activation functions are used to map the input onto the required range of values, such as (0, 1) or (−1, 1). A perceptron is usually used to classify data into two parts; therefore, it is also known as a linear binary classifier.

Ques 2. What is gradient descent? 2021-22 2M
Ans. Gradient descent is an optimization algorithm which is commonly used to train machine learning models and neural networks by finding a local minimum/maximum of a given function. This method is commonly used in machine learning (ML) and deep learning (DL) to minimize a cost/loss function.

Ques 3. What is the delta rule? 2021-22 2M
Ans. In machine learning, the delta rule is a gradient-descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It is a special case of the more general backpropagation algorithm.

Ques 4. Describe the BPN (backpropagation) algorithm in ANN along with a suitable example. 2020-21 10M
Ans. Backpropagation is used for the training of neural networks. The backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent. In an artificial neural network, the values of the weights and biases are randomly initialized. Due to this random initialization, the neural network will make errors in producing the correct output. We need to reduce these error values as much as possible, so we need a mechanism that compares the desired output of the neural network with the network's actual output and adjusts the weights and biases so that the error decreases. Backpropagation is a short form of "backward propagation of errors"; it is a standard method of training artificial neural networks.
Backpropagation algorithm:
Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modelled using the current weights W; weights are usually chosen randomly at the start.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer(s), to the output layer.
Step 4: Calculate the error in the outputs: backpropagation error = actual output − desired output.
Step 5: From the output layer, go back through the hidden layer(s) and adjust the weights so that the error is reduced.
Step 6: Repeat the process until the desired output is achieved.
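As a hedged illustration of steps 1–6, the sketch below trains a tiny one-hidden-layer network on XOR with plain NumPy gradient descent. The architecture, learning rate, seed and epoch count are arbitrary choices for the demo, not values from the notes.

```python
# Sketch: backpropagation on XOR with one hidden layer (assumes numpy).
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))    # Step 2: random initial weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):                              # Step 6: repeat until the error is small
    h = sigmoid(X @ W1 + b1)                           # Step 3: forward pass
    out = sigmoid(h @ W2 + b2)
    err = out - y                                      # Step 4: output error
    # Step 5: propagate the error backwards and adjust the weights (gradient descent / delta rule)
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))   # should approach [[0], [1], [1], [0]] (may need more epochs for some seeds)
```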
Why do we need backpropagation? The most prominent advantages of backpropagation are:
• Backpropagation is fast, simple and easy to program.
• It is a flexible method, as it does not require prior knowledge about the network.
• It is a standard method that generally works well.
• It does not need any special mention of the features of the function to be learned.
Types of backpropagation networks: there are two types of backpropagation networks:
• Static back-propagation
• Recurrent backpropagation

Ques. Explain competitive learning and the Self-Organizing Map (SOM). 2020-21 10M
Ans. In competitive learning, the output neurons of a neural network compete among themselves to become active. In other learning schemes several output neurons may be active, but in competitive learning only a single output neuron is active at any one time.
Self-Organizing Map algorithm:
Step 1: Initialize the weights of the output (map) nodes.
Step 2: Select an input vector from the training data.
Step 3: To determine the best matching unit, calculate the Euclidean distance between each node's weight vector and the current input vector. The node with the weight vector closest to the input vector is tagged as the winning neuron.
Step 4: Find the new weights between the input vector sample and the winning output neuron:
New weights = Old weights + Learning rate × (Input vector − Old weights)
Step 5: Repeat steps 2–4; when the new weights are similar to the old weights, the map has converged and the process stops.

Ques. Explain Convolutional Neural Networks (CNNs) and their layers.
Ans. Convolutional Neural Networks (CNNs) are specially designed to work with images. An image consists of pixels; in deep learning, images are represented as arrays of pixel values. There are three main types of layers in a CNN:
• Convolutional layers
• Pooling layers
• Fully connected (dense) layers, with a flatten layer in between.
There are four main types of operations in a CNN: the convolution operation, the pooling operation, the flatten operation, and the classification (or other relevant) operation.
Convolutional layers and the convolution operation: the first layer in a CNN is a convolutional layer. It takes the images as the input and begins to process them. There are three elements in the convolutional layer: the input image, the filters and the feature map.
• Filter: also called a kernel or feature detector.
• Image section: the size of the image section should be equal to the size of the filter(s) we choose. The number of image sections depends on the stride.
• Feature map: the feature map stores the outputs of the different convolution operations between the different image sections and the filter(s).
(The original figure shows a 6×6 input image convolved with a 3×3 filter to produce a 4×4 feature map.)
The number of steps (pixels) by which we shift the filter over the input image is called the stride. Padding adds additional pixels with zero values on each side of the image; that helps to get a feature map of the same size as the input.
Pooling layers and the pooling operation: pooling layers are the second type of layer used in a CNN. There can be multiple pooling layers in a CNN; each convolutional layer is typically followed by a pooling layer, so convolution and pooling layers are used together as pairs. Pooling reduces the dimensionality (number of pixels) of the output returned from the previous convolutional layer. There are three elements in the pooling layer: the feature map, the filter and the pooled feature map. There are two types of pooling operations:
• Max pooling: take the maximum value in the area where the filter is applied.
• Average pooling: take the average of the values in the area where the filter is applied.
After pooling, we can flatten the pooled feature map (which may contain multiple channels) into a single vector.
Fully connected (dense) layers: the flattened vector is fed into fully connected layers that perform the final classification.
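Below is a minimal NumPy sketch of the convolution operation described above (stride 1, no padding). The image and kernel values are arbitrary illustrations, not the ones from the worked example that follows.

```python
# Sketch: 2D convolution (cross-correlation, as used in CNNs) with stride 1 and no padding.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            section = image[i:i + kh, j:j + kw]        # image section, same size as the filter
            feature_map[i, j] = np.sum(section * kernel)
    return feature_map

image = np.array([[1, 0, 1, 0, 1, 0],
                  [0, 1, 0, 1, 0, 1],
                  [1, 1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1, 1],
                  [1, 0, 1, 0, 1, 0],
                  [0, 1, 0, 1, 0, 1]])                 # 6x6 input image
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])                         # 3x3 filter
print(conv2d(image, kernel))                           # 4x4 feature map
```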
Ques. Perform the convolution operation for a given image and filter. 2020-21 10M
Ans. The size of the kernel (filter) is 3×3, hence the size of each image section is also 3×3. Each value of the feature map is obtained by multiplying a 3×3 image section element-wise with the 3×3 kernel and summing the products, then sliding the section by the stride and repeating until the whole input image has been covered.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU – ENGINEER BEING
MODULE 5 PART-I

Syllabus: REINFORCEMENT LEARNING – Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice, Learning Models for Reinforcement – (Markov Decision Process, Q-Learning – Q-Learning function, Q-Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep Q-Learning. GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction, Crossover, Mutation, Genetic Programming, Models of Evolution and Learning, Applications.

Ques. What is Reinforcement Learning? Explain its learning models.
Ans. Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action the agent gets positive feedback, and for each bad action the agent gets negative feedback or a penalty. The elements of reinforcement learning are: agent, environment, action, state, policy, reward.
Learning models in RL:
• Markov Decision Process
• Q-Learning algorithm
• Deep Q-Learning

Markov Decision Process: the Markov property states that "the future is independent of the past given the present". Mathematically we can express this statement as:
P[S(t+1) | S(t)] = P[S(t+1) | S(1), S(2), ..., S(t)]
It says that if the agent is present in the current state s1, performs an action a1 and moves to the state s2, then the state transition from s1 to s2 depends only on the current state; future actions and states do not depend on past actions, rewards, or states. MDP is a framework that can solve most reinforcement learning problems with discrete actions. With the Markov Decision Process, an agent can arrive at an optimal policy for maximum rewards over time. A Markov process is a memoryless random process, i.e. a sequence of random states S[1], S[2], ..., S[n] with the Markov property.
A Markov decision process is a 5-tuple (S, A, P, R, γ):
• S is the set of states.
• A is the set of actions.
• P(S, A, S') is the probability that action A in state S at time T will lead to state S' at time T + 1.
• R(S, A, S') is the immediate reward received after a transition from state S to state S' due to action A.
• Discount factor (γ): determines how much importance is given to the immediate reward relative to future rewards. It has a value between 0 and 1.

Q-learning algorithm:
• Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
• The main objective of Q-learning is to learn the policy which can inform the agent what actions should be taken, under what circumstances, to maximize the reward.
• The goal of the agent in Q-learning is to maximize the value of Q.
• Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the agent.
• A Q-table is used to find the best action for each state in the environment. We use the Bellman equation at each state to get the expected future state and reward, and save it in a table to compare with other states.
Bellman equation:
V(s) = max_a [ R(s, a) + γ · V(s') ]
where V(s) is the value calculated at a particular state, R(s, a) is the reward obtained at state s by performing action a, and γ is the discount factor.
The Q-learning algorithm works like this:
1. Initialize all Q-values, e.g. with zeros.
2. Choose an action a in the current state s based on the current best Q-value.
3. Perform this action a and observe the outcome (new state s').
4. Measure the reward R after this action.
5. Update Q with an update formula that is called the Bellman equation.
6. Repeat steps 2 to 5 until the learning no longer improves.
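To make the Q-table update loop concrete, here is a small hedged sketch of tabular Q-learning on a made-up 5-state corridor. The environment, the ε-greedy action choice, the learning rate and the discount factor are illustrative assumptions, not values specified in the notes.

```python
# Sketch: tabular Q-learning on a toy 1-D corridor; the goal (state 4) gives reward +1.
import numpy as np

n_states, actions = 5, [0, 1]             # action 0 = move left, 1 = move right
Q = np.zeros((n_states, len(actions)))    # step 1: initialize all Q-values with zeros
alpha, gamma, epsilon = 0.1, 0.9, 0.2     # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # step 2: choose an action (epsilon-greedy around the current best Q-value)
        a = rng.choice(actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        # step 3: perform the action and observe the new state s'
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        # step 4: measure the reward
        r = 1.0 if s_next == n_states - 1 else 0.0
        # step 5: Bellman-style update of Q(s, a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))   # "move right" should have the higher value in every state
```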
EXAMPLE: an example of Q-learning is an advertisement recommendation system. In a normal ad recommendation system, the ads you get are based on your previous purchases or websites you may have visited: if you have bought a TV, you will get recommended TVs of different brands. Using Q-learning, we can optimize the ad recommendation system to recommend products that are frequently bought together; the reward is obtained when the user clicks on the suggested product.

DEEP Q-LEARNING MODEL
• The Q-learning approach is practical only for very small environments and quickly loses its feasibility when the number of states and actions in the environment increases.
• The solution to the above leads us to Deep Q-Learning, which uses a deep neural network to approximate the Q-values.
• Deep Q-Learning uses the Q-learning idea and takes it one step further: instead of using a Q-table, we use a neural network that takes a state and approximates the Q-values for each action based on that state.
• The basic working step of Deep Q-Learning is that the current state is fed into the neural network, which returns the Q-value of all possible actions as output.
The difference between Q-Learning and Deep Q-Learning can be illustrated as follows: in Q-Learning, state + action → Q-table → Q-value; in Deep Q-Learning, state → deep neural network → Q-value of action 1, Q-value of action 2, Q-value of action 3, and so on.

Ques 6. What are the applications of reinforcement learning?
Ans. Following are the applications of reinforcement learning:
1. Robotics for industrial automation.
2. Business strategy planning.
3. Machine learning and data processing.
4. It helps us to create training systems that provide custom instruction and materials according to the requirements of students.
5. Aircraft control and robot motion control.
MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU – ENGINEER BEING
MODULE 5 PART-II

Ques. Explain the Genetic Algorithm, its phases and its operators.
Ans. The genetic algorithm reflects the process of natural selection, where the fittest individuals are selected in order to produce the offspring of the next generation.
• The process of natural selection starts with the selection of the fittest individuals from a population.
• They produce offspring which inherit the characteristics of the parents and are added to the next generation.
• If the parents have better fitness, their offspring will be better than the parents and have a better chance of surviving.
• This process keeps iterating, and at the end a generation with the fittest individuals will be found.
• This notion can be applied to a search problem.
The genetic algorithm is a method for solving both constrained and unconstrained optimization problems that is based on natural selection, the process that drives biological evolution. The genetic algorithm repeatedly modifies a population of individual solutions. Five phases are considered in a genetic algorithm:
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation
Initial population: the process begins with a set of individuals called a population. Each individual is a candidate solution to the problem you want to solve; an individual is characterised by a set of genes, often represented as a binary string (the chromosome).
Fitness function: the fitness function determines how fit an individual is (the ability of an individual to compete with other individuals). It gives a fitness score to each individual; the probability that an individual will be selected for reproduction is based on its fitness score.
Selection: the idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation. Pairs of individuals (parents) are selected based on their fitness scores; individuals with high fitness have more chance of being selected.
Crossover: crossover is the most significant phase in a genetic algorithm. For each pair of parents to be mated, a crossover point is chosen at random within the genes. Offspring are created by exchanging the genes of the parents among themselves up to the crossover point.
Mutation: in certain new offspring, some of the genes can be subjected to a mutation with a low random probability; for example, some of the bits in a bit string can be flipped. Mutation occurs to maintain diversity within the population and to prevent premature convergence.
Termination: the algorithm terminates if the population has converged (it does not produce offspring that are significantly different from the previous generation). Then it is said that the genetic algorithm has provided a set of solutions to our problem.
Once the initial generation is created, the algorithm evolves the generation using the following operators:
1) Selection operator: the idea is to give preference to individuals with good fitness scores and allow them to pass their genes to successive generations.
2) Crossover operator: this represents mating between individuals. Two individuals are selected using the selection operator and crossover sites are chosen randomly; the genes at these crossover sites are then exchanged, creating a completely new individual (offspring).
3) Mutation operator: the key idea is to insert random genes in the offspring to maintain diversity in the population and avoid premature convergence.
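Below is a hedged, minimal Python sketch of one GA cycle (selection, crossover, mutation) maximising the number of 1-bits in a chromosome (the "OneMax" toy problem). The fitness function, population size and rates are arbitrary choices for illustration, not taken from the notes.

```python
# Sketch: a tiny genetic algorithm on OneMax (fitness = number of 1s in the chromosome).
import random

random.seed(0)
GENES, POP, GENERATIONS, MUT_RATE = 16, 20, 40, 0.05

def fitness(ind):                       # fitness function
    return sum(ind)

# Initial population: random bit strings (chromosomes).
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for gen in range(GENERATIONS):
    # Selection: fitter individuals are more likely to become parents.
    parents = random.choices(population, weights=[fitness(i) + 1 for i in population], k=POP)
    next_gen = []
    for a, b in zip(parents[::2], parents[1::2]):
        point = random.randint(1, GENES - 1)           # Crossover: single random crossover point
        for child in (a[:point] + b[point:], b[:point] + a[point:]):
            # Mutation: flip each bit with a small probability to keep diversity.
            child = [1 - g if random.random() < MUT_RATE else g for g in child]
            next_gen.append(child)
    population = next_gen

best = max(population, key=fitness)
print("Best individual:", best, "fitness:", fitness(best))   # should be close to all 1s
```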
MOST IMPORTANT QUESTIONS MACHINE LEARNING – ENGINEER BEING
Syllabus:
UNIT I: INTRODUCTION – Learning, Types of Learning, Well-defined learning problems, Designing a Learning System, History of ML, Introduction of Machine Learning Approaches – (Artificial Neural Network, Clustering, Reinforcement Learning, Decision Tree Learning, Bayesian networks, Support Vector Machine, Genetic Algorithm), Issues in Machine Learning and Data Science Vs Machine Learning.
UNIT II: REGRESSION: Linear Regression and Logistic Regression. BAYESIAN LEARNING – Bayes theorem, Concept learning, Bayes Optimal Classifier, Naive Bayes classifier, Bayesian belief networks, EM algorithm. SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel – (Linear kernel, polynomial kernel, and Gaussian kernel), Hyperplane – (Decision surface), Properties of SVM, and Issues in SVM.

Ques. Briefly describe the history of Machine Learning.
Ans.
1. In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts created a model of neurons using an electrical circuit, and thus the neural network was created.
2. In 1952, Arthur Samuel created the first computer program which could learn as it ran.
3. Frank Rosenblatt designed the first artificial neural network in 1958, called the Perceptron. Its main goal was pattern and shape recognition.
4. The use of backpropagation in neural networks came in 1986, when researchers from the Stanford psychology department decided to extend an algorithm created by Widrow and Hoff in 1962. This allowed multiple layers to be used in a neural network, creating what are known as "slow learners", which learn over a long period of time.
5. In 1997, the IBM computer Deep Blue, a chess-playing computer, beat the world chess champion.
21st century: since the start of the 21st century, the use of machine learning has increased and companies have invested heavily in it in order to exploit their data.

Ques. What is a Bayesian network?
Ans. A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies; it is also called a Bayes model (or belief network). It defines a joint probability distribution and consists of two parts:
• a Directed Acyclic Graph (DAG), and
• a table of conditional probabilities for each node.
Each node corresponds to a random variable, which can be continuous or discrete, and the arcs (directed edges) represent the causal relationship or conditional dependence between the variables.

Ques. What is the Bayes Optimal Classifier?
Ans. The Bayes Optimal Classifier is a probabilistic model that predicts the most likely classification for a new instance; it is based on Bayes theorem. It is related to Maximum A Posteriori (MAP), a probabilistic framework for determining the most likely single hypothesis for a training dataset.
Example: take a hypothesis space that has 3 hypotheses with posterior probabilities P(h1|D) = 0.4, P(h2|D) = 0.3 and P(h3|D) = 0.3; hence h1 is the MAP hypothesis. Let a new instance x be encountered which is classified negative by h2 and h3 but positive by h1. The Bayes optimal classification is
P(vj|D) = Σ_{hi ∈ H} P(vj|hi) · P(hi|D),
i.e. the most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities. For the above example, the set of possible classifications of the new instance is V = {⊕, ⊖}, and
P(h1|D) = 0.4, P(⊖|h1) = 0, P(⊕|h1) = 1
P(h2|D) = 0.3, P(⊖|h2) = 1, P(⊕|h2) = 0
P(h3|D) = 0.3, P(⊖|h3) = 1, P(⊕|h3) = 0
therefore
Σ_{hi} P(⊕|hi) · P(hi|D) = 0.4 and Σ_{hi} P(⊖|hi) · P(hi|D) = 0.6,
and argmax_{vj ∈ V} Σ_{hi} P(vj|hi) · P(hi|D) = ⊖.

Syllabus (Unit III): DECISION TREE LEARNING – Decision tree learning algorithm, Inductive bias, Entropy and information theory, Information gain, ID3 algorithm, Issues in decision tree learning; INSTANCE-BASED LEARNING – k-Nearest Neighbour learning, Locally Weighted Regression, Radial Basis Function networks, Case-based learning.

Ques. What is the inductive bias of ID3?
Ans. In machine learning, inductive bias refers to the set of assumptions made by a learning algorithm in order to generalize from a finite set of observations (training data) to unseen instances of the domain. The inductive bias of ID3 is a preference for shorter trees over larger ones, and for trees that place the attributes with the highest information gain closest to the root. That is to say, inductive inference is based on a generalization from a finite set of past observations, extending the observed pattern or relation to other future instances; it is logically plausible but might not be realistically true.

Ques. Explain the ID3, C4.5 and C5.0 algorithms.
Ans.
i. ID3 is an algorithm used to generate a decision tree from a dataset.
ii. To construct a decision tree, ID3 uses a top-down, greedy search through the given sets, where each attribute at every tree node is tested to select the attribute that is best for classification of the given set.
iii. For constructing a decision tree, the information gain is calculated for each attribute, and the attribute with the highest information gain becomes the root node.
i. C4.5 is an algorithm used to generate a decision tree; it is an extension of the ID3 algorithm.
ii. It is better than the ID3 algorithm because it deals with both continuous and discrete attributes, handles missing values, and prunes trees after construction.
iii. C5.0 is the commercial successor of C4.5; it is faster, more memory-efficient, and used for building smaller decision trees.
iv. C4.5 performs a tree-pruning process by default.

Ques. What is the Gini index?
Ans.
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index.
• CART creates only binary splits, and it uses the Gini index to create those binary splits.
• The Gini index can be calculated using the formula: Gini Index = 1 − Σ_i P_i².
Decision trees can represent any boolean function of the input attributes; for example, decision trees can implement the three boolean gates AND, OR and XOR.
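A short sketch of the Gini index formula above, applied to illustrative class counts (the counts are made up for the example):

```python
# Sketch: Gini Index = 1 - sum(p_i^2) for a node containing the given class counts.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([10, 0]))   # 0.0  -> pure node (lowest impurity, preferred)
print(gini([5, 5]))    # 0.5  -> maximally impure binary node
print(gini([7, 3]))    # 0.42 -> in between
```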
k-Nearest Neighbour (K-NN)
o K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
(Diagram: an input value passed to the KNN classifier, which produces the predicted output.)
Need
With the help of K-NN, we can easily identify the category or class of a particular data point.
Working
The working of K-NN can be explained on the basis of the below algorithm:
o Step 1: Select the number K of neighbours.
o Step 2: Calculate the Euclidean distance from the new data point to the available data points.
o Step 3: Take the K nearest neighbours as per the calculated Euclidean distances.
o Step 4: Among these K neighbours, count the number of data points in each category.
o Step 5: Assign the new data point to the category for which the number of neighbours is maximum.
Our model is ready.

MODULE IV: ARTIFICIAL NEURAL NETWORKS – Perceptrons, Multilayer perceptron, Gradient descent and the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm, Generalization, Unsupervised Learning – SOM Algorithm and its variant; DEEP LEARNING – Introduction, Concept of convolutional neural network, Types of layers – (Convolutional layers, Activation function, Pooling, Fully connected), Concept of Convolution (1D and 2D) layers, Training of network, Case study of CNN for e.g. on Diabetic Retinopathy, Building a smart speaker, Self-driving car etc.

Convergence of Neural Networks
Convergence of a neural network is the point in training after which changes in learning become smaller and the error produced by the model on the training data reaches a minimum. Convergence helps in defining how many iterations of training a neural network will require to produce the minimum error.
A neural network may fail to converge because:
• the amount of training data is too low,
• the weights in the network are initialized or applied inappropriately, or
• the network does not have enough nodes.
There are various things that can help in avoiding this failure: changing the activation function can be helpful, as can re-initializing the weights of the network. A very high learning rate or an excessive number of epochs should be avoided if the aim is to make the neural network converge faster.

Generalization
Generalization essentially means how good our model is at learning from the given data and applying the learnt information elsewhere. When training a neural network, some data is used for training and some data is reserved for checking the performance of the network. If the neural network performs well on data it has not trained on, we can say it has generalized well. Due to overfitting, a neural network fails to form such a general understanding.
In neural networks, adding dropout neurons is one of the most popular and effective ways to reduce overfitting.
(Figure: dropout applied to a neural network at a given instant, and the equivalent thinned network at that instant.)
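As a rough illustration of the dropout idea described above (a sketch, not taken from the notes), the following applies an inverted-dropout mask to a layer's activations during training; the layer size, batch size and dropout rate are arbitrary illustrative choices.

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: randomly zero a fraction `rate` of the units during
    training and rescale the survivors, so nothing changes at test time."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = np.random.rand(*activations.shape) < keep_prob   # which units survive
    return activations * mask / keep_prob

# Toy usage: hidden-layer activations for a batch of 4 examples, 8 units each.
hidden = np.random.randn(4, 8)
print(dropout(hidden, rate=0.5, training=True))   # roughly half the units zeroed
print(dropout(hidden, training=False))            # unchanged at inference time
```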
Self-Driving Car
CNN is the primary algorithm that these systems use to recognize and classify different parts of the road, and to make appropriate decisions. To understand the workings of self-driving cars, we need to examine the four main parts:
1. Perception
2. Localization
3. Prediction
4. Decision Making
Perception
Perception helps the car see the world around itself, as well as recognize and classify the things that it sees. To achieve such a high level of perception, a self-driving car must have three sensors:
1. Camera
2. LiDAR (Light Detection and Ranging)
3. RADAR (Radio Detection and Ranging)
Localization
Localization algorithms in self-driving cars calculate the position and orientation of the vehicle as it navigates.
Prediction
The car has a 360-degree view of its environment, which enables it to perceive and capture all the information and process it. Prediction creates a number of possible actions or moves based on the environment.
Decision-Making
Decision-making is vital in self-driving cars. In order to make a decision, the car should have enough information so that it can select the necessary set of actions.

Building a Smart Speaker
A smart speaker is a wireless electronic device that can respond to spoken commands.
Hardware components:
• Raspberry Pi
• ReSpeaker 2-Mics HAT / USB mic / USB sound card
• SD card
• Speaker
• 3.5 mm AUX cable / JST PH2.0 connector
For speech recognition, a Convolutional Neural Network (CNN) is applied as an advanced deep neural network to classify each word from the pooled dataset as a multi-class classification task. The proposed deep neural network returned 97.06% word classification accuracy on a completely unknown speech sample.

MODULE V: REINFORCEMENT LEARNING – Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice, Learning Models for Reinforcement – (Markov Decision Process, Q Learning – Q Learning function, Q Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep Q Learning. GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction, Crossover, Mutation, Genetic Programming, Models of Evolution and Learning, Applications.

Applications of Reinforcement Learning
1. RL in Robotics: Robotics without any doubt facilitates training a robot in such a way that the robot can perform tasks just like a human being can. But there is still a bigger challenge: the robotics industry has not yet been able to make robots use common sense while making various decisions.
2. Traffic Control: Reinforcement learning can be used for decision-making and optimization in traffic control actions.
3. Gaming: From creating a new game to testing its bugs, reinforcement learning offers an efficient and relatively easy resource for video games. Reinforcement learning has also been applied to conversational agents: by studying how people typically speak to each other, an agent can learn to respond more naturally.

Models of Evolution and Learning
Two evolution models: Lamarckian evolution and the Baldwin effect.
Lamarckian evolution holds that an individual's genetic makeup is changed by its lifetime experience. That is, if an organism changes during its life to adapt to the environment, then those changes are passed on to its offspring.
The Baldwin effect explains how behaviour learned during an individual's lifetime can influence evolution in terms of its time (speed) and direction, even though the learned behaviour itself is not inherited.

Applications of Genetic Algorithms
1. Optimization: Genetic algorithms are most commonly used in optimization problems wherein we have to maximize or minimize a given objective function value under a given set of constraints.
2. Travelling Salesman Problem (TSP): The main motive of this problem is to find an optimal route to be covered by the salesman. After each iteration, we can generate offspring solutions that inherit the qualities of the parent solutions.
3. Financial markets: In the financial market, using genetic optimization, we can solve a variety of issues because genetic optimization helps in finding an optimal set or combination of parameters that can affect the market rules and trades.
4. Manufacturing systems: One of the major applications of genetic optimization is to minimize a cost function using an optimized set of parameters.
5. Parametric Design of Aircraft: GAs have been used to design aircraft by varying the parameters and evolving better solutions.
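To tie together the GA components discussed earlier (fitness function, selection, crossover, mutation and termination), here is a minimal sketch that evolves bit strings to maximize the number of 1s (the "OneMax" problem). The population size, chromosome length, mutation rate and this particular fitness function are illustrative assumptions, not taken from the notes.

```python
import random

GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 12, 20, 50, 0.02

def fitness(individual):
    """Fitness function: number of 1-bits in the chromosome (OneMax)."""
    return sum(individual)

def select(population):
    """Tournament selection: the fitter of two random individuals wins."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(parent1, parent2):
    """Single-point crossover: exchange genes after a random crossover point."""
    point = random.randint(1, GENOME_LEN - 1)
    return parent1[:point] + parent2[point:]

def mutate(individual):
    """Flip each bit with a small probability to preserve diversity."""
    return [1 - g if random.random() < MUTATION_RATE else g for g in individual]

# Initial population of random chromosomes.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    best = max(population, key=fitness)
    if fitness(best) == GENOME_LEN:        # termination: optimal solution found
        break
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

print("Best individual:", max(population, key=fitness))
```

In a real application (for example TSP or parameter tuning), only the chromosome encoding and the fitness function change; the selection, crossover, mutation and termination steps follow the same cycle shown here.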
