
What is Machine Learning?

Machine learning combines statistical, computer-science, algorithmic and probabilistic techniques to learn iteratively from input data and build intelligent systems. Machine learning is a way to learn through examples and past experience, without being explicitly programmed. It gives a machine the ability to solve complex tasks that are difficult to solve algorithmically. However, there is no universally accepted definition of machine learning; different authors define the term differently. Tom Mitchell (1998) proposed the well-posed learning problem: a computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance on T, as measured by P, improves with experience E.
Dimensionality Reduction - Dimensionality reduction is a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information. Dimensionality reduction is a data preparation technique performed on data before modeling: it might be performed after data cleaning and data scaling but before training a predictive model. Dimensionality reduction gives a more compact, easily interpretable representation of the target concept by retaining the most relevant features. It is commonly used in fields that deal
with high-dimensional data, such as speech recognition, signal processing, bioinformatics, etc. It can also be used for
data visualization, noise reduction, cluster analysis, etc.
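As a hedged illustration of dimensionality reduction, the sketch below uses principal component analysis (PCA) from scikit-learn on synthetic data; the library choice and the numbers are assumptions for illustration, not part of the notes.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 samples, 10 original features

pca = PCA(n_components=2)               # keep the 2 most informative directions
X_reduced = pca.fit_transform(X)        # shape (100, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)    # share of variance retained per component
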
Data Science- Data science deals with converting raw data into insights for better decision making. It focuses on
analyzing past or current data and predicting future outcomes with the aim of making well-informed
decisions. It is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden
patterns from the raw data. Definition- Data science is the field of study that combines domain expertise,
programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Correlation - Correlation is a statistical measure used to determine the strength of a relationship or association between two variables. Correlation depicts the interdependence of two variables. If two variables are correlated, a change in the value of one variable is accompanied by a corresponding change in the value of the other.
Correlation Coefficient - The degree of association or relationship between two variables can be represented by an index known as the correlation coefficient. It always lies between -1 and 1. It is a numerical index that depicts to what extent the two variables are related and to what extent a change in one variable affects the other.
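A small sketch of how the correlation coefficient can be computed in practice (NumPy assumed; the data below is invented for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # roughly y = 2x, so a strong positive association

r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation, always between -1 and 1
print(round(r, 3))

A value near +1 indicates a strong positive association, near -1 a strong negative association, and near 0 little linear association.
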
Regression- Regression describes how an independent variable is numerically related to the dependent variable. It is a statistical tool used for prediction or estimation of the value of a response variable when the values of the predictor variables are known. Predictor variables are also known as independent variables, regressors or input variables, whereas response variables are known as dependent variables, regressands or output variables. It is a powerful and flexible tool used for estimating past, present or future events on the basis of past or present data, in applications like trend analysis, forecasting, insights into market patterns, etc. Definition- It is a statistical technique used to express the relationship between two or more independent variables and a dependent variable in the form of an equation.
Handling Missing Data - Most real-world data has missing values, and as the size of a dataset increases, the chance of missing values also increases. Values may be missing from particular instances for various reasons, such as data corruption, failure to load the data, incomplete extraction, manual error, and many more. Following are a few ways to handle this issue. 1) Ignore / Delete missing values- This method is advised only when there are enough samples in the data set. If a particular feature (column) has more than 70-75% missing values, that feature can be dropped; otherwise the rows containing null values can be deleted. 2) Assigning a new category or
unique value- Categorical feature will have a finite number of values or classes like student attendance-
present/absent, gender- male/female etc. So we can assign new class or unique value for these features like
unknown, NA, out-of-range, -1, etc. This strategy adds more information to the existing dataset, which will result in a change of variance. 3) Replace missing values by Mean/Median/Mode- This strategy is applied when a feature has numeric values. The mean, median or mode of the feature is calculated and used to replace the missing values. This is an approximation which can add variance to the data set. 4) Predicting missing values- In this
technique, we can predict the missing values using machine learning algorithms. Using linear regression or any other
technique and considering available features, we can easily predict and fill missing values. This technique adds
accuracy to the algorithm result, unless a missing value is expected to have a very high variance.
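The sketch below illustrates the first three strategies with pandas (the library choice, column names and values are assumptions for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 30, np.nan, 40], "gender": ["M", np.nan, "F", "M"]})

# 1) Drop rows that contain missing values
dropped = df.dropna()

# 2) Assign a new category for a categorical feature
df["gender"] = df["gender"].fillna("unknown")

# 3) Replace numeric missing values with the mean (median/mode work the same way)
df["age"] = df["age"].fillna(df["age"].mean())

print(df)

Predicting missing values (strategy 4) would replace the simple fill with a model such as linear regression fitted on the rows where the value is present.
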
Feature Scaling- Feature scaling is the final step of data preprocessing in machine learning. It is used when the data has highly varying magnitudes or values. It is a technique to standardize the independent variables of the dataset to a specific range. It puts the values on the same range or scale so that no variable dominates the others. In short, feature scaling is a technique to standardize the independent features present in the data to the same range or scale during data preprocessing.
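A minimal sketch of two common scaling approaches, assuming scikit-learn (the sample matrix is invented): StandardScaler standardizes each column to zero mean and unit variance, while MinMaxScaler rescales each column to [0, 1].

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 2000.0], [2.0, 3000.0], [3.0, 5000.0]])  # columns on very different scales

X_std = StandardScaler().fit_transform(X)    # zero mean, unit variance per column
X_mm = MinMaxScaler().fit_transform(X)       # each column rescaled to [0, 1]

print(X_std)
print(X_mm)
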

Semi-Supervised Learning- Semi-supervised learning is used on an input dataset that is only partially labeled. In such situations, purely supervised or unsupervised techniques are not sufficient. The goal of a semi-supervised model is to classify some of the unlabeled data using the labeled information. Semi-supervised learning is of great
interest in machine learning because it can use readily available unlabeled data to improve supervised learning tasks
when the labeled data are scarce or expensive.
Under-fitting - When a model is not able to capture the essence of the training data properly because it has too few parameters, this is known as under-fitting. Model under-fitting may happen if too little of the available data is used to train the model. This situation arises from high bias error and low variance error. Another reason for model under-fitting is too little information about the data, e.g. trying to fit a linear model to non-linear data. Linear regression and logistic regression models may face this kind of situation.
Over-fitting- When a model is built by considering too many features, it captures noise along with the underlying pattern. The model then fits the training data so closely that it leaves very little scope for generalization. Such a situation is known as model over-fitting. It arises from low bias and high variance; for example, decision trees are prone to over-fitting.
Scalability - In addition to accuracy-based measures, classifiers can also be compared with respect to scalability. Scalability refers to the ability to build the classifier efficiently given large amounts of data. Scalability is typically assessed with a series of datasets of increasing size.
Precision - Precision tells us how many of the cases predicted as positive actually turned out to be positive. Precision means the percentage of results which are relevant, i.e. the percentage of true positives among all examples that the classifier has labeled as positive. The value of precision is obtained by the following formula: Precision = TP / (TP + FP)
Recall - Recall is another important metric that refers to the percentage of total relevant results correctly classified by the classifier. Recall tells us how many of the actual positive cases we were able to predict correctly with our model. It is the number of correct positive results divided by the number of all relevant samples, i.e. all samples that should have been identified as positive: Recall = TP / (TP + FN). Recall is also called sensitivity; high recall indicates a low false negative rate.
F1 Score - In some applications it is necessary to give higher priority to recall, while in others precision has the higher priority. But there are many applications in which recall and precision both need to be treated as equally important. An alternative is to combine precision and recall into a single measure; this is the approach of the F measure. One popular metric which combines precision and recall is the F1-score, the harmonic mean of precision and recall: F1 = 2 * Precision * Recall / (Precision + Recall).
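A short sketch computing the three metrics with scikit-learn (assumed library; the label vectors below are made up):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)   # TP / (TP + FP)
r = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)         # 2 * p * r / (p + r), the harmonic mean

print(p, r, f1)
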
Polynomial Regression - For a linear dataset, a linear regression model works best: it gives a low error rate and high accuracy. But for a non-linear dataset these linear models perform poorly; both the error rate and the loss function value increase and the accuracy decreases. So for non-linear datasets different regression models are required. In figure 3.5 (a) a linear regression model tries to fit all data points but hardly covers any of them. In figure 3.5 (b), for the same dataset, a non-linear model in the form of a curve covers most of the data points. This indicates that a non-linear model is better suited to fit non-linear data points than a linear model.
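As a hedged sketch of the idea, the snippet below (scikit-learn assumed, synthetic quadratic data) expands the input into polynomial features and then fits an ordinary linear model on them:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + X.ravel() + rng.normal(0, 0.3, 30)   # non-linear data

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))    # prediction from the fitted curve
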
Decision Tree - A decision tree is a supervised machine learning algorithm used for both prediction and classification, though it is mostly used for classification. It is based on a divide-and-conquer strategy, where the input space is divided into smaller regions until they become manageable. Decision trees can handle both categorical and numerical data. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.
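A minimal decision-tree sketch, assuming scikit-learn and its built-in Iris dataset (both choices are illustrative, not from the notes):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3)   # limit depth to keep the tree small
clf.fit(X, y)
print(clf.predict(X[:2]))                   # predicted class for the first two samples
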
K - Nearest Neighbors (KNN)- K-Nearest Neighbors is a non-parametric algorithm used for classification and regression, but it is mostly used for classification problems. The KNN rule is a very simple learning algorithm that relies on the assumption that "things that look alike must be alike". K-NN is a lazy learner algorithm because it has no specialized training phase; it uses all the data at classification time. It does not learn from the training set immediately; instead it stores the training tuples or instances, so it is also called an instance-based learner. The idea is to memorize the training set and then predict the label of any new data point on the basis of the labels of its closest neighbors in the training set.
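A short KNN sketch under the same assumptions (scikit-learn, Iris data); note that fit() here essentially just stores the instances, and the real work happens at prediction time:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 closest neighbours vote on the label
knn.fit(X, y)                               # "training" is just memorizing the instances
print(knn.predict(X[:3]))
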
Naïve Bayes Classification- Naïve Bayes is a probabilistic classification algorithm based on Bayes' theorem. Bayes'
theorem gives the relationship between the probabilities of two events and their conditional probabilities. It is a
probabilistic classifier, which means it predicts output on the basis of the probability of an object. A naïve Bayes
classifier assumes that the presence or absence of a particular feature of a class is unrelated to the presence or
absence of other features. Bayes' theorem is used to determine the probability of a hypothesis given prior knowledge; it depends on conditional probability: P(h|D) = P(D|h) P(h) / P(D).
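As a hedged sketch, GaussianNB from scikit-learn applies this idea to numeric features by assuming each feature follows a class-conditional normal distribution (the dataset choice is illustrative):

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB()
nb.fit(X, y)
print(nb.predict_proba(X[:1]))   # class probabilities P(class | features) for one sample
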
Non-linear Regression - Linear regression models work extremely well in many scientific and engineering
applications, but these are not applicable for all situations. There are many practical situations where the response
variable and the predictor variables are related through a nonlinear function. Consider a few practical situations where we can observe the growth of tumors or tissues, black-body radiation, the progress of a disease epidemic, electricity consumption, etc. In these examples the relationship between the independent variables and the dependent variable may be curvilinear. In such practical situations linear regression is not an appropriate choice, and these kinds of circumstances have brought focus onto nonlinear regression for better solutions.
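A small sketch of nonlinear regression with SciPy's curve_fit (assumed library; the exponential-growth model and the synthetic data are illustrative assumptions):

import numpy as np
from scipy.optimize import curve_fit

def growth(t, a, b):
    return a * np.exp(b * t)        # assumed nonlinear model form, e.g. growth over time

rng = np.random.default_rng(0)
t = np.linspace(0, 4, 20)
y = 2.0 * np.exp(0.8 * t) + rng.normal(0, 0.5, 20)   # synthetic noisy observations

params, _ = curve_fit(growth, t, y, p0=(1.0, 0.5))   # least-squares fit of a and b
print(params)
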
Multiple Regression - Simple linear regression describes the relationship between two variables. In the example above, experience and salary can be expressed by a linear function, where salary depends only on the years of experience of an employee. Practically this is not true, because the salary of an employee may depend on several other factors too; multiple regression therefore expresses the dependent variable as a function of two or more independent variables.
Linear Regression - Linear regression is a simple method used for numeric predictions. Linear regression is a modeling technique used to find a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit line. To predict a real-valued output y for the input vector (X1, X2, ..., Xp), the linear regression model has the form f(X) = B0 + Σ (j = 1 to p) Xj Bj. The linear model either assumes that the regression function is linear, or that the linear model is a reasonable approximation. Here the B's are unknown parameters or coefficients, and the variables Xj can come from different sources like quantitative inputs, transformed quantitative inputs, numeric or dummy representations of qualitative inputs, etc.
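A minimal sketch of fitting the model f(X) = B0 + Σ Xj Bj by least squares, assuming scikit-learn; the experience/salary numbers are invented. With more than one input column, the same call performs multiple regression.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])          # years of experience
y = np.array([30, 35, 41, 44, 50], dtype=float)  # salary in thousands (made up)

reg = LinearRegression().fit(X, y)
print(reg.intercept_, reg.coef_)    # estimated B0 and B1
print(reg.predict([[6]]))           # predicted salary for 6 years of experience
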
Support Vector Machine- Support Vector Machine (SVM) is a supervised machine learning algorithm which can be
used for both classification and regression problems. Mostly, it is used for the classification of both linear and non-linear data. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features) with the value of each feature being the value of a particular coordinate. Support vectors are simply the coordinates of individual observations. Classification is then performed by finding the hyperplane that best differentiates the two classes. The SVM classifier is a hyperplane or line which best segregates the two classes.
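A short SVM sketch (scikit-learn assumed; the dataset is synthetic). A linear kernel looks for a separating hyperplane; switching to an RBF kernel handles non-linear boundaries.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
svm = SVC(kernel="linear")           # try kernel="rbf" for non-linear data
svm.fit(X, y)
print(svm.support_vectors_.shape)    # the observations that define the margin
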

Reinforcement learning - Reinforcement learning is an area of machine learning concerned with taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that in supervised learning the training data comes with the answer key, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer and the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.

What are the types of ANN?


1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network
K-means clustering - K-means clustering is one of the most popular unsupervised, non-deterministic, and iterative algorithms. The value of K denotes the number of desired clusters for the given dataset. The K-means technique is simple. It is a centroid-based algorithm, where every cluster is associated with a
centroid. We first choose K initial centroids, where K is a user-defined parameter, namely, the number of
clusters desired. Each point is then assigned to the closest centroid, and each collection of points
assigned to a centroid is a cluster. The centroid of each cluster is then updated based upon the points
assigned to the cluster. This is repeated until no point changes clusters and the centroids remain the
same.
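A hedged sketch of K-means on two synthetic blobs, assuming scikit-learn; internally it repeats the assign-to-nearest-centroid and update-centroid steps described above until the assignments stop changing.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])  # two blobs

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.labels_[:5])        # cluster index assigned to the first few points
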

Dendrogram - A dendrogram is a diagrammatic representation of hierarchical relationships between objects or groups of objects. It is commonly used in the field of data analysis and clustering to visualize the results of hierarchical clustering algorithms. In a dendrogram, objects or groups of objects are represented as
points or rectangles called "leaves." These leaves are then linked together by branches, forming a tree-
like structure. The height or length of the branches represents the dissimilarity or distance between the
objects or groups. Objects that are similar or closely related are clustered together at lower levels of the
tree, while those that are dissimilar are placed further apart. Dendrograms are particularly useful for
exploring and interpreting the results of clustering analysis. They provide a visual representation of the
underlying structure and allow for the identification of clusters and subclusters. By cutting the
dendrogram at different levels, it is possible to obtain different clustering solutions with varying levels of
granularity.
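A short sketch of hierarchical clustering with SciPy (assumed library; the data is synthetic). The linkage matrix encodes the merge history that a dendrogram draws, and cutting the tree at a chosen level yields a flat clustering.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 2))

Z = linkage(X, method="ward")                      # merge history behind the dendrogram
labels = fcluster(Z, t=3, criterion="maxclust")    # "cut" the tree into 3 clusters
print(labels)

# dendrogram(Z)   # with matplotlib installed, this draws the tree itself
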
Thompson Sampling - The Thompson Sampling algorithm was first proposed in 1933 by William R. Thompson. It is an algorithm that balances exploration and exploitation to maximize the cumulative reward obtained by performing actions. Thompson Sampling is also sometimes referred to as Posterior Sampling or Probability Matching. The method is defined in the context of the multi-armed bandit problem, in which an agent must repeatedly choose among several actions ("arms") whose reward probabilities are unknown. Thompson Sampling is an approach that successfully tackles this kind of problem: it is a method of resolving the exploration-exploitation dilemma in a multi-armed bandit problem.
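A hedged sketch of Thompson Sampling for a Bernoulli bandit, implemented with NumPy; the three arm probabilities are invented and unknown to the agent. Each arm keeps a Beta posterior, one value is sampled from each posterior, and the arm with the highest sample is pulled.

import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.2, 0.5, 0.7]             # unknown to the agent
successes = np.ones(3)                   # Beta(1, 1) priors for each arm
failures = np.ones(3)

for _ in range(1000):
    samples = rng.beta(successes, failures)    # one draw per arm from its posterior
    arm = int(np.argmax(samples))              # pull the most promising draw
    reward = rng.random() < true_probs[arm]    # environment feedback
    successes[arm] += reward
    failures[arm] += 1 - reward

print(successes / (successes + failures))      # posterior mean estimate per arm

Arms that look promising are pulled more often (exploitation), while the remaining uncertainty in the posteriors keeps occasional pulls going to the other arms (exploration).
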

Q-Learning - The Q stands for quality in Q-learning, which means it specifies the quality of an action
taken by the agent. Q-learning is model-free: the agent does not seek to learn the underlying mathematical model or probability distribution. Instead, in Q-learning the agent attempts to construct an optimal policy directly by interacting with the environment. Q-learning uses a trial-and-error approach in which the agent repeatedly tries to solve the problem using varied
approaches across many iterations and continuously updates its policy as it learns more and more about
the environment.
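A minimal Q-learning sketch on a toy chain environment (all details are assumptions: five states in a row, actions left/right, reward only on reaching the last state, and a purely random exploratory behavior policy).

import numpy as np

n_states, n_actions = 5, 2            # states 0..4; action 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9
rng = np.random.default_rng(0)

for _ in range(500):                  # episodes
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions)                       # random exploratory action
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0        # reward only at the goal
        # core update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))           # greedy policy per non-terminal state (1 = move right)
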

Artificial Neural Network (ANN) - ANNs are at the very core of Deep Learning. They are versatile,
powerful, and scalable, making them ideal to tackle large and highly complex machine learning tasks,
such as classifying billions of images (e.g. Google Images), powering speech recognition services (e.g. Apple's Siri), recommending the best videos to watch to hundreds of millions of users every day (e.g. YouTube), or learning to beat the world champion at the game of Go (DeepMind's AlphaGo).

Convolution Neural Network (CNN) - Convolutional Neural Networks (CNNs or ConvNets) gained
popularity in computer vision due to their extraordinarily good performance on image classification tasks. In image classification and image recognition, a Convolutional Neural Network plays a vital role. CNNs
are one of the most popular neural network architectures in deep learning. The key idea behind
convolutional neural networks is to build many layers of feature detectors to take the spatial
arrangement of pixels in an input image into account. It is designed to process the data using multiple
layers of arrays.
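As a hedged illustration, a tiny convolutional network in Keras (assumed framework; the input shape and layer sizes are arbitrary): convolution layers act as feature detectors over the spatial arrangement of pixels, pooling reduces resolution, and a dense layer produces the class scores.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),                      # e.g. grayscale 28x28 images
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # first bank of feature detectors
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # deeper, more abstract features
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # 10 output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
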

Recurrent Neural Network (RNN) - Feed-forward networks generate output that is independent of the previous output: there is no relation between the output produced at time t+1 and the output generated at time t. So feed-forward networks are not suitable for applications where the relation between the output produced at the previous state and the current state is essential. Recurrent neural networks address this by feeding the hidden state of the previous time step back into the network, so the output at time t can depend on earlier inputs as well.
