You are on page 1of 16

MODULE I

1.What is machine learning and discuss any four examples of machine learning application's.
A. Machine learning is programming computers to optimize a performance criterion using
example data or past experience. We have a model defined up to some parameters, and learning
is the execution of a computer program to optimize the parameters of the model using the
training data or past experience. The model may be predictive to make predictions in the future,
or descriptive to gain knowledge from data, or both.

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. The popular use case of image recognition
and face detection is, Automatic friend tagging suggestion:

Facebook provides us a feature of auto friend tagging suggestion. Whenever we upload a photo
with our Facebook friends, then we automatically get a tagging suggestion with name, and the
technology behind this is machine learning's face detection and recognition algorithm.

It is based on the Facebook project named "Deep Face," which is responsible for face
recognition and person identification in the picture.

2. Speech Recognition

While using Google, we get an option of "Search by voice," it comes under speech recognition,
and it's a popular application of machine learning.

Speech recognition is a process of converting voice instructions into text, and it is also known as
"Speech to text", or "Computer speech recognition." At present, machine learning algorithms
are widely used by various applications of speech recognition. Google assistant, Siri, Cortana,
and Alexa are using speech recognition technology to follow the voice instructions.

3. Traffic prediction:
If we want to visit a new place, we take help of Google Maps, which shows us the correct path
with the shortest route and predicts the traffic conditions.

It predicts the traffic conditions such as whether traffic is cleared, slow-moving, or heavily
congested with the help of two ways:
o Real Time location of the vehicle form Google Map app and sensors
o Average time has taken on past days at the same time.

Everyone who is using Google Map is helping this app to make it better. It takes information
from the user and sends back to its database to improve the performance.

4. Product recommendations:

Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for some
product on Amazon, then we started getting an advertisement for the same product while internet
surfing on the same browser and this is because of machine learning.

Google understands the user interest using various machine learning algorithms and suggests the
product as per customer interest.

As similar, when we use Netflix, we find some recommendations for entertainment series,
movies, etc., and this is also done with the help of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars. Machine learning
plays a significant role in self-driving cars. Tesla, the most popular car manufacturing company
is working on self-driving car. It is using unsupervised learning method to train the car models to
detect people and objects while driving
2.Differentiate between Supervised and Unsupervised learning

3.Explain linear and Multivariate regression with examples

Linear Regression is Also called simple regression, linear regression establishes the
relationship between two variables. Linear regression is graphically depicted using a straight
line with the slope defining how the change in one variable impacts a change in the other. The
y-intercept of a linear regression relationship represents the value of one variable when the
value of the other is 0.
In linear regression, every dependent value has a single corresponding independent variable that
drives its value. For example, in the linear regression formula of y = 3x + 7, there is only one
possible outcome of 'y' if 'x' is defined as 2.
If the relationship between two variables does not follow a straight line, non linear
regression may be used instead. Linear and nonlinear regression are similar in that both track a
particular response from a set of variables. As the relationship between the variables becomes
more complex, nonlinear models have greater flexibility and capability of depicting the non-
constant slope.
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
statistical method that is used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product price, etc.
Linear regression algorithm shows a linear relationship between a dependent (y) and one or more
independent (y) variables, hence called as linear regression. Since linear regression shows the
linear relationship, which means it finds how the value of the dependent variable is changing
according to the value of the independent variable

Mathematically, we can represent a linear regression as:


y= a0+a1x+ ε

Examples:
1. Predicting house prices: This is a common example of linear regression. It involves
analyzing sales data for a region to predict the price of a house.
2. Predicting sales: Linear regression can be used to evaluate trends and make forecasts. For
example, a company can analyze its sales data to forecast future sales.
3. Predicting weight: Linear regression can be used to predict weight based on height. For
example, if someone is 70 inches tall, their weight can be predicted using the equation
Weight = 80 + 2 x (70) = 220 lbs.
4. Predicting salary: Linear regression can be used to predict salary based on qualifications.
5. Predicting height: Linear regression can be used to predict height based on age.

Multivariate regression is a technique used to measure the degree to which the various
independent variable and various dependent variables are linearly related to each other. The
relation is said to be linear due to the correlation between the variables. Once the multivariate
regression is applied to the dataset, this method is then used to predict the behaviour of the
response variable based on its corresponding predictor variables.

Multivariate regression is commonly used as a supervised algorithm in machine learning, a


model to predict the behaviour of dependent variables and multiple independent variables.
Characteristics of multivariate regression

Multivariate regression allows one to have a different view of the relationship between various
variables from all the possible angles.

It helps you to predict the behaviour of the response variables depending on how the predictor
variables move.

Multivariate regression can be applied to various machine learning fields, economic, science
and medical research studies.
Example of multivariate regression

An agriculture expert decides to study the crops that were ruined in a certain region. He collects
the data about recent climatic changes, water supply, irrigation methods, pesticide usage, etc. To
understand why the crops are turning black, do not yield any fruits and dry out soon.

In the above example, the expert decides to collect the mentioned data, which act as the
independent variables. These variables will affect the dependent variables which are nothing but
the conditions of the crops. In such a case, using single regression would be a bad choice and
multivariate regression might just do the trick.

4.Explain about SVM Linear classification


A. Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or
nonlinear classification, regression, and even outlier detection tasks. SVMs can be used for a
variety of tasks, such as text classification, image classification, spam detection, handwriting
identification, gene expression analysis, face detection, and anomaly detection. SVMs are
adaptable and efficient in a variety of applications because they can manage high-dimensional
data and nonlinear relationships.
SVM algorithms are very effective as we try to find the maximum separating hyperplane
between the different classes available in the target feature.
Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression. Though we say regression problems as well it’s best suited for
classification. The main objective of the SVM algorithm is to find the optimal hyperplane in an
N-dimensional space that can separate the data points in different classes in the feature space.
The hyperplane tries that the margin between the closest points of different classes should be
as maximum as possible. The dimension of the hyperplane depends upon the number of
features. If the number of input features is two, then the hyperplane is just a line. If the number
of input features is three, then the hyperplane becomes a 2-D plane. It becomes difficult to
imagine when the number of features exceeds three.
Let’s consider two independent variables x 1, x2, and one dependent variable which is either a
blue circle or a red circle.
From the figure above it’s very clear that there are multiple lines (our hyperplane here is a line
because we are considering only two input features x1, x2) that segregate our data points or do
a classification between red and blue circles. So how do we choose the best line or in general
the best hyperplane that segregates our data points?

How does SVM work?


One reasonable choice as the best hyperplane is the one that represents the largest separation or
margin between the two classes.

So we choose the hyperplane whose distance from it to the nearest data point on each side is
maximized. If such a hyperplane exists it is known as the maximum-margin
hyperplane/hard margin. So from the above figure, we choose L2. Let’s consider a scenario
like shown below
Here we have one blue ball in the boundary of the red ball. So how does SVM classify the
data? It’s simple! The blue ball in the boundary of red ones is an outlier of blue balls. The
SVM algorithm has the characteristics to ignore the outlier and finds the best hyperplane that
maximizes the margin. SVM is robust to outliers.

So in this type of data point what SVM does is, finds the maximum margin as done with
previous data sets along with that it adds a penalty each time a point crosses the margin. So the
margins in these types of cases are called soft margins. When there is a soft margin to the data
set, the SVM tries to minimize (1/margin+∧(∑penalty)). Hinge loss is a commonly used
penalty. If no violations no hinge loss.If violations hinge loss proportional to the distance of
violation.
Till now, we were talking about linearly separable data(the group of blue balls and red balls are
separable by a straight line/linear line). What to do if data are not linearly separable?
Say, our data is shown in the figure above. SVM solves this by creating a new variable using
a kernel. We call a point xi on the line and we create a new variable yi as a function of distance
from origin o.so if we plot this we get something like as shown below

In this case, the new variable y is created as a function of distance from the origin. A non-
linear function that creates a new variable is referred to as a kernel.

5.Explain the concept of Bayes theorem with example


Baye's theorem is used for the calculation of a conditional probability where intuition often
fails. Although widely used in probability, the theorem is being applied in the machine
learning field too. Its use in machine learning includes the fitting of a model to a training
dataset and developing classification models.
An Illustration of Bayes theorem
Three boxes labeled as A, B, and C, are present. Details of the boxes are:
 Box A contains 2 red and 3 black balls
 Box B contains 3 red and 1 black ball
 And box C contains 1 red ball and 4 black balls
 All the three boxes are identical having an equal probability to be picked up. Therefore,
what is the probability that the red ball was picked up from box A?
Solution
Let E denote the event that a red ball is picked up and A, B and C denote that the ball is
picked up from their respective boxes. Therefore the conditional probability would be
P(A|E) which needs to be calculated.
The existing probabilities P(A) = P(B) = P (C) = 1 / 3, since all boxes have equal
probability of getting picked.
P(E|A) = Number of red balls in box A / Total number of balls in box A = 2 / 5
Similarly, P(E|B) = 3 / 4 and P(E|C) = 1 / 5
Then evidence P(E) = P(E|A)*P(A) + P(E|B)*P(B) + P(E|C)*P(C)
= (2/5) * (1/3) + (3/4) * (1/3) + (1/5) * (1/3) = 0.45
Therefore, P(A|E) = P(E|A) * P(A) / P(E) = (2/5) * (1/3) / 0.45 = 0.296

6.Explain Logistic Regression


A. Logistic regression is a supervised machine learning algorithm mainly used for
classification tasks where the goal is to predict the probability that an instance belongs to a
given class or not. It is a kind of statistical algorithm, which analyze the relationship between a
set of independent variables and the dependent binary variables. It is a powerful tool for
decision-making. For example email spam or not.
Logistic Regression
Logistic regression is a supervised machine learning algorithm mainly used for
binary classification where we use a logistic function, also known as a sigmoid function that
takes input as independent variables and produces a probability value between 0 and 1. For
example, we have two classes Class 0 and Class 1 if the value of the logistic function for an
input is greater than 0.5 (threshold value) then it belongs to Class 1 it belongs to Class 0. It’s
referred to as regression because it is the extension of linear regression but is mainly used for
classification problems. The difference between linear regression and logistic regression is that
linear regression output is the continuous value that can be anything while logistic regression
predicts the probability that an instance belongs to a given class or not.
Understanding Logistic Regression
It is used for predicting the categorical dependent variable using a given set of independent
variables.
 Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value.
 It can be either Yes or No, 0 or 1, true or False, etc. but instead of giving the exact value as
0 and 1, it gives the probabilistic values which lie between 0 and 1.
 Logistic Regression is much similar to the Linear Regression except that how they are used.
Linear Regression is used for solving Regression problems, whereas Logistic regression is
used for solving the classification problems.
 In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
 The curve from the logistic function indicates the likelihood of something such as whether
the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
 Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
 Logistic Regression can be used to classify the observations using different types of data
and can easily determine the most effective variables used for the classification.

7. What are the two main problem classes of Statistical Supervised learning? Explain them

A. Supervised learning uses a training set to teach models to yield the desired output. This
training dataset includes inputs and correct outputs, which allow the model to learn over time.
The algorithm measures its accuracy through the loss function, adjusting until the error has been
sufficiently minimized.

Supervised learning can be separated into two types of problems when data mining—
classification and regression:
 Classification uses an algorithm to accurately assign test data into specific categories. It
recognizes specific entities within the dataset and attempts to draw some conclusions on
how those entities should be labeled or defined. Common classification algorithms are
linear classifiers, support vector machines (SVM), decision trees, k-nearest neighbor, and
random forest, which are described in more detail below.
 Regression is used to understand the relationship between dependent and independent
variables. It is commonly used to make projections, such as for sales revenue for a given
business.

Statistical supervised learning problems can be broadly categorized into two main classes:
classification and regression.
1. Classification:
 Definition: Classification is a type of supervised learning where the goal is to predict the
categorical class labels of new, unseen instances based on past observations.
 Example: Spam detection in emails is a classic example of a classification problem.
Given a set of emails labeled as spam or not spam, the algorithm learns to classify new,
unseen emails into one of these categories.
 Output: The output of a classification algorithm is a discrete class label. The algorithm
assigns each input instance to one of several predefined classes.
 Evaluation: Classification algorithms are evaluated based on metrics such as accuracy,
precision, recall, F1 score, and area under the ROC curve (AUC-ROC).
2. Regression:
 Definition: Regression is a type of supervised learning where the goal is to predict
continuous numerical values. In regression, the algorithm learns to map input features to
a continuous target variable.
 Example: Predicting house prices based on features like square footage, number of
bedrooms, and location is a regression problem. The algorithm aims to predict a
numerical value (price) rather than a categorical label.
 Output: The output of a regression algorithm is a continuous value, and the goal is to
minimize the difference between the predicted values and the actual values.
 Evaluation: Regression models are evaluated using metrics such as mean squared error
(MSE), mean absolute error (MAE), and R-squared. These metrics quantify the accuracy
of the predicted numerical values.

In summary, classification deals with predicting categorical labels, while regression deals with
predicting continuous numerical values. Both types of problems involve supervised learning,
where the algorithm learns from labeled training data to make predictions on new, unseen data.
The choice between classification and regression depends on the nature of the target variable in
the problem at hand
8(a)

8B

You might also like