
Name:

Enrolment No:

UNIVERSITY OF PETROLEUM AND ENERGY STUDIES


Mid Semester Examination, March 2023
Course: Machine Learning
Semester: IV
Program: B.Tech CSE (AIML)
Course Code: CSAI2001
Time: 02 hrs.
Max. Marks: 100

Instructions: There are two Sections. Attempt all questions.

SECTION A
1. Each question carries 8 marks.

S. No.    Marks    CO
Q1  A machine learning professor wants to use the number of hours a student studies (X) to predict the machine learning final exam score (Y). A regression model is fit based on data collected from a class during the previous semester, with the following results: Ŷi = 35.0 + 3Xi. What is the interpretation of the Y-intercept b0 and the slope b1?
Ans.: The Y-intercept b0 = 35 indicates that when a student does not study for the final exam (X = 0), the predicted score is 35. The slope b1 = 3 indicates that for each one-hour increase in study time, the predicted final exam score changes by +3.
In a nutshell, the final exam score is predicted to increase by a mean of 3 points for each one-hour increase in studying time.
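Both interpretations can be checked by evaluating the fitted line: at X = 0 the prediction equals the intercept, and each extra hour adds the slope. A minimal sketch; the function name `predicted_score` is illustrative.

```python
def predicted_score(hours_studied, b0=35.0, b1=3.0):
    """Predicted final exam score from the fitted line Y = b0 + b1 * X."""
    return b0 + b1 * hours_studied

# Intercept: a student who does not study at all
print(predicted_score(0))                        # 35.0
# Slope: effect of one additional hour of study
print(predicted_score(5) - predicted_score(4))   # 3.0
```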
What is machine learning? What are abstraction and generalization in the context of machine learning? (2+6 marks, CO1)
Ans. Machine learning is a branch of AI and computer science which focuses on the use of
data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
Abstraction: The choice of model is typically not left up to the machine. Instead, the learning
task and data on hand inform model selection. The process of fitting a model to a dataset is
known as training. When the model has been trained, the data is transformed into an abstract
form that summarizes the original information.
Generalization: The term generalization describes the process of turning abstracted knowledge
into a form that can be utilized for future action, on tasks that are similar, but not identical, to
those it has seen before.
Q2 What metrics should be considered when evaluating a machine learning algorithm? Explain with reference to linear regression as the model. (8 marks, CO1)
Ans.: For classification models: Accuracy, Precision, Recall, F-score.
For simple linear regression:
R² (coefficient of determination), adjusted R², standard error of the estimate.
Measures of variation:
• SST (total sum of squares)
• SSR (regression sum of squares)
• SSE (error sum of squares)
For a least-squares fit, SST = SSR + SSE and R² = SSR/SST.
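The measures of variation can be computed directly. The minimal pure-Python sketch below fits a least-squares line and returns SST, SSR, SSE, R², adjusted R² (with one predictor, k = 1) and the standard error of the estimate. The function name `regression_summary` is hypothetical, and the sample data reuse the X/Y table given in Q9.

```python
def regression_summary(xs, ys):
    """Least-squares fit of y = b0 + b1*x plus the usual measures of variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope and intercept from deviations about the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    preds = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    ssr = sum((p - y_bar) ** 2 for p in preds)          # variation explained by the line
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained (residual) variation

    k = 1  # number of predictors in simple linear regression
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    std_err = (sse / (n - 2)) ** 0.5  # standard error of the estimate
    return {"b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse,
            "R2": r2, "adj_R2": adj_r2, "std_err": std_err}

summary = regression_summary([1, 2, 3, 4], [3, 4, 6, 8])
```

For this data the fitted line is y = 1 + 1.7x, with SST = 14.75 and SSE = 0.30, so R² ≈ 0.98.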

Q3 Describe the procedure of machine learning with a block diagram. (8 marks, CO1)

Ans. Any appropriate diagram showing the training, model and evaluation phases can be considered.
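The training, model and evaluation phases of such a block diagram can also be sketched in code. This is a minimal illustration under assumed conditions: hypothetical noise-free data generated from y = 2x + 1, a random train/test split, a least-squares line as the model, and mean squared error on held-out data as the evaluation measure. All function names are illustrative.

```python
import random

def train_test_split(pairs, test_fraction=0.25, seed=0):
    """Data phase: shuffle (x, y) pairs and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(pairs):
    """Training phase: fit the model y = b0 + b1*x by least squares."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def evaluate(model, pairs):
    """Evaluation phase: mean squared error on data the model has not seen."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs) / len(pairs)

data = [(x, 2 * x + 1) for x in range(20)]  # hypothetical, noise-free data
train_set, test_set = train_test_split(data)
model = train(train_set)
test_mse = evaluate(model, test_set)
```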

Q4 Compare supervised and unsupervised machine learning algorithms. (8 marks, CO2)

Ans. Supervised learning, as the name indicates, involves a supervisor acting as a teacher: the machine is trained on data that is well labelled, i.e., each input comes with its correct output. Examples: SVM, decision tree, linear regression, random forest, etc.
Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Example: clustering (K-means).
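In contrast to supervised learning, where every training example carries a label, the sketch below runs K-means on unlabelled one-dimensional points and lets the algorithm discover the groups on its own. This is a minimal illustration; the starting centroids, the fixed iteration count and the function name `kmeans_1d` are assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-means on 1-D points: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: index of the closest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]  # no labels are given
centroids, clusters = kmeans_1d(points, [0.0, 5.0])
print(centroids)  # [1.5, 8.5]
```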
Q5 (8 marks, CO2)

Fig. 1
Refer to Fig. 1 (above). Which type of function is it? What is the intuition behind using this function? Which type of regression problem can be solved using this function?
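Ans. (Hints) It is the sigmoid (logistic) function, used to solve classification problems (logistic regression). The intuition is that it maps any real-valued input to a value between 0 and 1, which can be read as the probability of the positive class. The equation of logistic regression is

$$p = \frac{1}{1 + e^{-(b_0 + b_1 x)}}$$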
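A minimal sketch of the sigmoid and of how logistic regression turns it into a classifier, assuming a single feature x, the model p = sigmoid(b0 + b1·x) and a 0.5 decision threshold. The threshold and the function names are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, b0, b1):
    """Probability that x belongs to class 1 under p = sigmoid(b0 + b1*x)."""
    return sigmoid(b0 + b1 * x)

def predict_class(x, b0, b1, threshold=0.5):
    """Hypothetical helper: map the probability to a 0/1 class label."""
    return 1 if predict_proba(x, b0, b1) >= threshold else 0

print(sigmoid(0.0))                  # 0.5, the midpoint of the S-curve
print(predict_class(2, -1.0, 1.0))   # 1
print(predict_class(0, -1.0, 1.0))   # 0
```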
SECTION B
1. Each question carries 15 marks.

Q6 i) Discuss in detail the market basket model. What are its applications? Suppose a database has five transactions (refer to the table). Compute the support and confidence of the following association rules (association buying):
• O=>N
• O=>K
ii) Give the definitions of support and confidence. (15 marks, CO3)
Transaction ID    Items bought
1                 {M,O,N,K,E,Y}
2                 {D,O,N,K,E,Y}
3                 {M,A,K,E}
4                 {M,U,C,K,Y}
5                 {C,O,O,K,I,E}
Ans:
Market basket analysis is one of the most important methods used by major retailers to identify associations between products.
It works by looking for combinations of products that frequently appear together in transactions.
In other words, it enables businesses to discover connections between the products that customers purchase.

Association rules are commonly used to analyse market basket data: they uncover strong rules in transaction data using measures of interestingness such as support and confidence.

Applications of the market basket model:

• Cross-selling
• Product/item placement in stores
• Understanding customer behaviour
• Affinity promotions, etc.

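Two important parameters are support and confidence. For a rule X=>Y:
• Support(X=>Y) = (number of transactions containing both X and Y) / (total number of transactions)
• Confidence(X=>Y) = (number of transactions containing both X and Y) / (number of transactions containing X)

Item O appears in transactions 1, 2 and 5; N appears together with O in transactions 1 and 2; K appears in all five transactions.

• SUPPORT(O=>N) = 2/5 = 0.4, CONFIDENCE(O=>N) = 2/3 ≈ 0.67
• SUPPORT(O=>K) = 3/5 = 0.6, CONFIDENCE(O=>K) = 3/3 = 1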
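The support and confidence values can be checked with a short sketch. Each transaction is treated as a set, so the repeated O in transaction 5 is counted once; the function names are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent => consequent: support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "O", "K", "I", "E"},  # the set keeps a single "O"
]

print(support(transactions, {"O", "N"}))        # 0.4
print(confidence(transactions, {"O"}, {"K"}))   # 1.0
```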

Q7

Two regression equations are shown above. Discuss the types of regression and their significance in real-life house price prediction problems. What are y, b0, b1, x1…..xn in the above equations, and how are they related to your house price prediction problem? (15 marks, CO2)
Ans. The upper equation is simple linear regression (a single feature) and the lower one is multiple linear regression (features x1, x2, …, xn). Here y is the predicted house price, b0 is the intercept (the predicted price when all features are zero), x1, x2, …, xn are the features used to predict the house price (for example, floor area or number of rooms), and b1, b2, …, bn are the weights (coefficients) of the corresponding features. This concept can be used to develop the entire answer.
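A minimal sketch of how a fitted multiple linear regression turns features into a predicted price, y = b0 + b1·x1 + … + bn·xn. The intercept, the weights and the two features (floor area in square metres and number of rooms) are hypothetical values chosen only for illustration, not estimates from real data.

```python
def predict_price(features, intercept, weights):
    """Multiple linear regression prediction: b0 + sum of b_i * x_i."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return intercept + sum(w * x for w, x in zip(weights, features))

# Hypothetical model: b0 = 50000, b1 = 1000 per square metre, b2 = 20000 per room
print(predict_price([100, 3], 50000, [1000, 20000]))  # 210000
# With a single feature the same function reduces to simple linear regression
print(predict_price([100], 50000, [1000]))            # 150000
```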

Q8 Suppose a logistic regression has produced the following outputs:


Actual Predicted
0 0
0 1
1 0
1 1
1 1
1 0
0 0
Considering the above outputs, compute the following: Confusion matrix, Accuracy, Precision, Recall, and F-Score.
Can a confusion matrix be used in simple linear regression prediction problems? Justify your answer.
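Ans. From the outputs (class 1 taken as positive): TP = 2, TN = 2, FP = 1, FN = 2.
Confusion matrix:
                Predicted 1    Predicted 0
Actual 1        TP = 2         FN = 2
Actual 0        FP = 1         TN = 2
Accuracy = (TP+TN)/(TP+FP+FN+TN) = 4/7 ≈ 0.57, Precision = TP/(TP+FP) = 2/3 ≈ 0.67, Recall = TP/(TP+FN) = 2/4 = 0.5,
F1-Score = 2PR/(P+R) = (2 × 0.67 × 0.5)/(0.67 + 0.5) = 0.67/1.17 ≈ 0.57 (exactly 4/7), where P = precision and R = recall.
A confusion matrix is computed for classification problems, not for linear regression: simple linear regression predicts a continuous value, so there are no discrete predicted classes to count as TP, TN, FP or FN. Regression models are evaluated with measures such as SSE, R² and the standard error instead.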
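The counts and metrics can be reproduced with a short sketch, taking class 1 as the positive class; the function names are illustrative, and the sketch assumes at least one positive prediction and one positive actual (otherwise precision or recall would divide by zero).

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

actual = [0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 1, 0, 0]
accuracy, precision, recall, f1 = classification_metrics(actual, predicted)
print(confusion_counts(actual, predicted))  # (2, 2, 1, 2)
```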
Q9

Refer to the above three (regression) diagrams a, b, and c. What are the possible values of multiple r (coefficient of correlation) for all the
Ans. Multiple r (coefficient of correlation) measures the relationship between the dependent and independent variables. It is generally measured in te
In the above diagrams: a) perfect positive correlation, r = +1; b) perfect negative correlation, r = −1; c) no correlation, r = 0.

The table below shows the data (X = independent variable, Y = dependent variable). Use this data
• to compute r (coefficient of correlation);
• to compute a (Y-intercept) and b (slope) to fit a regression line Y = a + bX.

X 1 2 3 4
Y 3 4 6 8

Ans. To compute r (coefficient of correlation): The linear correlation coefficient defines the degree of relation between two v
between two quantities.
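If x and y are the two variables under discussion, the correlation coefficient can be calculated using the formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$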

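where n = number of values or elements,
∑x = sum of the 1st list of values,
∑y = sum of the 2nd list of values,
∑xy = sum of the products of the 1st and 2nd values,
∑x² = sum of squares of the 1st values,
∑y² = sum of squares of the 2nd values.

For the given data: n = 4, ∑x = 10, ∑y = 21, ∑xy = 61, ∑x² = 30, ∑y² = 125, so
r = (4 × 61 − 10 × 21)/√[(4 × 30 − 10²)(4 × 125 − 21²)] = 34/√1180 ≈ 0.99.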
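The same raw-sum formula for r can be evaluated with a short sketch; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the raw-sum formula (n, Σx, Σy, Σxy, Σx², Σy²)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r([1, 2, 3, 4], [3, 4, 6, 8])
print(round(r, 2))  # 0.99
```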

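Now, to compute the line equation y = a + bx, we use the least-squares formulae:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \frac{\sum y - b\sum x}{n}$$

For the given data, b = (4 × 61 − 10 × 21)/(4 × 30 − 10²) = 34/20 = 1.7 and a = (21 − 1.7 × 10)/4 = 1. The line equation is therefore

y = 1 + 1.7x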
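The slope and intercept formulae can be evaluated the same way; a minimal sketch with an illustrative function name.

```python
def fit_line(xs, ys):
    """Least-squares a and b for y = a + b*x using the raw-sum formulae."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # Y-intercept
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 4, 6, 8])
print(round(a, 2), round(b, 2))  # 1.0 1.7
```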
