
NED UNIVERSITY OF ENGINEERING & TECHNOLOGY

MS – DATA SCIENCE
FALL 2023
TOOLS AND TECHNIQUES
FOR DATA SCIENCE (CT-583)
Assignment 1
Max Marks: 10
Due Date: 25-Nov-2023

Question 1 [2.5 marks]
Suppose you are conducting a study to determine whether there is an association between gender and
preference for a particular brand of soda. You survey 100 people and ask them whether they prefer Brand
A or Brand B, as well as their gender. The results are as follows:
          Brand A   Brand B   Total
Male         30        20       50
Female       25        25       50
Total        55        45      100
Using the chi-square method, test the hypothesis that gender and brand preference are independent. Use a
significance level of 0.05.
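A minimal sketch of the chi-square test of independence for the table above, using only the standard library. Each expected count is (row total × column total) / grand total. The critical value 3.841 is the standard chi-square value for 1 degree of freedom at α = 0.05. No continuity correction is applied.

```python
# Observed counts from the Question 1 table.
# Rows: Male, Female. Columns: Brand A, Brand B.
observed = [[30, 20], [25, 25]]

def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square_statistic(observed)

# Critical value for alpha = 0.05 with 1 degree of freedom (valid only because dof == 1 here).
CRITICAL_0_05_DF1 = 3.841

print(f"chi2 = {stat:.4f}, dof = {dof}")
print("reject H0" if stat > CRITICAL_0_05_DF1 else "fail to reject H0")
```

If you cross-check with `scipy.stats.chi2_contingency`, note that it applies Yates' continuity correction to 2×2 tables by default (`correction=True`), so its statistic will differ from this uncorrected one.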

Question 2 [2.5 marks]


Apply equi-depth (equal-frequency) binning to the following data, then smooth it both by bin means and
by bin boundaries:

T, O, U, L, S, T, E, C, H, N, I, Q
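The question does not say how the letters become numeric, and smoothing by bin means needs numbers. This sketch therefore makes two loudly labeled assumptions. First, each letter is mapped to its position in the alphabet (A = 1, ..., Z = 26). Second, it uses a hypothetical bin depth of 4, which gives 3 bins for the 12 values. When smoothing by boundaries, a value exactly halfway between the two boundaries is sent to the lower one. That tie rule is a choice made here, not part of the question.

```python
# ASSUMPTION: letters are mapped to alphabet positions (A=1 ... Z=26);
# the question does not specify how to make them numeric.
letters = ["T", "O", "U", "L", "S", "T", "E", "C", "H", "N", "I", "Q"]
values = sorted(ord(ch) - ord("A") + 1 for ch in letters)

DEPTH = 4  # hypothetical bin depth: 12 values -> 3 equi-depth bins

def equi_depth_bins(sorted_values, depth):
    """Partition already-sorted data into consecutive bins of `depth` values each."""
    return [sorted_values[i:i + depth] for i in range(0, len(sorted_values), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's min and max (ties go to the min)."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so the ends are min and max
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed

bins = equi_depth_bins(values, DEPTH)
print("bins:       ", bins)
print("by means:   ", smooth_by_means(bins))
print("boundaries: ", smooth_by_boundaries(bins))
```

Changing `DEPTH` (for example to 3) changes both the bins and the smoothed values, so state the depth you choose in your answer.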

Question 3 [2.5 marks]


Suppose we have a dataset of 100 people who applied for a credit card. Each person is described by
four features: age (in years), income, employment status, and credit score. The target variable is
whether or not the person was approved for a credit card (Y for approved, N for not approved). Here's a
small subset of the data:
Age      Income   Employment   Credit Score   Approved
18-23    Low      No           Max            Y
24-35    Medium   Yes          Max            Y
36-50    High     Yes          Min            N
18-23    Low      Yes          Max            Y
24-35    Medium   No           Min            N

Construct a decision tree that predicts whether or not a person will be approved for a credit card using
information gain to determine which feature to split on at each node.
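The sketch below follows the ID3 approach on the five sample rows. At each node it computes the entropy of the Approved labels and the information gain of every remaining feature. It then splits on the feature with the highest gain, with one branch per observed value. Ties between equal gains are broken by feature order, which is a choice made here. A node becomes a leaf when its labels are pure or no features are left. Only the five rows shown are used, not the full 100-person dataset.

```python
import math
from collections import Counter

FEATURES = ["Age", "Income", "Employment", "Credit Score"]
rows = [
    {"Age": "18-23", "Income": "Low", "Employment": "No", "Credit Score": "Max", "Approved": "Y"},
    {"Age": "24-35", "Income": "Medium", "Employment": "Yes", "Credit Score": "Max", "Approved": "Y"},
    {"Age": "36-50", "Income": "High", "Employment": "Yes", "Credit Score": "Min", "Approved": "N"},
    {"Age": "18-23", "Income": "Low", "Employment": "Yes", "Credit Score": "Max", "Approved": "Y"},
    {"Age": "24-35", "Income": "Medium", "Employment": "No", "Credit Score": "Min", "Approved": "N"},
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, feature, target="Approved"):
    """Entropy of the target minus the weighted entropy after splitting on `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def id3(rows, features, target="Approved"):
    """Build a nested-dict tree: {feature: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: the (majority) class
    best = max(features, key=lambda f: information_gain(rows, f, target))
    remaining = [f for f in features if f != best]
    return {best: {value: id3([r for r in rows if r[best] == value], remaining, target)
                   for value in sorted({r[best] for r in rows})}}

for f in FEATURES:
    print(f"{f}: gain = {information_gain(rows, f):.3f}")
tree = id3(rows, FEATURES)
print(tree)
```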

Question 4 [2.5 marks]


Using Naive Bayes on the scenario of Question 3, what is the probability that a person will be approved
for a credit card?
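The question does not say which applicant to classify, so this sketch uses a **hypothetical** query applicant. It computes the Naive Bayes posterior P(Approved | features) from the five rows of Question 3. The class priors are plain relative frequencies. With only five rows, many raw conditional counts are zero. For that reason the class-conditional probabilities use Laplace (add-one) smoothing through the `alpha` parameter. This smoothing is an added choice that is not stated in the question, and setting `alpha=0.0` recovers the unsmoothed estimates.

```python
from collections import Counter

FEATURES = ["Age", "Income", "Employment", "Credit Score"]
rows = [
    {"Age": "18-23", "Income": "Low", "Employment": "No", "Credit Score": "Max", "Approved": "Y"},
    {"Age": "24-35", "Income": "Medium", "Employment": "Yes", "Credit Score": "Max", "Approved": "Y"},
    {"Age": "36-50", "Income": "High", "Employment": "Yes", "Credit Score": "Min", "Approved": "N"},
    {"Age": "18-23", "Income": "Low", "Employment": "Yes", "Credit Score": "Max", "Approved": "Y"},
    {"Age": "24-35", "Income": "Medium", "Employment": "No", "Credit Score": "Min", "Approved": "N"},
]

def naive_bayes_posterior(rows, query, features, target="Approved", alpha=1.0):
    """Return {class: P(class | query)} under the Naive Bayes independence assumption."""
    class_counts = Counter(r[target] for r in rows)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(rows)  # prior P(c), unsmoothed relative frequency
        for f in features:
            k = len({r[f] for r in rows})  # number of distinct values of feature f
            match = sum(1 for r in rows if r[target] == c and r[f] == query[f])
            score *= (match + alpha) / (n_c + alpha * k)  # Laplace-smoothed P(f = value | c)
        scores[c] = score
    total = sum(scores.values())  # normalise so the posteriors sum to 1
    return {c: s / total for c, s in scores.items()}

# HYPOTHETICAL applicant: the question does not specify which case to classify.
query = {"Age": "18-23", "Income": "Medium", "Employment": "No", "Credit Score": "Max"}
posterior = naive_bayes_posterior(rows, query, FEATURES)
print(posterior)
```

The predicted class is whichever label has the larger posterior. Substitute the applicant you actually want to classify into `query`.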
