Unit 1
MCQs
2. Which of the following data is put into a formula to produce commonly accepted results?
a) Raw
b) Processed
c) Synchronized
d) All of the Mentioned
Unit 2
• Differentiate between descriptive statistics and inferential statistics with examples.
• Difference between mean, mode, median
• Difference between standard deviation, variance, range, inter quartile range
• Differentiate between population, sample, parameter, statistic
• Real life examples of normal distribution and binomial distribution
• Differentiate between: Point Estimate, Interval Estimate & Confidence Interval.
• Explain the null and alternative hypotheses using the example of flipping a coin.
• Explain Type 1 & Type 2 errors in hypothesis testing with suitable examples.
• What is the significance level? How does it regulate the possibility of Type 1 & Type 2
errors occurring?
• Explain p values with example.
• Explain the interrelationship between margin of error and standard error.
• In the population, the average IQ is 100 with a standard deviation of 15. A team of scientists
wants to test a new medication to see whether it has a positive effect, a negative effect, or
no effect at all on intelligence. A sample of 30 participants who have taken the medication has
a mean IQ of 140. Did the medication affect intelligence?
• Study the data distribution given in the table and answer the questions below.
Value      1  2  3  4  5   6   7  8
Frequency  1  0  0  3  4  10  12  8
(Frequency = number of data points with that value)
o What is the mean value?
o How would you describe the data distribution? Why?
Data Science – Question Bank - CVV
Unit 3
• Differentiate between Euclidean distance and Manhattan distance.
• Explain how gradient descent is used to fit parameterized models.
• Constrained optimization vs. unconstrained optimization
• Linear vs. Non Linear optimization
• Discrete optimization vs. Non Discrete optimization
• Explain the concept of Lp norm.
• State the advantages and disadvantages of using L1 norm.
• Illustrate with an example that the L1 metric distance is always greater than or equal to the L2 metric distance.
• Draw a typical Hessian matrix. Indicate how it is used in optimization.
• Explain Gradient Descent.
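Two of the questions in this unit, on Lp norms and on gradient descent, can be explored numerically. The sketch below is only an illustration under stated assumptions: the points, the toy data set (generated from y = 2x + 1), the learning rate, and the function names `lp_distance` and `fit_line` are all made up for this example and are not part of the question bank.

```python
def lp_distance(a, b, p):
    """Lp (Minkowski) distance: (sum |a_i - b_i|^p)^(1/p), for p >= 1."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical points: the L1 (Manhattan) distance is not smaller than the L2 (Euclidean) one.
l1 = lp_distance((0, 0), (3, 4), p=1)
l2 = lp_distance((0, 0), (3, 4), p=2)
print(f"L1 = {l1}, L2 = {l2}")

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_line(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The learning rate matters: if it is too large for the curvature of the loss (which the Hessian describes), the updates diverge instead of converging.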
Unit 4
• What is machine learning? What is its role in data science?
• Explain supervised and unsupervised machine learning.
• Applications of classifications in real life situations
• Regression vs. Clustering vs. Classification
• Why do we measure the impurity of a resulting node in a decision tree? List the different
measures of impurity used in decision trees.
• There are 4 coins A, B, C and D out of which 3 coins are of equal weight and one coin is heavier.
Find out the heavier coin using Decision Tree.
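For the question on node impurity, the commonly used measures can be computed directly from the class counts in a node. The following is a minimal sketch, assuming the three textbook measures (Gini index, entropy, and misclassification error); the function names and the example counts are hypothetical.

```python
import math


def _proportions(counts):
    """Convert class counts in a node to class proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def gini(counts):
    """Gini index: 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in _proportions(counts))


def entropy(counts):
    """Entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes."""
    return -sum(p * math.log2(p) for p in _proportions(counts) if p > 0)


def misclassification_error(counts):
    """Misclassification error: 1 - max(p_i)."""
    return 1 - max(_proportions(counts))


# Hypothetical nodes: one perfectly mixed, one pure.
mixed_node = [5, 5]
pure_node = [10, 0]
for name, node in [("mixed", mixed_node), ("pure", pure_node)]:
    print(name, gini(node), entropy(node), misclassification_error(node))
```

All three measures are zero for a pure node and largest for an evenly mixed one, which is why a split is chosen to reduce impurity in the child nodes.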
Unit 5
• Cluster the following eight points (with (x, y) representing locations) into three clusters: A1(2,
10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9). Initial cluster centers are:
A1(2, 10), A4(5, 8) and A7(1, 2). The distance function between two points a = (x1, y1) and b =
(x2, y2) is defined as ρ(a, b) = |x2 – x1| + |y2 – y1|. Use the K-Means algorithm to find the three
cluster centers after the second iteration.
• Apply KNN and predict the class for the test point (3,7) for k=3. Training points with class are
(x,y,class). (7,7,2), (7,4,2), (3,4,1), (1,4,1), (2,5,2), (3,8,1)
• Use the K-Means algorithm to create two clusters. Assume A(2, 2) and C(1, 1) are the centers of the
two clusters.
• Marks scored by 10 students in mathematics and computer science are given in table below along
with their result as Pass or Fail. Pappu scores 41 marks in mathematics and 38 marks in computer
science. Using KNN classifier algorithm, determine whether Pappu has passed or failed using K as
1,2,3,5 and 7.
Student    Mathematics  Computer Science  Result
Naren      80           80                Pass
Amit       75           40                Pass
Deven      65           50                Pass
Surya      40           40                Pass
Sanjay     70           40                Pass
Teja       65           37                Fail
Akhilesh   70           25                Fail
Sharad     38           38                Fail
Ajit       35           59                Fail
Shivraj    70           65                Pass
• Using the Naïve Bayes classifier approach and the training data set given in the table,
predict Class = Buy Laptop: Yes or No for the feature set: {Income = Low; Student =
No; Credit Rating = Excellent}
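The K-Means question with eight points and the two KNN questions in this unit can be worked through in code. The sketch below is one possible way to check hand calculations, not a prescribed solution. Assumptions: K-Means assigns points by the Manhattan distance given in the question and updates each center to the mean of its cluster; KNN uses Euclidean distance (the questions do not name a metric) with a simple majority vote, and a tie goes to the class seen first among the nearest neighbours. All function names are illustrative.

```python
from collections import Counter


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean_sq(a, b):
    # Squared Euclidean distance; it gives the same neighbour ordering as the true distance.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, centers, iterations, dist=manhattan):
    """Run a fixed number of K-Means iterations and return the centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the assigned points (an empty cluster keeps its old center)
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers


def knn_predict(train, query, k, dist=euclidean_sq):
    """Majority vote among the k nearest rows of (x, y, label)."""
    neighbours = sorted(train, key=lambda row: dist(row[:2], query))[:k]
    votes = Counter(row[2] for row in neighbours)
    return votes.most_common(1)[0][0]


# K-Means question: points A1..A8, initial centers A1, A4 and A7.
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centers = kmeans(points, [(2, 10), (5, 8), (1, 2)], iterations=2)
print("centers after 2 iterations:", centers)

# KNN question: test point (3, 7) with k = 3.
train = [(7, 7, 2), (7, 4, 2), (3, 4, 1), (1, 4, 1), (2, 5, 2), (3, 8, 1)]
predicted = knn_predict(train, (3, 7), k=3)
print("class for (3, 7):", predicted)

# KNN question: Pappu scores (41, 38); rows are (Mathematics, Computer Science, Result).
marks = [
    (80, 80, "Pass"), (75, 40, "Pass"), (65, 50, "Pass"), (40, 40, "Pass"),
    (70, 40, "Pass"), (65, 37, "Fail"), (70, 25, "Fail"), (38, 38, "Fail"),
    (35, 59, "Fail"), (70, 65, "Pass"),
]
pappu = {k: knn_predict(marks, (41, 38), k) for k in (1, 2, 3, 5, 7)}
print("Pappu:", pappu)
```

For k = 2 the two nearest neighbours disagree, so the prediction depends entirely on the tie-breaking rule. A written answer should state which rule it uses.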
Unit 6
1. Accuracy
2. Precision
3. Recall
4. Specificity
5. F-Score
6. Error rate
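The six measures listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name `classification_metrics` and the example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common evaluation measures from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
    Assumes all denominators are non-zero.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # of predicted positives, the fraction that is correct
    recall = tp / (tp + fn)      # of actual positives, the fraction that is found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # of actual negatives, the fraction that is found
        "f_score": 2 * precision * recall / (precision + recall),  # F1: harmonic mean
        "error_rate": (fp + fn) / total,
    }


# Hypothetical confusion-matrix counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and error rate always sum to 1, while precision and recall trade off against each other, which is the reason the F-score combines them.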
• Explain the following methods used for training and testing –
1. Resubstitution
2. K-fold cross-validation
3. Bootstrapping
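The difference between these methods lies in how the available records are divided into training and test sets. Resubstitution simply tests on the same records used for training, so it needs no splitting. The sketch below illustrates the other two with plain index lists; the function names, the data size of 10, and the random seed are assumptions made for the example.

```python
import random


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    # The first n % k folds receive one extra record so that all records are used.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; unused indices form the out-of-bag test set."""
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag


folds = k_fold_indices(n=10, k=5)
for train, test in folds:
    print("train:", train, "test:", test)

sample, oob = bootstrap_indices(10, random.Random(0))
print("bootstrap sample:", sample, "out-of-bag:", oob)
```

In k-fold cross-validation every record is used for testing exactly once, whereas a bootstrap sample may repeat some records and leave others out entirely.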