FDS practice question paper
Question Paper – II for Second Year B.Tech (AI & DS), based on the Data Science syllabus and
aligned with Dr. D.Y. Patil Vidyapeeth Pune's pattern. This version is for 60 marks.
DATA SCIENCE – End Semester Examination
Class: B.Tech (AI & DS) – Second Year
Subject Code: PCC-DS-401
Time: 2 Hours
Max. Marks: 60
Q.1 (15 Marks) – Multiple Choice Questions (1 mark each)
(Choose the correct option. Each question carries 1 mark.)
1. What type of data is “Yes/No”?
a) Ordinal
b) Nominal
c) Continuous
d) Discrete
2. Which tool is used to create interactive dashboards?
a) NumPy
b) Power BI
c) Notepad
d) MySQL
3. The IQR in a box plot is defined as:
a) Max - Min
b) Q1 - Q3
c) Q3 - Q1
d) Median - Mode
4. Data integration helps in:
a) Replacing data
b) Merging data from different sources
c) Visualizing data
d) Training models
5. Polynomial regression is fitted as which type of regression on polynomial feature terms?
a) Logistic
b) Linear
c) Polynomial
d) All of the above
6. What is the use of feature selection?
a) Increase features
b) Reduce model accuracy
c) Improve model performance
d) Add noise
7. Data cleaning includes all of the following except:
a) Removing duplicates
b) Fixing data types
c) Creating dashboards
d) Handling missing values
8. Which is a filter-based method for feature selection?
a) Correlation
b) RFE
c) Lasso
d) Gradient Boosting
9. What does a high p-value (> 0.05) indicate in ANOVA?
a) No significant difference between group means
b) Groups are significantly different
c) Data is linear
d) Strong correlation
10. Cross-validation helps to:
a) Overfit a model
b) Improve training time
c) Estimate model generalization
d) Add more features
11. Which of the following is a diverging colormap commonly used for correlation heatmaps?
a) Viridis
b) Coolwarm
c) Jet
d) Plasma
12. Which of the following is a type of supervised learning?
a) Clustering
b) Regression
c) PCA
d) Association
13. Ridge Regression is used to:
a) Prevent overfitting
b) Remove outliers
c) Improve graphics
d) Add features
14. Data Visualization helps in:
a) Making predictions
b) Training models
c) Understanding data patterns
d) Collecting data
15. Outliers are visualized best using:
a) Bar chart
b) Histogram
c) Pie chart
d) Box plot
Q.2 (21 Marks) – Answer Any Seven (Each 3 marks)
1. Define and explain the term “Overfitting.”
2. List three key steps in the Data Science Life Cycle.
3. What is the significance of statistical intervals?
4. Describe the importance of data visualization in storytelling.
5. What is data wrangling? List any two steps involved.
6. Differentiate between Nominal and Ordinal data.
7. Mention three big data technologies.
8. What is the purpose of feature selection in model building?
9. Write a short note on the role of a Data Scientist.
Q.3 (15 Marks) – Answer Any Three (Each 5 marks)
1. Explain how correlation is used in regression models.
2. Describe a case where EDA helped improve model performance.
3. Write a Python program to impute missing values in a dataset.
4. What is the role of pivot tables in data analysis?
Q.4 (9 Marks) – Answer Any One (9 marks)
1. Design a complete Data Science Life Cycle for a fraud detection system.
2. Explain Ridge Regression with a sample Python implementation and its importance.
3. Show how heatmaps are used to find correlation between features in Python.