
Data Science quiz questions

1. What is data science primarily concerned with?


- A) Collecting data
- B) Analyzing data
- C) Storing data
- D) Creating data

**Answer: B) Analyzing data**

2. Which of the following is NOT a step in the data science process?


- A) Data collection
- B) Data cleaning
- C) Data deletion
- D) Data analysis

**Answer: C) Data deletion**

3. Which measure of central tendency is the arithmetic average of a dataset?


- A) Mean
- B) Median
- C) Mode
- D) Range

**Answer: A) Mean**

4. What is the process of converting raw data into a more structured format called?
- A) Data wrangling
- B) Data munging
- C) Data manipulation
- D) Data cleansing

**Answer: A) Data wrangling**

5. Which programming language is widely used for data analysis and manipulation?
- A) Python
- B) Java
- C) C++
- D) HTML

**Answer: A) Python**

6. What type of analysis is used to identify patterns or relationships in data?


- A) Descriptive analysis
- B) Predictive analysis
- C) Inferential analysis
- D) Exploratory analysis

**Answer: D) Exploratory analysis**


7. Which type of data visualization is suitable for displaying trends over time?
- A) Scatter plot
- B) Histogram
- C) Line chart
- D) Pie chart

**Answer: C) Line chart**

8. What is the process of filling in missing data in a dataset called?


- A) Data integration
- B) Data imputation
- C) Data interpolation
- D) Data extrapolation

**Answer: B) Data imputation**
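
As a rough illustration of imputation, the sketch below fills missing entries with the mean of the observed values. The helper name `impute_mean` and the use of `None` to mark a missing value are assumptions made for this example; mean imputation is only one of several possible strategies.

```python
from statistics import mean


def impute_mean(values):
    """Return a copy of values with None entries replaced by the observed mean."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # mean of the non-missing entries only
    return [fill if v is None else v for v in values]


ages = [25, None, 35, 30, None]
print(impute_mean(ages))  # [25, 30, 35, 30, 30]
```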

9. What is the term for a statistical model used to predict future outcomes based on historical data?
- A) Descriptive model
- B) Predictive model
- C) Inferential model
- D) Exploratory model

**Answer: B) Predictive model**


10. Which of the following is NOT a supervised learning algorithm?
- A) Linear regression
- B) K-means clustering
- C) Decision tree
- D) Support vector machine

**Answer: B) K-means clustering**

11. What is the measure of a model's performance in making correct predictions called?
- A) Accuracy
- B) Precision
- C) Recall
- D) F1 score

**Answer: A) Accuracy**

12. Which method is used to evaluate the performance of a classification model?
- A) Confusion matrix
- B) Root mean squared error
- C) Mean absolute error
- D) R-squared

**Answer: A) Confusion matrix**
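
To make the confusion matrix concrete, here is a minimal sketch for a binary classifier, assuming labels are encoded as 0 and 1. The function name `confusion_counts` and the sample labels are made up for illustration. Accuracy, precision, and recall are then derived from the four counts.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
c = confusion_counts(y_true, y_pred)

accuracy = (c["tp"] + c["tn"]) / len(y_true)   # share of all predictions that are correct
precision = c["tp"] / (c["tp"] + c["fp"])      # share of predicted positives that are correct
recall = c["tp"] / (c["tp"] + c["fn"])         # share of actual positives that were found
print(dict(c), accuracy, precision, recall)
```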


13. Which technique is used to reduce the dimensionality of a dataset?
- A) Principal component analysis (PCA)
- B) Linear regression
- C) Random forest
- D) Gradient boosting

**Answer: A) Principal component analysis (PCA)**

14. What type of learning algorithm does not require labeled training data?
- A) Supervised learning
- B) Unsupervised learning
- C) Semi-supervised learning
- D) Reinforcement learning

**Answer: B) Unsupervised learning**

15. Which algorithm is used for finding frequent itemsets in transactional databases?
- A) Apriori
- B) K-nearest neighbors
- C) Naive Bayes
- D) Gradient descent

**Answer: A) Apriori**

16. What is the process of transforming a categorical variable into a set of binary indicator columns called?
- A) Feature engineering
- B) One-hot encoding
- C) Label encoding
- D) Normalization

**Answer: B) One-hot encoding**
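
A minimal sketch of one-hot encoding in plain Python follows. The helper `one_hot` and the colour values are hypothetical; sorting the categories alphabetically is just one way to fix the column order.

```python
def one_hot(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row contains exactly one 1, in the column of its category. Label encoding, by contrast, would map each category to a single integer.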

17. What is the term for an extreme value that falls far from the majority of other data points?
- A) Outlier
- B) Anomaly
- C) Noise
- D) Deviation

**Answer: A) Outlier**
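
One common rule of thumb for flagging outliers is the 1.5 × IQR rule, sketched below. The function name `iqr_outliers`, the multiplier default, and the sample data are assumptions for this example. Other definitions, such as z-score thresholds, are equally valid.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))  # [95]
```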

18. Which method is used for dividing a dataset into training and testing sets?
- A) Cross-validation
- B) Holdout method
- C) Stratified sampling
- D) Bootstrap method

**Answer: B) Holdout method**
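
The holdout method can be sketched as a single shuffled split. The helper `holdout_split`, the 20% test fraction, and the fixed seed are illustrative assumptions, not a prescribed setup.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it once into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(10), test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

Unlike cross-validation, each row is used for either training or testing, never both.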


19. What is the process of rescaling numerical features to zero mean and unit variance called?
- A) Standardization
- B) Normalization
- C) Min-max scaling
- D) Feature scaling

**Answer: A) Standardization**
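
The sketch below contrasts standardization (z-scores) with min-max scaling, two techniques that are easy to confuse. The function names and the sample values are assumptions for this example. The population standard deviation is used here, and a sample standard deviation would be an equally reasonable choice.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std dev 2
z_scores = standardize(data)
scaled = min_max_scale(data)
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(scaled[0], scaled[-1])  # 0.0 1.0
```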

20. Which technique is used to address the problem of overfitting in machine learning models?

- A) Regularization
- B) Data augmentation
- C) Dropout
- D) Ensemble learning

**Answer: A) Regularization**
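
As a small worked example of regularization, the sketch below fits a one-feature linear model without an intercept using an L2 (ridge) penalty. The function name `ridge_slope` and the data are hypothetical. The closed form used is w = Σxy / (Σx² + λ), so a larger penalty λ shrinks the slope toward zero.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - w*x)**2) + lam * w**2 for a no-intercept model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(ridge_slope(xs, ys, lam=0.0))   # 2.0 (ordinary least squares)
print(ridge_slope(xs, ys, lam=14.0))  # 1.0 (penalty shrinks the slope)
```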
