Data Science
UNIT I – Introduction to Core Concepts and Technologies
2 Marks Questions (Short Answer)
1. Define Data Science.
2. What is the main goal of data science?
3. List any two components of the data science process.
4. Name any two tools commonly used in the data science toolkit.
5. What are the types of data used in data science?
6. Define structured data and give one example.
7. What is unstructured data?
8. Give any two applications of data science.
9. Mention one ethical issue related to data science.
10. What is the role of a data scientist?
5–7 Marks Questions (Descriptive / Analytical Type)
1) Explain the term Data Science and describe its significance in modern technology.
2) Illustrate the data science process with a neat diagram and explain each step.
3) Describe the data science toolkit and discuss the roles of Python, R, and Jupyter Notebook.
4) Explain different types of data (structured, semi-structured, unstructured) with examples.
5) Discuss any three real-world applications of data science in detail.
6) Explain the difference between data science, data analytics, and machine learning.
7) Describe the ethical issues involved in data collection and analysis with examples.
8) Write short notes on data-driven decision-making and its importance in organizations.
9) Explain the key terminologies used in data science such as dataset, feature, label, and model.
10) Analyze the role of data science in business intelligence and prediction with suitable
examples.
UNIT II – Data Collection and Management
2 Marks Questions (Short Answer Type)
a. Define data collection.
b. What are the primary sources of data?
c. Give two examples of APIs used for data collection.
d. What is the purpose of data cleaning?
e. Define data storage in the context of data science.
f. What do you mean by multiple data sources?
g. Name any two Big Data frameworks.
h. What is web scraping?
i. Mention any two Python libraries used for web scraping.
j. What is the role of data management in data science?
5–7 Marks Questions (Descriptive / Analytical Type)
1. Explain the process of data collection in data science with suitable examples.
2. Describe the various sources of data – primary, secondary, structured, and unstructured.
3. Explain the concept of APIs for data collection. Give examples of APIs used in real-world
applications.
4. Discuss different methods used for exploring and fixing data before analysis.
5. Explain the importance of data storage and management in the data science lifecycle.
6. What are the challenges of using multiple data sources, and how can they be managed
effectively?
7. Describe the Big Data frameworks such as Hadoop and Spark, and explain their roles in data
management.
8. Explain the process of web scraping using Python. Mention the steps involved and tools
required.
9. Write a short note on Python libraries used for web scraping such as BeautifulSoup and
Scrapy with examples.
10. Compare and contrast traditional data storage systems with Big Data storage frameworks.
UNIT III – Data Analysis
2 Marks Questions (Short Answer Type)
1. Define data analysis.
2. What is the purpose of statistics in data analysis?
3. Define mean, median, and mode.
4. What is variance? Write its formula.
5. Define standard deviation.
6. What is the Central Limit Theorem (CLT)?
7. Write one application of linear regression.
8. What does SVM (Support Vector Machine) do?
9. Define K-Means clustering.
10. Differentiate between supervised and unsupervised learning.
5–7 Marks Questions (Descriptive / Numerical Type)
1. Explain the role of statistics in data analysis. Discuss the different stages involved in the data
analysis process.
2. Compute the mean, median, and mode for the following dataset:
12, 15, 10, 18, 15, 20, 15
3. Define variance and standard deviation. Calculate both for the dataset:
5, 7, 9, 10, 12
4. Explain the Central Limit Theorem (CLT) and its significance in data analysis.
5. Discuss different data distribution types (normal, skewed, uniform) with suitable diagrams.
6. Describe the linear regression model and compute the line of best fit for the following data:
X 1 2 3 4
Y 2 4 6 8
7. Explain the concept of Support Vector Machine (SVM) with a suitable diagram.
8. Discuss the Naïve Bayes algorithm with an example.
Example:
Suppose 60% of emails are spam, 80% of spam emails contain the word “offer”, and 20%
of non-spam emails contain “offer”.
Find the probability that an email containing “offer” is spam (use Bayes’ theorem).
9. Explain the Decision Tree algorithm and show how a simple dataset can be split using
information gain.
10. Describe the K-Means clustering algorithm and perform one iteration for the following
points:
Data points: (2,3), (3,4), (8,7), (9,8)
Initial centroids: C1(2,3) and C2(9,8)
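
The numerical questions in Unit III (2, 3, 6, 8, and 10) can be checked with a short script. What follows is a minimal Python sketch for verifying hand calculations, not a model answer. It uses only the standard library, and all variable names are illustrative. Question 3 does not say whether the data is a population or a sample, so the sketch computes both forms as an assumption. For Question 10, Euclidean distance is assumed, since the question does not name a distance measure.

```python
from statistics import mean, median, mode, pvariance, pstdev, variance, stdev

# Question 2: measures of central tendency.
q2 = [12, 15, 10, 18, 15, 20, 15]
q2_mean, q2_median, q2_mode = mean(q2), median(q2), mode(q2)
print("Q2:", q2_mean, q2_median, q2_mode)

# Question 3: variance and standard deviation.
# Assumption: the question does not say population or sample, so show both.
q3 = [5, 7, 9, 10, 12]
q3_pop_var, q3_pop_sd = pvariance(q3), pstdev(q3)      # divide by n
q3_sample_var, q3_sample_sd = variance(q3), stdev(q3)  # divide by n - 1
print("Q3 population:", q3_pop_var, round(q3_pop_sd, 4))
print("Q3 sample:", q3_sample_var, round(q3_sample_sd, 4))

# Question 6: least-squares line y = slope * x + intercept.
xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
print("Q6:", slope, intercept)

# Question 8: Bayes' theorem, P(spam | offer).
p_spam = 0.6
p_offer_given_spam = 0.8
p_offer_given_ham = 0.2
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print("Q8:", round(p_spam_given_offer, 4))

# Question 10: one K-Means iteration (assign points, then update centroids).
points = [(2, 3), (3, 4), (8, 7), (9, 8)]
centroids = [(2, 3), (9, 8)]  # C1 and C2 from the question


def squared_distance(p, q):
    """Squared Euclidean distance; enough for picking the nearest centroid."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


# Assignment step: each point joins the cluster of its nearest centroid.
clusters = [[] for _ in centroids]
for point in points:
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    clusters[nearest].append(point)

# Update step: each new centroid is the mean of its assigned points.
new_centroids = [(mean(x for x, _ in c), mean(y for _, y in c))
                 for c in clusters]
print("Q10 clusters:", clusters)
print("Q10 new centroids:", new_centroids)
```

Running the script prints one line per result, which can be compared against the worked answers. The K-Means part performs a single assignment and update step, as the question asks. A full run would repeat both steps until the centroids stop changing.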