PassBooks Question Bank
Introduction to Data Science
T.Y. B.Sc. Computer Science
This Pass Book Covers All Important Questions

Mr. Mukesh C. Jain: B.Tech-IT, M.E. Comp Engg., PRD (Per)
Mr. Suresh R. Agrawal: B.Tech-IT, SCJP, Programming Polyglot

Pinnacle Pride, 1st Floor, Nr. Durvankur Dining Hall, Opp. Cosmos Bank, Above Maharashtra Electronics, Tilak Road, Pune-411030
Contact: 9823782121 / 7276030223 | Visit Us: www.nsgacademy.in

All rights reserved by NSG Academy. No part of this book may be reproduced or transmitted in any form (electronic, mechanical or photocopy) or stored in any information retrieval system without prior written permission from NSG Academy. Breach of this condition is liable to legal action.

NSG ACADEMY | 9823782121 | www.nsgacademy.in

Unit 1 - Introduction to Data Science
1) Define Data Science.
2) Explain the 3 V's of Data Science.
3) What are the applications of Data Science?
4) Why learn Data Science?
5) Explain the life cycle of Data Science with a diagram.
6) What are the components of the data scientist's toolbox? Explain any one.
7) Explain Structured Data, Semi-structured Data and Unstructured Data.
Note: The explanation should cover Example, Characteristics, Advantages and Disadvantages.
OR
What are the various types of data available? Give an example of each.
8) Differentiate between structured and unstructured data.
9) What are the problems associated with unstructured data?
10) Define data source. What are the different sources of data in data science? Explain any one.
11) What is Open Data? Which principles are associated with open data?
12) What is social media data? Give examples of social media platforms that have their own APIs.
13) What is Multimodal Data?
14) Explain the different data formats in brief.
15) How is information stored in files?
OR
Explain any two ways in which data is stored in files.
Note: Text files, CSV files, ZIP files, JSON files, XML files, HTML files, TAR files, Image files, Gzip files etc.
16) Give the difference between Rasterized format and Vectorized format.
17) What is a Data Set? Which aspects of a data set should you know about? Give examples of any two available data sets.
18) What is compressed data?
19) What is CSV format?
20) What are the uses of ZIP files?

Unit 2 - Statistical Data Analysis
1) Define Statistical Data Analysis.
2) What is the role of Statistics in Data Science?
3) Define Descriptive Statistics. List its categories.
Note: There are three categories of Descriptive Statistics. Refer to the following figure.
[Figure: Categories of Descriptive Statistics; recoverable labels: Measures of Frequency, Range, Standard Deviation, Variance]
4) Define Inferential Statistics. List its categories.
Note: There are two categories of Inferential Statistics. Refer to the following figure.
[Figure: Categories of Inferential Statistics; recoverable labels: Hypothesis Testing, Estimation, Parametric, Non-parametric]
5) Explain the measures of central tendency in brief.
Note: Students should explain 3 to 4 points on each of the following:
I. Mean
II. Median
III. Mode
6) Explain the measures of dispersion in brief.
Note: Students should explain 3 to 4 points on each of the following:
I. Range
II. Standard Deviation
III. Variance
IV. Interquartile Range
7) What is Hypothesis Testing?
8) Define Null Hypothesis and Alternate Hypothesis.
9) Explain the methods of parameter estimation.
Note: i) Point Estimate ii) Interval Estimate
10) Describe Data Matrix vs Dissimilarity Matrix.
11) What is an Outlier?
12) Explain the types of outliers.
13) Explain the outlier detection methods.
14) What is meant by Mean, Median, Mode, Range, Variance and Standard Deviation?
15) Calculate the Mean, Median, Mode, Range, Variance and Standard Deviation for the following list of values: 12, 9, 7, 5, 13, 6, 7
Note: A few questions based on the Inferential Statistics topic are not covered in this question bank. Preparing these topics may be time-consuming, so you can ignore them.
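The calculation asked for in the last Unit 2 question (values 12, 9, 7, 5, 13, 6, 7) can be checked with a short Python sketch. It computes each measure directly from its definition and cross-checks the results against Python's standard `statistics` module. Two assumptions are worth stating: the question does not say whether the values are a population or a sample, so both variances (dividing by n and by n - 1) are shown; and the 1.5 x IQR outlier rule at the end uses `statistics.quantiles` with its default "exclusive" method, while some textbooks compute quartiles slightly differently.

```python
import math
import statistics

# Values from the Unit 2 calculation question.
values = [12, 9, 7, 5, 13, 6, 7]


def describe(data):
    """Compute basic descriptive statistics directly from their definitions."""
    n = len(data)
    ordered = sorted(data)
    mean = sum(data) / n
    # Median: middle value of the sorted data (mean of the two middle values if n is even).
    mid = n // 2
    median = ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2
    # Mode: most frequent value; on a tie, max() keeps the smallest one because `ordered` is ascending.
    mode = max(ordered, key=data.count)
    value_range = ordered[-1] - ordered[0]
    squared_deviations = sum((x - mean) ** 2 for x in data)
    population_variance = squared_deviations / n    # divide by n
    sample_variance = squared_deviations / (n - 1)  # divide by n - 1
    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "range": value_range,
        "population variance": population_variance,
        "population std dev": math.sqrt(population_variance),
        "sample variance": sample_variance,
        "sample std dev": math.sqrt(sample_variance),
    }


stats = describe(values)
for name, value in stats.items():
    print(f"{name:>20}: {value:.4f}")

# Cross-check the hand-written formulas against the standard library.
assert math.isclose(stats["mean"], statistics.mean(values))
assert stats["median"] == statistics.median(values)
assert stats["mode"] == statistics.mode(values)
assert math.isclose(stats["population variance"], statistics.pvariance(values))
assert math.isclose(stats["sample variance"], statistics.variance(values))

# IQR outlier rule: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in values if x < lower or x > upper]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, outliers={outliers}")
```

For this list the sketch gives mean = 59/7 (about 8.4286), median = 7, mode = 7, range = 13 - 5 = 8, population variance about 7.9592 (standard deviation about 2.8212) and sample variance about 9.2857 (standard deviation about 3.0472). With Q1 = 6 and Q3 = 12 the fences are -3 and 21, so no value is flagged as an outlier.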
Unit 3 - Data Preprocessing
1) What is meant by Data Preprocessing? Why is it needed? What is its purpose?
2) Explain the Data Preprocessing steps in short.
Note: Write at most 2 to 3 points on each of the following steps:
i) Data Cleaning
ii) Data Integration
iii) Data Transformation
iv) Data Reduction
v) Data Discretization
3) Define Data Object.
4) What is an attribute? Explain the different types of data attributes with examples.
5) What is Data Quality? Which factors affect data quality?
6) What is Data Wrangling OR Munging?
Note: Data Wrangling is also known as Data Munging.
7) Which operations/steps are involved in Data Wrangling / Data Munging?
Note: Students should write 2 to 3 points on each of the following:
i) Data Cleaning
ii) Data Transformation
iii) Data Reduction
iv) Discretization
8) Define Data Cleaning. Why is it needed? What is the role of Data Cleaning?
9) What are missing values? Explain two methods of data cleaning for missing values.
10) What is noisy data? Explain the causes/cases of noisy data.
11) Explain the various formatting issues.
12) What is Data Transformation? What are the benefits of Data Transformation?
13) Explain the various Data Transformation methods/strategies/techniques.
Note: Students should write 3 to 4 points on each of the following:
i) Rescaling
ii) Normalizing
iii) Binarizing
iv) Standardizing
v) Label and one-hot encoding
14) What is Data Reduction? What are its purpose and benefits?
15) Explain the Data Reduction methods/strategies/techniques.
Note: Students should write 3 to 4 points on each of the following:
i) Dimensionality Reduction
ii) Data Cube Aggregation
iii) Numerosity Reduction
16) Explain the components of dimensionality reduction.
Note: Dimensionality reduction can be divided into two main components:
i) Feature Selection
ii) Feature Extraction
17) What is a Data Cube?
18) What are feature selection and feature extraction? Explain the methods/techniques of feature selection.
19) What is Data Discretization? Explain its types/approaches.
Note: The types are Top-down Discretization and Bottom-up Discretization.
20) List any two libraries used in Python for data analysis.
Ans: Two commonly used libraries in Python for data analysis are:
I. Pandas: Pandas is a powerful library for data manipulation and analysis. It provides data structures (Series and DataFrame) and tools for handling structured data, such as tables or time series, making it easier to clean, manipulate and analyze data.
II. NumPy: NumPy is a fundamental package for scientific computing in Python. It provides the n-dimensional array (ndarray) along with matrix support and mathematical functions, allowing efficient numerical operations and data manipulation. NumPy is often used together with Pandas for handling numerical data efficiently.

Unit 4 - Data Visualization
1) What is EDA (Exploratory Data Analysis)?
2) Define Data Visualization. Why is data visualization important for data analysis? Explain.
3) What is Visual Coding? What are the types of Visual Coding?
4) Explain the concept of a visualization graph.
5) Explain in brief any two software tools used for Data Visualization.
Note: Tableau, QlikView, Sisense, Looker, Microsoft Power BI etc.
6) Explain the Data Visualization libraries in Python.
Note: Students should prepare 4 to 5 points on each of the following libraries: Matplotlib, Seaborn, ggplot, Bokeh, Plotly, Leather, Pygal, geoplotlib, Gleam, missingno.
7) Write a note on Basic Data Visualization tools.
Note: Students should study at least 4 to 5 points on each of the following; any one can be asked in the exam: Histograms, Bar charts/graphs, Scatter plots, Line charts, Area plots, Pie charts and Donut charts.
8) Write a short note on Specialized Data Visualization tools.
Note: Students should study at least 4 to 5 points on each of the following; any one can be asked in the exam: Boxplots, Bubble plots, Heat maps, Dendrograms, Venn diagrams, Tree maps, 3D Scatter plots.
9) Write a short note on Advanced Data Visualization tools.
Note: Many advanced data visualization tools are available, such as Violin plots, Network charts, Contour maps, Radar charts, Waffle charts and Word clouds. However, only WORDCLOUD is mentioned in the syllabus, so study only WORDCLOUD in detail.
10) Write a short note on word clouds.
11) What is a Venn diagram? How is it created? Explain with an example.
12) What are a Histogram and a Bar Chart? How are they created? What is the difference between them?
13) Define the following terms: Histograms, Bar charts/graphs, Scatter plots, Line charts, Area plots, Pie charts, Donut charts, Boxplots, Bubble plots, Heat maps, Dendrograms, Venn diagrams, Tree maps, 3D Scatter plots.
14) Explain the Data Visualization types in detail.
15) Explain the concept of Geospatial data, its libraries and tools.
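Unit 4 question 12 asks how a histogram and a bar chart are created and how they differ. Much of the difference lies in the data behind the chart: a histogram groups numeric values into contiguous, equal-width bins (so the bars touch and their order follows the number line), while a bar chart counts separate categories (so the bars are separate and their order is a choice). The sketch below uses only the Python standard library and draws both as text bars; the exam scores and tool names are made-up sample data, and the bin start and width (30 and 10) are arbitrary choices for illustration.

```python
from collections import Counter

# Hypothetical sample data, made up for illustration.
exam_scores = [35, 42, 48, 51, 55, 58, 61, 63, 67, 72, 74, 78, 85, 91]
favourite_tools = ["Tableau", "Power BI", "Tableau", "QlikView", "Power BI", "Tableau"]


def histogram_bins(values, start, width, bins):
    """Count numeric values in contiguous, equal-width bins [lo, lo + width)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:  # values outside the chosen range are ignored
            counts[index] += 1
    return [(f"{start + i * width}-{start + (i + 1) * width}", c) for i, c in enumerate(counts)]


def bar_counts(categories):
    """Count how often each separate category occurs (most frequent first)."""
    return Counter(categories).most_common()


def draw(rows):
    """Print one text bar per (label, count) pair."""
    for label, count in rows:
        print(f"{label:>10} | {'#' * count} ({count})")


print("Histogram of exam scores (numeric values grouped into adjacent bins):")
draw(histogram_bins(exam_scores, start=30, width=10, bins=7))
print()
print("Bar chart of favourite tools (one bar per category):")
draw(bar_counts(favourite_tools))
```

With Matplotlib (one of the libraries listed in question 6), the same two datasets would typically be drawn with `matplotlib.pyplot.hist(exam_scores, bins=range(30, 101, 10))` for the histogram and `matplotlib.pyplot.bar(...)` with a list of category names and a list of counts for the bar chart.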
