DS

1. The process of getting insights from raw data is called ________.
A. Data extraction B. Data Analysis C. Preprocessing D. Data Science
2. ETL stands for:
A. Extract, Traversal, Learn
B. Extract, Transform, Learn
C. Extract, Transform, Load
D. None of the above
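The three ETL stages can be sketched in a few lines of Python. This is a minimal illustration, assuming an in-memory CSV string as the source system and a plain list as the target store; the names `extract`, `transform`, `load` and `RAW_CSV` are hypothetical and not taken from any particular ETL tool.

```python
import csv
import io

# Hypothetical raw source: a CSV export with inconsistent formatting.
RAW_CSV = """name,age
 Alice ,30
Bob,
carol,25
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Transform: tidy names, convert ages to int, drop rows missing an age."""
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        if not age:
            continue  # skip incomplete records
        cleaned.append({"name": row["name"].strip().title(), "age": int(age)})
    return cleaned


def load(rows: list[dict[str, object]], target: list) -> None:
    """Load: write the analysis-ready rows into the target store."""
    target.extend(rows)


warehouse: list = []
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse)  # [{'name': 'Alice', 'age': 30}, {'name': 'Carol', 'age': 25}]
```

In a real pipeline the source would be a database or API and the target a warehouse table, but the order of the three steps stays the same.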
3. Which is an example of binary data?
A. 0 or 1 B. Toss of a coin C. Switch on or off D. All of the above
4. Which statement is not correct for ordinal data?
A. Data is ordered.
B. It has a natural hierarchy.
C. The intervals between the ranks are not necessarily equal.
D. None
5. Full form of IDE:
A. Integrated Development Environment
B. Independent Development Environment
C. Inclusive Data Environment
D. None of the above
6. Which statement is not true for Big Data?
A. Big is a moving target.
B. Big is when it can’t fit on one machine.
C. Big data is a cultural phenomenon.
D. None
7. Mode is:
A. Typical Value B. Most Common value C. Average Value D. None
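Python's standard `statistics` module computes the mode directly. The sketch below uses made-up data; `multimode` is included because a dataset can have more than one most common value, and the mode also works for categorical (nominal) data where a mean is meaningless.

```python
from statistics import mode, multimode

# Hypothetical shoe sizes: 8 occurs three times, more than any other value.
sizes = [7, 8, 8, 9, 10, 8, 9]
print(mode(sizes))  # 8

# Hypothetical ratings with a tie between two categories.
ratings = ["good", "bad", "good", "bad", "fair"]
print(multimode(ratings))  # ['good', 'bad']
```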
8. The five-number summary is:
A. minimum, maximum, mean, median and mode
B. mean, median, mode and the quartiles
C. minimum, maximum, mean and the quartiles
D. minimum, maximum, median and the quartiles
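A five-number summary (minimum, first quartile, median, third quartile, maximum) can be computed with the standard library. This sketch assumes the `"inclusive"` method of `statistics.quantiles`; textbooks and software packages use several quartile conventions, so Q1 and Q3 may come out slightly different elsewhere. The input values are made up.

```python
from statistics import median, quantiles


def five_number_summary(values: list[float]) -> tuple:
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    # "inclusive" treats the data as the whole population; other quartile
    # conventions exist and can give slightly different Q1/Q3 values.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median(data), q3, data[-1]


print(five_number_summary([9, 2, 15, 4, 7, 11, 4, 8, 5]))
# (2, 4.0, 7, 9.0, 15)
```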
9. ________ is a visual representation of the five-number summary.
A. Boxplot B. Histogram C. Dotplot D. None
10. ________ is a data point which differs so much from the rest of the data that it doesn’t seem to belong to the set.
A. Quantile B. IQR C. Outlier D. None
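A common rule of thumb for spotting outliers, Tukey's fences, flags a point that lies more than 1.5 × IQR below Q1 or above Q3. The sketch assumes that rule and the `"inclusive"` quartile method of `statistics.quantiles`; the 1.5 multiplier is a convention rather than a fixed definition, and the sample values are made up.

```python
from statistics import quantiles


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```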
11. ________ is the graphical representation of information and data.
A. Data visualisation B. Data representation C. Both A and B D. None
12. EDA stands for:
A. Exploratory Data Analysis
B. Explanatory Data Analysis
C. Exploratory Design Analysis
D. None
13. The process of transforming and mapping data from one "raw" data form into an analysis-ready format is called ________.
A. Data wrangling B. Data analysis C. Both A and B D. None
14. What are the challenges of data extraction?
A. Merging/joining data sets from diverse sources B. Security C. Both D. None
15. Data cleaning involves:
A. Handling missing data
B. Correcting data types of variables
C. Correcting input errors
D. All of the above
E. None
16. A ________ is a list of the observed categories and a count of the number of observations in each.
A. Matrix B. Frame C. Frequency distribution D. None
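In Python, `collections.Counter` builds a frequency distribution from a list of categorical observations. The colour data below is made up for illustration.

```python
from collections import Counter

# Hypothetical observations of a categorical variable.
colours = ["red", "blue", "red", "green", "blue", "red"]

freq = Counter(colours)
for category, count in freq.most_common():
    print(f"{category:<6}{count}")
# red   3
# blue  2
# green 1
```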
17. Which statement is not correct for Exploratory Data Analysis?
A. It involves graphical displays of data.
B. It involves numerical summaries of data.
C. It is the first step in data analysis.
D. All of the above
E. None
18. Which statement is wrong about big data?
A. Collecting and using a lot of data rather than a small sample
B. Accepting messiness in data
C. Giving up on knowing the causes
D. Can fit into one machine.
19. Which statements are true for nominal data?
A. It is categorical data coded so that each value represents a label.
B. Nominal data can only be counted; it cannot be ordered or measured.
C. Both
D. None
20. Which is not an example of discrete data?
A. No. of people in a room
B. No. of items in a basket
C. No. of hours in a day
D. Height, weight
21. ________ includes huge volume, high velocity and extensible variety of data.
A. Massive data
B. Extensive data
C. Small data
D. Big data
22. A multidisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge from data is called:
A) Data Science
B) Machine learning
C) Computer Science
D) Statistical Research
23. For ________ observation, the researcher specifies in detail what is to be observed and how measurements are to be recorded.
A. Exploratory
B. Functional
C. Structured
D. Unstructured
24. ________ is the process which involves extracting data from various source systems.
A. Data analysis
B. Data transformation
C. Data extraction
D. Data warehouse
25. Data from an entire population or a sample is summarized with the mean and standard deviation in:
A) Descriptive Statistics
B) Inferential Statistics
C) Predictive Analytics
D) Enhanced Data Analysis
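Descriptive statistics such as the mean and standard deviation summarize the data at hand. This sketch uses Python's `statistics` module on made-up scores; it also shows that the population standard deviation (`pstdev`, dividing by n) and the sample standard deviation (`stdev`, dividing by n - 1) are different quantities.

```python
from statistics import mean, pstdev, stdev

# Hypothetical sample of eight exam scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(f"mean               = {mean(scores):.2f}")    # 5.00
print(f"population std dev = {pstdev(scores):.2f}")  # 2.00 (divides by n)
print(f"sample std dev     = {stdev(scores):.2f}")   # 2.14 (divides by n - 1)
```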
26. The data curation process involves:
A) Preserving
B) Sharing
C) Discovering
D) All
27. ________ visualizes the distribution of data over a continuous interval or certain time period.
A) Box plot
B) Pie chart
C) Histogram
D) Bar chart
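A histogram groups continuous values into equal-width bins and counts how many values fall in each bin. The sketch below assumes half-open bins `[left, left + width)` and made-up heights, and draws a text histogram using only the standard library; bins with no values are simply omitted. Plotting libraries such as matplotlib (`plt.hist`) do the binning and drawing for you.

```python
def histogram(values: list[float], bin_width: float, start: float) -> dict:
    """Count values in equal-width bins [start + i*w, start + (i+1)*w)."""
    counts: dict = {}
    for v in values:
        i = int((v - start) // bin_width)  # index of the bin containing v
        left = start + i * bin_width       # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical heights in cm.
heights = [151.2, 158.9, 162.4, 163.0, 167.5, 171.8, 174.1, 188.3]
for left, count in histogram(heights, bin_width=10, start=150).items():
    print(f"{left:.0f}-{left + 10:.0f} cm: {'#' * count}")
# 150-160 cm: ##
# 160-170 cm: ###
# 170-180 cm: ##
# 180-190 cm: #
```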
28. Which of the following graphs can be used for simple summarization of data?
A) Scatterplot
B) Overlaying
C) Barplot
D) Bargraph
29. The phenomenon under data analysis used to gain a better understanding of data is called ________.
A) Exploratory Data Analysis
B) Explore Data Analysis
C) Exploratory Data Analytics
D) Enhanced Data Analysis
30. Height is an example of ________.
a) Continuous data
b) Discrete data
c) Ordinal data
d) Quantitative data
31. ________ data is difficult to manipulate and typically needs to be processed in some way before it can be used in standard data analysis software.
a) Structured data
b) Unstructured data
c) Summarized data
d) Frequency data
32. Which of the following is known as raw data?
a) tidy data
b) unprocessed data
c) flat data
d) all of the mentioned
33. Point out the correct statement.
a) Primary data is original source of data
b) Secondary data is original source of data
c) Questions are obtained after data processing steps
d) None of the mentioned
34. Which of the following functions returns the first few rows of data from a table?
a) head
b) tail
c) summary
d) none of the mentioned
35. Data that summarize all observations in a category are called __________ data.
a) frequency
b) summarized
c) raw
36. Which of the following functions is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
37. When data are classified according to a single characteristic, it is called:
(a) Quantitative classification (b) Qualitative classification
(c) Area classification (d) Simple classification
38. A cluster is:
A) A group of similar objects that differ significantly from other objects
B) Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
C) A symbolic representation of facts or ideas from which information can potentially be extracted
D) None of these
39. Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
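A minimal k-means sketch (Lloyd's algorithm) in plain Python makes the three inputs from the question visible: a distance metric (`euclidean`), the number of clusters k (fixed by how many starting centroids are passed in), and an initial guess for the centroids. The 2-D points and starting centroids are made up; libraries such as scikit-learn's `KMeans` add refinements like k-means++ initialization.

```python
import math

Point = tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Distance metric used to assign points to the nearest centroid."""
    return math.dist(a, b)


def kmeans(points: list[Point], centroids: list[Point], iters: int = 100) -> list[Point]:
    """Lloyd's algorithm; k equals the number of initial centroids."""
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return centroids


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
result = kmeans(points, centroids=[(0, 0), (10, 10)])
print([(round(x, 2), round(y, 2)) for x, y in result])  # [(1.17, 1.5), (8.5, 8.5)]
```

Because k-means always assigns points to the nearest centroid, it tends to find roughly spherical clusters of similar size; a different starting guess can also lead to a different result.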
40. Suppose we would like to perform clustering on spatial data such as the geometrical
locations of houses. We wish to produce clusters of many different sizes and shapes. Which
of the following methods is the most appropriate?
a) Decision Trees b) Density-based clustering
c) Model-based clustering d) K-means clustering
41. In classification, the data are arranged according to:
(a) Similarities (b) Differences
(c) Percentages (d) Ratios
42. Classification is:
A) A subdivision of a set of examples into a number of classes
B) A measure of the accuracy of the classification of a concept that is given by a certain theory
C) The task of assigning a classification to a set of examples
D) None of these
43. ________ measures the asymmetry of the probability distribution of a random variable about its mean.
a. skewness
b. covariance
c. variance
d. kurtosis
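Skewness can be computed from the second and third central moments. This sketch uses the simple moment coefficient g1 = m3 / m2^1.5; statistical packages often report a bias-adjusted variant, so their values may differ slightly for small samples. A positive value indicates a longer right tail and a negative value a longer left tail. The sample data is made up.

```python
from statistics import fmean


def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form)."""
    mu = fmean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


print(round(skewness([1, 2, 3, 4, 5]), 3))  # 0.0 (symmetric)
print(skewness([1, 1, 2, 2, 3, 10]) > 0)     # True (long right tail)
```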
44. ________ shows all individual data points.
a. Box-plot
b. scatter plot
c. line plot
d. pie chart
45. What is true about data visualization?
A. Data visualization is used to communicate information clearly and efficiently to users through information graphics such as tables and charts.
B. Data visualization helps users analyze a large amount of data in a simpler way.
C. Data visualization makes complex data more accessible, understandable and usable.
D. All of the above
46. Data can be visualized using:
A. graphs
B. charts
C. maps
D. All of the above
47. Which of the following is false?
A. Data visualization includes the ability to absorb information quickly.
B. Data visualization is another form of visual art.
C. Data visualization decreases insight and leads to slower decisions.
D. None of the above
48. Which of the following are the Data Sources in data science?
A. Structured
B. Unstructured
C. Both A and B
D. None of the above
49. Amazon Web Services falls into which of the following cloud-computing categories?
A) Platform as a Service
B) Software as a Service
C) Infrastructure as a Service
D) Back-end as a Service
50. What are the authentication methods in AWS?
A) Username/password
B) Access key
C) Access key/session token
D) All of the above
51. Which of the following is the most important language for Data Science?
a) Java
b) Ruby
c) R
d) None of the mentioned
52. Point out the wrong statement.
a) Merging concerns combining datasets on the same observations to produce a result with
more variables
b) Data visualization is the organization of information according to preset specifications
c) Subsetting can be used to select and exclude variables and observations
d) All of the mentioned
53. ________ provides a web service interface that provides resizable compute capacity in the AWS cloud.
a) EC2
b) S3
c) ES2
d) EC3
54. MongoDB is cross-platform and is written in the ________ language.
a) C++
b) R
c) Java
d) Python
55. MongoDB is a ________ database.
a) SQL
b) NoSQL
c) RDBMS
d) DBMS
