
1. The process of getting insights from raw data is called ______.

A. Data extraction B. Data Analysis C. PreProcessing D. Data Science.

2. ETL stands for

A. Extract, Traversal, Learn B. Extract, Transform, Learn C. Extract, Transform, Load
D. None of the above

3. Which is an example of binary data?


A. 0 or 1 B. Toss of a coin C. Switch on or off D. All

4. Which statement is not correct for ordinal data?


A. Data is ordered. B. It has a natural hierarchy. C. The intervals between the ranks may not
necessarily be equal. D. None

5. Full form of IDE


A. Integrated Development Environment B. Independent Development Environment
C. Inclusive Data Environment D. none of the above

6. Which statement is not true for Big Data?


A. Big is a moving target B. Big is when it can’t fit on one machine
C. Big data is a cultural phenomenon D. None

7. Mode is
A. Typical Value B. Most Common value C. Average Value D. None

8. Five-number summary is


A. minimum, maximum, mean, median and mode B. mean, median, mode and the quartiles.
C. minimum, maximum, mean and the quartiles. D. minimum, maximum, median and the
quartiles.
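
As an illustration of the terms in this question, the sketch below computes a minimum, the three quartiles, and a maximum for a small made-up sample. The function name `five_number_summary` and the sample values are hypothetical, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; other conventions give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical sample, chosen only for illustration.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
```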

9. ________________ is a visual representation of the five-number summary.


A. Boxplot B. Histogram C. dotplot D. none

10. ______ is a data point that differs so much from the rest of the data that it doesn’t seem to
belong to the set.
A. Quantile B. IQR C. Outlier D. None
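
One common way to flag such points in practice is the 1.5 × IQR rule, sketched below. The rule itself, the function name `iqr_outliers`, the multiplier `k`, and the sample readings are assumptions made for illustration; the question does not prescribe a particular detection method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings: one value sits far away from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
print(flagged)
```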

11. _________ is the graphical representation of information and data.


A. Data visualisation B. Data representation C. Both a and b D. None

12. EDA stands for


A. Exploratory Data Analysis B. Explanatory Data Analysis C. Exploratory Design Analysis
D. None

13. The process of transforming and mapping data from a "raw" form into an analysis-ready
format is called _____
A. Data Wrangling B. Data analysis C. Both a and b D. None

14. What are the challenges of data extraction?
A. merge/join data sets from diverse sources B. Security C. Both D. None

15. Data cleaning involves


A. Handling missing data B. Correcting data types of Variables C. Correcting Errors of input
D. All E. None

16. A ______________ is a list of the observed categories and a count of the number of observations
in each.
A. Matrix B. Frame C. Frequency distribution D. None
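
A list of categories with a count for each can be built in a few lines. This is a minimal sketch using `collections.Counter` from the Python standard library; the `colors` observations are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical observations.
colors = ["red", "blue", "red", "green", "blue", "red"]

# Counter maps each observed category to the number of times it occurs.
freq = Counter(colors)

for category, count in freq.most_common():
    print(category, count)
```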

17. Which statement is not correct for Exploratory Data Analysis?
A. involves graphical displays of data. B. involves numerical summaries of data.
C. First step in data analysis D. All E. None

18. Which statement is wrong about big data?


A. Collecting and using a lot of data rather than a small sample
B. Accepting messiness in data
C. Giving up on knowing the causes
D. Can fit into one machine.

19. Which statements are true for nominal data?


A. Categorical data where the data is coded so that it represents a label.
B. Nominal data can only be counted; it cannot be ordered or measured.
C. Both
D. None

20. Which is not an example of discrete data?


A. No. of People in a room
B. No. of items in a basket
C. No. of hours in a day
D. Height, weight

21. ____________ includes huge volume, high velocity, and extensible variety of data.
A. Massive data
B. Extensive data
C. Small data
D. Big data

22. A multidisciplinary field that uses scientific methods, processes, algorithms, and systems
to extract knowledge from data is called
A) Data Science
B) Machine learning
C) Computer Science
D) Statistical Research

23. For ___________ observation, the researcher specifies in detail what is to be observed and
how measurements are to be recorded.
A. Exploratory
B. Functional
C. Structured
D. Unstructured

24. ____________ is the process of extracting data from various source systems.
A. Data analysis
B. Data transformation
C. Data extraction
D. Data warehouse

25. Data from an entire population or sample is summarized with mean, standard deviation
in

A) Descriptive Statistics
B) Inferential Statistics
C) Predictive Analytics
D) Enhanced Data Analysis
26. Data Curation process:

A) Preserving
B) Sharing
C) Discovering
D) All

27. ________________ visualizes the distribution of data over a continuous interval or
certain time period.
A) Box plot
B) Pie chart
C) Histogram
D) Bar chart
28. Which of the following graphs can be used for simple summarization of data?

A) Scatterplot

B) Overlaying

C) Barplot

D) Bargraph

29. Phenomenon under data analysis used for gaining a better understanding of data is
called ______________
A) Exploratory Data Analysis
B) Explore Data Analysis
C) Exploratory Data Analytics
D) Enhanced Data Analysis
30. Height is an example of _____________.

a) Continuous Data

b) Discrete Data

c) Ordinal Data

d) Quantitative Data

31. _______________ data is difficult to manipulate and typically needs to be
processed in some way before it can be used in standard data analysis software.
a) Structured data
b) Unstructured data
c) Summarized data
d) Frequency data
32. Which of the following is known as raw data?
a) tidy data
b) unprocessed data
c) flat data
d) all of the mentioned
33. Point out the correct statement.

a) Primary data is original source of data


b) Secondary data is original source of data
c) Questions are obtained after data processing steps
d) None of the Mentioned
34. Which of the following functions gives the first few rows of data from
the table?
a) head
b) tail
c) summary
d) none of the mentioned
35. Data that summarize all observations in a category are called __________ data.
a) frequency
b) summarized

c) raw
d) none of the mentioned
36. Which of the following functions is used for k-means clustering?

a) k-means

b) k-mean

c) heatmap

d) none of the mentioned

37. When data are classified according to a single characteristic, it is called:

(a) Quantitative classification (b) Qualitative classification

(c) Area classification (d) Simple classification

38. A cluster is

A.) Group of similar objects that differ significantly from other objects

B.) Operations on a database to transform or simplify data in order to prepare it for a
machine-learning algorithm

C.) Symbolic representation of facts or ideas from which information can potentially be
extracted

D.) None of these
39. Which of the following is required by K-means clustering?

a) defined distance metric

b) number of clusters

c) initial guess as to cluster centroids

d) all of the mentioned
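
To make the inputs named in these options concrete, here is a minimal pure-Python sketch of a k-means loop. It is an illustration under stated assumptions, not a library implementation: the function name `kmeans`, its parameters, and the sample `points` are hypothetical, Euclidean distance (`math.dist`) is assumed as the metric, and the number of clusters is implied by how many starting centroids are passed in.

```python
import math


def kmeans(points, centroids, distance=math.dist, max_iter=100):
    """Cluster points around the given starting centroids.

    points    -- list of coordinate tuples
    centroids -- initial guesses; their count fixes the number of clusters
    distance  -- metric used to find the nearest centroid
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Different starting centroids can lead to different final clusters, which is why the initial guess is treated as an explicit argument here.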

40. Suppose we would like to perform clustering on spatial data such as the geometrical
locations of houses. We wish to produce clusters of many different sizes and shapes. Which
of the following methods is the most appropriate?

a) Decision Trees b) Density-based clustering

c) Model-based clustering d) K-means clustering

41. In classification, the data are arranged according to:

(a) Similarities (b) Differences


(c) Percentages (d) Ratios

42. Classification is
A.) A subdivision of a set of examples into a number of classes
B.) A measure of the accuracy of the classification of a concept that is given by a certain
theory
C.) The task of assigning a classification to a set of examples
D.) None of these

43. ______ measures asymmetry about the mean of the probability
distribution of a random variable.

a. skewness

b. covariance
c. variance

d. Kurtosis
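
Asymmetry about the mean can be quantified with the third standardized moment. The sketch below assumes the population form of that formula (dividing by n and using the population standard deviation); the function name `asymmetry` and both samples are hypothetical and chosen only to show the sign of the result.

```python
import statistics


def asymmetry(values):
    """Third standardized moment: mean of ((x - mean) / sd) ** 3."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


symmetric = [1, 2, 3, 4, 5]      # mirror image around the mean
right_tailed = [1, 1, 1, 2, 10]  # one large value stretches the right tail

print(asymmetry(symmetric))
print(asymmetry(right_tailed))
```

A value near zero indicates a roughly symmetric sample, a positive value a longer right tail, and a negative value a longer left tail.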

44. ___________shows all individual data points.

a. Box-plot

b. scatter plot

c. line plot

d. pie chart

45. What is true about Data Visualization?

A. Data Visualization is used to communicate information clearly and efficiently to
users through information graphics such as tables and charts.
B. Data Visualization helps users in analyzing a large amount of data in a simpler
way.
C. Data Visualization makes complex data more accessible, understandable, and
usable.
D. All of the above

46. Data can be visualized using?

A. graphs
B. charts
C. maps
D. All of the above

47. Which of the following is false?

A. Data visualization includes the ability to absorb information quickly
B. Data visualization is another form of visual art
C. Data visualization decreases insights and leads to slower decisions
D. None Of the above
48. Which of the following are the Data Sources in data science?

A. Structured
B. UnStructured
C. Both A and B
D. None Of the above

49. Amazon Web Services falls into which of the following cloud-computing
categories?
● Platform as a Service
● Software as a Service
● Infrastructure as a Service
● Back-end as a Service

50. What are the authentication methods in AWS?

● User Name/Password
● Access Key
● Access Key/ Session Token
● All of the above

51. Which of the following is the most important language for Data Science?

a) Java

b) Ruby

c) R

d) None of the mentioned

52. Point out the wrong statement.

a) Merging concerns combining datasets on the same observations to produce a result with
more variables

b) Data visualization is the organization of information according to preset specifications

c) Subsetting can be used to select and exclude variables and observations


d) All of the mentioned

53. ___________ provides a web service interface that offers resizable compute capacity in
the AWS cloud.

a) EC2

b) S3

c) ES2

d) EC3

54. MongoDB supports cross-platform use and is written in the ___________ language.


a) C++

b) R

c) Java

d) Python

55. MongoDB is _______ Database.


a) SQL

b) NoSQL

c) RDBMS

d) DBMS
