Confidential

GB (N)
KB (N)
1 The size of big data

MB (N)
Petabytes (Y)
Facebook (N)
Google (N)
2 These data come from many sources like

Amazon (N)
All of these (Y)
huge data (N)
big data (N)
3 Satellite gives
Minimum data (N)
Both a and b (Y)
10^15 (Y)
10^17 (N)
4 Petabyte=
10^18 (N)
10^19 (N)
10^15 (N)
10^17 (N)
5 Exabyte=
10^18 (Y)
10^19 (N)
10^15 (N)
10^17 (N)
6 Zettabyte=
10^18 (N)
10^21 (Y)
10^21 (N)
10^22 (N)
7 Yottabyte=
10^23 (N)
10^24 (Y)
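The powers of ten in questions 4 to 7 can be checked with a small conversion table. This is a minimal sketch assuming decimal (SI) units; the names `UNITS`, `to_bytes` and `convert` are illustrative and do not come from the question bank.

```python
# Decimal (SI) storage units, matching the answers marked (Y) in questions 4-7.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
    "ZB": 10**21,  # zettabyte
    "YB": 10**24,  # yottabyte
}


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes."""
    return value * UNITS[unit]


def convert(value, src, dst):
    """Convert a quantity from one unit to another."""
    return value * UNITS[src] / UNITS[dst]


if __name__ == "__main__":
    print(to_bytes(1, "PB"))       # bytes in one petabyte
    print(convert(1, "EB", "PB"))  # petabytes in one exabyte
```

Each unit is 1000 times the previous one, which is why the exponents rise in steps of three.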
10^21 (N)
10^22 (N)
8 Padma=
10^23 (N)
10^32 (Y)
human brain capacity (Y)
animal's brain capacity (N)
9 2.5 petabytes =
downloaded documents of 1 day (N)
None of these (N)
genome sequences of 7 billion people (Y)
genome sequences of 8 billion people (N)
10 4.75 exabytes=
genome sequences of 10 billion people (N)
Total digital data created in 2008 (Y)
Total digital data created in 2009 (N)
11 422 exabytes=
Total digital data created in 2011 (N)
10 billion messages (N)

A billion messages (Y)
12 Facebook can generate approximately
30 billion messages (N)

structured (N)
unstructured (N)
13 Big Data can be

semi-structured (N)
All of these (Y)
Tabular form (Y)
Non tabular form (N)
14 Structured data
variable form (N)
None of these (N)
structured (N)
Unstructured (Y)
15 Any data with unknown form

semi-structured (N)
None of these (N)

heterogeneous data source (Y)
non heterogeneous data source (N)
16 Unstructured data is
fixed format (N)
None of these (N)
XML file (Y)
HTML file (N)
17 semi-structured data is a data represented in an

Node JS file (N)
None of these (N)
Structured data (N)
unstructured data (N)
18 Web server logs is an example of

Quasi-structured Data (Y)
None of these (N)
Filter (N)
Reliable (N)
19 Veracity means
manage data (N)
All of these (Y)

demanding data rapidly (Y)
demanding data slowly (N)
20 The primary aspect of Big Data is to provide

no demanding data (N)
None of these (N)
business processes (N)
sensors (N)
21 Big data velocity deals

mobile devices (N)
All of these (Y)
costs of data management (Y)
data quality (N)
22 Big Data approaches are reducing the

standardization (N)
None of these (N)
Identify the data we have (N)
Identify the data we need (N)
23 Big data focus on

what you want to achieve (N)
All of these (Y)

organizational fitness (N)
suitability of the business challenge (N)
24 Aspects of the appropriateness of big data

big data’s contribution to the organization (N)
All of these (Y)
computing resources (N)
collection of storage (N)
25 Most big data applications achieve their performance through

Both collection of storage and resources (Y)
None of these (N)
Awareness of the architecture of computing platform (N)
Hardware (N)
26 Big data application is directly dependent on

software (N)
All of these (Y)
CPU (N)
processor (N)
27 Processing capability, often referred to as a

node (N)
All of these (Y)
memory (Y)
XML file (N)
28 Most single node machines have a limit to the amount of

XML file and memory (N)
None of these (N)
persistence of data (Y)
non-demanding data (N)
29 Storage provides
demanding data (N)
None of these (N)
persistence of data (N)
pipes (Y)
30 Network provides
demanding data (N)
All of these (N)
the pool of processing nodes (N)
assigns tasks (N)

31 A master job manager oversees
monitors the activity (N)
All of these (Y)
the data storage pool (N)
distributes datasets (N)
32 A storage manager oversees

Both the data storage pool and distribute datasets (Y)
None of these (N)
local (N)
close (N)
33 Threads process data

minimize the costs of data access latency (N)
All of these (Y)
distributed file system (Y)
Centralized file system (N)
34 Hadoop comes with a

Both distributed and central (N)
None of these (N)
difficult access (N)

easier access (Y)
35 HDFS provides
Both easier and difficult access (N)
None of these (N)
fault tolerant (N)
designed using low-cost hardware. (N)
36 HDFS is highly
Both fault tolerant and designed using low-cost hardware (Y)
None of these (N)
permissions (N)
authentication (N)
37 HDFS provides file

Both permissions and authentication (Y)
None of these (N)
64MB (Y)
64 petabytes (N)
38 The default block size of HDFS is

64 TB (N)
64 GB (N)
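To see what the block size in question 38 means in practice, the sketch below splits a file into HDFS-style blocks. It assumes the 64 MB default given in the answer key (newer Hadoop releases default to 128 MB); the function name `split_into_blocks` is hypothetical and is not part of any Hadoop API.

```python
BLOCK_SIZE_MB = 64  # default per question 38; an assumption for this sketch


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would occupy.

    Every block is full except possibly the last one, which only
    holds the remainder of the file.
    """
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        blocks.append(remainder)
    return blocks


if __name__ == "__main__":
    # A 200 MB file needs three full 64 MB blocks plus one 8 MB block.
    print(split_into_blocks(200))
```

The last block is smaller than the block size, so a file does not waste a full block for its tail.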
read-write operations (Y)
read operations (N)
39 Datanodes perform
write operations (N)
None of these (N)
Manages the file system namespace (N)
Regulates client’s access to files (N)
40 The namenode performs

renaming (N)
All of these (Y)
distributed data nodes (Y)
data nodes (N)
41 The name node effectively coordinates

central data nodes (N)
None of these (N)
computation (N)
analysis (N)
42 Map functions is/are

set pairs (N)
All of these (Y)
parallel (Y)
sequential (N)
43 The tasks that can be executed in map reduce

logical (N)
None of these (N)
MapReduce (N)
HDFS (N)
44 what are the components of big data

YARN (N)
All of these (Y)
volume (N)
velocity (N)
45 What are the 4 V's in big data

variety (N)
All of these (Y)
Facebook (Y)
apple (N)
46 The world's largest hadoop cluster

datamatics (N)
None of these (N)
Structured (N)
Semi Structured (N)
47 ______ data have a structure but cannot be stored in a database

Unstructured (N)
None of these (Y)
Velocity (N)
variety (N)
48 ______ refers to the ability to turn your data into something useful for business

Value (Y)
Volume (N)
HDFS (N)
Hadoop (Y)
49 ______ is an open source framework for storing data and running applications on clusters of commodity hardware.

MapReduce (N)
Cloud (N)
Validation (Y)
Verification (N)
50 ______ is a factor considered before adopting Big Data technology

Data (N)
Design (N)
MAPPER (N)
REDUCER (Y)
51 ______ takes the grouped key-value paired data as input and runs a Reducer function on each one of them

COMBINER (N)
PARTITIONER (N)
Decision Nodes (N)
End Nodes (N)
52 Which of the following are Decision Tree nodes?

Chance Nodes (N)
All of Above (Y)
Decision tree (Y)
Graphs (N)
53 A ______ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.

Trees (N)
Neural Networks (N)
Disks (N)
Squares (Y)
54 Decision Nodes are represented by
Circles (N)
Triangles (N)
Disks (N)
Squares (N)
55 Chance Nodes are represented by

Circles (Y)
Triangles (N)
Disks (N)
Squares (N)
56 End Nodes are represented by

Circles (N)
Triangles (Y)
Possible Scenarios can be added (N)
Use a white box model, If given result is provided by a model (N)
57 Which of the following are the advantage/s of Decision Trees?

Worst, best and expected values can be determined for different scenarios (N)
All of Above (Y)
data mining process (Y)

not data mining process (N)
58 classification is
clustering process (N)
None of these (N)
Attributes are equally important (N)
Attributes are statistically dependent on one another given the class value (Y)
59 Which of the following statements about Naive Bayes is incorrect?

Attributes are statistically independent of one another given the class value. (N)
Attributes can be nominal or numeric (N)
Full distribution (N)
Joint distribution (Y)
60 How can a Bayesian network be used to answer any query?

Partial distribution (N)
All of these (N)
Functionally dependent (N)
Dependant (N)
61 What is the relationship between a node and its predecessors while creating a Bayesian network?

Conditionally independent (Y)
Both Conditionally dependant & Dependant (N)
A component of a network (Y)
In the context of KDD and data mining, this refers to random errors in
a database table. (N)
62 Node is
One of the defining aspects of a data warehouse (N)
None of these (N)
exclusive method (Y)
inclusive method (N)
63 The classification method in which the upper limit of one class interval is the same as the lower limit of the next class interval is called

mid point method (N)
None of these (N)
Assumes that all the features in a dataset are equally important (N)
Assumes that all the features in a dataset are independent (N)
64/63 Which of the following is true about Naive Bayes?

both (Y)
None of these (N)
Partitioning methods (N)

Hierarchical methods (N)
65 categories of clustering methods

Grid based methods (N)
All of these (Y)
Scalability (N)
Ability to deal with noisy data (N)
66 Requirements of cluster analysis

Minimal requirements for domain knowledge to determine input parameters (N)
All of these (Y)
statistical classifiers (N)
predict class (N)
67 Bayesian Classifiers are

both (Y)
None of these (N)
Solving queries (N)
Increasing complexity (N)
68 Where can the Bayes rule be used?

Decreasing complexity (N)
Answering probabilistic query (Y)
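As an illustration of answering a probabilistic query with the Bayes rule (question 68), the sketch below computes a posterior probability for a two-outcome case. The function name `posterior` and the numbers in the usage example are made up for illustration only.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes rule for a binary hypothesis H and observed evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    Returns P(H | E) = P(E | H) * P(H) / P(E).
    """
    # Total probability of the evidence over both cases (H and not H).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


if __name__ == "__main__":
    # Hypothetical query: how likely is H after seeing E?
    print(posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
```

Even with a strong likelihood, a small prior keeps the posterior low, which is the kind of result a probabilistic query is meant to expose.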

Twitter(N)
Google(Y)
69/ Which of the following is not an example of Social Media?

Insta(N)
Youtube(N)
TB(N)
YB(N)
70/68 By 2027, the volume of data produced digitally will reach

ZB(Y)
EB(N)
Google(N)
NetFlix(N)
71 Which of the following options is not an example of NoSQL?

Amazon(Y)
CERN(N)
Open-Source(N)
Scalability(N)
72 What are the different features of Big Data Analysis?

Data Recovery(N)
All the above(Y)

Finding the appropriate features is hard(N)
Recommendations for new users(N)
73 In Content-based Approach problem is/are

Both Finding the appropriate features is hard and Recommendations for new users(Y)
None of these(N)
possible(Y)
impossible(N)
74 Can decision tree be used for clustering?

impossible in some scenario(N)
None of these(N)
1(N)
2(Y)
75 There are ______ major classifications of collaborative filtering mechanisms

3(N)
None of these(N)
content based systems(N)
hybrid system(N)
76 ______ recommends items based on similarity measures between users and/or items

collaborative filtering system (Y)
none of these(N)
market basket analysis(Y)
itemset filtering(N)
77 Association rules are sometimes referred to as

frequent item set analysis(N)
none of these(N)
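Market basket analysis (question 77) rests on two measures, support and confidence. The following is a minimal sketch on a made-up set of baskets; the data and the function names `support` and `confidence` are illustrative assumptions, not taken from the question bank.

```python
# Hypothetical transactions: each basket is the set of items bought together.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent and consequent) / support(antecedent).
    """
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


if __name__ == "__main__":
    print(support({"bread", "milk"}, BASKETS))       # 2 of 4 baskets
    print(confidence({"bread"}, {"milk"}, BASKETS))  # 2 of the 3 bread baskets
```

A rule is usually kept only when both measures exceed chosen thresholds.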
Mapper(Y)
Reducer(N)
78 ______ maps input key/value pairs to a set of intermediate key/value pairs

Both Mapper and Reducer(N)
None of the above(N)
task(N)
output(N)
79 The number of maps is usually driven by the total size of ______.

input(Y)
none(N)
structured(N)
unstructured(Y)
80/78 NoSQL databases are used mainly for handling large volumes of ______ data

semi-structured(N)
None of above(N)
Cassandra(N)
Scylla(N)
81/80 Which of the following is not an example of a NoSQL database management system?

Hadoop / HBase(N)
PostgreSQL(Y)
Uses JSON(Y)
Needs a schema(N)
82 Which of the following is a characteristic of a NoSQL database?

Requires JOINs(N)
Uses tables for storage(N)
Network(N)
Distributed(N)
83 NoSQL databases are most often referred to as

Relational(Y)
Object-oriented(N)
Field(Y)
Database(N)
85/83 Which of the following represents a column in NoSQL?

Collection(N)
Document(N)
High availability(Y)
Low availability(N)
86 The core principle of nosql is

both High & Low availability(N)
None of above(N)
Scalability(N)
Relational data(Y)
87 Which of the following is not a strong feature of NoSQL databases?

Faster data access than RDBMS.(N)
Data easily held across multiple servers(N)
Document databases.(N)
Key-value stores(N)
88 What are the types of nosql databases

Graph & Column-oriented databases.(N)
All of the above(Y)
89
Key-value(Y)
Document(N)
90/87 Which of the following are the simplest NoSQL databases?

Wide-column(N)
All of the above(N)
NoSQL is not suitable for storing structured data.(N)
NoSQL databases allow storing non-structured data.(N)
91 What is the aim of nosql?

NoSQL is a new data format to store large datasets.(Y)
NoSQL provides an alternative to SQL databases to store textual
data.(N)
ALWAYS True(N)
True only for Apache Hadoop(Y)
92 Hadoop is open source.

True only for Apache and Cloudera Hadoop(N)
ALWAYS False(N)
Analytics(N)
Data mining(N)
93 The process of describing data that is huge and complex to store and process is known as

Big Data(Y)
Data Warehouse(N)
Text file, Audio Files, Video Files(Y)
Only Text data(N)
94 Unstructured Data Consists of:

Tagged Data(N)
Weather forecasting(N)
Marketing(N)
95 Which industries employ so-called “Big Data” in their day-to-day operations? Choose the best answer.

Healthcare(N)
All of the above(Y)
It is a distributed framework(N)
The main algorithm used in it is Map Reduce(N)
96 Which one of the following is false about Hadoop?

It runs with commodity hardware(N)
All are true(Y)
Data Node(N)
NameNode(Y)
97//91 Which of the nodes serves as the master, with only one NameNode per cluster?

Data block(N)
Replication(N)
Hive(N)
Impala(N)
98 Which file system is used by HBase?

Hadoop(Y)
Scala(N)
Data Node(N)
NameNode(Y)
99 A ______ serves as the master and there is only one NameNode per cluster.

Data block(N)
Replication(N)
unstructured(Y)
structured(N)
100 NoSQL databases are used mainly for handling large volumes of ______ data.

semi-structured(N)
all of the mentioned(N)
Creation of a record(N)
Modification of a record(N)
101 Hbase creates a new version of a record during

Deletion of a record(N)
All the above(Y)
sequence of data items that arrive in some order and may be seen only
once.(Y)
sequence of data items that arrive in some order and may be seen
twice.(N)
102 Real-time data stream is
sequence of data items that arrive in same order(N)

sequence of data items that arrive in different order(N)
It is possible to delete an element from a Bloom filter.(N)
A Bloom filter always returns the correct result.(N)
103 Which of the following statements about standard Bloom filters is correct?

It is possible to alter the hash functions of a full Bloom filter to create more space.(N)
A Bloom filter always returns TRUE when testing for a previously added element(Y)
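The behaviour tested in question 103 can be seen in a small Bloom filter. This is a sketch under stated assumptions: the class name, the bit-array size, the number of hash functions, and the use of salted SHA-256 digests as hash functions are all illustrative choices, not a reference implementation.

```python
import hashlib


class BloomFilter:
    """A standard Bloom filter: supports add and membership test, no deletion."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # True for every added key; may also be True for a key never added
        # (a false positive), but never False for a key that was added.
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    allowed = BloomFilter()
    for key in ["alice@example.com", "bob@example.com"]:
        allowed.add(key)
    stream = ["alice@example.com", "eve@example.com", "bob@example.com"]
    # Let through the stream elements whose keys appear to be in the set.
    print([key for key in stream if allowed.might_contain(key)])
```

Bits are only ever set, never cleared, which is why a previously added element always tests TRUE and why elements cannot be deleted from a standard Bloom filter.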
Accept those tuples in the stream that meet a criterion(Y)
Accept data in the stream that meet a criterion.(N)
104/96 In Filtering Streams

Accept those class in the stream that meet a criterion(N)
Accept rows in the stream that meet a criterion. (N)
through all stream elements whose keys are in Set(Y)
through all stream elements whose keys are in class(N)
105 The purpose of the Bloom filter is to allow

through all data elements whose keys are in Set(N)
through all tuple elements whose keys are in Set(N)

worker-master fashion(N)
master-slave fashion(Y)
106 HDFS works in a fashion.

master-worker fashion(N)
slave-master fashion(N)
web traffic(N)
internet(N)
107 Which one is not an application of data streams?

sensor data(N)
None of these(Y)
mining query stream(Y)
mining login stream(N)
108 Google wants to know which queries are more frequent today than yesterday

mining search stream(N)
mining click stream(N)
Mining query stream(N)
Mining login stream(N)
109 Yahoo wants to know which of its pages are getting an unusual number of hits in the past

Mining search stream(N)
Mining click stream(Y)
financial applications(N)
network monitoring(N)
110 Which of the following does not follow the data stream concepts?

fraud detection(Y)
web application(N)
document(N)
key-value(Y)
111 A store is a simple database that when

graph(N)
simple(N)
mapped, reduce(N)
mapping, Reduction(N)
112 The MapReduce algorithm contains two important tasks, namely ______.

Map, Reduction(N)
Map, Reduce(Y)
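The two tasks named in question 112 can be sketched as a word count that runs in a single process. This only imitates the Map, shuffle and Reduce steps; it does not use Hadoop, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def mapper(line):
    """Map task: emit an intermediate (word, 1) pair for each word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between tasks."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce task: combine all values that share one key."""
    return key, sum(values)


def map_reduce(lines):
    # Each line could be mapped in parallel; here the calls run one by one.
    pairs = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data", "big cluster"]))
```

Because each mapper call depends only on its own line, the map step is the part that a cluster can execute in parallel.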
Accept those tuples in the stream that meet a criterion.(Y)
Accept data in the stream that meet a criterion.(N)
113 In Filtering Streams

Accept those class in the stream that meet a criterion(N)
Accept rows in the stream that meet a criterion.(N)
continuous queries(N)
one time queries(Y)
114 In streaming queries, alerting the user when a stock crosses over a price point is an example of

sampling queries(N)
none of these(N)
MongoDB(Y)
Oracle (N)
115 Which data base is popular?

Mysql(N)
Not SQL(N)
No usage of SQL(N)
116 No SQL means

Not only SQL(Y)
Not for SQL(N)
Google (N)
NetFlix(N)
117 Which is not example of NoSQL?

Amazon(Y)
None of these(N)
Twitter(Y)
Facebook(N)
118 Graph model of NoSQL used in

Google (N)
WhatsAPP(N)
column based(N)
key value based(N)
119 MongoDB is
document based(Y)
graph based(N)
Local file(N)
HDFS(N)
120 Hive query can be stored in

Both(Y)
Can not be stored(N)
Made read only by setting the read only option(Y)

Always writeable(N)
121 Hbase tables are

Always read only(N)
Are made read only using the query to the table(N)
high in size(N)
speed of data(N)
122 What is true about Variety in bigdata?

data from(N)
data in certain(Y)
Cassandra(Y)
Riak(N)
123 Which of the following is a wide-column store?

MongoDB(N)
Redis(N)
Larry Page(N)
Doug Cutting (Y)
124 Hadoop developed by

Mark (N)
Bill Gates(N)
poor results(N)
poor data(N)
125 Problems in recommendation systems

Lack of data(N)
All of these(Y)
content(N)
collaborative(N)
126 Type of recommender systems

knowledge(N)
All of these(Y)
Finding frequent patterns(N)
associations(N)
127 Association Mining is

correlations(N)
All of these(Y)
Basket data analysis(N)
cross-marketing(N)
128 Applications of Association rules

clustering(N)
All of these(Y)
coherent signals(N)
packets of data(N)
129 A data stream is a sequence of digitally encoded

data packets(N)
All of these(Y)
analyzes data(N)
correlates data(N)
130 Real Time Analytics Platform (RTAP)

predicts outcomes(N)
All of these(Y)
unbounded in size(N)
generated continuously in real time(N)
131 A data stream is potentially

the volume of the data is very large(N)
All of these(Y)
Large data volume(N)
likely structured(N)
132 Data Stream is

arriving a very high rate(N)
All of these(Y)
Security applications(N)
Telecom call records(N)
133 Data streams are in actions

Financial applications(N)
All of these(Y)
Can eliminate the need for large data engineering projects(N)
Performance, high availability and fault tolerance built in(N)
134 Benefits of a modern streaming architecture

Flexibility and support for multiple use cases(N)
All of these(Y)
THE END OF MCQ

Note: questions highlighted in color in the original are not in Sir's PDF. The answers in this Word version may not be valid; check the answers against the PDF.

