Data Analytics Important Questions

UNIT 1
 Difference between Data Science, Big Data and Data Analytics.
 Explain about Data processing infrastructure challenges.
 Difference between Data Warehouse and Data Mart.
 Illustrate the architecture of a Data Warehouse and its components.
 What are the reasons for the explosive growth of data?
 Explain in detail the different Big Data Processing Architectures.
 Explain about four V's of Big Data.
 Define Data Velocity with examples.
 Applications of Data Science, Big Data and Data Analytics.
UNIT 2
 Summarizing data with R programming.
 Find the probability of getting 3 doublets when a pair of fair dice is thrown 10 times,
and find the probability of getting 3 or fewer doublets.
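A minimal sketch of how the doublet question can be checked numerically, assuming "3 doublets" means exactly 3 and that each throw is independent. A doublet has probability 6/36 = 1/6, so the count of doublets in 10 throws follows a binomial distribution with n = 10 and p = 1/6. The helper name binomial_pmf is illustrative only.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# A doublet is one of the 6 matching pairs out of 36 equally likely outcomes.
p_doublet = Fraction(6, 36)
n_throws = 10

# P(X = 3): exactly three doublets in ten throws.
p_exactly_3 = binomial_pmf(3, n_throws, p_doublet)

# P(X <= 3): sum of the probabilities for 0, 1, 2 and 3 doublets.
p_at_most_3 = sum(binomial_pmf(k, n_throws, p_doublet) for k in range(4))

print(float(p_exactly_3))   # roughly 0.155
print(float(p_at_most_3))   # roughly 0.930
```

Exact fractions are used so that no rounding enters until the final conversion to float.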
 Explain in detail about Regression Analysis.
 What is the Central Limit Theorem (CLT)?
 Define Binomial distribution and Normal Distribution.
 Explain about Random and Bivariate random variables with suitable examples.
 Explain about built-in functions of Binomial Distribution and Normal Distribution.
UNIT 3
 Define Hadoop.
 What are the features of Hadoop?
 Explain in detail about Hadoop architecture.
 Explain about architecture of Google file system.
 List Big Data technologies.
Which of the following are parts of the 5 P's of data science and what is the additional P
introduced in the slides?
 People
 Purpose
 Product
 Perception
 Process
 Programmability
 Platforms
Which of the following are part of the four main categories to acquire, access, and retrieve
data?
 NoSQL Storage
 Remote Data
 Traditional Databases
 Web Services
 Text Files
What are the steps required for data analysis?
 Investigate, Build Model, Evaluate
 Classification, Regression, Analysis
 Regression, Evaluate, Classification
 Select Technique, Build Model, Evaluate
Of the following, which is a technique mentioned in the videos for building a model?
 Investigation
 Validation
 Evaluation
 Analysis
What is the first step in finding a right problem to tackle in data science?
 Assess the Situation
 Ask the Right Questions
 Define the Problem
 Define Goals
What is the first step in determining a big data strategy?
 Business Objectives
 Collect Data
 Build In-House Expertise
 Organizational Buy-In
According to Ilkay, why is exploring data crucial to better modeling?
Data exploration...
 leads to data understanding which allows an informed analysis of the data.
 enables a description of data which allows visualization.
 enables understanding of general trends, correlations, and outliers.
 enables histograms and other graphs as data visualization.
Why is data science mainly about teamwork?
 Analytic solutions are required.
 Engineering solutions are preferred.
 Data science requires a variety of expertise in different fields.
 Exhibition of curiosity is required.
What are the ways to address data quality issues?
 Remove outliers.
 Generate best estimates for invalid values.
 Remove data with missing values.
 Data Wrangling
 Merge duplicate records.
What is done to the data in the preparation stage?
 Retrieve Data
 Select Analytical Techniques
 Build Models
 Identify Data Sets and Query Data
 Understanding Nature of Data and Preliminary Analysis
According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?
A. Big data management and data mining
B. Data warehousing and business intelligence
C. Management of Hadoop clusters
D. Collecting and storing unstructured data
All of the following accurately describe Hadoop, EXCEPT:
A. Open source
B. Real-time
C. Java-based
D. Distributed computing approach
_________ has the world’s largest Hadoop cluster.
A. Apple
B. Datamatics
C. Facebook
D. None of the mentioned
What are the five V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. All of the above
________ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.
A. Scalding
B. Cascalog
C. Hcatalog
D. Hcalding
The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce
B. mapping, Reduction
C. Map, Reduction
D. Map, Reduce
2. ________ takes a set of data and converts it into another set of data, where individual
elements are broken down into tuples (key/value pairs).
A. Map
B. Reduce
C. Both A and B
D. Node
3. The ________ task takes the output from a map as an input and combines those data tuples
into a smaller set of tuples.
A. Map
B. Reduce
C. Node
D. Both A and B
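The two tasks described in the questions above can be sketched in plain Python as a word count. This is a single-process illustration only, not Hadoop's API: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical, and the grouping step stands in for what a real framework does between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    # Map: break one input line into (key, value) tuples, here (word, 1).
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values that share the same key before reduction.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: combine the tuples for one key into a single smaller tuple.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big cluster"]))
```

In a real cluster the map and reduce calls would run in parallel on different slices of the input; here they run sequentially to keep the data flow visible.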
4. In how many stages does a MapReduce program execute?
A. 2
B. 3
C. 4
D. 5
5. Which of the following schedules jobs and tracks the jobs assigned to the Task Tracker?
A. SlaveNode
B. MasterNode
C. JobTracker
D. Task Tracker
6. Which of the following is used for the execution of a Mapper or a Reducer on a slice of data?
A. Task
B. Job
C. Mapper
D. PayLoad
Which of the following commands runs a DFS admin client?
A. secondaryadminnode
B. nameadmin
C. dfsadmin
D. adminsck
Although the Hadoop framework is implemented in Java, MapReduce applications need not be
written in ____________
A. C
B. C#
C. Java
D. None of the above
The number of maps is usually driven by the total size of ____________
A. Inputs
B. Output
C. Task
D. None of the above
Which of the following is not a NoSQL database?
Cassandra
MongoDB
SQL Server
None of the above
Which of the following is a NoSQL database type?
SQL
JSON
Document databases
None of the Above
Which of the following are the simplest NoSQL databases?
Key-value
Document
Wide-column
All of the above
Which of the following is not a strong feature of NoSQL databases?
Scalability
Relational data
Faster data access than RDBMS.
Data easily held across multiple servers
NoSQL can be referred to as .............
No SQL
Only SQL
Not Only SQL
SQL Undefined
What does ETL stand for?
A. Data Inspection
B. Transformation
C. Extract, Transform, Load
D. Data Flow
Which of these steps is executed at the end of every stage of ETL – extract, clean, conform?
Logging the activity to a flat file
Displaying the data to the user
Staging the data to the database
Sending a message about the tasks
ETL execution or operation approach falls into which of these two major categories:
Planning & Execution
Implementation & Testing
Scheduling & Support
Maintenance & Support
One of the requirements while designing an ETL system is how quickly source data can be
delivered to end users. This is referred to as:
Data speed
Data lineage
Data latency
Data availability
Extracting the data from the source systems is involved in the _______ step of the ETL process.
extract
transform
load
planning
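As a rough illustration of the three ETL steps named in the questions above, the sketch below extracts rows from an in-memory CSV string, transforms them by cleaning names and dropping rows with invalid amounts, and loads the result into an in-memory SQLite table. The sample data, the table name sales, and the function names are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data: one untidy name and one invalid amount.
RAW_CSV = "name,amount\n alice ,10\nbob,not-a-number\nCarol,7\n"


def extract(text):
    # Extract: read raw rows from the source as dictionaries.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: tidy the names and skip rows whose amount is not an integer.
    clean = []
    for row in rows:
        try:
            amount = int(row["amount"])
        except ValueError:
            continue
        clean.append((row["name"].strip().title(), amount))
    return clean


def load(records, conn):
    # Load: write the cleaned records into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(rows)
```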
Point out the correct statement.
Hadoop runs on .............
Debian
Unix-like
Bare metal
Cross-platform
Which of the following statements is incorrect about Hadoop?
It runs with commodity hardware
It is best for live streaming of data
It is a part of the Apache project sponsored by the ASF
All of the above
The ________ step is performed by a data scientist after acquiring the data.
A) Data Cleaning B) Data Integration
C) Data Replication D) Data loading
Email data is an example of ______
A) Structured data B) Unstructured data
C) Semi-Structured data D) Scattered
A coin is tossed up 4 times. The probability that tails turn up in 3 cases is
A) 1/2 B) 1/3
C) 1/4 D) 1/6
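One way to check the coin question above is to enumerate every outcome, assuming a fair coin and reading "tails turn up in 3 cases" as exactly 3 tails. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2**4 = 16 equally likely sequences of four tosses.
outcomes = list(product("HT", repeat=4))

# Keep the sequences that contain exactly three tails.
favourable = [o for o in outcomes if o.count("T") == 3]

p_three_tails = Fraction(len(favourable), len(outcomes))
print(p_three_tails)  # 1/4
```

The same value follows from the binomial formula: C(4, 3) / 2^4 = 4/16.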
Find the median and mode of the number of messages received on 9 consecutive days: 18, 11, 9,
5, 18, 4, 15, 13, 17.
A) 13, 6 B) 13, 18
C) 18, 15 D) 15, 16
What is a numerical descriptive measure calculated from a sample called?
A) Parameter B) Statistics
C) Population D) Sampling Distribution
A subset of the sample space is called _________
A) an Event B) an Experiment
C) a Mutually exclusive event D) Independent events
Data can be categorized into ______ groups.
A) 1 B) 2
C) 3 D) 4
Which of the following are benefits of Big Data processing?
A) Businesses can utilize outside intelligence while taking decisions
B) Improved customer service
C) Better operational efficiency
D) All the above
Which of the following languages is used in Data Science?
A) C B) C++
C) R D) Ruby
What are the four V's of Big Data?
A) Volume B) Velocity
C) Variety D) All the above
