
UNIT 1

 Difference between Data Science, Big Data and Data Analytics.
 Explain about Data processing infrastructure challenges.
 Difference between Data Warehouse and Data Mart.
 Illustrate the architecture of a Data Warehouse and its components.
 What are the reasons for the explosive growth of data?
 Explain in detail the different Big Data processing architectures.
 Explain about the four V's of Big Data.
 Define Data Velocity with examples.
 Applications of Data Science, Big Data and Data Analytics.

UNIT 2

 Summarizing data with R programming.
 Find the probability of getting 3 doublets when a pair of fair dice is thrown 10 times, and find the probability of getting 3 or fewer doublets.
 Explain in detail about Regression Analysis.
 What is the Central Limit Theorem (CLT)?
 Define Binomial Distribution and Normal Distribution.
 Explain about random and bivariate random variables with suitable examples.
 Explain about built-in functions of Binomial Distribution and Normal Distribution.
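
The dice question in this unit can be worked as a binomial calculation. The sketch below assumes that "3 doublets" means exactly 3 doublets in 10 independent throws and that a doublet (both dice showing the same face) has probability 6/36. The helper name `binomial_pmf` is illustrative and not taken from the question bank.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# A doublet means both dice show the same face: 6 of the 36 outcomes.
P_DOUBLET = 6 / 36
N_THROWS = 10

# Exactly 3 doublets (assumed reading of "3 doublets").
p_exactly_3 = binomial_pmf(3, N_THROWS, P_DOUBLET)

# 3 or fewer doublets: add the cases k = 0, 1, 2, 3.
p_at_most_3 = sum(binomial_pmf(k, N_THROWS, P_DOUBLET) for k in range(4))

print(f"P(X = 3)  = {p_exactly_3:.4f}")   # 0.1550
print(f"P(X <= 3) = {p_at_most_3:.4f}")   # 0.9303
```

Under these assumptions the two answers are about 0.155 and 0.930.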

UNIT 3

 Define Hadoop.
 What are the features of Hadoop?
 Explain in detail about Hadoop architecture.
 Explain about the architecture of the Google File System.
 List out Big Data technologies.

Which of the following are parts of the 5 P's of data science, and what is the additional P
introduced in the slides?

 People
 Purpose
 Product
 Perception
 Process
 Programmability
 Platforms

Which of the following are part of the four main categories to acquire, access, and retrieve
data?

 NoSQL Storage
 Remote Data
 Traditional Databases
 Web Services
 Text Files

What are the steps required for data analysis?

 Investigate, Build Model, Evaluate
 Classification, Regression, Analysis
 Regression, Evaluate, Classification
 Select Technique, Build Model, Evaluate

Of the following, which is a technique mentioned in the videos for building a model?

 Investigation
 Validation
 Evaluation
 Analysis

What is the first step in finding a right problem to tackle in data science?

 Assess the Situation
 Ask the Right Questions
 Define the Problem
 Define Goals

What is the first step in determining a big data strategy?

 Business Objectives
 Collect Data
 Build In-House Expertise
 Organizational Buy-In

According to Ilkay, why is exploring data crucial to better modeling?

Data exploration...

 leads to data understanding which allows an informed analysis of the data.
 enables a description of data which allows visualization.
 enables understanding of general trends, correlations, and outliers.
 enables histograms and other graphs as data visualization.

Why is data science mainly about teamwork?

 Analytic solutions are required.
 Engineering solutions are preferred.
 Data science requires a variety of expertise in different fields.
 Exhibition of curiosity is required.

What are the ways to address data quality issues?

 Remove outliers.
 Generate best estimates for invalid values.
 Remove data with missing values.
 Data Wrangling
 Merge duplicate records.

What is done to the data in the preparation stage?

 Retrieve Data
 Select Analytical Techniques
 Build Models
 Identify Data Sets and Query Data
 Understanding Nature of Data and Preliminary Analysis

According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?
A. Big data management and data mining

B. Data warehousing and business intelligence

C. Management of Hadoop clusters

D. Collecting and storing unstructured data

All of the following accurately describe Hadoop, EXCEPT:

A. Open source

B. Real-time

C. Java-based

D. Distributed computing approach

_________ has the world’s largest Hadoop cluster.

A. Apple

B. Datamatics

C. Facebook

D. None of the mentioned

What are the five V’s of Big Data?

A. Volume

B. Velocity

C. Variety

D. All the above


________ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.

A. Scalding

B. Cascalog

C. Hcatalog

D. Hcalding

The MapReduce algorithm contains two important tasks, namely __________.

A. mapped, reduce
B. mapping, Reduction
C. Map, Reduction
D. Map, Reduce

2. ________ takes a set of data and converts it into another set of data, where individual elements are
broken down into tuples (key/value pairs).

A. Map
B. Reduce
C. Both A and B
D. Node

3. The ________ task takes the output from a map as an input and combines those data tuples into a
smaller set of tuples.

A. Map
B. Reduce
C. Node
D. Both A and B
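
The map and reduce tasks described in the questions above can be illustrated with a small in-memory word count. This is a plain-Python sketch, not the Hadoop API. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and the grouping step only stands in for the work a real framework does between the two tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: break each input line into (word, 1) key/value tuples."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share a key, between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's values into a smaller set of tuples."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three steps in order on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


counts = word_count(["big data", "Big Data analytics"])
print(counts)  # {'big': 2, 'data': 2, 'analytics': 1}
```

Here the map step emits five tuples and the reduce step combines them into three, one per distinct word.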

4. In how many stages does a MapReduce program execute?

A. 2
B. 3
C. 4
D. 5

5. Which of the following schedules jobs and tracks the jobs assigned to the Task Tracker?

A. SlaveNode
B. MasterNode
C. JobTracker
D. Task Tracker

6. Which of the following is used for the execution of a Mapper or a Reducer on a slice of data?

A. Task
B. Job
C. Mapper
D. PayLoad

Which of the following commands runs a DFS admin client?

A. secondaryadminnode
B. nameadmin
C. dfsadmin
D. adminsck

Although the Hadoop framework is implemented in Java, MapReduce applications need not be
written in ____________

A. C
B. C#
C. Java
D. None of the above

The number of maps is usually driven by the total size of ____________

A. Inputs
B. Output
C. Task
D. None of the above

Which of the following is not a NoSQL database?

 Cassandra

 MongoDB

 SQL Server

 None of the above

Which of the following is a NoSQL database type?

 SQL

 JSON

 Document databases

 None of the Above

Which of the following are the simplest NoSQL databases?

 Key-value

 Document

 Wide-column

 All of the above


Which of the following is not a strong feature of NoSQL databases?

 Scalability

 Relational data

 Faster data access than RDBMS.

 Data easily held across multiple servers

NoSQL can be referred to as .............

 No SQL

 Only SQL

 Not Only SQL

 SQL Undefined

What does ETL stand for?

A. Data Inspection

B. Transformation

C. Extract, Transform, Load

D. Data Flow

Which of these steps is executed at the end of every stage of ETL – extract, clean, conform?
Logging the activity to a flat file
Displaying the data to the user
Staging the data to the database
Sending a message about the tasks

ETL execution or operation approach falls into which of these two major categories:
Planning & Execution
Implementation & Testing
Scheduling & Support
Maintenance & Support

One of the requirements while designing an ETL system is how quickly source data can be
delivered to end users. This is referred to as:
Data speed
Data lineage
Data latency
Data availability

Extracting the data from the source systems is involved in the _______ step of the ETL process.
extract
transform
load
planning

Point out the correct statement.

Hadoop runs on .............

 Debian

 Unix-like

 Bare metal

 Cross-platform

Which of the following statements about Hadoop is incorrect?

 It runs on commodity hardware

 It is best for live streaming of data

 It is a part of the Apache project sponsored by the ASF

 All of the above

The ________ step is performed by a data scientist after acquiring the data.
A) Data Cleaning B) Data Integration
C) Data Replication D) Data loading
Email data is an example of ______
A) Structured data B) Unstructured data
C) Semi-structured data D) Scattered
A coin is tossed 4 times. The probability that tails turns up in 3 cases is
A) 1/2 B) 1/3
C) 1/4 D) 1/6
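
The coin question can be checked by enumerating outcomes. The sketch below assumes a fair coin and reads "3 cases" as exactly 3 tails in the 4 tosses. It lists all equally likely outcomes and counts the favourable ones.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2**4 equally likely outcomes of four fair tosses.
outcomes = list(product("HT", repeat=4))

# Keep the outcomes with exactly three tails (assumed reading of "3 cases").
three_tails = [outcome for outcome in outcomes if outcome.count("T") == 3]

probability = Fraction(len(three_tails), len(outcomes))
print(len(three_tails), "of", len(outcomes), "outcomes")  # 4 of 16 outcomes
print(probability)  # 1/4
```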
Find the median and mode of the number of messages received on 9 consecutive days: 18, 11, 9, 5, 18, 4, 15,
13, 17.
A) 13, 6 B) 13, 18
C) 18, 15 D) 15, 16
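
The median and mode for the message counts above can be verified with Python's standard `statistics` module. The variable names here are illustrative only.

```python
import statistics

# Messages received on 9 consecutive days, as given in the question.
messages = [18, 11, 9, 5, 18, 4, 15, 13, 17]

# Sorting shows the middle (5th) value and the repeated value directly.
print(sorted(messages))  # [4, 5, 9, 11, 13, 15, 17, 18, 18]

median_value = statistics.median(messages)
mode_value = statistics.mode(messages)

print(median_value)  # 13
print(mode_value)    # 18
```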
What is a numerical descriptive measure calculated from a sample called?
A) Parameter B) Statistics
C) Population D) Sampling Distribution
A subset of the sample space is called _________
A) an Event B) an Experiment
C) a Mutually exclusive event D) Independent events
Data can be categorized into ______ groups.
A) 1 B) 2
C) 3 D) 4
Which of the following are benefits of Big Data processing?
A) Businesses can utilize outside intelligence while taking decisions
B) Improved customer service
C) Better operational efficiency
D) All the above
Which of the following languages is used in Data Science?
A) C B) C++
C) R D) Ruby
What are the four V's of Big Data?
A) Volume B) Velocity
C) Variety D) All the above
