Unit 2
Unit 3
Define Hadoop.
What are the features of Hadoop?
Explain in detail about Hadoop architecture.
Explain about architecture of Google file system.
List out Big Data technologies.
Which of the following are parts of the 5 P's of data science and what is the additional P
introduced in the slides?
People
Purpose
Product
Perception
Process
Programmability
Platforms
Which of the following are part of the four main categories to acquire, access, and retrieve
data?
NoSQL Storage
Remote Data
Traditional Databases
Web Services
Text Files
Of the following, which is a technique mentioned in the videos for building a model?
Investigation
Validation
Evaluation
Analysis
What is the first step in finding a right problem to tackle in data science?
Business Objectives
Collect Data
Build In-House Expertise
Organizational Buy-In
Data exploration...
Remove outliers.
Generate best estimates for invalid values.
Remove data with missing values.
Data Wrangling
Merge duplicate records.
Retrieve Data
Select Analytical Techniques
Build Models
Identify Data Sets and Query Data
Understanding Nature of Data and Preliminary Analysis
According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?
A. Big data management and data mining
A. Open source
B. Real-time
C. Java-based
A. Apple
B. Datamatics
C. Facebook
A. Volume
B. Velocity
C. Variety
A. Scalding
B. Cascalog
C. Hcatalog
D. Hcalding
A. mapped, reduce
B. mapping, Reduction
C. Map, Reduction
D. Map, Reduce
2. ____ takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
A. Map
B. Reduce
C. Both A and B
D. Node
3. ____ task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
A. Map
B. Reduce
C. Node
D. Both A and B
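The Map and Reduce tasks described in questions 2 and 3 can be sketched in plain Python. This is an illustrative word-count simulation of the two phases, not Hadoop itself:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: break each input record down into (key, value) tuples.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: combine all tuples sharing a key into a smaller set of tuples.
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (key, sum(value for _, value in group))

counts = dict(reduce_phase(map_phase(["big data", "big hadoop"])))
```

In real Hadoop, the sort-and-group step between the two phases is the framework's shuffle; here `sorted` plus `groupby` stands in for it.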
5. Which of the following schedules jobs and tracks the jobs assigned to the Task Tracker?
A. SlaveNode
B. MasterNode
C. JobTracker
D. Task Tracker
6. Which of the following is used for the execution of a Mapper or a Reducer on a slice of data?
A. Task
B. Job
C. Mapper
D. PayLoad
Although the Hadoop framework is implemented in Java, MapReduce applications need not be
written in ____________
A. C
B. C#
C. Java
D. None of the above
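The point behind this question is that Hadoop Streaming lets mappers and reducers be any executable that reads stdin and writes tab-separated key/value pairs to stdout. A minimal mapper sketch in Python (here driven by in-memory streams for illustration; under Hadoop it would read `sys.stdin` and write `sys.stdout`):

```python
import io

def run_mapper(stdin, stdout):
    # Emit one tab-separated (word, 1) pair per input token,
    # the contract a Hadoop Streaming mapper must follow.
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")

out = io.StringIO()
run_mapper(io.StringIO("big data big\n"), out)
```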
The number of maps is usually driven by the total size of ____________
A. Inputs
B. Output
C. Task
D. None of the above
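The answer here (total size of inputs) follows from one map task per input split. A rough arithmetic sketch, assuming a 128 MB split size; real Hadoop also accounts for file boundaries and configured min/max split sizes:

```python
import math

def num_map_tasks(total_input_bytes, split_bytes=128 * 1024 * 1024):
    # One map task per input split: ceil(total input / split size).
    return math.ceil(total_input_bytes / split_bytes)

tasks = num_map_tasks(10 * 1024**3)  # 10 GB of input -> 80 splits
```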
Cassandra
MongoDB
SQL Server
SQL
JSON
Document databases
Key-value
Document
Wide-column
Scalability
Relational data
No SQL
Only SQL
SQL Undefined
A. Data Inspection
B. Transformation
C. Extract, Transform, Load
D. Data Flow
Which of these steps is executed at the end of every stage of ETL – extract, clean, conform?
Logging the activity to a flat file
Displaying the data to the user
Staging the data to the database
Sending a message about the tasks
ETL execution or operation approach falls into which of these two major categories:
Planning & Execution
Implementation & Testing
Scheduling & Support
Maintenance & Support
One of the requirements while designing an ETL system is how quickly source data can be delivered to end users. This is referred to as:
Data speed
Data lineage
Data latency
Data availability
Extracting the data from the source systems is involved in the _______ step of the ETL process.
extract
transform
load
planning
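The extract, transform, and load steps asked about above can be sketched end to end. A minimal in-memory illustration (the record fields and the name-normalizing transform are invented for the example, not part of any real ETL tool):

```python
def extract(source_rows):
    # Extract: pull raw records out of the source system.
    return list(source_rows)

def transform(rows):
    # Transform: clean and conform records (here: trim and title-case names).
    return [{"name": row["name"].strip().title()} for row in rows]

def load(rows, target):
    # Load: deliver the conformed records to the target store.
    target.extend(rows)
    return target

warehouse = []
load(transform(extract([{"name": "  ada lovelace "}])), warehouse)
```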
Debian
Unix-like
Bare metal
Cross-platform
Which of the following statements is incorrect about Hadoop?