
Basic Hadoop

Big Data

Educate, Explore, Engage, Execute

For transactions: querying and reporting
For log data: data mining
Social media optimization

For example, an Airbus A380 flight from London to Singapore generates 1 PB of data.

1. What is big data?


A very large data set
Does not fit in (or cannot be processed efficiently by) a traditional database
The volume of data is huge
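2. What are the four Vs of big data? (Volume, Velocity and Variety are the original three Vs; Veracity is the fourth, added by IBM.)
Volume: the amount of data, e.g. petabytes (stock data, health data)
Velocity: the speed at which data is generated and must be processed
Variety: the different forms of data (structured, semi-structured, unstructured)
Veracity: the uncertainty or trustworthiness of the data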
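3. Difference between structured, unstructured and semi-structured data?
Structured data: a fixed schema of rows and columns (traditional databases)
Unstructured data: no predefined schema (e.g. posts from social networking sites)
Semi-structured data: self-describing tags but no rigid schema (e.g. XML data)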
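To make the three categories concrete, here is a small Python sketch that reads the same kind of user data in each form. The sample records, names and the hashtag regex are made up for illustration; only the standard library (csv, xml.etree.ElementTree, re) is used.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row follows the same schema (like a table).
structured = "user_id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing tags, but fields can differ per record.
semi_structured = """<users>
  <user id="1"><name>Asha</name><city>Pune</city></user>
  <user id="2"><name>Ravi</name></user>
</users>"""
root = ET.fromstring(semi_structured)
users = [
    {"user_id": u.get("id"), "name": u.findtext("name"), "city": u.findtext("city")}
    for u in root.findall("user")
]

# Unstructured: free text (e.g. a social media post) with no schema;
# fields have to be pulled out with ad hoc rules such as a regex.
post = "Loving the weather in #Pune today! @Ravi are you coming?"
hashtags = re.findall(r"#(\w+)", post)

print(rows[0]["city"])   # Pune
print(users[1]["city"])  # None (the second <user> has no <city> tag)
print(hashtags)          # ['Pune']
```

The structured rows all have a city column; the XML simply omits the missing field; the free text needs a custom rule to extract anything at all.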
4. Main components of Hadoop?
Hadoop Common: shared libraries, including file system and networking utilities
HDFS: storage layer
MapReduce: processing layer (a job processes the data set as key-value pairs)
YARN: resource management layer

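HDFS Questions
Block size: 64 MB (Hadoop 1.x default) or 128 MB (Hadoop 2.x default)
Cloudera (another Hadoop distribution) now uses 128 MB
The block size is configurable (the dfs.blocksize property), so each organization can choose its own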
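A worked example of how a file is split into blocks. This is a plain Python sketch of the arithmetic only (split_into_blocks is a hypothetical name, not an HDFS API); it assumes the usual HDFS behavior that every block is full except possibly the last one, which only uses as much space as the data it holds.

```python
MB = 1024 * 1024

def split_into_blocks(file_size_bytes, block_size_bytes=128 * MB):
    """Return the sizes of the blocks a file of the given size occupies.

    Every block is full except possibly the last one; a partially filled
    last block is not padded up to the block size.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes

# A 300 MB file with 128 MB blocks: 2 full blocks + one 44 MB block.
print([s // MB for s in split_into_blocks(300 * MB)])           # [128, 128, 44]
# The same file with the older 64 MB block size needs 5 blocks.
print([s // MB for s in split_into_blocks(300 * MB, 64 * MB)])  # [64, 64, 64, 64, 44]
```

Larger blocks mean fewer blocks per file, and therefore less metadata for the NameNode to keep in memory.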

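1. Difference between the NameNode, Backup node and Checkpoint node?
NameNode: the master; it stores the metadata (directory tree, file structure and which blocks make up each file). Clients contact the NameNode first. The metadata is persisted in two files: fsimage (a snapshot of the namespace) and the edit log (changes made since that snapshot).
Checkpoint node: periodically downloads the fsimage and edit log from the NameNode, merges them into a new, up-to-date fsimage and uploads it back, so the latest fsimage is available to restore from.
Backup node: provides the same checkpointing as the Checkpoint node, but also keeps an in-memory copy of the namespace that is always up to date with the NameNode, because it receives the stream of edits as they happen.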
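The checkpoint idea can be sketched as replaying the edit log on top of the last fsimage to produce a new fsimage. This is a toy model under stated assumptions: the namespace is a Python dict from file path to replication factor, and the edit log is a list of (op, path, value) tuples. The real fsimage and edits files are binary formats with many more operation types, and the function names here are hypothetical.

```python
# Hypothetical toy model: namespace = {path: replication_factor}.
def apply_edit(namespace, edit):
    """Apply one edit-log record to the namespace in place."""
    op, path, value = edit
    if op == "create":
        namespace[path] = value
    elif op == "set_replication":
        if path not in namespace:
            raise KeyError(f"no such file: {path}")
        namespace[path] = value
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")

def checkpoint(fsimage, edit_log):
    """Merge an fsimage with the edits logged since it was written.

    Returns the new fsimage and an empty edit log: after the checkpoint,
    a restart only needs to replay edits made from this point on.
    """
    new_image = dict(fsimage)  # copy, so the old image is left untouched
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []

old_image = {"/logs/a.log": 3}
edits = [
    ("create", "/logs/b.log", 3),
    ("set_replication", "/logs/a.log", 2),
    ("delete", "/logs/b.log", None),
]
new_image, new_edits = checkpoint(old_image, edits)
print(new_image)  # {'/logs/a.log': 2}
```

Merging like this keeps the edit log short, so a restarting NameNode has fewer edits to replay.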

2. Commodity hardware?
Inexpensive, standard servers; it is easy to add another machine to the cluster. The master (NameNode) should not be commodity hardware: it should be a more reliable, more expensive machine. DataNodes can be commodity hardware.
3. Rack awareness?
The NameNode knows the physical location (rack) of each DataNode and uses it to place block replicas across racks and to serve reads from the nearest DataNode that holds the block.
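The default HDFS placement policy for a replication factor of 3, as described in the HDFS architecture documentation, puts the first replica on the writer's node (when the writer runs on a DataNode), the second on a node in a different rack, and the third on a different node in that same remote rack. Below is a simplified Python sketch, assuming the writer is a DataNode and the topology is a dict from hypothetical node names to rack names. The real NameNode picks randomly among candidates and also checks load and free space; this sketch takes the first candidate in sorted order so the output is predictable.

```python
def place_replicas(topology, writer):
    """Choose 3 DataNodes for a new block (simplified rack-aware policy).

    topology: {datanode_name: rack_name}; writer must be one of the nodes.
    1st replica: the writer's node.
    2nd replica: a node on a different rack.
    3rd replica: another node on that same remote rack.
    """
    local_rack = topology[writer]
    first = writer
    remote_nodes = sorted(n for n, r in topology.items() if r != local_rack)
    if not remote_nodes:
        raise ValueError("need at least two racks")
    second = remote_nodes[0]
    remote_rack = topology[second]
    same_remote_rack = [
        n for n in remote_nodes if topology[n] == remote_rack and n != second
    ]
    if not same_remote_rack:
        raise ValueError("remote rack needs at least two nodes")
    third = same_remote_rack[0]
    return [first, second, third]

topology = {
    "dn1": "rack1", "dn2": "rack1",
    "dn3": "rack2", "dn4": "rack2",
}
print(place_replicas(topology, "dn1"))  # ['dn1', 'dn3', 'dn4']
```

With two racks, losing either rack still leaves at least one copy of the block, while only one hop of the write pipeline (dn1 to dn3) crosses between racks.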

MapReduce
The 3 primary phases between map and reduce:
Shuffle, sorting and partitioning
Shuffle: transfers the intermediate (map output) data to the reducers
Sorting: the intermediate key-value pairs are sorted by key before they reach the reducer (a custom comparator can change the order, e.g. to also sort by value)
Partitioning: decides which reducer receives each intermediate key, based on the key (by default a hash of the key); the number of partitions equals the number of reducers
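The three phases can be simulated in memory with a word count. This is a single-process Python sketch, not Hadoop code: zlib.crc32 is a deterministic stand-in for the hash used by Hadoop's default HashPartitioner (which uses the key's hashCode()), and the function names are hypothetical.

```python
import zlib
from collections import defaultdict

def map_phase(lines):
    """Map: emit (word, 1) for every word, as in the classic word count."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def partition(key, num_reducers):
    """Partitioning: pick a reducer from the key only, so every pair with
    the same key lands on the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle_and_sort(pairs, num_reducers):
    """Shuffle: send each pair to the reducer for its partition.
    Sort: within each reducer, group values by key in sorted key order."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]

def reduce_phase(sorted_groups):
    """Reduce: sum the counts for each key."""
    return {key: sum(values) for key, values in sorted_groups}

lines = ["big data is big", "hadoop stores big data"]
num_reducers = 2  # one partition per reducer
reducer_inputs = shuffle_and_sort(map_phase(lines), num_reducers)
results = [reduce_phase(groups) for groups in reducer_inputs]
counts = {k: v for part in results for k, v in part.items()}
print(counts["big"], counts["data"])  # 3 2
```

Because the partition depends only on the key, all the ("big", 1) pairs reach the same reducer, so no reducer needs data from another.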

1. Can reducers communicate with each other?
No. Reducers run in isolation from each other.
2. What is a task instance?
The actual map or reduce task that runs on each slave node.

HBase
1. What is a row key?
The row key uniquely identifies a row in an HBase table; rows are stored sorted by row key.
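HBase row keys are compared as raw bytes, so rows come back in lexicographic order, not numeric order. A Python sketch of the effect using bytes objects (Python compares bytes lexicographically, byte by byte, which matches an unsigned byte comparison); padded_row_key is a hypothetical helper illustrating the common zero-padding workaround, not part of the HBase API.

```python
# Byte-wise (lexicographic) ordering, as used for row keys.
keys = [b"row1", b"row2", b"row10", b"row100"]
print(sorted(keys))  # [b'row1', b'row10', b'row100', b'row2']

# Hypothetical helper: zero-pad the numeric part to a fixed width so
# byte order matches numeric order.
def padded_row_key(prefix, number, width=5):
    return f"{prefix}{number:0{width}d}".encode("utf-8")

padded = [padded_row_key("row", n) for n in (1, 2, 10, 100)]
print(sorted(padded))  # [b'row00001', b'row00002', b'row00010', b'row00100']
```

Row key design matters because scans read contiguous ranges of keys in this byte order.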

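2. Difference between the RDBMS data model and HBase?
RDBMS | HBase
Relational | Non-relational
Not designed for automatic partitioning | Supports automatic partitioning (tables are split into regions)
Structured data | All kinds of data (structured, semi-structured, unstructured)
SQL | NoSQL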
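HBase's data model is often described as a sparse, sorted, multidimensional map: row key, then column family, then column qualifier, then timestamp, then value. Below is a toy Python model of that shape, to contrast with an RDBMS row where every column exists (possibly as NULL). The table contents and the put/get_latest helpers are hypothetical and are not the HBase client API; this toy keeps every version of a cell, whereas real HBase keeps up to a configurable maximum per column family.

```python
# Toy model: row key -> column family -> qualifier -> {timestamp: value}.
table = {
    b"user00001": {
        "info": {"name": {1700000000: b"Asha"}, "city": {1700000000: b"Pune"}},
    },
    b"user00002": {
        # Sparse: this row simply has no "city" cell; no NULL is stored.
        "info": {"name": {1700000005: b"Ravi"}},
    },
}

def put(table, row, family, qualifier, timestamp, value):
    """Add a new version of a cell; in this toy, older versions are kept."""
    cells = table.setdefault(row, {}).setdefault(family, {})
    cells.setdefault(qualifier, {})[timestamp] = value

def get_latest(table, row, family, qualifier):
    """Return the newest version of a cell, or None if the cell does not exist."""
    versions = table.get(row, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    return versions[max(versions)]

put(table, b"user00001", "info", "city", 1700000100, b"Mumbai")
print(get_latest(table, b"user00001", "info", "city"))  # b'Mumbai'
print(get_latest(table, b"user00002", "info", "city"))  # None
print(sorted(table))  # [b'user00001', b'user00002']
```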

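3. Difference between HBase and Hive?
Hive: a data warehouse layer that keeps table metadata (in its metastore) and is queried with HQL (HiveQL), traditionally executed as MapReduce jobs.
HBase: a NoSQL database for real-time reads and writes; its queries do not run as MapReduce jobs.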
4. Row deletion in HBase?