Submitted To: Mr. Aravendra Sharma
Submitted By: Dileep Singh, IMS (1900360149021)
Pros and cons of Master-Slave Replication
Pros:
More read requests:
Add more slave nodes
Ensure that all read requests are routed to the slaves
Should the master fail, the slaves can still handle read requests
Good for read-intensive datasets
Cons:
Write throughput is limited by the master's ability to process updates and to pass those updates on to the slaves
If the master fails, no writes can be handled until:
the master is restored or a new master is appointed
Inconsistency due to slow propagation of changes to the slaves
Bad for datasets with heavy write traffic.
Peer-to-Peer Replication
Pros:
You can ride over node failures without losing access to data
You can easily add nodes to improve performance
Cons:
Inconsistency!
Slow propagation of changes to copies on different nodes
Inconsistencies on read lead to problems but are relatively transient
Two people can update different copies of the same record stored on different nodes
at the same time - a write-write conflict.
Sharding & Replication
Version stamps help you detect concurrency conflicts. When you read data, then
update it, you can check the version stamp to ensure nobody updated the data
between your read and write.
With distributed systems, a vector of version stamps allows you to detect when
different nodes have conflicting updates.
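The vector-of-version-stamps idea can be sketched as follows. This is a minimal, illustrative vector clock in Python; the function names (`increment`, `compare`) and the node labels are my own, not from any particular database.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped (one update on that node)."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def compare(a, b):
    """Order two vector clocks: 'before', 'after', 'equal', or 'conflict'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "conflict"  # neither clock dominates: concurrent updates

# Two nodes update different copies of the same record at the same time:
base = {"node1": 1}
update_a = increment(base, "node1")  # write on node1
update_b = increment(base, "node2")  # concurrent write on node2
result = compare(update_a, update_b)  # detected as a write-write conflict
```

Comparing the two stamps shows neither update happened strictly before the other, which is exactly the write-write conflict described above.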
Map Reduce
The map task reads data from an aggregate and boils it down to relevant key-
value pairs. Maps only read a single record at a time and can thus be parallelized
and run on the node that stores the record.
Reduce tasks take many values for a single key output from map tasks and
summarize them into a single output. Each reducer operates on the result of a
single key, so it can be parallelized by key.
Reducers that have the same form for input and output can be combined into
pipelines. This improves parallelism and reduces the amount of data to be
transferred.
Map-reduce operations can be composed into pipelines where the output of one
reduce is the input to another operation's map.
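The map and reduce stages above can be sketched as a toy word count. This is a single-process Python illustration of the data flow (map, shuffle by key, reduce), not a real distributed framework; the sample records are made up.

```python
from collections import defaultdict

def map_task(record):
    # Read one record at a time and boil it down to key-value pairs.
    for word in record.split():
        yield word.lower(), 1

def reduce_task(key, values):
    # Summarize the many values for a single key into one output.
    return key, sum(values)

def map_reduce(records):
    # Shuffle: group intermediate pairs by key, then reduce each key independently.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_task(record):
            groups[key].append(value)
    return dict(reduce_task(k, vs) for k, vs in groups.items())

records = ["the map task reads data", "the reduce task sums data"]
result = map_reduce(records)  # e.g. result["the"] == 2
```

Because `map_task` touches one record at a time and `reduce_task` touches one key at a time, both stages parallelize naturally, as the text notes.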
Partitioning & Combining
Partitioner:
A partitioner partitions the key-value pairs of intermediate Map-outputs. It partitions the data
using a user-defined condition, which works like a hash function.
The total number of partitions is the same as the number of Reducer tasks for the job.
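A hash-style partitioner can be sketched like this. It is an illustrative Python analogue of the idea (similar in spirit to Hadoop's default hash partitioning, though the function and data here are my own); a deterministic hash is used so the same key always routes to the same partition.

```python
import zlib

def partition(key, num_reducers):
    # Deterministic hash of the key, modulo the number of reduce tasks,
    # so the number of partitions equals the number of reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Route intermediate map output to reducers:
pairs = [("apple", 3), ("banana", 1), ("apple", 2), ("cherry", 5)]
num_reducers = 2
partitions = {r: [] for r in range(num_reducers)}
for key, value in pairs:
    partitions[partition(key, num_reducers)].append((key, value))
# All pairs for "apple" land in the same partition, so one reducer sees them all.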
Combiner:
The Combiner class is used in between the Map class and the Reduce class to reduce the
volume of data transfer between Map and Reduce.
Usually the output of the map task is large, so the volume of data transferred to the reduce task is high; the combiner pre-aggregates it on the map side first.
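The combiner's effect can be sketched as follows: a local pre-aggregation of map output before the shuffle. This is an illustrative Python sketch with made-up data, using the same "sum values per key" shape a reducer would have.

```python
from collections import defaultdict

def combine(pairs):
    # Locally sum values per key on the map node, shrinking what gets transferred.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

map_output = [("the", 1), ("the", 1), ("data", 1), ("the", 1)]
combined = combine(map_output)
# Four intermediate pairs shrink to two before transfer to the reducers.
```

Note the combiner must be associative and commutative (like summation) for this local pre-aggregation to leave the final reduce result unchanged.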
Thank you