/  24
 
Map/Reduce, Hadoop & Pig
Data mining applied on the enterprise
 
Definitions
Data mining
is the process of extractingpatterns from data. Commonly used in awide range of profiling practices, such asmarketing, surveillance, fraud detectionand scientific discovery.
A
Framework 
is a re-usable design for asoftware system (or subsystem). A softwareframework may include support programs,code libraries, a scripting language, orother software to help develop and
gluetogether 
the different components of asoftware project. Various parts of theframework may be exposed through an API.
 
Map/Reduce
Framework for processing huge datasets on certainkinds of distributable problems using a largenumber of computers.
MapReduce provides
Automatic parallelization & distribution
Fault tolerance
I/O scheduling
Status monitoring
Use cases
Document clustering
Machine learning
Inverted index construction
Was used to completely regenerate Google's index of the World Wide Web

Share & Embed

More from this user

Add a Comment

Characters: ...