
About Hadoop

Hadoop is a software package that provides a framework for distributed storage and distributed
processing. For distributed storage it supplies its own file system, HDFS (Hadoop Distributed File
System); for distributed processing it supplies a processing engine called MapReduce.

Both HDFS and MapReduce are designed to run on distributed systems, that is, clusters of machines,
where their real strength shows. It is also important to understand which kinds of problems fit this
framework of HDFS and MapReduce. Because both storage and processing are distributed, performance
improves drastically: tasks in such an environment can be carried out in parallel across the cluster.

Hadoop can be deployed on a single machine as well as on a cluster of machines. Hadoop on a single
machine is called the pseudo-distributed mode of installation; it is not meant for production use, but
it is well suited to beginners for learning and understanding the Hadoop environment through simple
examples such as the one sketched below.
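
As a minimal sketch of such a learning exercise, the Java program below writes a small file into HDFS and reads it back through the org.apache.hadoop.fs.FileSystem API. It assumes a pseudo-distributed setup whose NameNode listens on hdfs://localhost:9000 (a value commonly set for fs.defaultFS in such installs); the path /user/demo/hello.txt is purely illustrative.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: pseudo-distributed setup with the NameNode on localhost:9000.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/demo/hello.txt"); // hypothetical example path

        // Write a small file into the distributed file system (overwrite if present).
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back to confirm the round trip.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }

        fs.close();
    }
}
```

Run against a single machine with the HDFS daemons started, this prints the line back, confirming that the file really passed through the (pseudo-)distributed file system rather than the local disk.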

In a cluster deployment, HDFS is scaled across all the machines that are part of the cluster and forms
a single logical file system in which files are stored in a distributed manner, with a degree of
redundancy (block replication) to provide a fault-tolerant environment. MapReduce follows the same
concept, except that what is distributed is processing rather than storage. A MapReduce job is in turn
divided into two functions, map and reduce: map performs the main computation on the distributed input
and produces partial results, and reduce aggregates those partial results into a single result.
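
To make the map/reduce division concrete, the sketch below is the classic word-count job, essentially the introductory example from the Hadoop MapReduce tutorial: map emits a partial count of 1 for every word in its input split, and reduce sums those partial counts into one total per word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: runs in parallel over splits of the input, emitting
  // a partial result (word, 1) for every word seen.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: aggregates the partial counts for each word into a single total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this is packaged into a jar and submitted with, for example, hadoop jar wordcount.jar WordCount <input dir> <output dir>, where both directories live in HDFS; reusing the reducer as a combiner lets partial results be aggregated on the map side before being shuffled across the network.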

Topic Titles

1. Flexible stock data analytics using big data and machine learning techniques
2. Machine learning techniques for efficient stock analytics in collaboration with big data
techniques for better performance and flexibility
3. Application of machine learning and big data techniques to stock data analytics
