
THE BIG DATA TECHNOLOGY LANDSCAPE
By: Syed Nawaz
Asst. Professor
SREC
CSE
Topics To Learn
 What is NoSQL
 Where it is used
 Types of NoSQL databases
 Why NoSQL
 Advantages of NoSQL
 What we miss with NoSQL
 Difference between SQL and NoSQL
What is NoSQL
 NoSQL stands for "Not only SQL".
 NoSQL databases are non-relational, mostly open-source, distributed databases that can handle a rich variety of data (structured, semi-structured and unstructured).
Features of NoSQL
 Non-relational: data is stored as key-value pairs, documents, columns or graphs rather than in relational tables.
 Distributed: data is spread across the nodes of a cluster.
 No full ACID support: most NoSQL databases relax ACID guarantees and make the trade-offs described by the CAP theorem.
 No fixed schema: the schema is flexible.
Types of NoSQL Databases
 Key-value
 Document
 Column
 Graph
Key-value
 It maintains a big hash table of keys and values.
 Ex: Dynamo, Redis, Riak, etc.

Key          Value
First name   Sai
Last name    Kumar
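The "big hash table" idea can be sketched in a few lines of Python. This is only an in-memory illustration built on a dict; the class name KeyValueStore and its put/get/delete methods are hypothetical and do not correspond to the API of Dynamo, Redis or Riak.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Insert a new key or overwrite the value of an existing one.
        self._table[key] = value

    def get(self, key, default=None):
        # Look the value up by key; return `default` if the key is absent.
        return self._table.get(key, default)

    def delete(self, key):
        # Remove the key if present; do nothing otherwise.
        self._table.pop(key, None)


store = KeyValueStore()
store.put("First name", "Sai")
store.put("Last name", "Kumar")
print(store.get("First name"))
```

The store only ever looks data up by key, which is why this model scales out easily but offers no queries on the values themselves.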
Document Database
 It maintains data in collections made up of documents.
 Ex: MongoDB, Apache CouchDB, Couchbase, etc.
 Sample document:
{
  "Book name": "BDA",
  "publication": "Wiley India",
  "Year of Publication": "2011"
}
Column
 Data is stored by column rather than by row: each storage block holds data from only one column.
 Ex: Cassandra, HBase, etc.
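To see what "one column per storage block" means, the sketch below converts a row-oriented layout into a column-oriented one. The sample records and the helper to_columns are hypothetical; real column stores such as Cassandra or HBase organize data on disk in their own, more elaborate ways.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Sai", "dept": "CSE"},
    {"id": 2, "name": "Ravi", "dept": "ECE"},
    {"id": 3, "name": "Anu", "dept": "CSE"},
]


def to_columns(records):
    # Column-oriented layout: one list ("block") per column.
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs only one column reads only that block.
cse_count = columns["dept"].count("CSE")
print(cse_count)
```

Counting the students per department touches the "dept" block alone, which is the main benefit of the layout for analytical queries.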
Graph
 Also called a network database.
 A graph database stores data as nodes, with edges representing the relationships between them.
 Ex: Neo4j, HyperGraphDB, etc.
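A minimal way to picture nodes and relationships is a dict of nodes plus a list of labeled edges. Everything in this sketch (node ids, relationship names, the outgoing and incoming helpers) is hypothetical and unrelated to the Neo4j or HyperGraphDB APIs.

```python
# Nodes carry properties; edges are (source, relationship, target) triples.
nodes = {
    "sai": {"label": "Student"},
    "prof": {"label": "Professor"},
    "bda": {"label": "Course"},
}
edges = [
    ("sai", "ENROLLED_IN", "bda"),
    ("prof", "TEACHES", "bda"),
]


def outgoing(node_id, relation=None):
    # Targets reachable from node_id, optionally filtered by relationship.
    return [dst for src, rel, dst in edges
            if src == node_id and (relation is None or rel == relation)]


def incoming(node_id, relation=None):
    # Sources that point at node_id, optionally filtered by relationship.
    return [src for src, rel, dst in edges
            if dst == node_id and (relation is None or rel == relation)]


print(outgoing("sai"))             # courses the student is enrolled in
print(incoming("bda", "TEACHES"))  # who teaches the course
```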
Why NoSQL
 Scale-out architecture
 Stores a variety of data
 Dynamic schema
 Auto-sharding: data is spread automatically across different nodes
 Replication: provides availability, fault tolerance and recovery
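Sharding and replication can be illustrated with simple modulo hashing. This is a deliberately simplified sketch and not the placement scheme of any particular database; the node names, the replication factor of 2 and the functions shard_for and replicas_for are assumptions made for the example.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical cluster
REPLICATION_FACTOR = 2                            # assumed number of copies


def shard_for(key, num_nodes=len(NODES)):
    # Hash the key and map it to a node index (stable across runs).
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def replicas_for(key, nodes=NODES, replication_factor=REPLICATION_FACTOR):
    # The primary copy goes to the hashed node; extra copies go to the
    # next nodes in order, so losing one node does not lose the data.
    primary = shard_for(key, len(nodes))
    return [nodes[(primary + i) % len(nodes)]
            for i in range(replication_factor)]


for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", replicas_for(key))
```

Because the placement depends only on the key, any client can compute where a record lives without consulting a central directory.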
Advantages of NoSQL
What we miss with NoSQL
 Joins
 GroupBy
 ACID properties
 SQL
 Easy integration with other applications that
support SQL.
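Without joins and GROUP BY in the database, the application usually has to do this work itself. The sketch below shows one way this might look in Python; the sample records and field names are hypothetical.

```python
from collections import defaultdict

# Two hypothetical collections that SQL would combine with a JOIN.
students = [
    {"name": "Sai", "dept_id": 1},
    {"name": "Ravi", "dept_id": 2},
    {"name": "Anu", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dept_name": "CSE"},
    {"dept_id": 2, "dept_name": "ECE"},
]

# Application-side join: index one side by key, then look up each record.
dept_by_id = {d["dept_id"]: d for d in departments}
joined = [
    {"name": s["name"], "dept_name": dept_by_id[s["dept_id"]]["dept_name"]}
    for s in students
]

# Application-side GROUP BY with COUNT(*).
students_per_dept = defaultdict(int)
for row in joined:
    students_per_dept[row["dept_name"]] += 1

print(joined)
print(dict(students_per_dept))
```

In SQL both steps would be a single query; here the logic, and its cost, moves into the application code.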
NoSQL in industry
SQL vs NoSQL
Hadoop
 Hadoop is an open-source framework from the Apache Software Foundation for storing and processing huge datasets on a cluster of commodity hardware.
HDFS Architecture
Features of Hadoop
 Optimized to handle massive quantities of data
 Shared-nothing architecture
 Data replication
 High throughput
 Complements OLTP and OLAP
 NOT good when the work cannot be parallelized
 NOT good for processing small files
Key advantages of Hadoop
 Stores data in native format
 Scalable
 Cost-effective
 Resilient to failure
 Flexible
 Fast
Versions of Hadoop
Hadoop Ecosystem
 HDFS: Hadoop's distributed file system; it stores the data files.
 HBase: Hadoop's database. It supports structured data storage for large tables.
 Hive: data warehouse layer queried with HiveQL, a language similar to ANSI SQL.
 Pig: a data flow language. Pig scripts are automatically converted to MapReduce programs by the Pig interpreter.
 ZooKeeper: coordination service for distributed applications.
 Oozie: workflow scheduler to manage Hadoop jobs.
 Mahout: scalable machine learning and data mining library.
 Chukwa: data collection system for managing large distributed systems.
 Sqoop: data transfer between RDBMSs and Hadoop.
 Ambari: web-based tool for provisioning, managing and monitoring Hadoop clusters.
Anatomy of a file read in Hadoop
Anatomy of a file write in Hadoop
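Working with HDFS commands
 hadoop fs -ls /                                                          # list the HDFS root directory
 hadoop fs -ls -R /                                                       # list the root directory recursively
 hadoop fs -mkdir /sample                                                 # create a directory in HDFS
 hadoop fs -put /root/sample/test.txt /sample/test.txt                    # copy a local file into HDFS
 hadoop fs -get /sample/test.txt /root/sample/testsample.txt              # copy a file from HDFS to the local disk
 hadoop fs -copyFromLocal /root/sample/test.txt /sample/testsample.txt    # like -put, local source only
 hadoop fs -cat /sample/test.txt                                          # print the contents of an HDFS file
 hadoop fs -cp /sample/test.txt /sample1                                  # copy within HDFS
 hadoop fs -rm -r /sample1                                                # delete recursively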
 Compile the driver, mapper and reducer code successfully.
 Export the classes as a jar file: WordCount.jar
 Run the job:
$ hadoop jar WordCount.jar /packdecmo/WordCount /sample/test.txt /sample/wordcountoutput
 The output directory wordcountoutput contains the files part-r-00000 and _SUCCESS.
 View the result:
hadoop fs -cat /sample/wordcountoutput/part-r-00000
Hi,2
Cse,1
Student,1
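The logic that such a WordCount job performs can be simulated locally in plain Python, without Hadoop. This is a sketch of the map, shuffle and reduce steps only, not the Java driver, mapper and reducer from the jar; the two input lines are an assumed stand-in for test.txt, whose actual contents are not shown.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the line.
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key (the word).
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    # Reducer: sum the counts collected for one word.
    return word, sum(counts)


# Assumed input; the real test.txt is not shown in the slides.
lines = ["Hi Cse", "Hi Student"]

mapped = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(word, counts)
              for word, counts in shuffle(mapped).items())

for word, total in result.items():
    print(f"{word},{total}")
```

On a real cluster the mappers and reducers run in parallel on different nodes, and the framework performs the shuffle between them.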
