TECHNOLOGY LANDSCAPE
By: Syed Nawaz
Asst. Professor
SREC
CSE
Topics To Learn
What is NoSQL
Where it is Used
Types of NoSQL Databases
Why NoSQL
Advantages of NoSQL
What we miss with NoSQL
Difference between SQL and NoSQL
What is NoSQL
Not only SQL
Non-relational, open-source, distributed databases that
can handle a rich variety of data
(structured, semi-structured, and unstructured).
Features of NoSQL
Non-relational: NoSQL databases store data as
key-value pairs, documents, columns, or graphs
instead of relational tables.
Distributed
Limited or no ACID support: designs are guided by the CAP theorem
(Consistency, Availability, Partition tolerance).
No fixed schema: flexible schema
Types of NoSQL Databases
Key-value
Document
Column
Graph
Key-value
It maintains a big hash table of keys and values.
Ex: Dynamo, Redis, Riak, etc.
Key          Value
First name   Sai
Last name    Kumar
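The "big hash table" idea can be sketched in a few lines of Python, since a dict is itself a hash table. This is a minimal illustration only: the class `KeyValueStore` and its `put`/`get`/`delete` methods are hypothetical and are not the API of Dynamo, Redis, or Riak. The store treats each value as opaque and looks it up only by its key.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store (illustrative sketch)."""

    def __init__(self):
        # A Python dict is a hash table: average O(1) lookup by key.
        self._table = {}

    def put(self, key, value):
        # Insert or overwrite the value stored under key.
        self._table[key] = value

    def get(self, key, default=None):
        # Return the value for key, or default if the key is absent.
        return self._table.get(key, default)

    def delete(self, key):
        # Remove the key if present; missing keys are ignored.
        self._table.pop(key, None)


store = KeyValueStore()
store.put("First name", "Sai")
store.put("Last name", "Kumar")
print(store.get("First name"))
```

Real key-value stores add persistence, expiry, and distribution across nodes on top of this basic put/get-by-key model.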
Document Database
It stores data in collections made up of
documents.
Ex: MongoDB, Apache CouchDB, Couchbase, etc.
Sample document:
{
  "Book name": "BDA",
  "publication": "Wiley India",
  "Year of Publication": "2011"
}
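To show how a collection of documents can be stored and queried, here is a small Python sketch. It assumes a "collection" is just a list of dicts and uses a hypothetical `find` helper that matches documents on field values; MongoDB and CouchDB have their own client APIs, which this does not reproduce. The second document is hypothetical and deliberately has different fields, illustrating the flexible schema.

```python
import json

# The sample book document, written as valid JSON text.
raw = """{
  "Book name": "BDA",
  "publication": "Wiley India",
  "Year of Publication": "2011"
}"""

# A collection modeled as a list of dicts (hypothetical, for illustration).
books = [json.loads(raw)]

# Documents in one collection need not share a schema (hypothetical entry).
books.append({"Book name": "Example", "tags": ["hdfs", "mapreduce"]})


def find(collection, criteria):
    """Return every document whose fields equal all values in criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(find(books, {"publication": "Wiley India"}))
```

Passing criteria as a dict (rather than keyword arguments) allows field names containing spaces, such as "Book name".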
Column
Each storage block has data from only one column.
Ex: Cassandra, HBase, etc.
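The difference between row-oriented and column-oriented layout can be sketched by pivoting a list of row records into one list per column. The records and field names below are hypothetical sample data, and the sketch assumes every record has the same fields. Once data is laid out by column, an aggregate such as an average reads only the one column it needs.

```python
# Row-oriented records (hypothetical sample data).
rows = [
    {"id": 1, "name": "Sai", "marks": 80},
    {"id": 2, "name": "Ravi", "marks": 70},
    {"id": 3, "name": "Anu", "marks": 90},
]


def to_columns(records):
    """Pivot row records into {column_name: [values...]}.

    Assumes all records share the same fields, so the lists stay aligned.
    """
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# The aggregate touches only the "marks" column, not whole rows.
average = sum(columns["marks"]) / len(columns["marks"])
print(average)
```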
Graph
Also called a network database.
A graph database stores data in nodes connected by edges (relationships).
Ex: Neo4j, HyperGraphDB, etc.
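Graph databases answer questions by following relationships from node to node. The sketch below mimics that with a Python adjacency list and a breadth-first search; the node names, the "knows" edges, and the `reachable_within` helper are hypothetical, and this is not how Neo4j or its Cypher query language is invoked.

```python
from collections import deque

# Adjacency list: each node maps to the nodes it has an edge to
# (hypothetical "knows" relationships).
graph = {
    "Sai": ["Ravi", "Anu"],
    "Ravi": ["Kiran"],
    "Anu": ["Kiran"],
    "Kiran": [],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first search returning {node: hop_count} for nodes
    reachable from start in at most max_hops edges (start excluded)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    hops.pop(start)
    return hops


print(reachable_within(graph, "Sai", 2))
```

A "friends of friends" query is then simply `reachable_within(graph, "Sai", 2)`.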
Why NoSQL
Scale-out architecture
Stores a variety of data
Dynamic Schema
Auto-sharding: spreads data across different nodes
Replication: provides availability, fault tolerance, and
recovery
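Auto-sharding and replication can be illustrated together: hash each key to pick a primary node, then copy the data to the next few nodes. This is a simplified sketch under stated assumptions: the node names and helper functions are hypothetical, and plain modulo hashing is used for clarity. Dynamo-style systems use consistent hashing instead, which moves far fewer keys when nodes join or leave.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # hypothetical cluster


def shard_for(key, nodes):
    """Index of the primary node for key.

    md5 is used because Python's built-in hash() of a str is randomized
    per process, while shard assignment must be stable.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(nodes)


def placement(key, nodes, replicas=3):
    """Primary node plus the following nodes (wrapping around) as replicas."""
    replicas = min(replicas, len(nodes))  # cannot place more copies than nodes
    start = shard_for(key, nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


print(placement("user:42", NODES))
```

If the primary node fails, the key can still be served from one of its replicas, which is where the availability and fault tolerance come from.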
Advantages of NoSQL
What we miss with NoSQL
Joins
GROUP BY
ACID properties
SQL
Easy integration with other applications that
support SQL.
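When a store offers limited join and GROUP BY support, that work often moves into application code. The sketch below shows one common approach under stated assumptions: an in-memory hash join followed by a group-and-sum. The `customers` and `orders` data and both helper functions are hypothetical, and everything is held in memory, which an SQL engine would not require.

```python
# Hypothetical data: each order references a customer by id.
customers = [
    {"id": 1, "name": "Sai"},
    {"id": 2, "name": "Ravi"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 250},
    {"order_id": 102, "customer_id": 2, "amount": 100},
    {"order_id": 103, "customer_id": 1, "amount": 50},
]


def hash_join(left, right, left_key, right_key):
    """Inner join: index the left side by key, then probe with the right."""
    index = {}
    for row in left:
        index.setdefault(row[left_key], []).append(row)
    joined = []
    for row in right:
        for match in index.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


def total_by(rows, key, value):
    """Equivalent of SELECT key, SUM(value) ... GROUP BY key."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


result = total_by(hash_join(customers, orders, "id", "customer_id"),
                  "name", "amount")
print(result)
```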
NoSQL in industry
SQL vs NoSQL
Hadoop
It is an open-source framework from the Apache
Software Foundation for storing and processing
huge datasets on clusters of commodity
hardware.
HDFS Architecture
Features of Hadoop
Optimized to handle massive quantities of data.
Shared-nothing architecture
Data replication
High throughput
Complements OLTP and OLAP
NOT good when work cannot be parallelized
NOT good for processing small files
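Data replication and the small-files limitation both follow from how HDFS splits files into blocks. The Python sketch below uses the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and replication factor 3 (`dfs.replication`). The helper names are hypothetical, and the round-robin replica placement is a simplification: real HDFS placement is rack-aware.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, default dfs.blocksize in Hadoop 2.x+
REPLICATION = 3                 # default dfs.replication


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the blocks a file of file_size bytes occupies.

    The last block holds only the remaining bytes; it is not padded.
    """
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Simplified round-robin placement of each block's replicas."""
    return [[datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(num_blocks)]


MB = 1024 * 1024
one_big_file = len(split_into_blocks(1000 * MB))               # 8 blocks
many_small = sum(len(split_into_blocks(MB)) for _ in range(1000))  # 1000 blocks
print(one_big_file, many_small)
print(place_replicas(2, ["dn1", "dn2", "dn3", "dn4"]))
```

The same 1000 MB of data costs 8 blocks as one file but 1000 blocks as 1000 small files, and the NameNode keeps metadata for every block, which is one reason small files are a poor fit.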
Key advantages of Hadoop
Stores data in native format
Scalable
Cost-effective
Resilient to failure
Flexible
Fast
Versions of Hadoop
Hadoop Ecosystem
HDFS: Hadoop Distributed File System; stores data files across the cluster.
HBase: Hadoop's database. It supports structured data storage for large
tables.
Hive: data warehouse layer whose query language, HiveQL, is similar to ANSI SQL.
Pig: data flow language (Pig Latin). Pig scripts are automatically converted into
MapReduce programs by the Pig interpreter.
Zookeeper: coordination service for distributed applications.
Oozie: workflow scheduler to manage Hadoop jobs.
Mahout: scalable machine learning and data mining library.
Chukwa: data collection system for managing large distributed systems.
Sqoop: data transfer between RDBMS and Hadoop.
Ambari: web-based tool for provisioning, managing, and monitoring Hadoop
clusters.
Anatomy of a File Read in Hadoop
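In an HDFS read, the client first asks the NameNode for the file's block locations, then reads each block directly from a DataNode holding a replica, falling back to another replica if one is unavailable. The sketch below simulates that flow in memory. The `namenode` and `datanodes` dicts and the `read_file` function are hypothetical stand-ins, not the Hadoop client API, and real HDFS also orders replicas by network proximity to the client.

```python
# Hypothetical NameNode metadata: path -> [(block_id, [replica DataNodes])].
namenode = {
    "/sample/test.txt": [("blk_1", ["dn1", "dn2", "dn3"]),
                         ("blk_2", ["dn2", "dn3", "dn1"])],
}

# Hypothetical DataNode storage: node -> {block_id: bytes}.
datanodes = {
    "dn1": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_1": b"hello ", "blk_2": b"world"},
}


def read_file(path, namenode, datanodes, failed=()):
    """Get block locations from the NameNode, then read every block, in
    order, from the first listed DataNode that is alive and holds it."""
    data = b""
    for block_id, locations in namenode[path]:
        for node in locations:
            if node not in failed and block_id in datanodes.get(node, {}):
                data += datanodes[node][block_id]
                break
        else:
            raise OSError(f"no live replica for {block_id}")
    return data


print(read_file("/sample/test.txt", namenode, datanodes, failed={"dn1"}))
```

File data never passes through the NameNode; it only supplies metadata.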
Anatomy of a File Write in Hadoop
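In an HDFS write, the NameNode chooses a set of DataNodes for each block, and the client streams the data down a pipeline: the first DataNode stores it and forwards it to the second, which forwards it to the third, while acknowledgements travel back upstream. The recursive sketch below models only that ordering. The `write_block` function and the `storage` dict are hypothetical, and real HDFS sends each block as a stream of packets rather than in one piece.

```python
def write_block(block_id, data, pipeline, storage):
    """Push one block through a DataNode pipeline.

    Each node stores the block and forwards it downstream; a node
    acknowledges only after every node after it has acknowledged, so
    acks come back in reverse pipeline order.
    """
    acks = []

    def forward(index):
        if index == len(pipeline):
            return
        node = pipeline[index]
        storage.setdefault(node, {})[block_id] = data  # store locally
        forward(index + 1)                             # pass downstream
        acks.append(node)                              # ack upstream

    forward(0)
    return acks


storage = {}
acks = write_block("blk_1", b"hello", ["dn1", "dn2", "dn3"], storage)
print(acks)
```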
Working with HDFS commands
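hadoop fs -ls /                # list the HDFS root directory
hadoop fs -ls -R /             # list recursively
hadoop fs -mkdir /sample       # create a directory
hadoop fs -put /root/sample/test.txt /sample/test.txt    # copy a local file into HDFS
hadoop fs -get /sample/test.txt /root/sample/testsample.txt    # copy an HDFS file to the local file system
hadoop fs -copyFromLocal /root/sample/test.txt /sample/testsample.txt    # like -put, but the source must be local
hadoop fs -cat /sample/test.txt    # print the file contents
hadoop fs -cp /sample/test.txt /sample1    # copy within HDFS
hadoop fs -rm -r /sample1      # delete recursively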
Compile the driver code, mapper code, and reducer code successfully.
Export them as a JAR file:
WordCount.jar
Run the job (the output directory must not already exist):
$ hadoop jar WordCount.jar /packdecmo/WordCount /sample/test.txt /sample/wordcountoutput
WordCount output
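The logic that the driver, mapper, and reducer implement can be simulated in plain Python to show what ends up in the output directory. This is an in-memory sketch, not the Hadoop Java API: `mapper`, `reducer`, and `run_wordcount` are hypothetical names, and the shuffle step is modeled with a dictionary. It assumes words are split on whitespace and keys are sorted before reduction, and it prints one `word<TAB>count` line per word.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit (word, 1) for each whitespace-separated word."""
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_wordcount(lines):
    # Shuffle and sort: group every mapper output value by its key.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    # Reduce each key in sorted key order.
    return dict(reducer(word, grouped[word]) for word in sorted(grouped))


result = run_wordcount(["hello hadoop", "hello hdfs"])
for word, count in result.items():
    print(f"{word}\t{count}")
```

In a real job the map and reduce calls run in parallel on different nodes, and the framework performs the grouping between the two phases.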