
What Is Apache Tez?

● An application framework
● Built on top of Apache Hadoop YARN
● Uses directed acyclic graphs (DAGs)
● Open source / Apache 2.0 license
● Scalable
● Performant
Hadoop Eco Sphere
Tez DAG

● Tez directed acyclic graphs (DAGs)


● Distributed data processing
● Vertices represent data transformation
● Edges represent data movement
● For data processing applications
● Tez is an execution engine
● Built on top of YARN
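
As a rough illustration of the model above, the following Python sketch treats each vertex as a function and each edge as a (producer, consumer) pair, then runs the vertices in dependency order. It is a toy in-memory model and not the Tez API: the names `vertices`, `edges` and `run_dag` are invented for this example, and the word-count data is made up.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical vertices: each one transforms the list of outputs of its producers.
vertices = {
    "tokenize": lambda inputs: "to be or not to be".split(),
    "count": lambda inputs: {w: inputs[0].count(w) for w in set(inputs[0])},
    "top": lambda inputs: max(inputs[0].items(), key=lambda kv: (kv[1], kv[0])),
}

# Hypothetical edges: data moves from producer to consumer.
edges = [("tokenize", "count"), ("count", "top")]


def run_dag(vertices, edges):
    """Run every vertex once, after all of its producers have finished."""
    producers = {name: [] for name in vertices}
    for src, dst in edges:
        producers[dst].append(src)
    results = {}
    # static_order() yields each vertex after its producers
    # and raises CycleError if the graph is not acyclic.
    for name in TopologicalSorter(producers).static_order():
        inputs = [results[p] for p in producers[name]]
        results[name] = vertices[name](inputs)
    return results


results = run_dag(vertices, edges)
```

Here `results["top"]` holds the most frequent word and its count. A real engine would run vertices as parallel tasks on a cluster; this sketch only shows the ordering that the graph imposes.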
Tez Performance

● Performance improvement compared to MapReduce


– No need to write intermediate results to HDFS between MR jobs
– Better execution performance
● Expressive dataflow API for DAG
– Visualise what you wish to construct
– Add processor vertices to graph
– Add data movement edges to graph
– To build the computational DAG that you require
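
The build steps above (add vertices, add edges, end up with a DAG) can be sketched with a small builder class. This is a minimal sketch under stated assumptions: `ToyDag`, `add_vertex` and `add_edge` are hypothetical names chosen for illustration and do not reproduce Tez's own classes. The one rule it enforces is that an edge may never close a cycle.

```python
class ToyDag:
    """Hypothetical builder; a stand-in for a dataflow API, not Tez's own classes."""

    def __init__(self, name):
        self.name = name
        self.vertices = set()
        self.edges = {}  # producer -> set of consumers

    def add_vertex(self, vertex):
        self.vertices.add(vertex)
        self.edges.setdefault(vertex, set())
        return self  # returning self allows chained calls

    def add_edge(self, producer, consumer):
        if producer not in self.vertices or consumer not in self.vertices:
            raise ValueError("both ends of an edge must be added as vertices first")
        # If the producer is already reachable from the consumer,
        # the new edge would close a loop.
        if self._reaches(consumer, producer):
            raise ValueError(f"edge {producer} -> {consumer} would create a cycle")
        self.edges[producer].add(consumer)
        return self

    def _reaches(self, start, target):
        """Depth-first search: is target reachable from start?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return False


# Two scans feeding one join: a shape that a single map/reduce pair cannot express.
dag = (
    ToyDag("join-example")
    .add_vertex("scan_orders")
    .add_vertex("scan_customers")
    .add_vertex("join")
    .add_edge("scan_orders", "join")
    .add_edge("scan_customers", "join")
)
```

Calling `dag.add_edge("join", "scan_orders")` afterwards raises `ValueError`, because that edge would make the graph cyclic.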
Tez Deployment

● Tez is client side


● Install Tez client locally
● Build task DAG
● Load DAG/Tez libraries to HDFS
● Execute YARN based job
– From Tez client
– Using HDFS based DAG library
Tez Existing MR Tasks

● Tez can process existing MapReduce (MR) tasks


● No need for any modification
● Allows for phased migration
– Of existing MR jobs to DAGs
● Allows for near-real-time task types
● Rather than just MR tasks, which are
– Batch-oriented
– Iterative
– Resource-intensive
Tez API

● Tez DAG defines the job


● Vertex defines one DAG job step
– Requires user logic and resources for step
● Edge defines one DAG data movement step
– From producer to consumer
– Edge properties define the movement
    ● How data moves
    ● When the consumer is scheduled relative to the producer
    ● How durable the data is
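
The three edge properties listed above can be pictured as three small enumerations attached to each edge. The sketch below is a toy model, not Tez code: `ToyEdge` and the enum class names are hypothetical, and the member names are modelled on the values of Tez's edge property types for data movement, scheduling and data source as best recalled, so treat the exact spelling as an assumption. The default values are also choices made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class DataMovement(Enum):
    """How data moves from producer tasks to consumer tasks."""
    ONE_TO_ONE = "one_to_one"          # task i feeds task i
    BROADCAST = "broadcast"            # every producer feeds every consumer in full
    SCATTER_GATHER = "scatter_gather"  # partitioned output, as in an MR shuffle


class Scheduling(Enum):
    """When the consumer runs relative to the producer."""
    SEQUENTIAL = "sequential"  # consumer starts after the producer has output
    CONCURRENT = "concurrent"  # consumer runs alongside the producer


class DataSource(Enum):
    """How durable the producer's output is."""
    PERSISTED = "persisted"                    # survives the producing task
    PERSISTED_RELIABLE = "persisted_reliable"  # stored reliably
    EPHEMERAL = "ephemeral"                    # available only while the producer runs


@dataclass(frozen=True)
class ToyEdge:
    """Hypothetical edge record: one data movement step between two vertices."""
    producer: str
    consumer: str
    movement: DataMovement
    scheduling: Scheduling = Scheduling.SEQUENTIAL  # default chosen for this sketch
    source: DataSource = DataSource.PERSISTED       # default chosen for this sketch


# A shuffle-style edge, similar in spirit to the map-to-reduce step in MR.
shuffle = ToyEdge("map", "reduce", DataMovement.SCATTER_GATHER)
```

Keeping the three properties separate means an edge can change how data is routed without changing when the consumer starts or how long the data is kept.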
Tez Hive

● Increased performance
– Compared to running Hive on MapReduce
● No need to use HDFS for intermediate steps
● Greater parallelism via DAGs
● Less complex steps in a DAG compared to MR
● Reduced latency
● Higher throughput
● Better speed
Available Books

● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018

● Find the author on Amazon


– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect

● Feel free to connect on LinkedIn


– www.linkedin.com/in/mike-frampton-38563020

● See my open source blog at


– open-source-systems.blogspot.com/

● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration
