Presto DB

DONE BY:
LAKSANTINI KHALID

FAHD BARHDADI

AHMED BENCHEKROUN
Agenda
1 Introduction

2 What is a Query Engine?

3 Presto DB Explained

4 Architecture

5 Implementation
Introduction
Presto began as a Facebook experiment to run interactive analytic queries against Facebook's 300 PB data warehouse, which was built on Hadoop and HDFS.
Facebook used Apache Hive, which was designed and released in 2008 and brought familiar SQL syntax to the Hadoop ecosystem.
Apache Hive made a huge difference in the Hadoop ecosystem by letting users write SQL-like queries in place of hand-written Java MapReduce jobs and run them at scale. However, it did not deliver the high performance required for interactive queries.
What is a Query Engine?

• A piece of software that sits on top of a database or server and executes queries against data in that database or server to provide answers for users or applications.
More specifically, a SQL query engine interprets SQL commands to access data in relational systems.
Many organizations use SQL query engines for CRUD (create, read, update, delete) operations and to enforce the data policies that relational data models and database management systems require.
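To make the CRUD idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module purely as a stand-in SQL engine (it is an embedded engine, not Presto, and Presto itself is aimed mainly at analytic reads). The table name `users` and the sample rows are hypothetical.

```python
import sqlite3


def run_crud_demo():
    # In-memory SQLite database; its SQL engine stands in for "a query engine".
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Create: define a table and insert two rows (ids 1 and 2 are assigned automatically).
    cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    cur.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    # Update: rename bob to carol.
    cur.execute("UPDATE users SET name = ? WHERE name = ?", ("carol", "bob"))
    # Delete: remove alice.
    cur.execute("DELETE FROM users WHERE name = ?", ("alice",))
    # Read: the engine parses the SELECT, plans it, and returns the matching rows.
    rows = cur.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    conn.close()
    return rows


print(run_crud_demo())  # [(2, 'carol')]
```

Each statement is plain SQL text that the engine interprets; the caller never touches the storage layout directly.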

Query Engine vs. Databases
• Query engine: a tool that lets users query information held in databases and other data sources, and is thus useful for locating information created by governments, corporations, individuals, and groups.
• Database: a database is simply a method of tracking and organizing information, and is thus a crucial part of running a business.
• Examples of query engines include Presto, Apache Drill, Cloudera Impala, and Apache Spark.

Big Data Analysis
• Big data analysis applies advanced analytic techniques to large, heterogeneous data sets that contain structured, semi-structured, and unstructured data from many sources and in many sizes.
• Big data is characterized by high volume, high velocity, and high variety.
• Big data analysis, including with open-source tools, can help you build better and faster decision models and forecast future outcomes, improving your business intelligence.
AI, mobile devices, social media, and the Internet of Things (IoT) are making data sources more complicated than traditional data sources.
How does it work?
• Presto is a distributed system that runs on Hadoop and uses an architecture similar to a classic massively parallel processing (MPP) database management system.
• It is built on a storage abstraction that makes it easy to develop pluggable connectors, so it can be extended to any data source.
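The storage abstraction can be pictured as a small interface that every connector implements, so the engine's query code never depends on where the data lives. This is a minimal Python sketch under that assumption: `Connector`, `InMemoryConnector`, `CsvConnector`, `Engine`, and the catalog names are all hypothetical, and Presto's real connector API is a Java service provider interface (SPI) that looks different.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Row = Dict[str, str]


class Connector(ABC):
    """Hypothetical connector interface: the engine only ever talks to this abstraction."""

    @abstractmethod
    def list_tables(self) -> List[str]: ...

    @abstractmethod
    def scan(self, table: str) -> Iterator[Row]: ...


class InMemoryConnector(Connector):
    """Serves tables that are already held in Python lists."""

    def __init__(self, tables: Dict[str, List[Row]]):
        self._tables = tables

    def list_tables(self) -> List[str]:
        return sorted(self._tables)

    def scan(self, table: str) -> Iterator[Row]:
        return iter(self._tables[table])


class CsvConnector(Connector):
    """Parses CSV text (a stand-in for files stored on HDFS or object storage)."""

    def __init__(self, files: Dict[str, str]):
        self._files = files

    def list_tables(self) -> List[str]:
        return sorted(self._files)

    def scan(self, table: str) -> Iterator[Row]:
        return iter(csv.DictReader(io.StringIO(self._files[table])))


class Engine:
    """Maps catalog names to connectors; query code never depends on the storage format."""

    def __init__(self) -> None:
        self.catalogs: Dict[str, Connector] = {}

    def register(self, catalog: str, connector: Connector) -> None:
        self.catalogs[catalog] = connector

    def count_rows(self, catalog: str, table: str) -> int:
        # Same code path for every connector: just iterate the scan.
        return sum(1 for _ in self.catalogs[catalog].scan(table))


engine = Engine()
engine.register("memory", InMemoryConnector({"users": [{"name": "alice"}, {"name": "bob"}]}))
engine.register("files", CsvConnector({"orders": "id,amount\n1,5\n2,20\n3,7\n"}))
print(engine.count_rows("memory", "users"), engine.count_rows("files", "orders"))  # 2 3
```

Adding a new data source means writing one more `Connector` subclass; `Engine.count_rows` stays unchanged.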

• Presto is designed to support standard ANSI SQL semantics, including complex queries, aggregations, and joins.
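As an illustration of a "complex query" with a join and an aggregation, the sketch below runs standard SQL (JOIN, GROUP BY, HAVING, ORDER BY) through Python's sqlite3 module, again only as a local stand-in engine. The `customers` and `orders` tables and their rows are hypothetical; in Presto the tables would typically be addressed through a catalog and schema instead.

```python
import sqlite3

# Standard SQL: join two tables, aggregate per group, filter groups, sort.
QUERY = """
SELECT c.region, COUNT(*) AS num_orders, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
GROUP BY c.region
HAVING SUM(o.amount) > 10
ORDER BY total DESC
"""


def run_query():
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data.
    conn.executescript("""
        CREATE TABLE customers (id INTEGER, region TEXT);
        CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
        INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 20.0), (12, 3, 7.5), (13, 2, 4.0);
    """)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(run_query())  # [('US', 2, 24.0), ('EU', 2, 12.5)]
```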

• After a query is compiled, Presto divides it into many phases distributed across the worker nodes. To avoid needless input/output overhead, all processing is done in memory and pipelined across the network between these phases.
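Pipelining can be approximated in a single process with Python generators: each row flows from the scan through the filter into the aggregation as soon as it is produced, with no intermediate list or file in between. This is only an analogy under that assumption (real Presto moves batches of data between workers over the network), and the phase functions and sample rows are hypothetical.

```python
def scan(rows, log):
    """Phase 1: read rows from a source one at a time."""
    for row in rows:
        log.append(("scan", row["id"]))
        yield row


def filter_large(rows, min_amount, log):
    """Phase 2: pass through only rows whose amount is at least min_amount."""
    for row in rows:
        log.append(("filter", row["id"]))
        if row["amount"] >= min_amount:
            yield row


def aggregate(rows):
    """Phase 3: sum amounts per region as rows arrive."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


def run_pipeline(rows, min_amount):
    log = []
    result = aggregate(filter_large(scan(rows, log), min_amount, log))
    return result, log


data = [
    {"id": 1, "region": "EU", "amount": 5},
    {"id": 2, "region": "US", "amount": 20},
    {"id": 3, "region": "EU", "amount": 12},
]
result, log = run_pipeline(data, 10)
print(result)  # {'US': 20, 'EU': 12}
print(log)     # scan and filter steps interleave row by row
```

The interleaved log (scan 1, filter 1, scan 2, filter 2, ...) shows that no phase waits for the previous one to finish the whole data set.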
Presto DB Core Concepts
• Server types: the coordinator and the worker are the two sorts of servers.
• Data sources: connectors, catalogs, schemas, and tables are covered in the data sources section.
• Query execution model: Presto processes SQL queries that are executed over a distributed cluster of coordinators and workers.
• Queries and stages: the distributed query plan is split into numerous stages in the Presto architecture; a stage is one component of a query's execution.
• Tasks and splits, drivers and operators: each task contains at least one parallel driver.
• Exchange: exchanges transfer data between nodes for the different stages of a query.
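The concepts above nest inside each other: a query is split into stages, a stage runs as tasks on workers, and each task runs drivers (chains of operators) over splits of the input data. The sketch below models that hierarchy with dataclasses. The class names mirror the slide's terms, but the fields, the round-robin assignment in `plan_leaf_stage`, and the operator names are hypothetical simplifications, not Presto's actual scheduler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Split:
    source: str  # a chunk of input data, e.g. a file or a file range


@dataclass
class Driver:
    operators: List[str]  # chain of operator names, e.g. ["TableScan", "Filter"]
    split: Split


@dataclass
class Task:
    worker: str
    drivers: List[Driver] = field(default_factory=list)


@dataclass
class Stage:
    stage_id: int
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Query:
    sql: str
    stages: List[Stage] = field(default_factory=list)


def plan_leaf_stage(splits: List[Split], workers: List[str], operators: List[str]) -> Stage:
    """Hypothetical planner: round-robin splits over workers, one driver per split."""
    tasks = {w: Task(worker=w) for w in workers}
    for i, split in enumerate(splits):
        worker = workers[i % len(workers)]
        tasks[worker].drivers.append(Driver(operators=list(operators), split=split))
    # Keep only tasks that received work, so every task has at least one driver.
    return Stage(stage_id=1, tasks=[t for t in tasks.values() if t.drivers])


splits = [Split(source=f"part-{i}") for i in range(5)]
stage = plan_leaf_stage(splits, ["worker-1", "worker-2"], ["TableScan", "Filter"])
query = Query(sql="SELECT ...", stages=[stage])
print([(t.worker, len(t.drivers)) for t in stage.tasks])  # [('worker-1', 3), ('worker-2', 2)]
```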
Architecture
A single coordinator coordinates multiple workers. The client submits SQL statements, which the coordinator parses and plans; it then assigns parallel tasks to the workers. The workers collaborate to process rows from the data sources and produce results for the client.
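The coordinator/worker split can be sketched with a thread pool: the coordinator cuts the input into splits, each worker computes a partial aggregate over its split in parallel, and the coordinator merges the partial results. Threads stand in for worker nodes here, and `coordinator`, `worker_task`, and the sample rows are hypothetical; a real cluster also has to ship data and plans over the network.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def worker_task(split):
    """Worker: partial aggregation (row count per region) over one split."""
    return Counter(row["region"] for row in split)


def coordinator(rows, num_workers, split_size):
    """Coordinator: create splits, run worker tasks in parallel, merge partial results."""
    splits = [rows[i:i + split_size] for i in range(0, len(rows), split_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(final)


rows = [{"region": r} for r in ["EU", "US", "EU", "APAC", "EU"]]
print(coordinator(rows, num_workers=2, split_size=2))  # counts: EU 3, US 1, APAC 1
```

Splitting the work into a partial step on workers and a final merge on the coordinator is what lets the aggregation scale out: each worker only ever sees its own split.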

Presto's design is comparable to that of other database management systems that use cluster computing, often known as massively parallel processing (MPP).

• Because Presto does not write intermediate results to disk, it gains considerable speed over Apache Hive's original execution model, in which each query runs as Hadoop MapReduce jobs.
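The difference can be illustrated with a toy map/reduce computation run two ways: once with the intermediate map output written to a temporary file and read back before the reduce step (the MapReduce-style pattern), and once handing it over directly in memory. Both produce the same answer; only the disk round trip differs. This is a simplified analogy, and the function names and data are hypothetical.

```python
import json
import os
import tempfile


def map_phase(rows):
    """Emit (key, value) pairs."""
    return [(r["region"], r["amount"]) for r in rows]


def reduce_phase(pairs):
    """Sum values per key."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


def mapreduce_style(rows):
    """Intermediate map output is written to disk, then read back for the reduce step."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(map_phase(rows), f)
        with open(path) as f:
            pairs = json.load(f)  # tuples come back as 2-element lists
        return reduce_phase(pairs)
    finally:
        os.remove(path)


def in_memory_style(rows):
    """Intermediate map output is handed straight to the reduce step."""
    return reduce_phase(map_phase(rows))


data = [
    {"region": "EU", "amount": 5},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 12},
]
print(mapreduce_style(data) == in_memory_style(data))  # True
```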

Figure: Schematics of Presto's architecture (Presto CLI, Presto Coordinator, three Presto Workers, Hive Metastore, HDFS)

Implementations
Conclusion