
A Technical Overview of the Vertica Architecture
Ryan Roelke
Senior Software Engineer
Vertica Architecture
 Key aspects of Vertica’s design
 Database background
 An Analytic Query in 2005 (Industry)
 An Analytic Query in 2005 (Academia)
 An Analytic Query in 2020 (Vertica)
https://www.visualcapitalist.com/what-happens-in-an-internet-minute-in-2019/
OLTP (On-Line Transaction Processing)
 Key-value queries
 Lots of small, incremental changes
- Update to an inventory or bank account

 Traditional workload

OLTP (On-Line Transaction Processing)
 CREATE TABLE accounts (
user_id INTEGER,
user_name VARCHAR(128),
balance INTEGER,
...
);

 UPDATE accounts
SET balance = balance + 10
WHERE user_id = 123456789;

OLAP (On-Line Analytical Processing)
 Identify trends in data over time
 Use data for decision-making
- Failure prevention or detection

 Big Data!

OLAP (On-Line Analytical Processing)
 CREATE TABLE searches(
timestamp TIMESTAMP,
user_id INTEGER,
...
);

 SELECT
hour(timestamp),
count(distinct user_id)
FROM searches
GROUP BY hour(timestamp);
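
As a rough illustration of the analytic query above, the sketch below runs the same shape of query against an in-memory SQLite database. The sample rows are made up, and SQLite's strftime('%H', ...) stands in for the hour() function used on the slide, so this is only an approximation of the original SQL, not Vertica syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (timestamp TEXT, user_id INTEGER)")

# Made-up sample data: three distinct users searching across two hours.
rows = [
    ("2019-03-01 09:05:00", 1),
    ("2019-03-01 09:40:00", 1),
    ("2019-03-01 09:59:00", 2),
    ("2019-03-01 10:15:00", 3),
]
conn.executemany("INSERT INTO searches VALUES (?, ?)", rows)

# SQLite has no hour(); strftime('%H', ...) plays that role here.
users_per_hour = conn.execute(
    """
    SELECT strftime('%H', timestamp) AS hr, count(DISTINCT user_id)
    FROM searches
    GROUP BY hr
    ORDER BY hr
    """
).fetchall()

print(users_per_hour)
```

Unlike the OLTP update, which touches one row by key, this query has to read every row in the table to produce a handful of output rows.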

Rewind to 2005
 "Big Data" is not a term on anyone's radar
 "The Cloud" is also not on anyone's radar
 What is your database doing?
- OLTP
- OLAP is not mainstream yet
1 An Analytic Query in 2005
Example – Stock Prices

SYMBOL PRICE ... ... ... ... TIMESTAMP

GOOGL 1000 ... ... ... ... 04.02.2005-14:00:00

AAPL 192 ... ... ... ... 04.02.2005-14:02:00

GOOGL 1010 ... ... ... ... 04.03.2005-09:01:00

AAPL 197 ... ... ... ... 04.03.2005-09:03:00

Example – Stock Prices

stockprices.txt
GOOGL, 1000, …, …, …, …, …, …, …, …, …, 04.02.2005-14:00:00,
AAPL, 192, …, …, …, …, …, …, …, …, …, 04.02.2005-14:02:00,
GOOGL, 1010, …, …, …, …, …, …, …, …, …, 04.03.2005-09:01:00,
AAPL, 197, …, …, …, …, …, …, …, …, …, 04.03.2005-09:03:00,




Example – Stock Prices

stockprices.txt
GOOGL, 1000, …, …, 04.02.2005-14:00:00,
AAPL, 192, …, …, 04.02.2005-14:02:00,
GOOGL, 1010, …, …, 04.03.2005-09:01:00,
AAPL, 197, …, …, 04.03.2005-09:03:00,
…

Index
AAPL, 04.02.2005-14:02:00
AAPL, 04.03.2005-09:03:00
GOOGL, 04.02.2005-14:00:00
GOOGL, 04.03.2005-09:01:00


Example – Stock Prices
▪ UPDATE stockprices
SET price = price + 1
WHERE symbol = 'AAPL'
AND timestamp = '04.03.2005-09:03:00';

stockprices.txt
GOOGL, 1000, …, …, 04.02.2005-14:00:00,
AAPL, 192, …, …, 04.02.2005-14:02:00,
GOOGL, 1010, …, …, 04.03.2005-09:01:00,
AAPL, 198, …, …, 04.03.2005-09:03:00,

Index
AAPL, 04.02.2005-14:02:00
AAPL, 04.03.2005-09:03:00
GOOGL, 04.02.2005-14:00:00
GOOGL, 04.03.2005-09:01:00

Example – Stock Prices
▪ SELECT avg(price) FROM stockprices
WHERE symbol = 'AAPL'
AND date(timestamp) = '04.02.2005';

stockprices.txt
GOOGL, 1000, …, …, 04.02.2005-14:00:00,
AAPL, 192, …, …, 04.02.2005-14:02:00,
GOOGL, 1010, …, …, 04.03.2005-09:01:00,
AAPL, 198, …, …, 04.03.2005-09:03:00,

Index
AAPL, 04.02.2005-14:02:00
AAPL, 04.03.2005-09:03:00
GOOGL, 04.02.2005-14:00:00
GOOGL, 04.03.2005-09:01:00
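
The contrast between the indexed update and the analytic average can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not how any particular 2005 database was implemented: the "file" is an in-memory buffer, the index is a plain dictionary from (symbol, timestamp) to byte offset, and the filler column is a hypothetical stand-in for the elided columns.

```python
import io

# Hypothetical sample rows: symbol, price, a filler column standing in for
# the elided columns, and the timestamp.
ROWS = [
    ("GOOGL", 1000, "x" * 40, "04.02.2005-14:00:00"),
    ("AAPL", 192, "x" * 40, "04.02.2005-14:02:00"),
    ("GOOGL", 1010, "x" * 40, "04.03.2005-09:01:00"),
    ("AAPL", 197, "x" * 40, "04.03.2005-09:03:00"),
]


def build(rows):
    """Write rows to an in-memory file and index (symbol, timestamp) -> offset."""
    buf = io.BytesIO()
    index = {}
    for symbol, price, filler, ts in rows:
        index[(symbol, ts)] = buf.tell()
        buf.write(f"{symbol},{price},{filler},{ts}\n".encode())
    return buf, index


def point_lookup(buf, index, symbol, ts):
    """Key-value access: jump straight to one row using the index."""
    buf.seek(index[(symbol, ts)])
    raw = buf.readline()
    return raw.decode().rstrip("\n").split(","), len(raw)


def avg_price(buf, symbol, day):
    """Analytic access: every full row is read, filler columns included."""
    buf.seek(0)
    total = count = bytes_read = 0
    for raw in buf:
        bytes_read += len(raw)
        sym, price, _filler, ts = raw.decode().rstrip("\n").split(",")
        if sym == symbol and ts.startswith(day):
            total += int(price)
            count += 1
    return total / count, bytes_read


stock_file, stock_index = build(ROWS)
fields, lookup_bytes = point_lookup(
    stock_file, stock_index, "AAPL", "04.03.2005-09:03:00"
)
average, scan_bytes = avg_price(stock_file, "AAPL", "04.02.2005")
print(fields[1], lookup_bytes, average, scan_bytes)
```

The point lookup reads a single row, while the average reads the whole file, including every column the query never asked for.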

Example – Stock Prices

stockprices.txt
GOOGL, 1000, …, …, …, …, …, …, …, …, …, 04.02.2005-14:00:00,
AAPL, 192, …, …, …, …, …, …, …, …, …, 04.02.2005-14:02:00,
GOOGL, 1010, …, …, …, …, …, …, …, …, …, 04.03.2005-09:01:00,
AAPL, 197, …, …, …, …, …, …, …, …, …, 04.03.2005-09:03:00,
…

Index
AAPL, 04.02.2005-14:02:00
AAPL, 04.03.2005-09:03:00
GOOGL, 04.02.2005-14:00:00
GOOGL, 04.03.2005-09:01:00
…

2 An Analytic Query in 2005: Academia
The C-Store Paper
Memory Hierarchy

Source: https://cs61.seas.harvard.edu/cs61wiki/images/2/28/Architecture.pdf

Stock Prices in C-Store (1)

stocks.symbol stocks.values stocks.timestamp


GOOGL 1000 … … … … 04.02.2005-14:00:00
AAPL 192 … … … … 04.02.2005-14:03:00
GOOGL 1010 … … … … 04.03.2005-09:01:00
AAPL 198 … … … … 04.03.2005-09:03:00
... ... ...

Stock Prices in C-Store (1)
▪ SELECT avg(price) FROM stocks
WHERE symbol = 'AAPL'
AND date(timestamp) = '04.02.2005';

stocks.symbol stocks.values stocks.timestamp


GOOGL 1000 … … … … 04.02.2005-14:00:00
AAPL 192 … … … … 04.02.2005-14:03:00
GOOGL 1010 … … … … 04.03.2005-09:01:00
AAPL 198 … … … … 04.03.2005-09:03:00
... ... ...
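
A minimal sketch of why storing each column separately helps this query. The ColumnStore class and the "notes" column are hypothetical: "notes" stands in for the elided columns, and "price" is the column the slide labels "values". The class only records which columns a query touches.

```python
class ColumnStore:
    """Toy column store: one list per column, tracking which ones get read."""

    def __init__(self, columns):
        self._columns = columns
        self.columns_read = set()

    def read(self, name):
        self.columns_read.add(name)
        return self._columns[name]


store = ColumnStore({
    "symbol": ["GOOGL", "AAPL", "GOOGL", "AAPL"],
    "price": [1000, 192, 1010, 198],
    "notes": ["...", "...", "...", "..."],  # stand-in for the elided columns
    "timestamp": [
        "04.02.2005-14:00:00",
        "04.02.2005-14:03:00",
        "04.03.2005-09:01:00",
        "04.03.2005-09:03:00",
    ],
})


def avg_price(store, symbol, day):
    # Evaluate the predicates on the two filter columns first...
    symbols = store.read("symbol")
    stamps = store.read("timestamp")
    positions = [
        i for i, (s, t) in enumerate(zip(symbols, stamps))
        if s == symbol and t.startswith(day)
    ]
    # ...then fetch prices only at the matching positions.
    prices = store.read("price")
    picked = [prices[i] for i in positions]
    return sum(picked) / len(picked)


average = avg_price(store, "AAPL", "04.02.2005")
print(average, sorted(store.columns_read))
```

The "notes" column is never read, which is the saving a row-oriented file cannot offer.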

Stock Prices in C-Store (1)
symbol values timestamp
GOOGL 1000 04.02.2005-14:00:00
AAPL 192 04.02.2005-14:03:00
GOOGL 1010 04.03.2005-09:01:00
AAPL 198 04.03.2005-09:03:00
… … …
Stock Prices in C-Store (2)
GOOGL 1000 04.02.2005-14:00:00
AAPL 192 04.02.2005-14:03:00
GOOGL 1010 04.03.2005-09:01:00
AAPL 198 04.03.2005-09:03:00
... ... ...

GOOGL 1000 04.02.2005-14:00:00
AAPL 192 04.02.2005-14:03:00
AAPL 193 04.02.2005-14:06:00

GOOGL 1010 04.03.2005-09:01:00
AAPL 198 04.03.2005-09:03:00
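
The grouping above can be read as partitioning by date. A small sketch, assuming one partition per calendar day and using the rows from the slide, shows how a query with a date predicate only has to open the matching partition:

```python
from collections import defaultdict

rows = [
    ("GOOGL", 1000, "04.02.2005-14:00:00"),
    ("AAPL", 192, "04.02.2005-14:03:00"),
    ("AAPL", 193, "04.02.2005-14:06:00"),
    ("GOOGL", 1010, "04.03.2005-09:01:00"),
    ("AAPL", 198, "04.03.2005-09:03:00"),
]


def partition_by_date(rows):
    """Group rows so every row in a partition shares the same date."""
    parts = defaultdict(list)
    for row in rows:
        day = row[2].split("-")[0]  # "04.02.2005-14:00:00" -> "04.02.2005"
        parts[day].append(row)
    return dict(parts)


def avg_price(partitions, symbol, day):
    """Only the partition for `day` is examined; the others are skipped."""
    candidates = partitions.get(day, [])
    prices = [price for sym, price, _ts in candidates if sym == symbol]
    return sum(prices) / len(prices), len(candidates)


partitions = partition_by_date(rows)
average, rows_examined = avg_price(partitions, "AAPL", "04.02.2005")
print(average, rows_examined)
```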
Stock Prices in C-Store (3)
A GOOGL A 1000 A 04.02.2005-14:00:00
B AAPL B 192 B 04.02.2005-14:03:00
C AAPL C 193 C 04.02.2005-14:06:00
… … …

B AAPL B 192 B 04.02.2005-14:03:00


AAPL 194 04.02.2005-14:05:00
C AAPL C 193 C 04.02.2005-14:06:00
AAPL 194 04.02.2005-14:08:00
A GOOGL A 1000 A 04.02.2005-14:00:00
… … …
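
Once the rows are sorted by symbol and then timestamp, all rows for one symbol sit in a contiguous range. A sketch using Python's bisect module on the sorted rows from the slide (binary search is an illustrative choice here, not something the slide specifies):

```python
from bisect import bisect_left, bisect_right

# Columns stored in the same sorted order (symbol, then timestamp).
symbol = ["AAPL", "AAPL", "AAPL", "AAPL", "GOOGL"]
price = [192, 194, 193, 194, 1000]
timestamp = [
    "04.02.2005-14:03:00",
    "04.02.2005-14:05:00",
    "04.02.2005-14:06:00",
    "04.02.2005-14:08:00",
    "04.02.2005-14:00:00",
]


def rows_for(symbol_col, wanted):
    """Return the half-open position range [lo, hi) holding `wanted`."""
    lo = bisect_left(symbol_col, wanted)
    hi = bisect_right(symbol_col, wanted)
    return lo, hi


lo, hi = rows_for(symbol, "AAPL")
aapl_avg = sum(price[lo:hi]) / (hi - lo)
print(lo, hi, aapl_avg)
```

Because every column uses the same row order, the position range found in the symbol column applies directly to the price column.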
Stock Prices in C-Store (4)
AAPL 192 04.02.2005-14:03:00
AAPL 194 04.02.2005-14:05:00
AAPL 193 04.02.2005-14:06:00
AAPL 194 04.02.2005-14:08:00
GOOGL 1000 04.02.2005-14:00:00
… … …

AAPL, 4     192 04.02.2005-14:03:00
GOOGL, 8    194 04.02.2005-14:05:00
            193 04.02.2005-14:06:00
            194 04.02.2005-14:08:00
            … …
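
The "AAPL, 4" and "GOOGL, 8" entries are run-length encoding of the sorted symbol column. A minimal sketch of the technique, with run counts chosen to match the slide:

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


sorted_symbols = ["AAPL"] * 4 + ["GOOGL"] * 8
unsorted_symbols = ["GOOGL", "AAPL"] * 6

sorted_runs = rle_encode(sorted_symbols)
unsorted_runs = rle_encode(unsorted_symbols)
print(sorted_runs, len(unsorted_runs))
```

The sorted column collapses to two runs, while the same twelve values in alternating order produce twelve runs, which is why sorting and compression go together.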
C-Store Wrap-Up
 Column-oriented storage
- Only scan the columns a query cares about
 Partition data into multiple file sets
- All rows in a file have the same value for the partition column
 Keep data in sorted order
- Similar values are stored together
 Similar data stored adjacently will compress well
C-Store Wrap-Up
 Is this design useful in an enterprise environment?
One Design Does Not Fit All
 We have designed our storage layer to answer one specific query:

SELECT avg(price) FROM stocks
WHERE symbol = 'AAPL'
AND date(timestamp) = '04.02.2005';

 What if we change the question we're asking?


One Design Does Not Fit All
 What if we want to look at who is trading stocks?
- Data is partitioned on date
  - This is probably still useful
- Data is ordered on symbol
  - We'll have to look at all of the rows before we know which ones pass
  - Rows with the same trader are not necessarily close together
One Server Does Not Fit All
 What if we run out of room for files?
- Lots of small files could use up all inodes
- High data volume could fill the entire disk
 What if the system encounters a problem?
One Deployment Does Not Fit All
 What if we need to integrate with cloud applications?
 What if workload volumes change?
 What if we need to migrate our database somewhere else?
3 An Analytic Query in 2020: Vertica
What is Vertica?
 First commercial database based on the C-Store paper
- Column-oriented storage
- Table partitioning
- Sorted data

 How does Vertica build upon the C-Store design to solve its problems?
- Provide tools for users to customize their storage
What is Vertica?
 How does Vertica build upon the C-Store design to solve its problems?
- One Design Does Not Fit All:
Provides tools for users to customize their storage
- One Server Does Not Fit All:
Coordinates data and load distribution in a multi-node system
- One Deployment Does Not Fit All:
Does not depend on specialized hardware and offers features targeted at the cloud
One Design Does Not Fit All
 Table is a logical idea
- The column specification doesn't tell you how the columns are stored

 Give the user another tool to design the storage
- CREATE PROJECTION
- Each projection specifies a different way to lay out table data
- Each table can have multiple projections
- Vertica chooses which to use for each query
One Design Does Not Fit All – Stocks Example
 Table is a logical idea
- CREATE TABLE stocks (symbol VARCHAR(8),
price INTEGER, trader VARCHAR(32),
timestamp TIMESTAMP) PARTITION BY date(timestamp);

- CREATE PROJECTION p_stocks_symbol_activity
(symbol ENCODING RLE, price, timestamp)
AS SELECT symbol, price, timestamp
FROM stocks
ORDER BY symbol, timestamp;
One Design Does Not Fit All – Stocks Example
 Table is a logical idea
- CREATE TABLE stocks (symbol VARCHAR(8),
price INTEGER, trader VARCHAR(32),
timestamp TIMESTAMP) PARTITION BY date(timestamp);

- CREATE PROJECTION p_stocks_trader_activity
(trader ENCODING RLE, timestamp, symbol, price)
AS SELECT trader, timestamp, symbol, price
FROM stocks
ORDER BY trader, timestamp, symbol;
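
To make the idea of choosing among projections concrete, here is a toy selection rule. It is an illustrative heuristic only and not Vertica's optimizer: the Projection class, choose_projection, and the scoring rule are all assumptions made for this sketch. A projection is usable if it contains every column the query needs, and among usable ones the sketch prefers the projection whose sort order starts with the most filter columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    name: str
    columns: tuple
    order_by: tuple


PROJECTIONS = [
    Projection(
        "p_stocks_symbol_activity",
        ("symbol", "price", "timestamp"),
        ("symbol", "timestamp"),
    ),
    Projection(
        "p_stocks_trader_activity",
        ("trader", "timestamp", "symbol", "price"),
        ("trader", "timestamp", "symbol"),
    ),
]


def choose_projection(projections, needed_columns, filter_columns):
    """Toy rule: cover all needed columns, then favour a matching sort prefix."""

    def sort_prefix_len(projection):
        length = 0
        for column in projection.order_by:
            if column not in filter_columns:
                break
            length += 1
        return length

    usable = [p for p in projections if set(needed_columns) <= set(p.columns)]
    if not usable:
        raise LookupError("no projection covers the query")
    return max(usable, key=sort_prefix_len)


# avg(price) for one symbol on one day
price_query = choose_projection(
    PROJECTIONS, {"symbol", "price", "timestamp"}, {"symbol", "timestamp"}
)
# which symbols did one trader trade?
trader_query = choose_projection(PROJECTIONS, {"trader", "symbol"}, {"trader"})
print(price_query.name, trader_query.name)
```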
One Design Does Not Fit All
 How should you decide what design is best for your queries?
- Our "Database Designer" tool will help you
- Learn more in the session
"Vertica Database Designer - Today and Tomorrow"
One Server Does Not Fit All
 Data volume in 2020 dwarfs data volume in 2005
 Vertica distributes data in a multi-node system
- Each node keeps some data on a locally attached disk
- Data is replicated for high availability
- All nodes are created equal

[Diagram: three Vertica nodes, each with its own storage]
One Server Does Not Fit All
 Projections can be segmented, mapping each row to a node based on some property of the row
 Projections have K-safety, which sets the level of data redundancy

 CREATE PROJECTION p_stockprices
(symbol ENCODING RLE,
price, timestamp)
AS SELECT symbol, price, timestamp
FROM stockprices
ORDER BY symbol, timestamp
SEGMENTED BY HASH(symbol) ALL NODES
KSAFE 1;
One Server Does Not Fit All
 CREATE PROJECTION p_stockprices (symbol ENCODING RLE, price, timestamp)
AS SELECT symbol, price, timestamp FROM stockprices
ORDER BY symbol, timestamp SEGMENTED BY HASH(symbol) ALL NODES KSAFE 1;

hash(symbol) 0212348234

GOOGL, 1000, …, …, 04.02.2005-14:00:00
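
A rough sketch of how hash segmentation with K-safety 1 could place a row. Several details are assumptions made for illustration: zlib.crc32 stands in for the database's hash function, the cluster has three nodes named N1 to N3, the hash space is split into equal ranges, and the redundant copy goes to the next node in the list. None of these are confirmed by the slides.

```python
import zlib

NODES = ["N1", "N2", "N3"]  # hypothetical three-node cluster
HASH_SPACE = 2**32


def segment_hash(symbol):
    """Stand-in hash returning a value in [0, 2**32); not Vertica's HASH()."""
    return zlib.crc32(symbol.encode())


def placement(symbol, nodes=NODES, ksafe=1):
    """Return the nodes holding a row: one primary plus `ksafe` extra copies."""
    h = segment_hash(symbol)
    primary = h * len(nodes) // HASH_SPACE  # which equal-width range h falls in
    return [nodes[(primary + i) % len(nodes)] for i in range(ksafe + 1)]


googl_nodes = placement("GOOGL")
aapl_nodes = placement("AAPL")
print(googl_nodes, aapl_nodes)
```

With two copies on different nodes, any single node can fail and every row is still available on a surviving node.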


One Deployment Does Not Fit All
 Segmentation design scales to any size of deployment
 Architecture is hardware-independent
One Deployment Does Not Fit All

 Changing the cluster configuration moves a lot of data

[Diagram: the hash space divided among nodes N1 to N5 in a ring, with segment boundaries at 0, 858993459, 1717986918, 2576980377, and 3435973836]
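
The boundary values in the diagram are what you get by splitting a 32-bit hash space into five equal ranges. The sketch below reproduces them and estimates how much of the hash space changes owner when a sixth node is added. It assumes the ranges are simply re-split evenly and that segment i always belongs to node i, which is a simplification for illustration.

```python
from bisect import bisect_right

HASH_SPACE = 2**32


def boundaries(node_count):
    """Start of each node's range when the hash space is split evenly."""
    return [i * HASH_SPACE // node_count for i in range(node_count)]


def moved_fraction(old_count, new_count):
    """Fraction of the hash space whose owning segment index changes."""
    old_b, new_b = boundaries(old_count), boundaries(new_count)
    cuts = sorted(set(old_b + new_b + [HASH_SPACE]))
    moved = 0
    for start, end in zip(cuts, cuts[1:]):
        # Ownership is constant within [start, end), so test it at `start`.
        old_owner = bisect_right(old_b, start) - 1
        new_owner = bisect_right(new_b, start) - 1
        if old_owner != new_owner:
            moved += end - start
    return moved / HASH_SPACE


five_node_ring = boundaries(5)
fraction = moved_fraction(5, 6)
print(five_node_ring, fraction)
```

Under these assumptions, going from five nodes to six reassigns roughly half of the hash space, which is the cost the slide is pointing at.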
One Deployment Does Not Fit All
 Vertica Eon Mode
- Designed for cloud integration and rapidly changing cluster sizes
 Learn more at the breakout session
"Eon Mode: Past, Present and Future"
Vertica – Further Reading

 Conference publications
- C-Store: http://db.csail.mit.edu/projects/cstore/vldb.pdf
- Vertica Seven Years Later:
http://vldb.org/pvldb/vol5/p1790_andrewlamb_vldb2012.pdf
- Eon Mode: https://www.vertica.com/wp-content/uploads/2018/05/Vertica_EON_SIGMOD_Paper.pdf

 Vertica Documentation
- https://www.vertica.com/docs/9.3.x/HTML/Content/Home.htm

Learn More: academy.vertica.com
Try it Free: vertica.com/try
