
Astronomy, Petabytes, and MySQL

MySQL Conference
Santa Clara, CA
April 16, 2008

Kian-Tat Lim
Stanford Linear Accelerator Center

Outline

LSST
LSST Database
LSST Database + MySQL

LSST

What Is It?
Why Build It?

LSST

What Is It?

Telescope

Proposed telescope to be built in Chile

Large

3.2 gigapixel camera

8.4 meter diameter mirror

Synoptic Survey

Wide

Deep

Fast

LSST

Why Build It?

Dark Matter and Energy

Photo: J. A. Tyson, W. Colley, E. L. Turner, and NASA

Variable Objects

Transient Objects

Moving Objects

Photo: D. Roddy, Lunar and Planetary Institute

LSST Database

What’s In It?
How Big?
How Often?
What Queries?
Unusual Needs

LSST Database

What’s In It?

Database: Components

Moving Objects Catalog
Object Catalog
Source Catalog
Difference Image Source Catalog
Image Metadata
Calibration
Engineering and Facility Database
Provenance
Statistics
Summaries

Astronomical Objects

(Database components diagram, repeated from the Components slide)

Sources

(Database components diagram, repeated from the Components slide)

Changes

(Database components diagram, repeated from the Components slide)

Image Metadata

(Database components diagram, repeated from the Components slide)

Calibration and Facility

(Database components diagram, repeated from the Components slide)

LSST Database

How Big?

Sagans of Rows

49 billion objects

2.8 trillion sources

Lots of Columns

308 columns for objects

56 columns for sources

(for now)

Database Size

Grows to >14 PB
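
As a rough cross-check, the row and column counts from the preceding slides can be multiplied out. The sketch below assumes 8 bytes per column, which is an illustrative guess and not an LSST figure. Under that assumption the raw object and source rows come to roughly 1.4 PB, well under the >14 PB total; the slides do not break down the remainder.

```python
# Back-of-envelope raw row volume from the counts given on the slides.
# BYTES_PER_COLUMN is an assumption for illustration, not an LSST figure.
BYTES_PER_COLUMN = 8
PB = 10**15  # decimal petabyte


def raw_bytes(rows: int, columns: int, bytes_per_column: int = BYTES_PER_COLUMN) -> int:
    """Raw size of a table, ignoring indexes, overhead, and copies."""
    return rows * columns * bytes_per_column


object_bytes = raw_bytes(49 * 10**9, 308)   # 49 billion objects, 308 columns
source_bytes = raw_bytes(28 * 10**11, 56)   # 2.8 trillion sources, 56 columns

print(f"objects: {object_bytes / PB:.2f} PB")
print(f"sources: {source_bytes / PB:.2f} PB")
print(f"total:   {(object_bytes + source_bytes) / PB:.2f} PB")
```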

LSST Database

How Often?

Frequency

Nightly updates

Semi-annual data releases

LSST Database

What Queries?

Queries

• All about an object
• All objects meeting criteria
• All objects near objects meeting criteria
• All objects with interesting time series
• All pairs of objects with similar time series
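
The "near objects meeting criteria" query can be sketched as a brute-force neighbor search on the sky. This is only an illustration: the field names (`id`, `ra`, `dec`, `mag`) and the in-memory list are hypothetical, and a real catalog of this size would need spatial indexing or partitioning rather than a nested loop.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def objects_near(targets, catalog, radius_deg):
    """Ids of catalog objects within radius_deg of any target (excluding itself)."""
    found = set()
    for t in targets:
        for obj in catalog:
            if obj["id"] == t["id"]:
                continue
            sep = angular_separation_deg(t["ra"], t["dec"], obj["ra"], obj["dec"])
            if sep <= radius_deg:
                found.add(obj["id"])
    return sorted(found)


# Hypothetical rows: positions in degrees plus a magnitude column.
catalog = [
    {"id": 1, "ra": 10.0, "dec": -30.0, "mag": 18.0},
    {"id": 2, "ra": 10.001, "dec": -30.0, "mag": 24.0},
    {"id": 3, "ra": 50.0, "dec": -30.0, "mag": 23.0},
]
bright = [o for o in catalog if o["mag"] < 20]   # "objects meeting criteria"
print(objects_near(bright, catalog, radius_deg=0.01))
```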

LSST Database

Unusual Needs

Unusual Needs

Flexibility

Provenance

LSST Database + MySQL

Why MySQL?
Scalability?
Performance?

LSST Database + MySQL

Why MySQL?

MySQL

Relational database management system

Open Source

Vibrant community

Strong company support

Hardware

Runs on commodity hardware

In-Memory Tables

Needed for near-real-time processing

LSST Database + MySQL

Scalability?

“MySQL Grid”

Partitioning

Large tables partitioned spatially
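
One simple way to picture spatial partitioning is a fixed grid on the sky, with each row assigned to a partition by its position. The sketch below is an assumption for illustration only: the slides do not say which scheme LSST uses, and the 10-degree cell size and the function name `partition_id` are hypothetical.

```python
# Hypothetical fixed grid: declination stripes, each split into RA chunks.
STRIPE_HEIGHT_DEG = 10.0
CHUNK_WIDTH_DEG = 10.0
NUM_STRIPES = int(180 / STRIPE_HEIGHT_DEG)        # 18 stripes from -90 to +90
CHUNKS_PER_STRIPE = int(360 / CHUNK_WIDTH_DEG)    # 36 chunks around the sky


def partition_id(ra_deg: float, dec_deg: float) -> int:
    """Map a sky position (degrees) to a partition number in [0, 648)."""
    stripe = int((dec_deg + 90.0) // STRIPE_HEIGHT_DEG)
    stripe = min(stripe, NUM_STRIPES - 1)          # dec = +90 goes in the top stripe
    chunk = int((ra_deg % 360.0) // CHUNK_WIDTH_DEG)
    chunk = min(chunk, CHUNKS_PER_STRIPE - 1)      # guard against float rounding
    return stripe * CHUNKS_PER_STRIPE + chunk


print(partition_id(15.0, -30.0))
```

An equal-angle grid like this makes cells near the poles cover less sky area than cells near the equator, so a production scheme would likely balance partitions differently.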

Replication

Dimension tables likely replicated

Needs: Distributor/Combiner

LSST will build prototype

Need long-term support
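
The distributor/combiner idea can be sketched as scatter-gather: send the same query to every partition, collect partial results, and merge them. This is a toy under stated assumptions, not the LSST prototype: partitions are plain in-memory lists, the `mag` column is hypothetical, and threads stand in for separate MySQL servers.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(partition, max_mag):
    """Per-partition work: count and sum of magnitudes below max_mag."""
    mags = [row["mag"] for row in partition if row["mag"] < max_mag]
    return len(mags), sum(mags)


def distribute_and_combine(partitions, max_mag):
    """Run the partial query on every partition, then merge into (count, mean)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: partial_stats(p, max_mag), partitions))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, (total / count if count else None)


partitions = [
    [{"mag": 18.0}, {"mag": 22.0}],
    [{"mag": 19.0}],
    [],
]
print(distribute_and_combine(partitions, max_mag=20.0))
```

The combiner merges counts and sums rather than per-partition means, because averaging the averages would weight small partitions incorrectly.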

LSST Database + MySQL

Performance?

Per-Column Indexing

2X data size

Needs: Optimizer

Efficient use of multiple (20-30) indexes

Needs: Indexes

Bitmap/compressed indexes
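
For readers unfamiliar with bitmap indexes, a minimal sketch follows. It keeps one bitmap per distinct column value, stored here as a Python integer, so that predicates on several columns combine with bitwise AND/OR. The column names are hypothetical, and no compression is applied, so this shows only the basic idea.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low-cardinality columns for five rows.
filter_band = ["g", "r", "r", "i", "g"]
is_variable = [True, False, True, True, False]

by_filter = build_bitmap_index(filter_band)
by_variable = build_bitmap_index(is_variable)

# filter_band = 'r' AND is_variable = TRUE
print(rows_of(by_filter["r"] & by_variable[True]))
# filter_band = 'g' OR filter_band = 'i'
print(rows_of(by_filter["g"] | by_filter["i"]))
```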

Needs: Storage Engine

“Shared scan” for long-running full-table queries
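
The shared-scan idea is that several full-table queries ride along on a single pass over the data instead of each reading the table separately. The sketch below is a simplified illustration with hypothetical column names; it does not show queries joining a scan already in progress, which a real storage engine would need to handle.

```python
def shared_scan(table, queries):
    """Evaluate every query predicate against each row in one pass over the table."""
    results = {name: [] for name in queries}
    for row in table:                      # the table is read exactly once
        for name, predicate in queries.items():
            if predicate(row):
                results[name].append(row["id"])
    return results


# Hypothetical rows and two concurrent full-table queries.
table = [
    {"id": 1, "mag": 18.0, "dec": -30.0},
    {"id": 2, "mag": 24.0, "dec": 5.0},
    {"id": 3, "mag": 19.5, "dec": 10.0},
]
queries = {
    "bright": lambda r: r["mag"] < 20,
    "northern": lambda r: r["dec"] > 0,
}
print(shared_scan(table, queries))
```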

Summary

Building a petabyte DB

MySQL can be a core component