
Bigdata®: Enabling the Semantic Web at Web-Scale

bigdata® SYSTAP™, LLC


Presented at ISWC 2009 © 2006-2009 All Rights Reserved
Presentation outline
• What is big data?
• Bigdata Architecture
• Bigdata RDF Database
• Performance
• Roadmap

What is “big data?”
• Big data is a new way of thinking about and processing 
massive data.
– Petabyte scale
– Commodity hardware
– Distributed processing

The origins of “big data”
• Google published several inspiring papers that have captured a huge 
mindshare.
– GFS, MapReduce, Bigtable.
• Competition has emerged among “cloud as a service” providers:
– EC2, S3, GAE, BlueCloud, Cloudera, etc.
• An increasing number of open source efforts provide cloud computing 
frameworks:
– Hadoop, Bigdata, CouchDB, Hypertable, mg4j, Cassandra….

Who has “big data?”
• USG
• Finance
• Biomedical & Pharmaceutical
• Large corporations
• Major web players
• High energy physics

http://dataspora.com/blog/tipping-points-and-big-data/
http://www.wired.com/wired/archive/14.10/cloudware.html
http://radar.oreilly.com/2009/04/linkedin-chief-scientist-on-analytics-and-bigdata.html
http://www.nature.com/nature/journal/v455/n7209/full/455001a.html
http://queue.acm.org/detail.cfm?id=1563874

Technologies that go “big”
• Distributed file systems
– GFS, S3, HDFS

• Map / reduce
– Lowers the bar for distributed computing
– Good for data locality in inputs
• E.g., documents in, hash‐partitioned full text index out.

• Sparse row stores
– High read / write concurrency using atomic row operations
– Basic data model is (see the sketch below):
• { primary key, column name, timestamp } : { value }
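
A rough, hedged illustration of this data model; the class and method names below are illustrative assumptions, not the bigdata API.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SparseRowStoreSketch {

    /** A cell address: { primary key, column name, timestamp }. */
    record CellKey(String primaryKey, String columnName, long timestamp)
            implements Comparable<CellKey> {
        public int compareTo(CellKey o) {
            int c = primaryKey.compareTo(o.primaryKey);
            if (c == 0) c = columnName.compareTo(o.columnName);
            if (c == 0) c = Long.compare(timestamp, o.timestamp);
            return c;
        }
    }

    // Rows are "sparse": only the columns actually written for a given
    // primary key are stored, each as its own timestamped cell.
    private final NavigableMap<CellKey, byte[]> cells = new TreeMap<>();

    /** An atomic row write: all columns for one primary key at one timestamp. */
    public synchronized void writeRow(String primaryKey, long timestamp,
                                      Map<String, byte[]> columns) {
        columns.forEach((name, value) ->
                cells.put(new CellKey(primaryKey, name, timestamp), value));
    }
}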

The killer big data application
• Clouds + “Open” Data = Big Data Integration
• Critical advantages
– Fast integration cycle
– Open standards
– Integrates heterogeneous data, linked data, 
structured data.
– Opportunistic exploitation of data, including data that cannot be integrated quickly enough today to derive its business value.

Bigdata® Architecture

bigdata®
• Petabyte scale
• Dynamic sharding
• Commodity hardware
• Open source, Java
• High performance
• High concurrency (MVCC)
• HA architecture
• Temporal database

Semantic web database

Key Differentiators
• Dynamic sharding
– Incrementally scale from 10s, to 100s, to 1000s of nodes.
• Temporal database
– Fast access to historical database states.
• HA Architecture
– Built-in design for high availability.

Bigdata® Services
• Centralized services: Transaction Manager, Metadata Service, Load Balancer.
• Distributed services:
– Data Services: index data, join processing.
– Client Services: distributed job execution.
• Jini – service discovery.
• Zookeeper – configuration management, global locks, and master elections.

Service Discovery
(Diagram: clients, the metadata service, the data services, and the Jini registrar, with arrows labeled “advertise”, “advertise & monitor”, and “discover & locate”.)
1. Services discover service registrars and advertise themselves.
2. Clients discover registrars, look up the metadata service, and use it to obtain locators spanning the key ranges of interest for a scale-out index.
3. Clients resolve locators to data service identifiers, then look up those data services in the service registrar.
4. Clients talk directly to the data services.
5. Client libraries encapsulate this for applications (a client-side sketch follows).
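
A minimal client-side sketch of steps 2–4, assuming hypothetical interface and method names; the real bigdata client API is not shown in this deck.

import java.util.List;
import java.util.Map;
import java.util.UUID;

class ClientDiscoverySketch {

    /** Locator for one index partition: its key range and the data service hosting it. */
    record PartitionLocator(long partitionId, UUID dataServiceId,
                            byte[] leftSeparatorKey, byte[] rightSeparatorKey) {}

    interface MetadataService {
        /** Step 2: locators spanning [fromKey, toKey) for a scale-out index. */
        List<PartitionLocator> locatorScan(String indexName, byte[] fromKey, byte[] toKey);
    }

    interface DataService {
        void rangeRead(String indexName, byte[] fromKey, byte[] toKey);
    }

    MetadataService metadataService;              // looked up via the service registrar
    Map<UUID, DataService> discoveredServices;    // step 3: service id -> data service proxy

    /** Steps 2-4: read a key range by talking directly to the shards that hold it. */
    void rangeRead(String index, byte[] fromKey, byte[] toKey) {
        for (PartitionLocator loc : metadataService.locatorScan(index, fromKey, toKey)) {
            DataService ds = discoveredServices.get(loc.dataServiceId());
            ds.rangeRead(index, loc.leftSeparatorKey(), loc.rightSeparatorKey());
        }
    }
}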

The Data Service

(Diagram: clients issue scattered writes to the data services, which absorb them on append-only journals; on overflow, data migrates into read-optimized index segments, and reads gather data from the journal and the index segments.)
• Append-only journals and read-optimized index segments are the basic building blocks (a sketch of the write/overflow cycle follows).
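
A hedged sketch of the write/overflow cycle described above; the threshold and class names are assumptions, not the bigdata implementation.

import java.util.ArrayList;
import java.util.List;

class DataServiceSketch {

    static class Journal {                 // append-only write absorption
        private long bytes;
        void append(byte[] key, byte[] value) { bytes += key.length + value.length; }
        long sizeOnDisk() { return bytes; }
    }

    static class IndexSegment {            // read-optimized, immutable
        static IndexSegment buildFrom(Journal journal) { return new IndexSegment(); }
    }

    static final long OVERFLOW_THRESHOLD = 200_000_000L;  // assumed, roughly 200MB

    private Journal liveJournal = new Journal();
    private final List<IndexSegment> segments = new ArrayList<>();

    synchronized void write(byte[] key, byte[] value) {
        liveJournal.append(key, value);
        if (liveJournal.sizeOnDisk() > OVERFLOW_THRESHOLD) {
            Journal old = liveJournal;
            liveJournal = new Journal();   // cut over to a new journal
            // Build a read-optimized index segment from the old journal's tuples;
            // in practice this is done asynchronously so writes are not blocked.
            segments.add(IndexSegment.buildFrom(old));
        }
    }
}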

Bigdata® Indices
• Dynamically key‐range partitioned B+Trees for indices
– Index entries (tuples) map unsigned byte[] keys to byte[] values.
– Tuples also carry a “delete flag” and a timestamp (see the sketch below).
• Index partitions distributed across data services on a cluster
– Located by centralized metadata service

(Diagram: a B+Tree with a root node, child nodes n0 … nN, and leaf tuples t0 … t7.)
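
The shape of a single index entry as described above; the field names here are illustrative, not the bigdata API.

class TupleSketch {
    byte[]  key;        // unsigned byte[] key; sort order defines the key-range shards
    byte[]  value;      // opaque byte[] value
    boolean deleteFlag; // tombstone marker for deleted tuples
    long    timestamp;  // revision timestamp (supports MVCC and the temporal database)
}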

Dynamic Key Range Partitioning
• Split (p0 → p1, p2): splits break down the indices dynamically as the data scale increases.
• Join (p1, p2 → p3): adjacent partitions are joined back together.
• Move (p3 → p4): moves redistribute the data onto existing or new nodes in the cluster (a decision sketch follows).
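
A hedged sketch of the kind of decision logic implied above; the thresholds and names are illustrative assumptions, not the actual bigdata policy.

enum ShardAction { SPLIT, JOIN, MOVE, NONE }

class ShardMaintenanceSketch {

    static final long TARGET_SHARD_BYTES = 200_000_000L;   // assumed nominal shard size

    ShardAction decide(long shardBytesOnDisk, double hostLoad, double clusterMeanLoad) {
        if (shardBytesOnDisk > 2 * TARGET_SHARD_BYTES)  return ShardAction.SPLIT; // too big
        if (shardBytesOnDisk < TARGET_SHARD_BYTES / 2)  return ShardAction.JOIN;  // too small
        if (hostLoad > 1.5 * clusterMeanLoad)           return ShardAction.MOVE;  // hot host
        return ShardAction.NONE;
    }
}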

Dynamic Key Range Partitioning
• Initial conditions place a single index partition on an arbitrary host, representing the entire B+Tree.
(Diagram: the Metadata Service maps the key range ([], ∞) to partition p0 on Data Service 1.)

Dynamic Key Range Partitioning
• Writes cause the partition to grow. Eventually its size on disk will exceed a preconfigured threshold.
(Diagram: partition p0 on Data Service 1 has grown; the Metadata Service still maps ([], ∞) to p0.)

Dynamic Key Range Partitioning
• Instead of a simple two-way split, the initial partition is “scatter-split” so that all data services can start managing data.
• Nine data services in this example.
(Diagram: p0 on Data Service 1 is scatter-split into partitions p1 … p9, registered with the Metadata Service.)

Dynamic Key Range Partitioning
• The newly created partitions are then moved to the various data services.
• Subsequent splits are two-way, and moves occur based on relative server load (decided by the load balancer service).
(Diagram: partitions p1 … p9 end up distributed across Data Services 1 through 9.)

Bigdata® Scale‐Out Math
(Diagram: a three-level tree of partitions.)
• L0 metadata: a 200M L0 metadata partition with 256-byte records.
• L1 metadata: 200M L1 metadata partitions with 1024-byte records.
• Index partitions (p0, p1, …, pn): 200M per application index partition.
• L0 alone can address 16 Terabytes.
• L1 can address 30 Exabytes per index.
• Even larger address spaces if L0 > 200M.
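
A rough back-of-the-envelope check of the L1 figure, assuming “200M” means roughly 2×10^8 bytes per partition:

$$
\frac{2\times10^{8}}{256}\ \text{L1 partitions}\;\times\;\frac{2\times10^{8}}{1024}\ \text{index partitions each}\;\times\;2\times10^{8}\ \text{bytes}\;\approx\;3\times10^{19}\ \text{bytes}\;\approx\;30\ \text{EB}
$$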

Bigdata® RDF Database

Bigdata® RDF Database
• Covering indices (à la YARS, etc.).
• Three database modes:
– triples, provenance, or quads.
• Very high data rates
• High‐level query (SPARQL)

Covering Indices
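
The original slide here is a diagram that did not survive extraction. As a hedged illustration of the idea (the SPO/POS/OSP key orders below follow the usual covering-index scheme for triple stores such as YARS; the deck itself only states that triple mode uses three statement indices): every triple pattern can be answered by a single key-range scan on whichever index puts the bound positions in its key prefix.

enum TripleIndex { SPO, POS, OSP }

class AccessPathSketch {
    /** Pick the index whose key order turns the bound positions into a key prefix. */
    static TripleIndex choose(boolean sBound, boolean pBound, boolean oBound) {
        if (sBound && !pBound && oBound) return TripleIndex.OSP; // prefix (o, s)
        if (sBound)                      return TripleIndex.SPO; // prefix (s), (s,p), (s,p,o)
        if (pBound)                      return TripleIndex.POS; // prefix (p), (p,o)
        if (oBound)                      return TripleIndex.OSP; // prefix (o)
        return TripleIndex.SPO;                                  // no constants: scan any index
    }
}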

RDF Database Modes
• Triples
– 2 lexicon indices, 3 statement indices.
– RDFS+ inference.
– All you need for lots of applications.
• Provenance
– Datum level provenance.
– Query for statement metadata using SPARQL.
– No complex reification.
– No new indices.
– RDFS+ inference.
• Quads
– Named graph support.
– Useful for lots of things, including some provenance schemes.
– 6 statement indices, so nearly twice the footprint on the disk.

Statement Level Provenance
• Important to know where data came from in a 
mashup

• <mike, memberOf, SYSTAP>
• <http://www.systap.com, sourceOf, ....>

• But you CANNOT say that in RDF.

RDF “Reification”
• Creates a “model” of the statement.

<_s1, subject, mike>


<_s1, predicate, memberOf>
<_s1, object, SYSTAP>
<_s1, type, Statement>

• Then you can say,
<http://www.systap.com, sourceOf, _s1>

bigdata® Statement Identifiers (SIDs)
• Statement identifiers let you do exactly what 
you want:
<mike, memberOf, SYSTAP, _s1>
<http://www.systap.com, sourceOf, _s1>

• SIDs look just like blank nodes
• And you can use them in SPARQL
construct { ?s <memberOf> ?o . ?s1 ?p1 ?sid . }
where {
?s1 ?p1 ?sid .
GRAPH ?sid { ?s <memberOf> ?o }
}

Bulk Data Load
• Very high data load rates
– 1B triples in under an hour (10 data nodes, 4 clients)
• Executed as a distributed job
– Read data from a file system, the web, HDFS, etc.
• Database remains available for query during load
– Read from historical commit points.

• Lots of work was required to get high throughput!

Identifying and Resolving Performance 
Bottlenecks
(Chart: data load throughput in triples per second, January through June.)
• Baseline on the cluster: 30k triples per second.
• Increased write service concurrency: 70k.
• Faster and smarter moves for shards: ~100k.
• Eliminated machine and shard hot spots; asynchronous write API: 130k.
• Asynchronous writes for TERM2ID; reduced RAM demands; increased parser threads. 13B triples loaded.
• 300,000 triples per second (less than one hour for LUBM 8000).

Bigdata® U8000 Data Load
(Chart: told triples loaded, in billions, versus elapsed time in minutes, reaching roughly a billion told triples in under an hour at about 310k triples per second. Annotations mark the initial scatter splits and the later ID2TERM index partition splits.)

Remaining Bottlenecks
• Index partition splits
– Tend to occur together.
– Fix is to schedule splits proactively.
• Indices
– Faster index segment builds.
– Various hotspots (shared concurrent LRU).
• Clients
– Buffer index writes for the target host, not the target shard.
• Can we double performance again?

Asynchronous index write API
• Shared, asynchronous write buffers (sketched below)
– Decouples client from latency of write requests
– Transparently handles index partition splits, moves, etc.
– Filters duplicates before RMI
– Chunkier writes on indices
– Much higher throughput
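
A minimal sketch of the buffering idea above, assuming hypothetical names and String keys for simplicity; the real API also handles index partition splits and moves transparently.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncIndexWriterSketch {

    /** One chunked remote write per call (e.g. over RMI). */
    interface ShardWriter { void writeChunk(List<String> keys); }

    static final int CHUNK_SIZE = 10_000;

    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    private final ShardWriter shard;

    AsyncIndexWriterSketch(ShardWriter shard) {
        this.shard = shard;
        Thread drainer = new Thread(this::drain);   // decouple clients from remote latency
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Clients just enqueue; they never wait on the remote write. */
    void add(String key) { buffer.add(key); }

    private void drain() {
        List<String> chunk = new ArrayList<>(CHUNK_SIZE);
        while (true) {
            try {
                chunk.add(buffer.take());
                buffer.drainTo(chunk, CHUNK_SIZE - chunk.size());
                // Filter duplicates before the remote call, then send one chunky write.
                Set<String> unique = new LinkedHashSet<>(chunk);
                shard.writeChunk(new ArrayList<>(unique));
                chunk.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}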

Distributed Data Load Job

(Diagram: a job master feeds a task queue; clients take tasks from the queue and scatter their writes across the data services.)
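
A hedged sketch of the job structure shown above; the class names and the division of labor are assumptions, not the bigdata job API.

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class LoadJobSketch {

    interface AsyncWriter { void add(String statement); }    // see the buffer sketch above

    /** Job master: enumerate the resources to be loaded into a shared task queue. */
    static BlockingQueue<String> master(List<String> resources) {
        return new LinkedBlockingQueue<>(resources);
    }

    /** Client worker: drain tasks, parse each resource, push statements to the buffers. */
    static void clientWorker(BlockingQueue<String> tasks, AsyncWriter writer) {
        String resource;
        while ((resource = tasks.poll()) != null) {
            for (String statement : parse(resource)) {
                writer.add(statement);      // buffered, then scattered to the data services
            }
        }
    }

    static List<String> parse(String resource) {
        return List.of();                   // placeholder for an RDF parser
    }
}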

Writes are buffered inside the client

(Diagram: the same pipeline, with per-index-partition write buffers P1 … Pn inside each client; the buffered writes are scattered from the clients to the data services.)

Client scatters writes against indices

(Diagram: a client's writes against the SPO index are split by shard into chunks SPO#1, SPO#2, and SPO#3 and sent to the data services hosting partitions P1, P2, and P3.)

Query evaluation
• Nested subquery
– Clients demand data from the shards, process joins locally.
– Can generate a huge number of RMI requests.
• Pipeline joins
– Map binding sets over the shards, executing joins close to the data (see the sketch after this list).
– 50x faster for distributed query (based on earlier data distribution patterns).
• New join algorithms
– E.g., push statement patterns
– Latency and resource requirements
– Etc.
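
A hedged sketch of the pipeline join idea (names are illustrative): chunks of intermediate binding sets are shipped to the data services holding the shards of the next access path, and each join runs there rather than at the client.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PipelineJoinSketch {

    /** One intermediate solution: variable name -> bound term identifier. */
    record BindingSet(Map<String, Long> bindings) {}

    /** Executes one join operator next to a shard of the access path (on a data service). */
    interface ShardJoinTask {
        List<BindingSet> join(List<BindingSet> chunk);
    }

    /** Drive one join operator: map each chunk of binding sets onto the shards. */
    static List<BindingSet> runOperator(List<List<BindingSet>> inputChunks,
                                        List<ShardJoinTask> shards) {
        List<BindingSet> out = new ArrayList<>();
        for (List<BindingSet> chunk : inputChunks) {
            // In the real system each chunk would be split by the shard separator
            // keys so that a piece goes only to the shard that can join it.
            for (ShardJoinTask shard : shards) {
                out.addAll(shard.join(chunk));
            }
        }
        return out;
    }
}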

Preparing a query
Original query:

SELECT ?x WHERE {
?x a ub:GraduateStudent ;
ub:takesCourse
<http://www.Department0.University0.edu/GraduateCourse0>.
}

Translated query:

query :- (x 8 256) ^ (x 400 3048)

Query execution plan (access paths selected, joins reordered):

query :- pos(x 400 3048) ^ spo(x 8 256)
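
A hedged sketch of these two steps (dictionary translation to term identifiers via the lexicon, then join reordering); the method names are assumptions, and ordering by index range counts is one plausible selectivity estimate rather than a statement from the deck.

import java.util.Comparator;
import java.util.List;

class QueryPrepSketch {

    interface Lexicon { long termToId(String term); }                     // TERM2ID lookup
    interface StatementIndex { long rangeCount(long s, long p, long o); } // cheap B+Tree count

    /** A statement pattern over term identifiers; 0L marks a variable. */
    record Pattern(long s, long p, long o) {}

    /** Dictionary translation: terms (null = variable) -> term identifiers. */
    static Pattern translate(Lexicon lexicon, String s, String p, String o) {
        return new Pattern(s == null ? 0L : lexicon.termToId(s),
                           p == null ? 0L : lexicon.termToId(p),
                           o == null ? 0L : lexicon.termToId(o));
    }

    /** Reorder the patterns so the most selective access path runs first. */
    static List<Pattern> reorder(List<Pattern> patterns, StatementIndex index) {
        return patterns.stream()
                .sorted(Comparator.comparingLong(p -> index.rangeCount(p.s(), p.p(), p.o())))
                .toList();
    }
}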

Pipeline join execution
(Diagram: a join master task on the client maps the pipeline join pos(x 400 3048) ^ spo(x 8 256) over the POS and SPO shards (POS#1–3, SPO#1–3) hosted on the data services, so each join executes next to its shard.)

Query Performance
10/22/2009 #trials=10 #parallel=1
Query Time Result# delta-t % change
query1 254 4 24 10%
query2 8,212,149 2,528 (10,227,868) -55%
query3 194 6 (67) -26%
query4 876 34 (422) -33%
query5 1932 719 (75) -4%
query6 713,445 69,222,196 (2,477,634) -78%
query7 838 61 (29) -3%
query8 3239 6463 (2,539) -44%
query9 2,851,182 1,379,952 (2,699,119) -49%
query10 121 0 (11) -8%
query11 261 0 (88) -25%
query12 1709 0 (227) -12%
query13 47 0 9 24%
query14 646,517 63,400,587 (2,426,916) -79%
Total 12,432,764 134,012,550 (17,834,962) -59%

• Cluster of 10 nodes.
• 60% improvement in one week.

Bigdata® Roadmap
• Parallel materialization of RDFS closure [1,2]
• Distributed query optimization
• High‐Availability architecture

[1] Jesse Weaver and James A. Hendler. Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples.
[2] Jacopo Urbani, Spyros Kotoulas, Eyal Oren, and Frank van Harmelen (Vrije Universiteit Amsterdam). Scalable Distributed Reasoning using MapReduce.

Bryan Thompson
Chief Scientist
SYSTAP, LLC
bryan@systap.com

bigdata®

Flexible
Reliable
Affordable
Web‐scale computing.

