
Bigdata®: Enabling the Semantic Web at Web-Scale

bigdata® SYSTAP™, LLC


Presented at ISWC 2009 © 2006-2009 All Rights Reserved
Presentation outline
• What is big data?
• Bigdata Architecture
• Bigdata RDF Database
• Performance
• Roadmap

What is “big data?”
• Big data is a new way of thinking about and processing 
massive data.
– Petabyte scale
– Commodity hardware
– Distributed processing

The origins of “big data”
• Google published several inspiring papers that have captured a huge 
mindshare.
– GFS, MapReduce, Bigtable.
• Competition has emerged among “cloud as a service” providers:
– EC2, S3, GAE, BlueCloud, Cloudera, etc.
• An increasing number of open source efforts provide cloud computing 
frameworks:
– Hadoop, Bigdata, CouchDB, Hypertable, mg4j, Cassandra….

Who has “big data?”
• USG
• Finance
• Biomedical & Pharmaceutical
• Large corporations
• Major web players
• High energy physics

http://dataspora.com/blog/tipping-points-and-big-data/
http://www.wired.com/wired/archive/14.10/cloudware.html
http://radar.oreilly.com/2009/04/linkedin-chief-scientist-on-analytics-and-bigdata.html
http://www.nature.com/nature/journal/v455/n7209/full/455001a.html
http://queue.acm.org/detail.cfm?id=1563874

Technologies that go “big”
• Distributed file systems
– GFS, S3, HDFS

• Map / reduce
– Lowers the bar for distributed computing
– Good for data locality in inputs
• E.g., documents in, hash‐partitioned full text index out.

• Sparse row stores
– High read / write concurrency using atomic row operations
– Basic data model is (see the sketch below):
• { primary key, column name, timestamp } : { value }
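
A rough, hedged illustration of this data model; the class and method names below are illustrative assumptions, not the bigdata API.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SparseRowStoreSketch {

    /** A cell address: { primary key, column name, timestamp }. */
    record CellKey(String primaryKey, String columnName, long timestamp)
            implements Comparable<CellKey> {
        public int compareTo(CellKey o) {
            int c = primaryKey.compareTo(o.primaryKey);
            if (c == 0) c = columnName.compareTo(o.columnName);
            if (c == 0) c = Long.compare(timestamp, o.timestamp);
            return c;
        }
    }

    // Rows are "sparse": only the columns actually written for a given
    // primary key are stored, each as its own timestamped cell.
    private final NavigableMap<CellKey, byte[]> cells = new TreeMap<>();

    /** An atomic row write: all columns for one primary key at one timestamp. */
    public synchronized void writeRow(String primaryKey, long timestamp,
                                      Map<String, byte[]> columns) {
        columns.forEach((name, value) ->
                cells.put(new CellKey(primaryKey, name, timestamp), value));
    }
}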

The killer big data application
• Clouds + “Open” Data = Big Data Integration
• Critical advantages
– Fast integration cycle
– Open standards
– Integrates heterogeneous data, linked data, 
structured data.
– Opportunistic exploitation of data, including data that cannot be integrated quickly enough today to derive its business value.

Bigdata® Architecture

bigdata®
• Petabyte scale
• Dynamic sharding
• Commodity hardware
• Open source, Java
• High performance
• High concurrency (MVCC)
• HA architecture
• Temporal database

Semantic web database

Key Differentiators
• Dynamic sharding
– Incrementally scale from 10s, to 100s, to 1000s of nodes.
• Temporal database
– Fast access to historical database states.
• HA Architecture
– Built-in design for high availability.

Bigdata® Services
• Centralized services: Transaction Manager, Metadata Service, Load Balancer.
• Distributed services:
– Data Services: index data, join processing.
– Client Services: distributed job execution.
• Jini – service discovery.
• Zookeeper – configuration management, global locks, and master elections.

Service Discovery
(Diagram: clients, the metadata service, the data services, and the Jini registrar, with arrows labeled “advertise”, “advertise & monitor”, and “discover & locate”.)
1. Services discover service registrars and advertise themselves.
2. Clients discover registrars, look up the metadata service, and use it to obtain locators spanning the key ranges of interest for a scale-out index.
3. Clients resolve locators to data service identifiers, then look up those data services in the service registrar.
4. Clients talk directly to the data services.
5. Client libraries encapsulate this for applications (a client-side sketch follows).
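
A minimal client-side sketch of steps 2–4, assuming hypothetical interface and method names; the real bigdata client API is not shown in this deck.

import java.util.List;
import java.util.Map;
import java.util.UUID;

class ClientDiscoverySketch {

    /** Locator for one index partition: its key range and the data service hosting it. */
    record PartitionLocator(long partitionId, UUID dataServiceId,
                            byte[] leftSeparatorKey, byte[] rightSeparatorKey) {}

    interface MetadataService {
        /** Step 2: locators spanning [fromKey, toKey) for a scale-out index. */
        List<PartitionLocator> locatorScan(String indexName, byte[] fromKey, byte[] toKey);
    }

    interface DataService {
        void rangeRead(String indexName, byte[] fromKey, byte[] toKey);
    }

    MetadataService metadataService;              // looked up via the service registrar
    Map<UUID, DataService> discoveredServices;    // step 3: service id -> data service proxy

    /** Steps 2-4: read a key range by talking directly to the shards that hold it. */
    void rangeRead(String index, byte[] fromKey, byte[] toKey) {
        for (PartitionLocator loc : metadataService.locatorScan(index, fromKey, toKey)) {
            DataService ds = discoveredServices.get(loc.dataServiceId());
            ds.rangeRead(index, loc.leftSeparatorKey(), loc.rightSeparatorKey());
        }
    }
}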

The Data Service

(Diagram: clients issue scattered writes to the data services, which absorb them on append-only journals; on overflow, data migrates into read-optimized index segments, and reads gather data from the journal and the index segments.)
• Append-only journals and read-optimized index segments are the basic building blocks (a sketch of the write/overflow cycle follows).
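
A hedged sketch of the write/overflow cycle described above; the threshold and class names are assumptions, not the bigdata implementation.

import java.util.ArrayList;
import java.util.List;

class DataServiceSketch {

    static class Journal {                 // append-only write absorption
        private long bytes;
        void append(byte[] key, byte[] value) { bytes += key.length + value.length; }
        long sizeOnDisk() { return bytes; }
    }

    static class IndexSegment {            // read-optimized, immutable
        static IndexSegment buildFrom(Journal journal) { return new IndexSegment(); }
    }

    static final long OVERFLOW_THRESHOLD = 200_000_000L;  // assumed, roughly 200MB

    private Journal liveJournal = new Journal();
    private final List<IndexSegment> segments = new ArrayList<>();

    synchronized void write(byte[] key, byte[] value) {
        liveJournal.append(key, value);
        if (liveJournal.sizeOnDisk() > OVERFLOW_THRESHOLD) {
            Journal old = liveJournal;
            liveJournal = new Journal();   // cut over to a new journal
            // Build a read-optimized index segment from the old journal's tuples;
            // in practice this is done asynchronously so writes are not blocked.
            segments.add(IndexSegment.buildFrom(old));
        }
    }
}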

Bigdata® Indices
• Dynamically key‐range partitioned B+Trees for indices
– Index entries (tuples) map unsigned byte[] keys to byte[] values.
– Tuples also carry a “delete flag” and a timestamp (see the sketch below).
• Index partitions distributed across data services on a cluster
– Located by centralized metadata service

(Diagram: a B+Tree with a root node, child nodes n0 … nN, and leaf tuples t0 … t7.)
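
The shape of a single index entry as described above; the field names here are illustrative, not the bigdata API.

class TupleSketch {
    byte[]  key;        // unsigned byte[] key; sort order defines the key-range shards
    byte[]  value;      // opaque byte[] value
    boolean deleteFlag; // tombstone marker for deleted tuples
    long    timestamp;  // revision timestamp (supports MVCC and the temporal database)
}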

Dynamic Key Range Partitioning
• Split (p0 → p1, p2): splits break down the indices dynamically as the data scale increases.
• Join (p1, p2 → p3): adjacent partitions are joined back together.
• Move (p3 → p4): moves redistribute the data onto existing or new nodes in the cluster (a decision sketch follows).
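
A hedged sketch of the kind of decision logic implied above; the thresholds and names are illustrative assumptions, not the actual bigdata policy.

enum ShardAction { SPLIT, JOIN, MOVE, NONE }

class ShardMaintenanceSketch {

    static final long TARGET_SHARD_BYTES = 200_000_000L;   // assumed nominal shard size

    ShardAction decide(long shardBytesOnDisk, double hostLoad, double clusterMeanLoad) {
        if (shardBytesOnDisk > 2 * TARGET_SHARD_BYTES)  return ShardAction.SPLIT; // too big
        if (shardBytesOnDisk < TARGET_SHARD_BYTES / 2)  return ShardAction.JOIN;  // too small
        if (hostLoad > 1.5 * clusterMeanLoad)           return ShardAction.MOVE;  // hot host
        return ShardAction.NONE;
    }
}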

Dynamic Key Range Partitioning
• Initial conditions place a single index partition on an arbitrary host, representing the entire B+Tree.
(Diagram: the Metadata Service maps the key range ([], ∞) to partition p0 on Data Service 1.)

Dynamic Key Range Partitioning
• Writes cause the partition to grow. Eventually its size on disk will exceed a preconfigured threshold.
(Diagram: partition p0 on Data Service 1 has grown; the Metadata Service still maps ([], ∞) to p0.)

Dynamic Key Range Partitioning
• Instead of a simple two-way split, the initial partition is “scatter-split” so that all data services can start managing data.
• Nine data services in this example.
(Diagram: p0 on Data Service 1 is scatter-split into partitions p1 … p9, registered with the Metadata Service.)

Dynamic Key Range Partitioning
• The newly created partitions are then moved to the various data services.
• Subsequent splits are two-way, and moves occur based on relative server load (decided by the load balancer service).
(Diagram: partitions p1 … p9 end up distributed across Data Services 1 through 9.)

Bigdata® Scale‐Out Math
(Diagram: a three-level tree of partitions.)
• L0 metadata: a 200M L0 metadata partition with 256-byte records.
• L1 metadata: 200M L1 metadata partitions with 1024-byte records.
• Index partitions (p0, p1, …, pn): 200M per application index partition.
• L0 alone can address 16 Terabytes.
• L1 can address 30 Exabytes per index.
• Even larger address spaces if L0 > 200M.
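
A rough back-of-the-envelope check of the L1 figure, assuming “200M” means roughly 2×10^8 bytes per partition:

$$
\frac{2\times10^{8}}{256}\ \text{L1 partitions}\;\times\;\frac{2\times10^{8}}{1024}\ \text{index partitions each}\;\times\;2\times10^{8}\ \text{bytes}\;\approx\;3\times10^{19}\ \text{bytes}\;\approx\;30\ \text{EB}
$$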

Bigdata® RDF Database

Bigdata® RDF Database
• Covering indices (à la YARS, etc.).
• Three database modes:
– triples, provenance, or quads.
• Very high data rates
• High‐level query (SPARQL)

Covering Indices
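
The original slide here is a diagram that did not survive extraction. As a hedged illustration of the idea (the SPO/POS/OSP key orders below follow the usual covering-index scheme for triple stores such as YARS; the deck itself only states that triple mode uses three statement indices): every triple pattern can be answered by a single key-range scan on whichever index puts the bound positions in its key prefix.

enum TripleIndex { SPO, POS, OSP }

class AccessPathSketch {
    /** Pick the index whose key order turns the bound positions into a key prefix. */
    static TripleIndex choose(boolean sBound, boolean pBound, boolean oBound) {
        if (sBound && !pBound && oBound) return TripleIndex.OSP; // prefix (o, s)
        if (sBound)                      return TripleIndex.SPO; // prefix (s), (s,p), (s,p,o)
        if (pBound)                      return TripleIndex.POS; // prefix (p), (p,o)
        if (oBound)                      return TripleIndex.OSP; // prefix (o)
        return TripleIndex.SPO;                                  // no constants: scan any index
    }
}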

RDF Database Modes
• Triples
– 2 lexicon indices, 3 statement indices.
– RDFS+ inference.
– All you need for lots of applications.
• Provenance
– Datum level provenance.
– Query for statement metadata using SPARQL.
– No complex reification.
– No new indices.
– RDFS+ inference.
• Quads
– Named graph support.
– Useful for lots of things, including some provenance schemes.
– 6 statement indices, so nearly twice the footprint on the disk.

Statement Level Provenance
• Important to know where data came from in a 
mashup

• <mike, memberOf, SYSTAP>
• <http://www.systap.com, sourceOf, ....>

• But you CANNOT say that in RDF.

RDF “Reification”
• Creates a “model” of the statement.

<_s1, subject, mike>


<_s1, predicate, memberOf>
<_s1, object, SYSTAP>
<_s1, type, Statement>

• Then you can say,
<http://www.systap.com, sourceOf, _s1>

bigdata® Statement Identifiers (SIDs)
• Statement identifiers let you do exactly what 
you want:
<mike, memberOf, SYSTAP, _s1>
<http://www.systap.com, sourceOf, _s1>

• SIDs look just like blank nodes
• And you can use them in SPARQL
construct { ?s <memberOf> ?o . ?s1 ?p1 ?sid . }
where {
?s1 ?p1 ?sid .
GRAPH ?sid { ?s <memberOf> ?o }
}

Bulk Data Load
• Very high data load rates
– 1B triples in under an hour (10 data nodes, 4 clients)
• Executed as a distributed job
– Read data from a file system, the web, HDFS, etc.
• Database remains available for query during load
– Read from historical commit points.

• Lots of work was required to get high throughput!

Identifying and Resolving Performance 
Bottlenecks
(Chart: data load throughput in triples per second, January through June.)
• Baseline on the cluster: 30k triples per second.
• Increased write service concurrency: 70k.
• Faster and smarter moves for shards: ~100k.
• Eliminated machine and shard hot spots; asynchronous write API: 130k.
• Asynchronous writes for TERM2ID; reduced RAM demands; increased parser threads. 13B triples loaded.
• 300,000 triples per second (less than one hour for LUBM 8000).

Bigdata® U8000 Data Load
(Chart: told triples loaded, in billions, versus elapsed time in minutes, reaching roughly a billion told triples in under an hour at about 310k triples per second. Annotations mark the initial scatter splits and the later ID2TERM index partition splits.)

Remaining Bottlenecks
• Index partition splits
– Tend to occur together.
– Fix is to schedule splits proactively.
• Indices
– Faster index segment builds.
– Various hotspots (shared concurrent LRU).
• Clients
– Buffer index writes for the target host, not the target shard.
• Can we double performance again?

Asynchronous index write API
• Shared, asynchronous write buffers (sketched below)
– Decouples client from latency of write requests
– Transparently handles index partition splits, moves, etc.
– Filters duplicates before RMI
– Chunkier writes on indices
– Much higher throughput
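
A minimal sketch of the buffering idea above, assuming hypothetical names and String keys for simplicity; the real API also handles index partition splits and moves transparently.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncIndexWriterSketch {

    /** One chunked remote write per call (e.g. over RMI). */
    interface ShardWriter { void writeChunk(List<String> keys); }

    static final int CHUNK_SIZE = 10_000;

    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    private final ShardWriter shard;

    AsyncIndexWriterSketch(ShardWriter shard) {
        this.shard = shard;
        Thread drainer = new Thread(this::drain);   // decouple clients from remote latency
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Clients just enqueue; they never wait on the remote write. */
    void add(String key) { buffer.add(key); }

    private void drain() {
        List<String> chunk = new ArrayList<>(CHUNK_SIZE);
        while (true) {
            try {
                chunk.add(buffer.take());
                buffer.drainTo(chunk, CHUNK_SIZE - chunk.size());
                // Filter duplicates before the remote call, then send one chunky write.
                Set<String> unique = new LinkedHashSet<>(chunk);
                shard.writeChunk(new ArrayList<>(unique));
                chunk.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}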

Distributed Data Load Job

(Diagram: a job master feeds a task queue; clients take tasks from the queue and scatter their writes across the data services.)
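
A hedged sketch of the job structure shown above; the class names and the division of labor are assumptions, not the bigdata job API.

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class LoadJobSketch {

    interface AsyncWriter { void add(String statement); }    // see the buffer sketch above

    /** Job master: enumerate the resources to be loaded into a shared task queue. */
    static BlockingQueue<String> master(List<String> resources) {
        return new LinkedBlockingQueue<>(resources);
    }

    /** Client worker: drain tasks, parse each resource, push statements to the buffers. */
    static void clientWorker(BlockingQueue<String> tasks, AsyncWriter writer) {
        String resource;
        while ((resource = tasks.poll()) != null) {
            for (String statement : parse(resource)) {
                writer.add(statement);      // buffered, then scattered to the data services
            }
        }
    }

    static List<String> parse(String resource) {
        return List.of();                   // placeholder for an RDF parser
    }
}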

Writes are buffered inside the client

(Diagram: the same pipeline, with per-index-partition write buffers P1 … Pn inside each client; the buffered writes are scattered from the clients to the data services.)

Client scatters writes against indices

(Diagram: a client's writes against the SPO index are split by shard into chunks SPO#1, SPO#2, and SPO#3 and sent to the data services hosting partitions P1, P2, and P3.)

Query evaluation
• Nested subquery
– Clients demand data from the shards, process joins locally.
– Can generate a huge number of RMI requests.
• Pipeline joins
– Map binding sets over the shards, executing joins close to the data (see the sketch after this list).
– 50x faster for distributed query (based on earlier data distribution patterns).
• New join algorithms
– E.g., push statement patterns
– Latency and resource requirements
– Etc.
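
A hedged sketch of the pipeline join idea (names are illustrative): chunks of intermediate binding sets are shipped to the data services holding the shards of the next access path, and each join runs there rather than at the client.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PipelineJoinSketch {

    /** One intermediate solution: variable name -> bound term identifier. */
    record BindingSet(Map<String, Long> bindings) {}

    /** Executes one join operator next to a shard of the access path (on a data service). */
    interface ShardJoinTask {
        List<BindingSet> join(List<BindingSet> chunk);
    }

    /** Drive one join operator: map each chunk of binding sets onto the shards. */
    static List<BindingSet> runOperator(List<List<BindingSet>> inputChunks,
                                        List<ShardJoinTask> shards) {
        List<BindingSet> out = new ArrayList<>();
        for (List<BindingSet> chunk : inputChunks) {
            // In the real system each chunk would be split by the shard separator
            // keys so that a piece goes only to the shard that can join it.
            for (ShardJoinTask shard : shards) {
                out.addAll(shard.join(chunk));
            }
        }
        return out;
    }
}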

Preparing a query
Original query:

SELECT ?x WHERE {
?x a ub:GraduateStudent ;
ub:takesCourse
<http://www.Department0.University0.edu/GraduateCourse0>.
}

Translated query:

query :- (x 8 256) ^ (x 400 3048)

Query execution plan (access paths selected, joins reordered):

query :- pos(x 400 3048) ^ spo(x 8 256)
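
A hedged sketch of these two steps (dictionary translation to term identifiers via the lexicon, then join reordering); the method names are assumptions, and ordering by index range counts is one plausible selectivity estimate rather than a statement from the deck.

import java.util.Comparator;
import java.util.List;

class QueryPrepSketch {

    interface Lexicon { long termToId(String term); }                     // TERM2ID lookup
    interface StatementIndex { long rangeCount(long s, long p, long o); } // cheap B+Tree count

    /** A statement pattern over term identifiers; 0L marks a variable. */
    record Pattern(long s, long p, long o) {}

    /** Dictionary translation: terms (null = variable) -> term identifiers. */
    static Pattern translate(Lexicon lexicon, String s, String p, String o) {
        return new Pattern(s == null ? 0L : lexicon.termToId(s),
                           p == null ? 0L : lexicon.termToId(p),
                           o == null ? 0L : lexicon.termToId(o));
    }

    /** Reorder the patterns so the most selective access path runs first. */
    static List<Pattern> reorder(List<Pattern> patterns, StatementIndex index) {
        return patterns.stream()
                .sorted(Comparator.comparingLong(p -> index.rangeCount(p.s(), p.p(), p.o())))
                .toList();
    }
}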

Pipeline join execution
(Diagram: a join master task on the client maps the pipeline join pos(x 400 3048) ^ spo(x 8 256) over the POS and SPO shards (POS#1–3, SPO#1–3) hosted on the data services, so each join executes next to its shard.)

Query Performance
10/22/2009 #trials=10 #parallel=1
Query Time Result# delta-t % change
query1 254 4 24 10%
query2 8,212,149 2,528 (10,227,868) -55%
query3 194 6 (67) -26%
query4 876 34 (422) -33%
query5 1932 719 (75) -4%
query6 713,445 69,222,196 (2,477,634) -78%
query7 838 61 (29) -3%
query8 3239 6463 (2,539) -44%
query9 2,851,182 1,379,952 (2,699,119) -49%
query10 121 0 (11) -8%
query11 261 0 (88) -25%
query12 1709 0 (227) -12%
query13 47 0 9 24%
query14 646,517 63,400,587 (2,426,916) -79%
Total 12,432,764 134,012,550 (17,834,962) -59%

• Cluster of 10 nodes.
• 60% improvement in one week.

Bigdata® Roadmap
• Parallel materialization of RDFS closure [1,2]
• Distributed query optimization
• High‐Availability architecture

[1] Jesse Weaver and James A. Hendler. Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples.
[2] Jacopo Urbani, Spyros Kotoulas, Eyal Oren, and Frank van Harmelen (Vrije Universiteit Amsterdam). Scalable Distributed Reasoning using MapReduce.

Bryan Thompson
Chief Scientist
SYSTAP, LLC
bryan@systap.com

bigdata®

Flexible
Reliable
Affordable
Web‐scale computing.

