Enabling the Semantic Web at Web‐Scale
http://dataspora.com/blog/tipping‐points‐and‐big‐data/
http://www.wired.com/wired/archive/14.10/cloudware.html
http://radar.oreilly.com/2009/04/linkedin‐chief‐scientist‐on‐analytics‐and‐bigdata.html
http://www.nature.com/nature/journal/v455/n7209/full/455001a.html
http://queue.acm.org/detail.cfm?id=1563874
• Map / reduce
– Lowers the bar for distributed computing
– Good for data locality in inputs
• E.g., documents in, hash‐partitioned full text index out.
• Sparse row stores
– High read / write concurrency using atomic row operations
– Basic data model is
• { primary key, column name, timestamp } : { value }
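The data model above can be sketched as a toy in-memory store. This is only an illustration under stated assumptions, not the bigdata API: the class name `SparseRowStore`, its `write` and `read` methods, and the single lock standing in for per-row atomicity are all hypothetical.

```python
import threading
from collections import defaultdict


class SparseRowStore:
    """Toy sparse row store: {primary key, column name, timestamp} : {value}.

    Hypothetical sketch; names and behavior are assumptions for illustration.
    """

    def __init__(self):
        # primary key -> {(column name, timestamp): value}
        self._rows = defaultdict(dict)
        # One lock stands in for atomic row operations in this sketch.
        self._lock = threading.Lock()

    def write(self, key, columns, timestamp):
        """Atomically write several columns of one row at a given timestamp."""
        with self._lock:
            row = self._rows[key]
            for name, value in columns.items():
                row[(name, timestamp)] = value

    def read(self, key, as_of=None):
        """Return the newest value of each column, optionally as of a timestamp."""
        with self._lock:
            row = dict(self._rows.get(key, {}))
        latest = {}
        for (name, ts), value in row.items():
            if as_of is not None and ts > as_of:
                continue  # ignore versions written after the requested time
            if name not in latest or ts > latest[name][0]:
                latest[name] = (ts, value)
        return {name: value for name, (ts, value) in latest.items()}


store = SparseRowStore()
store.write("mike", {"memberOf": "SYSTAP"}, timestamp=1)
store.write("mike", {"memberOf": "Other", "email": "m@example.org"}, timestamp=2)
```

Rows are sparse because a row only stores the columns that were actually written, and older versions remain readable by timestamp.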
Semantic web database
Client Services
- Distributed job execution
[Figure: scattered writes to the journal; journal overflow; gathered reads from index segments. Index labels: root; n0, n1 … nN; t0 … t7]
Moves redistribute the data onto existing or new nodes in the cluster.
[Figure labels: join (p1, p2, p3); move (p3, p4)]
[Figure: Metadata Service and Data Services; label p0]
• <mike, memberOf, SYSTAP>
• <http://www.systap.com, sourceOf, ....>
• But you CAN NOT say that in RDF.
• Then you can say,
<http://www.systap.com, sourceOf, _s1>
• SIDs look just like blank nodes
• And you can use them in SPARQL
construct { ?s <memberOf> ?o . ?s1 ?p1 ?sid . }
where {
  ?s1 ?p1 ?sid .
  GRAPH ?sid { ?s <memberOf> ?o }
}
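The idea of statement identifiers (SIDs) can be sketched in a few lines. This is a hypothetical toy, not how bigdata implements SIDs: the class `SidStore`, its methods, and the `_s1`, `_s2`, … naming scheme (borrowed from the `_s1` in the example above) are assumptions made for illustration.

```python
class SidStore:
    """Toy triple store that gives every statement an identifier (SID).

    Hypothetical sketch: a SID can appear in another statement, which lets
    you make statements about statements.
    """

    def __init__(self):
        self._sid_of = {}   # (s, p, o) -> sid
        self._stmt_of = {}  # sid -> (s, p, o)

    def add(self, s, p, o):
        """Add a statement and return its SID (stable if added twice)."""
        triple = (s, p, o)
        if triple not in self._sid_of:
            sid = f"_s{len(self._sid_of) + 1}"
            self._sid_of[triple] = sid
            self._stmt_of[sid] = triple
        return self._sid_of[triple]

    def statements_about(self, predicate):
        """Yield (statement, statement about it) for statements using `predicate`.

        Roughly mirrors the query above: ?s1 ?p1 ?sid, where ?sid identifies
        a statement whose predicate is `predicate`.
        """
        for (s1, p1, o1) in list(self._sid_of):
            target = self._stmt_of.get(o1)
            if target is not None and target[1] == predicate:
                yield target, (s1, p1, o1)


store = SidStore()
sid = store.add("mike", "memberOf", "SYSTAP")
store.add("http://www.systap.com", "sourceOf", sid)
results = list(store.statements_about("memberOf"))
```

Here `results` pairs the `memberOf` statement with the `sourceOf` statement that names it as its object.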
• Lots of work was required to get high throughput!
[Figure: plot against time (minutes); labels: Told Triples, ID2TERM Splits, Scatter Splits]
[Figure: task queue with scattered writes to P1 … Pn]
[Figure: client and data services; SPO index with P1, P2, P3 as SPO#1, SPO#2, SPO#3]
SELECT ?x WHERE {
  ?x a ub:GraduateStudent ;
     ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0> .
}
Translated query:
Query execution plan (access paths selected, joins reordered):
spo#3(x,8,256)
[Figure: client and data services; index partitions SPO#3 and POS#1]
pos(x, 400, 3048) spo(x, 8, 256)
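A minimal sketch of how access paths might be selected and joins reordered for the two patterns above. The heuristics are assumptions, not the planner described in the slides: patterns are ordered by an estimated match count, and each pattern uses the index whose key order starts with its bound positions. The functions `choose_index` and `plan` and the counts in `estimates` are hypothetical.

```python
def is_var(term):
    """Variables are written as strings starting with '?' in this sketch."""
    return isinstance(term, str) and term.startswith("?")


def choose_index(s_bound, p_bound, o_bound):
    """Pick SPO, POS, or OSP so that bound positions form a key prefix."""
    if s_bound:
        # Subject and object bound, predicate unbound: OSP gives an (o, s) prefix.
        return "OSP" if (o_bound and not p_bound) else "SPO"
    if p_bound:
        return "POS"
    if o_bound:
        return "OSP"
    return "SPO"  # nothing bound: full scan of any index


def plan(patterns, estimated_count):
    """Order patterns by ascending estimated count and pick an index for each.

    A variable counts as bound once an earlier pattern in the plan binds it.
    """
    ordered = sorted(patterns, key=estimated_count)
    bound_vars = set()
    steps = []
    for pattern in ordered:
        bound = [not is_var(t) or t in bound_vars for t in pattern]
        steps.append((choose_index(*bound), pattern))
        bound_vars.update(t for t in pattern if is_var(t))
    return steps


# Term identifiers are taken from the plan above; the counts are made up.
patterns = [("?x", 8, 256), ("?x", 400, 3048)]
estimates = {("?x", 8, 256): 1000, ("?x", 400, 3048): 4}
steps = plan(patterns, estimates.get)
```

With these made-up counts the more selective pattern runs first on POS, and the second pattern runs on SPO because `?x` is already bound by then.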
• Cluster of 10 nodes.
• 60% improvement in one week.
[1] Jesse Weaver and James A. Hendler. Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples.
[2] Jacopo Urbani, Spyros Kotoulas, Eyal Oren, and Frank van Harmelen. Scalable Distributed Reasoning using MapReduce. Department of Computer Science, Vrije Universiteit Amsterdam, the Netherlands.
bigdata ®
Flexible
Reliable
Affordable
Web‐scale computing.