Roger Lucena Master Thesis - Sorbonne - BadWolf - Internship Project at Google (2020)
BadWolf
Internship Project
Abuse @BHZ Weekly • November 13, 2020
opensource.google/projects/badwolf
github.com/google/badwolf
1. BadWolf overview
a. What is BW?
b. Why did we need it?
c. How is it used by Google?
d. A glimpse into BQL
2. Project
a. Optimization (FILTER keyword)
b. Profiling
c. Tracing
d. Usability (query language)
3. Demo!
4. Wrap-up
BadWolf
What is it?
[Figure: example graph. An edge labeled “Born in” points to the node Shrewsbury; an edge labeled “Population” links Shrewsbury to 60,000.]
Counter Abuse Technology Attorney / Client Privileged and Confidential 12
RDF (Resource Description Framework): W3C “metadata data model”
[Figure: Shrewsbury linked to 60,000 by the edge “Population”; in the last build the edge label becomes “Population@[2018]”.]
BadWolf
BW extends the predicate semantics:
● Immutable predicates
● Temporal predicates
Only true at some point in time
e.g.: Shrewsbury has a population (in 2018) of 60,000
BW predicate: “population”@[2018]
Name from
BQL (BadWolf Query Language)
- “SQL for graphs”
- Similar to SPARQL
The storage interface is implemented separately
e.g.: Spanner, in-memory (RAM)
Storage and linking of arbitrary objects in a directed graph
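To picture what a separately implemented storage interface can look like, here is a minimal Go sketch. The names `Triple`, `TripleStore` and `memStore` are hypothetical and are not BadWolf's actual API; only an in-memory backend is shown, and any other backend would have to satisfy the same interface.

```go
package main

import "fmt"

// Triple is a hypothetical subject-predicate-object record.
type Triple struct {
	Subject, Predicate, Object string
}

// TripleStore is a hypothetical storage contract. A query layer would depend
// only on this interface, never on a concrete backend.
type TripleStore interface {
	Add(t Triple)
	// Objects returns every object linked to subject through predicate.
	Objects(subject, predicate string) []string
}

// memStore keeps triples in RAM.
type memStore struct {
	triples []Triple
}

func (m *memStore) Add(t Triple) { m.triples = append(m.triples, t) }

func (m *memStore) Objects(subject, predicate string) []string {
	var out []string
	for _, t := range m.triples {
		if t.Subject == subject && t.Predicate == predicate {
			out = append(out, t.Object)
		}
	}
	return out
}

func main() {
	var s TripleStore = &memStore{}
	s.Add(Triple{"/u<joe>", `"parent_of"@[]`, "/u<mary>"})
	s.Add(Triple{"/u<joe>", `"parent_of"@[]`, "/u<peter>"})
	fmt.Println(s.Objects("/u<joe>", `"parent_of"@[]`))
}
```

Because callers only see `TripleStore`, swapping the backend does not touch the code that builds or queries the graph.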
BW nodes format: /TYPE<ID>
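As an illustration of the /TYPE<ID> format, the Go sketch below splits a node literal into its two parts. The function `parseNode` and the exact character rules (no whitespace or angle brackets in the type, no `>` in the ID) are assumptions made for this example, not BadWolf's own parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// nodeRE matches /TYPE<ID>. Assumption: TYPE contains no whitespace or angle
// brackets, and ID contains no '>'.
var nodeRE = regexp.MustCompile(`^(/[^\s<>]+)<([^>]*)>$`)

// parseNode splits a node literal into its type and ID.
func parseNode(s string) (typ, id string, err error) {
	m := nodeRE.FindStringSubmatch(s)
	if m == nil {
		return "", "", fmt.Errorf("invalid node %q, want /TYPE<ID>", s)
	}
	return m[1], m[2], nil
}

func main() {
	for _, s := range []string{"/u<joe>", "/gift<model 1>", "joe"} {
		typ, id, err := parseNode(s)
		fmt.Println(typ, id, err)
	}
}
```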
BQL Query
Triples of the ?family graph:
    /u<joe> "parent_of"@[] /u<mary>
    /u<joe> "parent_of"@[] /u<peter>
    /u<peter> "parent_of"@[] /u<john>
    /u<peter> "parent_of"@[] /u<eve>
Query:
    SELECT ?grandparent, ?grandchild
    FROM ?family
    WHERE {
      ?grandparent "parent_of"@[] ?x .
      ?x "parent_of"@[] ?grandchild
    };
Result:
    ?grandparent ?grandchild
    /u<joe> /u<john>
    /u<joe> /u<eve>
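The query joins two clauses on the shared binding ?x. The Go sketch below reproduces that join by hand over the four triples of the example; the `triple` type and the `grandparents` function are hypothetical helpers written for this illustration and say nothing about how the BQL planner actually evaluates the query.

```go
package main

import "fmt"

// triple is a hypothetical subject-predicate-object record.
type triple struct{ s, p, o string }

// family holds the four triples of the ?family graph from the slide.
var family = []triple{
	{"/u<joe>", `"parent_of"@[]`, "/u<mary>"},
	{"/u<joe>", `"parent_of"@[]`, "/u<peter>"},
	{"/u<peter>", `"parent_of"@[]`, "/u<john>"},
	{"/u<peter>", `"parent_of"@[]`, "/u<eve>"},
}

// grandparents joins "?grandparent p ?x" with "?x p ?grandchild" on ?x
// using a plain nested loop.
func grandparents(ts []triple, p string) [][2]string {
	var rows [][2]string
	for _, a := range ts {
		if a.p != p {
			continue
		}
		for _, b := range ts {
			if b.p == p && b.s == a.o {
				rows = append(rows, [2]string{a.s, b.o})
			}
		}
	}
	return rows
}

func main() {
	for _, r := range grandparents(family, `"parent_of"@[]`) {
		fmt.Println(r[0], r[1])
	}
}
```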
Project
a. Optimization
Implement a FILTER keyword (from lexer/grammar to planner and storage/driver levels)
b. Profiling
Add support for pprof profiling in BadWolf (memory and CPU profiles activated through the BW CLI)
c. Tracing
Make the BadWolf tracer more easily extendable, improve debuggability, ensure coverage, add verbosity levels
d. Usability
Make BQL more compliant with W3C recommendations, improve reliability, make it more intuitive, support a more complete HAVING clause
[Slide annotations in the final build: “heavy queries”, “clients”]
a. Optimization
● Why?
Let the user specify, at a level closer to the storage/driver, which data they want to retrieve, improving performance
FILTER keyword example
How to get only the last triple of the time series?
    /u<peter> "bought"@[1901] /gift<model 1>
    /u<peter> "bought"@[1902] /gift<model 2>
    /u<peter> "bought"@[1903] /gift<model 3>
    …
Without FILTER:
Result:
    ?pred ?obj
    "bought"@[2000] /gift<model 2000>
[Final build: the same result table appears a second time, side by side with the first.]
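The performance argument can be illustrated with a small Go sketch that contrasts handing the whole time series to the caller with selecting the latest triple inside the storage layer. Everything here is hypothetical: the `fact` type, the `scanAll` and `scanLatest` functions, and the four sample rows (taken from the slide, with the elided part of the series left out). It is not the BadWolf driver code.

```go
package main

import "fmt"

// fact is a hypothetical temporal triple: subject "bought"@[year] object.
type fact struct {
	subject string
	year    int
	object  string
}

// series holds only the rows visible on the slide.
var series = []fact{
	{"/u<peter>", 1901, "/gift<model 1>"},
	{"/u<peter>", 1902, "/gift<model 2>"},
	{"/u<peter>", 1903, "/gift<model 3>"},
	{"/u<peter>", 2000, "/gift<model 2000>"},
}

// scanAll mimics the unfiltered path: every matching triple leaves the
// storage layer and the caller has to pick the last one itself.
func scanAll(store []fact, subject string) []fact {
	var out []fact
	for _, f := range store {
		if f.subject == subject {
			out = append(out, f)
		}
	}
	return out
}

// scanLatest mimics a filter applied at the storage level: only the triple
// with the greatest time anchor is returned.
func scanLatest(store []fact, subject string) (fact, bool) {
	var best fact
	found := false
	for _, f := range store {
		if f.subject == subject && (!found || f.year > best.year) {
			best, found = f, true
		}
	}
	return best, found
}

func main() {
	fmt.Println("rows returned without a filter:", len(scanAll(series, "/u<peter>")))
	latest, _ := scanLatest(series, "/u<peter>")
	fmt.Printf("row returned with a filter: \"bought\"@[%d] %s\n", latest.year, latest.object)
}
```

Both paths lead to the same final answer; the difference is how many triples cross the storage boundary on the way.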
● IsImmutable
Allows only Immutable predicates for the given binding
● IsTemporal
Allows only Temporal predicates for the given binding
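A rough Go sketch of the two checks, working on predicate literals written as in the slides ("parent_of"@[] for immutable, "bought"@[1901] for temporal). The helper names and the string-based test are assumptions for illustration only; BadWolf has its own predicate type and does not necessarily decide this by inspecting text.

```go
package main

import (
	"fmt"
	"strings"
)

// isImmutable reports whether the literal ends with an empty time anchor.
func isImmutable(pred string) bool {
	return strings.HasSuffix(pred, "@[]")
}

// isTemporal reports whether the literal ends with a non-empty time anchor.
// Assumption: the anchor is always the trailing @[...] part of the literal.
func isTemporal(pred string) bool {
	i := strings.LastIndex(pred, "@[")
	return i >= 0 && strings.HasSuffix(pred, "]") && len(pred) > i+3
}

// keepOnly returns the predicates accepted by the given check, in order.
func keepOnly(preds []string, accept func(string) bool) []string {
	var out []string
	for _, p := range preds {
		if accept(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	preds := []string{`"parent_of"@[]`, `"bought"@[1901]`, `"bought"@[1902]`}
	fmt.Println(keepOnly(preds, isImmutable))
	fmt.Println(keepOnly(preds, isTemporal))
}
```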
b. Profiling
● Why?
Transparency on memory and CPU metrics
Helps identify bottlenecks and see what is happening behind the scenes
● How?
Through the BadWolf CLI
CPUProfileRate: 1000 Hz
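For readers unfamiliar with pprof, the sketch below shows how CPU and heap profiles can be captured with Go's standard `runtime` and `runtime/pprof` packages. It is a generic example under stated assumptions, not the BadWolf CLI code: `captureCPUProfile`, `captureHeapProfile` and `busyWork` are hypothetical names, and how the CLI exposes profiling is not shown. One known quirk: setting a custom rate before `pprof.StartCPUProfile` makes the runtime print a warning on stderr when `StartCPUProfile` tries to apply its default rate, and the custom rate stays in effect.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureCPUProfile runs work under the CPU profiler at rateHz samples per
// second and returns the raw profile bytes.
func captureCPUProfile(rateHz int, work func()) ([]byte, error) {
	var buf bytes.Buffer
	// Assumption: the rate is set first so that it overrides the default
	// that StartCPUProfile would otherwise apply (see the note above).
	runtime.SetCPUProfileRate(rateHz)
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

// captureHeapProfile returns a snapshot of the heap profile.
func captureHeapProfile() ([]byte, error) {
	var buf bytes.Buffer
	runtime.GC() // refresh allocation statistics before the snapshot
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// busyWork is a placeholder workload standing in for a heavy query.
func busyWork() {
	sum := 0
	for i := 0; i < 50_000_000; i++ {
		sum += i % 7
	}
	_ = sum
}

func main() {
	cpu, err := captureCPUProfile(1000, busyWork)
	if err != nil {
		fmt.Println("cpu profile:", err)
		return
	}
	heap, err := captureHeapProfile()
	if err != nil {
		fmt.Println("heap profile:", err)
		return
	}
	fmt.Println("cpu profile bytes:", len(cpu), "heap profile bytes:", len(heap))
}
```

In practice the buffers would be written to files and inspected with `go tool pprof`.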
c. Tracing
● Trace more key metadata, so that traces are more helpful when production issues occur
● Control the tracer's verbosity, to tailor the output to the use case and to avoid overloading the server unnecessarily
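The verbosity idea can be sketched in a few lines of Go. The `tracer` type, its level names and the closure-based `Trace` method are hypothetical and only illustrate the principle; they do not mirror the BadWolf tracer.

```go
package main

import "fmt"

// Hypothetical verbosity levels; higher values are chattier.
const (
	Quiet = iota
	Info
	Debug
)

// tracer keeps only the messages whose level does not exceed its verbosity.
type tracer struct {
	verbosity int
	lines     []string
}

// Trace takes a closure so that a suppressed message costs no formatting
// work, which keeps a quiet tracer cheap on a busy server.
func (t *tracer) Trace(level int, msg func() string) {
	if level > t.verbosity {
		return
	}
	t.lines = append(t.lines, msg())
}

func main() {
	tr := &tracer{verbosity: Info}
	tr.Trace(Info, func() string { return "planner: query accepted" })
	tr.Trace(Debug, func() string { return fmt.Sprintf("planner: %d clauses", 2) })
	for _, l := range tr.lines {
		fmt.Println(l)
	}
}
```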
d. Usability
Optional trailing dot for the last clause inside WHERE (W3C)
Demo
Wrap-up
Opportunity to work on very different dimensions of BadWolf
1. Query Optimization
2. Profiling
3. Tracing
4. Usability
Touch multiple levels of the implementation of the query language and the BW processing flow
● Lexer
● Grammar
● Semantic / Hooks
● Planner
● Storage / Driver
Participate in technical and non-technical discussions about the future of BadWolf
● Design docs
● One pagers
● GitHub Issues
Learn a lot: Go, SPARQL, project development, design, agile
In spite of not having access to g3docs, google3 and code search
● GitHub - BadWolf
https://github.com/google/badwolf