Automated Understanding
The Next Evolution of Big Data Analytics

Tim Estes
Founder and CEO

What do Alice & The USPTO have in common?


What is Understanding?
Awareness Reading Relating Comprehending Inference Interpretation Prediction Creation

Big Data? What about My data?
Daily

60-250 messages: 300Kb daily average size, 0.521 Kb/s average rate
10 text messages: 10Kb daily average size, 0.521 Kb/s average rate
30-50 articles: 50Kb daily average size, 0.521 Kb/s average rate
5 hours: 4826Kb, 0.268 Kb/s
3-8 calls * 30 minutes: 3350Kb daily average size, 0.521 Kb/s average rate
N/A: 0.521 Kb/s average rate
10-30 minutes: 372Kb daily average size, 0.521 Kb/s average rate
20-60 minutes: 744Kb daily average size, 0.521 Kb/s average rate

UNDERSTANDING = WHAT I DO WITH MY DATA

It takes me 144,000 hours or 16.42 years of my life to just keep up with the data that I’m consuming


What do I do with all of this data?

People Place Time

Our Data versus All Data

If your data equals approximately 2 seconds, all data would equal almost 22 million days, or just over 59,000 years.
All Data = 1,887,438,800,000 Gigabytes
Digital Text Created = 877,189,636.24 Gigabytes
My Data = 1.68 Gigabytes
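The comparison above can be checked with a few lines of arithmetic. The slide does not state its conversion; the figures appear consistent with treating one gigabyte as one second, so `SECONDS_PER_GB` below is an assumption, not something the deck confirms.

```python
# Figures taken from the slide.
ALL_DATA_GB = 1_887_438_800_000
MY_DATA_GB = 1.68

# Assumption: the analogy maps 1 gigabyte to 1 second.
SECONDS_PER_GB = 1.0

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

my_seconds = MY_DATA_GB * SECONDS_PER_GB
all_seconds = ALL_DATA_GB * SECONDS_PER_GB
all_days = all_seconds / SECONDS_PER_DAY
all_years = all_days / DAYS_PER_YEAR

print(f"My data:  about {my_seconds:.2f} seconds")
print(f"All data: about {all_days:,.0f} days, or {all_years:,.0f} years")
```

Under that assumption, 1.68 seconds rounds to the slide's "approximately 2 seconds", and the day and year totals land in the ranges the slide quotes.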

A change must come
We can no longer cope with understanding unstructured data manually in a Big Data world. We must tie technology that can scale horizontally to the function of understanding. In short, Understanding must become Automated.

Automated Understanding: it’s about the 80%

80%: Awareness, Reading, Relating, Comprehending

20%: Inference, Interpretation, Prediction, Creation

How do you automate understanding?
Inputs
•Unstructured
•Structured
•Social
•Multiple Languages

Integrated Functions
•Doc Summarization •Associative Net •Co-Reference •Disambiguation •Link Analysis •Geo Reasoning •Temporal Reasoning •Fact Extraction •NLP

Outputs
•People understood in space and time
•Connections/relationships and their contexts
•Data fusion from multiple sources & types
•Links back to the source document if needed
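As a rough illustration of the inputs-to-outputs flow on this slide, here is a toy sketch that takes unstructured text, finds entities, and counts which entities appear together. It is not how Synthesys works: the `LEXICON`, the function names, and the sentence-level co-occurrence rule are all assumptions made up for this example, and a real system would use trained models rather than a fixed name list.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical lexicon standing in for a trained entity extractor.
LEXICON = {"Alice", "Hatter", "March Hare", "Dormouse"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence):
    """Return the lexicon entries mentioned in the sentence, sorted by name."""
    return sorted(name for name in LEXICON if name in sentence)


def link_analysis(text):
    """Count how often each pair of entities shares a sentence."""
    links = Counter()
    for sentence in split_sentences(text):
        for pair in combinations(extract_entities(sentence), 2):
            links[pair] += 1
    return links


text = (
    "The Hatter turned to Alice. "
    "The March Hare took the watch. "
    "The Hatter looked angrily at the March Hare."
)
links = link_analysis(text)
for (left, right), count in links.items():
    print(f"{left} <-> {right}: {count}")
```

Each counted pair is a candidate connection that could link back to its source sentence, which is the kind of output the slide lists.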

Deep Dive on Automated Understanding

80%

Awareness Reading Relating

80%

•Content Ingest •Metadata Indexing

Rules
•NLP •(NER) •Doc Summarization

•Fact Extraction •Geo Reasoning •Temporal Reasoning •Link Analysis •Associative Net •Co-Reference •Disambiguation

Ontology

Comprehending


Synthesys Enterprise

How do you scale Automated Understanding?
SynthesysCloud
•Synthesys UI/Gadgets •Business Partners •Application Developers •New Market Solutions

REST API

REST API

Unstructured Structured Social Multiple Languages

Automated Understanding: Analytics with Benefits
Synthesys can do the initial heavy lifting in 2% of the time (actually much less).

Giving you 80% of the time to be creative and productive.
Which means 18% more time back to us for more important things than reading data.

Two Proof Cases
(That we can talk about and show in public)


Understanding Alice: The Problem
The Hatter was the first to break the silence. `What day of the month is it?' he said, turning to Alice: he had taken his watch out of his pocket, and was looking at it uneasily, shaking it every now and then, and holding it to his ear. Alice considered a little, and then said `The fourth.' `Two days wrong!' sighed the Hatter. `I told you butter wouldn't suit the works!' he added looking angrily at the March Hare. `It was the best butter,' the March Hare meekly replied. `Yes, but some crumbs must have got in as well,' the Hatter grumbled: `you shouldn't have put it in with the bread-knife.' The March Hare took the watch and looked at it gloomily: then he dipped it into his cup of tea, and looked at it again: but he could think of nothing better to say than his first remark, `It was the best butter, you know.' Alice had been looking over his shoulder with some curiosity. `What a funny watch!' she remarked. `It tells the day of the month, and doesn't tell what o'clock it is!' `Why should it?' muttered the Hatter. `Does your watch tell you what year it is?' `Of course not,' Alice replied very readily: `but that's because it stays the same year for such a long time together.' `Which is just the case with mine,' said the Hatter. Alice felt dreadfully puzzled. The Hatter's remark seemed to have no sort of meaning in it, and yet it was certainly English. `I don't quite understand you,' she said, as politely as she could. `The Dormouse is asleep again,' said the Hatter, and he poured a little hot tea upon its nose. The Dormouse shook its head impatiently, and said, without opening its eyes, `Of course, of course; just what I was going to remark myself.' `Have you guessed the riddle yet?' the Hatter said, turning to Alice again. `No, I give it up,' Alice replied: `what's the answer?' `I haven't the slightest idea,' said the Hatter. `Nor I,' said the March Hare.

Typically, unstructured data technologies have been cursed by domain-specificity. So let’s throw a really odd domain/distribution of words at it and see how it does...

From Chapter 7, “A Mad Tea Party”


Understanding Alice: what we built
Pierogi
Predicted Data

Synthesys™

Prediction
Request models for live deployment

Peruna
Gold-standard

Models

Training

Human Annotation

Analyst

Every tagging round:
•Creates a better model
•Creates better predictions
•Speeds up tagging
•Focuses effort on key classes of error
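The feedback loop described here (predict, have a human correct the errors, retrain, repeat) can be sketched in miniature. This is only an illustration under loud assumptions: the "model" is a plain dictionary, the human annotator is simulated by a hypothetical `GOLD` lookup, and none of the names reflect the actual tools on the slide.

```python
# Hypothetical gold-standard labels, standing in for a human annotator.
GOLD = {
    "Alice": "PERSON",
    "Hatter": "PERSON",
    "Hare": "PERSON",
    "tea": "O",
    "watch": "O",
    "butter": "O",
}


def predict(model, token):
    """Predict a label; anything the model has not learned defaults to 'O'."""
    return model.get(token, "O")


def tagging_round(model, tokens):
    """Run one round: the annotator corrects only the wrong predictions.

    Each correction is added to the model, so later rounds need fewer.
    Returns the number of corrections the annotator had to make.
    """
    corrections = 0
    for token in tokens:
        if predict(model, token) != GOLD[token]:
            model[token] = GOLD[token]
            corrections += 1
    return corrections


# Hypothetical batches of tokens presented to the annotator.
rounds = [
    ["Alice", "Hatter", "tea", "watch"],
    ["Alice", "Hatter", "Hare", "butter"],
    ["Alice", "Hatter", "Hare", "tea"],
]

model = {}
corrections_per_round = [tagging_round(model, batch) for batch in rounds]
print(corrections_per_round)
```

The falling correction count per round is the toy version of "speeds up tagging": human effort concentrates on whatever the current model still gets wrong.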

Understanding Alice: the result

•Less than 19 hours of human training time total
•NLP and NER scores comparable to academic best
•Whole new domain with no rules, linguistic understanding, or special code

Making Sense of Patents: Big Data, BIGGER Diversity

Diverse domain of knowledge: all types of patents, from electrical and process to biomedical and mechanical. Analyzed the full text and claims of all patents after 1976:
•10,945,560 patents
•424,698 assignees
•2,522,474 inventors

Making Sense of Patents: The case for Entity Orientation
Similarity of (since 1976):
•Every Patent •Every Word •Every Entity •Every Inventor •Every Company •Every Technology

Integrated key structured data into the system (inventors, assignees, companies, etc.).
Created and trained categories for extracting domain terms with no rules, only tagging feedback and example words (lexicons).
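One common way to compare items by the entities they mention is cosine similarity over entity counts. The deck does not say which similarity measure was used, so treat this as a generic sketch; the patents and their entity counts below are invented for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of entity counts (0.0 to 1.0)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical entity counts extracted from three patents.
patent_a = Counter({"lithium battery": 3, "electrode": 2, "Acme Corp": 1})
patent_b = Counter({"lithium battery": 1, "electrode": 1, "polymer": 2})
patent_c = Counter({"gear train": 4, "bearing": 2})

sim_ab = cosine_similarity(patent_a, patent_b)
sim_ac = cosine_similarity(patent_a, patent_c)
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

The same function works unchanged if the bags describe inventors or companies instead of patents, which is the appeal of orienting the comparison around entities.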

Just the Beginning
https://www.synthesyscloud.com/


A change is here now
Automated Understanding is the next wave of Analytics. It deals with your data (vs. your machine’s data) and how you make decisions. It’s here now, but we’ve only scratched the surface of the value it can create. It gives us hope to reclaim our lives from the abuse of attention and the constant worry of uncertainty.

A change is here now
Understanding secures
Understanding empowers
Understanding creates Hope

Questions?
@dreasoning
www.digitalreasoning.com
Booth #507
