
© 2006 IBM Corporation

Jeff Jonas, Chief Scientist, IBM Entity Analytics


Blogging at www.JeffJonas.TypePad.com

Perpetual Analytics
Data finds the data; relevance finds the user

IBM Entity Analytic Solutions

My Background

- Founded Systems Research & Development (SRD) in 1983
- Moved headquarters to Las Vegas in the early 1990s
- Worked with the gaming industry to help them better understand who they were doing business with, resulting in the NORA (Non-Obvious Relationship Awareness) technology
- Funded by In-Q-Tel, the CIA's venture capital arm, in 2001
- Brought in a professional management team in 2002
- Acquired by IBM in January 2005
- Now the Chief Scientist of IBM Entity Analytics

If a 0.6% difference matters this much, no wonder traditional information systems lack so much intelligence!

Persistent Context is Fundamental to Perpetual Analytics

Knowing What You Know and Having an Integrated Picture is Fundamental to Real-time Understanding

Domains of Context

- Same, similar, related, dissimilar
- Who, what, where, when

Scope of this Presentation

The Role of Identity Resolution Towards Perpetual Analytics.

- Same = Semantic Resolution
- Who = Identities (people and organizations)

Identities, Events, Sensors and Observations

Identities (Unknowable)

FEATURES:
Mark Randal Smith
123 Main Street
713 731 5577
DOB 06/07/74

Events: Arrest, Internet Inquiry

Sensors: Fraud Database, Prospect Database

Partitioned

Observations

Record #A-701
Mark Randy Smith
123 Main Street
713 731 5577

Record #B-9103
Randal Smith
DOB: 06/07/74
713 731 5577

Consider the Query Against the Observables

The Query

Marc R Smith
123 Main St
713 730 5769

[Figure: the query shown beside the Observations (Record #A-701, Record #B-9103) and the Sensors (Fraud Database, Prospect Database).]

Discoverable

[Figure: the same query (Marc R Smith, 123 Main St, 713 730 5769), Observations and Sensors, with the label "Discoverable".]

Other Observables are Undiscoverable

[Figure: the same query (Marc R Smith, 123 Main St, 713 730 5769) beside Record #A-701 and Record #B-9103, with the Fraud Database and Prospect Database sensors.]
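To make the discoverable/undiscoverable distinction concrete, here is a minimal sketch that compares the query with each raw observation feature by feature. The `normalize` rule and the single "st" to "street" abbreviation are assumptions made for illustration only, not the matching logic of any IBM product, and the records use the date of birth shown in the reconstructed features (06/07/74).

```python
import re

# Hypothetical normalisation rule, chosen only for this illustration.
ABBREVIATIONS = {"st": "street"}

def normalize(value: str) -> str:
    """Lower-case, drop punctuation, and expand the abbreviations above."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def shared_features(a: dict, b: dict) -> set:
    """Return the feature names on which two records agree after normalisation."""
    return {
        key for key in a.keys() & b.keys()
        if key != "id" and normalize(a[key]) == normalize(b[key])
    }

record_a = {"id": "A-701", "name": "Mark Randy Smith",
            "address": "123 Main Street", "phone": "713 731 5577"}
record_b = {"id": "B-9103", "name": "Randal Smith",
            "dob": "06/07/74", "phone": "713 731 5577"}
query = {"name": "Marc R Smith", "address": "123 Main St",
         "phone": "713 730 5769"}

for record in (record_a, record_b):
    print(record["id"], sorted(shared_features(query, record)))
```

Under these assumptions the query overlaps Record #A-701 on the address only and shares nothing with Record #B-9103, even though the two records themselves share a phone number. That missing link is what the raw observations cannot supply on their own.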

Identity Resolution Reconstructs Context

Reconstructed Identities

FEATURES:
Mark Randy Smith, Randal Smith
123 Main Street
713 731 5577
DOB 06/07/74

[Figure: Record #A-701 and Record #B-9103, from the Fraud Database and Prospect Database sensors, shown with the reconstructed identity.]
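The reconstruction on this slide can be sketched as a grouping of records that share a feature value. This is a deliberately crude illustration: linking on any exactly equal non-name feature (here the shared phone number) is an assumption of this sketch, and the function and field names are hypothetical. A real resolution engine would weigh far more evidence.

```python
from collections import defaultdict

def resolve(records):
    """Group records that share any non-name (feature, value) pair."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (feature, value) -> id of the first record carrying it
    for r in records:
        for key, value in r.items():
            if key in ("id", "name"):  # names alone are treated as too weak to link on
                continue
            other = first_seen.setdefault((key, value), r["id"])
            parent[find(r["id"])] = find(other)

    entities = defaultdict(lambda: {"records": [], "features": defaultdict(set)})
    for r in records:
        entity = entities[find(r["id"])]
        entity["records"].append(r["id"])  # every contributing record stays attributed
        for key, value in r.items():
            if key != "id":
                entity["features"][key].add(value)
    return list(entities.values())

record_a = {"id": "A-701", "name": "Mark Randy Smith",
            "address": "123 Main Street", "phone": "713 731 5577"}
record_b = {"id": "B-9103", "name": "Randal Smith",
            "dob": "06/07/74", "phone": "713 731 5577"}

entities = resolve([record_a, record_b])
for entity in entities:
    print(entity["records"], {k: sorted(v) for k, v in entity["features"].items()})
```

The single resulting entity carries both names, the address, the phone number and the date of birth, which mirrors the FEATURES box above, while the list of record ids keeps track of where each feature came from.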

Feature and Event Reconstruction

Reconstructed Identities

Events:
Internet Inquiry
Arrest

Constructed Context is Persisted in a Database

Persistent Context

FEATURES:
Mark Randy Smith, Randal Smith
123 Main Street
713 731 5577
DOB 06/07/74

Multi-Sensor Fusion Re-Constructs the Unknowable

Identity (Unknowable)

FEATURES:
Mark Randal Smith
123 Main Street
713 731 5577
DOB 06/07/74

Sensors

Reconstructed Identity (Knowable!)

FEATURES:
Mark Randal Smith
123 Main Street
713 731 5577
DOB 06/07/74

More Observations = Better Reconstruction

2 Observations

FEATURES:
Mark Randy Smith, Randal Smith
123 Main Street
713 731 5577
DOB 06/07/74

6 Observations

FEATURES:
Mark Randy Smith, Randal Smith, Randy Smith
123 Main Street, Flat 6 20 Lennox Gardens
713 731 5577, 796 064 03 04
DOB 06/07/74, Passport: 001003429002

Now the Un-discoverable

Queries

Marc R Smith
123 Main St
713 730 5769

After Context Reconstruction

Persistent Context

FEATURES:
Mark Randy Smith, Randal Smith
123 Main Street
713 731 5577
DOB 06/07/74

Enables a New Paradigm in Real-time Discovery

[Figure: the query shown against the Persistent Context and its underlying Observations, Record #A-701 and Record #B-9103.]

Additionally All Data is First Treated as Query

The query could be:

- A user with a question

Or it could also be data:

- A new investigation
- A background check
- A new account
- An address change
- Deceased persons

And Any Query can be Treated as Data

Queries

Emile Swelter
San Francisco
12/03/72

[Figure: the query shown beside the Persistent Context and the Observations.]

In Which Case the Query can Stick (Persist)

Notably, Stick in the Same Data Space

[Figure: the Emile Swelter query shown inside the Persistent Context, alongside the Observations.]

Now, New Observations Answer Persistent Queries

Queries

Emile Swelter
San Francisco
12/03/72

New Observation

Emilee Swelter
321 Ovington Place
San Francisco
03/12/72

Question answered when it becomes true!
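A persisted query that is answered by a later observation can be sketched as follows. The match rule is an assumption made for this illustration (a fuzzy name score from the standard library's `difflib`, an equal city, and a birth date compared with day and month allowed to be transposed), and `ContextStore` is a hypothetical name, not a product API.

```python
from difflib import SequenceMatcher

def date_key(date: str) -> tuple:
    """Treat day/month transpositions as equal: 12/03/72 and 03/12/72 share a key."""
    first, second, year = date.split("/")
    return (tuple(sorted((first, second))), year)

def matches(query: dict, observation: dict, name_threshold: float = 0.9) -> bool:
    """Hypothetical match rule: similar name, same city, compatible birth date."""
    name_score = SequenceMatcher(
        None, query["name"].lower(), observation["name"].lower()
    ).ratio()
    return (
        name_score >= name_threshold
        and query["city"] == observation["city"]
        and date_key(query["dob"]) == date_key(observation["dob"])
    )

class ContextStore:
    """Holds observations and persisted queries in the same data space."""

    def __init__(self):
        self.observations = []
        self.persistent_queries = []

    def ask(self, query: dict) -> list:
        """Answer from current holdings; persist the query if nothing matches yet."""
        hits = [o for o in self.observations if matches(query, o)]
        if not hits:
            self.persistent_queries.append(query)
        return hits

    def observe(self, observation: dict) -> list:
        """Store a new observation and return the persisted queries it answers."""
        self.observations.append(observation)
        return [q for q in self.persistent_queries if matches(q, observation)]

store = ContextStore()
question = {"name": "Emile Swelter", "city": "San Francisco", "dob": "12/03/72"}
first_answer = store.ask(question)      # nothing known yet, so the query sticks
answered = store.observe({"name": "Emilee Swelter", "address": "321 Ovington Place",
                          "city": "San Francisco", "dob": "03/12/72"})
print(first_answer, answered)
```

The query returns nothing when first asked, is kept, and is then returned by `observe` the moment the matching observation arrives. Expirations for persisted queries are left out of this sketch.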

This is Identity Resolution

[Figure: Observations (Record #A-701, Record #B-9103) and the Persistent Context, shown with the Watch List Database and Phone Book Database sensors.]

Semantically reconciled observations are necessary to understanding context.

Handling this in real-time, at scale and in a sustainable manner is the hard part!

Perpetual Analytics: The Game Changer

The data finds the data and relevance finds the consumer.

1st principle

If you do not process every arriving piece of data first like a query, then you will not know if you hold content that matters until someone asks.

2nd principle

If you do not assemble and persist context on data streams, computational costs for after-the-fact assembly are unbearable.
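One rough way to see the scale of this cost, assuming that after-the-fact assembly compares every record with every other record (a worst case; the slide does not say how such assembly would be done) and that persisted context needs only one indexed probe per feature as each record arrives. The function names and the figure of four features per record are hypothetical.

```python
def pairwise_comparisons(n: int) -> int:
    """Comparisons needed to match each of n records against every other record."""
    return n * (n - 1) // 2

def indexed_probes(n: int, features_per_record: int) -> int:
    """Index lookups when each arriving record is matched once, on arrival."""
    return n * features_per_record

rows = 3_000_000_000  # row count quoted later in the deck
print(f"all-pairs comparisons: {pairwise_comparisons(rows):.2e}")
print(f"indexed probes:        {indexed_probes(rows, 4):.2e}")
```

Under those assumptions the all-pairs figure is on the order of 10^18 comparisons, against roughly 10^10 probes when context is assembled incrementally.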

Perpetual Analytics Requires

- Persistent Context: received data is reconciled to historical holdings and persisted (versus context-on-the-fly)
- Tethered to Source Systems: processing of adds, changes and deletes from source systems
- Data and Query Equality: processing new observations as queries; persisting queries as data (as selected and with expirations)
- Full Attribution: every row must retain its pedigree (no data survivorship processing)
- Sequence Neutrality: the database end-state is the same despite the arrival order or timing of the data; new data corrects previous outcomes, improving accuracy over time
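Sequence neutrality can be checked mechanically: load the same records in every possible arrival order and confirm that the end state is identical. The sketch below uses a small union-find structure as a stand-in for a resolution engine. The `Context` class, its exact-match linking rule and the four records are assumptions of this illustration, and it covers adds only, not changes or deletes.

```python
from itertools import permutations

class Context:
    """Incrementally resolves records; links are re-evaluated on every arrival."""

    def __init__(self):
        self.parent = {}  # record id -> representative record id
        self.index = {}   # (feature, value) -> a record id carrying it

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add(self, record):
        rid = record["id"]
        self.parent[rid] = rid
        for key, value in record.items():
            if key == "id":
                continue
            other = self.index.setdefault((key, value), rid)
            self.parent[self._find(rid)] = self._find(other)

    def end_state(self):
        """Order-independent view: the set of resolved groups of record ids."""
        groups = {}
        for rid in self.parent:
            groups.setdefault(self._find(rid), set()).add(rid)
        return frozenset(frozenset(g) for g in groups.values())

# Hypothetical records: R3 is the only link between R1 and R2; R4 is unrelated.
records = [
    {"id": "R1", "phone": "713 731 5577"},
    {"id": "R2", "address": "123 Main Street"},
    {"id": "R3", "phone": "713 731 5577", "address": "123 Main Street"},
    {"id": "R4", "phone": "796 064 03 04"},
]

def load(order):
    context = Context()
    for record in order:
        context.add(record)
    return context.end_state()

end_states = {load(order) for order in permutations(records)}
print(len(end_states))
```

When R1 and R2 arrive before R3 they are first held apart and then merged once R3 shows the connection, which is the "new data corrects previous outcomes" behaviour. All 24 arrival orders finish in the same single end state.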

Sequence Neutrality is Critical for Context Stability

[Figure: Percent of Error plotted against Data Loading Over Time. Unstable (e.g., data warehousing, which requires periodic reloads to handle data drift); Stable (e.g., analytics with Sequence Neutrality). Labels: Drift, Reload #11, Reload #12.]

Semantic Reconciliation: 23 Years of Practical Experience

- The most complexity is caused by attempting Sequence Neutrality at scale
- The greatest degree of discovery and intelligence comes from analytics on data streams (versus batch processing)
- Federated (i.e., context-on-the-fly) matching/linkage architectures cannot scale
- Sustainability requires:
  - No dependence on training data sets (initial or otherwise)
  - No merge and purge (data survivorship) processing
  - No in-memory full data set persistence
- Deterministic with real-time and self-correcting probabilistic thresholding is essential to deliver the highest possible accuracy and scalability

Currently Available Technology

- Context
  - Same identities (Identity Resolution)
  - Related identities (Relationship Resolution)
- Streaming context of
  - People
  - Organizations
  - Certain discernible objects (e.g., boats, planes)
- Scalability: 3B rows, 600M resolved entities, >2,000 contextualized observations per second
- Sustainability: significant sequence neutrality processing at ingestion

Latest Advance: Anonymous Resolution

A new technique that allows n data holders to share anonymized identity-based observation data, whereby context is assembled and persisted while the data remains in a cryptographic form, resulting in a more secure and privacy-enhancing way to deliver multi-party, large-scale perpetual analytic systems.
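The slide does not describe how the anonymization works. As a generic illustration only, and not the IBM technique, the sketch below has each data holder replace feature values with keyed one-way digests (HMAC-SHA256 from the Python standard library) so that equal values can still be matched without being disclosed. The shared key, the normalisation and all function names are assumptions of this example, and it supports exact matches only.

```python
import hashlib
import hmac

# Hypothetical key agreed by all data holders; never hard-code a real key.
SHARED_KEY = b"example-key-agreed-by-all-holders"

def anonymize(record: dict) -> dict:
    """Replace each feature value with a keyed digest of its normalised form."""
    out = {"id": record["id"]}  # the record id stays as an opaque handle
    for key, value in record.items():
        if key == "id":
            continue
        normalised = " ".join(value.lower().split()).encode("utf-8")
        message = key.encode("utf-8") + b"=" + normalised
        out[key] = hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()
    return out

def shared_anonymized_features(a: dict, b: dict) -> set:
    """Feature names whose digests are equal, compared in constant time."""
    return {
        key for key in a.keys() & b.keys()
        if key != "id" and hmac.compare_digest(a[key], b[key])
    }

holder_1 = anonymize({"id": "A-701", "name": "Mark Randy Smith",
                      "address": "123 Main Street", "phone": "713 731 5577"})
holder_2 = anonymize({"id": "B-9103", "name": "Randal Smith",
                      "dob": "06/07/74", "phone": "713 731 5577"})

print(shared_anonymized_features(holder_1, holder_2))
```

The two holders discover that their records share a phone number while exchanging only digests. Fuzzy matching on anonymized values, key management and protection against guessing attacks are outside the scope of this sketch.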

Analytics in the Anonymized Data Space: The Future!

If information can be shared in an anonymized form whereby a materially similar result can be achieved, why would an organization share information any other way?

Blogging About All Of This At:

www.JeffJonas.TypePad.com

Information Management
Privacy
National Security
and Triathlons

The Next Big Leap

Integrated observations
which are contextually reconciled
in real-time
with sequence neutral learning
where data and hypotheses coexist
creating persistent awareness
toward new levels of active intelligence


Saturated Observations = Complete Reconstruction

[Figure: Unique Objects plotted against Observations, with the True Population level marked and a point labeled "Complete Reconstruction Here!".]

In Contrast: Human Contextualization

- Conscious streaming contextualization
- Dreaming deep re-contextualization
- Relevance of this notion: we have a long way to go towards intelligence on streams before we must resort to off-line processing
