openSAP Sapai1 Week 2 All Slides
Business Documents
▪ 550B: estimated invoice document volume in 2019
▪ 100B: estimated delivery note volume in 2018
▪ 11B: paper receipts printed in the UK alone per year
▪ 124B: business emails received and sent per day
▪ 60%: share of sales orders that come in via unstructured documents
Transform unstructured documents into structured information with machine learning-based document processing, and embed the information into your business processes for instant value.
© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 3
Extracting and Enriching Information from Unstructured Documents
Business document processing
Speed
Cost
62% of accounts payable costs come from labor.
Source: APQC, 2015 “AP Process Cost Per Invoice Processed”
Enrichment happens manually: John, a clerk, spends much time searching for and matching the right information.
Company: Time2Go Company
Supplier ID: 009753
Invoice N°: 123456
Amount: 3,684.00
Currency: USD
Date: 2013-11-01
Position 1: Quantity 1400 PC, Amount 1,684.00, Description: Apples
Position 2: Quantity 120 KG, Amount 2,000.00, Description: Nuts
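Once header and item fields are extracted, they can be cross-checked automatically. The position amounts 1,684.00 and 2,000.00 add up to 3,684.00, so a header amount of 3,688.00 would not be consistent with its line items. The following is a minimal sketch of such a plausibility check. It assumes the extracted values arrive as strings. The function name `header_matches_items` is hypothetical and is not part of any SAP API.

```python
from decimal import Decimal
from typing import List


def header_matches_items(header_amount: str, item_amounts: List[str]) -> bool:
    """Return True if the header amount equals the sum of the line-item amounts.

    Decimal is used instead of float so that currency values add up exactly.
    Thousands separators (",") are stripped before conversion.
    """
    total = sum((Decimal(a.replace(",", "")) for a in item_amounts), Decimal("0"))
    return Decimal(header_amount.replace(",", "")) == total


# Values taken from the Time2Go invoice example
print(header_matches_items("3,684.00", ["1684.00", "2000.00"]))  # True
print(header_matches_items("3,688.00", ["1684.00", "2000.00"]))  # False
```

Decimal avoids the rounding surprises of binary floats, which matters when amounts are compared for exact equality.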
John, the clerk, can now do more value-creating tasks.
Manual information extraction = long processing time → Extraction and enrichment → Higher employee efficiency
For claiming insurance, customers need to provide a proof of purchase → Extraction of customer invoices and enrichment via matching with the existing product catalog → Greater attraction for more customers
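For illustration, enrichment via matching with an existing product catalog can be approximated with fuzzy string matching from Python's standard library. This is a swapped-in technique and not the matching method used by the SAP service. The catalog entries, product IDs, and the cutoff value below are all hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical product catalog: product ID -> catalog description
CATALOG = {
    "P-100": "Apples, red, loose",
    "P-200": "Nuts, mixed, salted",
    "P-300": "Pears, green",
}


def match_product(description: str, cutoff: float = 0.3) -> Optional[str]:
    """Return the catalog ID whose description best matches the extracted text.

    Matching is case-insensitive and uses difflib's similarity ratio;
    None is returned if no catalog entry reaches `cutoff`.
    """
    by_text = {text.lower(): pid for pid, text in CATALOG.items()}
    matches = difflib.get_close_matches(description.lower(), list(by_text), n=1, cutoff=cutoff)
    return by_text[matches[0]] if matches else None


print(match_product("Apples"))  # P-100
print(match_product("Nuts"))    # P-200
```

A production matcher would also use structured attributes such as material numbers and units, not only the free-text description.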
[Figure: business process involving the Customer, the Business Process Owner, and a User. Steps: order creation / query pending tasks, send order confirmation. Tools: Excel, Slack, Cloud Business Connector.]
Extracting and Enriching Information from Unstructured Documents
System demo
open@sap.com
Follow all of SAP
www.sap.com/contactsap
[Figure: architecture. SAP applications and non-SAP applications act as REST clients. Through firewalls and the internet, they call the application via REST for training, model management, and inference.]
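On the client side, a call such as the inference step in the architecture amounts to an authenticated HTTP request. The sketch below only builds such a request with Python's standard library and does not send it. It assumes a JSON payload and an OAuth 2.0 bearer token. The URL, payload fields, and token are placeholders; the actual services define their own endpoints and request formats.

```python
import json
import urllib.request

# Hypothetical endpoint: not a real service URL
INFERENCE_URL = "https://api.example.com/v1/inference"


def build_inference_request(token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying an OAuth bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        INFERENCE_URL,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_inference_request("dummy-token", {"text": "Invoice No. 174221"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted because the endpoint is a placeholder.
```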
Master data for enrichment: business entity master data, company code master data, employee master data
▪ Support
 − Header fields: sender information, receiver information, document number, document date, document amount, invoice number, PO number, employee name, tax ID, taxes, currency
 − Item fields: description, deductions, amounts, quantity, material no., unit price
▪ Language
 − ISO-8859-1 (Latin 1) character-encoding scripts
 − English, German, French, Italian, Dutch
▪ Re-trainable on other document types
 − Minimum of 5K annotated documents
 − Restrictions: document type, bounding boxes, rotation, label noise
▪ Template-based extraction
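Because supported languages are limited to ISO-8859-1 (Latin 1) scripts, a client could pre-check a document's text before submitting it. The following is a small illustrative sketch. It assumes the text is already available (for example, from OCR). The check is not part of the service. One notable edge case is that the euro sign is not part of ISO-8859-1.

```python
def is_latin1(text: str) -> bool:
    """Return True if every character of `text` can be encoded in ISO-8859-1 (Latin 1)."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(is_latin1("Rechnung für Äpfel"))  # True: German umlauts are in Latin 1
print(is_latin1("Счёт-фактура"))        # False: Cyrillic is outside Latin 1
print(is_latin1("Total: 100 €"))        # False: the euro sign is not in ISO-8859-1
```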
Technical Overview of Document Information Extraction
Model training
[Figure: template-based extraction. A template is applied to documents to drive extraction.]
Technical Overview of Document Information Extraction
Data enrichment demo
1 Data input: upload PDF document
2 Predict header and item fields

Header fields
Class | Prediction
Vendor Name | ABC Communications
Vendor Address | 3451 NE Willoughby Blvd., Sturt, FL 5494 U.S.A
Invoice No. | 174221
Date | 2016-02-18

Item fields
Class | Prediction
Description | Professional Services – Labor Only
Total Gross | 220.00
Quantity | 1.00
Currency Code | USD
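The predictions in step 2 are still strings. The following is a minimal sketch of turning them into typed, structured data. It assumes the predictions are available as a dictionary keyed by the class names shown in the demo. Both that dictionary shape and the `ExtractedInvoice` class are hypothetical and do not represent the service's response format.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict


@dataclass
class ExtractedInvoice:
    """Typed view of the fields predicted in the demo (field names are illustrative)."""
    vendor_name: str
    invoice_no: str
    invoice_date: date
    description: str
    quantity: Decimal
    total_gross: Decimal
    currency: str


def to_typed(raw: Dict[str, str]) -> ExtractedInvoice:
    """Convert raw string predictions into typed values (ISO dates, exact decimals)."""
    return ExtractedInvoice(
        vendor_name=raw["Vendor Name"],
        invoice_no=raw["Invoice No."],
        invoice_date=date.fromisoformat(raw["Date"]),
        description=raw["Description"],
        quantity=Decimal(raw["Quantity"]),
        total_gross=Decimal(raw["Total Gross"]),
        currency=raw["Currency Code"],
    )


raw_prediction = {
    "Vendor Name": "ABC Communications",
    "Invoice No.": "174221",
    "Date": "2016-02-18",
    "Description": "Professional Services – Labor Only",
    "Quantity": "1.00",
    "Total Gross": "220.00",
    "Currency Code": "USD",
}
invoice = to_typed(raw_prediction)
print(invoice.invoice_date, invoice.total_gross, invoice.currency)
```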
Reduce manual effort and errors in classifying business documents, and speed up document processing overall by channeling documents based on their type.
Classification | Binary | Multi-class
Single | 1 Is this an invoice? | 2 What language is this? What languages does this contain?
Two | 3 Is this an invoice? Are there images? | 4 What language is this? What document type is this?

5 Characteristics: invoices, dunning letters, sales orders
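These questions differ in how many answers a classifier returns. The following illustrative sketch uses made-up scores and a hypothetical threshold of 0.5. It shows how three kinds of questions turn scores into decisions: a yes/no (binary) question, a single-answer multi-class question, and the variant "What languages does this contain?", where several answers may apply. That last case is treated here as multi-label.

```python
from typing import Dict, List


def binary_decision(p_invoice: float, threshold: float = 0.5) -> bool:
    """Binary: 'Is this an invoice?' -> yes/no from a single probability."""
    return p_invoice >= threshold


def multiclass_decision(scores: Dict[str, float]) -> str:
    """Multi-class: 'What language is this?' -> exactly one class, the highest-scoring one."""
    return max(scores, key=scores.get)


def multilabel_decision(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: 'What languages does this contain?' -> every class at or above the threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)


# Multi-class scores compete and sum to 1; multi-label scores are independent per class.
single_language = {"English": 0.7, "German": 0.2, "French": 0.1}
mixed_languages = {"English": 0.9, "German": 0.8, "French": 0.1}

print(binary_decision(0.85))                  # True
print(multiclass_decision(single_language))   # English
print(multilabel_decision(mixed_languages))   # ['English', 'German']
```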
Large amount of different incoming documents → Identification of critical documents → Higher employee efficiency
Large pool of similar contracts; contract without a certain clause → Detection of critical clauses → No missing out on important contracts
Document Type 1
Document Type 2
Document Type X
Model training: Data (training data; ▪ select dataset) → Model / Training (model definition, training) → Model (▪ accuracy ▪ precision ▪ recall ▪ version)

Performance feedback* (confusion matrix)
Actual \ Predicted | Positive | Negative
Positive | TP (True Positives) | FN (False Negatives)
Negative | FP (False Positives) | TN (True Negatives)

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$

*simplified
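The precision and recall formulas can be checked on a small, made-up set of predictions for the binary question "Is this an invoice?". The sketch also computes accuracy, which is listed among the model metrics but given no formula. It uses the standard definition (TP + TN) / (TP + FP + FN + TN).

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for a binary task; True means 'is an invoice'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fp = sum((not a) and p for a, p in pairs)
    fn = sum(a and (not p) for a, p in pairs)
    tn = sum((not a) and (not p) for a, p in pairs)
    return tp, fp, fn, tn


def metrics(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy), using 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Made-up example: 4 actual invoices, 6 other documents
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, False, False, True, False, False, False, False, False]

counts = confusion_counts(actual, predicted)   # (2, 1, 2, 5)
p, r, a = metrics(*counts)
print(f"precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
```

In this example, precision (0.67) is higher than recall (0.50). Most documents flagged as invoices really are invoices, but half of the actual invoices are missed.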
Model: activate trained model for prediction (classes: Invoices, Dunning Letters, Sales Orders)

3 Classify document
Class | Confidence
Invoice | 0.85
Dunning Letter | 0.10
Sales Order | 0.05
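Confidence scores like these can drive the channeling of documents by type. The sketch below sends each document to a queue for its most likely class. It assumes a hypothetical threshold of 0.7, below which documents go to manual review. The queue names are placeholders, not part of the service.

```python
from typing import Dict

# Hypothetical routing targets and threshold; choose them per business process.
QUEUES = {"Invoice": "accounts-payable", "Dunning Letter": "collections", "Sales Order": "order-entry"}
MIN_CONFIDENCE = 0.7


def route(confidences: Dict[str, float]) -> str:
    """Send the document to the queue of its most likely class, or to manual review if unsure."""
    best = max(confidences, key=confidences.get)
    if confidences[best] < MIN_CONFIDENCE:
        return "manual-review"
    return QUEUES[best]


print(route({"Invoice": 0.85, "Dunning Letter": 0.10, "Sales Order": 0.05}))  # accounts-payable
print(route({"Invoice": 0.45, "Dunning Letter": 0.40, "Sales Order": 0.15}))  # manual-review
```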
Example email:
From: john.travolta@mail.com
To: customerservice@shop.com
Subject: Question to my order

Business Entity Recognition:
Customer # 123456
Order # 321456
Date 03.03.2020

Bob, Customer Service, can now provide a faster response and use his time for value-creating tasks.
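A hand-written baseline helps show what entity recognition automates. The following sketch applies regular expressions to a hypothetical email body. It is a rule-based technique, not the machine-learning approach of Business Entity Recognition. Rules like these only cover the exact phrasings they were written for.

```python
import re
from typing import Dict, Optional

# Hypothetical email body for illustration
EMAIL_BODY = "Hello, I have a question about order # 321456 from 03.03.2020. Customer # 123456."

# Hand-written patterns: one capture group per entity
PATTERNS = {
    "customer_number": re.compile(r"customer\s*#\s*(\d+)", re.IGNORECASE),
    "order_number": re.compile(r"order\s*#\s*(\d+)", re.IGNORECASE),
    "date": re.compile(r"\b(\d{2}\.\d{2}\.\d{4})\b"),
}


def extract_entities(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each entity pattern, or None if it does not occur."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


print(extract_entities(EMAIL_BODY))
```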
Many business cards collected at trade fairs, etc. → Scan with mobile phone application; recognition of predefined text entities → Faster recognition of important contacts; less manual work
Many invoices collected on business travels, etc. → Scan with mobile phone application; recognition of predefined text entities → Fewer errors; less manual work
Email or ticket → Processing by service agents, IRPA, or a bot: classification (including free text) and recognition of predefined entities → Further communication with the sender → Higher service quality
Using synergies of various SAP AI Business Services and applications
[Figure: access to the Business Entity Recognition service, secured with OAuth 2.0 and more …]
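A common way for a server-side client to obtain an OAuth 2.0 access token is the client-credentials grant (RFC 6749, section 4.4). The sketch below only builds that token request and does not send it. The token URL and credentials are placeholders. It illustrates how a client might authenticate and does not describe the service's exact setup.

```python
import base64
import urllib.parse
import urllib.request

# Placeholder values; real ones come from the service credentials.
TOKEN_URL = "https://auth.example.com/oauth/token"
CLIENT_ID = "my-client-id"
CLIENT_SECRET = "my-client-secret"


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare an OAuth 2.0 client-credentials token request (not sent here).

    The client authenticates with HTTP Basic auth; per RFC 6749 (2.3.1) the ID and
    secret are form-encoded before being joined and base64-encoded.
    """
    pair = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = build_token_request(TOKEN_URL, CLIENT_ID, CLIENT_SECRET)
# Sending it with urllib.request.urlopen(req) would return JSON that includes an "access_token".
```

The returned access token would then be passed as `Authorization: Bearer <token>` on calls to the service.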
Class 1: Text
Class 2: Text
Class n: Text
Model training: Data (training data; ▪ select dataset) → Model / Training (model definition, training) → Model (▪ capabilities ▪ removed labels ▪ test accuracy)
Model: activate trained model for prediction