
Week 2: Simplify Business Document Processing

Unit 1: Extracting and Enriching Information from Unstructured Documents
Extracting and Enriching Information from Unstructured Documents
Business document processing

Business Documents

550B
Estimated invoice document
volume in 2019

100B
Estimated delivery note
volume in 2018

11B
Paper receipts printed in
the UK alone per year

124B
Business emails received
and sent per day

60%
Sales orders come in via
unstructured documents

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Extracting and Enriching Information from Unstructured Documents
Business document processing

SAP AI Business Services
Business Document Processing

Extract, Enrich, Embed

Transform unstructured documents into structured information with machine learning-based document processing and embed the information into your business processes for instant value.

Instant Value
▪ Automation
▪ Accuracy
▪ Speed
▪ Cost

62% of accounts payable costs come from labor.
Source: APQC, 2015 “AP Process Cost Per Invoice Processed”



Extracting and Enriching Information from Unstructured Documents
Problem: large amounts of incoming business documents

John, Clerk, spends a lot of time searching for and matching the right information. Enrichment happens manually.

Company: Time2Go Company
Supplier ID: 009753
Invoice N°: 123456
Amount: 3,688.00
Currency: USD
Date: 2013-11-01

Position 1
Amount: 1400
Unit: PC
Amount: 1684.00
Description: Apples

Position 2
Amount: 120
Unit: KG
Amount: 2000.00
Description: Nuts


Extracting and Enriching Information from Unstructured Documents
Solution: Document Information Extraction

John, Clerk, can now do more value-creating tasks. Enrichment happens manually.

Company: Time2Go Company
Supplier ID: 009753
Invoice N°: 123456
Amount: 3684.00
Currency: USD
Date: 2013-11-01

Position 1
Amount: 1400
Unit: PC
Amount: 1684.00
Description: Apples

Position 2
Amount: 120
Unit: KG
Amount: 2000.00
Description: Nuts

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 7
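
The structured result shown on this slide can be pictured as a small record type. The sketch below is only an illustration: the class and field names (`ExtractedInvoice`, `LineItem`, `quantity`, `net_amount`) are assumptions rather than the service's output schema, and the first "Amount" of each position is read here as a quantity. It also shows one check a clerk would otherwise do by hand, namely whether the position amounts add up to the header amount.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class LineItem:
    position: int
    quantity: Decimal      # slide label "Amount" (1400 / 120), read as a quantity
    unit: str
    net_amount: Decimal    # slide label "Amount" (1684.00 / 2000.00)
    description: str


@dataclass
class ExtractedInvoice:
    company: str
    supplier_id: str
    invoice_number: str
    amount: Decimal
    currency: str
    date: str
    items: list = field(default_factory=list)

    def items_total(self) -> Decimal:
        """Sum of the position amounts."""
        return sum((item.net_amount for item in self.items), Decimal("0"))

    def is_consistent(self) -> bool:
        """True when the header amount equals the sum of the positions."""
        return self.amount == self.items_total()


# Values copied from the slide.
invoice = ExtractedInvoice(
    company="Time2Go Company",
    supplier_id="009753",
    invoice_number="123456",
    amount=Decimal("3684.00"),
    currency="USD",
    date="2013-11-01",
    items=[
        LineItem(1, Decimal("1400"), "PC", Decimal("1684.00"), "Apples"),
        LineItem(2, Decimal("120"), "KG", Decimal("2000.00"), "Nuts"),
    ],
)
```

Once the fields are available in this form, a downstream process can run such plausibility checks automatically instead of leaving them to manual review.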


Extracting and Enriching Information from Unstructured Documents
Accounts payable vendor invoice extraction

▪ Large amount of invoices
  − Manual information extraction = long processing time
▪ Document Information Extraction
  − Extraction
  − Enrichment
▪ Quick return to relevant process
▪ Higher employee efficiency


Extracting and Enriching Information from Unstructured Documents
Intelligent order processing

▪ Large amount of orders
  − Manual information extraction = long processing time
▪ Document Information Extraction
  − Extraction
  − Enrichment
▪ Greater automation of the sales order entry process
▪ Higher employee efficiency


Extracting and Enriching Information from Unstructured Documents
Proof of purchase

▪ Company offering cash back
▪ For claiming insurance: need to provide a proof of purchase
▪ Document Information Extraction
  − Extraction of customer invoices
  − Enrichment via matching with existing product catalog
▪ Faster response times
▪ Greater attraction for more customers


Extracting and Enriching Information from Unstructured Documents
System demo – Process overview

[Process diagram. Labels: Customer; Send order (Excel, PDF, other formats); Intelligent RPA; DOX Service; Cloud Integration; Business Rules (readable / not readable, under threshold / over threshold, missing data); Correction Workflow; Approval Workflow; Create order; Order declined; Send order confirmation; Business Process Owner; Conversational AI; Slack (order creation / query pending tasks); Open Connectors; SAP CC Configuration; Cloud Connector; Business User]
Extracting and Enriching Information from Unstructured Documents
System demo



Extracting and Enriching Information from Unstructured Documents
What Document Information Extraction can do for your business

Automation Accuracy Speed Cost



Thank you.
Contact information:

open@sap.com
Follow all of SAP

www.sap.com/contactsap

© 2020 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of
SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its
distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or
warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty
statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or
any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation,
and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and
functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason
without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or
functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they
should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names
mentioned are the trademarks of their respective companies.
See www.sap.com/copyright for additional trademark information and notices.
Week 2: Simplify Business Document Processing
Unit 2: Technical Overview of Document Information Extraction
Technical Overview of Document Information Extraction
Business case

▪ Large amount of documents
  − Manual information extraction = long processing time
▪ Document Information Extraction
  − Extraction
  − Enrichment
▪ Quick return to relevant process
▪ Higher employee efficiency


Technical Overview of Document Information Extraction
Architecture overview

[Architecture diagram. Clients: end users, SAP applications (… or Business Suite, SAP Customer Experience, and more …), and non-SAP applications, each through a REST application client. Access: over the internet and through the firewall via HTTPS, with an OAuth 2.0 trust service. Cloud Foundry environment: AI Business Services, consisting of microservices for training, managing models, and inference.]


Technical Overview of Document Information Extraction
Architecture

[Architecture diagram. Clients: end users, SAP applications (… or Business Suite, SAP Customer Experience, and more …), and non-SAP applications, each through a REST application client. Access: over the internet and through the firewall via HTTPS, with an OAuth 2.0 trust service. Cloud Foundry environment: AI Business Services with the Document Information Extraction service.]

Document Information Extraction components
▪ Training Service
▪ Inference Service
▪ Master Data Enrichment
▪ Template-Based Extraction
▪ Performance Feedback
▪ Client Management
▪ …

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 4
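
The diagram names HTTPS, REST, and an OAuth 2.0 trust service but does not spell out the calls. The sketch below shows how a client could prepare the two requests, assuming the OAuth 2.0 client-credentials grant. The URLs, client ID, and secret are hypothetical placeholders and not real service endpoints. The requests are only built, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def build_service_request(service_url: str, access_token: str) -> urllib.request.Request:
    """Build a REST call that carries the bearer token over HTTPS."""
    if not service_url.startswith("https://"):
        raise ValueError("the service is expected to be reached over HTTPS")
    return urllib.request.Request(
        service_url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Hypothetical values, for illustration only.
token_request = build_token_request(
    "https://auth.example.com/oauth/token", "my-client", "my-secret"
)
service_request = build_service_request(
    "https://service.example.com/hypothetical/v1/jobs", "token-from-previous-step"
)
```

In a real client, the token request would be sent with `urllib.request.urlopen`, and the `access_token` from the JSON response would be passed to the second call.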


Technical Overview of Document Information Extraction
Model training

Machine Learning Model
▪ Pre-trained model for invoices
▪ Support
  Header fields
  • Sender information
  • Receiver information
  • Document number
  • Document date
  • Document amount
  • Invoice number
  • PO number
  • Employee name
  • Tax ID
  • Taxes
  • Currency
  Item fields
  • Description
  • Deductions
  • Amounts
  • Quantity
  • Material no.
  • Unit price
▪ Language
  − ISO-8859-1 – Latin 1 character-encoding scripts
  − English, German, French, Italian, Dutch

Upload to SAP Cloud Platform
▪ Master data enrichment
  − Business entity master data
  − Company code master data
  − Employee master data
▪ Re-trainable on other document types
  − Minimum of 5K annotated documents
  − Restrictions: document type, bounding boxes, rotation, label noise
▪ Template-based extraction
Technical Overview of Document Information Extraction
Data enrichment demo



Technical Overview of Document Information Extraction
Inference call

1 Data Input: Upload PDF document

2 Predict Header and Item Fields

Class            Prediction
Vendor Name      ABC Communications
Vendor Address   3451 NE Willoughby Blvd., Sturt, FL 5494 U.S.A
Invoice No.      174221
Date             2016-02-18

Class            Prediction
Description      Professional Services – Labor Only
Total Gross      220.00
Quantity         1.00
Currency Code    USD

2.1 Normalize Texts & Map Fields

Extracted text                         Class            Prediction                                        Conf
abccommunications                      Vendor Name      ABC Communications                                0.99
3451newilloughbyblvd sturtfl5494usa    Vendor Address   3451 NE Willoughby Blvd., Sturt, FL 5494 U.S.A    0.95
…                                      …                …                                                 …

2.2 Enrich Master Data

Class: Vendor Name, Prediction: ABC Communications

Vendor                Conf   Vendor ID
ABC Communications    98%    470
ABC Services          67%    168
DEF Communications    35%    440

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 8
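
Steps 2.1 and 2.2 can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: `normalize` reproduces the normalized strings shown on the slide by lowercasing and dropping non-alphanumeric characters, and `match_vendor` uses `difflib.SequenceMatcher` as a stand-in similarity measure. The service's actual matching method is not described in the slides, so the resulting scores will not equal the confidences shown above.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_vendor(predicted_name: str, master_data: dict) -> list:
    """Rank master data records by string similarity to the predicted name.

    SequenceMatcher is a stand-in similarity measure, not the service's method.
    Returns (vendor_name, vendor_id, similarity) tuples, best match first.
    """
    target = normalize(predicted_name)
    ranked = [
        (name, vendor_id, SequenceMatcher(None, target, normalize(name)).ratio())
        for name, vendor_id in master_data.items()
    ]
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Vendor names and IDs taken from the slide.
vendors = {"ABC Communications": 470, "ABC Services": 168, "DEF Communications": 440}
ranking = match_vendor("ABC Communications", vendors)
best_name, best_id, best_score = ranking[0]
```

The ranked list mirrors the enrichment table: each master data candidate is returned with a score, and the caller decides whether the top candidate is trustworthy enough to use.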


Technical Overview of Document Information Extraction
Inference call demo



Technical Overview of Document Information Extraction
Summary

Pre-trained Model
▪ Global model optimized for invoice documents
▪ Language restrictions
▪ Re-training possibility for other documents

Train Model
▪ Enrich pre-trained model with master data information

Prediction
▪ Prediction of header and item fields with a given probability
▪ Master data matching with a given probability


Week 2: Simplify Business Document Processing
Unit 3: Classifying Unstructured Documents with AI
Classifying Unstructured Documents with AI
Recap: efforts in business document processing

550B
Estimated invoice document volume in 2019

100B
Estimated delivery note volume in 2018

11B
Paper receipts printed in the UK alone per year

124B
Business emails received and sent per day

60%
Sales orders come in via unstructured documents


Classifying Unstructured Documents with AI
Document Classification

▪ Business documents
▪ Classification criteria for customer-specific models
▪ Train a classification model based on already classified (labeled) documents
▪ Automatically classify new and existing documents

Reduce manual effort and errors in classifying business documents, and speed up document processing overall by channeling documents based on their type.




Classifying Unstructured Documents with AI
Classification possibilities

Single characteristic
1 Binary classification
  Is this an invoice? Yes, this is an invoice / No, this is not an invoice
2 Multi-class classification
  What language is this? English / German / French (mutual exclusivity!)
  What languages does this contain? English, German, French (multi-labeling possible!)

Two characteristics
3 Binary classification
  Is this an invoice? Yes, this is an invoice / No, this is not an invoice
  Are there images? Yes, there are / No, there are none
4 Multi-class classification
  What language is this? English / German / French
  What document type is this? Invoice / Sales order / Dunning letter

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 7
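
The difference between mutually exclusive classes and multi-labeling can be made concrete with a small example. The sketch below is an assumption about how such decisions are commonly derived from raw model scores (softmax for one exclusive class, independent sigmoid thresholds for several labels). The slides do not state how the service computes them, and the scores are made up.

```python
import math


def softmax(scores: dict) -> dict:
    """Turn raw scores into probabilities that sum to 1 (mutually exclusive classes)."""
    highest = max(scores.values())
    exps = {label: math.exp(value - highest) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def sigmoid(value: float) -> float:
    """Map one raw score to an independent probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


def multi_class(scores: dict) -> str:
    """Exactly one label wins: 'What language is this?'"""
    probabilities = softmax(scores)
    return max(probabilities, key=probabilities.get)


def multi_label(scores: dict, threshold: float = 0.5) -> list:
    """Every label above the threshold is returned: 'What languages does this contain?'"""
    return sorted(label for label, value in scores.items() if sigmoid(value) >= threshold)


# Hypothetical raw scores for one document.
scores = {"English": 2.0, "German": 1.5, "French": -3.0}
single_language = multi_class(scores)
contained_languages = multi_label(scores)
```

With these scores, the multi-class decision returns a single language, while the multi-label decision can return more than one.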


Classifying Unstructured Documents with AI
Classification possibilities

Two characteristics
5 Binary and multi-class classification
  Is this an invoice? This is an invoice / This is not an invoice
  What language is this? English / German / French


Classifying Unstructured Documents with AI
From chaos to order

▪ Chaotic mailroom
  − Too many unread & unprocessed emails
▪ Document Classification
  − Organization via Classification
▪ Responsible department
  − Classified emails
  − Less effort
▪ Documents are processed faster
▪ Higher employee efficiency


Classifying Unstructured Documents with AI
Enterprise mail inbox

▪ Chaotic email inbox
  − Large amount of different incoming documents
▪ Document Classification
  − Identification of critical documents
▪ Classified documents
  − Invoices
  − Dunning letters
  − Sales orders
▪ Faster document processing
▪ Higher employee efficiency


Classifying Unstructured Documents with AI
Document filtering

▪ Legal department
  − Large pool of similar contracts
▪ Document Classification
  − Detection of critical clauses
▪ Classified documents
  − Contract with certain clause
  − Contract without certain clause
▪ Quicker identification and corresponding processing
▪ No missing out on important contracts


Classifying Unstructured Documents with AI
What Document Classification can do for your business

Automation Accuracy Speed Cost



Week 2: Simplify Business Document Processing
Unit 4: Technical Overview of Document Classification
Technical Overview of Document Classification
Business case

▪ Chaotic mail inbox
  − Large amount of different incoming documents
▪ Document Classification
  − Identification of critical documents
▪ Classified documents
  − Document Type 1
  − Document Type 2
  − Document Type X
▪ Faster document processing
▪ Higher employee efficiency


Technical Overview of Document Classification
Architecture overall

[Architecture diagram. Clients: end users, SAP applications (… or Business Suite, SAP Customer Experience, and more …), and non-SAP applications, each through a REST application client. Access: over the internet and through the firewall via HTTPS, with an OAuth 2.0 trust service. Cloud Foundry environment: AI Business Services, consisting of microservices for training, managing models, and inference.]


Technical Overview of Document Classification
Architecture

[Architecture diagram. Clients: end users, SAP applications (… or Business Suite, SAP Customer Experience, and more …), and non-SAP applications, each through a REST application client. Access: over the internet and through the firewall via HTTPS, with an OAuth 2.0 trust service. Cloud Foundry environment: AI Business Services with the Document Classification service.]

Document Classification components
▪ Upload Service
  − Dataset Upload
  − Document Management
▪ Training Service
  − Custom-Specific Model
  − Multiple Deployments
▪ Inference Service
  − Classification
  − Calculate Confidence Scores
▪ …




Technical Overview of Document Classification
Data upload

Training File Requirements
▪ Documents as PDF file
  − Single document < 25 MB
  − Single document < 100 pages
▪ Characteristics as JSON file
▪ Supported Languages:
  − EN, DE, FR, ES, PT, RU
  − JA, KO, ZH

Classification Scenarios
▪ Single Characteristic
  − Binary Classification: A / B
  − Multi-Class Classification: A / B / …
  − Multi-Label Classification: A,B / A,C,D / …
▪ Multiple Characteristics


Technical Overview of Document Classification
Data upload

Training File Requirements
▪ Documents as PDF file
  − Single document < 25 MB
  − Single document < 100 pages
▪ Characteristics as JSON file
▪ Supported Languages:
  − EN, DE, FR, ES, PT, RU
  − JA, KO, ZH

Upload to SAP Cloud Platform
1. Create dataset
2. Upload documents + characteristics

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 7
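
The training file requirements lend themselves to a pre-upload check. The sketch below is a hypothetical helper, not part of the service: the `TrainingDocument` type, its field names, and the example characteristic key `document_type` are assumptions, the page count is passed in rather than read from the PDF, and "25 MB" is interpreted as 25 MiB.

```python
from dataclasses import dataclass

MAX_FILE_BYTES = 25 * 1024 * 1024   # assumption: "25 MB" is read as 25 MiB
MAX_PAGES = 100
SUPPORTED_LANGUAGES = {"EN", "DE", "FR", "ES", "PT", "RU", "JA", "KO", "ZH"}


@dataclass
class TrainingDocument:
    file_name: str
    size_bytes: int
    page_count: int
    language: str
    characteristics: dict   # e.g. {"document_type": "invoice"}; keys are hypothetical


def find_problems(doc: TrainingDocument) -> list:
    """Return a list of requirement violations; an empty list means the file passes."""
    problems = []
    if not doc.file_name.lower().endswith(".pdf"):
        problems.append("not a PDF file")
    if doc.size_bytes >= MAX_FILE_BYTES:
        problems.append("file is 25 MB or larger")
    if doc.page_count >= MAX_PAGES:
        problems.append("document has 100 pages or more")
    if doc.language.upper() not in SUPPORTED_LANGUAGES:
        problems.append("language not supported")
    if not doc.characteristics:
        problems.append("no characteristics (labels) provided")
    return problems


good = TrainingDocument("invoice_001.pdf", 120_000, 2, "EN", {"document_type": "invoice"})
bad = TrainingDocument("scan.tiff", 30 * 1024 * 1024, 150, "IT", {})
```

Running such a check before creating the dataset avoids uploading documents that would be rejected or that carry no labels for training.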


Technical Overview of Document Classification
Model training

Data: Training Data
Model / Training: Model

Definition
▪ Select dataset
▪ Specify model name

Training
▪ Trained model version x




Technical Overview of Document Classification
Model training

Data: Training Data
Model / Training: Model

Performance Feedback*

                   Predicted Positive      Predicted Negative
Actual Positive    TP (True Positives)     FN (False Negatives)
Actual Negative    FP (False Positives)    TN (True Negatives)

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

▪ Accuracy
▪ Precision
▪ Recall
▪ Version

*simplified

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 12
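
The performance metrics follow directly from the four cells of the confusion matrix. The sketch below computes precision and recall as defined on the slide. The slide lists accuracy without a formula, so the standard definition, (TP + TN) divided by all predictions, is used here. The counts are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that are correct (standard definition)."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical confusion matrix for an "Is this an invoice?" model.
tp, fn, fp, tn = 80, 20, 10, 90

model_precision = precision(tp, fp)
model_recall = recall(tp, fn)
model_accuracy = accuracy(tp, fp, fn, tn)
```

Looking at precision and recall separately matters when the classes are unbalanced: a model can reach high accuracy while still missing most documents of a rare but critical type.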


Technical Overview of Document Classification
Model training

Data: Training Data
Model / Training: Model
Model Deployment: Activate trained model for prediction


Technical Overview of Document Classification
Model training demo





Technical Overview of Document Classification
Inference call

0 Model Input: Multi-Class Classification
▪ Invoices
▪ Dunning Letters
▪ Sales Orders

0 Trained Model

Invoice    Dunning Letter    Sales Order
invoice    dunning           salesorder
payment    overdue           payment
tax        reminder          shipment
…          …                 …

1 Data Input: Upload PDF document

2 Extract the Information from the Document

Extracted text          Extracted text    Extracted text
abccommunications       invoice           Itemtotal22000
3451newilloughbyblvd    number174221      salestax000
sturtfl5494usa          2182016           totalamountdue22000
…                       …                 …

3 Classify Document

Class             Conf
Invoice           0.85
Dunning Letter    0.10
Sales Order       0.05

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 18
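
The flow from extracted text to class confidences can be mimicked with a toy classifier. The sketch below is a deliberate simplification and not the service's model: it counts how often the keywords from the "Trained Model" table occur in the extracted text and normalizes the counts with softmax. The resulting numbers therefore differ from the confidences on the slide.

```python
import math

# Keyword lists copied from the "Trained Model" table on the slide.
KEYWORDS = {
    "Invoice": ["invoice", "payment", "tax"],
    "Dunning Letter": ["dunning", "overdue", "reminder"],
    "Sales Order": ["salesorder", "payment", "shipment"],
}


def classify(extracted_tokens: list) -> dict:
    """Return a confidence per class from keyword hits, normalized with softmax."""
    hits = {
        label: sum(1 for kw in words for tok in extracted_tokens if kw in tok.lower())
        for label, words in KEYWORDS.items()
    }
    exps = {label: math.exp(count) for label, count in hits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Extracted text copied from step 2 on the slide.
tokens = [
    "abccommunications", "3451newilloughbyblvd", "sturtfl5494usa",
    "invoice", "number174221", "2182016",
    "Itemtotal22000", "salestax000", "totalamountdue22000",
]
confidences = classify(tokens)
predicted = max(confidences, key=confidences.get)
```

As on the slide, the caller receives one confidence per class and can route the document by the highest one, or send it to manual review when no class is clearly ahead.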


Technical Overview of Document Classification
Inference call demo



Technical Overview of Document Classification
Summary

Upload Data
▪ Create a dataset to store data
▪ Upload documents with fitting labels

Train Model
▪ Train classification model
▪ Create model version
▪ Get feedback on model performance

Deploy Model
▪ Activate trained model version
▪ Deactivate model version when necessary

Prediction
▪ Provide data for classification
▪ Get classification results + confidence scores


Week 2: Simplify Business Document Processing
Unit 5: Locating and Classifying Named Entities in Unstructured Text
Locating and Classifying Named Entities in Unstructured Text
Moments of truth

▪ The average office worker receives 121 emails and sends about 40 each day.
▪ Employees spend 20 minutes on average to complete one expense report.

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Problem: understanding the context of emails and processing them takes a long time

Subject: Question to my order
From: john.travolta@mail.com
To: customerservice@shop.com

Dear Customer Agent,
I placed an order on 03rd March 2020 and still have not received it. Could you please tell me when it will arrive? My customer number is 123456 and my order number is 321456.
Thank you & best regards,
John

▪ Create customer ticket
▪ Research and processing in different systems
▪ 5-15 minutes per ticket

Bob, Customer Service, spends a lot of time understanding the email context and answering the request.

Solution: Business Entity Recognition

Subject: Question to my order
From: john.travolta@mail.com
To: customerservice@shop.com

Dear Customer Agent,
I placed an order on 03rd March 2020 and still have not received it. Could you please tell me when it will arrive? My customer number is 123456 and my order number is 321456.
Thank you & best regards,
John

Business Entity Recognition extracts:
▪ Customer #: 123456
▪ Order #: 321456
▪ Date: 03.03.2020

▪ Create customer ticket
▪ Possibility for automated research & processing in various systems

Bob, Customer Service, can now provide a faster response and use his time for value-creating tasks.

Lead creation from business cards

▪ New business cards
  − Many business cards collected at trade fairs etc.
▪ Optical character recognition
  − Scan with mobile phone application
  − Recognition of text
▪ Business Entity Recognition
  − Recognition of predefined entities
▪ Easy lead recognition
  − Faster recognition of important contacts
  − Less manual work

Travel expenses

▪ Travel receipts
  − Many invoices collected on business travels etc.
▪ Optical character recognition
  − Scan with mobile phone application
  − Recognition of text
▪ Business Entity Recognition
  − Recognition of predefined entities
▪ Easy to maintain travel expenses
  − Fewer errors
  − Less manual work

Shared Services Framework

▪ Incoming message
  − Email / Ticket
  − Including free text
▪ Service Ticket Intelligence
  − Classification
▪ Business Entity Recognition
  − Recognition of predefined entities
▪ Responsible department
  − Service Agents / IRPA Bot
  − Processing
  − Further communication with sender
▪ Solution provided quickly
  − Higher service quality

Using synergies of various SAP AI Business Services & applications

System demo

What Business Entity Recognition can do for your business

Automation Quality Speed Cost



Thank you.
Contact information:

open@sap.com

www.sap.com/contactsap

Week 2: Simplify Business Document Processing
Unit 6: Technical Overview of Business Entity Recognition
Technical Overview of Business Entity Recognition
Business case

▪ Travel receipts
  − Many invoices collected on business travels etc.
▪ Optical Character Recognition
  − Scan with mobile phone application
  − Recognition of text
▪ Business Entity Recognition
  − Recognition of predefined entities
▪ Easy to maintain travel expenses
  − Fewer errors
  − Less manual work

Overall architecture

(Architecture diagram; its labels are listed below.)
▪ Consumers: End User, SAP Applications (… or Business Suite, SAP Customer Experience, and more …), Non-SAP Applications, Application Client
▪ Connection: Internet, Firewall, HTTPS trust, OAuth 2.0 service, REST
▪ Cloud Foundry Environment: AI Business Services with Microservices (Training, Manage Models, Inference, …)

Architecture

(Architecture diagram; its labels are listed below.)
▪ Consumers: End User, SAP Applications (… or Business Suite, and more …), Non-SAP Applications, Application client
▪ Connection: Internet, Firewall, HTTPS trust, OAuth 2.0 service, REST
▪ Cloud Foundry Environment: AI Business Services with Business Entity Recognition
  − Upload Service: Document Storage, Pre-trained Models
  − Training Service: Performance Feedback, Multiple Model Deployment
  − Inference Service: Extract Entities, Calculate Probabilities
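
The diagram names OAuth 2.0, HTTPS, and REST but does not say which OAuth grant type is used or what the endpoints look like. The following is a minimal sketch that assumes the client-credentials grant; `TOKEN_URL`, `SERVICE_URL`, and both function names are hypothetical placeholders, not the service's documented API. It only builds the two requests and sends nothing.

```python
import base64
import json
import urllib.parse
import urllib.request

# Hypothetical endpoints: replace with the values from your own service key.
TOKEN_URL = "https://auth.example.com/oauth/token"
SERVICE_URL = "https://api.example.com/inference"


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build an OAuth 2.0 client-credentials token request (assumed grant type)."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(token_url, data=body, headers=headers, method="POST")


def build_service_request(service_url: str, access_token: str, payload: dict) -> urllib.request.Request:
    """Build a REST call over HTTPS that carries the bearer token."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(service_url, data=body, headers=headers, method="POST")
```

In a real client, each request would be sent with `urllib.request.urlopen`, and the token would be read from the `access_token` field of the token response before building the service request.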

Data upload pre-trained entities

Training File Requirements


1. Pre-trained model
▪ Email Business Entity
− Entities: Vendor ID, Invoice number
− Supported languages: DE, EN
▪ Invoice Header
− Entities:
• Sender address • Payment terms • Tax rate
• Sender name • PO reference • Tax ID
• Order date • Total amount • Bank account
• Note number • Description
− Supported languages: DE, EN, FR
2. New training model

Data upload pre-trained entities

Upload to SAP Cloud Platform
▪ No upload & training needed for the pre-trained models (Email Business Entity, Invoice Header)
▪ Diagram labels: Schema, Model Deployment

Data upload new entities

Training File Requirements


1. Pre-trained model
2. New training model
▪ No language restrictions
▪ Textual inputs only
▪ Pre-processing necessary
Annotation Preparation (ID + Text + Label)

Class 1: Text
Class 2: Text
Class n: Text

Data upload new entities

Annotation Requirement

Example text:
Hi Bob,
Can you please check status of invoice 12345678 from 2nd of March?
Thanks,
John
SAP SE

Class        Input
Invoice No   12345678
Date         03.02.
Customer     SAP SE

Annotation Preparation (ID + Text + Label)
{"id": 1,
 "text": "Hi Bob, Can you please<…>",
 "labels": [[47, 55, "Invoice No"],
            [61, 73, "Date"],
            [88, 94, "Customer"]]}
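
The offsets in the annotation can be derived from the text rather than counted by hand. The sketch below assumes that line breaks in the example email are flattened to single spaces and that offsets are 0-based with an exclusive end; under those assumptions it reproduces the three label spans shown on the slide. The helper name `build_annotation` is illustrative only.

```python
import json


def build_annotation(doc_id: int, text: str, entities: list[tuple[str, str]]) -> dict:
    """Locate each entity value in the text and record [start, end, label].

    Offsets are 0-based, start inclusive and end exclusive. Only the first
    occurrence of each value is used, which is a simplification.
    """
    labels = []
    for value, label in entities:
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in text")
        labels.append([start, start + len(value), label])
    return {"id": doc_id, "text": text, "labels": labels}


# The slide's email, assuming line breaks are flattened to single spaces.
email = ("Hi Bob, Can you please check status of invoice 12345678 "
         "from 2nd of March? Thanks, John SAP SE")
record = build_annotation(1, email, [("12345678", "Invoice No"),
                                     ("2nd of March", "Date"),
                                     ("SAP SE", "Customer")])
print(json.dumps(record["labels"]))
# prints [[47, 55, "Invoice No"], [61, 73, "Date"], [88, 94, "Customer"]]
```

Searching for the first occurrence is fragile when a value appears more than once in a document; a real annotation tool would record the span the annotator actually selected.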

Data upload new entities

Upload to SAP Cloud Platform
1. Create dataset
2. Upload documents with annotation

Model training

(Process diagram labels: Data, Model / Training; Training Data, Model)

Definition
▪ Select dataset
▪ Specify model name

Model Naming
▪ Not the same name as pre-trained model
▪ No "sap_" as prefix
▪ Starts with alphanumerical character
▪ Maximum of 64 characters
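
The naming rules can be checked on the client before a training job is submitted. This is a minimal sketch, not the service's own validation: the function name is hypothetical, `PRETRAINED_MODELS` contains only the one pre-trained model name that appears in this unit, and treating "alphanumerical" as ASCII letters and digits is an assumption.

```python
# Only "sap_invoice_header" is named in this unit; extend the set as needed.
PRETRAINED_MODELS = frozenset({"sap_invoice_header"})
MAX_NAME_LENGTH = 64


def validate_model_name(name: str, reserved: frozenset = PRETRAINED_MODELS) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if name in reserved:
        problems.append("same name as a pre-trained model")
    if name.startswith("sap_"):
        problems.append('uses the reserved "sap_" prefix')
    # Assumption: "alphanumerical" means an ASCII letter or digit.
    if not name or not (name[0].isascii() and name[0].isalnum()):
        problems.append("does not start with an alphanumerical character")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 64 characters")
    return problems


print(validate_model_name("travel_receipts_v1"))
print(validate_model_name("sap_invoice_header"))
```

Returning all violations at once, rather than raising on the first one, lets a user fix a rejected name in a single pass.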

Model training

Training

Text   […] check status of invoice 12345678 from 03/02/2020 […]
Label  […] 000000000000000000000000111111110000002222222222 […]
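
The Text/Label pair suggests one class digit per character, with 0 for characters outside any entity. The sketch below shows how annotation spans could be turned into such a string; the class numbering (1 for the invoice number, 2 for the date) and the function name are assumptions, and the service's internal encoding may differ.

```python
def encode_char_labels(text: str, spans: list[tuple[int, int, str]], class_ids: dict[str, int]) -> str:
    """Return one digit per character: 0 for background, the class id inside a span."""
    labels = ["0"] * len(text)
    for start, end, label in spans:
        class_id = class_ids[label]
        if not 1 <= class_id <= 9:
            raise ValueError("this sketch supports class ids 1 to 9 only")
        for i in range(start, end):  # end is exclusive
            labels[i] = str(class_id)
    return "".join(labels)


text = "check status of invoice 12345678 from 03/02/2020"
spans = [(24, 32, "Invoice No"), (38, 48, "Date")]
encoded = encode_char_labels(text, spans, {"Invoice No": 1, "Date": 2})
print(text)
print(encoded)
```

Printed one above the other, the two lines align so that each digit sits under the character it labels: 24 zeros, eight 1s under the invoice number, six zeros under " from ", and ten 2s under the date.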

Model training

Model
▪ Capabilities
▪ Removed Labels
▪ Test Accuracy

Model training

Model Deployment
▪ Activate trained model for prediction

Model training demo

Inference call

0 Model Input: sap_invoice_header
Entities
• Sender address • Payment terms • Tax rate
• Sender name • PO reference • Tax ID
• Order date • Total amount • Bank account
• Note number • Description

1 Inference Input: Upload text payload
{"text": "Hello, I would like to know <…>",
 "modelName": "sap_invoice_header",
 "version": 1}

2 Example Inference Input*
Hello,
I would like to know the status of purchase order 12345678 of vendor ABC.
Regards, John

Context around digit:
• purchase order • vendor • status • abc
• digit • length 8
PO reference: 12345678
Confidence: 0.95
*simplified

3 Data Output: JSON file containing entities & confidences

Entity         Value      Confidence
PO reference   12345678   0.95
…              …          …
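
A client has to serialize the three request fields and then read entities and confidences from the JSON result. The slide does not show the response format, so the sketch below assumes a hypothetical shape (`entities` as a list of objects with `entity`, `value`, and `confidence`); the function names and the confidence threshold are also illustrative.

```python
import json


def build_inference_payload(text: str, model_name: str, version: int) -> str:
    """Serialize the request body with the three fields shown on the slide."""
    return json.dumps({"text": text, "modelName": model_name, "version": version})


def select_entities(response_json: str, min_confidence: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (entity, value, confidence) rows at or above the threshold.

    Assumes a hypothetical response shape:
    {"entities": [{"entity": ..., "value": ..., "confidence": ...}, ...]}
    """
    data = json.loads(response_json)
    rows = []
    for item in data.get("entities", []):
        if item["confidence"] >= min_confidence:
            rows.append((item["entity"], item["value"], item["confidence"]))
    return rows


payload = build_inference_payload(
    "Hello, I would like to know the status of purchase order 12345678 of vendor ABC. Regards, John",
    "sap_invoice_header",
    1,
)

# Hand-written stand-in for a service response, mirroring the slide's output table.
sample_response = '{"entities": [{"entity": "PO reference", "value": "12345678", "confidence": 0.95}]}'
for entity, value, confidence in select_entities(sample_response):
    print(f"{entity}: {value} ({confidence:.2f})")
```

Filtering on the confidence score is one way to decide which extracted values can be processed automatically and which should go to a person for review.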

Inference call demo

Summary

Upload Data
▪ Use pre-trained models for specific business problems
▪ Upload own entities with annotations to train a custom model

Train Model
▪ Train new entity models
▪ Create model version
▪ Get feedback on model performance

Deploy Model
▪ Activate trained model version
▪ Deactivate model version when necessary

Prediction
▪ Provide data for entity recognition
▪ Get machine learning results + confidence scores



Good luck with the weekly assignment!



Thank you.
Contact information:

open@sap.com

www.sap.com/contactsap
