Lesson 10 PDF Recap
Uploaded by Rakkammal Rama

LESSON 10 – PDF Automation - RECAP

Overview

In this lesson, you learned about the types of PDF documents and the methods available for
extracting data from them. We also looked at anchors, a way to deal with unstable
selectors.
Takeaways

PDF activities fall into two categories: those for processing large chunks of text or whole
documents, and those for targeting specific text elements.

To extract the full text of a PDF file, choose one of two activities depending on the file:
Read PDF Text for native (text-based) PDFs, or Read PDF With OCR for scanned (image-based) PDFs.

Both activities can run in the background.

Another way to capture blocks of text is the Screen Scraping tool.

To extract a specific value from a PDF file, you can also use the Anchor Base activity.
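The Anchor Base activity is configured in UiPath Studio and works with UI elements on screen, so there is no code to copy from it. The sketch below is a rough plain-Python analogue of the idea, not UiPath's API. It compares a fixed-position lookup with an anchor-based lookup on text that has already been extracted, and it assumes that the label and its value sit on the same line. The function names and the sample invoice text are made up for illustration.

```python
from typing import Optional

# Made-up sample text: V2 is the same invoice with one extra line inserted,
# i.e. a small "structural change" in the document.
V1 = "ACME Corp\nInvoice No: INV-001\nTotal: 120.00"
V2 = "ACME Corp\nNew address: 1 Main St\nInvoice No: INV-001\nTotal: 120.00"


def extract_by_position(text: str, line_no: int, col: int) -> str:
    """Fragile (hypothetical helper): read whatever sits at a fixed line and column."""
    return text.splitlines()[line_no][col:].strip()


def extract_by_anchor(text: str, anchor: str) -> Optional[str]:
    """Anchor-style (hypothetical helper): find the label, then read the value to its right."""
    for line in text.splitlines():
        idx = line.find(anchor)
        if idx != -1:
            value = line[idx + len(anchor):].strip()
            return value or None  # label found but nothing after it
    return None  # anchor label not present in this document


for doc in (V1, V2):
    print(extract_by_position(doc, 2, 7), "|", extract_by_anchor(doc, "Total:"))
```

The positional lookup returns `120.00` for V1 but `No: INV-001` for V2, because the inserted line shifts everything down. The anchor lookup returns `120.00` for both.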
Best practices

Use Read PDF Text instead of Read PDF With OCR whenever possible, since OCR is
error-prone.

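The Anchor Base method can be more reliable than the others because it locates a value
relative to a stable anchor (such as a label) rather than at a fixed position, so it can
handle major structural changes in the file.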
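The "text first, OCR only when needed" practice can be sketched as a small decision function. This is a plain-Python illustration under stated assumptions, not how UiPath implements its activities. The two extractors are passed in as callables: in a real project they might wrap a PDF text library and an OCR engine, but those are not shown, and the stand-ins below are hypothetical. The 20-character threshold for "no usable text layer" is an arbitrary assumption.

```python
from typing import Callable, Tuple

# Any function that takes a file path and returns extracted text.
Extractor = Callable[[str], str]


def extract_pdf_text(path: str, read_text: Extractor, read_ocr: Extractor,
                     min_chars: int = 20) -> Tuple[str, str]:
    """Prefer the PDF's text layer; fall back to OCR only if it is (nearly) empty.

    Returns (text, method), where method is "text" or "ocr".
    min_chars is an assumed threshold, not a UiPath setting.
    """
    text = read_text(path)
    if len(text.strip()) >= min_chars:
        return text, "text"
    # Little or no embedded text: probably a scanned, image-only PDF.
    return read_ocr(path), "ocr"


# Hypothetical stand-ins for a real text extractor and OCR engine.
def fake_read_text(path: str) -> str:
    # Pretend files named "scan*" have no text layer.
    return "" if path.startswith("scan") else "Invoice No: INV-001 Total: 120.00"


def fake_read_ocr(path: str) -> str:
    return "Invoice N0: INV-0O1"  # OCR output can contain misread characters


print(extract_pdf_text("native.pdf", fake_read_text, fake_read_ocr))
print(extract_pdf_text("scan_01.pdf", fake_read_text, fake_read_ocr))
```

The fake OCR output deliberately contains misread characters ("N0", "0O1"). This mirrors why OCR is treated as the fallback rather than the default.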
Useful links

PDF Data Extraction
