
PDF Automation

If you are working with PDF files, install the “UiPath.PDF.Activities” package from Manage Packages.

The PDF package contains activities designed to extract data from PDF files and store it in String variables. The data can be
extracted from the entire document or from a range of pages specified under the Range property found in each of the activities.

PDF Data Extraction


Extract a group of data: Read PDF Text, Read PDF with OCR, Screen Scraping
Extract specific data

Note
Before starting the process, set the PDF view to Actual Size and adjust the reader preferences:
1. Open the PDF and press the shortcut key (Ctrl+1), or
2. Go to View > Zoom and select Actual Size.
3. Go to Edit and click on Preferences.
4. Go to Reading.
5. Select "Infer reading order from document (recommended)".
6. "Override the reading order...." should be unchecked.
7. "Page vs Document" should be set to "Read the entire document".
8. "Confirm before tagging...." should be unchecked.
9. Go to Accessibility and check the option "Use document structure.....".
10. Check the option "Enable assistive.....".
11. Click "OK".
Read PDF Text
Reads all characters from a specified PDF file and stores them in a string variable.

Properties
File
FileName - The path of the PDF file to be read. Only String variables and strings are supported.
Password - The password of the PDF file, if necessary. Only String variables and strings are supported.
Input
PreserveFormatting - If selected, this option maintains the formatting of the file after the extraction is completed. The default
value is False.
Range - The range of pages that you want to read. You can specify a single page (e.g. "7"), a range of pages (e.g. "7-12"), or a
complex range, (e.g. "2-5, 7, 15-End" or "All"). Only string variables and strings are supported. The default value is "All".
Output
Text - The extracted string. This field supports only String variables.
Common
DisplayName - The display name of the activity.
Misc
Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
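The Range syntax described above can be illustrated outside UiPath. The following is only a rough Python sketch of how such a string might be expanded into page numbers; it is not UiPath's implementation. The function name parse_page_range and the choice to raise an error for pages outside the document are assumptions made for this example.

```python
from typing import List


def parse_page_range(spec: str, page_count: int) -> List[int]:
    """Expand a Range-style string into sorted 1-based page numbers.

    Hypothetical helper: it mirrors the syntax in the notes
    ("7", "7-12", "2-5, 7, 15-End", "All"), not UiPath's own code.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")

    text = spec.strip()
    if text.lower() == "all":
        return list(range(1, page_count + 1))

    def to_page(token: str) -> int:
        token = token.strip()
        # "End" stands for the last page of the document.
        page = page_count if token.lower() == "end" else int(token)
        # Assumption: pages outside the document are treated as an error.
        if not 1 <= page <= page_count:
            raise ValueError(f"page {page} is outside 1-{page_count}")
        return page

    pages = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in range {spec!r}")
        if "-" in part:
            first, last = part.split("-", 1)
            start, stop = to_page(first), to_page(last)
            if start > stop:
                raise ValueError(f"descending segment {part!r}")
            pages.update(range(start, stop + 1))
        else:
            pages.add(to_page(part))
    return sorted(pages)
```

For a 20-page file, parse_page_range("2-5, 7, 15-End", 20) yields pages 2 to 5, page 7, and pages 15 to 20.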
Read PDF With OCR
Reads all characters from a specified PDF file and stores them in a string variable by using OCR technology.
OCR
OCR stands for Optical Character Recognition (also called Optical Character Reader). It is used to extract data from images and scanned documents.
OCR Engines:
1. Microsoft OCR
2. Tesseract OCR
3. Google Cloud Vision OCR
4. Omni OCR and many more

Properties
Common
DisplayName - The display name of the activity.
File
FileName - The path of the PDF file to be read. Only String variables and strings are supported.
Password - The password of the PDF file, if necessary. Only String variables and strings are supported.
Input
DegreeOfParallelism - Specifies how many pages, if any, are analyzed in parallel. The -1 value uses the maximum number of logical
processors on the machine, while specifying a positive value uses that number of logical processors. By default, this property is set to -1.
ImageDpi (dots per inch) - The DPI used for the OCR process. By default, this field is set to 150.
Range - The range of pages that you want to read. If the range isn't specified, the whole file is read. You can specify a single page (e.g. "7"), a
range of pages (e.g. "7-12"), or a complex range, (e.g. "2-5, 7, 15-End" or "All"). Only string variables and strings are supported. The default
value is "All".
Misc
Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
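To make the two numeric properties above more concrete, here is a small Python sketch. It is not UiPath code. The function names resolve_workers and rendered_size are hypothetical. Rejecting values below 1 other than -1 is an assumption, as is the idea that a page is rasterized at exactly the given DPI. It relies on the PDF convention that one point is 1/72 inch.

```python
import os
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page sizes are measured in points (1/72 inch)


def resolve_workers(degree_of_parallelism: int = -1) -> int:
    """Map a DegreeOfParallelism-style value to a worker count.

    -1 means "use every logical processor"; a positive value is used as is.
    Assumption: any other value is rejected here.
    """
    if degree_of_parallelism == -1:
        # os.cpu_count() can return None, so fall back to one worker.
        return os.cpu_count() or 1
    if degree_of_parallelism < 1:
        raise ValueError("use -1 or a positive number of processors")
    return degree_of_parallelism


def rendered_size(width_pt: float, height_pt: float, dpi: int = 150) -> Tuple[int, int]:
    """Pixel size of a page image when rasterized at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return (round(width_pt * scale), round(height_pt * scale))
```

A US Letter page (612 x 792 points) becomes a 1275 x 1650 pixel image at the default 150 DPI, and 2550 x 3300 at 300 DPI. A higher DPI gives the OCR engine more detail, at the cost of larger images to process.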
Export PDF Page as Image
Creates an image from a page in a specified PDF file.

Extract Images from PDF
Extracts images from a specified PDF file.

Get PDF Page Count
Provides the total number of pages in a PDF file.

Join PDF Files
Joins multiple PDF files stored in an array of strings into a single PDF file.
FileList - An array of strings containing the full paths of all the PDF files you want to join into a single one. This field supports only
arrays of strings.
OutputFileName - The name of the resulting PDF file where the specified files are joined. This field supports only strings and
String variables.
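FileList expects full paths, so the array usually has to be built before the activity runs. As a rough illustration outside UiPath, the Python sketch below collects the PDF files in a folder into a sorted list of full paths. The function name build_file_list is hypothetical, and sorting by file name is an assumption; the notes do not say in which order the files are joined.

```python
from pathlib import Path
from typing import List


def build_file_list(folder: str) -> List[str]:
    """Return the full paths of all PDF files in a folder, sorted by path.

    Hypothetical helper: it only prepares the kind of string array that
    the FileList property describes; it does not join any files.
    """
    pdf_paths = sorted(
        entry
        for entry in Path(folder).iterdir()
        # Match the extension case-insensitively (.pdf, .PDF, ...).
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )
    # resolve() turns each entry into an absolute path.
    return [str(path.resolve()) for path in pdf_paths]
```

Files with other extensions and subfolders are skipped, so the resulting list contains only PDF paths.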
Manage PDF Password
Changes the password of a specified PDF file, or sets one if the PDF does not have a password yet.

FileName - The path of the PDF file whose password you want to manage. This field supports only strings and String variables.
NewOwnerPassword - The new owner password you want to set for the specified PDF file, if necessary. This password is used for
setting certain document restrictions. This field supports only strings and String variables.
NewUserPassword - The new user password you want to set for the specified PDF file, if necessary. This password is required to
open the document in Read Only mode. This field supports only strings and String variables.
OldOwnerPassword - The old password of the owner, used for modifying certain document restrictions. This field supports only
strings and String variables.
OldUserPassword - The old password of the user. This password is required to open the document in Read Only mode. This field
supports only strings and String variables.
OutputFileName - The path where you want to save the resulting PDF file with the new password. If a file exists at the specified
path, it is overwritten. This field supports only strings and String variables.
