What is PDF?

A Breakdown of the Various PDF Content Types

PDF - Portable Document Format


In recent years, the PDF file format has emerged as a standard for sharing documents between users or
posting information to the Internet.
For the purposes of this paper, every PDF file can be pictured the same way: each page of the document
has a Header, Content, and a Footer. The Header indicates that a new page is starting. Likewise, the
Footer indicates that the page has ended. The Content is the part of the file that contains the
information the user sees.
This simplified structure applies to all PDF files and is illustrated below:
Page 1
PDF Header
Content
PDF Footer

Page 2
PDF Header
Content
PDF Footer

Page 3
PDF Header
Content
PDF Footer

This setup differs from a traditional word processing file (such as a Microsoft Word .doc file), which
contains a single continuous stream of information and is only paginated when printed. In the PDF file,
each page is a holding area for that page's specific block of information.
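The page-by-page layout described above can be modeled in a few lines of Python. This is only a sketch of the conceptual model used in this paper, not of the byte-level PDF file format; the names Page, PdfDocument, and render_outline are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """One page: a holding area for that page's block of content."""
    content: str  # stand-in for text, images, or a mix of both


@dataclass
class PdfDocument:
    """Conceptual model only: an ordered list of self-contained pages."""
    pages: List[Page] = field(default_factory=list)


def render_outline(doc: PdfDocument) -> str:
    """Return the Header / Content / Footer outline, one block per page."""
    blocks = []
    for number, page in enumerate(doc.pages, start=1):
        blocks.append(
            f"Page {number}\nPDF Header\n{page.content}\nPDF Footer"
        )
    return "\n\n".join(blocks)


doc = PdfDocument([Page("Content"), Page("Content"), Page("Content")])
outline = render_outline(doc)
print(outline)
```

Because each page carries its own content, changing one Page in this model leaves the others untouched, which is the contrast with a word processing file that the paragraph above draws.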
Where one PDF file may differ from another is in the format of the content. Many different formats
may be used, including formatted text, unformatted ASCII text, raster images, vector images, or any
combination of these. It is the nature of this content that determines how the final PDF file will function.
Many terms have been created and used to describe the PDF files resulting from different content types.
Some of these are:

PDF Normal
PDF Image
True PDF
Wrapped PDF
PDF Image + Text
PDF-wrapped TIFF

PDF Normal
Also known as True PDF and Real PDF, these documents represent the ideal PDF files for most
applications. These documents have been created and published using PDF software. The content includes
the original formatted text of the document. Tables in the document are also usually published as
formatted text. Graphics or pictures will usually appear as cut images inserted into the text.
The structure of this type of document is illustrated below:
Page 1
PDF Header
Formatted text of the document, including graphics and tables
PDF Footer

Page 2
PDF Header
Formatted text of the document, including graphics and tables
PDF Footer

Page 3
PDF Header
Formatted text of the document, including graphics and tables
PDF Footer

PDF Normal documents allow the user to search the text and to copy and paste it into other files.
Because most of the information in these files is text, the file size is comparatively small, which
makes these files easy to use and ideal to exchange.

PDF Image
The PDF Image is also called the Wrapped PDF or the PDF-wrapped TIFF. In these files, the content is
simply an image file. The image file could be in many formats (GIF, TIFF, JPG, etc.) and could show many
kinds of subject (a scanned page, a picture, a graphic design). The most common use is a scanned page
in TIFF format.
To create a PDF Image file from TIFF images, PDF creation software is used to insert the PDF Header
and Footer information around the image to make it a PDF page. This process of wrapping the image
with the PDF information is why these are often referred to as Wrapped PDFs.
The structure of the file is illustrated below:
Page 1
PDF Header
TIFF Image
PDF Footer

Page 2
PDF Header
TIFF Image
PDF Footer

Page 3
PDF Header
TIFF Image
PDF Footer
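
The wrapping step described above can be sketched as follows. This is a conceptual illustration only: ImagePage and wrap_images are hypothetical names, the header and footer are placeholder strings rather than real PDF syntax, and the byte strings merely stand in for scanned TIFF data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImagePage:
    """A PDF Image page: header and footer wrapped around one image."""
    header: str
    image: bytes  # raw image data, e.g. the bytes of a scanned TIFF
    footer: str


def wrap_images(images: List[bytes]) -> List[ImagePage]:
    """Wrap each image in placeholder header/footer markers, one per page."""
    return [ImagePage("PDF Header", data, "PDF Footer") for data in images]


# Three stand-in byte strings play the role of three scanned TIFF pages.
scans = [b"tiff-page-1", b"tiff-page-2", b"tiff-page-3"]
pages = wrap_images(scans)

# The image data is carried through unchanged; no text is extracted.
print(len(pages), pages[0].image == scans[0])
```

Note that nothing in this sketch looks inside the image data, which mirrors the point that a wrapped image page holds pixels only.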

Text searching and text copy/paste functions are not available with this type of PDF file because the only
information it contains is image information. Although a scanned page may appear to contain text, it is
actually just a bitmap of that text and not the text itself. Because image files are large, the resulting
PDF files can also be quite large. As such, the files can occasionally be difficult to use.

PDF Image + Text
This file type represents a compromise between PDF Normal and PDF Image files.
To make these files, the author begins with a hardcopy document. The document is scanned to get a TIFF
image, making it similar to the PDF Image document described above. However, the scan is then run through
Optical Character Recognition (OCR) software such as OmniPage to capture the text of the document and
the position of the text on the page. This text information is then added to the content part of the file.
The illustration below shows the structure of this type of PDF file:
Page 1
PDF Header
TIFF Image
OCR Text
PDF Footer

Page 2
PDF Header
TIFF Image
OCR Text
PDF Footer

Page 3
PDF Header
TIFF Image
OCR Text
PDF Footer

When these files are viewed, the user sees the image on the screen. However, the text in the background is
available for text searching and copy/paste functions. Because these files contain both the image and text
information, their file size is even larger than that of PDF Image files.
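
The three content types and their search behavior can be summarized in a short Python sketch. It follows the conceptual model of this paper rather than any real PDF library: the names OcrWord, Page, classify, searchable_text, and find are hypothetical, the coordinates are arbitrary example values, and the image field stands for a full-page scan.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OcrWord:
    """A recognized word plus its position on the page (arbitrary units)."""
    text: str
    x: float
    y: float


@dataclass
class Page:
    formatted_text: Optional[str] = None  # original text (PDF Normal)
    image: Optional[bytes] = None         # full-page scan (PDF Image)
    ocr_words: List[OcrWord] = field(default_factory=list)  # hidden OCR text


def classify(page: Page) -> str:
    """Name the content type of a page using the terms from this paper."""
    if page.formatted_text is not None:
        return "PDF Normal"
    if page.image is not None and page.ocr_words:
        return "PDF Image + Text"
    if page.image is not None:
        return "PDF Image"
    raise ValueError("page has no content")


def searchable_text(page: Page) -> str:
    """Return the text a viewer could search or copy; empty for pure images."""
    if page.formatted_text is not None:
        return page.formatted_text
    return " ".join(word.text for word in page.ocr_words)


def find(page: Page, term: str) -> bool:
    """Case-insensitive search over whatever text the page exposes."""
    return term.lower() in searchable_text(page).lower()


scan = b"stand-in for TIFF bytes"
normal = Page(formatted_text="Quarterly report")
image_only = Page(image=scan)
image_text = Page(
    image=scan,
    ocr_words=[OcrWord("Quarterly", 72, 700), OcrWord("report", 140, 700)],
)

for page in (normal, image_only, image_text):
    print(classify(page), find(page, "report"))
```

In this sketch the image-only page exposes no text at all, so the search fails there, while the page that pairs the same image with OCR words can be searched even though the viewer would still display the scan.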

Copyright 2003 ScanSoft, Inc. All Rights Reserved. The ScanSoft logo, Productivity Without Boundaries and OmniPage are
trademarks or registered trademarks of ScanSoft, Inc. in the United States and/or other countries. All other company names or product
names may be the trademarks of their respective owners.