The Pros and Cons of PDF Files

By

Andrew Downie

Paper presented at the Round Table Conference on Print Disability, Sydney, 2004

Abstract
PDF (Portable Document Format) files were once completely inaccessible to people using screen readers. Over the past several years, they have taken several steps along the evolutionary path towards accessibility. Under the best circumstances, a PDF file can now be a very effective resource for people using screen readers. The presentation will begin by summarising developments which have made PDFs more broadly accessible. These include improvements to software by such diverse groups as Adobe, Microsoft and screen reader producers. The potential benefits of reading PDF files via a screen reader will be discussed. These include reliable page structure and access to document bookmarks and hypertext links. Limitations when using current software will also be addressed. These include the lack of access to font style information and the potential for poorly constructed documents. Importantly, issues arising from the use of older (though not very old) screen readers will be considered.

Background
PDF files are created, either directly or indirectly, by a number of Adobe products. They can also be created by software from other companies, including directly from the current Apple Macintosh operating system. The "portable" in the name refers to their portability across a wide range of computer platforms. That is, a PDF file will retain its layout, regardless of the computer used to display it. As will be discussed further later, not all PDFs are created equal. They can now be created in a manner which will make them largely accessible to people using screen readers. In a worst case, however, they can be created to produce a jumbled mess of characters.

Adobe Acrobat is the major piece of software associated with PDFs. With it, files can be created, modified, annotated, bookmarked and even turned into a multimedia extravaganza. This software is not free. PDFs can, however, be read with the free Adobe Reader.

One issue surrounding PDFs causes considerable confusion in relation to accessibility. This is the feature in Acrobat which allows paper documents to be scanned and saved as image-only files. While representing a quick way of getting paper-based material into an electronic format, an image-only PDF is not accessible to screen readers. This type of file is not the subject of the following discussion until specifically addressed in the Image-Only PDF Files section. As will be shown in that discussion, even image-only PDFs, which were once the archetype of inaccessibility, can, with the right tools, have a claim to salvation.

From Inaccessible to Largely Accessible
Less than ten years ago, PDF files were completely inaccessible to people using screen readers. Under the right circumstances (discussed later), they are now largely accessible and can offer some advantages over other formats. Let's briefly trace the accessibility history of this file format. In doing so, I encourage those of you who have had unpleasant PDF experiences to keep as open a mind as you can.

The first significant step came when Adobe released a plugin for version 4 of Acrobat Reader for use with screen readers. Having mastered a less than mnemonic set of commands, the user could navigate a document by page, bookmark or hypertext link. Text attributes were not (and are still not) provided, and layouts such as multiple columns and tables caused real frustration.

With the release of Acrobat Reader Version 5, accessibility was integrated into the product without the need for a plugin. This was subject to a couple of constraints. Firstly, the user had to be using an MSAA (Microsoft Active Accessibility) enabled screen reader. Secondly, the full version of Acrobat Reader had to be used (a smaller one without accessibility was also offered). In this release, Adobe adopted coding developed by Microsoft to make products more accessible to screen readers. Quite good access was available, provided the PDF file was "tagged". I'll say more about this process later. Files which are not tagged can still cause major problems. A properly constructed file, though, could be navigated in very much the same way as a file in other Windows applications.

An access barrier not previously encountered now became an issue. One of the features of PDFs which appeals to some authors and publishers is a raft of security settings. Files protected against copying were often also protected against reading by people using screen readers. While facilities were built into Acrobat to overcome this, files prepared with older versions remained blocked. Further, the default setting in Acrobat V5 was not to allow access by screen readers.

Version 5 also included some useful features for people not relying on screen readers. Text size could be adjusted over a wide range and colours could also be altered to meet individual preference.

The current product from Adobe for reading PDFs is Adobe Reader V6. Note the change of name from Acrobat Reader to Adobe Reader. The rationale for this seems to be that this product now handles a variety of electronic books and not just normal PDF files. A significant inclusion is the ability of the Reader to utilise synthetic speech to read documents aloud. Given the limited options (by page or entire document), this won't excite people using screen readers. It does, however, have potential for people who have reading difficulties and who may want to augment text with speech output. The other major fix is that files protected with older versions of Acrobat can now be read with current screen readers. The default setting when applying security with Acrobat V6 is, as it should have been originally, to allow screen reader access.

Work at Adobe has not been in isolation. Screen reader developers have also been refining their products to provide better access to PDFs. Hypertext links and bookmarks are now valuable features. Tables can also be read with the same precision as those on web pages.

Current Limitations
Text attribute information is still not available. Identification of headings and paragraphs is unreliable. To get a reasonable level of access one must be using an up-to-date screen reader – certainly one released no more than a year ago and preferably one of the very latest offerings. And then there's the vexed issue of whether the PDF file has been properly constructed.

Producing Tagged PDF Files
In some ways the accessibility of PDFs parallels that of web pages some years ago. Over the past several years, improved tools and increased knowledge have allowed developers of web pages to produce highly accessible sites. While it is still eminently possible to produce very ugly (in several senses) pages, quite good access can now generally be expected.

With the release of Acrobat V5, Adobe began producing extensive documentation on how to create accessible PDF files. A great deal of information is available at http://access.adobe.com, covering preparation from various sources. As mentioned earlier, the crucial issue is to tag the file. Depending on how the PDF is produced, this may be an automatic or manual process. The former is clearly the more desirable option and, so far as I can tell, the only one for a person using a screen reader. What follows is a very brief introduction to the relatively new (and widely unknown) science of accessible PDF creation.

Accessible PDF Files From Word
A very easy way of producing a tagged PDF file is to create it from Microsoft Word (you must be using Word 9, that is Word 2000, or later). When Acrobat (the one you pay for) is installed onto a Windows-based computer with Word already installed, an extra item is added to the top menu in Word. This allows Acrobat preferences to be set and Word files to be converted to PDF files with ease. However, and it's a fairly big "however", it's really important to construct the Word file correctly to get a well-constructed PDF file. The key to this process is to use Word styles. For example, do not simply change the font size and type for headings. Instead, select a heading style for each heading. Similarly, if information is to be presented in tabular format, use a table rather than simply tabbing between items. Space between paragraphs should be determined by the paragraph style, not by repeated presses of the Enter key. If this all sounds a bit tedious, be warned that it is becoming important not just for producing PDF files but for preparation of material to be displayed in the increasingly popular XML format.

Creating Accessible PDF Files in Acrobat
Acrobat V5 and later versions offer facilities for tagging files. In Acrobat 6, this can be done automatically. My so far limited experimentation suggests varying results, probably depending on the manner in which the file was created. Acrobat also allows manual tagging, including addition of alternative text for images. This process requires sight and some knowledge for best results. Both Acrobat and Adobe Reader offer an accessibility checker. From within Acrobat, problems can be rectified. While a file cannot be altered in Adobe Reader, it offers a feature which can help with poorly constructed files. The "reading order" can be adjusted for screen reader output and this can sometimes turn garbled text into an accurate representation of the author's intentions.
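As a rough illustration of what "tagged" means at the file level: a tagged PDF's document catalog normally carries a reference to a structure tree (`/StructTreeRoot`) and a `/MarkInfo` dictionary containing `/Marked true`. The Python sketch below is a minimal heuristic under that assumption, not a substitute for the accessibility checker in Acrobat. The function name `looks_tagged` is hypothetical, and because it only searches the raw bytes, it can miss tagging in files that store the catalog inside a compressed object stream.

```python
import re

# Byte patterns for the two catalog entries that normally mark a tagged PDF.
_STRUCT_ROOT = re.compile(rb"/StructTreeRoot\b")
_MARKED_TRUE = re.compile(rb"/Marked\s+true\b")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Crude check (hypothetical helper): True if both markers appear in the raw bytes.

    This is a byte search, not a PDF parser, so treat a False result as
    "probably untagged" rather than as proof.
    """
    has_struct_root = bool(_STRUCT_ROOT.search(pdf_bytes))
    has_marked_flag = bool(_MARKED_TRUE.search(pdf_bytes))
    return has_struct_root and has_marked_flag
```

To try it, read a file in binary mode and pass its contents to `looks_tagged`. A result of `False` suggests the file should be opened in Acrobat and tagged before distribution.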

Image-Only PDF Files
As mentioned earlier, these files are created by scanning paper documents into Acrobat. No optical character recognition (OCR) takes place; each scanned page simply appears in the file as an image of the original. As screen readers cannot read pictures, this type of presentation is completely inaccessible. But now to the good news. Subject to the quality of the original paper document, it is now easy to convert an image-only file into one containing quite readable text. Some, and only some, of the available options are mentioned here. Acrobat V6 contains a Paper Capture facility. Having loaded an image-only PDF file into Acrobat, simply run Paper Capture to produce text. Commercial OCR software will now also convert image-only files to text. Two which do this very well are OmniPage and FineReader. As already cautioned, the quality of print on the original paper document will affect the OCR process. A smudged, faded document will look smudged and faded when viewed as an image-only file and there is a high likelihood of poor results when applying OCR software.
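Before deciding whether a file needs OCR, it can help to screen it first. The Python sketch below is only a heuristic under a stated assumption: text in a PDF has to be drawn with a font, so a file that contains image objects (`/Subtype /Image`) but no `/Font` entries at all is probably image-only. The function name `looks_image_only` is hypothetical, and because it searches raw bytes rather than parsing the file, PDFs that keep their dictionaries in compressed object streams may be misjudged.

```python
import re

# A page that draws text must reference a font resource; scanned pages
# normally reference only image XObjects.
_FONT_KEY = re.compile(rb"/Font\b")
_IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude check (hypothetical helper): images present and no font entries found."""
    has_images = bool(_IMAGE_XOBJECT.search(pdf_bytes))
    has_fonts = bool(_FONT_KEY.search(pdf_bytes))
    return has_images and not has_fonts
```

A file flagged by this check is a candidate for Paper Capture or one of the OCR packages mentioned above. A file that is not flagged may still be poorly constructed, so the check says nothing about tagging or reading order.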

Conclusion
Accessibility of PDF files has improved markedly over the past decade or so. In the best circumstances, these files can be a valuable resource to people using screen readers and other adaptive equipment. Navigation by hypertext links, bookmarks and specific page references can allow quick and easy access to material without risk of altering the original file. The improvements are due to efforts on the part of large companies such as Adobe and Microsoft and to very small ones which produce screen readers. On the other hand, there are still shortcomings. Further work is needed to yield more information to readers using synthetic speech or Braille output, and much more education of people producing PDF files is required. Apart from anything else, people using screen readers need a current, or nearly current, version to get best results. It is important that those who enjoy using PDFs and those who endure the experience continue to provide feedback to software developers. It is at least as important to increase substantially the level of education regarding the need to construct PDF files correctly. If a far greater proportion of files are produced in accordance with Adobe's recommendations, PDF will not be so widely viewed among people using screen readers as a nasty three-letter word.