Digital Document Management

This presentation describes how to make your paper-based documents searchable by digitizing them and applying optical character recognition (OCR) software. It is very useful for people who have a large volume of paper documents. Not only does it save space and money, it also saves a lot of time organizing files and searching for documents when you need them.

Published by: Beaupix Studio on Feb 04, 2009
Copyright:Attribution Non-commercial No-derivs

Make your paper file cabinets searchable

How to manage paper and electronic documents effectively
Presented by Beaupix Studio: fashion, commercial, advertising, headshot, and wedding photography in Boston, Massachusetts.

Revision 1.1

Ryuji Suzuki

Need awesome headshots in Boston?

License

©2009 by Ryuji Suzuki. All rights reserved. This work is licensed under the Creative Commons Attribution–Noncommercial–No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. This work was previously presented at a workshop hosted by the MIT Libraries.
Make paper file cabinets searchable Want to make eye-catching advertising photography in Boston?

My Background
• Photographer → terabytes of images, boxes of negatives.
• A wide range of interests: information sponge, news junkie.
• Have used search technology since the mid-1990s.
• Used to have many file cabinets.
• Still have 4 big bookshelves.

How do we store information?
• Loose paper documents in folders or binders
• Magazines, journals, newsletters
• Books
• Electronic files (PDF, HTML, RTF, DOC, etc.)

Paper problems

• Bulky. Heavy. → poor mobility
• Not easily searchable → poor usability, poor accessibility
• Can be sorted and indexed for search → but requires upfront investment
• Creates a “digital divide” → Stay in the easy world

Solution
1. Digitize paper documents
2. Make the documents searchable
3. Manage the digital version with a database and a local search engine
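To make step 3 concrete, here is a minimal sketch of what a local search engine does with the text of each digitized document: it builds an inverted index from words to the documents that contain them. The file names and sample text are invented for illustration, and real tools do far more than this.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical documents: name -> text recovered from the scanned pages.
documents = {
    "invoice-2008-11.pdf": "Invoice for studio lighting rental",
    "lease.pdf": "Studio lease agreement, Boston",
    "notes.pdf": "Lighting diagram for headshot session",
}
index = build_index(documents)
print(sorted(search(index, "studio lighting")))
```

A query returns only the documents that contain all of its words, which is the behavior most desktop search tools default to.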

Before moving on. . .

• The idea of the paperless office failed in the past.
• Paper is not at fault; the way we use it is.
• What roles does “paper” have? How do they compare to those of a computer system?

Computer system:
Input: keyboard, mouse, tablet, stylus, microphone, camera
Storage: hard drive, USB drive, CD, DVD, network storage
Processing: computer, database, search engines
Output: CRT, LCD, speech synthesizer

Paper system:
Input: pen
Storage: paper
Processing: human brain + physical movement of paper
Output: paper

Paper: strength
At small scale, paper is:
• Cheap
• Portable
• Human-friendly, human-readable, human-writable
→ Paper is good as an I/O device at small scale.

Paper: weakness
At large scale, paper is:
• Bulky (low information density)
• Not searchable (lacks computational power)
• Not easy to distribute or back up
→ Paper is a poor storage device.

Hybrid solution
• Keep paper as an input/output medium
• Digitize paper-based documents
• Make documents searchable
• Stay in the digital domain for storage and search

Hybrid system:
Input: keyboard, mouse, tablet, stylus, microphone, camera; scanner + OCR (for paper)
Storage: hard drive, USB drive, CD, DVD, network storage
Processing: computer, database, search engines
Output: CRT, LCD, speech synthesizer; printer → paper
Paper: write on it with a pen, scan it back in, then recycle

OCR: Optical Character Recognition
• Recognizes text in scanned images and generates searchable plain text.
• Accuracy is less than 100%.
• Insufficient to use alone, but useful as auxiliary data for search engines to find keyword matches.
• Solution: keep the scanned image and overlay a transparent layer of OCR text.
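Because OCR accuracy is below 100%, an exact keyword match can miss a word that the engine misread. The sketch below is an illustration only, not how any particular search engine works: it uses Python's standard difflib module to find words in OCR output that closely resemble a query. The sample text and the 0.8 similarity cutoff are assumptions.

```python
import difflib
import re


def fuzzy_find(query, ocr_text, cutoff=0.8):
    """Return words in the OCR text that closely resemble the query word."""
    words = sorted(set(re.findall(r"[a-z0-9]+", ocr_text.lower())))
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)


# Hypothetical OCR output in which "invoice" was misread as "invo1ce".
ocr_text = "Invo1ce number 1042 for studio rental"
print(fuzzy_find("invoice", ocr_text))
```

A lower cutoff tolerates more recognition errors but also returns more false matches, so the value is worth tuning against your own documents.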

Paper → Scanner → PDF (image) → OCR engine → PDF (image + text)
• PDF (image): human-readable facsimile image
• PDF (image + text): facsimile image + searchable text (transparent overlay) in a single PDF file
A computer can perform keyword search on the image + text PDF files!

OCR: Recommendation (1)
The OCR process is computationally intensive; it takes a long time to finish on a large number of documents. Common strategies:
1. Run OCR in a batch when you leave the office.
2. Run OCR in the background.
Option 1 is more primitive, but it is easier to recover from errors. Option 2 is more convenient when processing a small number of documents at a time.
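A batch run needs a list of the files that still await OCR. As a sketch, assume scanned PDFs land in one folder and processed copies are written under the same name to a second folder (this layout and the folder names are hypothetical). The function below lists what the batch still has to handle; it does not run OCR itself.

```python
import tempfile
from pathlib import Path


def pending_ocr(scan_dir, done_dir):
    """List scanned PDFs with no same-named file in the processed folder."""
    done = {p.name for p in Path(done_dir).glob("*.pdf")}
    return sorted(p for p in Path(scan_dir).glob("*.pdf") if p.name not in done)


# Demo with throwaway folders standing in for real scan folders.
with tempfile.TemporaryDirectory() as tmp:
    incoming = Path(tmp, "incoming")
    searchable = Path(tmp, "searchable")
    incoming.mkdir()
    searchable.mkdir()
    (incoming / "a.pdf").touch()
    (incoming / "b.pdf").touch()
    (searchable / "a.pdf").touch()  # a.pdf has already been processed
    print([p.name for p in pending_ocr(incoming, searchable)])
```

Keeping processed files in a separate folder also makes error recovery simple: anything missing from the second folder can be rerun.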

OCR: Recommendation (2)
Some OCR engines resample and re-encode the image data → quality loss. File size may even increase if the re-encoder uses a poor set of encoding parameters.
Short answer: the OCR engine in Adobe Acrobat is pretty good. Use it without re-encoding.
Note: Acrobat lets you control the major parameters. Many other programs decide them for you, poorly!

Advantages of the hybrid solution
• Search instantly: papers, invoices, client case files, etc.
• Unify management of paper and digital documents.
• Archived information gets used more.
• Freedom to collect more information.
• Gain mobility. Reclaim your floor and hallway space.
• Avoid frustration at jammed photocopiers.
• Lose the fax machine. Submit a p-set from your bedroom.

What you need
1. A document scanner ($300–500 for personal models, $1500–3000 for a networked “group” scanner)
2. Adobe Acrobat for OCR
3. A document search engine or database software
   • Spotlight (Mac OS)
   • DEVONthink (Mac application)
   • Windows options: Google Desktop, etc.

Document scanners
• Fujitsu ScanSnap S510M (Mac), S510 (Windows)
• Fujitsu ScanSnap S300M (Mac), S300 (Windows)
• Canon imageFORMULA DR–2010C, DR–2050C
• Canon imageFORMULA ScanFront 220 (network scanner)
• HP Scanjet 7800, N6010, N7710
Key factors: OS, footprint, price, interface (USB, network, USB memory), scan speed, feeder capacity, multi-feed detection, bundled software.

Workflow
1. Roughly organize papers into categories by date, volume, access frequency, value, and subject.
2. Scan each document, one folder per category.
3. At the end of the day, batch process the PDF files to make them searchable (OCR in Acrobat).
4. Import the documents into your favorite document database or search engine.

Prioritize Work
• Stuff you don’t need at all → Recycle.
• Stuff you keep but probably won’t use → Don’t invest more work than necessary.
• Stuff you need and probably will use → Organize and scan simultaneously.
• Stuff you use frequently → Invest more work to organize.
• Handwritten stuff → Requires some manual sorting and indexing.

Workload estimate
• Initial phase: digitize existing paper documents.
  – One drawer of a file cabinet takes about 4–6 hours, once you get used to the task.
  – Important documents should be double-checked for skipped pages, etc., before the originals are destroyed.
• Once file cabinets are removed, daily work and maintenance are minimal.
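The per-drawer figure scales linearly, so the initial phase is easy to budget. A small sketch of the arithmetic; the drawer count in the example is hypothetical.

```python
def scan_hours(drawers, low=4, high=6):
    """Return the (minimum, maximum) hours to digitize the given drawers."""
    return drawers * low, drawers * high


# Example: two four-drawer cabinets (a made-up office).
low, high = scan_hours(8)
print(f"{low}-{high} hours")  # prints "32-48 hours"
```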

Once you’ve gone digital (1)
This will change the way you work in your office. You will be:
• scanning documents, not photocopying, for records.
• printing only the necessary pages or sections of a document, not the whole document.
• putting finished work back into the computer, not onto a pile on your desk or into a file cabinet; recycle paper often.

Once you’ve gone digital (2)
When you rearrange your office, consider getting:
• a desktop document scanner
• a small document printer within your office
• one, two, or three 24" or 30" LCD screens
• an extra, fast network-connected document scanner (if the number of users grows)

Some numbers
• Each file cabinet drawer ≈ 3GB.
• A 32GB USB thumb drive ≈ 10+ drawers.
• A 500GB pocket hard drive ≈ 155+ drawers.
• An internal 1.5TB hard drive ≈ 465+ drawers.
→ Digital management of paper documents scales well for individual offices.
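These estimates come from dividing drive capacity by roughly 3GB per drawer. The sketch below repeats that division under two conventions: the advertised decimal capacity, and the same capacity expressed in binary gigabytes (GiB), as many operating systems report it. Which convention the figures above assume is not stated, so the function takes it as a parameter.

```python
def drawers_per_drive(capacity_gb, gb_per_drawer=3.0, binary=False):
    """Estimate how many scanned file drawers fit on a drive.

    capacity_gb is the advertised (decimal) capacity. With binary=True the
    capacity is first converted to GiB before dividing.
    """
    usable = capacity_gb * 1e9 / 2**30 if binary else capacity_gb
    return usable / gb_per_drawer


drives = [("32GB thumb drive", 32), ("500GB pocket drive", 500),
          ("1.5TB internal drive", 1500)]
for label, gb in drives:
    print(f"{label}: {drawers_per_drive(gb):.0f} drawers (decimal), "
          f"{drawers_per_drive(gb, binary=True):.0f} (binary)")
```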

Remember!
• Back up often.
• Be mindful of information security (secure your computer, encrypt sensitive documents).
• Secure your office (anyone can search and steal your documents quickly).
• Consider having a backup document scanner of the same model.
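A backup is only useful if the copy is intact, which matters most once the paper originals are gone. As a sketch, the following compares a file with its backup copy using SHA-256 checksums from Python's standard hashlib module. The file names are placeholders, and this is one way to check a copy, not a full backup tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True if the backup copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)


# Demo with throwaway files standing in for a scanned PDF and its backup.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "scan.pdf")
    backup = Path(tmp, "scan-backup.pdf")
    original.write_bytes(b"%PDF-1.4 sample")
    backup.write_bytes(b"%PDF-1.4 sample")
    print(backup_matches(original, backup))  # prints True
```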

Extra Topics

Books
Commercial books in print:
• Buy in PDF form if available
• Cut the binding and scan through a document scanner
Out-of-print books, valuable books:
• Use a book scanner (OpticBook series)

OpticBook 3600
• The only solution available for individual users.
• Scanning area comes very close to the edge.
• Functions are well designed.
• A bit too slow.
• Software runs only on Windows.
• Software is buggy.

Oversize pages

• Carrier sheets (ScanSnap models)
• Scan piecewise and stitch the pieces (photomerge)
• Tabloid flatbed scanners (expensive)
• Take photographs (tricky!)
• Reduction photocopy (wasteful)

Non-OCRable material
• Handwritten notes
• Text in minor languages
• Photographs of non-textual items
What to do?
• Use stamps, stickers, or notepads with pre-printed keywords
• Manually tag with keywords
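Manual tagging can be as simple as a catalog that maps each file to a list of keywords. The sketch below keeps that catalog in a plain dictionary and prints it as JSON, a text form that could be saved alongside the scans. The file names and keywords are hypothetical, and dedicated document databases provide their own tagging features.

```python
import json


def add_tags(catalog, filename, *keywords):
    """Attach lowercase keywords to a file in the catalog (a plain dict)."""
    tags = set(catalog.get(filename, []))
    tags.update(k.lower() for k in keywords)
    catalog[filename] = sorted(tags)


def find_by_tag(catalog, keyword):
    """Return the files tagged with the given keyword."""
    return sorted(f for f, tags in catalog.items() if keyword.lower() in tags)


# Hypothetical file names and keywords.
catalog = {}
add_tags(catalog, "notebook-p12.pdf", "Meeting", "lighting")
add_tags(catalog, "receipt-photo.pdf", "tax", "2008")
print(find_by_tag(catalog, "meeting"))
print(json.dumps(catalog, indent=2))  # text form that could be saved to disk
```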

Sync your computers
A digital environment allows you to keep your home directory synchronized among multiple computers:
• Home computer
• Lab/office computer
• Laptop
→ Synchronize files in all directions (use Unison).

Questions?
Contact Ryuji Suzuki through the website. The author is active in photography and imaging technology consulting. The footnote advertising in this presentation is for Beaupix Studio in Boston, Massachusetts. Beaupix Studio works in fashion, commercial, advertising, glamour, headshot, and wedding photography. Please visit http://beaupix.com.

Batch OCR with Acrobat
Create an OCR batch profile (do it once):
1. Advanced → Document Processing → Batch Processing
2. New Sequence: give your OCR profile a name.
3. Add “Recognize Text Using OCR” from the left panel.
4. Click the “Edit” button, then select “English (US)” and “Searchable Image (Exact)” from the pulldown menus. This setting does not resample the original image.
5. Click OK to save the profile setting. Set output options as desired.

Run OCR:
1. Advanced → Document Processing → Batch Processing
2. Select your OCR profile.
3. Run it.

Compress PDF file size with Acrobat
For b&w documents, a highly efficient data compression called JBIG2 is available in Acrobat. Select Advanced → PDF Optimizer. Select “Images” in the left panel, and look at the “Monochrome Images” section. Set “Bicubic Downsampling to” a value between 150 and 400 dpi (depending on the desired image quality), then choose “JBIG2” and “Lossy.” This is the most efficient data compression for binary b&w images from a scanner. In contrast, color and grayscale images cannot use JBIG2; they rely on compression such as JPEG, which is not nearly as efficient. Therefore, in the interest of disk space, choose b&w document mode whenever it is sufficient, as scanned color and grayscale documents take much more disk space.
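Part of the difference exists before any compression is applied: a b&w scan stores 1 bit per pixel, while grayscale typically stores 8 and color 24. The sketch below estimates the raw size of one page in each mode. The US Letter page size, the 300 dpi resolution, and the bit depths are assumptions for illustration, and actual file sizes depend on the compression chosen.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# US Letter page at 300 dpi in three scan modes.
for mode, bits in [("b&w", 1), ("grayscale", 8), ("color", 24)]:
    size = raw_page_bytes(8.5, 11, 300, bits)
    print(f"{mode}: {size / 1e6:.1f} MB before compression")
```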

Another useful function is Document → Optimize Scanned PDF. This function used to work very well in Acrobat 7 Professional, but it works less well in Acrobat 9 Pro due to a change in the implementation of the edge processing. If your document has very wide margins, it may still work well. If it does, you get a very compact PDF file.
