
Make your paper file cabinets searchable

How to manage paper and electronic documents effectively

Presented by Beaupix Studio: fashion, commercial, advertising, headshot, and wedding photography in Boston, Massachusetts.


Revision 1.1

Ryuji Suzuki


License

© 2009 by Ryuji Suzuki. All rights reserved. This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. This work was previously presented at a workshop hosted by the MIT Libraries.

My Background
- Photographer: terabytes of images, boxes of negatives.
- A wide range of interests: information sponge, news junkie.
- Have used search technology since the mid-1990s.
- Used to have many file cabinets.
- Still have 4 big bookshelves.


How do we store information?

- Loose paper documents in folders or binders
- Magazines, journals, newsletters
- Books
- Electronic files (PDF, HTML, RTF, DOC, etc.)


Paper problems

- Bulky. Heavy. → mobility
- Not easily searchable → usability, accessibility
- Can be sorted and indexed for search → requires upfront investment
- Creates digital divide → stay in the easy world

Solution
1. Digitize paper documents.
2. Make the documents searchable.
3. Manage the digital version with a database and a local search engine.


Before moving on...

- The idea of the paperless office failed in the past.
- Paper is not at fault; the way we use it is.
- What roles does paper have?
- How do they compare to a computer system?

(Diagram: roles in a computer system)

Input: keyboard, mouse, tablet, stylus, microphone, camera
Storage: hard drive, USB drive, CD, DVD, network storage
Processing: computer, database, search engines
Output: CRT, LCD, speech synthesizer


(Diagram: roles in a paper system)

Input: pen
Storage: paper
Output: paper
Processing: human brain + physical movement of paper


Paper: strength
At small scale, paper is:
- Cheap
- Portable
- Human-friendly, human-readable, human-writable

Paper is good as an I/O device at small scale.

Paper: weakness
At large scale, paper is:
- Bulky (low information density)
- Not searchable (lacks computational power)
- Not easy to distribute or back up

Paper is a poor storage device.

Hybrid solution
- Keep paper as an input/output medium.
- Digitize paper-based documents.
- Make documents searchable.
- Stay in the digital domain for storage and search.

(Diagram: the hybrid system)

Input: keyboard, mouse, tablet, stylus, microphone, camera
Storage: hard drive, USB drive, CD, DVD, network storage
Processing: computer, database, search engines
Output: CRT, LCD, speech synthesizer
Paper cycle: printer → paper (marked up with pen) → scanner → OCR → storage; recycle the paper.


OCR: Optical Character Recognition

- Recognizes text in scanned images and generates searchable plain text.
- Accuracy is less than 100%.
- Insufficient to use alone, but useful as auxiliary data for search engines to find keyword matches.
- Solution: keep the scanned image and overlay a transparent layer of OCR text.

Paper → Scanner → PDF (image) → OCR engine → PDF (image + text)

- PDF (image): human-readable facsimile image.
- PDF (image + text): facsimile image + searchable text (transparent overlay) in a single PDF file.

Computer can perform keyword search on these!
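
As a rough illustration of what keyword search over the OCR text layer involves, here is a minimal sketch of an inverted index in Python. It assumes the text of each PDF has already been extracted into a plain string; the file names and sample text are hypothetical, and desktop search tools do considerably more than this.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output keyed by PDF file name.
ocr_text = {
    "invoice_2008_03.pdf": "Invoice for studio lighting rental, March 2008",
    "lease.pdf": "Studio lease agreement, Boston, Massachusetts",
}
index = build_index(ocr_text)
print(sorted(search(index, "studio lease")))
```

Because OCR accuracy is below 100%, an exact-match index like this one will miss misrecognized words, which is one reason to keep the facsimile image alongside the text.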



OCR: Recommendation (1)

The OCR process is computationally intensive; it takes a long time to finish on a large number of documents. Common strategies:
1. Run OCR in a batch when you leave the office.
2. Run OCR in the background.

Option 1 is more primitive but easier to recover from in case of errors. Option 2 is more convenient when processing a small number of documents at a time.
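
One reason a batch run is easy to recover is that progress can be tracked file by file. The following is a minimal Python sketch of that idea, assuming the scans sit in a single folder and that a plain-text log of completed files is kept; `run_ocr` is a hypothetical placeholder for whatever OCR tool is actually invoked (Acrobat's own batch processing is driven from its menus, not from a script like this).

```python
from pathlib import Path


def pending_pdfs(scan_dir, log_path):
    """List PDFs in scan_dir that are not yet recorded in the done-log."""
    log = Path(log_path)
    done = set(log.read_text().splitlines()) if log.exists() else set()
    return [p for p in sorted(Path(scan_dir).glob("*.pdf")) if p.name not in done]


def run_batch(scan_dir, log_path, run_ocr):
    """Process each pending PDF; log successes so a rerun skips them."""
    failed = []
    for pdf in pending_pdfs(scan_dir, log_path):
        try:
            run_ocr(pdf)  # hypothetical callable supplied by the caller
        except Exception as exc:
            failed.append((pdf.name, str(exc)))
            continue
        with open(log_path, "a") as log:
            log.write(pdf.name + "\n")
    return failed
```

After an interrupted or partly failed run, calling `run_batch` again retries only the files that are not in the log.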

OCR: Recommendation (2)

- Some OCR engines resample and re-encode the image data, which causes quality loss.
- File size may increase if the re-encoder uses a poor set of encoding parameters.
- Short answer: the OCR engine in Adobe Acrobat is pretty good. Use it without re-encoding.
- Note: Acrobat lets you control the major parameters. Many other programs decide them for you, poorly!


Advantages of the hybrid solution

- Search instantly: papers, invoices, client case files, etc.
- Unify management of paper and digital documents.
- Archived information gets used more.
- Freedom to collect more information.
- Gain mobility.
- Reclaim your floor and hallway space.
- Avoid frustration at jammed photocopiers.
- Lose fax machines.
- Submit p-sets from your bedroom.

What you need

1. A document scanner ($300-500 for personal models, $1500-3000 for a networked group scanner)
2. Adobe Acrobat for OCR
3. A document search engine or database software:
   - Spotlight (Mac OS)
   - DEVONthink (Mac application)
   - Windows options: Google Desktop, etc.

Document scanners
- Fujitsu ScanSnap S510M (Mac), S510 (Windows)
- Fujitsu ScanSnap S300M (Mac), S300 (Windows)
- Canon imageFORMULA DR-2010C, DR-2050C
- Canon imageFORMULA ScanFront 220 (network scanner)
- HP Scanjet 7800, N6010, N7710

Key factors: OS, footprint, price, interface (USB, network, USB memory), scan speed, feeder capacity, multi-feed detection, bundled software.

Workflow
1. Roughly organize papers into categories by date, volume, access frequency, value, and subject.
2. Scan each document, one folder per category.
3. At the end of the day, batch process the PDF files to make them searchable (OCR in Acrobat).
4. Import the documents into your favorite document database or search engine.

Prioritize Work
- Stuff you don't need at all: recycle.
- Stuff you keep but probably won't use: don't invest more work than necessary.
- Stuff you need and probably will use: organize and scan simultaneously.
- Stuff you use frequently: invest more work to organize.
- Handwritten stuff: requires some manual sorting and indexing.

Workload estimate
- Initial phase: digitize existing paper documents.
- One drawer of a file cabinet takes about 4-6 hours, once you get used to the task.
- Important documents should be double-checked for skipped pages, etc., before destroying the originals.
- Once the file cabinets are removed, daily work and maintenance are minimal.

Once you've gone digital (1)

This will change the way you work in your office. You will be:
- scanning documents, not photocopying, for records;
- printing only the necessary pages or sections of a document, not the whole document;
- putting finished work back into the computer, not onto a pile on your desk or into a file cabinet, and recycling paper often.

Once you've gone digital (2)

When you rearrange your office, consider getting:
- a desktop document scanner
- a small document printer within your office
- one, two, or three 24" or 30" LCD screens
- an extra, fast, network-connected document scanner (if the number of users grows)


Some numbers
- Each file cabinet drawer: about 3 GB.
- A 32 GB USB thumb drive: 10+ drawers.
- A 500 GB pocket hard drive: 137+ drawers.
- An internal 1.5 TB hard drive: 465+ drawers.

Digital management of paper documents scales well for individual offices.
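
The arithmetic behind estimates like these can be sketched in a few lines of Python. This is only an illustration under two assumptions: an average of 3 GB per drawer, and decimal gigabytes (1 GB = 10^9 bytes) for the drive capacities. The slide quotes lower bounds (the "+"), so the results here come out at or above its figures.

```python
GB_PER_DRAWER = 3  # assumed average per drawer, in decimal gigabytes


def drawers_that_fit(capacity_gb, gb_per_drawer=GB_PER_DRAWER):
    """Whole number of scanned drawers that fit in the given capacity."""
    return capacity_gb // gb_per_drawer


for label, capacity_gb in [
    ("32 GB USB thumb drive", 32),
    ("500 GB pocket hard drive", 500),
    ("1.5 TB internal hard drive", 1500),
]:
    print(f"{label}: about {drawers_that_fit(capacity_gb)} drawers")
```

Changing `GB_PER_DRAWER` to match your own scan settings (resolution, color mode, compression) gives a more realistic estimate for a particular office.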

Remember!
- Back up often.
- Be mindful of information security (secure your computer, encrypt sensitive documents).
- Secure your office (anyone can search and steal your documents quickly).
- Consider having a backup document scanner of the same model.
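
One way to check that a backup copy is intact is to compare checksums of the originals and the copies. Below is a minimal sketch using only Python's standard library; the two-directory layout is an assumption, and this only verifies copies, it does not replace a real backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(source_dir, backup_dir):
    """List relative paths that are missing or differ in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(str(rel))
    return problems
```

An empty list from `mismatched_files` means every file in the source directory has a byte-identical copy in the backup.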

Extra Topics


Books
Commercial books in print:
- Buy in PDF form if available.
- Cut the binding and scan through a document scanner.

Out-of-print books, valuable books:
- Use a book scanner (OpticBook series).

OpticBook 3600
The only solution available for individual users. Scanning area comes very close to the edge. Functions are well designed. A bit too slow. Software runs only on Windows. Software is buggy.

Oversize pages

- Carrier sheets (ScanSnap models)
- Scan piecewise and stitch the pieces (photomerge)
- Tabloid flatbed scanners (expensive)
- Take photographs (tricky!)
- Reduction photocopy (wasteful)

Non-OCRable material
- Handwritten notes
- Texts in minor languages
- Photographs of non-textual items

What to do?
- Use stamps, stickers, or notepads with pre-printed keywords.
- Manually tag with keywords.
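
Manual tagging can be as simple as a sidecar file that maps each scan to its keywords. The following is a minimal Python sketch of that idea, assuming the tags are kept in a JSON file next to the scans; the file names and keywords are hypothetical, and a document database may offer its own tagging feature that makes this unnecessary.

```python
import json
from pathlib import Path


def add_tags(tag_file, document, keywords):
    """Record keywords for a document in a JSON sidecar file."""
    path = Path(tag_file)
    tags = json.loads(path.read_text()) if path.exists() else {}
    merged = set(tags.get(document, [])) | {k.lower() for k in keywords}
    tags[document] = sorted(merged)
    path.write_text(json.dumps(tags, indent=2))


def find_by_tag(tag_file, keyword):
    """Return the documents tagged with the given keyword."""
    path = Path(tag_file)
    tags = json.loads(path.read_text()) if path.exists() else {}
    return sorted(doc for doc, words in tags.items() if keyword.lower() in words)
```

For example, `add_tags("tags.json", "notes_001.pdf", ["meeting", "budget"])` followed by `find_by_tag("tags.json", "budget")` retrieves a handwritten note that OCR could not index.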

Sync your computers

A digital environment allows you to keep your home directory synchronized among multiple computers:
- Home computer
- Lab/office computer
- Laptop

Synchronize files in all directions (use Unison).

Questions?
Contact Ryuji Suzuki through the website. The author is active in the field of photography and imaging technology consulting. Footnote advertising in this presentation is for Beaupix Studio in Boston, Massachusetts. Beaupix Studio serves the fields of fashion, commercial, advertising, glamour, headshot, and wedding photography. Please visit http://beaupix.com.


Batch OCR with Acrobat

Create an OCR batch profile (do it once):
1. Advanced > Document Processing > Batch Processing
2. New Sequence: give your OCR profile a name.
3. Add "Recognize Text Using OCR" from the left panel.
4. Click the Edit button, then select "English (US)" and "Searchable Image (Exact)" from the pulldown menus. This setting does not resample the original image.
5. Click OK to save the profile setting. Set output options as desired.

Run OCR:
1. Advanced > Document Processing > Batch Processing
2. Select your OCR profile.
3. Run it.


Reduce PDF file size with Acrobat

For b&w documents, a highly efficient data compression method called JBIG2 is available in Acrobat. Select Advanced > PDF Optimizer. Select Images in the left panel, and look at the Monochrome Images section. Choose Bicubic Downsampling to 150 to 400 dpi (depending on the desired image quality), JBIG2, and Lossy. This is the most efficient data compression for binary b&w images from a scanner. In contrast, color and grayscale images cannot use JBIG2 and typically rely on JPEG compression, which is not nearly as efficient. Therefore, in the interest of disk space, choose b&w document mode whenever it is sufficient, as scanned color and grayscale documents take much more disk space.
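
To see why the scan mode matters even before compression, it helps to compare the raw (uncompressed) size of one page in each mode. This is a minimal sketch, assuming a US Letter page (8.5 × 11 in) scanned at 300 dpi; the page size and resolution are assumptions, and the final PDF size depends on the compression applied afterwards.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed page: US Letter at 300 dpi.
for mode, bits in [("b&w (1 bit)", 1), ("grayscale (8 bit)", 8), ("color (24 bit)", 24)]:
    size_mb = raw_page_bytes(8.5, 11, 300, bits) / 1e6
    print(f"{mode}: {size_mb:.1f} MB")
```

Before any compression, a grayscale page holds 8 times as much data as a b&w page and a color page 24 times as much, so the more efficient compression available for b&w images adds to an advantage that already exists.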

Another useful function is Document > Optimize Scanned PDF. This function worked very well in Acrobat 7 Professional, but works less well in Acrobat 9 Pro, due to a change in the implementation of the edge processing. If your document has very wide margins, it may still work well. If it does, you get a very compact PDF file.
