Many people know that OCR stands for ‘optical character recognition,’ or if they don’t know that then they know that OCR is what you do to a scanned document to make it text-searchable. When you buy a new scanner like the Fujitsu ScanSnap it’ll come with OCR software, and most people get the idea that they should OCR all documents that they scan. I don’t recommend this, and don’t know many “paperless experts” who recommend it.
First of all, even if your scanner comes with standalone OCR software, you should just use the OCR function built into Acrobat, which has improved with each new version. It’s always better to use one piece of software to do as many things as you can; it’ll save you learning time and troubleshooting headaches.
Now when should you use Acrobat to do OCR?
To understand when to use OCR you need to appreciate a couple of things about OCRing documents. First, the process takes a couple of seconds for each page that you want to OCR. And by “a couple of seconds” I mean between 5 and 10 seconds, depending on your computer. That doesn’t seem like much, but it is, and here’s why.
You can’t scan in a new document while the last one is being OCR’d. So if you are scanning in your morning mail and you have 10 pieces of mail, you’d have to wait for each piece to be OCR’d before you could scan in the next one. Without doing OCR it might take you only 3 minutes to scan in those 10 pieces of mail (because you’ll scan them in quickly and then go back and name them and file them after all the scanning is done). If you do OCR the process could take closer to 10 minutes.
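If you want to see how those seconds add up, here is a rough sketch of the arithmetic in Python. The 5-to-10-second range, the 10 pieces of mail, and the 3 minutes of scan-only time come from the paragraphs above; the figure of 4 pages per piece is purely my assumption for illustration, and the function name is made up.

```python
def scan_session_minutes(pieces, pages_per_piece, scan_minutes, ocr_seconds_per_page):
    """Estimate total minutes when each piece must finish OCR before the next scan."""
    # OCR time is paid per page, and it cannot overlap with scanning.
    ocr_seconds = pieces * pages_per_piece * ocr_seconds_per_page
    return scan_minutes + ocr_seconds / 60


# Assumption for illustration only: each piece of mail averages 4 pages.
slow = scan_session_minutes(pieces=10, pages_per_piece=4, scan_minutes=3, ocr_seconds_per_page=10)
fast = scan_session_minutes(pieces=10, pages_per_piece=4, scan_minutes=3, ocr_seconds_per_page=5)

print(f"Slower computer: about {slow:.1f} minutes")
print(f"Faster computer: about {fast:.1f} minutes")
```

Under those assumptions the 3-minute job lands somewhere between roughly 6 and 10 minutes. Plug in your own page counts to see what your morning mail would really cost you.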
True, you could scan in all the pieces and then batch OCR them, and this could be done while you were at lunch or having a coffee break. But then we get to the next problem with OCRing documents: it makes the file sizes a little bigger. Again, no big deal if you’re talking about a few documents. But over time you’ll be talking about lots of documents, and that will chew up storage space.
But what about searchability? Isn’t that the reason to OCR documents? And if you don’t OCR the mail you scan then you won’t be able to search it later, right?
Right, but the reality is that you won’t be searching for mail very often, if ever, using text searching. Novices embarking on the paperless path have no idea what their tendencies will be, so they are loath to forgo any option that might seem helpful. Experts know that the only documents that we tend to search (and which are therefore worth OCRing) are documents that are produced or received in a litigation case.
Mail and routine correspondence simply aren’t worth the extra time (and extra file size) that OCRing costs.
When it comes to OCRing case documents you want to do a couple of things. First, you want to do the OCR as soon as possible after you’ve scanned them in. Create a copy of the file(s) you’re going to OCR as backup first, in case something blows up. Second, learn to OCR one document at a time before you take up “batch processing.”
Acrobat will probably prompt you to save your OCR’d document as a new file with “OCR” appended to the file name. I don’t keep two versions of documents if I can avoid it. So I just overwrite the old one with the OCR’d version (and if the new file is free from corruption or problems I delete the copy that I created as a backup).
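If you’d rather not make and delete those safety copies by hand, the backup habit can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the “_backup” file-name suffix are made up, and the script does no OCR at all. You still run the OCR in Acrobat and check the result yourself between the two steps.

```python
import shutil
from pathlib import Path


def make_backup(pdf_path):
    """Copy the file next to the original with a hypothetical '_backup' suffix."""
    src = Path(pdf_path)
    backup = src.with_name(src.stem + "_backup" + src.suffix)
    shutil.copy2(src, backup)  # copy2 also keeps the file's timestamps
    return backup


def discard_backup(backup_path, ocr_checked_ok):
    """Delete the backup only after you have confirmed the OCR'd file is fine."""
    backup = Path(backup_path)
    if ocr_checked_ok and backup.exists():
        backup.unlink()
        return True
    return False  # keep the backup if anything looked wrong
```

The idea is to call make_backup on a file before opening it in Acrobat, overwrite the original with the OCR’d version, open it to confirm nothing blew up, and only then call discard_backup with ocr_checked_ok set to True.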
Once you get comfortable with OCRing one document at a time, you’ll want to learn to do “batch processing” so that you can OCR a folder full of documents. And you’ll want to do this at night or when you’ll be away from your computer for a long time. You’ll need to babysit the first part of the process to respond to any initial prompts (but your prior experience will make you aware of how many prompts you’ll receive, so you’ll know when it’s safe to leave).
In the past it was critical to OCR documents before attempting to Bates-stamp them. In many versions of Acrobat that may still be true. I always OCR before Bates-stamping as part of my workflow, and will continue to do this. I recommend you do the same to avoid any possible problems.
One last confession: I do sometimes OCR non-case documents, but only to solve a small annoyance. Sometimes the pages of a document that was scanned by someone else will be skewed to one side. I like my documents to be orderly and level, so I will run OCR on those documents to make them “straighten up.”