What Is OCR for Receipts and Invoices?
When you scan a paper receipt or invoice, the resulting PDF is an image — your computer sees pixels, not text. That means you cannot search it, copy an amount from it, or import it into QuickBooks, Xero, or any other accounting software without retyping everything by hand.
OCR (Optical Character Recognition) reads the image and converts each printed character into real, selectable text. After running OCR, the PDF looks identical but now contains a hidden text layer — every vendor name, date, line item, and total becomes copyable and searchable. This is the first step in any paperless accounting workflow.
1. Expense reports. Copy receipt amounts directly into your expense report instead of squinting at a faded thermal printout and typing numbers manually.
2. Accounts payable. OCR turns the invoice numbers, vendor names, amounts, and due dates on scanned supplier invoices into text you can copy, which cuts manual data entry and the errors that come with it.
3. Tax preparation. Accountants and bookkeepers scan boxes of receipts at year-end. OCR makes every document searchable by vendor, date, or amount, so finding the Home Depot receipt from March takes seconds, not 20 minutes.
4. Audit trails. Financial auditors need to reference source documents quickly. Searchable PDFs make those documents easy to locate, which supports audit requirements and saves hours of manual retrieval.
For a broader introduction to how OCR works, see our guide on What Is OCR.
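For the accounts payable case above, once a scan has a text layer, the fields can be pulled out of the text with simple patterns. The sketch below is only an illustration: the sample text, the field labels ("Invoice No", "Total"), and the date format are assumptions, and real invoices vary enough that patterns usually need tuning per vendor.

```python
import re
from decimal import Decimal

# Hypothetical OCR output; real text varies by vendor and scan quality.
SAMPLE_TEXT = """ACME SUPPLY CO
Invoice No: INV-10234
Date: 03/14/2024
Widget A        2 x 12.50     25.00
Widget B        1 x 40.00     40.00
Total: $65.00
"""

# "Number" is listed before "No" so the longer label wins.
INVOICE_NO = re.compile(
    r"Invoice\s*(?:Number|No|#)[.:]?\s*([A-Z0-9-]+)", re.IGNORECASE
)
# Assumes MM/DD/YYYY or DD/MM/YYYY style dates with a four-digit year.
DATE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")
# Anchored to the start of a line so "Subtotal" is not picked up.
TOTAL = re.compile(
    r"^\s*Total[:\s]*\$?\s*([\d,]+\.\d{2})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_fields(text: str) -> dict:
    """Return the fields found in the text; missing fields map to None."""
    invoice = INVOICE_NO.search(text)
    date = DATE.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_number": invoice.group(1) if invoice else None,
        "date": date.group(1) if date else None,
        "total": Decimal(total.group(1).replace(",", "")) if total else None,
    }


fields = extract_fields(SAMPLE_TEXT)
```

Amounts are parsed as `Decimal` rather than `float` so that currency values are not affected by binary rounding.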
How to OCR Receipts and Invoices (Step by Step)
Scan or photograph the receipt or invoice
Use a flatbed scanner at 300 DPI, or photograph the document with your phone. Save it as a PDF. For phone scans, run the file through Phone Scan Cleanup first to flatten contrast and remove shadows.
Upload to OCR Scanner and run OCR
Open PDF.it's OCR Scanner tool, upload your scanned PDF, select the document language, and click the OCR button to start text recognition.
Copy or export the extracted text
Download your searchable PDF. Open it and use Ctrl+F or Cmd+F to search for amounts, vendor names, or dates. To get the figures into your accounting software, convert the PDF to Excel; convert it to Word if you only need editable text.
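If you prefer to build the import file yourself, many accounting packages accept CSV. The sketch below writes already-extracted values to CSV text; the column names (`date`, `vendor`, `amount`) are assumptions, so check the import template your software expects.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize receipt rows to CSV text with a header line."""
    fieldnames = ["date", "vendor", "amount"]  # assumed column layout
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(
    [{"date": "2024-03-14", "vendor": "Home Depot", "amount": "65.00"}]
)
```

The `csv` module quotes values that contain commas, so a vendor name such as "Smith, Inc." stays in a single column.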
Manual Entry vs. OCR vs. Native Digital Invoice
| Method | Time per Document | Error Risk | Searchable |
|---|---|---|---|
| Manual data entry | 3–10 minutes | High (typos, transpositions) | No |
| OCR (scanned PDF) | Under 30 seconds | Low (verify totals) | Yes |
| Native digital PDF | Instant (no OCR needed) | None | Yes |
If a supplier sends you a PDF invoice by email that was generated by their software (not scanned), it already has selectable text. Run OCR only on documents that started as paper or were photographed.
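When you process files in bulk, the same rule can be applied per page. The sketch below assumes you have already pulled the text of each page with a PDF library of your choice (not shown here); the function name and the 20-character threshold are illustrative assumptions, not fixed rules.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return zero-based indexes of pages with almost no extractable text.

    A page that yields fewer than min_chars non-whitespace characters is
    treated as image-only, which is the case OCR is meant for.
    """
    return [
        index
        for index, text in enumerate(page_texts)
        if len("".join(text.split())) < min_chars
    ]


# Page 0 came from a native digital invoice; pages 1 and 2 are scans.
to_ocr = pages_needing_ocr(
    ["Invoice INV-10234 Total $65.00 due 04/01", "", "  \n "]
)
```

A file where this returns an empty list already has selectable text and can skip OCR.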
Getting the Best Scan Quality for Receipts
Thermal receipt paper — the shiny paper most cash register receipts are printed on — fades within months and is notoriously difficult to photograph cleanly. These tips make a significant difference:
- ✓ Scan thermal receipts within a few weeks of purchase while the print is still dark. Thermal paper uses a heat-sensitive coating rather than ink, and faded receipts reduce OCR accuracy significantly.
- ✓ Use a flatbed scanner at 300 DPI for the most consistent results. Phone cameras introduce perspective distortion and uneven lighting, especially on curled receipts.
- ✓ Flatten the receipt. Press curled edges down, or place a light book on top for 30 seconds before scanning. Shadows from curled edges cause OCR misreads.
- ✓ Run phone-scanned receipts through Phone Scan Cleanup before OCR. This tool automatically flattens contrast, removes background shadows, and straightens the image.
For deeper guidance on scan quality, see our OCR Accuracy Tips guide.
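To judge whether a phone photo meets the 300 DPI guideline, divide the number of pixels spanning the receipt by the receipt's width in inches. The sketch below does that arithmetic; the 80 mm roll width is only an example value, so measure your own receipt.

```python
import math

MM_PER_INCH = 25.4


def effective_dpi(pixels_across: int, paper_width_mm: float) -> float:
    """Pixels per inch achieved across the paper's width."""
    return pixels_across / (paper_width_mm / MM_PER_INCH)


def pixels_needed(paper_width_mm: float, target_dpi: int = 300) -> int:
    """Minimum pixel count across the paper to reach target_dpi."""
    return math.ceil(paper_width_mm / MM_PER_INCH * target_dpi)


# Example: a receipt 80 mm wide (an assumed, illustrative width).
needed = pixels_needed(80)
too_low = effective_dpi(600, 80) < 300
```

If the receipt fills only part of the photo, count the pixels across the receipt itself rather than the full image width.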
Troubleshooting Common OCR Problems
Numbers are being misread (8 becomes 0, 1 becomes I)
This is caused by low scan resolution or a faded original. Rescan at 300 DPI or higher. If you are working from a phone photo, run the file through Phone Scan Cleanup before re-running OCR. Always verify totals against the original before entering them in your accounting software.
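Part of that verification can be automated: if the line items do not add up to the printed total, at least one digit was probably misread. The sketch below is a minimal check under stated assumptions. The helper names are hypothetical, and the letter-to-digit substitutions cover only a few common confusions inside amounts.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Letters that OCR sometimes substitutes for digits inside amounts.
LOOKALIKES = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0"})


def parse_amount(token: str) -> Optional[Decimal]:
    """Parse an amount such as "$1,2I0.50"; return None if it is not numeric."""
    cleaned = token.strip().lstrip("$").replace(",", "").translate(LOOKALIKES)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def totals_match(line_items: list[str], total: str) -> bool:
    """True only when every amount parses and the items sum to the total."""
    amounts = [parse_amount(item) for item in line_items]
    expected = parse_amount(total)
    if expected is None or any(amount is None for amount in amounts):
        return False
    return sum(amounts, Decimal("0")) == expected


# A line item printed as 25.80 but read as 25.00 no longer matches the total.
ok = totals_match(["25.00", "40.00"], "$65.00")
misread = totals_match(["25.00", "40.00"], "$65.80")
```

A mismatch does not say which digit is wrong, so it is a prompt to check the original rather than a replacement for doing so.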
OCR produced garbled text on part of the page
Garbled output usually means that section of the scan had a shadow, fold, or stain obscuring the text. Check the paper original: if you can read the problem area by eye, the scan was the issue. Rescan with better lighting, or use your phone's built-in document scanner (Notes on iPhone, Google Drive on Android), which applies automatic perspective correction.
The PDF already looks correct but text is still not selectable
Some PDFs are locked with restrictions that prevent text selection even after OCR. Use Unlock PDF to remove the restriction, then re-run OCR Scanner. If the file is not restricted, unselectable text simply means the PDF is still image-based, and running OCR will add the missing text layer.