PDF Tools · Runs in your browser · files never uploaded

PDF OCR

Scanned PDF → searchable text

Your file never leaves this browser. Everything runs on your device: no uploads, no server storage, no retention.

Run optical character recognition (OCR) on a PDF or image entirely in your browser, with no upload and no third-party API. The tool uses Tesseract.js: it downloads the selected language model once (~10 MB, cached for future runs) and then works fully offline. PDFs are rasterised at 2× for accuracy, progress is shown per page, and the output is plain text ready to copy or download.
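To make the 2× rasterisation concrete, here is a minimal sketch of the pixel size a page ends up with. It assumes that scale 1 maps one PDF point (1/72 inch) to one pixel, as PDF.js-style viewports do; the tool's actual renderer is not stated, and `rasterSize` and its types are hypothetical names used only for illustration.

```typescript
// Sketch: pixel dimensions of a PDF page rasterised at a given scale.
// Assumption: scale 1 maps one PDF point (1/72 inch) to one pixel.

interface PageSizePt {
  widthPt: number;  // page width in PDF points
  heightPt: number; // page height in PDF points
}

interface RasterSize {
  widthPx: number;
  heightPx: number;
  dpi: number; // effective resolution of the rendered bitmap
}

const POINTS_PER_INCH = 72;

function rasterSize(page: PageSizePt, scale: number = 2): RasterSize {
  if (!(scale > 0)) {
    throw new RangeError("scale must be positive");
  }
  return {
    // Round up so the bitmap never clips a trailing partial pixel.
    widthPx: Math.ceil(page.widthPt * scale),
    heightPx: Math.ceil(page.heightPt * scale),
    dpi: POINTS_PER_INCH * scale,
  };
}

// US Letter is 612 × 792 pt.
const letterAt2x = rasterSize({ widthPt: 612, heightPt: 792 });
console.log(letterAt2x); // { widthPx: 1224, heightPx: 1584, dpi: 144 }
```

Doubling the scale quadruples the pixel count, so memory use and recognition time grow quickly at larger scales.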

How to use

Drop an image or PDF.
Pick the language (English, Spanish, French, German, Italian, Portuguese).
Wait while the model downloads on first use, then watch per-page progress.
Copy or download the extracted text.
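The language choice and per-page progress in the steps above can be sketched as two small helpers. The three-letter codes are the standard Tesseract language codes for the six listed languages. The helper names are hypothetical, and the sketch assumes the OCR engine reports each page's progress as a fraction between 0 and 1.

```typescript
// Display names from the list above mapped to Tesseract language codes.
const LANGUAGE_CODES = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  Italian: "ita",
  Portuguese: "por",
} as const;

type LanguageName = keyof typeof LANGUAGE_CODES;

function languageCode(name: LanguageName): string {
  return LANGUAGE_CODES[name];
}

// Combine "page i of n" (zero-based i) with that page's own 0..1 progress
// into a single 0..1 value for the whole document.
function overallProgress(pageIndex: number, pageCount: number, pageFraction: number): number {
  if (pageCount <= 0) {
    return 0;
  }
  const clamped = Math.min(1, Math.max(0, pageFraction));
  return Math.min(1, (pageIndex + clamped) / pageCount);
}

function progressLabel(pageIndex: number, pageCount: number, pageFraction: number): string {
  const percent = Math.round(overallProgress(pageIndex, pageCount, pageFraction) * 100);
  return `Page ${pageIndex + 1} of ${pageCount} (${percent}% overall)`;
}

console.log(languageCode("German"));   // "deu"
console.log(progressLabel(1, 4, 0.5)); // "Page 2 of 4 (38% overall)"
```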

Accepted files: JPG, PNG, WEBP or PDF · up to 50 MB · processed locally
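The format and size limits could be checked before any processing starts. This is a minimal sketch, not the tool's actual code: `validateInput` and `FileLike` are hypothetical names, the MIME types are the standard ones for the listed formats, and "50 MB" is read as 50 × 1024 × 1024 bytes, which is an assumption.

```typescript
// Assumption: "50 MB" is interpreted as 50 MiB.
const MAX_BYTES = 50 * 1024 * 1024;

const ACCEPTED_TYPES: string[] = ["image/jpeg", "image/png", "image/webp", "application/pdf"];

// Structural subset of the browser File object, so the check is testable anywhere.
interface FileLike {
  name: string;
  type: string; // MIME type as reported by the browser
  size: number; // size in bytes
}

interface ValidationResult {
  ok: boolean;
  kind?: "image" | "pdf"; // set when ok is true
  reason?: string;        // set when ok is false
}

function validateInput(file: FileLike): ValidationResult {
  if (ACCEPTED_TYPES.indexOf(file.type) === -1) {
    return { ok: false, reason: `Unsupported type: ${file.type || "unknown"}` };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, reason: "File is larger than 50 MB" };
  }
  // PDFs need rasterising first; images can go to OCR directly.
  return { ok: true, kind: file.type === "application/pdf" ? "pdf" : "image" };
}

console.log(validateInput({ name: "scan.pdf", type: "application/pdf", size: 1024 }));
// { ok: true, kind: "pdf" }
```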

Language: the model downloads on first use (~10 MB per language, cached).

When should I use PDF OCR?

OCR is for getting text out of pictures of text. To render PDF pages as images without extracting text, use PDF to JPG. To get the text out of a PDF that already has a real text layer (most PDFs do), extracting that layer directly is faster; OCR is the fallback for scans.
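The "text layer first, OCR as fallback" rule can be expressed as a simple per-page check. This is a sketch under stated assumptions: the input strings stand for whatever a text-layer extractor returned for each page, and both the helper names and the 20-character threshold are hypothetical choices, not values from the tool.

```typescript
// Decide whether a page needs OCR based on how much text its text layer holds.
// Assumption: a page with fewer than `minChars` non-whitespace characters
// is treated as a scan.
function needsOcr(textLayer: string, minChars: number = 20): boolean {
  const nonWhitespace = textLayer.replace(/\s+/g, "");
  return nonWhitespace.length < minChars;
}

// Return the zero-based indices of the pages that should be sent to OCR.
function pagesNeedingOcr(textLayers: string[], minChars: number = 20): number[] {
  const indices: number[] = [];
  textLayers.forEach((text, index) => {
    if (needsOcr(text, minChars)) {
      indices.push(index);
    }
  });
  return indices;
}

const extracted = [
  "",                                         // scanned page, no text layer
  "This page already has a real text layer.", // digitally created page
  "  \n ",                                    // whitespace only
];
console.log(pagesNeedingOcr(extracted)); // [ 0, 2 ]
```

A threshold above zero catches scans that carry only a stray page number or watermark in their text layer.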

Frequently asked

What is OCR?

Optical Character Recognition — extracting machine-readable text from images or scanned documents. Turns a picture of text into text you can search, copy, or edit.

Which languages are supported?

English, Spanish, French, German, Italian, and Portuguese out of the box. Other languages can be added via the Tesseract language pack picker (downloaded on demand).

How accurate is it?

Accuracy is high (95%+) on clean, high-contrast scans. Handwriting, low-resolution scans, and heavily rotated pages reduce accuracy significantly; deskew your scan first for best results.
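The source does not say how the 95% figure is measured. One common way to put a number on OCR quality is character accuracy: one minus the edit distance between the OCR output and a known-correct transcription, divided by the length of that transcription. The sketch below illustrates that metric; the function names are hypothetical.

```typescript
// Levenshtein edit distance (insertions, deletions, substitutions),
// computed with dynamic programming while keeping one row at a time.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution or match
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy in the range 0..1 relative to a reference transcription.
function characterAccuracy(reference: string, ocrOutput: string): number {
  if (reference.length === 0) {
    return ocrOutput.length === 0 ? 1 : 0;
  }
  const errors = editDistance(reference, ocrOutput);
  return Math.max(0, 1 - errors / reference.length);
}

// One wrong character in a 20-character line.
console.log(characterAccuracy("Invoice total: 42.00", "Invoice tota1: 42.00").toFixed(2)); // "0.95"
```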

Does it run on-device?

Yes. Tesseract.js runs the full OCR pipeline in your browser. The language model downloads once (~10 MB per language) and is cached locally.
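The "download once, reuse afterwards" behaviour can be illustrated with a small memoising wrapper. This is only a sketch of the idea: it caches in memory for the lifetime of the page, whereas the tool's cache persists between visits by a mechanism the source does not describe. `cachedLoader` and `ModelLoader` are hypothetical names, and the download function is a stand-in.

```typescript
type ModelLoader = (lang: string) => Promise<Uint8Array>;

// Wrap a download function so each language is fetched at most once.
function cachedLoader(download: ModelLoader): ModelLoader {
  const cache = new Map<string, Promise<Uint8Array>>();
  return (lang: string) => {
    let pending = cache.get(lang);
    if (!pending) {
      pending = download(lang);
      cache.set(lang, pending);
      // Forget failed downloads so a later call can retry.
      pending.catch(() => {
        cache.delete(lang);
      });
    }
    return pending;
  };
}

// Stand-in download that only counts how often it is called.
let downloadCount = 0;
const loadModel = cachedLoader((_lang: string) => {
  downloadCount++;
  return Promise.resolve(new Uint8Array(0)); // placeholder for ~10 MB of model data
});

const firstRequest = loadModel("eng");
const secondRequest = loadModel("eng");
console.log(firstRequest === secondRequest, downloadCount); // true 1
```

Caching the promise rather than the finished data means two requests made while the download is still in flight also share a single download.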

Is my file uploaded anywhere?

No. Everything runs in your browser. Your files never leave your device, and there is no server component for this tool.
