kev 6eb55fbf39 update tesseract 2022-02-16 17:17:58 +08:00

tesseract

Tesseract is an open-source OCR engine, available under the Apache 2.0 license. It can be used directly from the command line or, for programmers, through an API. It supports a wide variety of languages.

Tesseract does not have a built-in GUI, but several third-party GUIs are listed on the project's 3rdParty page.

Quick Start

$ alias tesseract='docker run --rm -u $(id -u):$(id -g) -v "$(pwd)":/data -w /data vimagick/tesseract'

$ tesseract input.png output -l eng --psm 3
$ cat output.txt
The (quick) [brown] {fox} jumps!

$ tesseract chinese.jpg stdout -l chi_tra --psm 8 --oem 0
學習
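Bash does not expand aliases in non-interactive shells unless expand_aliases is set, so the alias form will not work inside a script. Below is a minimal sketch of an equivalent shell function. It assumes bash and that the vimagick/tesseract image's entrypoint is tesseract, as the alias implies.

```shell
#!/usr/bin/env bash
# Function equivalent of the tesseract alias. Unlike an alias, it also
# works in non-interactive scripts. Assumes the image's entrypoint is tesseract.
tesseract() {
  # Run as the current user so output files are not owned by root,
  # mount the current directory at /data, and work from there.
  # All arguments are forwarded unchanged to tesseract in the container.
  docker run --rm \
    -u "$(id -u):$(id -g)" \
    -v "$(pwd)":/data \
    -w /data \
    vimagick/tesseract "$@"
}
```

Once the function is defined, tesseract input.png output -l eng --psm 3 behaves the same as the alias form.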
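Tesseract appends .txt to the output base name, which is why output becomes output.txt in the first example. Passing stdout as the base name, as in the chi_tra example, prints the text instead of writing a file. Below is a sketch that OCRs every PNG in the current directory the same way. The function name ocr_all_png and its optional language argument are hypothetical. It assumes the same image and mount layout as the alias.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OCR every *.png in the current directory with the
# vimagick/tesseract image. Optional first argument: language (default eng).
ocr_all_png() {
  local lang="${1:-eng}"
  local img
  for img in *.png; do
    # With no matches, bash leaves the pattern unexpanded; skip it.
    [[ -e "$img" ]] || continue
    # "${img%.png}" strips the extension; tesseract then writes <name>.txt.
    docker run --rm -u "$(id -u):$(id -g)" -v "$(pwd)":/data -w /data \
      vimagick/tesseract "$img" "${img%.png}" -l "$lang" --psm 3
  done
}
```

For example, running ocr_all_png eng in a directory containing scan.png would produce scan.txt.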