Tesseract: Hacking Tesseract V0.04

By the way, if you define TEXT_PROGRESS, tesseract prints a period ('.') each time it finds a seam between words, which reassures you that it DID NOT hang.

If you ALSO define TEXT_VERBOSE, key functions in tesseract will each print one character that shows what tesseract is doing at that point. See the next section for what those letters are and what they mean.

There is also a separate file that has stack traces for some interesting/common functions RUNNING, see How Tesseract Works: Procedure stack traces. Together with TEXT_VERBOSE, these will give you a way to play with tesseract without necessarily being a C++ wizard :-)

What do all those letters for TEXT_VERBOSE mean?

The input text was the tesseract license (see testing/Run_Tests.sh for more details):

This package contains the Tesseract Open Source OCR Engine.
Orignally developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley Colorado, all the code
in this distribution is now licensed under the Apache License:

** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.

Again, please note that different fonts generate different output, because the letters in the image will 'interfere' differently and the word spacing will differ. Also, different fonts have different features, so that phase will also differ!

[blah blah]

(gdb) r
gkTesseract Open Source OCR Engine
Using LIBTIFF
Opened and reading 'testing/image_2helvR18.tif'...
Recognizing page
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeer
lqmmmmmmmmmmmmmnxlmmmmmmmmmmmmmlllmmmmmmmmmmmmmummmmmmmmmmmmmjtttttttttttttt
tttttttttttttttttttttttttttttttr
pppoooopppoooooopppooopppppppppspppppppppppppppppppppopppooopppsppppppoor
pppspppppppppppppppooopppppppppsppppppppppppspppspppppppppppppppoopppspppppp
ppppppsspppr
pppppppppsppppppppppppspppspppppppppppppppspppoopppoopppsspppppppppppppppppp
pppr
pppppppppppppppopppppppppppppppppppppr
pppppppppspppppppppppppppppppppppppppopppoopppppppppor
pppppppppopppppppppppppppppppppopppsppppppopppppppppppppppppppppr
ppppppsoppppppppppppppppppppppppppppppppppppr
ppppppspppppppppppppppppppppppppppppppppppppppsssspppppppppppppppppppppppppp
ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppr
ppppppopppoppppppppppppppppppppppppppppppsppppppspppr
pppppppppspppppppppppppppppppppppppppppppppppppppppppppppppppooor
ppppppoopppooppppppopppopppppppppoppppppspppspppppppppppppppr
pppppppppppppppppppppspppppppppppppppppppppsspppr
pppppppppppppppppphhhpppssppphpppspppsssspppppppppppppppppphhhpppsppphhpppss
ppppppppphhpppsppphpppspppsspppppphpppspppssspppppppppppppppppphhpppsppppppp
pphpppsssppphpppsppphpppspppsspppppphpppspppssspppppppppppppppppphpppsppphpp
phpppssspppppphpppsppphpppssppphhhppphhppphhppphhpppssppphpppssssppphhppphhp
ppssssppppppppphpppssssppphpppssppphhhpppssppphhppphhpppshpppppphpppssppphpp
phpppssppphpppspppppphhhpppppphpppssppphhppphhpppshhpppsppphhpppppphpppssppp
hhpppsppphppphpppsppppppppppppppppppppppppppppppppppppspppsppppppppppppsssss
sssssspppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppphppphhhh
hpppsppphhhpppppphhpppsppphhppphhpppssssppppppppphhppphhpppppphpppsppphpppsp
pphpppppphpppsppphppphhhhhhhpppppphhpppspppshhhppphpppsssssppphpppssppphhppp
shpppssppphhppphhhpppsssppphppphppphhpppssppphhzzzpppspppspppsspppsspppssppp
sppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
ppppppppppppppphzzzpppspppspppspppsppppppppppppppppppppppppppppppppppppppppp
ppppppphzzzpppspppsssppppppppppppppppppppppppppphzzzpppspppspppspppspppppppp
pppppppppppppppppppppppppppppppppppppppphzzzzzzppppppppphzzzppphpppssppphzzz
pppspppsspppsspppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
ppppppppppppppppppppppppphzppppppppppppppppppppppppppppppppppppppppppppppppp
pppppppppppppppppppppppppppppppppppppppppppphzzzpppspppspppppppppppppppppppp
phzzzpppsppppppppppppppphzzzpppssppphzzzpppsspppppphzzzpppspppsssspppppppppp
pppppppphzzzpppspppssssppppppppphzpppspppsssppppppppppppppppppppppppppphzzpp
phpppssppphzzzzpppsppppppppppppppphzzzpppspppppppppppppppppppppppphzzzpppspp
ppppppppppppppppppppppppppppppphzzzzzzpppspppspppppppppppppppppppppppppppppp
pppppppppppppppppphzzzpppspppppppppppppppppppppppppppppppppppppppppppppppppp
ppppppphzzvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvy
vyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvy
vyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvyvy
vyvyvyvyvyvyvyvyvyvyvyvyvyvya

Program exited normally.
