Settings - OCR

The following settings can be found at Settings->Engine->OCR:

EngineName

Description

Defines the engine to be used for OCR.

Example

Tesseract

Tesseract

CharSet

Description

Indicates the character set to be used for recognition. This is particularly relevant when analyzing controls that can only display a limited set of characters. For controls that can only display IPv4 addresses, for instance, the character set can be limited to "0123456789". The same applies to date controls.

Example

0-9

a-z

A-Z

<>-+*#~!.:,I'"/\
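As an illustration of how such a restriction can be reasoned about, the following Python sketch expands range entries such as 0-9 into an explicit set of characters and checks a recognized string against it. The helper names expand_charset and uses_only are hypothetical. Treating a three-character entry of the form x-y as a range, and including the dot in the IPv4 character set, are assumptions of this sketch and not documented behavior of the engine.

```python
def expand_charset(entries):
    """Expand entries such as "0-9" into an explicit set of characters.

    Assumption: a three-character entry of the form "x-y" denotes a range;
    any other entry is taken as a literal list of characters.
    """
    allowed = set()
    for entry in entries:
        if len(entry) == 3 and entry[1] == "-" and entry[0] <= entry[2]:
            allowed.update(chr(c) for c in range(ord(entry[0]), ord(entry[2]) + 1))
        else:
            allowed.update(entry)
    return allowed


def uses_only(text, allowed):
    """Return True if every character of text is in the allowed set."""
    return all(ch in allowed for ch in text)


# Hypothetical character set for a control that only displays IPv4 addresses.
ipv4_chars = expand_charset(["0-9", "."])
print(uses_only("192.168.0.1", ipv4_chars))  # True
print(uses_only("192.168.O.1", ipv4_chars))  # False: the letter O is not allowed
```

A check like this can help spot typical misreadings, such as the letter O in place of the digit 0, when the expected character set is known in advance.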

ConfigFile

Description

Path to an optional configuration file for Tesseract. This file can for instance be used to specify a user dictionary.

Example

%TRICENTIS_ALLUSERS_APPDATA%\OCR

DumpImage

Description

If this setting is enabled, the result of the analyzed image is saved under %TRICENTIS_ALLUSERS_APPDATA%\OCR. This setting is used solely for debugging purposes.

Example

Off(0)

Flip

Description

Indicates whether an image is mirrored before recognition.

Example

Off(0)

Language

Description

Indicates the language to be used for recognition. The language code is the abbreviation of a training file in the tessdata directory (see below). The training file contains the samples used for recognition and is therefore crucial for the recognition rate. Individual characters that are not recognized correctly, e.g. an underlined ö, can be trained as described below. After successful training, the abbreviation of the new training file can be entered here.

Example

eng

MonochromBrightnessFactor

Description

This setting is used when an image is converted to a black and white image, which requires the UseMonochrom setting to be enabled. It indicates how dark a pixel must be to become a black pixel in the resulting image. All pixels whose brightness exceeds this value appear as white pixels in the resulting image. The lower the value, the darker a pixel must be to be recognized as black. This setting can be used to eliminate an interfering (but bright) background and to deliver a pure black and white image to Tesseract.

Example

0.7
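The effect of the brightness factor can be sketched in Python as a simple threshold. This is a minimal illustration, not the engine's implementation: the function name to_monochrome is hypothetical, and computing brightness as the mean of the three color channels scaled to the range 0 to 1 is an assumption.

```python
def to_monochrome(pixels, brightness_factor=0.7):
    """Convert rows of (r, g, b) pixels (0-255 each) to black (0) or white (255).

    Assumption: brightness is the mean of the three channels scaled to 0..1.
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            brightness = (r + g + b) / (3 * 255)
            # Pixels brighter than the factor become white, all others black.
            out_row.append(255 if brightness > brightness_factor else 0)
        result.append(out_row)
    return result


image = [[(250, 250, 250), (200, 200, 200), (30, 30, 30)]]
print(to_monochrome(image, 0.7))  # [[255, 255, 0]]
print(to_monochrome(image, 0.9))  # [[255, 0, 0]]
```

In this sketch, the light gray pixel (200, 200, 200) is treated as background at a factor of 0.7 but becomes black at 0.9, which shows how a lower value removes more of a bright background.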

RemoveLineBreaks

Description

If this setting is set to 1, the line breaks (CRLFs) are removed from the recognized text.

Example

On(1)
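The effect on the recognized text can be sketched as follows. The function name remove_line_breaks is hypothetical, and also removing lone LF or CR characters is an assumption; the description above only mentions CRLFs.

```python
def remove_line_breaks(text):
    """Remove CRLF line breaks from recognized text.

    Assumption: lone LF or CR characters are removed as well.
    """
    return text.replace("\r\n", "").replace("\n", "").replace("\r", "")


print(remove_line_breaks("Save\r\nchanges"))  # Savechanges
```

Because the line breaks are removed rather than replaced by spaces, words on adjacent lines are joined directly in this sketch.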

Rotation

Description

Indicates the angle, in degrees, by which the image is rotated before analysis. For example, enter the value 90 if the text to be read is shown on the screen rotated by 90 degrees.

Example

0

ScaleFactor

Description

Indicates the factor by which the image is to be enlarged before recognition. It is also possible to enter floating-point numbers.

Example

3
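Enlarging an image by a factor can be sketched in Python with nearest-neighbor scaling. The function name scale_nearest is hypothetical, and the interpolation method is an assumption chosen for brevity; the method actually used by the engine is not stated here.

```python
def scale_nearest(pixels, factor):
    """Enlarge a 2D grid of pixel values by factor (nearest-neighbor sketch)."""
    height, width = len(pixels), len(pixels[0])
    new_height = max(1, round(height * factor))
    new_width = max(1, round(width * factor))
    return [
        [
            # Map each target pixel back to its nearest source pixel.
            pixels[min(height - 1, int(y / factor))][min(width - 1, int(x / factor))]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


print(scale_nearest([[1, 2]], 2))  # [[1, 1, 2, 2], [1, 1, 2, 2]]
```

Floating-point factors work in the same way: a factor of 1.5 turns the two-pixel row above into a row of three pixels.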

SegmentationMode

Description

Indicates which SegmentationMode is used. The SegmentationMode defines how the characters in the image to be analyzed are handled. The value is set to 7 by default, which optimizes recognition for text that consists of a single line (the case for most labels of controls such as buttons).

The following SegmentationModes are available to Tesseract:

OsdOnly = 0

AutomaticSegmentationWithOsd = 1

AutomaticSegmentationWithoutOsd = 3

SingleColumnVariableTextSize = 4

UniformTextBlockVertical = 5

UniformTextBlock = 6

SingleLine = 7

SingleWord = 8

SingleWordInCircle = 9

SingleCharacter = 10

Example

Single Line(7)
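For scripts that need to work with these values, the list can be mirrored as an enumeration. The following Python sketch is illustrative only: the class, the Python-style member names, and the helper parse_mode are hypothetical renderings of the modes listed above and are not part of any product API.

```python
from enum import IntEnum


class SegmentationMode(IntEnum):
    """Numeric values of the segmentation modes listed above."""
    OSD_ONLY = 0
    AUTOMATIC_WITH_OSD = 1
    AUTOMATIC_WITHOUT_OSD = 3
    SINGLE_COLUMN_VARIABLE_TEXT_SIZE = 4
    UNIFORM_TEXT_BLOCK_VERTICAL = 5
    UNIFORM_TEXT_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    SINGLE_WORD_IN_CIRCLE = 9
    SINGLE_CHARACTER = 10


def parse_mode(value=7):
    """Return the mode for a numeric setting value; 7 is the documented default."""
    return SegmentationMode(int(value))


print(parse_mode().name)      # SINGLE_LINE
print(parse_mode("10").name)  # SINGLE_CHARACTER
```

In this sketch, a value that does not appear in the list, such as 2, raises a ValueError, which makes an invalid setting value visible early.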

SetCharSpacing

Description

If this value is greater than 0, an attempt is made to set the spacing between the characters to the specified value in pixels. For this setting to work, only images that are free of icons and additional graphic elements, such as frames or lines, should be delivered. The MonochromBrightnessFactor setting is also used here to determine which pixels are part of a character (black pixels).

Example

-1

UseInversion

Description

Inverts an image before analysis (color values are converted to their inverse). Black thus becomes white (and vice versa).

Example

Off(0)
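Inversion can be sketched in Python for a grid of grayscale values. The function name invert is hypothetical, and 8-bit values in the range 0 to 255 are an assumption of this sketch.

```python
def invert(pixels):
    """Invert rows of 8-bit grayscale values: 0 becomes 255 and vice versa."""
    return [[255 - value for value in row] for row in pixels]


print(invert([[0, 255, 100]]))  # [[255, 0, 155]]
```

This kind of preprocessing is typically useful for light text on a dark background, since it turns the text dark and the background light.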

UseMonochrom

Description

Indicates whether an image is to be converted to a black and white image before analysis.

Example

On(1)

Textract

DumpImage

Description

If this setting is enabled, the analyzed image is saved with the result under %TRICENTIS_ALLUSERS_APPDATA%\OCR. This setting is used exclusively for debugging purposes.

Example

Off(0)