WIN32 | OCR components

OCR stands for Optical Character Recognition: characters in an image are recognized based on patterns and probabilities. This function can be used, for instance, if the text of a control cannot be identified via the technical properties of the object.

Because both the quality of the source material and the technical constraints vary, this method is not guaranteed to succeed.

  1. Enable the OCR component in the Settings dialog (see setting "UseOCR").

  2. In Tosca Wizard, use the OCRText property to identify the control: copy the property from the Control Properties tab and define it as the Technical ID in the Automation Properties tab (see chapter "Control Properties").

OCRText property in Tosca Wizard

Configure the OCR settings if necessary. You can find them in the Settings dialog at Settings->Engine->OCR->Tesseract, or Settings->Engine->OCR->Textract (see chapter "Settings - OCR").

Every configuration parameter can be overridden for an individual control by a parameter in the Objectmap of that control. To do this, create a new parameter whose name consists of the prefix OCR and the name of the corresponding configuration parameter. Its value overrides the value that has been specified in the Settings dialog.
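The override rule can be pictured with a minimal sketch. This is not Tosca code: the function name, the setting names "Language" and "Threshold", and the assumption that the prefix is joined directly to the setting name are all hypothetical and serve only to illustrate the precedence.

```python
# Hypothetical sketch of the precedence rule. Not part of Tosca.
OCR_PREFIX = "OCR"


def effective_ocr_settings(global_settings, control_parameters):
    """Return the OCR settings that would apply to a single control.

    A control parameter named OCR<Name> replaces the global setting <Name>.
    Control parameters without the prefix are ignored here.
    """
    effective = dict(global_settings)  # copy, so the global values stay untouched
    for name, value in control_parameters.items():
        if name.startswith(OCR_PREFIX) and len(name) > len(OCR_PREFIX):
            effective[name[len(OCR_PREFIX):]] = value
    return effective


# "Language" and "Threshold" are made-up setting names for illustration.
global_settings = {"Language": "eng", "Threshold": "80"}
control_parameters = {"OCRLanguage": "deu", "Timeout": "30"}

print(effective_ocr_settings(global_settings, control_parameters))
# {'Language': 'deu', 'Threshold': '80'}
```

In this sketch only the control that carries the OCR-prefixed parameter is affected. All other controls continue to use the values from the Settings dialog.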

OCR param on the Module level

Tesseract training

Under certain circumstances, it may be necessary to train Tesseract to enhance character recognition.

In the illustration below, the OK button may cause recognition problems if it is recognized as GK. Training can help Tesseract recognize it correctly.

Problematic button
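The following minimal sketch shows why a single misread character matters when the recognized text is used to identify a control. It is independent of Tosca and Tesseract. It only compares the expected text with a hypothetical OCR result, using the Python standard library.

```python
from difflib import SequenceMatcher


def matches_exactly(expected, recognized):
    """Return True only if the recognized text equals the expected text."""
    return expected == recognized


def similarity(expected, recognized):
    """Return a similarity ratio between 0.0 (no overlap) and 1.0 (identical)."""
    return SequenceMatcher(None, expected, recognized).ratio()


expected = "OK"
recognized = "GK"  # hypothetical OCR result with "O" misread as "G"

print(matches_exactly(expected, recognized))  # False
print(similarity(expected, recognized))       # 0.5
```

For a two-character label, one misread character already removes half of the matching text, so an exact comparison fails.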

Please contact Tricentis Support for further details on Tesseract training.