Scan PDF files

When you design your Bot, you use the following types of Modules (see "Use Modules"):

This chapter describes how to create Modules by scanning PDF files.

During the scan, you select text, images, or tables in your PDF file. RPA Studio turns the selected element into a ModuleAttribute.

In the Steps that you create from the Module, you can then perform the following actions:

Open the scan interface

To open the PDF Scan interface, follow the steps below:

  1. In RPA Studio, right-click a Bot or a folder and select Scan from the context menu, or click on Scan in the RPA menu.

  2. In the subsequent dialog, click on PDF document.

  3. In the next dialog, select the PDF file that you want to scan and click on Open.

You can also access the scan in the following ways:

This opens the PDF Scan window:

PDF Scan interface with the PDF preview on the left

Create Modules

You create Modules by scanning the PDF file and adding controls.

You can create controls in the following two ways:

Specify a control by absolute position

To scan your PDF file by specifying an area based on its absolute position and create a Module, follow the steps below:

  1. In the PDF Scan window, select what you want to steer:

    • To select text, click on the menu button Text.

    • To select an image, click on the menu button Image.

    • To select a table, click on the menu button Table.

  2. In the PDF preview, highlight what you want to steer with your mouse.

    The PDF Scan turns the highlighted area into a ModuleAttribute and displays it on the right side of the window. To change the default name, double-click it and enter a new name.

    If you chose a wrong type for your ModuleAttribute, for instance Table instead of Image, right-click the highlighted area in the PDF view. Then select the correct type from the context menu.

New ModuleAttribute HeaderLine

  3. For the type Table, perform the following additional actions:

    • Open the Content View by clicking on the menu button Show Content Preview.

    • Specify row or column headers, if applicable.

    • Adjust cell margins with the slider Adjust cell borders, if necessary.

Content View for a table

  4. If you want to scan another PDF file, click on Scan New Document. Then select the file in the subsequent dialog and repeat the scan process.

  5. To save your Module, click Close.

Specify a control by relative position

You can create a control based on its position relative to another page element. To do so, you need to define the following elements:

  • The anchor control, which is the page element you use to define the position of the target control.

  • The target control, which is the element you want to steer by its relative position to the anchor control.

RPA Studio identifies the anchor control by searching the PDF document for the text or image that you define. Once it locates the anchor, it calculates the location of the target in relation to the anchor.

If you want to use an anchor control, note the following requirements:

  • Target and anchor need to be on the same page.

  • The anchor control has to be uniquely identifiable in the PDF document.

  • You can't use a table or a repetitive area as an anchor control.

  • Once you have turned an area into an anchor control, you cannot change it back to an ordinary area.
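
The idea behind relative positioning can be illustrated with a short sketch. This is not RPA Studio's implementation: the Box type, the function names, and the coordinate values are hypothetical and only model the behavior described above, in which the offset between anchor and target is recorded once and later applied to wherever the anchor is found. The sketch also enforces two of the requirements listed above (same page, unique anchor).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle on a page; units are arbitrary (hypothetical model)."""
    page: int
    x: float
    y: float
    width: float
    height: float


def offset_between(anchor: Box, target: Box) -> tuple[float, float]:
    """Record where the target sits relative to the anchor at scan time."""
    if anchor.page != target.page:
        raise ValueError("anchor and target must be on the same page")
    return (target.x - anchor.x, target.y - anchor.y)


def locate_target(anchor_matches: list[Box], offset: tuple[float, float],
                  width: float, height: float) -> Box:
    """Compute the target area from the single place the anchor was found."""
    if len(anchor_matches) != 1:
        raise ValueError(
            f"anchor must match exactly once, found {len(anchor_matches)}")
    found = anchor_matches[0]
    dx, dy = offset
    return Box(found.page, found.x + dx, found.y + dy, width, height)


# Scan time: HeaderLine is the anchor, SampleTextLine is the target.
header = Box(page=1, x=50, y=40, width=200, height=20)
sample = Box(page=1, x=50, y=100, width=300, height=15)
offset = offset_between(header, sample)

# Later document: the header sits 30 units lower, so the target moves with it.
moved_header = Box(page=1, x=50, y=70, width=200, height=20)
resolved = locate_target([moved_header], offset, sample.width, sample.height)
print(resolved)
```

Because only the offset is stored, the target follows the anchor when content above it grows or shrinks, which is the practical advantage over an absolute position.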

To create anchor controls, follow the steps below:

  1. In the PDF Scan window, specify at least two areas on the same page by their absolute position.

  2. In the PDF View, right-click the control that you want to use as the anchor and select one of the following options in the context menu:

    • To use it as a text anchor, select Use as Text Anchor for...

    • To use it as an image anchor, select Use as Image Anchor for...

The PDF Scan indicates the relation between two controls with an arrow pointing from the anchor to the target.

Use HeaderLine as the anchor control for SampleTextLine

Configure the accuracy for finding anchor controls

To identify a target control, RPA Studio searches for an exact match of the text or image that you defined as the anchor. You can configure a lower level of accuracy for finding the anchor control in the following cases:

  • You specify an image-type control as anchor.

  • You specify a text-type control, but the underlying area is stored in the PDF file as an image. In this case, RPA Studio uses optical character recognition (OCR), which converts the image to text by recognizing typed characters.

To configure the accuracy for finding an anchor control, follow the steps below:

  1. In RPA Studio, select the Module whose accuracy you want to configure.

  2. Click the ModuleAttribute that represents the anchor control.

  3. Expand the Properties pane by clicking on the Properties button on the top right corner of your window.

  4. In the Properties pane, set the value of the parameter Accuracy to an integer between 0 and 100. This number represents the accuracy level of the match.
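
To get a feel for what a lower accuracy level means for OCR text, consider the following sketch. How RPA Studio actually scores a match is not described in this manual; the example assumes a simple string-similarity measure from Python's standard difflib module as a stand-in, and the function names and sample strings are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity_percent(expected: str, candidate: str) -> float:
    """Similarity of two strings on a 0-100 scale (100 means identical)."""
    return 100.0 * SequenceMatcher(None, expected, candidate).ratio()


def matches_anchor(expected: str, candidate: str, accuracy: int = 100) -> bool:
    """Accept the candidate if it is at least `accuracy` percent similar."""
    if not 0 <= accuracy <= 100:
        raise ValueError("accuracy must be an integer between 0 and 100")
    return similarity_percent(expected, candidate) >= accuracy


# OCR misread the capital "I" as a lowercase "l".
ocr_text = "lnvoice Number"

exact = matches_anchor("Invoice Number", ocr_text, accuracy=100)
relaxed = matches_anchor("Invoice Number", ocr_text, accuracy=90)
print(exact, relaxed)
```

In this model, an accuracy of 100 rejects the misread text, while 90 tolerates the single wrong character. Lower values tolerate more recognition errors but also increase the risk of matching the wrong area.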

Edit anchor controls

To edit an anchor, right-click the area that you have turned into an anchor control in the PDF Scan window. In the subsequent dialog box, choose one of the following options:

  • To change the type of an anchor control, select Text Anchor or Image Anchor.

  • To add a target control to the anchor control, select Add/Remove target controls and select the target control that you want to add.

  • To remove a target control from the anchor control, select Add/Remove target controls and select the target control that you want to remove.

Scan repetitive areas

You can scan the same area on multiple pages of a document, for instance to check that this area contains a certain text on all pages.

To do so, follow the steps below:

  1. Highlight the respective area on the first page on which it appears.

  2. Right-click the highlighted area in the PDF view and select Repetitive Area from the context menu.

When you test your Bot, the Bot checks the content of the scanned area on the first page on which it appears as well as on all subsequent pages.

Result of the Step Check Footer
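
The page range that a repetitive area covers can be modeled as follows. This is a conceptual sketch only, not how RPA Studio is implemented: the function name and the sample footer texts are hypothetical, and the text of the area on each page is assumed to be available already.

```python
from __future__ import annotations


def check_repetitive_area(page_texts: list[str], first_page: int,
                          expected: str) -> dict[int, bool]:
    """Check one area on first_page and on every subsequent page.

    page_texts[i] holds the text found in the area on page i + 1.
    Pages before first_page are not checked.
    """
    results = {}
    for page_number in range(first_page, len(page_texts) + 1):
        results[page_number] = expected in page_texts[page_number - 1]
    return results


# The footer first appears on page 2 of a four-page document.
footers = ["", "Confidential - Page 2", "Confidential - Page 3", "Page 4"]
result = check_repetitive_area(footers, first_page=2, expected="Confidential")
print(result)
```

Here page 1 is skipped because the area first appears on page 2, and page 4 is reported as a mismatch because its footer lacks the expected text.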

Additional options

Additionally, the PDF Scan menu offers the following options:

  • Zoom Fit: Adapt the PDF preview to the screen size.

  • Select Document Language: Select a document language. This is important if the text in your PDF contains language-specific special characters.

Rescan a PDF

You can rescan a PDF if you want to add new controls to an existing Module, or if you want to modify controls.

To do so, right-click the Module that you want to rescan and select Rescan from the context menu. Then add or modify controls as described in "Create Modules".

If you have modified controls, the rescan overwrites the existing ModuleAttribute. For each new control that you add during the rescan, RPA Studio creates a new ModuleAttribute.

Note the following:

  • You can't delete ModuleAttributes during a rescan.

  • You cannot rescan Modules which contain anchor controls.

  • You cannot create anchor controls during rescan.
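
The rescan rules can be summarized as a merge in which modified entries overwrite, new entries are added, and nothing is removed. The sketch below models a Module as a plain dictionary of ModuleAttributes; this representation, the "anchor" flag, and the function name are assumptions made for illustration, not RPA Studio's data model.

```python
from __future__ import annotations


def apply_rescan(existing: dict[str, dict],
                 rescanned: dict[str, dict]) -> dict[str, dict]:
    """Merge rescanned controls into a Module's attributes.

    Keys are ModuleAttribute names; values describe the control.
    """
    if any(attr.get("anchor") for attr in existing.values()):
        raise ValueError("Modules that contain anchor controls cannot be rescanned")
    if any(attr.get("anchor") for attr in rescanned.values()):
        raise ValueError("anchor controls cannot be created during a rescan")
    merged = dict(existing)   # every existing attribute is kept (no deletion)
    merged.update(rescanned)  # modified ones are overwritten, new ones added
    return merged


module = {
    "HeaderLine": {"type": "Text", "page": 1},
    "Logo": {"type": "Image", "page": 1},
}
changes = {
    "HeaderLine": {"type": "Text", "page": 2},   # modified control
    "Footer": {"type": "Text", "page": 1},       # new control
}
merged = apply_rescan(module, changes)
print(sorted(merged))
```

Note that Logo survives the rescan even though it does not appear in the changes, which mirrors the rule that ModuleAttributes cannot be deleted during a rescan.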


Tricentis RPA Studio Manual 2020.2 © Tricentis GmbH