Scan PDF files
When you design your Bot, you use the following types of Modules (see "Use Modules"):
- Modules provided in the RPA subset
- Modules you create by scanning
This chapter describes how to create Modules by scanning PDF files.
During the scan, you select text, images, or tables in your PDF file. RPA Studio turns the selected element into a ModuleAttribute.
In the Steps that you create from the Module, you can then perform the following actions:
Open the scan interface
To open the PDF Scan interface, follow the steps below:
- In RPA Studio, right-click a Bot or a folder and select Scan from the context menu, or click Scan in the RPA menu.
- In the subsequent dialog, click PDF document.
- In the next dialog, select the PDF file that you want to scan and click Open.
You can also access the scan in the following ways:
- Via the Add dialog, as described in "Scan your application and add the Module as a Step".
- When you exchange Placeholders with Steps, as described in "Create Placeholders".
This opens the PDF scan window:
PDF Scan interface with the PDF preview on the left
Create Modules
You create Modules by scanning the PDF file and adding controls.
You can create controls in the following three ways:
- Specify a control by its absolute position, i.e. its location in your PDF file.
- Specify a control by its relative position, i.e. its location in relation to an anchor element in your PDF file.
- Specify a control by highlighting a repetitive area, i.e. an area that recurs on multiple pages.
Specify a control by absolute position
To scan your PDF file by specifying an area based on its absolute position and create a Module, follow the steps below:
- In the PDF Scan window, select what you want to steer:
  - To select text, click the menu button Text.
  - To select an image, click the menu button Image.
  - To select a table, click the menu button Table.
- In the PDF preview, highlight what you want to steer with your mouse.
  The PDF Scan turns the highlighted area into a ModuleAttribute and displays it on the right side of the window. To change the default name, double-click it and enter a new name.
  If you chose the wrong type for your ModuleAttribute, for instance Table instead of Image, right-click the highlighted area in the PDF preview. Then select the correct type from the context menu.
New ModuleAttribute HeaderLine
- For the type Table, perform the following additional actions:
  - Open the Content View by clicking the menu button Show Content Preview.
  - Specify row or column headers, if applicable.
  - Adjust cell margins with the slider Adjust cell borders, if necessary.
Content View for a table
- If you want to scan another PDF file, click Scan New Document. Then select the file in the subsequent dialog and repeat the scan process.
- To save your Module, click Close.
Specify a control by relative position
You can create a control based on its position relative to another page element. To do so, you need to define the following elements:
- The anchor control, which is the page element you use to define the position of the target control.
- The target control, which is the element you want to steer by its position relative to the anchor control.
RPA Studio identifies the anchor control by searching the PDF document for the text or image that you define. Once it locates the anchor, it calculates the location of the target in relation to the anchor.
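To illustrate the idea, here is a minimal Python sketch of one way a target position can be derived from an anchor. It assumes the relation is stored as an offset from the anchor's top-left corner; the names `Box`, `record_offset`, and `locate_target` are hypothetical and do not describe RPA Studio's actual implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    # Hypothetical page rectangle: top-left corner (x, y) plus width and height.
    x: float
    y: float
    width: float
    height: float


def record_offset(anchor: Box, target: Box) -> Tuple[float, float]:
    # Scan time: remember where the target sits relative to the anchor's top-left corner.
    return (target.x - anchor.x, target.y - anchor.y)


def locate_target(found_anchor: Box, offset: Tuple[float, float], target: Box) -> Box:
    # Run time: apply the stored offset to wherever the anchor was found,
    # keeping the target's scanned width and height.
    dx, dy = offset
    return Box(found_anchor.x + dx, found_anchor.y + dy, target.width, target.height)


# Scan time: anchor at (50, 100), target 30 units below it.
anchor = Box(50, 100, 120, 15)
target = Box(50, 130, 200, 20)
offset = record_offset(anchor, target)

# Run time: the anchor was found 60 units lower on the page.
print(locate_target(Box(50, 160, 120, 15), offset, target))
# Box(x=50, y=190, width=200, height=20)
```

Because only the offset is stored, the target follows the anchor when content shifts up or down between documents.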
If you want to use an anchor control, note the following requirements:
- Target and anchor need to be on the same page.
- The anchor control has to be uniquely identifiable in the PDF document.
- You can't use a table or a repetitive area as an anchor control.
- Once you have turned an area into an anchor control, you can't change it back to an ordinary area.
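The requirements above can be expressed as simple checks. The following Python sketch is hypothetical: the function `anchor_problems` and its parameters (a type string, page numbers, and the number of times the anchor occurs in the document) are illustrative assumptions, not part of RPA Studio.

```python
from typing import List


def anchor_problems(anchor_type: str, anchor_page: int, target_page: int, occurrences: int) -> List[str]:
    # Hypothetical check that mirrors the documented anchor requirements.
    problems = []
    # Tables and repetitive areas can't serve as anchors.
    if anchor_type in ("table", "repetitive"):
        problems.append("tables and repetitive areas can't be anchors")
    # Anchor and target must be on the same page.
    if anchor_page != target_page:
        problems.append("anchor and target must be on the same page")
    # The anchor must be uniquely identifiable, i.e. found exactly once.
    if occurrences != 1:
        problems.append("anchor must occur exactly once in the document")
    return problems


print(anchor_problems("text", 2, 2, 1))  # an empty list means the anchor is valid
```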
To create anchor controls, follow the steps below:
- In the PDF Scan window, specify at least two areas on the same page by their absolute position.
- In the PDF preview, right-click the control that you want to use as the anchor and select one of the following options from the context menu:
  - To use it as a text anchor, select Use as Text Anchor for...
  - To use it as an image anchor, select Use as Image Anchor for...
The PDF Scan indicates the relation between the two controls with an arrow pointing from the anchor to the target.
Use HeaderLine as the anchor control for SampleTextLine
Configure the accuracy for finding anchor controls
To identify a target control, RPA Studio searches for an exact match of the text or image that you defined as the anchor. You can configure a lower level of accuracy for finding the anchor control in the following cases:
- You specify an image-type control as the anchor.
- You specify a text-type control, but the underlying area is stored in the PDF file as an image. In this case, RPA Studio uses optical character recognition (OCR), which converts the image to text by recognizing the typed characters.
To configure the accuracy for finding an anchor control, follow the steps below:
- In RPA Studio, select the Module whose accuracy you want to configure.
- Click the ModuleAttribute that represents the anchor control.
- Expand the Properties pane by clicking the Properties button in the top right corner of your window.
- In the Properties pane, set the parameter Accuracy to an integer between 0 and 100. This number represents the accuracy level of the match.
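As a rough illustration of how an accuracy threshold between 0 and 100 can relax an exact match, the Python sketch below compares the expected anchor text with OCR output. It uses `difflib.SequenceMatcher.ratio()` from the standard library as a stand-in similarity measure; RPA Studio's actual matching metric is not described here, and the function name `matches_anchor` is hypothetical.

```python
from difflib import SequenceMatcher


def matches_anchor(expected: str, recognized: str, accuracy: int = 100) -> bool:
    # Hypothetical matcher: accuracy is an integer between 0 and 100.
    if not 0 <= accuracy <= 100:
        raise ValueError("accuracy must be an integer between 0 and 100")
    # 100 keeps the default behavior: an exact match is required.
    if accuracy == 100:
        return expected == recognized
    # Otherwise accept the anchor if the similarity score reaches the threshold.
    score = SequenceMatcher(None, expected, recognized).ratio() * 100
    return score >= accuracy


# OCR misread the letter "o" as the digit "0".
print(matches_anchor("Invoice", "Inv0ice"))      # False: exact match required
print(matches_anchor("Invoice", "Inv0ice", 80))  # True: about 86% similar
```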
Edit anchor controls
To edit an anchor, right-click the area you have turned into an anchor control in the PDF Scan window. In the subsequent dialog box, choose one of the following options:
- To change the type of an anchor control, select Text Anchor or Image Anchor.
- To add a target control to the anchor control, select Add/Remove target controls and select the target control that you want to add.
- To remove a target control from the anchor control, select Add/Remove target controls and select the target control that you want to remove.
Scan repetitive areas
You can scan the same area on multiple pages of a document. This is useful, for instance, if you want to check that the same area on every page contains certain text.
To do so, follow the steps below:
- Highlight the area on the first page on which it appears.
- Right-click the highlighted area in the PDF preview and select Repetitive Area from the context menu.
When you test your Bot, the Bot checks the content of the scanned area on the first page on which it appears as well as on all subsequent pages.
Result of the Step Check Footer
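The behavior of a repetitive area during a test can be sketched as follows. This is a hypothetical Python model, not RPA Studio code: it assumes you already have the text found in the scanned area for each page, and `check_repetitive_area` is an illustrative name. It checks the first page on which the area appears and every subsequent page.

```python
from typing import Dict, List


def check_repetitive_area(area_texts: List[str], first_page: int, expected: str) -> Dict[int, bool]:
    # area_texts[i] holds the text found in the scanned area on page i + 1 (hypothetical input).
    # Pages before first_page are skipped; every page from first_page onward is checked.
    return {
        page: expected in area_texts[page - 1]
        for page in range(first_page, len(area_texts) + 1)
    }


footers = ["Cover", "Page 2 - Confidential", "Page 3 - Confidential", "Page 4"]
print(check_repetitive_area(footers, 2, "Confidential"))
# {2: True, 3: True, 4: False}
```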
Additional options
Additionally, the PDF Scan menu offers the following options:
| Option | Description |
|---|---|
| Zoom Fit | Adapt the PDF preview to the screen size. |
| Select Document Language | Select a document language. This is important if the text in your PDF contains language-specific special characters. |
Rescan a PDF
You can rescan a PDF if you want to add new controls to an existing Module, or if you want to modify controls.
To do so, right-click the Module that you want to rescan, select Rescan from the context menu, and add or modify controls as described in "Create Modules".
If you have modified controls, the rescan overwrites the existing ModuleAttribute. For each new control that you add during the rescan, RPA Studio creates a new ModuleAttribute.
Note the following:
- You can't delete ModuleAttributes during a rescan.
- You can't rescan Modules that contain anchor controls.
- You can't create anchor controls during a rescan.
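The rescan rules can be modeled as a dictionary merge. In this hypothetical Python sketch, ModuleAttributes are keyed by name and `apply_rescan` is an illustrative function, not an RPA Studio API: modified controls overwrite existing entries, new controls are added, nothing is deleted, and Modules with anchor controls are rejected.

```python
from typing import Dict


def apply_rescan(existing: Dict[str, str], rescanned: Dict[str, str], has_anchors: bool) -> Dict[str, str]:
    # Modules that contain anchor controls can't be rescanned.
    if has_anchors:
        raise ValueError("Modules that contain anchor controls can't be rescanned")
    # Start from the existing ModuleAttributes: none of them is deleted.
    merged = dict(existing)
    # Modified controls overwrite their ModuleAttribute; new controls create new ones.
    merged.update(rescanned)
    return merged


before = {"HeaderLine": "area v1", "Footer": "area v1"}
changes = {"HeaderLine": "area v2", "Total": "area v1"}
print(apply_rescan(before, changes, has_anchors=False))
# {'HeaderLine': 'area v2', 'Footer': 'area v1', 'Total': 'area v1'}
```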