Text Files

To create different tests for text files, you have to perform the following steps:

  1. Load your text files.

  2. Run your tests on the files. Upon testing, Tosca loads text files into an SQLite database.

  3. Run Pre-Screening tests on your text files to prevent loading the wrong data.

To do so, you can use the following Modules which are included in the Tosca Data Integrity Modules And Samples.tsu subset and located in the folder Modules->Data Integrity Testing:

  • File Load into Caching Database

  • Defined File Tests against Caching Database

  • Complete Row by Row Comparison

  • Metadata Comparison

  • JSON/XML File Load into Caching Database

File Load into Caching database

Use the File Load into Caching Database Module to load the content of a text file into an SQLite database. You can use this Module to load multiple files.

Make sure to configure the path of the SQLite caching database in the Settings dialog in Tosca Commander (see chapter "Settings - Tosca Data Integrity").

In this example, you load the file SalesLT.Address.SemiColon.txt into the table Address.

The columns of the file are separated by a semicolon. The first row contains the header names.

Load a character-separated file

The File Load into Caching Database Module contains the following attributes:

ModuleAttribute

Description

Optional

Table Name

Enter the name of the SQLite table into which Tosca DI loads the data.

 

Keep Existing Table Entries

Set to True to add the data of the text file to the existing table. Otherwise, Tosca DI drops the table before loading new data.

X

File Name

Enter the full file path and file name of the text file you want to load. You can load multiple files from the same directory into the same table.

You can use wildcards, if needed. For instance, you could enter C:\temp\myExcelFile*.csv.

 

File Mode

Define whether the columns have a fixed length or are separated by commas.

 

Column Delimiter/Length Info

Indicate the starting point for each column in a comma-separated list or the separator symbol in character-separated files.

 

Row Separator

Specify the character that indicates a new row. Use either Windows \r\n or Unix \n.

The default value is \r\n or \n.

X

Columns To Process

Specify a semicolon-separated list of columns that should be loaded.

By default, Tosca DI loads all columns.

X

Column Renaming

Rename columns via file.

Specify the full file path to a text or CSV file with the column mappings. The file has to start with the header row Current Column Name;Mapped Column Name followed by one line for each column that you want to rename. Example:

Current Column Name;Mapped Column Name

Name1;First Name

Name2;Last Name

X

Column Renaming - <Current Name>

Rename columns manually.

In the Name column, specify the name of the column you want to rename. In the Value column, specify the new name.

X

Header Row

Set to True to indicate whether the file has a header row. By default, there is no header row.

In case the header is on a different row, specify the row number using #<n> instead of True.

Only single-row headers are supported.

X

Skip Lines Starting With

Specify a semicolon-separated list of values that should be skipped.

X

Only Lines Starting With

Define a semicolon-separated list of characters that indicate a valid row.

For instance, process lines that start with _,- or < are specified as _;-;<

X

Cell Settings - All Columns - Option

Apply one of the actions described below to all columns.

X

Cell Settings - Single Columns - <Name>

Apply one of the actions described below to a specified column. To specify a column, replace <Name> with the column name.

X

Load Error Behavior - Ignore Load Errors

Define whether load errors are ignored.

The default value is False.

X

Load Error Behavior - Max Errors

Define the maximum number of errors before Tosca aborts execution.

The default value is 100.

X

Load Error Behavior - File Name

Specify the file that errors are logged to. Enter a full path and file name. Tosca DI overwrites any existing file with the same name.

By default, Tosca DI doesn't log load errors.

X

Actions for Cell Settings

You can use the following actions Cell Settings - All Columns and Cell Settings - Single Columns:

Action

Description

Trim

Remove all leading and trailing whitespace characters.

Default scope: HeaderAndData

Trim[<character>]

Remove all leading and trailing occurrences of the specified character.

Default scope: HeaderAndData

Replace <character> with the character you want to remove.

For example: To remove a ", enter the " four times: Trim[""""].

TrimStart

Remove all leading whitespace characters.

Default scope: HeaderAndData

TrimStart[<character>]

Remove all leading occurrences of the specified character.

Default scope: HeaderAndData

Replace <character> with the character you want to remove.

For example: To remove a ", enter the " four times: Trim[""""].

TrimEnd

Remove all trailing whitespace characters.

Default scope: HeaderAndData

TrimEnd[<character>]

Remove all trailing occurrences of the specified character.

Default scope: HeaderAndData

Replace <character> with the character you want to remove.

For example: To remove a ", enter the " four times: Trim[""""].

Replace[<search string>][<replace string>]

Replace all occurrences of the first string with the second string.

Default scope: Data

Substring[<start index>]

Extract a part of a longer string. The extract starts at the defined start index position and runs until the end of the string.

Default scope: Data

For example: Substring[9] with input Project Manager returns Manager.

Substring[<start index>][<length>]

Extract a part of a longer string. The extract starts at the defined start index position and contains the number of characters specified in length.

Default scope: Data

For example: Substring[9][3] with input Project Manager returns Man.

Right[<length>]

Extract a part of a longer string. The extract runs from the end of the string towards the beginning and contains the number of characters specified in length.

Default scope: Data

For example: Right[7] with input Project Manager returns Manager.

Lowercase

Convert the string to lowercase using the currently active locale.

Default scope: Data

For example: Lowercase with input Project Manager returns project manager.

Lowercase[Culture:<culture name>]

Convert the string to all lowercase. Lowercase[Culture:<culture name>] uses the given culture name to create a new locale.

Default scope: Data

For example: Lowercase[Culture:zh-Hans] uses the culture information of "Chinese(simplified)" to convert uppercase to lowercase characters.

Uppercase

Convert the string to uppercase using the currently active locale.

Default scope: Data

For example: Uppercase with input Project Manager returns PROJECT MANAGER.

Uppercase[Culture:<culture name>]

Convert the string to all uppercase. Uppercase[Culture:<culture name>] uses the given culture name to create a new locale.

Default scope: Data

For example: Lowercase[Culture:en-us] uses the culture information of "English - United States" to convert lowercase to uppercase characters.

To change the default scope of an action, you can add a Scope parameter:

  • To apply the action to the header row only, add the parameter [Scope:Header].

  • To apply the action to all data rows but not the header, add the parameter [Scope:Data].

  • To apply the action to the header and all data rows, add the parameter [Scope:HeaderAndData] .

In this example, you want to replace all instances of the string CustomerDataAustria with the string CustomerDataUSA.

You want to apply this change to the header and all data rows. To do so, you define the following action:

Replace[<CustomerDataAustria>][<CustomerDataUSA>][Scope:HeaderAndData]

Specify which Module to use in your tests

If you have more than one File Load into Caching Database Module in your workspace, you need to specify which Module to use in your tests. Otherwise, the system uses the first Module it finds.

This also applies, if the File Load into Caching Database Modules have different names. Tosca DI does not use the name to check for applicable Modules but the configuration parameter SpecialExecutionTask.

To specify the Module you want to use, you have to create a configuration parameter for it. To do so, follow the steps below: 

  1. Right-click the Module and select Create Configuration Param from the mini toolbar.

Create a configuration parameter via the mini toolbar

  1. Name the new configuration parameter DIWizardUseInAutocreate.

  2. Set its Value to True.

To switch Modules between tests, delete the configuration parameter from one Module and add it to another.

Make sure that only one File Load into Caching Database Module at a time has the DIWizardUseInAutocreate value set to True. Otherwise, Tosca DI uses the first Module it finds.

Defined File Tests against Caching Database

The Defined File Tests against Caching Database Module allows you to perform predefined tests on tables you created with the File Load into Daching Database Module.

The following table shows the test types in the Wizard and the respective test specified in the Module:

Wizard Test Type

Module Test

Has No Empty Values

NoEmptyValue

Field Type

IsNumeric

Min Value

MinValue

Max Value

MaxValue

Sum

Sum

Value Range

ValueRange

Min Length

MinLength

Max Length

MaxLength

Exact Length

Length

Is Unique

IsUnique

Row Count

RowCount

In addition, the following tests are available in the Module only:

  • GreaterThan

  • GreaterThanOrEqualsTo

  • LessThan

  • LessThanOrEqualsTo

  • Between

  • BetweenOrEquals

  • Occurance

  • DoesColumnNameExist

The Defined File Tests against Caching Database Module contains the following attributes:

ModuleAttribute

Description

Table Name

Specify the name of the table you want to test.

Column

Specify the name of the column you want to analyze.

Tosca DI tests all rows within that column.

Test

Specify which test you want to use.

Param

Use this parameter if the test type requires an additional input.

For example: For a RowCount test, enter the number for the expected amount of rows.

Value

Verify the overall result. If the test passes, the message is OK.

This ModuleAttribute requires the ActionMode Verify.

In this example, Tosca DI verifies the amount of rows of a column.

The AddressID column in the Address table contains 450 rows.

Check the row count

You can find further examples in the Tosca Data Integrity Modules And Samples.tsu subset.

The examples are located in the folder Sample tests->Sorted by use case->Pre Screening Tests->Against files.