Text Files
All operations in this section describe how to create different tests for text files.
-
Load your text files.
-
Run your tests on the files. Upon testing, Tosca loads text files into an SQLite database.
-
Run Pre-Screening tests on your text files to prevent loading the wrong data.
The following Modules are included in the Tosca BI Modules And Samples.tsu subset and located in the folder Modules->BI - DWH/BigData:
-
File Load into Caching Database
-
Defined File Tests against Caching Database
-
Complete Row by Row Comparison
-
Metadata Comparison
-
JSON/XML File Load into Caching Database
File Load into caching database
Use the File Load into caching database Module to load the content of a text file into an SQLite database. You can load multiple files with the File Load into caching database Module.
Configure the path of the SQLite caching database in the Settings dialog in Tosca Commander (see chapter "Settings - Tosca BI").
This example loads the file SalesLT.Address.SemiColon.txt into the table Address. The columns of the file are separated by a semicolon. The first row contains the header names.
Loading character-separated file |
The File Load into caching database Module contains the following attributes:
ModuleAttribute |
Description |
Optional |
---|---|---|
Table Name |
The name of the SQLite table into which Tosca BI loads the data. |
|
Keep Existing Table Entries |
If set to True, the data of the text file is added to the existing table. Otherwise, the table is dropped before loading new data. |
X |
File Name |
Full file path and file name of the text file you want to load. You can use wildcards, if needed. For instance, you could enter C:\temp\myExcelFile*.csv. |
|
File Mode |
Defines whether the columns have a fixed length or are separated by commas. |
|
Column Delimiter/Length Info |
Indicates the starting point for each column in a comma-separated list or the separator symbol in character-separated files. |
|
Columns To Process |
Allows you to specify a semicolon-separated list of columns which should be loaded. By default, Tosca BI loads all columns. |
|
Column Renaming - <Current Name> |
Allows you to rename columns. The value in the Name column indicates the column to rename, the value in the Value column indicates the new name. |
|
Header Row |
Indicates with True whether the file has a header row. By default, there is no header row. In case the header is on a different row, specify the row number using #<n> instead of using True. Only single-row headers are supported. |
|
Skip Lines Starting With |
Allows you to specify a semicolon-separated list of values that are skipped. |
X |
Only Lines Starting With |
Allows you to define a semicolon-separated list of characters that indicate a valid row. E.g.: process lines that start with _,- or < are specified as _;-;< |
X |
Cell Settings - All Columns - Option |
Applies one of the actions below to all columns. |
X |
Cell Settings - Single Columns - <Name> |
Applies one of the actions below to a specified column. To specify a column, replace <Name> with the column name. |
X |
Load Error Behavior - Ignore Load Errors |
Defines whether load errors are ignored. The default value is False. |
X |
Load Error Behavior - Max Errors |
Maximum number of errors before Tosca aborts execution. The default value is 100. |
X |
Load Error Behavior - File Name |
Errors are logged to this file. A full path and file name are required. An existing file of the same name is overridden. By default, load errors are not logged. |
X |
Available actions for Cell Settings
The following actions are available for Cell Settings - All Columns and Cell Settings - Single Columns:
Action |
Description |
---|---|
Trim |
Removes all leading and trailing whitespace characters. Default scope: HeaderAndData |
Trim[<character>] |
Removes all leading and trailing occurrences of the specified character. Default scope: HeaderAndData Replace <character> with the character you want to remove. To remove a ", enter the " four times, e.g. Trim[""""]. |
TrimStart |
Removes all leading whitespace characters. Default scope: HeaderAndData |
TrimStart[<character>] |
Removes all leading occurrences of the specified character. Default scope: HeaderAndData Replace <character> with the character you want to remove. To remove a ", enter the " four times, e.g. Trim[""""]. |
TrimEnd |
Removes all trailing whitespace characters. Default scope: HeaderAndData |
TrimEnd[<character>] |
Removes all trailing occurrences of the specified character. Default scope: HeaderAndData Replace <character> with the character you want to remove. To remove a ", enter the " four times, e.g. Trim[""""]. |
Replace[<search string>][<replace string>] |
Replaces all occurrences of the first string with the second string. Default scope: Data |
Substring[<start index>] |
Extracts a part of a longer string. The extract starts at the defined start index position and runs until the end of the string. Default scope: Data For example: Substring[9] with input Project Manager returns Manager |
Substring[<start index>][<length>] |
Extracts a part of a longer string. The extract starts at the defined start index position and contains the number of characters specified in length. Default scope: Data For example: Substring[9][3] with input Project Manager returns Man |
Right[<length>] |
Extracts a part of a longer string. The extract runs from the end of the string towards the beginning and contains the number of characters specified in length. Default scope: Data For example: Right[7] with input Project Manager returns Manager |
Lowercase |
Converts the string to lowercase using the currently active locale. Default scope: Data For example: Lowercase with input Project Manager returns project manager |
Lowercase[Culture:<culture name>] |
Converts the string to all lowercase. Lowercase[Culture:<culture name>] uses the given culture name to create a new locale. Default scope: Data For example: Lowercase[Culture:zh-Hans] uses the culture information of "Chinese(simplified)" to convert uppercase to lowercase characters. |
Uppercase |
Converts the string to uppercase using the currently active locale. Default scope: Data For example: Uppercase with input Project Manager returns PROJECT MANAGER |
Uppercase[Culture:<culture name>] |
Converts the string to all uppercase. Uppercase[Culture:<culture name>] uses the given culture name to create a new locale. Default scope: Data For example: Lowercase[Culture:en-us] uses the culture information of "English - United States" to convert lowercase to uppercase characters. |
To change the default scope of an action, add a Scope parameter:
-
To apply the action to the header row only, add the parameter [Scope:Header]
-
To apply the action to all data rows but not the header, add the parameter [Scope:Data]
-
To apply the action to the header and all data rows, add the parameter [Scope:HeaderAndData]
We want to replace all instances of the string CustomerDataAustria with the string CustomerDataUSA. This change should be applied to the header and all data rows. To do so, we define the following action: Replace[<CustomerDataAustria>][<CustomerDataUSA>][Scope:HeaderAndData] |
Specify which File Load into caching database Module to use in your tests
If you have more than one File Load into caching database Module in your workspace, you need to specify which Module should be used in your tests. Otherwise, the system uses the first Module it finds.
This is also true if the File Load into caching database Modules have different names, since the system uses the configuration parameter SpecialExecutionTask to check for applicable Modules.
To specify a Module, you need to create a configuration parameter for it:
-
Right-click the respective Module and select Create Configuration Param from the mini toolbar.
Create a configuration parameter via the mini toolbar
-
Name the newly created configuration parameter BIWizardUseInAutocreate.
-
Set its Value to True.
You can switch Modules between tests by deleting the configuration parameter from one Module and adding it to another.
Make sure that only one File Load into caching database Module at a time has the BIWizardUseInAutocreate value set to True. Otherwise, the system uses the first Module it finds.
Defined File Tests against caching database
The Defined File Tests against caching database Module allows you to perform predefined tests on tables you created with the File Load into caching database Module.
The following table shows a mapping of the test types in the wizard and the respective test specified in the Module:
Wizard Test type |
Module Test |
---|---|
Has No Empty Values |
NoEmptyValue |
Field Type |
IsNumeric |
Min Value |
MinValue |
Max Value |
MaxValue |
Sum |
Sum |
Value Range |
ValueRange |
Min Length |
MinLength |
Max Length |
MaxLength |
Exact Length |
Length |
Is Unique |
IsUnique |
Row Count |
RowCount |
In addition, the following tests are available in the Module only:
-
GreaterThan
-
GreaterThanOrEqualsTo
-
LessThan
-
LessThanOrEqualsTo
-
Between
-
BetweenOrEquals
-
Occurance
-
DoesColumnNameExist
The Defined File Tests against caching database Module contains the following attributes:
ModuleAttribute |
Description |
---|---|
Table Name |
Name of the table which will be tested. |
Column |
Name of the column to analyze. All rows within that column are tested. |
Test |
The test used. |
Param |
Use this parameter when the test type requires an additional input, e.g. when doing a RowCount test, enter the number for the expected amount of rows. |
Value |
You can verify the overall result. If the test passes, the message is OK. This ModuleAttribute requires the ActionMode Verify. |
In this example, Tosca BI verifies the amount of rows of a column. The AddressID column in the Address table holds 450 rows.
Checking the row count |
You can find further examples in the Tosca BI Modules And Samples.tsu subset. The examples are located in the folder Sample tests->Sorted by use case->Pre Screening Tests->Against files. |