Text Pre-processing


Objective

The aim of this tool is to process multi-line, XML or HTML content and convert it to "line-by-line" text in order to be able to run string searches and/or numeric value searches on this content and then analyze and monitor the result.

String or numeric value searches usually cannot be run directly on multi-line, XML or HTML content. For instance, many commands produce their output in paragraph form. Because normal string or numeric value searches cannot be performed on paragraphs, the monitoring and analysis capabilities for such content are limited.

Monitoring Studio allows you to transform the paragraphs of the text into single lines for easy parsing with the String Search and Numeric Value Extraction tools. String Search and Numeric Value Extraction objects are then created from the Text Transform object (and not from the original parent object).

A brief example to illustrate the usefulness of this tool

Example of converting multi-line records to single lines

The "ipconfig /all" command under Windows reports various information about each network card, and each "paragraph" is about one network card:

EX_TextPreProcessCmd

The aim here is to detect any disconnected cards, so we add a monitoring instance for the OS command "ipconfig /all". Because the text is in paragraphs, a direct string search will not give the desired result. This is why we run the Text Pre-processing tool to convert the multi-line text to single lines.

In the screenshot below, the "ipconfig /all" command is executed and its output is pre-processed to transform its paragraphs into single lines, which in turn enables an efficient parsing with a String search that looks for "disconnected" network cards.

EX_TextPreProcess_4aTreeView

Method (summary)

Right-click the object whose content you need to pre-process in order to run string or numeric value searches on the output.

Select the type of pre-processing you wish to perform:

Convert multi-line records to single records
Convert XML to CSV (comma-separated values)
Extraction of text from HTML
Text processing through an external command

Result

The text is transformed as per your selection. Text Transform objects have only one text parameter, TransformResult, which is the result of the text transformation done on the original content (a file, the output of a command, a Web request, etc.).

You can run string and/or numeric value searches on this "pre-processed" output.

Create or edit a Text Preprocessing object

To create a new Text Pre-Processing object, right-click the Application/Container icon in the PATROL Console and select KM Commands > New > Text Preprocessing....

WIZ_TextPreProcess_1Welcome

Text Pre-Processing Wizard — Welcome Page

To edit an existing Text preprocessing object, right-click the Text Preprocessing icon in the PATROL Console and select KM Commands > Edit.

A - Convert Multi-line records to single lines

Once you have defined the text source containing paragraphs that you need to parse (a LOG file, a Command Line analysis, etc.), right-click this object > KM Commands > New > Text Pre-Processing….

This procedure uses the OS command example shown above. In the output of the "ipconfig /all" command, we identify the paragraphs that each correspond to a network interface.

Step 1: Select the "Convert multi-line records into single lines" option

WIZ_TextPreProcess_1WelcomeLine

Text Pre-Processing Wizard: Convert multi-line records into single lines — Conversion Type Selection

Click Next.

Step 2: Define the first and/or last lines of the paragraphs.

In our example each network card paragraph starts with "NEW" and ends with "END".

WIZ_TextPreProcess_2Definition

Text Pre-Processing Wizard: Convert multi-line records into single lines — Start/End of line definition

This RegExp marks the beginning of a new record: Enter the string or regular expression that marks the beginning of a new record.

Include the first line in the result: Select this check box to include the first line in the result.

Note Use the first field if you can provide a regular expression that identifies the first line of each paragraph. Please note that this regular expression can match any part of the first line of each paragraph.

This RegExp marks the end of a new record: Enter the string or regular expression that marks the end of a record.

Include the last line in the result: Select this check box to include the last line in the result.

Note Use the second text field if you can provide a regular expression that identifies the last line of each paragraph. Again, this regular expression can match with any part of the last line of each paragraph.

Concatenation of multiple lines into a single line using this separator: Keep the default semicolon, or enter the character you want to use to separate the lines that are concatenated into each record.

Click Next.

Note You can specify only a regular expression that identifies the beginning of a new paragraph (record). In this case, Monitoring Studio skips the content until it finds a line matching the specified regular expression. The text that follows this line (optionally including the line itself) is concatenated into a single line using the specified separator, until Monitoring Studio finds another line that matches the regular expression. Each line in the original content that matches this regular expression produces a new line in the result. The same is true if you specify only the regular expression that marks the end of a paragraph (or record).

Tip If you specify both regular expressions to identify the beginning and the end of a record, Monitoring Studio only takes into account the text located between the lines that match these regular expressions (i.e. between the start line and the end line). Lines located between a line matching the end marker and the next line matching the beginning marker are skipped and are not included in the result.
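The rules described in the Note and Tip above can be sketched in Python as follows. This is only an illustration of the documented behavior, not the code used by Monitoring Studio. The function name join_records, its parameters and the sample text are hypothetical. How an unterminated record or a start marker found inside an open record is handled is an assumption, flagged in the comments.

```python
import re

def join_records(lines, start=None, end=None, sep=";",
                 include_first=True, include_last=True):
    """Concatenate each multi-line record into a single line."""
    start_re = re.compile(start) if start else None
    end_re = re.compile(end) if end else None
    results = []
    # None means "outside any record"; with no start marker, a record is
    # considered open from the very first line.
    current = None if start_re else []

    for line in lines:
        if start_re and start_re.search(line):
            # A start marker opens a new record. Assumption: it also
            # closes a record that is still open.
            if current is not None:
                results.append(sep.join(current))
            current = [line] if include_first else []
            continue
        if current is None:
            continue  # skipped: before the first record or between records
        if end_re and end_re.search(line):
            if include_last:
                current.append(line)
            results.append(sep.join(current))
            # With a start marker, wait for it; otherwise open a new record.
            current = None if start_re else []
            continue
        current.append(line)

    # Assumption: a trailing record is kept only when no end marker is used.
    if current is not None and end_re is None:
        results.append(sep.join(current))
    return results

# Hypothetical, simplified command output with one paragraph per card.
sample = [
    "Windows IP Configuration",
    "Ethernet adapter LAN:",
    "Media State: Media disconnected",
    "Ethernet adapter Wi-Fi:",
    "IPv4 Address: 192.168.0.10",
]
single_lines = join_records(sample, start=r"adapter")
```

With only a start expression, the header line is skipped and each card becomes one semicolon-separated line, so a search for "disconnected" returns the whole record of the affected card.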

Step 3: Monitoring Studio settings

You arrive at the last dialogue box with the newly-created object display name and internal identifier. You can change the label as well as the ID.

WIZ_TextPreProcess_3ObjectName

Text Pre-Processing Wizard: Convert multi-line records into single lines — Object Identification

Click Finish. At the next discovery, the TransformResult parameter contains the output.

Step 4: Run a string search on the "transformed" output and/or extract numeric values from it.
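To illustrate why single-line records are easier to analyze, the following Python sketch runs a string search and a numeric value extraction on transformed text. It does not reproduce the String Search or Numeric Value Extraction tools. The sample lines, the "Speed" field and the function names are hypothetical.

```python
import re

# Hypothetical transformed output: one line per network card.
transformed = [
    "Ethernet adapter LAN:;Media State: Media disconnected",
    "Wireless adapter Wi-Fi:;Media State: Connected;Speed: 300 Mbps",
]

def string_search(lines, pattern):
    """Return the lines that match the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def extract_number(line, pattern):
    """Return the first captured group as a float, or None if absent."""
    match = re.search(pattern, line)
    return float(match.group(1)) if match else None

# Each hit is a complete record, so the card name comes with the match.
disconnected = string_search(transformed, r"disconnected")
speeds = [extract_number(line, r"Speed: (\d+(?:\.\d+)?)") for line in transformed]
```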

B - Convert XML to CSV (comma-separated values)

Right-click the file object > KM Commands > New > Text Pre-Processing….

Step 1: Select "Convert XML to CSV"

WIZ_TextPreProcess_1WelcomeCSV

Text Pre-Processing Wizard: Convert XML to CSV — Conversion Type Selection

Click Next.

Step 2: Define the record, the sub-objects and properties

WIZ_TextPreProcess_2DefinitionCSV

Text Pre-Processing Wizard: Convert XML to CSV — Object Definition

This XML tag defines a record: Enter the XML tag that defines a record.

Include sub-objects and properties defined for the XML tag: Enter the sub-objects and properties of this tag to include in the result.

Concatenation of sub-objects and properties into a single line: Enter the separator used to concatenate them.

Click Next.
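As an illustration of this kind of conversion, the Python sketch below turns each occurrence of a record tag into one separated line. It is a minimal example under stated assumptions, not the Monitoring Studio implementation: it assumes that "properties" are XML attributes and "sub-objects" are child elements, and the function name, the field list syntax and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def xml_to_csv(xml_text, record_tag, fields, sep=";"):
    """Return one separated line per <record_tag> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter(record_tag):
        values = []
        for name in fields:
            if name in record.attrib:
                # Assumption: a "property" is an XML attribute.
                values.append(record.attrib[name])
            else:
                # Assumption: a "sub-object" is a child element.
                values.append(record.findtext(name, default="").strip())
        lines.append(sep.join(values))
    return lines

# Hypothetical XML source.
sample_xml = """
<disks>
  <disk id="sda"><size>500</size><status>OK</status></disk>
  <disk id="sdb"><size>250</size><status>FAILED</status></disk>
</disks>
"""
csv_lines = xml_to_csv(sample_xml, "disk", ["id", "size", "status"])
```

Each disk element yields one line, which can then be searched for a status string or parsed for a numeric column.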

Step 3: Monitoring Studio settings

You arrive at the last dialogue box with the newly-created object display name and internal identifier. You can change the label as well as the ID.

WIZ_TextPreProcess_3ObjectNameCSV

Text Pre-Processing Wizard: Convert XML to CSV — Object Identification

Click Finish. At the next discovery, the TransformResult parameter contains the output.

Step 4: Run a string search on the "transformed" output and/or extract numeric values from it.

C - Extract text from HTML

This allows you to extract text from an HTML source and then run string or numeric value searches on the output.

Right-click an instance with an HTML source (HTML file monitoring, Web request, Web farm, etc.) > KM Commands > New > Text Pre-Processing….

Step 1: Select "Extract text from HTML"

WIZ_TextPreProcess_1WelcomeHTML

Text Pre-Processing Wizard: Extract text from HTML — Conversion Type Selection

Click Next.

Step 2: Panel confirms extraction of text from HTML source

WIZ_TextPreProcess_2DefinitionHTML

Text Pre-Processing Wizard: Extract text from HTML — Extraction Confirmation

There are no options to select here, as Monitoring Studio is simply going to transform the HTML source by removing the HTML tags. It then displays the output in the parameter TransformResult.

Click Next.
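The idea of removing the HTML tags and keeping only the text can be sketched in Python with the standard html.parser module. This is an illustration only. The class and function names are hypothetical, and dropping the content of script and style elements is an assumption that the documentation does not state.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Assumption: script and style content is not useful text.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

# Hypothetical HTML source.
page = "<html><body><h1>Status</h1><p>Queue length: <b>42</b></p></body></html>"
text = html_to_text(page)
```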

Step 3: Monitoring Studio settings

WIZ_TextPreProcess_3ObjectNameHTML

Text Pre-Processing Wizard: Extract text from HTML — Object Identification

You arrive at the last dialogue box with the newly-created object display name and internal identifier. You can change the label as well as the ID. Click Finish and at the next discovery, the parameter TransformResult displays the output.

Step 4: Run a string search on the "transformed" output and/or extract numeric values from it.

D - Text processing through an external command

Some text inputs (files, output of commands, Web requests, etc.) may need to be transformed in a special way in order to be parsed with Monitoring Studio’s String Searches and Numeric Value Extractions. If the built-in text transformation features of Monitoring Studio cannot handle such "specially formatted" text, you can process the content through a custom script or utility that performs the required transformation (i.e. it simplifies the content and makes it "string search-ready").

The main advantage of processing the text through an external command is that it enables you to customize the processing of almost any source of information important to your application.

Right-click the File/Command Line/Web request/etc. object > KM Commands > New > Text Pre-Processing….

Step 1: Select "Text processing through an external command"

WIZ_TextPreProcess_1WelcomeEXT

Text Pre-Processing Wizard: Text processing through an external command — Conversion Type Selection

Click Next.

Step 2: Specify OS command to be executed to transform the text

WIZ_TextPreProcess_2DefinitionEXT

Text Pre-Processing Wizard: Text processing through an external command — Command Definition

Command to be executed: Enter the command

Note The principle is very similar to the "pipe" mechanism of the UNIX shell except that the content is not passed directly but is stored in a temporary file and then the result needs to be stored in another temporary file.

Note Hence, the command line you specify needs to take the %{INPUTFILE} macro as an argument (the %{INPUTFILE} macro is replaced by the real temporary input file location at run time) as well as the %{OUTPUTFILE} macro.

The output of the command must match this RegExp to be considered as successful: Enter a regular expression that the output of a successful command matches. This helps detect typical path problems, such as getting "... not found" error messages instead of the properly transformed text.

Note If your command line redirects its output to %{OUTPUTFILE}, the validation regular expression is likely to fail because the standard output is empty and therefore matches nothing. Use a validation regular expression only if your command line is able to produce both the %{OUTPUTFILE} and some text on its standard output.
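As an example of a custom script that follows these rules, the Python sketch below reads the input file, writes the transformed text to the output file, and prints a marker on its standard output so that a validation regular expression has something to match. The script name transform.py, the TRANSFORM_OK marker and the transformation itself are hypothetical. Under these assumptions, the command to be executed could be "python transform.py %{INPUTFILE} %{OUTPUTFILE}" with TRANSFORM_OK as the validation RegExp.

```python
import sys

def transform(text):
    """Hypothetical transformation: drop blank lines, collapse whitespace."""
    kept = (line for line in text.splitlines() if line.strip())
    return "\n".join(" ".join(line.split()) for line in kept)

def main(input_path, output_path):
    """Read the input file and write the transformed text to the output file."""
    with open(input_path, encoding="utf-8", errors="replace") as src:
        text = src.read()
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(transform(text) + "\n")
    # Marker on standard output for the validation regular expression.
    print("TRANSFORM_OK")
    return 0

if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    # Without the two file arguments, nothing is written and no marker is
    # printed, so a validation regular expression would not match.
    print("usage: transform.py INPUTFILE OUTPUTFILE", file=sys.stderr)
```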

Step 3: Monitoring Studio settings

WIZ_TextPreProcess_3ObjectNameEXT

Text Pre-Processing Wizard: Text processing through an external command — Object Identification

You arrive at the last dialogue box with the newly-created object display name and internal identifier. You can change the label as well as the ID. Click Finish and at the next discovery, the parameter TransformResult displays the output.

Step 4: Run a string search on the "transformed" output and/or extract numeric values from it.

All text preprocessing objects are instances of the SW_Transform class.


See Also

Command Line analysis

File monitoring and analysis

Numeric Value extraction

SW_HTTP_REQUESTS

SW_STRINGS