Create an Advanced ezParse Rule
Select File Types & Parsing Rules from the HOME ribbon.
From the File groups list control, select Text Based Files. Select TXT from the File extension list, then select Add from the Rules section of this dialog panel. When prompted, type ID Based Rule and press OK.
Now that you’ve created a Rule called ID Based Rule, double-click it, or highlight the rule and select Edit Rule.
In the resulting dialog, we want to enter a value for the Start Tag and one for the End Tag.
The Start Tag expression we wish to add is ^[^= ]* = " (see below for an explanation)
The End Tag should be "
Syntax | Description |
^ | This is the regular expression syntax for the beginning of a segment. |
[^= ] | The square brackets indicate a set of characters. The ^ inside the square brackets indicates that the set is excluded, i.e. read this as any character except an equals sign or a space. |
[^= ]* | The asterisk indicates any number of times, i.e. read this as any number of characters except an equals sign or a space. |
= " | This matches the literal text: a space, an equals sign, a space, and then a double quote. |
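The way the Start Tag and End Tag expressions carve up a line can be tried outside CATALYST. The following is a minimal sketch using Python's `re` module, which is not the engine CATALYST uses but treats these particular constructs the same way. The sample line and the helper name `split_segment` are assumptions made for illustration; the actual contents of IDBasedFiles.txt are not reproduced here.

```python
import re

# Start Tag and End Tag expressions from the steps above.
START_TAG = r'^[^= ]* = "'
END_TAG = r'"'


def split_segment(line):
    """Split a line into (start tag, localizable text, end tag).

    Returns None when the line does not match the rule.
    Hypothetical helper, for illustration only.
    """
    start = re.match(START_TAG, line)
    if start is None:
        return None
    rest = line[start.end():]
    end = re.search(END_TAG, rest)
    if end is None:
        return None
    return start.group(0), rest[:end.start()], end.group(0)


# Hypothetical sample line in an "ID = "text"" format.
print(split_segment('IDS_GREETING = "Hello world"'))
# ('IDS_GREETING = "', 'Hello world', '"')
```

A line with no space-equals-space-quote sequence after its first word, such as `no tags here`, returns `None`, which corresponds to text the rule would leave unmatched.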
In the File Preview section, browse to the file IDBasedFiles.txt and press Preview.
The pink colour code indicates a section of text identified as the Start Tag. The green colour code indicates the localizable text and the yellow colour code indicates the End Tag.
The next step is to set the ID. The pink colour coded text contains the ID. To indicate to CATALYST which part of the Start Tag is the ID, click on the Complete Regular Expression.
In the resulting dialog, check the option Segments have IDs and cycle through the numbers to see the effect of changing this value. The ID can be anywhere in the complete Regular Expression.
i.e. it could be
part of the Start Tag,
part of the Localizable Text,
part of the End Tag.
It is also possible to parse Memos and Maximum Length values by enclosing the relevant portions of the regular expression in parentheses.
Just as when identifying the ID, tick the Segments have Max Length and/or Segments have Memo checkbox and select the pair of parentheses that encloses the relevant part of the regular expression.
What we need to do is introduce another pair of parentheses surrounding the ID within the Start Tag, i.e. ^([^= ]*) = "
With this new pair in place, setting the group number to 2 highlights the correct section of the Regular Expression as the ID.
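Why the ID ends up as group 2 can be illustrated with ordinary capture-group numbering, where groups are counted by the position of their opening parenthesis. The sketch below assumes that the complete expression wraps the Start Tag, the localizable text and the End Tag each in its own group; that layout is an assumption, not documented CATALYST behaviour, and the `(.*?)` used for the text, the sample line and the helper `numbered_groups` are likewise hypothetical.

```python
import re

# Assumed layout of the complete expression: (Start Tag)(text)(End Tag).
WITHOUT_ID_GROUP = r'(^[^= ]* = ")(.*?)(")'
# Same layout after adding a pair of parentheses around the ID.
WITH_ID_GROUP = r'(^([^= ]*) = ")(.*?)(")'

# Hypothetical sample line.
SAMPLE = 'IDS_GREETING = "Hello world"'


def numbered_groups(pattern, line):
    """Return {group number: matched text} for every capture group."""
    m = re.match(pattern, line)
    if m is None:
        return {}
    return {i: m.group(i) for i in range(1, m.re.groups + 1)}


for number, text in numbered_groups(WITH_ID_GROUP, SAMPLE).items():
    print(number, repr(text))
# 1 'IDS_GREETING = "'
# 2 'IDS_GREETING'
# 3 'Hello world'
# 4 '"'
```

Under the same assumption, group 2 of the expression without the extra pair is the localizable text rather than the ID, which is why no existing group number isolates the ID until the new parentheses are added.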
Press OK to exit the Edit Method Advanced Settings dialog.
Press Preview to ensure that the rule has behaved correctly. The purple colour coding indicates the piece of the segment that has been identified as the ID.
The preview of the original file is colour coded to help debug ezParse rules. A different colour is used for each element in a matching rule so you can easily spot when rules mismatch content in your file.
Purple coded text represents the ID.
Pink indicates the Start Tag.
Green is the localizable text.
Yellow is the End Tag.
The ezParse rule is now complete and can be used to extract text from any file with a similar format. Press OK to close the Edit Methods dialog and OK again to save the rule on your machine.