Developing Regular Expressions
Regular Expressions can be used in many of the search and filter functions throughout Alchemy CATALYST. They can also be used when developing ezParse rules.
A regular expression, often called a pattern, is a string that describes or matches a set of strings according to certain syntax rules. Many programming languages support regular expressions for string manipulation. Patterns are usually used to give a concise description of a set without having to list all of its elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern "H(ä|ae?)ndel". Herein lies the power of regular expressions, especially when developing ezParse rules.
Alchemy CATALYST uses the standard Boost regular expression parser. However, it has been extended to allow nesting of expressions, which makes it better suited to developing complex parsers for proprietary file formats.
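As a minimal sketch of the example above, the snippet below checks the pattern against a few candidate names. It uses Python's re module purely for illustration, which is an assumption on our part: Alchemy CATALYST itself uses the Boost parser, although this particular pattern relies only on syntax common to both.

```python
import re

# One pattern describing three spellings: "ä", "a", or "ae" between "H" and "ndel".
pattern = re.compile(r"H(ä|ae?)ndel")

names = ["Handel", "Händel", "Haendel", "Hendel"]

# fullmatch() requires the whole string to match the pattern.
matches = [name for name in names if pattern.fullmatch(name)]
print(matches)
```

"Hendel" is rejected because the group only allows "ä", "a", or "ae" in that position.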
Follow the syntax and guidelines below when developing your own regular expressions.
Characters with no special meaning match a single occurrence of that character. For example, typing "a" will find the letter "a", but not "A". To match a character with a special meaning, type the "\" (backslash) character before it. For example, to find the "[" character, type "\[", because "[" is a special character.
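The two rules above can be tried out with a short sketch. Python's re module is assumed here only as a convenient stand-in for the Boost parser; the sample strings are made up for illustration.

```python
import re

# A character with no special meaning matches itself; matching is case-sensitive.
finds_lower = re.search(r"a", "cat") is not None
finds_upper = re.search(r"a", "CAT") is not None

# "[" is a special character, so a backslash is needed to match it literally.
bracket = re.search(r"\[", "items[0]")
bracket_pos = bracket.start()

print(finds_lower, finds_upper, bracket_pos)
```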
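The "." (dot) character will find a single instance of any character. For example, typing "a." will find "at", "as", "a1", and so on. The "?" character makes the preceding expression optional, as in "ae?", which matches "a" or "ae".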
The "[]" characters can be used to specify a set or range of characters. For example, typing "[aeiou]" will find any string that contains a lowercase vowel. Typing "[0-9]" will find any string that contains a digit.
The "^" character, used as the first character inside "[]", matches any single character except those in the specified range. For example, typing "[^0-9]" will find any string that contains at least one character that is not a digit.
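A small sketch may help show what a character class does and does not match. It again assumes Python's re module as a stand-in for the Boost parser, and the sample strings are hypothetical. Note that a class always matches a single character, so "[^0-9]" finds strings that contain some non-digit character; to accept only strings with no digits at all, the whole string has to be matched against "[^0-9]*".

```python
import re

# "[aeiou]" matches one lowercase vowel anywhere in the string.
has_vowel = [s for s in ["sky", "tree", "rhythm"] if re.search(r"[aeiou]", s)]

samples = ["abc", "a1c", "2024"]

# "[0-9]" matches one digit anywhere in the string.
has_digit = [s for s in samples if re.search(r"[0-9]", s)]

# "[^0-9]" matches one character that is not a digit.
has_non_digit = [s for s in samples if re.search(r"[^0-9]", s)]

# To require that no digit appears at all, match the entire string.
no_digits = [s for s in samples if re.fullmatch(r"[^0-9]*", s)]

print(has_vowel, has_digit, has_non_digit, no_digits)
```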
The "*" character matches zero or more occurrences of the previous regular expression. For example, typing "A[a-z]*" will find "A" followed by any number of lowercase letters, such as "A", "An", or "Apple".
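The behaviour of "*" can be sketched as follows, assuming Python's re module as an illustrative substitute for the Boost parser. The sample sentence is invented for the example.

```python
import re

text = "Alice met Bob and Anna in Athens"

# "A" followed by zero or more lowercase letters.
words = re.findall(r"A[a-z]*", text)

# Zero occurrences are allowed, so a lone "A" also matches.
lone_a = re.fullmatch(r"A[a-z]*", "A") is not None

print(words, lone_a)
```

The lowercase "and" is not found because the pattern requires an uppercase "A" first.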
The "^" character placed before a regular expression will only match if the expression occurs at the start of a string. For example, typing "^A[a-z]*" will find any string that begins with a word beginning with "A".
The "$" character placed after a regular expression will only match if the expression occurs at the end of a string. For example, typing "ing\.$" will find all strings that end in "ing" followed by a full stop.
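The two anchors can be illustrated with a short sketch. Python's re module is assumed here for demonstration only, and the sample strings are hypothetical.

```python
import re

strings = ["Apple pie", "The Apple", "Keep walking.", "walking. Fast", "Testing"]

# "^" anchors the match to the start of the string.
starts_with_a_word = [s for s in strings if re.search(r"^A[a-z]*", s)]

# "$" anchors the match to the end of the string; "\." is a literal full stop.
ends_with_ing_stop = [s for s in strings if re.search(r"ing\.$", s)]

print(starts_with_a_word, ends_with_ing_stop)
```

"The Apple" and "walking. Fast" both contain the text being searched for, but not at the anchored position, so they are not found.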
To search for a character that would normally have a special meaning in a regular expression, it must be escaped. This means that you need to tell the regular expression parser not to consider it a regular expression character, but a plain text character instead.
For example, a dot normally denotes 'any character' in a regular expression, so to search for an actual dot character you must escape it. To do so, place a backslash in front of the character, i.e. \.
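The difference between an escaped and an unescaped dot can be seen in the sketch below. It assumes Python's re module rather than the Boost parser, and the version strings are made up. The re.escape() helper is specific to Python and is shown only as one way to add the backslashes automatically; it is not a feature of Alchemy CATALYST.

```python
import re

names = ["v1.0", "v1x0", "v100"]

# Unescaped, "." matches any single character.
unescaped = [n for n in names if re.fullmatch(r"v1.0", n)]

# Escaped, "\." matches only a literal dot.
escaped = [n for n in names if re.fullmatch(r"v1\.0", n)]

# Python's re.escape() inserts the backslash for plain-text search terms.
auto = re.escape("v1.0")

print(unescaped, escaped, auto)
```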