Regular Expressions

From Vectivus


Regular expressions are patterns built from literal characters and meta-characters that identify parts of text. Meta-characters let a pattern be as specific or as general as needed, which makes regular expressions useful for finding text and acting on it. The following list is not comprehensive, but it covers the majority of operations used when parsing logs, text, and files.

Specifying Position

These are used to specify a position within a string or line.

^     Start of a line/string
$     End of a line/string
\A    Start of a string
\Z    End of a string
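
To make the difference between line anchors and string anchors concrete, here is a minimal sketch using Python's `re` module. The sample text and variable names are made up for illustration; other engines may treat `^` and `$` slightly differently unless a multi-line mode is enabled.

```python
import re

# Hypothetical three-line log excerpt.
text = "error: disk full\nerror: retry\nok"

# Without re.MULTILINE, ^ only matches at the start of the whole string.
starts_default = re.findall(r"^error", text)

# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^error", text, flags=re.MULTILINE)

# \A matches only at the very start of the string, regardless of flags.
starts_string = re.findall(r"\Aerror", text, flags=re.MULTILINE)

# With re.MULTILINE, $ matches before each newline and at the very end.
ends_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \Z matches only at the very end of the string, regardless of flags.
ends_string = re.findall(r"\w+\Z", text, flags=re.MULTILINE)

print(starts_default)    # ['error']
print(starts_multiline)  # ['error', 'error']
print(starts_string)     # ['error']
print(ends_multiline)    # ['full', 'retry', 'ok']
print(ends_string)       # ['ok']
```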

Specifying Characters

These are used to specify a particular type of character.

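.     Any character except a newline
\c    Control character (usually written \cX, e.g. \cC for Ctrl-C; engine-dependent)
\d    Digit character (e.g. 0-9)
\D    Non-digit character
\n    New line character
\O    Octal digit (engine-dependent; [0-7] is the portable form)
\r    Carriage return character
\s    Whitespace character (tab, space, etc.)
\S    Non-whitespace character
\t    Tab character
\x    Hexadecimal digit (engine-dependent; [0-9A-Fa-f] is the portable form)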
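
The character types above can be tried out with a short Python sketch. It assumes Python's `re` module and a made-up log line. Only the escapes that Python is known to support as character types are used (`.`, `\d`, `\D`, `\n`, `\r`, `\s`, `\S`, `\t`); octal and hexadecimal digits are written as explicit ranges, which work in practically every engine.

```python
import re

# Hypothetical log line with a tab separator and a CRLF line ending.
line = "2024-01-15\tGET /index.html 200\r\n"

digits = re.findall(r"\d+", line)         # runs of digit characters
non_ws = re.findall(r"\S+", line)         # runs of non-whitespace characters
fields = re.split(r"\s+", line.strip())   # split on any whitespace
has_tab = re.search(r"\t", line) is not None
has_crlf = re.search(r"\r\n", line) is not None

# . matches any character except a newline, so the "\n" is skipped.
any_chars = re.findall(r".", "a\nb")

non_digits = re.findall(r"\D", "a1b2")

# Portable spellings for octal and hexadecimal digits.
octal_digits = re.findall(r"[0-7]", "0789")
hex_digits = re.findall(r"[0-9A-Fa-f]", "1aFg")

print(digits)        # ['2024', '01', '15', '200']
print(fields)        # ['2024-01-15', 'GET', '/index.html', '200']
print(any_chars)     # ['a', 'b']
print(octal_digits)  # ['0', '7']
print(hex_digits)    # ['1', 'a', 'F']
```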

Specifying POSIX Character Classes

These are alternative names for character types, defined by the POSIX standard. They are written inside a bracket expression, e.g. [[:digit:]].

[:upper:]     Uppercase characters [A-Z]
[:lower:]     Lowercase characters [a-z]
[:digit:]     Any digit character [0-9]
[:space:]     Any space character (space, tab, etc.)
[:alpha:]     Any uppercase or lowercase alphabetical character [A-Za-z]
[:alnum:]     Any uppercase, lowercase, or digit character [A-Za-z0-9]
[:punct:]     Any punctuation character
[:xdigit:]    Any hexadecimal digit
[:cntrl:]     Any control character
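
POSIX classes are understood natively by tools such as grep, but not by every engine; Python's `re` module, for instance, does not support them. The sketch below shows one way to work around that by expanding class names into the explicit ranges listed above. The table `POSIX_EQUIVALENTS` and the helper `expand_posix` are hypothetical names introduced for this example, and only the classes with simple ASCII ranges are covered.

```python
import re

# Hypothetical lookup table: POSIX class name -> equivalent ASCII ranges.
POSIX_EQUIVALENTS = {
    "[:upper:]": "A-Z",
    "[:lower:]": "a-z",
    "[:digit:]": "0-9",
    "[:alpha:]": "A-Za-z",
    "[:alnum:]": "A-Za-z0-9",
    "[:xdigit:]": "0-9A-Fa-f",
}


def expand_posix(pattern: str) -> str:
    """Replace POSIX class names in a pattern with explicit ranges."""
    for name, ranges in POSIX_EQUIVALENTS.items():
        pattern = pattern.replace(name, ranges)
    return pattern


# "[[:upper:]]" becomes "[A-Z]", and so on.
expanded = expand_posix(r"[[:upper:]][[:lower:]]+-[[:xdigit:]]+")
print(expanded)  # [A-Z][a-z]+-[0-9A-Fa-f]+

matched = re.fullmatch(expanded, "Node-1aF") is not None
rejected = re.fullmatch(expanded, "node-1aF") is None
print(matched, rejected)  # True True
```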

Specifying Quantity

These specify how many times the preceding pattern must match. For example, quantifiers make it possible to write a single pattern that matches a Social Security Number (SSN) format either with or without dashes.

*                      Zero or more instances
+                      One or more instances
?                      Zero or one instance (no more than one)
{NUMBER}               Exactly NUMBER instances
{NUMBER,}              NUMBER or more instances
{NUMBER_A,NUMBER_B}    NUMBER_A to NUMBER_B instances
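
As a sketch of how these quantifiers combine, the Python snippet below builds one pattern that fits the SSN description above (with or without dashes). The pattern `\d{3}-?\d{2}-?\d{4}` and the name `SSN` are assumptions chosen for illustration; they are not taken from the original text.

```python
import re

# Assumed SSN shape: 3 digits, 2 digits, 4 digits, each dash optional.
SSN = re.compile(r"\d{3}-?\d{2}-?\d{4}")

with_dashes = SSN.fullmatch("123-45-6789") is not None
without_dashes = SSN.fullmatch("123456789") is not None
wrong_grouping = SSN.fullmatch("12-345-6789") is not None
print(with_dashes, without_dashes, wrong_grouping)  # True True False

# The individual quantifiers on a small sample string.
sample = "a ab abbb"
zero_or_more = re.findall(r"ab*", sample)   # "a" followed by any number of "b"
one_or_more = re.findall(r"ab+", sample)    # "a" followed by at least one "b"
zero_or_one = re.findall(r"ab?", sample)    # "a" followed by at most one "b"
two_or_more = re.findall(r"b{2,}", sample)  # at least two "b" in a row
two_to_three = re.findall(r"\d{2,3}", "1 22 333 4444")

print(zero_or_more)  # ['a', 'ab', 'abbb']
print(one_or_more)   # ['ab', 'abbb']
print(zero_or_one)   # ['a', 'ab', 'ab']
print(two_or_more)   # ['bbb']
print(two_to_three)  # ['22', '333', '444']
```

Note that `fullmatch` is used so that extra characters before or after the number cause a rejection; `search` would instead find the pattern anywhere inside a longer string.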

Specifying Logic

These specify how matching occurs and are used to build more complex patterns.

[ ]       Specify a range
[A-M]     Single character in the inclusive range "A" to "M" (e.g. "A", "B", "C", "D", ... "K", "L", "M")
[1-4]     Single digit in the inclusive range 1 to 4 (e.g. 1, 2, 3, 4)
[AB]      Single character that is either "A" or "B"
[ABC]     Single character that is either "A", "B", or "C"
[^ABC]    Single character that is not "A", "B", or "C"
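
The bracket forms described above can be checked with a short Python sketch. It assumes Python's `re` module and a made-up sample string; each call returns every single character in the sample that the bracket expression accepts.

```python
import re

# Hypothetical sample containing letters and digits.
sample = "ABCMNZ145"

a_to_m = re.findall(r"[A-M]", sample)     # any one character from "A" through "M"
one_to_four = re.findall(r"[1-4]", sample)  # any one digit from 1 through 4
a_or_b = re.findall(r"[AB]", sample)      # "A" or "B"
a_b_or_c = re.findall(r"[ABC]", sample)   # "A", "B", or "C"
not_abc = re.findall(r"[^ABC]", sample)   # anything except "A", "B", "C"

print(a_to_m)       # ['A', 'B', 'C', 'M']
print(one_to_four)  # ['1', '4']
print(a_or_b)       # ['A', 'B']
print(a_b_or_c)     # ['A', 'B', 'C']
print(not_abc)      # ['M', 'N', 'Z', '1', '4', '5']
```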