Regular Expressions

From Vectivus

Overview

Regular expressions are patterns built from literal characters and meta-characters that identify parts of text. Depending on the meta-characters used, a pattern can match very specifically or very generally, which makes regular expressions useful for finding text and acting on it. The following list is not comprehensive, but it covers most of the operations used when parsing logs, text, and files. Support for individual escapes varies between regular expression flavors.

Specifying Position

These are used to specify a position within a string or line.

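^
Start of a string, or of each line in multi-line mode
$
End of a string, or of each line in multi-line mode
\A
Start of a string, regardless of multi-line mode
\Z
End of a string, regardless of multi-line mode (in some flavors, such as Perl and PCRE, also just before a final newline)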
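The difference between the line anchors and the string anchors is easiest to see on text with more than one line. The following is a minimal sketch using Python's re module; the sample text and variable names are made up for illustration, and other flavors enable multi-line mode differently.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ anchors to the start of the whole string only.
first_only = re.findall(r"^\w+", text)
print(first_only)        # ['first']

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
each_line_start = re.findall(r"^\w+", text, re.MULTILINE)
each_line_end = re.findall(r"\w+$", text, re.MULTILINE)
print(each_line_start)   # ['first', 'second']
print(each_line_end)     # ['line', 'line']

# \A and \Z ignore re.MULTILINE and always anchor to the whole string.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)
print(string_start)      # ['first']
print(string_end)        # ['line']
```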

Specifying Characters

These are used to specify a particular type of character.

.
Any character except a newline
\c
Control character (written \cX in flavors that support it, e.g. \cM for Ctrl-M, a carriage return)
\d
Digit character (e.g. 0-9)
\D
Non-digit character
\n
New line character
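\O
Octal digit (flavor-dependent; many flavors instead use \0 followed by octal digits as a character escape, e.g. \012 for a line feed)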
\r
Carriage return character
\s
Whitespace character (space, tab, newline, etc.)
\S
Non-whitespace character
\t
Tab character
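\w
Word character (letter, digit, or underscore)
\W
Non-word character
\x
Hexadecimal digit (flavor-dependent; in many flavors \x followed by two hexadecimal digits is a character escape, e.g. \x41 for "A")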
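As a rough illustration of the most common shorthand classes, the sketch below applies \d, \w, \s, and \D to a single log line using Python's re module. The log line is invented for the example and does not come from any real system.

```python
import re

# Hypothetical log line; the tab between ERROR and disk_full is deliberate.
line = "2024-01-15 ERROR\tdisk_full code=507"

digits = re.findall(r"\d+", line)      # runs of digit characters
words = re.findall(r"\w+", line)       # runs of letters, digits, and underscores
fields = re.split(r"\s+", line)        # split on any whitespace, including the tab
only_digits = re.sub(r"\D", "", line)  # remove every non-digit character

print(digits)       # ['2024', '01', '15', '507']
print(words)        # ['2024', '01', '15', 'ERROR', 'disk_full', 'code', '507']
print(fields)       # ['2024-01-15', 'ERROR', 'disk_full', 'code=507']
print(only_digits)  # 20240115507
```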

Specifying POSIX Character Classes

These are alternative names for character types defined by the POSIX standard. They are written inside a bracket expression, for example [[:upper:]] or [[:alpha:][:digit:]_].

[:upper:]
Uppercase characters [A-Z]
[:lower:]
Lowercase characters [a-z]
[:digit:]
Any digit character [0-9]
[:space:]
Any whitespace character (space, tab, newline, etc.)
[:alpha:]
Any uppercase or lowercase alphabetical character [A-Za-z]
[:alnum:]
Any uppercase, lowercase, or digit character [A-Za-z0-9]
[:punct:]
Any punctuation character
[:xdigit:]
Any hexadecimal digit
[:cntrl:]
Any control character
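Not every flavor understands the [:name:] syntax; Python's re module, for one, does not. The sketch below spells each class out as an explicit ASCII set so the equivalences listed above can be tried directly. The POSIX_CLASSES table and the posix_class helper are hypothetical names made up for this example, not part of any library, and the sets assume ASCII input.

```python
import re
import string

# Illustrative ASCII equivalents of the POSIX classes (an assumption of this
# sketch; real POSIX classes depend on the active locale).
POSIX_CLASSES = {
    "upper": "A-Z",
    "lower": "a-z",
    "digit": "0-9",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\r\f\v",
    "punct": re.escape(string.punctuation),
    "cntrl": r"\x00-\x1f\x7f",
}

def posix_class(name, negate=False):
    """Build a bracket expression roughly equivalent to [[:name:]]."""
    prefix = "^" if negate else ""
    return "[" + prefix + POSIX_CLASSES[name] + "]"

upper_runs = re.findall(posix_class("upper") + "+", "Hello World ABC")
hex_runs = re.findall(posix_class("xdigit") + "+", "0xDEADbeef zz")
no_punct = re.sub(posix_class("punct"), "", "a.b,c!")
only_digits = re.sub(posix_class("digit", negate=True), "", "a1b2")

print(upper_runs)   # ['H', 'W', 'ABC']
print(hex_runs)     # ['0', 'DEADbeef']
print(no_punct)     # abc
print(only_digits)  # 12
```

In flavors that do support POSIX classes, such as grep, the same matches would be written with [[:upper:]]+ and so on.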

Specifying Quantity

These are used to specify how many times the preceding pattern has to match. For example:

\d{3}-?\d{2}-?\d{4}

Matches text in U.S. Social Security Number (SSN) format, with or without dashes. Each dash is optional independently of the other.
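To try the pattern, it can be compiled and applied to a few strings. This is a minimal sketch in Python; the helper name looks_like_ssn and the sample numbers are invented for illustration, and the check only tests the shape of the text, not whether the number is valid.

```python
import re

# The SSN pattern from the text above.
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")

def looks_like_ssn(text):
    """Return True if the whole string has the SSN shape, with or without dashes."""
    return SSN_PATTERN.fullmatch(text) is not None

print(looks_like_ssn("123-45-6789"))  # True
print(looks_like_ssn("123456789"))    # True
print(looks_like_ssn("123-456789"))   # True, because each dash is optional on its own
print(looks_like_ssn("12-345-6789"))  # False, the first group needs three digits
print(looks_like_ssn("123-45-678"))   # False, the last group needs four digits
```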

*
Zero or more instances
+
One or more instances
?
Zero or one instance (no more than one)
{NUMBER}
Exactly NUMBER instances
{NUMBER,}
NUMBER or more instances
{NUMBER_A,NUMBER_B}
NUMBER_A to NUMBER_B instances, inclusive (written without a space after the comma)
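The quantifiers above can be compared side by side by running each against the same small set of strings. The sketch below uses Python's re module; the sample strings and the matching helper are made up for the example.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc", "abbbbc"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = matching(r"ab*c")         # zero or more "b"
plus = matching(r"ab+c")         # one or more "b"
optional = matching(r"ab?c")     # zero or one "b"
exactly = matching(r"ab{2}c")    # exactly two "b"
at_least = matching(r"ab{2,}c")  # two or more "b"
between = matching(r"ab{1,3}c")  # one to three "b"

print(star)      # ['ac', 'abc', 'abbc', 'abbbc', 'abbbbc']
print(plus)      # ['abc', 'abbc', 'abbbc', 'abbbbc']
print(optional)  # ['ac', 'abc']
print(exactly)   # ['abbc']
print(at_least)  # ['abbc', 'abbbc', 'abbbbc']
print(between)   # ['abc', 'abbc', 'abbbc']
```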

Specifying Logic

These are used to specify how matching occurs and to build more complex patterns.

[ ]
Specify a set or range of characters, any one of which may match
[A-M]
Single character in the range inclusive between "A" and "M" (e.g. "A", "B", "C", "D", ... "K", "L", "M")
[1-4]
Single digit in the range inclusive between 1 and 4 (e.g. 1, 2, 3, 4)
(A|B)
Alternation: matches either "A" or "B"; each side may be a longer pattern, not only a single character
[ABC]
Single character that is either "A" or "B" or "C"
[^ABC]
Single character that is not "A" and not "B" and not "C"
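A short sketch of these constructs in Python's re module follows. The input strings are invented for the example, and the alternation uses whole words to show that each side of | can be longer than one character.

```python
import re

in_range = re.findall(r"[A-M]", "AMNZ")        # letters from "A" through "M"
digits_1_4 = re.findall(r"[1-4]", "0123456")   # digits from 1 through 4
either = re.findall(r"(ERROR|WARN)", "INFO ok; WARN slow; ERROR down")
in_set = re.findall(r"[ABC]", "ABCDE")         # any one of "A", "B", "C"
not_in_set = re.findall(r"[^ABC]", "ABCDE")    # anything except "A", "B", "C"

print(in_range)    # ['A', 'M']
print(digits_1_4)  # ['1', '2', '3', '4']
print(either)      # ['WARN', 'ERROR']
print(in_set)      # ['A', 'B', 'C']
print(not_in_set)  # ['D', 'E']
```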