Some Unix utilities, such as grep, and editors, such as vi, use regular expressions as search strings. A regular expression is a sequence of characters which follows certain rules of interpretation. Some of the characters are taken literally, while others are taken to mean special things.
A regular expression matches the longest possible string, starting as close as possible to the beginning of the line. Expressions which include spaces need delimiters.
Delimiter: A character that marks the start and end of a regular expression. The "/" is frequently used as a delimiter, but almost any other character may serve instead.
Example: /this is a regular expression/
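The two points above (leftmost-longest matching and the free choice of delimiter) can be sketched with sed. This is an illustration only: it assumes a POSIX shell with sed available, the sample strings and function names are made up, and the & in the replacement stands for the matched text.

```shell
# Leftmost, then longest: the match starts at the first 'a' on the line
# and extends over the whole run of a's; the later run is left alone.
leftmost_longest() {
    printf '%s\n' 'xaaay aaaa' | sed 's/aa*/<&>/'
}

# Almost any character can delimit the expression; '|' avoids clashing
# with the slashes inside the path.
alt_delimiter() {
    printf '%s\n' '/usr/local/bin' | sed 's|/usr/local|/opt|'
}

leftmost_longest   # prints x<aaa>y aaaa
alt_delimiter      # prints /opt/bin
```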
The special characters:
. (period) | Match any single character. Example: /c.t/ matches cat, cot, cut. Example: /t.k.n/ matches token, taken, tikin, etc. |
* | Repeat the preceding regular expression zero or more times. Here, "regular expression" means the character (regular or special) immediately before the *, or a larger expression if the * follows a closing \) delimiter. Example: /yz*/ matches y, yz, yzz, yzzz, etc. Example: /(.*)/ matches as long a string as possible between parentheses. |
^ | Force the match to occur only at the beginning of the line. Example: /^function/ matches function as the first item on a line. |
$ | Force the match to occur only at the end of a line. Example: /terminator$/ matches terminator at the end of a line. |
[ ] | Define a character class (a set of characters) as acceptable matches for any single character. One occurrence of the character class matches one character in the text; to match a series of letters, follow the character class with an *. Example: /[Aa][Bb][Cc]/ matches ABC, abc, Abc, aBc, abC, ABc, aBC, but does not match AaBbCc. Example: /[Aa]*[Bb]*[Cc]*/ matches AAaaBC, AbbbBc, AAbbCcccc, etc. That is, any run of A's, followed by any run of B's, followed by any run of C's, regardless of case; any of the runs may be empty. |
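The first four entries in the table (., *, ^, and $) can be tried out with grep. The following is a minimal sketch, assuming a POSIX shell and grep; the sample words and function names are hypothetical. The -x option makes grep accept only lines matched in full, which makes the effect of * easier to see.

```shell
# '.' stands for exactly one character, so "ct" and "cart" are rejected.
match_dot() {
    printf '%s\n' cat cot ct cart | grep 'c.t'
}

# 'z*' is zero or more z's; with -x the whole line must be y followed by z's.
match_star() {
    printf '%s\n' y yz yzzz xz | grep -x 'yz*'
}

# '^' anchors the match at the beginning of the line.
match_bol() {
    printf '%s\n' 'function f' 'a function' | grep '^function'
}

# '$' anchors the match at the end of the line.
match_eol() {
    printf '%s\n' 'the terminator' 'terminator x' | grep 'terminator$'
}

match_dot    # prints cat and cot
match_star   # prints y, yz and yzzz
match_bol    # prints "function f"
match_eol    # prints "the terminator"
```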
A group of characters with contiguous ASCII codes, such as a-z, A-Z, or 0-9, can be written as a range inside a character class: [a-z] [A-Z] [0-9]. Ranges may be combined with an explicit list of other characters: [a-zA-Z{}0-9] defines a character class containing capital and small letters, digits, and the left and right curly brackets.
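Character classes and ranges can be checked the same way. This sketch assumes a POSIX shell and grep; the sample words and function names are hypothetical, and LC_ALL=C is set so that ranges such as a-z follow ASCII order, as the text assumes.

```shell
# Each bracket pair matches exactly one character, so three classes in a
# row need three consecutive letters a/A, b/B, c/C.
match_class() {
    printf '%s\n' ABC aBc AaBbCc | LC_ALL=C grep '[Aa][Bb][Cc]'
}

# Ranges combined with an explicit list: letters, digits, and curly brackets.
# -x requires the whole line to consist of characters from the class.
match_range() {
    printf '%s\n' 'x1' '{a}' 'a-b' | LC_ALL=C grep -x '[a-zA-Z{}0-9]*'
}

match_class   # prints ABC and aBc
match_range   # prints x1 and {a}
```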
When you want to find a special character as itself, you must quote it with a backslash.
Example: /[0-9][0-9]\.[0-9][0-9]/ will match 03.98, 45.76, etc.
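The difference the backslash makes can be seen by running the same input through both forms of the expression. A minimal sketch, assuming a POSIX shell and grep, with made-up sample strings and function names:

```shell
# Quoted period: only a literal '.' is accepted between the digit pairs.
escaped() {
    printf '%s\n' '03.98' '03x98' '0398' | grep '[0-9][0-9]\.[0-9][0-9]'
}

# Unquoted period: any single character is accepted in that position.
unescaped() {
    printf '%s\n' '03.98' '03x98' '0398' | grep '[0-9][0-9].[0-9][0-9]'
}

escaped     # prints 03.98
unescaped   # prints 03.98 and 03x98
```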
To repeat a regular expression longer than a single character:
/\(th[ai][ts]\)*/ matches this, that, or thisthatthisthat
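The grouped repetition above might be exercised as follows. This is a sketch that assumes a POSIX shell and grep; the sample words and the function name are hypothetical. The -x option restricts output to lines that the expression matches in full.

```shell
# \( ... \) groups th[ai][ts] so that the * repeats the whole group.
repeat_group() {
    printf '%s\n' this thisthat thisx the | grep -x '\(th[ai][ts]\)*'
}

repeat_group   # prints this and thisthat
```

Because each bracket pair is independent, the group also accepts combinations such as thit or thas, not only this and that.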
In replacement strings in vi and sed substitute commands, an & means the string that the search matched, and \n (where n is a digit) means the text matched by the bracketed regular expression beginning with the nth \(.
Example: s/dollar/&s/ changes dollar to dollars
Example: s/\([0-9]\)\(Cost\)/\2\1/
changes 3Cost to Cost3, 5Cost to Cost5, etc.
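Both replacement forms can be run through sed directly. A minimal sketch, assuming a POSIX shell and sed; the function names are hypothetical:

```shell
# & in the replacement is the text the search matched.
pluralize() {
    printf '%s\n' dollar | sed 's/dollar/&s/'
}

# \1 is the digit captured by the first \( ... \), \2 is "Cost";
# the replacement writes them back in the opposite order.
swap() {
    printf '%s\n' 3Cost 5Cost | sed 's/\([0-9]\)\(Cost\)/\2\1/'
}

pluralize   # prints dollars
swap        # prints Cost3 and Cost5
```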