A regular expression is a pattern, a sequence of ordinary characters and metacharacters, that serves as a template for finding (and possibly replacing) a desired arrangement of characters in a body of text.
Here is a rough history of regular expressions:
g/re/p. Based on his work on ed, Ken Thompson went on to write grep, the UNIX command-line utility, which is typically used like this: grep -i dog animals.txt. At the Windows command prompt, the rough equivalent is findstr /i dog animals.txt. (Enter findstr -? to find out more.)

The VBScript RegExp object implements regular expressions slightly differently than the JavaScript/JScript RegExp object does. Both RegExp objects are modeled on Perl regular expressions.
Most characters in a regular expression simply match themselves. EG: /geo/ finds the "geo" in both "george" and "gorgeous". However, some characters have special meanings in regular expression patterns. Here is the basic list of these metacharacters:
\ | () [] {} ^ $ * + ? .
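
A minimal JavaScript sketch (Node.js or any modern browser) of what the list above means in practice: ordinary characters match themselves, while each metacharacter needs a backslash to be matched literally. The helper name escapeRegExp is hypothetical, not a built-in; it relies on "$&" in a replacement string standing for the whole match.

```javascript
// Hypothetical helper: put a backslash before every metacharacter
// from the list above so the text is matched literally.
function escapeRegExp(text) {
  // "$&" inserts the whole match, so each metacharacter becomes "\" + itself.
  return text.replace(/[\\|()[\]{}^$*+?.]/g, "\\$&");
}

// Ordinary characters match themselves: /geo/ finds "geo" in both words.
const geo = /geo/;
console.log(geo.test("george"), geo.test("gorgeous")); // true true

// Unescaped, "+" acts as a quantifier, so "1+1=2" does not match itself.
console.log(new RegExp("1+1=2").test("1+1=2")); // false

// Escaped, the same text is matched literally.
const literal = new RegExp(escapeRegExp("1+1=2"));
console.log(literal.source);            // 1\+1=2
console.log(literal.test("is 1+1=2?")); // true
```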
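Assertions are parts of a pattern that must hold at a given position in the text. Atoms are non-zero-width assertions, i.e. they consume characters; anchors such as ^, $, and \b are zero-width assertions that match a position without consuming anything.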
Quantifiers specify how many times in a row the atom immediately preceding them must match. The quantifiers are *, +, ?, and {}. EG: /hi{2}/ matches "hii", while /(hi){2}/ matches "hihi".
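
A short JavaScript check of the quantifier example above: the quantifier binds only to the atom right before it, so grouping with parentheses changes what is repeated. The variable names are made up for illustration.

```javascript
// In /hi{2}/ the quantifier applies to "i"; in /(hi){2}/ to the group "hi".
const iTwice = /hi{2}/;
const hiTwice = /(hi){2}/;

console.log(iTwice.test("hii"), iTwice.test("hihi"));   // true false
console.log(hiTwice.test("hihi"), hiTwice.test("hii")); // true false

// The other quantifiers follow the same rule.
console.log("hiii".match(/hi+/)[0]); // hiii
console.log("h".match(/hi*/)[0]);    // h
console.log("hi!".match(/hi?!/)[0]); // hi!
```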
Flags are not part of the pattern, but they affect how the pattern is applied. EG: in JavaScript, /dog/ig uses the i (ignore case) and g (global) flags.
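
A minimal JavaScript sketch of how the i, g, and m flags change the result of the same pattern; the three-line sample string is made up for illustration. The flags property used at the end is standard since ES2015.

```javascript
const sample = "Dog\ndog\nDOG";

// No flags: case-sensitive, and only the first match is reported.
const first = sample.match(/dog/);
console.log(first[0], first.index); // dog 4

// i = ignore case, g = global (return every match).
console.log(sample.match(/dog/gi).join(",")); // Dog,dog,DOG

// m = multiline: ^ also matches right after each "\n".
console.log(sample.match(/^d/gi).join(","));  // D
console.log(sample.match(/^d/gim).join(",")); // D,d,D

// The flags of a RegExp object can be read back.
console.log(/dog/gim.flags); // gim
```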
| Character | Description |
|---|---|
| \ | Escape: marks the next character as special, as a literal, as a backreference, or as the start of an octal escape. EG: "n" matches "n", but "\n" matches a newline. An escape of particular note is "\\", which matches a single backslash. |
| ^ | (1) Anchors at start, i.e. matches at the beginning of the target string. If RegExp.Multiline is set, it also matches after "\n" or "\r". EG: "^a" matches the first a in "ana" but not the second. (2) As the first character inside [], it negates the set. EG: "[^x-z]" matches any character except "x", "y", or "z". |
| $ | Anchors at end, i.e. matches at the end of the target string. If RegExp.Multiline is set, it also matches before "\n" or "\r". EG: "a$" matches the second a in "ana" but not the first. |
| . | Matches any one character except the line terminators [\n\r\u2028\u2029]. EG: "bo." matches "bog", "bob", and "bo!", but not "bo" followed by a newline. |
| * | Quantifier: Matches the preceding sub-expression 0 or more times. Same as {0,}. EG: "bo*" matches "b", "bo", "boo", "booo", "boooo". |
| + | Quantifier: Matches the preceding sub-expression 1 or more times. Same as {1,}. EG: "bo+" matches "bo", "boo", "booo", "boooo", but not "b". |
| ? | (1) Quantifier: Matches the preceding sub-expression 0 or 1 times. Same as {0,1}. EG: "bo?" matches "b" and "bo". (2) If used immediately after a quantifier (*, +, ?, or {}), makes that quantifier non-greedy. EG: "X.+X" matches all of "XHello world.X Xfoo barX", while "X.+?X" matches only "XHello world.X". (3) Used in the lookahead assertions (?=) and (?!), and in the non-capturing group (?:). |
| {n} | Matches the preceding sub-expression exactly n times. EG: "bo{2}" matches "boo". |
| {n,} | Matches the preceding sub-expression n or more times. EG: "bo{2,}" matches "boo", "booo", "boooo". |
| {n,m} | Matches the preceding sub-expression n to m times. EG: "bo{2,3}" matches "boo" and "booo". |
| (pattern) or \(pattern\) (latter in POSIX basic regular expressions) | (1) Used, as in mathematics, for grouping, scoping, and setting precedence. EG: "dais(y\|ies)" is the same as "daisy\|daisies". (2) Matches the pattern and captures (remembers) it. Captured matches can be retrieved through the SubMatches collection of each Match in the Matches collection (VBScript), the RegExp.$1 ... RegExp.$9 properties or the array returned by exec() (JavaScript), the $1, $2, ... variables (Perl), or \1 ... \9 inside the pattern (POSIX). EG: "/<(.*)>.*<\/\1>/" matches paired elements like "<p>hi</p>". EG: "/^(.)(.).*\2\1$/" matches strings like "ABcdedBA". |
| (?:pattern) | Non-capturing group: matches and groups the pattern but, in spite of the parentheses, does not capture it, so it is not counted for backreferences. EG: in "(?:a\|b)(c)", \1 refers to the "c". |
| pattern1(?=pattern2) | Positive lookahead assertion: matches pattern1 only if it is followed by pattern2. The lookahead is neither captured nor included in the match. EG: "Windows (?=95\|98)" matches the "Windows " of "Windows 98" but not the "Windows " of "Windows 2000". |
| pattern1(?!pattern2) | Negative lookahead assertion: matches pattern1 only if it is not followed by pattern2. The lookahead is neither captured nor included in the match. EG: "Windows (?!95\|98)" matches the "Windows " of "Windows 2000" but not the "Windows " of "Windows 98". |
| x\|y | Separates alternatives. Matches x or y. EG: "g\|food" matches "g" or "food", while "(g\|f)ood" matches "good" or "food". |
| [xyz] | Positive character set: matches any one of the enclosed characters. EG: "[ab]" matches the "a" and the "b" of "abcd". |
| [^xyz] | Negative character set: matches any one character that is not enclosed. EG: "[^ab]" matches the "c" and the "d" of "abcd". |
| [x-z] | Positive range of characters. EG: "[x-z]" matches any "x", "y", or "z". |
| [^x-z] | Negative range of characters. EG: "[^x-z]" matches any character except "x", "y", or "z". |
| \b | Matches a word boundary, i.e. the position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string. EG: "er\b" matches the "er" in "hover x" but not the "er" in "Ebert". |
| \B | Matches a non-word boundary, i.e. a position between two word characters or between two non-word characters. EG: "er\B" matches the "er" in "Ebert" but not the "er" in "hover x". |
| \cx | Matches the control character Ctrl+x, where x is A-Z or a-z. EG: "\cM" matches Ctrl+M (the carriage return character). |
| \d or [:digit:] (latter in POSIX, written inside brackets as [[:digit:]]) | Matches a digit character. Same as [0-9]. |
| \D | Matches a non-digit character. Same as [^0-9]. |
| \f | Matches a form-feed character. Same as [\x0c\cL]. |
| \n | Matches a newline (line feed) character. Same as [\x0a\cJ]. FYI: EOLs by system: Windows \r\n; Unix \n; classic Mac OS \r. |
| \r | Matches a carriage return character. Same as [\x0d\cM]. FYI: EOLs by system: Windows \r\n; Unix \n; classic Mac OS \r. |
| \s or [:space:] (latter in POSIX) | Matches a whitespace character. Same as [ \t\n\v\f\r] in ASCII terms; in JavaScript, \s also matches Unicode spaces such as \u00a0, \u2000-\u200a, \u2028, \u2029, \u3000, and \ufeff. |
| \S | Matches a non-whitespace character, i.e. any character that \s does not match. Same as [^ \t\n\v\f\r] in ASCII terms. |
| \t | Matches a tab character. Same as [\x09\cI]. |
| \v | Matches a vertical tab character. Same as [\x0b\cK]. |
| \w | Matches a word character. Same as [A-Za-z0-9_]. |
| \W | Matches a non-word character. Same as [^A-Za-z0-9_]. |
| [:alnum:] | Matches alphanumeric characters in POSIX. Same as [A-Za-z0-9]. |
| [:alpha:] | Matches alphabet characters in POSIX. Same as [A-Za-z]. |
| [:blank:] | Matches space and tab in POSIX. Same as [ \t]. |
| [:cntrl:] | Matches control characters in POSIX. Same as [\x00-\x1F\x7F]. |
| [:graph:] | Matches graphical or visible characters in POSIX. Same as [\x21-\x7E]. |
| [:lower:] | Matches a lowercase character in POSIX. Same as [a-z]. |
| [:print:] | Matches graphical or visible characters and space in POSIX. Same as [\x20-\x7E]. |
| [:punct:] | Matches punctuation characters in POSIX, i.e. printable ASCII characters other than letters, digits, and space. Same as [!"#$%&'()*+,-./:;<=>?@[\\\]^_`{\|}~]. |
| [:upper:] | Matches an uppercase character in POSIX. Same as [A-Z]. |
| [:xdigit:] | Matches any character used in hexadecimal digits in POSIX. Same as [A-Fa-f0-9]. |
| \n | If integer n is preceded by at least n captured (parenthesized) groups, back-references the nth captured match. EG: "one(,)\stwo\1" matches "one, two," in "one, two, three". EG: "/<(.*)>.*<\/\1>/" matches paired elements like "<p>hi</p>". EG: "/^(.)(.).*\2\1$/" matches strings like "ABcdedBA". Otherwise, if n is an octal number, matches the character with that octal code. In VBScript, n must be 1-3 digits (0-777). EG: "\132" matches "Z". |
| \on | Matches the ASCII character with octal code n. JavaScript only. EG: "Z" has octal code 132, the same character that "\x5a" matches. |
| \xn | Matches the ASCII character with hexadecimal code n, where n has 2 digits. EG: "\x5a" matches "Z". |
| \un | Matches the Unicode character with hexadecimal code n, where n has 4 digits. EG: "\u00A2" matches "¢". |
| \0 | Matches the NUL character (code 0). Same as [\u0000]. |
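
A sketch that runs several of the table's own examples through JavaScript's RegExp so the stated behavior can be checked directly; the variable names are made up, and only standard String and RegExp methods are used. Two details the code makes visible: the lookahead pattern "Windows (?=95|98)" matches "Windows " including the trailing space, and "one(,)\stwo\1" matches "one, two," including the second comma.

```javascript
// Greedy vs. non-greedy: "?" after "+" makes it match as little as possible.
const greedyInput = "XHello world.X Xfoo barX";
console.log(greedyInput.match(/X.+X/)[0]);  // XHello world.X Xfoo barX
console.log(greedyInput.match(/X.+?X/)[0]); // XHello world.X

// Lookahead: checked, but neither consumed nor captured.
const followedBy = /Windows (?=95|98)/;
const notFollowedBy = /Windows (?!95|98)/;
console.log(followedBy.test("Windows 98"), followedBy.test("Windows 2000"));       // true false
console.log(notFollowedBy.test("Windows 2000"), notFollowedBy.test("Windows 98")); // true false
console.log(JSON.stringify("Windows 98".match(followedBy)[0]));                   // "Windows "

// Backreferences: \1 must repeat exactly what group 1 captured.
const paired = /<(.*)>.*<\/\1>/;
console.log(paired.test("<p>hi</p>"), paired.test("<p>hi</b>")); // true false
console.log(/^(.)(.).*\2\1$/.test("ABcdedBA"));                  // true
console.log("one, two, three".match(/one(,)\stwo\1/)[0]);         // one, two,

// Non-capturing group: (?:...) groups without becoming group 1.
console.log("ac".match(/(?:a|b)(c)/)[1]); // c

// Character-code escapes.
console.log(/\x5a/.test("Z"), /\u00A2/.test("\u00A2"), /\cM/.test("\r")); // true true true
```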
There are six basic rules that regular expressions apply in order.
EG: searching for /ar/ in "Cart":

- Does "Cart" match? ... NO
- Does "Car" match? ... NO
- Does "Ca" match? ... NO
- Does "C" match? ... NO
- Does "" match? ... NO
- Does "art" match? ... NO
- Does "ar" match? ... YES
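
The position the trace above ends on can be confirmed in JavaScript: exec() reports the index where the first successful match starts. A minimal sketch; the second input string is made up to show that an earlier, shorter match is reported before a later, longer one.

```javascript
// exec() returns the match and the index where it starts.
const firstHit = /ar/.exec("Cart");
console.log(firstHit.index, firstHit[0]); // 1 ar

// The leftmost starting position wins, even if a longer match exists later.
const run = /a+/.exec("baa caaaa");
console.log(run.index, run[0]); // 1 aa
```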

| Assertion | Description |
|---|---|
| ^ | Matches at the beginning of the string. |
| $ | Matches at the end of the string. |
| \b | Matches a word boundary (between \w and \W, or between \w and the start or end of the string) when not inside []. Inside [], \b matches a backspace character. |
| \B | Matches a non-word boundary. |
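
A short JavaScript sketch of the assertions in this table; it shows that they are zero-width (they match positions, so replacing them inserts text rather than removing any) and how the m flag extends ^ and $ to line breaks. The sample strings are made up.

```javascript
const twoLines = "ana\nana";

// ^ and $ match positions at the ends of the string.
console.log("ana".replace(/^a/, "X")); // Xna
console.log("ana".replace(/a$/, "X")); // anX

// With the m flag, ^ and $ also match at each line break.
console.log(twoLines.match(/^a/g).length);  // 1
console.log(twoLines.match(/^a/gm).length); // 2
console.log(twoLines.match(/a$/gm).length); // 2

// \b sits between \w and \W (or a string edge); \B is everywhere else.
console.log(/er\b/.test("hover x"), /er\b/.test("Ebert")); // true false
console.log(/er\B/.test("Ebert"));                          // true
console.log("cat".replace(/\b/g, "|"));                     // |cat|
```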
Numbers
- Non-negative integer: /^\d+$/
- Integer, optionally negative: /^-?\d+$/
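
A quick JavaScript check of the two integer patterns, /^\d+$/ and /^-?\d+$/, against a few made-up inputs. The last line shows why the ^ and $ anchors matter: without them the pattern only needs to find digits somewhere in the string.

```javascript
const nonNegativeInt = /^\d+$/;
const signedInt = /^-?\d+$/;

for (const s of ["42", "-42", "4.2", "", "42abc"]) {
  console.log(JSON.stringify(s), nonNegativeInt.test(s), signedInt.test(s));
}
// "42" true true
// "-42" false true
// "4.2" false false
// "" false false
// "42abc" false false

// Without the anchors, any string containing a digit would pass.
console.log(/\d+/.test("42abc")); // true
```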
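- Decimal number with a required fractional part: /^-?\d+\.\d+$/
- Integer or decimal: /^-?\d+(\.\d+)?$/
- Integer or decimal with at most three decimal places: /^-?\d+(\.\d{1,3})?$/

Tag related as in HTML, XHTML, XML, etc.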
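- A single tag, even one that spans lines (non-greedy): /<(.|\n)+?>/
- Every tag (global, case-insensitive), capturing the whole tag and its contents: /(<([^>]+)>)/ig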
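
A minimal JavaScript sketch using the two tag patterns on a made-up HTML snippet: stripping every tag, listing what is inside each tag via capture group 2, and showing that the non-greedy pattern stops at the first ">" even across a newline. This assumes simple markup; a regular expression is not an HTML parser, and a ">" inside an attribute value or comment would break it. String.prototype.matchAll needs Node.js 12+ or a modern browser.

```javascript
const html = "<p>Hello <b>world</b></p>";
const tagPattern = /(<([^>]+)>)/ig;

// Strip all tags.
console.log(html.replace(tagPattern, "")); // Hello world

// Group 2 holds the text between < and >.
const tagNames = [...html.matchAll(tagPattern)].map((m) => m[2]);
console.log(tagNames.join(",")); // p,b,/b,/p

// The non-greedy pattern stops at the first ">", even across a newline.
const multiLineTag = "<a\nhref='x'>link</a>".match(/<(.|\n)+?>/)[0];
console.log(JSON.stringify(multiLineTag)); // "<a\nhref='x'>"
```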
- s/ / /g
- {\<img([^>]*[^/])}{\>} replaced with \1 /\2
- onclick="[^"]*"
- <td[^>]*>

Miscellany
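- Any of several first names followed by " Hernandez": /(George|Julia|Connie|York|Amy) Hernandez/

Swapping first and last names in JavaScript:

const str = "George Hernandez";
const newstr = str.replace(/(\S+)\s(\S+)/, "$2 $1"); // "Hernandez George"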
The same swap in Perl, producing "Hernandez, George":
s/(\S+)\s+(\S+)/$2\, $1/
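
A small JavaScript sketch of the name examples, including a JavaScript version of the Perl "last, first" substitution (in a JavaScript replacement string the comma needs no backslash). The function names swapNames and lastCommaFirst are hypothetical, and "Maria Hernandez" is a made-up non-matching input.

```javascript
const hernandez = /(George|Julia|Connie|York|Amy) Hernandez/;
console.log(hernandez.test("Julia Hernandez"), hernandez.test("Maria Hernandez")); // true false

// "First Last" -> "Last First"
function swapNames(fullName) {
  return fullName.replace(/(\S+)\s(\S+)/, "$2 $1");
}

// "First Last" -> "Last, First" (JavaScript form of the Perl s/// above)
function lastCommaFirst(fullName) {
  return fullName.replace(/(\S+)\s+(\S+)/, "$2, $1");
}

console.log(swapNames("George Hernandez"));      // Hernandez George
console.log(lastCommaFirst("George Hernandez")); // Hernandez, George
```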
Moving a leading time stamp to the end of a file name: str.replace(/(\w+)\s([\w\s]*)/, "$2_$1") turns "20060728t1503 My File part1of2.txt" into "My File part1of2_20060728t1503.txt". I've used the variation /([\w\-]+\s\w+)\s([\w\s]*)/ with the same replacement to change "2006-07-28 1503 My File part1of2.txt" into "My File part1of2_2006-07-28 1503.txt".
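Moving a trailing time stamp to the front: str.replace(/(.*)\s(\w*)(\.\w{3})/, "$2 $1$3") turns "My File part1of2 200607281503.txt" into "200607281503 My File part1of2.txt". In JavaScript replacement strings, captured groups are written $1, $2, ...; the \1 form works only inside the pattern itself.

- Starts with y or Y: /^[yY]/
- Exactly yes, YES, or Yes: /^(yes|YES|Yes)$/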
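- Starts with one or more characters other than space, tab, or colon, followed by a colon (e.g., "Subject:"): /^[^ \t:]+:/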
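
A JavaScript sketch of the file-name renames, the yes/no patterns, and the leading-field pattern, run on the sample names from the text. Assumptions: the replacement strings are chosen to produce exactly the names shown ("$2_$1" gives the underscore; "$2 $1" would give a space), the extension is kept with $3, the parentheses around the dated variation are read as prose punctuation, and the backslashes before [ and ] in the field-name pattern are read as extraction debris. The function names are hypothetical.

```javascript
// Leading stamp to the end, joined with "_".
function stampToEnd(name) {
  return name.replace(/(\w+)\s([\w\s]*)/, "$2_$1");
}
// Variation for a "YYYY-MM-DD HHMM" stamp.
function datedStampToEnd(name) {
  return name.replace(/([\w\-]+\s\w+)\s([\w\s]*)/, "$2_$1");
}
// Trailing stamp to the front; $3 keeps the ".txt" extension.
function stampToFront(name) {
  return name.replace(/(.*)\s(\w*)(\.\w{3})/, "$2 $1$3");
}

console.log(stampToEnd("20060728t1503 My File part1of2.txt"));
// My File part1of2_20060728t1503.txt
console.log(datedStampToEnd("2006-07-28 1503 My File part1of2.txt"));
// My File part1of2_2006-07-28 1503.txt
console.log(stampToFront("My File part1of2 200607281503.txt"));
// 200607281503 My File part1of2.txt

// Yes/no answers and a leading "Name:" field.
const startsWithY = /^[yY]/;
const exactYes = /^(yes|YES|Yes)$/;
const fieldName = /^[^ \t:]+:/;
console.log(startsWithY.test("yep"), exactYes.test("yep"), exactYes.test("Yes")); // true false true
console.log(fieldName.test("Subject: Hello"), fieldName.test(" Subject: Hello")); // true false
```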
- NNN-NN-NNNN): /^\d{3}-\d{2}-\d{4}$/
- Any character outside the printable ASCII range: [^\x20-\x7E]
- ,, /, and _.):
/[^\w\s\+\,\-\.\/\@\_]/
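
A JavaScript sketch checking the NNN-NN-NNNN pattern, the non-printable-character class, and the disallowed-character class on made-up inputs (123-45-6789 is an invented number). Each class is used unanchored, so test() returns true as soon as one offending character appears anywhere in the string.

```javascript
const ssn = /^\d{3}-\d{2}-\d{4}$/;
console.log(ssn.test("123-45-6789"), ssn.test("123456789")); // true false

// True if the text contains anything outside printable ASCII (here, a tab).
const nonPrintable = /[^\x20-\x7E]/;
console.log(nonPrintable.test("plain text"), nonPrintable.test("tab\there")); // false true

// True if the text contains anything other than word characters,
// whitespace, +, ",", -, ., /, @, or _.
const disallowed = /[^\w\s\+\,\-\.\/\@\_]/;
console.log(disallowed.test("me@example.com"), disallowed.test("rm *")); // false true
```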
/[--]/
2008-03-29 16:10:18Z