A regular expression is a pattern: a sequence of characters made up of regular (literal) characters and metacharacters. The pattern serves as a template for finding (and possibly replacing) a desired arrangement in a body of text.
Here is a rough history of regular expressions:
g/re/p. Based upon his work in ed, Ken Thompson went on to make grep, the UNIX command-line utility named after the ed command g/re/p (globally search for a regular expression and print). It is typically used like this: grep -i dog animals.txt. In DOS, the rough equivalent would be findstr /i dog animals.txt. (Enter findstr -? to find out more.)
The VBScript RegExp object implements regular expressions in a slightly different way than does the JavaScript/JScript RegExp object. Both of these RegExp objects are modeled after regular expressions in Perl.
Most characters in a regular expression match themselves. EG: /geo/ will find the "geo" in george and gorgeous. However, some characters have special meaning in regular expression patterns. Here is the basic list of these metacharacters.
\ | () [] {} ^ $ * + ? .
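Atoms are sections of patterns that match (and consume) characters. Assertions are zero-width: they match a position, such as the start of the string or a word boundary, without consuming any characters.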
Quantifiers say how many of the atom immediately preceding should match in a row. The quantifiers are *, +, ?, and {}. EG: /hi{2}/ matches hii while /(hi){2}/ matches hihi.
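The difference between quantifying a single character and quantifying a group can be checked directly. This is a minimal sketch using the two patterns from the paragraph above; the variable names are only illustrative.

```javascript
// A quantifier applies only to the atom immediately before it.
const single = /hi{2}/;    // "h" followed by exactly two "i"
const grouped = /(hi){2}/; // the group "hi" repeated exactly twice

const results = {
  singleOnHii: single.test("hii"),     // true
  singleOnHihi: single.test("hihi"),   // false
  groupedOnHihi: grouped.test("hihi"), // true
  groupedOnHii: grouped.test("hii"),   // false
};

console.log(results);
```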
Flags are not part of the pattern, but affect the application of the pattern. They can be combined. EG: ig.
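As a small illustration of how flags change the way a pattern is applied, the sketch below runs the same pattern with and without the i (ignore case) and g (global) flags. The sample text and variable names are made up for the example.

```javascript
const text = "Dog, dog, DOG";

// No flags: case-sensitive, and only the first match is replaced.
const firstOnly = text.replace(/dog/, "cat");

// i and g combined: case-insensitive, and every match is replaced.
const everyMatch = text.replace(/dog/ig, "cat");

// With g, match() returns all matches instead of just the first.
const matchCount = text.match(/dog/ig).length;

console.log(firstOnly);  // Dog, cat, DOG
console.log(everyMatch); // cat, cat, cat
console.log(matchCount); // 3
```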
Here is the syntax for regular expressions in JavaScript.
var RegExpLiteral = /pattern/[flag];
var RegExpObject = new RegExp("pattern"[, "flag"]);
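The two forms are interchangeable, with one practical difference: in the constructor form the pattern is a string, so every backslash has to be doubled. A brief sketch, with an arbitrary decimal-number pattern chosen for illustration:

```javascript
// Literal form: backslashes are written once.
const literal = /\d+\.\d+/i;

// Constructor form: the pattern is a string, so each backslash is doubled.
const constructed = new RegExp("\\d+\\.\\d+", "i");

// Both produce the same pattern source and the same flags.
const sameSource = literal.source === constructed.source;
const sameFlags = literal.flags === constructed.flags;
const found = constructed.test("Version 3.14");

console.log(sameSource, sameFlags, found); // true true true
```

The constructor form is mainly useful when the pattern has to be assembled at run time.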
| Character | Description |
|---|---|
| \ | Escapes, i.e. marks the next character as special, a literal, a back reference, or an octal escape. EG: "n" is "n", but "\n" is a newline. An escape of particular note is "\\", which matches a literal backslash. |
| ^ | (1) Anchors at start, i.e. matches at the beginning of target string. If RegExp.Multiline is set, then also matches after "\n" or "\r". EG: "^a" matches the first a in "ana" but not the second. (2) In sets, this means not the set. EG: "[^x-z]" matches any character except for "x", "y", or "z". |
| $ | Anchors at end, i.e. matches at the end of target string. If RegExp.Multiline is set, then also matches before "\n" or "\r". EG: "a$" matches the second a in "ana" but not the first. |
| . | Matches any 1 character except characters related to new lines: [\n\r\u2028\u2029]. EG: "bo." matches "boo", "bob", and "box", but not "bo". |
| * | Quantifier: Matches the preceding sub-expression 0 or more times. Same as {0,}. EG: "bo*" matches "b", "bo", "boo", "booo", "boooo". |
| + | Quantifier: Matches the preceding sub-expression 1 or more times. Same as {1,}. EG: "bo+" matches "bo", "boo", "booo", but not "b". |
| ? | (1) Quantifier: Matches the preceding sub-expression 0 or 1 times. Same as {0,1}. EG: "bo?" matches "b" and "bo". (2) If used immediately after one of the other quantifiers (*, +, ?, and {}), then makes the pattern non-greedy. EG: "X.+X" matches all of "XHello world.X Xfoo barX", while "X.+?X" matches only "XHello world.X". (3) Used in the look ahead assertions (?=) and (?!), and in the non-capturing group (?:). |
| {n} | Matches the preceding sub-expression n times. EG: "bo{2}" matches "boo" but not "bo". |
| {n,} | Matches the preceding sub-expression n or more times. EG: "bo{2,}" matches "boo", "booo", "boooo". |
| {n,m} | Matches the preceding sub-expression n to m times. EG: "bo{2,3}" matches "boo" and "booo". |
| (pattern) \(pattern\) (Latter in POSIX) | (1) Used in a mathematical fashion for grouping, scoping, and setting precedence. EG: "dais(y|ies)" is the same as "daisy|daisies". (2) Matches the pattern and captures/remembers/parenthesizes it. Captured matches can be retrieved into the Matches collection (VBScript) or the $1 ... $9 backreference properties (JavaScript) or the $1, $2, ... variables (Perl) or the \n (POSIX). Inside the pattern itself, a captured match is referenced as \n. EG: "<(.*)>.*<\/\1>" matches paired elements like "<p>hi</p>". EG: "(.)(.).*\2\1" matches strings like "ABcdedBA". |
| (?:pattern) | Non-capturing group: Matches the pattern and groups it, but does not capture it for later retrieval. Unlike the look ahead assertions, it is not zero-width: the matched characters are consumed. EG: "dais(?:y|ies)" is the same as "daisy|daisies", with nothing captured. |
| pattern1(?=pattern2) | Positive look ahead assertion: Matches pattern1 only if it is followed by pattern2. In spite of the parentheses, the look ahead is zero-width and does not capture. EG: "Windows (?=95|98)" matches "Windows " of "Windows 98" but not "Windows " of "Windows 2000". |
| pattern1(?!pattern2) | Negative look ahead assertion: Matches pattern1 only if it is not followed by pattern2. In spite of the parentheses, the look ahead is zero-width and does not capture. EG: "Windows (?!95|98)" matches "Windows " of "Windows 2000" but not "Windows " of "Windows 98". |
| (?<=pattern1)pattern2 | Positive look behind assertion: Matches the pattern2 if it is preceded by pattern1. In spite of parentheses, this is zero-width and does not capture. EG: "(?<=Satur|Sun)day" matches "day" of "Sunday" but not "day" of "Monday". |
| (?<!pattern1)pattern2 | Negative look behind assertion: Matches pattern2 only if it is not preceded by pattern1. In spite of the parentheses, the look behind is zero-width and does not capture. EG: "(?<!Satur|Sun)day" matches "day" of "Monday" but not "day" of "Sunday". |
| x|y | Separates alternatives. Matches x or y. EG: "g|food" matches "g" or "food". "(g|f)ood" matches "good" or "food". |
| [xyz] | Positive character set matches any one character enclosed. EG: "[ab]" matches the "a" and the "b" of "abcd". |
| [^xyz] | Negative character set matches any one character not enclosed. EG: "[^ab]" matches the "c" and the "d" of "abcd". |
| [x-z] | Positive range of characters. EG: "[x-z]" matches any "x", "y", or "z". |
| [^x-z] | Negative range of characters. EG: "[^x-z]" matches any character except for "x", "y", or "z". |
| \b | Matches a word boundary, i.e. the position between a word character (\w) and a non-word character (\W) or the start or end of the string. EG: "er\b" matches "er" in "hover x" but not the "er" in "Ebert". |
| \B | Matches a non-word boundary, i.e. a position between two word characters or between two non-word characters. EG: "er\B" matches "er" in "Ebert" but not the "er" in "hover x". |
| \cx | Matches a control character x, where x is A-Z or a-z. EG: "\cM" matches ctrl+M (carriage return character). |
| \d [:digit:] (latter in POSIX) | Matches a digit character. Same as [0-9]. |
| \D | Matches a non-digit character. Same as [^0-9]. |
| \f | Matches a form-feed character. Same as [\x0c\cL]. |
| \n | Matches a newline character. Same as [\x0a\cJ]. FYI: EOLs by sys: Win \r\n; Unix \n; Mac \r. |
| \r | Matches a carriage return character. Same as [\x0d\cM]. FYI: EOLs by sys: Win \r\n; Unix \n; Mac \r. |
| \s [:space:] (latter in POSIX) | Matches a whitespace character. Same as [\t\n\v\f\r ] or [\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]. |
| \S | Matches a non-whitespace character. Same as [^\t\n\v\f\r ] or [^\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]. |
| \t | Matches a tab character. Same as [\x09\cI]. |
| \v | Matches a vertical tab character. Same as [\x0b\cK]. |
| \w | Matches a word character. Same as [A-Za-z0-9_]. |
| \W | Matches a non-word character. Same as [^A-Za-z0-9_]. |
| [:alnum:] | Matches alphanumeric characters in POSIX. Same as [A-Za-z0-9]. |
| [:alpha:] | Matches alphabet characters in POSIX. Same as [A-Za-z]. |
| [:blank:] | Matches space and tab in POSIX. Same as [ \t]. |
| [:cntrl:] | Matches control characters in POSIX. Same as [\x00-\x1F\x7F]. |
| [:graph:] | Matches graphical or visible characters in POSIX. Same as [\x21-\x7E]. |
| [:lower:] | Matches a lowercase character in POSIX. Same as [a-z]. |
| [:print:] | Matches graphical or visible characters and space in POSIX. Same as [\x20-\x7E]. |
| [:punct:] | Matches punctuation characters in POSIX. Same as [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]. |
| [:upper:] | Matches an uppercase character in POSIX. Same as [A-Z]. |
| [:xdigit:] | Matches any characters used in hexadecimal digits in POSIX. Same as [A-Fa-f0-9]. |
| \n | If integer n is preceded by at least n captured (parenthesized) matches, then back references the captured matches. EG: "one(,)\stwo\1" matches "one, two," in "one, two, three". EG: "/<(.*)>.*<\/\1>/" matches paired elements like "<p>hi</p>". EG: "/^(.)(.).*\2\1$/" matches strings like "ABcdedBA". Else if n is an octal number, then matches an octal character. In VBScript, n must be between 1-3 digits (0-777). EG: "\132" matches "Z". |
| \on | Matches an ASCII octal character code n. JavaScript only. EG: "\x5a" matches "Z". |
| \xn | Matches an ASCII hexadecimal character code n, where n has 2 digits. EG: "\x5a" matches "Z". |
| \un | Matches a Unicode hexadecimal character code n, where n has 4 digits. EG: "\u00A2" matches "¢". |
| \0 | Matches the NUL character. Same as [\u0000]. |
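Several rows of the table are easier to follow when run. The sketch below exercises greedy versus non-greedy matching, back references, and a positive look ahead, using examples adapted from the table (the look ahead pattern is written out as "Windows (?=95|98)"). Variable names are illustrative only.

```javascript
const sample = "XHello world.X Xfoo barX";

// Greedy: .+ grabs as much as it can, so the match runs to the last X.
const greedy = sample.match(/X.+X/)[0];

// Non-greedy: .+? grabs as little as it can, so the match stops at the next X.
const lazy = sample.match(/X.+?X/)[0];

// Back reference: \1 must repeat exactly what group 1 captured.
const paired = /<(.*)>.*<\/\1>/;
const pairedOk = paired.test("<p>hi</p>");
const pairedBad = paired.test("<p>hi</b>");

// Two back references in reverse order match a mirrored ending.
const mirrored = /^(.)(.).*\2\1$/.test("ABcdedBA");

// Look ahead: the version number is checked but not consumed.
const lookahead = /Windows (?=95|98)/;
const matched98 = "Windows 98".match(lookahead)[0];
const matched2000 = lookahead.test("Windows 2000");

console.log(greedy);               // XHello world.X Xfoo barX
console.log(lazy);                 // XHello world.X
console.log(pairedOk, pairedBad);  // true false
console.log(mirrored);             // true
console.log(JSON.stringify(matched98), matched2000); // "Windows " false
```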
There are six basic rules that regular expressions apply in order.
EG: Finding /ar/ in Cart.
Cart // Does "Cart" match? ... NO
Cart // Does "Car" match? ... NO
Cart // Does "Ca" match? ... NO
Cart // Does "C" match? ... NO
Cart // Does "" match? ... NO
Cart // Does "art" match? ... NO
Cart // Does "ar" match? ... YES
| Assertion | Description |
|---|---|
| ^ | Matches at the beginning of the string. |
| $ | Matches at the end of the string. |
| \b | Matches a word boundary (between \w and \W), when not inside []. |
| \B | Matches a non-word boundary. |
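Because assertions match positions rather than characters, their effect is clearest when the same text is tested with and without them. A short sketch of \b and \B; the sample strings are chosen only for illustration.

```javascript
// \b requires a word boundary right after "er"; \B requires that there is none.
const boundaryChecks = {
  erBoundaryInHover: /er\b/.test("hover x"),    // true: "er" is followed by a space
  erBoundaryInEbert: /er\b/.test("Ebert"),      // false: "er" is followed by "t"
  erNonBoundaryInEbert: /er\B/.test("Ebert"),   // true
  erNonBoundaryInHover: /er\B/.test("hover x"), // false
};

// Wrapping a word in \b ... \b finds it only as a whole word.
const wholeWords = "cat concat cats".match(/\bcat\b/g);

console.log(boundaryChecks);
console.log(wholeWords); // [ 'cat' ]
```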
^\d+$
^-?\d+$
^-?\d+\.\d+$
^-?\d+(\.\d+)?$
^-?\d+(\.\d{1,3})?$
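The five number patterns above differ only in whether they accept a sign and a fractional part. The sketch below names each one and reports which patterns accept a given input; the property names and the classify helper are hypothetical labels added for this example, not part of the original list.

```javascript
// Labels are assumptions inferred from what each pattern accepts.
const numberPatterns = {
  unsignedInteger: /^\d+$/,
  signedInteger: /^-?\d+$/,
  decimalRequired: /^-?\d+\.\d+$/,       // fraction is mandatory
  decimalOptional: /^-?\d+(\.\d+)?$/,    // fraction is optional
  decimalUpTo3: /^-?\d+(\.\d{1,3})?$/,   // optional fraction of 1 to 3 digits
};

// Returns the names of all patterns that accept the input string.
function classify(input) {
  return Object.keys(numberPatterns).filter(function (name) {
    return numberPatterns[name].test(input);
  });
}

console.log(classify("42"));
console.log(classify("-3.14"));
console.log(classify("-3.14159"));
console.log(classify("abc"));
```

Every pattern is anchored with ^ and $, so the whole string has to be a number; without the anchors, "abc42" would pass the first pattern.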
Tag as in HTML, XML, etc.
/<(.|\n)+?>/
/(<([^>]+)>)/ig
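As a rough illustration of the first tag pattern, the sketch below uses it to strip tags from a short fragment. The stripTags name is hypothetical, and this is only a sketch: a pattern like this is not an HTML parser, and it will misfire on input such as attribute values that contain ">".

```javascript
// Removes anything that looks like a tag. The +? keeps each match
// as short as possible, so it stops at the first ">" after a "<".
function stripTags(markup) {
  return markup.replace(/<(.|\n)+?>/g, "");
}

const stripped = stripTags('<p class="x">Hello <b>world</b></p>');
console.log(stripped); // Hello world
```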
s/ / /g
{\<img([^>]*[^/])}{\>}\1 /\2
onclick="[^"]*"
<td[^>]*>
^\r?\n?$
[ \t]+$
str.replace(/^(\t*) {4}/g, "$1\t")
(?<=\S)( {2,}|\t+| +\t+|\t+ +)(?=\S)
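Three of the whitespace patterns above can be tried out as small helpers. The function names are hypothetical, the m flag on the trailing-whitespace pattern is an addition so that $ applies to every line, and the last helper assumes an engine with look behind support (for example a current Node.js).

```javascript
// Removes spaces and tabs at the end of every line (m makes $ match per line).
function trimTrailing(text) {
  return text.replace(/[ \t]+$/gm, "");
}

// Converts one group of four spaces after any leading tabs into a tab.
function indentToTab(line) {
  return line.replace(/^(\t*) {4}/, "$1\t");
}

// Collapses runs of spaces and tabs between two non-space characters.
function collapseInner(line) {
  return line.replace(/(?<=\S)( {2,}|\t+| +\t+|\t+ +)(?=\S)/g, " ");
}

console.log(JSON.stringify(trimTrailing("a  \nb\t\nc"))); // "a\nb\nc"
console.log(JSON.stringify(indentToTab("\t    x")));      // "\t\tx"
console.log(JSON.stringify(collapseInner("a   b\t\tc"))); // "a b c"
```

Each call to indentToTab converts only one group of four spaces, so deeper indentation needs repeated calls.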
str.replace(/(\S)(<\/li>)(\t*)/g, "$1\r\n$3$2")
\b \b(George|Julia|Connie|York|Amy) Hernandez.
str = "George Hernandez";
newstr = str.replace(/(\S+)\s(\S+)/, "$2 $1");
s/(\S+)\s+(\S+)/$2\, $1/
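The swap works because $1 and $2 in the replacement string refer to the two captured words. A runnable sketch of both variants in JavaScript; the second is a JavaScript rendering of the Perl substitution above, with variable names chosen for the example.

```javascript
const fullName = "George Hernandez";

// Swap the two words, keeping a single space between them.
const swapped = fullName.replace(/(\S+)\s(\S+)/, "$2 $1");

// Same idea, but emit "Last, First" and tolerate several spaces in the input.
const lastFirst = "George   Hernandez".replace(/(\S+)\s+(\S+)/, "$2, $1");

console.log(swapped);   // Hernandez George
console.log(lastFirst); // Hernandez, George
```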
str.replace(/(\w+)\s([\w\s]*)/, "$2_$1"), would make 20060728t1503 My File part1of2.txt into My File part1of2_20060728t1503.txt. I've used this variation, /([\w\-]+\s\w+)\s([\w\s]*)/, to change 2006-07-28 1503 My File part1of2.txt into My File part1of2_2006-07-28 1503.txt.
str.replace(/(.*)\s(\w*)(\.\w{3})/, "$2 $1$3"), would make My File part1of2 200607281503.txt into 200607281503 My File part1of2.txt.
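The two file-name rearrangements can be checked with a short script. This sketch assumes JavaScript replacement syntax ($1, $2, $3), an underscore joiner for the first case, and an escaped dot before the extension; the function names are hypothetical.

```javascript
// Moves a leading timestamp to the end of the base name, joined by "_".
function moveStampToEnd(name) {
  return name.replace(/(\w+)\s([\w\s]*)/, "$2_$1");
}

// Moves a trailing timestamp to the front; $3 keeps the extension in place.
function moveStampToFront(name) {
  return name.replace(/(.*)\s(\w*)(\.\w{3})/, "$2 $1$3");
}

console.log(moveStampToEnd("20060728t1503 My File part1of2.txt"));
// My File part1of2_20060728t1503.txt
console.log(moveStampToFront("My File part1of2 200607281503.txt"));
// 200607281503 My File part1of2.txt
```

In the second pattern the greedy (.*) backtracks to the last whitespace, which is why only the final word is treated as the timestamp.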
^[yY].
(yes|YES|Yes)
^\[^ \t:\]+:.
[^\x00-\x7F]
,\s*[\}\|\]]
^\d{5}([\-]\d{4})?$
^\d{3}-\d{2}-\d{4}$ (format NNN-NN-NNNN)
^(\d{3}[-]?){1,2}(\d{4})$
[ \w\,\'\.\#\-]{1,50}
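The anchored patterns above lend themselves to simple format checks. The sketch below wraps three of them; the property names are assumed labels based on the shapes the patterns accept (five digits with an optional four-digit suffix, NNN-NN-NNNN, and seven or ten digits with optional hyphens). These check format only, not whether a value actually exists.

```javascript
const formats = {
  postalCode: /^\d{5}([\-]\d{4})?$/,      // 12345 or 12345-6789
  dashedId: /^\d{3}-\d{2}-\d{4}$/,        // NNN-NN-NNNN
  phone: /^(\d{3}[-]?){1,2}(\d{4})$/,     // 555-1234, 555-555-1234, 5555551234
};

// Returns true when the value has the named format.
function hasFormat(name, value) {
  return formats[name].test(value);
}

console.log(hasFormat("postalCode", "12345-6789")); // true
console.log(hasFormat("postalCode", "123456789"));  // false
console.log(hasFormat("dashedId", "123-45-6789"));  // true
console.log(hasFormat("phone", "555-555-1234"));    // true
console.log(hasFormat("phone", "555-12345"));       // false
```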
var match, mypar, myparDefault = 0;
mypar = myparDefault;
match = document.URL.match(/[?&]mypar=(\d+)/);
if (match && match[1]) {
mypar = parseInt(match[1], 10) || myparDefault;
}
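The same idea can be packaged as a function that takes the URL as a string, which makes it usable outside a browser. This is a sketch, not the author's code: getIntParam is a hypothetical name, and it assumes the parameter name contains no regular expression metacharacters.

```javascript
// Reads an integer query parameter from a URL string.
// Assumes `name` is a plain word with no regex metacharacters.
function getIntParam(url, name, fallback) {
  // [?&] ensures the name starts right after "?" or "&".
  const pattern = new RegExp("[?&]" + name + "=(\\d+)");
  const match = url.match(pattern);
  return match ? parseInt(match[1], 10) : fallback;
}

console.log(getIntParam("http://example.com/?a=1&mypar=15", "mypar", 0)); // 15
console.log(getIntParam("http://example.com/?a=1", "mypar", 7));          // 7
```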
\b(?=\w{6}\b)\w{0,3}dog\w*
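The pattern above combines a look ahead with ordinary matching: the look ahead insists that the word is exactly six word characters long, and the rest of the pattern requires "dog" to start within the first four of them. A quick check against a made-up word list:

```javascript
const sixLetterDog = /\b(?=\w{6}\b)\w{0,3}dog\w*/g;

// Only the six-letter words containing "dog" should be returned.
const dogWords = "hotdog dogged bulldog dog underdog".match(sixLetterDog);

console.log(dogWords); // [ 'hotdog', 'dogged' ]
```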
^.*(?=.{8,30})(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[\W])(?!.*[\s]).*$
^.*(?=.{6,30})(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9\W]).*$
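Stacked look aheads let one pattern enforce several independent rules. The sketch below is a tightened variant of the first pattern above, not the original: the look aheads are anchored at the start of the string, and the length check ends in $ so that the 30-character limit is actually enforced. The names are hypothetical.

```javascript
// 8 to 30 characters, with at least one uppercase letter, one lowercase
// letter, one digit, and one non-word character, and no whitespace.
const strongPassword =
  /^(?=.{8,30}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*\W)(?!.*\s).*$/;

function isStrong(candidate) {
  return strongPassword.test(candidate);
}

console.log(isStrong("Passw0rd!"));  // true
console.log(isStrong("password"));   // false: no uppercase, digit, or symbol
console.log(isStrong("Pass w0rd!")); // false: contains whitespace
console.log(isStrong("Sh0rt!"));     // false: fewer than 8 characters
```

Each look ahead starts from the same position and consumes nothing, so the rules can be listed in any order.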
/^([\w\-]+)(\.[\w\-]+)*@([\w\-]+\.){1,5}([A-Za-z]){2,4}$/
/^(?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-\/=?^_`{|}~]+)+@((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}$/
Page Modified: (Hand noted: 2008-03-29 16:10:18Z) (Auto noted: 2013-02-18 15:55:28Z)