A Web Developer’s Introduction to Regular Expressions (REGEX)

January 8th, 2007 by Joe

According to Wikipedia, “a regular expression (abbreviated as regexp or regex, with plural forms regexps, regexes, or regexen) is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are used by many text editors and utilities to search and manipulate bodies of text based on certain patterns.”

When teamed up with JavaScript 1.2+ (present in NS 4+, Mozilla, and MSIE 4+), regular expressions can prove to be a VERY powerful form validation tool. ASP.NET includes a REGEX Validator, which makes it very easy to render client-side regex validation on input fields. (Obviously, you do not need to use ASP.NET to accomplish your field validation; this is just here for reference.)
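As a rough sketch of what regex-based field validation can look like in plain JavaScript: the helper name `isValidField` and the 5-digit ZIP rule below are assumptions made up for illustration, not something ASP.NET or any library provides.

```javascript
// Hypothetical helper: returns true when the value matches the pattern.
// Anchoring the pattern with ^ and $ makes it apply to the whole value.
function isValidField(value, pattern) {
  return pattern.test(value);
}

// Illustrative rule (an assumption): exactly five digits, e.g. a US ZIP code.
const zipPattern = /^\d{5}$/;

console.log(isValidField("84101", zipPattern)); // true
console.log(isValidField("8410a", zipPattern)); // false
```

In a browser, a check like this would typically be called from a form's submit handler before the data is sent, with the same rule repeated on the server.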

To get started in regex, here are a few items of interest.
Useful Rules:

Modifier/Assertion  Description                                                       Equivalent
^                   Matches at the beginning of the string
$                   Matches at the end of the string
\b                  Matches a word boundary (between \w and \W) when not inside []
\B                  Matches a non-word boundary
.                   Represents “any character” other than a line break (single-character wild-card)
g                   Do global pattern matching
i                   Do case-insensitive pattern matching
m, s, x             Not supported by JavaScript 1.2!
{m,n}               Must occur at least m times, but not more than n times
{m,}                Must occur at least m times
{m}                 Must occur exactly m times
*                   Must occur 0 or more times                                        {0,}
+                   Must occur 1 or more times                                        {1,}
?                   Must occur 0 or 1 times                                           {0,1}
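To see these rules in action, here is a small sketch using made-up sample strings (the strings and variable names are illustrative only):

```javascript
// Anchors: ^ and $ pin the match to the start and end of the string.
const startsWithCat = /^cat/.test("catalog");      // true
const endsWithCat = /cat$/.test("catalog");        // false

// \b only matches at a word boundary.
const wholeWord = /\bcat\b/.test("a cat sat");     // true
const insideWord = /\bcat\b/.test("concatenate");  // false

// g replaces every match instead of just the first; i ignores case.
const replaced = "Cat cat CAT".replace(/cat/gi, "dog"); // "dog dog dog"

// Quantifiers: {m,n} bounds the repeat count; ? means 0 or 1 times.
const twoToFour = /^a{2,4}$/.test("aaa");     // true
const tooMany = /^a{2,4}$/.test("aaaaa");     // false
const optionalU = /^colou?r$/.test("color");  // true

console.log(startsWithCat, endsWithCat, wholeWord, insideWord);
console.log(replaced, twoToFour, tooMany, optionalU);
```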

Useful Patterns:

Character  Matches                               Equivalent
\n         Line-feed (LF)
\r         Carriage return (CR)
\t         Tab
\v         Vertical tab
\f         Form feed (FF)
\d         A digit                               [0-9]
\D         A non-digit                           [^0-9]
\w         A word (alphanumeric) character       [a-zA-Z_0-9]
\W         A non-word (alphanumeric) character   [^a-zA-Z_0-9]
\s         A whitespace character                [ \t\v\n\r\f]
\S         A non-whitespace character            [^ \t\v\n\r\f]
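A few of these shorthand classes, sketched against made-up sample strings. Note that `\w` includes the underscore and that `\s` also matches an ordinary space:

```javascript
// \d+ with the g flag collects every run of digits.
const digits = "Order 66, aisle 7".match(/\d+/g);   // ["66", "7"]

// \s+ splits on any run of whitespace (here a tab and a space).
const words = "tab\there now".split(/\s+/);         // ["tab", "here", "now"]

// \W strips non-word characters; the underscore counts as a word character.
const wordCharsOnly = "a_b-c".replace(/\W/g, "");   // "a_bc"

// A single space is whitespace as far as \s is concerned.
const spaceIsWhitespace = /^\s$/.test(" ");         // true

console.log(digits, words, wordCharsOnly, spaceIsWhitespace);
```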

Useful Characters:

Escaped Meta-Character  Regular Character
\\                      \
\$                      $
\|                      |
\(                      (
\)                      )
\{                      {
\^                      ^
\*                      *
\+                      +
\?                      ?
\.                      .
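Escaping matters most when the text you want to match contains meta-characters. The sketch below shows the difference between an escaped and an unescaped dot, plus a hypothetical helper (`escapeRegex` is a name invented here, not a built-in) that backslash-escapes meta-characters so a string can be matched literally:

```javascript
// An unescaped . is a wild-card; an escaped \. matches only a literal dot.
const wildcardDot = /^4.99$/.test("4x99");   // true
const literalDot = /^4\.99$/.test("4x99");   // false

// Hypothetical helper: put a backslash in front of every meta-character.
// "$&" in the replacement string stands for the matched character.
function escapeRegex(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

const price = "$4.99 (sale)";
const exactPrice = new RegExp("^" + escapeRegex(price) + "$");

console.log(escapeRegex(price));               // \$4\.99 \(sale\)
console.log(exactPrice.test("$4.99 (sale)"));  // true
console.log(exactPrice.test("$4x99 (sale)"));  // false
```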

Useful Regular Expressions:

DataType               regex                                 Accepts              Rejects
whitespace             /^\s+$/                               spaces, tabs, etc.   0 a
single letter          /^[a-zA-Z]$/                          a A                  0 . -
alpha string           /^[a-zA-Z]+$/                         aBbB                 a1 23
single digit           /^\d$/                                1                    00 1a 12
single alphanumeric    /^([a-zA-Z]|\d)$/                     0 a                  00 1a 12
alphanumeric string    /^[a-zA-Z0-9]+$/                      a1b2                 2.3 1.a
integer                /^\d+$/                               0 23                 3.4
signed integer         /^(\+|-)?\d+$/                        -1 +2                -1.2 +3.4
floating-point number  /^((\d+(\.\d*)?)|((\d*\.)?\d+))$/     1.1 0.9 .8           -1.1
email address          /^.+\@.+\..+$/                        a@b.c                a@b
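One way to sanity-check patterns like these is to run each one against its own accept and reject samples. In this sketch the single-digit and alphanumeric-string patterns are written with `$` and `+`, and the signed-integer and floating-point patterns use `|` for alternation; that is an assumption about what the table intends, chosen because those forms agree with the listed samples. The object layout and the `checkAll` name are made up for illustration.

```javascript
// Each entry: [pattern, samples that should pass, samples that should fail].
const cases = {
  whitespace: [/^\s+$/, [" ", "\t", " \t "], ["0", "a"]],
  singleLetter: [/^[a-zA-Z]$/, ["a", "A"], ["0", ".", "-"]],
  alphaString: [/^[a-zA-Z]+$/, ["aBbB"], ["a1", "23"]],
  singleDigit: [/^\d$/, ["1"], ["00", "1a", "12"]],
  singleAlphanumeric: [/^([a-zA-Z]|\d)$/, ["0", "a"], ["00", "1a", "12"]],
  alphanumericString: [/^[a-zA-Z0-9]+$/, ["a1b2"], ["2.3", "1.a"]],
  integer: [/^\d+$/, ["0", "23"], ["3.4"]],
  signedInteger: [/^(\+|-)?\d+$/, ["-1", "+2"], ["-1.2", "+3.4"]],
  floatingPoint: [/^((\d+(\.\d*)?)|((\d*\.)?\d+))$/, ["1.1", "0.9", ".8"], ["-1.1"]],
  email: [/^.+\@.+\..+$/, ["a@b.c"], ["a@b"]],
};

// Returns a list of human-readable mismatches; an empty list means all agree.
function checkAll(table) {
  const failures = [];
  for (const name of Object.keys(table)) {
    const [pattern, accepts, rejects] = table[name];
    for (const s of accepts) {
      if (!pattern.test(s)) failures.push(name + " should accept " + JSON.stringify(s));
    }
    for (const s of rejects) {
      if (pattern.test(s)) failures.push(name + " should reject " + JSON.stringify(s));
    }
  }
  return failures;
}

const failures = checkAll(cases);
console.log(failures.length === 0 ? "all samples behave as listed" : failures);
```

Keep in mind that the email pattern is deliberately loose: it only checks for something, an @, something, a dot, and something more.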


About Greener Living thru Technology

JoeLevi.com is the personal web log of Joe Levi -- an ASP.NET Web Developer by trade and by hobby. Joe's love of technology isn't limited to the web; he's also interested in green and environmentally friendly technology and technological solutions. If it has to do with technology, improving the quality of life, geek humor, tech politics, self-defense, environmental stewardship, or anything related, you'll probably find it at www.JoeLevi.com.
