
Regular expression notes

# 12 Metacharacters
[ { ( ) \ ^ $ . | ? * +

? matches zero or one of the preceding element. For example, colou?r matches both “color” and “colour”.
* matches zero or more of the preceding element. For example, ab*c matches “ac”, “abc”, “abbc”, “abbbc”, and so on.
+ matches one or more of the preceding element. For example, ab+c matches “abc”, “abbc”, “abbbc”, and so on, but not “ac”.
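
The three quantifiers above can be checked side by side. This is a minimal sketch; `fullMatches` is a hypothetical helper (not from the notes) that anchors a pattern so the whole input has to match.

```javascript
// Hypothetical helper: returns the inputs that the pattern matches in full.
function fullMatches(pattern, inputs) {
  const anchored = new RegExp("^(?:" + pattern.source + ")$", pattern.flags);
  return inputs.filter((s) => anchored.test(s));
}

const optionalU = fullMatches(/colou?r/, ["color", "colour", "colouur"]);
const zeroOrMoreB = fullMatches(/ab*c/, ["ac", "abc", "abbbc", "adc"]);
const oneOrMoreB = fullMatches(/ab+c/, ["ac", "abc", "abbbc"]);

console.log(optionalU);   // [ 'color', 'colour' ]
console.log(zeroOrMoreB); // [ 'ac', 'abc', 'abbbc' ]
console.log(oneOrMoreB);  // [ 'abc', 'abbbc' ]
```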

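– Most metachars do not need to be escaped in []; the exceptions are ], \, ^ (when first) and - (between two characters)
\w=[a-zA-Z0-9_],\d=[0-9],\s=whitespace, roughly [ \t\r\n\f\v] (JavaScript also includes Unicode spaces)
^ negates the class when it is the first character in a []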
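
A short sketch of how these character-class rules behave in JavaScript. The sample strings are made up for illustration.

```javascript
// Inside [], the metacharacters . + * are plain literals.
const literalMeta = "a.b+c*d".match(/[.+*]/g);

// \w covers letters, digits and "_", so "-" and " " split the runs.
const wordRuns = "a_1 b-2".match(/\w+/g);

// ^ negates only when it comes first; elsewhere it is a literal "^".
const notAtoC = "abc123".match(/[^a-c]+/g);
const literalCaret = "a^b".match(/[a^]/g);

console.log(literalMeta);  // [ '.', '+', '*' ]
console.log(wordRuns);     // [ 'a_1', 'b', '2' ]
console.log(notAtoC);      // [ '123' ]
console.log(literalCaret); // [ 'a', '^' ]
```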

# Match “1.5” “1a5” “1%5”
/1.5/g

# Match 1 to 4 occurrences of “a”
/a{1,4}/g

# Match “a” 0+ times; also matches the empty string (e.g. at a space, where there is no “a”)!
/a{0,}/g
/a*/g

# Match “a” 1+ times
/a+/g
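
To see the empty matches that `a*` produces, compare it with `a+` on the same input. A small sketch; the sample string "baac" is an assumption chosen for illustration.

```javascript
// a* succeeds even where there is no "a", yielding empty-string matches.
const starMatches = "baac".match(/a*/g);

// a+ needs at least one "a", so only the real run is reported.
const plusMatches = "baac".match(/a+/g);

console.log(starMatches); // [ '', 'aa', '', '' ]
console.log(plusMatches); // [ 'aa' ]
```

The empty strings come from the positions before "b", before "c", and at the end of the input.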

# Match 0 to 1; if no “a” then it can match an empty string because of “?”
# The question mark indicates there is zero or one of the preceding element
/a?/g
colou?r matches both “color” and “colour”

# Match “”
//g
e.g. “ foo ” # Would match this whole string
# If you want to match only the “” and “”
//g
# If you want to match 0 to 1 times “”
//g

# Any number of “a” “b” or “c”; match “abccc”
/[abc]+/g
/[a-z0-9_]+/g # In “gegergerfs_5%”, matches up to and including the 5: “gegergerfs_5”

# Match word class 1+ times (see desc of \w above)
/\w+/g

# Match 1+ digits
/\d+/g

# Match white space
/\s+/g

# Match word class (letters or digits, “_”) and “-”
/[\w-]+/g

# Match hex color code; 3 hex digits (a-f or a digit), repeated 1 to 2 times (3 or 6 digits in total)
/^#([a-f\d]{3}){1,2}$/i
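
A quick check of the hex color pattern against a few inputs. The sample values are assumptions picked to cover the 3-digit, 6-digit and failing cases.

```javascript
const hexColor = /^#([a-f\d]{3}){1,2}$/i;

const colorSamples = ["#fff", "#A1B2C3", "#ffff", "fff", "#ggg"];
const validColors = colorSamples.filter((s) => hexColor.test(s));

console.log(validColors); // [ '#fff', '#A1B2C3' ]
```

"#ffff" fails because four digits is neither one nor two groups of three.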

# Match char not a-f; could be “%”
/[^a-f]/g

# Using ^
\W=[^\w],\D=[^\d],\S=[^\s]
.=[^\r\n]

# Match “c” followed by “ab” or “ba”: “cab” or “cba”
/c(ab|ba)/g

# Non-capturing group; the group's text is not stored, which saves memory
/(?:Java|ECMA)Script/g
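
The difference between the two kinds of group shows up in the match result. A minimal sketch without the `g` flag, so that `match` returns the groups as well as the full match.

```javascript
const withCapture = "JavaScript".match(/(Java|ECMA)Script/);
const withoutCapture = "JavaScript".match(/(?:Java|ECMA)Script/);

console.log(withCapture.length);    // 2: the full match plus group 1
console.log(withCapture[1]);        // "Java"
console.log(withoutCapture.length); // 1: the full match only
```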

# Match .05, -1.2, 3.4538, +1000
/^[-+]?[\d.]+$/ // Too lax
/^[-+]?\d*\.?\d+$/ // False negatives: 5.
/^[-+]?\d*\.?\d*$/ // False positives: ., +., + etc
/^[-+]?(\d*\.?\d+|\d+\.)$/ // Accurate, but is it worth it?
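
The trade-offs between the four number patterns can be made concrete by running them over the same inputs. This is a sketch; the sample list and the names `tooLax`, `noTrailingDot`, `tooLoose` and `accurate` are labels assumed here for illustration.

```javascript
const numberSamples = [".05", "-1.2", "+1000", "5.", ".", "1.2.3", "+"];

// Returns the samples that a given pattern accepts.
const accepted = (pattern) => numberSamples.filter((s) => pattern.test(s));

const tooLax = accepted(/^[-+]?[\d.]+$/);
const noTrailingDot = accepted(/^[-+]?\d*\.?\d+$/);
const tooLoose = accepted(/^[-+]?\d*\.?\d*$/);
const accurate = accepted(/^[-+]?(\d*\.?\d+|\d+\.)$/);

console.log(tooLax);        // [ '.05', '-1.2', '+1000', '5.', '.', '1.2.3' ]
console.log(noTrailingDot); // [ '.05', '-1.2', '+1000' ]
console.log(tooLoose);      // [ '.05', '-1.2', '+1000', '5.', '.', '+' ]
console.log(accurate);      // [ '.05', '-1.2', '+1000', '5.' ]
```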

# Match “a” at the beginning of the string, like “apple”
/^a/g

# Match at end of string like “strypa”
/a$/g

# Would only match “a”; shouldn’t be used, but shown as example
/^a$/g

# beginning/end of lines, with the “m” flag; “a\nbbbbbb” would match
/^a$/gm
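
The effect of the “m” flag is easiest to see on the two-line string from the comment above. A minimal sketch:

```javascript
const twoLines = "a\nbbbbbb";

// Without "m", ^ and $ only anchor to the start and end of the whole string.
const withoutM = twoLines.match(/^a$/g);

// With "m", they also anchor at each line break.
const withM = twoLines.match(/^a$/gm);

console.log(withoutM); // null
console.log(withM);    // [ 'a' ]
```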

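# \b=word boundary = between \w and \W (or at the start/end of the string next to a \w);
# in “$5” it matches between the “$” and the “5”, and again after the “5”
/\b/g
/\bfoo\b/g // Match the whole word “foo” in “foo bar”, but not in “foobar”

# \B=non-word boundary = between \w and \w or \W and \W
/\B/g // Match in between the “$” in “$$$” (and at the start and end of that string)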
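
Boundaries match positions rather than characters, so one way to visualize them is to insert a marker at every match. A small sketch; the "|" marker and the "foo bar foobar" sample are assumptions for illustration.

```javascript
// Mark every word boundary in "$5".
const wordBoundaries = "$5".replace(/\b/g, "|");

// Mark every non-word boundary in "$$$".
const nonWordBoundaries = "$$$".replace(/\B/g, "|");

// \bfoo\b only matches "foo" as a whole word.
const wholeWords = "foo bar foobar".match(/\bfoo\b/g);

console.log(wordBoundaries);    // "$|5|"
console.log(nonWordBoundaries); // "|$|$|$|"
console.log(wholeWords);        // [ 'foo' ]
```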

# Lookahead assertions; match “a” followed by “b”, but “b” not part of the match
/a(?=b)/g

# The match itself is empty (the position just before “ab”); the “b” is only captured in group 1.
# (?=a) = followed by “a”, where “a” can be any regex
/(?=a(b))/g

(?!a)=NOT followed by “a”
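
A brief sketch of the lookahead forms above. The sample strings are assumptions for illustration.

```javascript
// Positive lookahead: "a" only when followed by "b"; the "b" is not consumed.
const aBeforeB = "ab ac ab".match(/a(?=b)/g);

// Negative lookahead: replace "a" only when it is NOT followed by "b".
const aNotBeforeB = "ab ac".replace(/a(?!b)/g, "X");

// A pattern that is only a lookahead matches a position (an empty string),
// but groups inside the lookahead still capture.
const positionOnly = /(?=a(b))/.exec("cab");

console.log(aBeforeB);           // [ 'a', 'a' ]
console.log(aNotBeforeB);        // "ab Xc"
console.log(positionOnly[0]);    // ""
console.log(positionOnly.index); // 1
console.log(positionOnly[1]);    // "b"
```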

# Matching dates e.g. 2012-04-12, 1972-12-30
/^\d{4}-\d{2}-\d{2}$/g
/^\d{4}-(0\d|1[0-2])-([0-2]\d|3[01])$/ // Can it be improved? Still matches impossible dates: month 00, day 00, 31st Feb, and 29th Feb in any year
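
One possible answer to “can it be improved?” is to let the regex check only the shape and leave the calendar rules to `Date`. This is a sketch, not part of the original notes: `isRealDate` is a hypothetical helper, and the pattern below adds a capture group around the year, which the original does not have.

```javascript
// Same shape check as above, with the year captured as well (an assumption).
const isoDateShape = /^(\d{4})-(0\d|1[0-2])-([0-2]\d|3[01])$/;

// Hypothetical helper: true only if the text is a date that exists.
function isRealDate(text) {
  const m = isoDateShape.exec(text);
  if (!m) return false;
  const [year, month, day] = m.slice(1).map(Number);
  // Date rolls impossible dates over (30 Feb becomes 1 Mar), so compare back.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isRealDate("2012-02-29")); // true (leap year)
console.log(isRealDate("2013-02-29")); // false
console.log(isRealDate("2012-00-10")); // false
```

Note that `Date.UTC` maps years 0 to 99 onto 1900 to 1999, so this sketch rejects years below 0100.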

# Lookahead hacks
# Intersection
# A 6+ character password with at least
# one number, one letter and one symbol
/^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i
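
Each lookahead in the password pattern checks one requirement from the start of the string, and `.{6,}` then enforces the length. A sketch with made-up sample passwords:

```javascript
const passwordRule = /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i;

const passwordSamples = ["abc12!", "abcdef1", "a1!", "123456!", "pass_word1"];
const acceptedPasswords = passwordSamples.filter((p) => passwordRule.test(p));

console.log(acceptedPasswords); // [ 'abc12!', 'pass_word1' ]
```

"abcdef1" lacks a symbol, "a1!" is too short, and "123456!" lacks a letter. The underscore counts as a symbol because of `[\W_]`.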

# Subtraction
# Any number that’s NOT divisible by 50 (i.e. does not end in “00” or “50”)
/\b(?!\d*[50]0\b)\d+\b/
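
The subtraction idea can be cross-checked against plain arithmetic. This sketch assumes the pattern is meant to read `/\b(?!\d*[50]0\b)\d+\b/`, that is, a negative lookahead that rejects numbers ending in “00” or “50”; the sample numbers are made up.

```javascript
// Assumed form of the pattern: reject digit runs that end in "00" or "50".
const notDivisibleBy50 = /\b(?!\d*[50]0\b)\d+\b/g;

const numberText = "5 50 125 150 1505 200";
const byRegex = numberText.match(notDivisibleBy50);

// The same selection done arithmetically, for comparison.
const byArithmetic = numberText.split(" ").filter((n) => Number(n) % 50 !== 0);

console.log(byRegex);      // [ '5', '125', '1505' ]
console.log(byArithmetic); // [ '5', '125', '1505' ]
```

One caveat of this form: a lone "0" still matches, although 0 is divisible by 50.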

# Negation
# Anything that doesn’t contain “foo”
/^(?!.*foo).+$/
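
A short sketch of the negation pattern used as a filter. The sample lines are assumptions for illustration.

```javascript
const withoutFoo = /^(?!.*foo).+$/;

const candidateLines = ["bar", "a foo b", "food", "fo o", ""];
const keptLines = candidateLines.filter((line) => withoutFoo.test(line));

console.log(keptLines); // [ 'bar', 'fo o' ]
```

"food" is rejected because it contains "foo", and the empty string is rejected because `.+` needs at least one character.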

# Match strings (as a learning example); would match 'foo' but also 'foo" (quotes
# mismatched)
/('|").+?('|")/g

# Backreference, so the closing quote must equal the opening one; but an escaped
# quote (\") inside the string still ends the match early
/('|").+?\1/g
e.g. "He said 'boo'"

# Match "He 'said' \"hi\"!" <== "another string" (two strings, escaped quotes allowed)
/("|')(\\?.)*?\1/g
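
The difference between the backreference-only pattern and the escape-aware one shows on a string with escaped quotes. A sketch that assumes the patterns use straight quotes (`'` and `"`); `String.raw` keeps the backslashes in the sample text.

```javascript
// The sample text contains two quoted strings; the first has escaped quotes.
const quotedText = String.raw`"He 'said' \"hi\"!" <== "another string"`;

// Backreference only: stops at the first escaped quote.
const naiveStrings = quotedText.match(/('|").+?\1/g);

// (\\?.) consumes a backslash together with the character it escapes.
const escapedAware = quotedText.match(/("|')(\\?.)*?\1/g);

console.log(naiveStrings[0]); // "He 'said' \"
console.log(escapedAware);    // the two complete quoted strings
```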

## For performance
– Avoid greedy quantifiers
– Don’t forget anchors (^ and $)
– Be as specific as possible (e.g. use \w instead of .)
– Prefer non-capturing groups (?:)
– Minimize backtracking

http://www.youtube.com/watch?v=EkluES9Rvak (Lea Verou)
