Regular Expressions

Regular expressions are a concise and flexible notation for finding and replacing patterns of text.

You can use the following regular expressions in the Find and Replace dialog boxes to refine and expand your search.

Note   You must select the Use check box and choose Regular expressions in the Find and Replace dialog boxes before using any of the following as part of your search criteria.

The following expressions can be used to match characters or digits in your search string:

Expression Syntax Description Example
Any character . Matches any one character except a line break.  
Maximal zero or more * Matches zero or more occurrences of the preceding expression, matching as many characters as possible.  
Maximal one or more + Matches one or more occurrences of the preceding expression, matching as many characters as possible.  
Minimal zero or more @ Matches zero or more occurrences of the preceding expression, matching as few characters as possible.  
Minimal one or more # Matches one or more occurrences of the preceding expression, matching as few characters as possible.  
Repeat n times ^n Matches n occurrences of the preceding expression. [0-9]^4    matches any 4-digit sequence.
Set of characters [] Matches any one of the characters within the []. To specify a range of characters, list the starting and ending characters separated by a dash (-), as in [a-z].  
Character not in set [^...] Matches any character not in the set of characters following the ^.  
Beginning of line ^ Anchors the match to the beginning of a line.  
End of line $ Anchors the match to the end of a line.  
Beginning of word < Matches only when a word begins at this point in the text.  
End of word > Matches only when a word ends at this point in the text.  
Grouping () Groups a subexpression.  
Or | Matches the expression before or after the |. Mostly used within a group. (sponge|mud) bath    matches "sponge bath" and "mud bath."
Escape \ Matches the character that follows the backslash (\) literally. This allows you to find characters used in the regular expression notation, such as { and ^. \^   matches the ^ character.
Tagged expression {} Tags the text matched by the enclosed expression.  
nth tagged text \n In a Find or Replace expression, indicates the text matched by the nth tagged expression, where n is a number from 1 to 9. In a Replace expression, \0 inserts the entire matched text.  
Right-justified field \(w,n) In a Replace expression, right-justifies the nth tagged expression in a field at least w characters wide.  
Left-justified field \(-w,n) In a Replace expression, left-justifies the nth tagged expression in a field at least w characters wide.  
Prevent match ~X Prevents a match when X appears at this point in the expression. real~(ity)    matches the "real" in "realty" and "really," but not the "real" in "reality."
Alphanumeric character :a Matches the expression ([a-zA-Z0-9])  
Alphabetic character :c Matches the expression ([a-zA-Z])  
Decimal digit :d Matches the expression ([0-9])  
Hexadecimal number :h Matches the expression ([0-9a-fA-F]+), that is, a run of one or more hexadecimal digits.  
Identifier :i Matches the expression ([a-zA-Z_$][a-zA-Z0-9_$]*)  
Rational number :n Matches the expression (([0-9]+\.[0-9]*)|([0-9]*\.[0-9]+)|([0-9]+))  
Quoted string :q Matches the expression (("[^"]*")|('[^']*'))  
Alphabetic string :w Matches the expression ([a-zA-Z]+)  
Decimal integer :z Matches the expression ([0-9]+)  
Escape \e Unicode U+001B  
Bell \g Unicode U+0007  
Backspace \h Unicode U+0008  
Line break \n Matches a platform-independent line break. In a Replace expression, inserts a line break.  
Tab \t Matches a tab character, Unicode U+0009  
Unicode character \x#### or \u#### Matches the character with the given Unicode value, where #### is a hexadecimal number. You can specify a character outside the Basic Multilingual Plane (that is, a character that UTF-16 represents as a surrogate pair) either with its ISO 10646 code point or with the values of the two surrogates that make up the pair.  
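To make the notation above more concrete, here is a minimal sketch that restates several of its examples in Python's `re` module, which uses a more widespread regular-expression dialect. The correspondences are approximations chosen for illustration (for example, `~X` is rendered as a negative lookahead and `@` as a lazy `*?`); they are not how Visual Studio implements its search, and the helper names `swap_pair`, `wrap_numbers`, and `right_justify` are hypothetical.

```python
import re

# Illustrative Python equivalents of some Visual Studio expressions.
# The mapping is approximate; the two dialects differ in other details.

# VS [0-9]^4 (repeat n times)  ->  Python [0-9]{4}
four_digits = re.compile(r"[0-9]{4}")

# VS (sponge|mud) bath  ->  same syntax in Python
bath = re.compile(r"(sponge|mud) bath")

# VS real~(ity) (prevent match)  ->  Python negative lookahead
real_not_reality = re.compile(r"real(?!ity)")

# VS a.*b (maximal) and a.@b (minimal)  ->  Python a.*b and a.*?b
greedy = re.compile(r"a.*b")
lazy = re.compile(r"a.*?b")

# VS <the> (beginning and end of word)  ->  Python \bthe\b
whole_word = re.compile(r"\bthe\b")


def swap_pair(text: str) -> str:
    # VS Find {:w}, {:w} with Replace \2 \1  ->  Python groups and backreferences
    return re.sub(r"([a-zA-Z]+), ([a-zA-Z]+)", r"\2 \1", text)


def wrap_numbers(text: str) -> str:
    # VS \0 in a Replace expression (entire match)  ->  Python \g<0>
    return re.sub(r"[0-9]+", r"<\g<0>>", text)


def right_justify(text: str, width: int) -> str:
    # Rough analogue of VS Replace \(w,1): left-pad the first group to `width`
    return re.sub(r"([0-9]+)", lambda m: m.group(1).rjust(width), text)


print(four_digits.findall("year 2024, code 12"))                    # ['2024']
print(real_not_reality.search("reality"))                           # None
print(lazy.search("aXbYb").group(), greedy.search("aXbYb").group())  # aXb aXbYb
print(swap_pair("Smith, John"))                                     # John Smith
```

The main structural difference is that Visual Studio uses {} to tag text and () only to group, whereas in Python () both groups and captures.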

The following table lists the syntax for matching by standard Unicode character properties. Each two-letter abbreviation is the same as the one listed in the Unicode Character Database. These properties may be specified as part of a character set. For example, the expression [:Nd:Nl:No] matches any decimal digit, letter number, or other numeric character.

Expression Syntax Description Example
Uppercase letter :Lu Matches any one capital letter. :Luhe   matches "The" but not "the".
Lowercase letter :Ll Matches any one lower case letter. :Llhe   matches "the" but not "The".
Title case letter :Lt Matches single characters that combine an uppercase letter with a lowercase letter, such as ǋ (Nj) and ǲ (Dz).  
Decimal digit :Nd Matches decimal digits such as 0-9 and their full-width equivalents.  
Open punctuation :Ps Matches opening punctuation such as opening brackets and braces.  
Close punctuation :Pe Matches closing punctuation such as closing brackets and braces.  
Initial quote punctuation :Pi Matches opening quotation marks, such as “ and ‘.  
Final quote punctuation :Pf Matches closing quotation marks, such as ” and ’.  
Dash punctuation :Pd Matches dashes and hyphens, such as the hyphen-minus (-) and the en dash.  
Connector punctuation :Pc Matches the underscore or underline mark.  
Other punctuation :Po Matches commas (,), periods (.), ?, ", !, @, #, %, &, *, \, colons (:), semicolons (;), ', and /.  
Space separator :Zs Matches space characters, such as the ordinary space and the no-break space.  
Line separator :Zl Matches the Unicode character U+2028  
Paragraph separator :Zp Matches the Unicode character U+2029  
Math symbol :Sm Matches +, =, ~, |, <, and >  
Currency symbol :Sc Matches $ and other currency symbols.  
Other control :Cc Matches control characters, such as the carriage return and line feed characters that end a line.  
Other format :Cf Matches formatting control characters, such as the bidirectional control characters.  
Surrogate :Cs Matches one half of a surrogate pair.  
Other private-use :Co Matches any character from the private-use area.  
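Python's standard `re` module has no syntax for Unicode property classes such as :Lu, so the following minimal sketch checks characters with `unicodedata.category`, which reports the same two-letter general category codes used in the table above. The helpers `in_categories` and `find_then_literal` are hypothetical; the second one only emulates patterns of the form "property followed by literal text", such as :Luhe, and is not a general matcher.

```python
import unicodedata
from typing import List, Tuple


def in_categories(ch: str, *cats: str) -> bool:
    """True if ch's Unicode general category is one of cats, e.g. 'Lu' or 'Nd'."""
    return unicodedata.category(ch) in cats


def find_then_literal(text: str, cats: Tuple[str, ...], literal: str) -> List[str]:
    """Emulate a pattern like :Luhe: one character from cats, then `literal`."""
    hits = []
    for i, ch in enumerate(text):
        if in_categories(ch, *cats) and text.startswith(literal, i + 1):
            hits.append(text[i:i + 1 + len(literal)])
    return hits


text = "The cat saw the dog"
print(find_then_literal(text, ("Lu",), "he"))  # like :Luhe -> ['The']
print(find_then_literal(text, ("Ll",), "he"))  # like :Llhe -> ['the']

# Like the set [:Nd:Nl:No]: keeps '7', full-width '７', Roman numeral 'Ⅻ',
# and '½', but drops the letter 'a'.
number_like = [c for c in "7\uff17\u216b\u00bda" if in_categories(c, "Nd", "Nl", "No")]
print(number_like)
```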

In addition to the standard Unicode character properties, you can specify the following properties. Like the standard properties, they may be specified as part of a character set.

Expression Syntax Description Example
Alpha :Al Matches any one alphabetic character. :Alhe   matches words such as "The", "then", and "reached".
Numeric :Nu Matches any one number or digit  
Punctuation :Pu Matches any one punctuation mark, such as ?, @, and '.  
White space :Wh Matches all types of white space, including publishing and ideographic spaces.  
Bidi :Bi Matches characters from right-to-left scripts such as Arabic and Hebrew.  
Hangul :Ha Matches Korean Hangul and combining Jamos.  
Hiragana :Hi Matches Hiragana characters.  
Katakana :Ka Matches Katakana characters.  
Ideographic/Han/Kanji :Id Matches ideographic characters, such as Han and Kanji.
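These additional properties are not Unicode general categories, so a single category lookup cannot test them. The minimal sketch below approximates several of them with Python built-ins (`str.isalpha`, `str.isspace`, `unicodedata.bidirectional`, `unicodedata.name`) and the main Unicode block ranges. The predicates are assumptions about roughly what each property covers, not Visual Studio's actual definitions; for instance, the Hiragana and Katakana checks ignore the extension blocks. All function names are hypothetical.

```python
import unicodedata
from typing import Callable, List


def is_alpha(ch: str) -> bool:      # roughly :Al
    return ch.isalpha()


def is_white(ch: str) -> bool:      # roughly :Wh (includes ideographic space U+3000)
    return ch.isspace()


def is_bidi(ch: str) -> bool:       # roughly :Bi (right-to-left letters)
    return unicodedata.bidirectional(ch) in ("R", "AL")


def is_hiragana(ch: str) -> bool:   # roughly :Hi (main Hiragana block only)
    return "\u3040" <= ch <= "\u309f"


def is_katakana(ch: str) -> bool:   # roughly :Ka (main Katakana block only)
    return "\u30a0" <= ch <= "\u30ff"


def is_hangul(ch: str) -> bool:     # roughly :Ha (syllables plus conjoining Jamo)
    return "\uac00" <= ch <= "\ud7a3" or "\u1100" <= ch <= "\u11ff"


def is_ideograph(ch: str) -> bool:  # roughly :Id (CJK unified ideographs)
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def find_then_literal(text: str, pred: Callable[[str], bool], literal: str) -> List[str]:
    """Emulate a pattern like :Alhe: one character satisfying pred, then `literal`."""
    hits = []
    for i, ch in enumerate(text):
        if pred(ch) and text.startswith(literal, i + 1):
            hits.append(text[i:i + 1 + len(literal)])
    return hits


print(find_then_literal("The then reached", is_alpha, "he"))  # ['The', 'the', 'che']
```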