From d796c9dd933ab96ec83b9a634feedd5d32e1ba3f Mon Sep 17 00:00:00 2001
From: Timothy Pearson
Date: Tue, 8 Nov 2011 12:31:36 -0600
Subject: Test conversion to TQt3 from Qt3 8c6fc1f8e35fd264dd01c582ca5e7549b32ab731

---
 doc/html/qregexp.html | 1037 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1037 insertions(+)
 create mode 100644 doc/html/qregexp.html

diff --git a/doc/html/qregexp.html b/doc/html/qregexp.html
new file mode 100644
index 000000000..5c0cb5784
--- /dev/null
+++ b/doc/html/qregexp.html
@@ -0,0 +1,1037 @@

TQRegExp Class Reference

+ +

The TQRegExp class provides pattern matching using regular expressions.

All the functions in this class are reentrant when TQt is built with thread support.

#include <qregexp.h> +

List of all member functions. +

Public Members

enum CaretMode { CaretAtZero, CaretAtOffset, CaretWontMatch }
TQRegExp ()
TQRegExp ( const TQString & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE )
TQRegExp ( const TQRegExp & rx )
~TQRegExp ()
TQRegExp & operator= ( const TQRegExp & rx )
bool operator== ( const TQRegExp & rx ) const
bool operator!= ( const TQRegExp & rx ) const
bool isEmpty () const
bool isValid () const
TQString pattern () const
void setPattern ( const TQString & pattern )
bool caseSensitive () const
void setCaseSensitive ( bool sensitive )
bool wildcard () const
void setWildcard ( bool wildcard )
bool minimal () const
void setMinimal ( bool minimal )
bool exactMatch ( const TQString & str ) const
int match ( const TQString & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const (obsolete)
int search ( const TQString & str, int offset = 0, CaretMode caretMode = CaretAtZero ) const
int searchRev ( const TQString & str, int offset = -1, CaretMode caretMode = CaretAtZero ) const
int matchedLength () const
int numCaptures () const
TQStringList capturedTexts ()
TQString cap ( int nth = 0 )
int pos ( int nth = 0 )
TQString errorString ()

Static Public Members

TQString escape ( const TQString & str )

Detailed Description

The TQRegExp class provides pattern matching using regular expressions.

+ + + + +

Regular expressions, or "regexps", provide a way to find patterns within text. This is useful in many contexts, for example:

Validation	A regexp can be used to check whether a piece of text meets some criteria, e.g. is an integer or contains no whitespace.
Searching	Regexps provide a much more powerful means of searching text than simple string matching does. For example we can create a regexp which says "find one of the words 'mail', 'letter' or 'correspondence' but not any of the words 'email', 'mailman', 'mailer', 'letterbox' etc."
Search and Replace	A regexp can be used to replace a pattern with a piece of text, for example replace all occurrences of '&' with '&amp;' except where the '&' is already followed by 'amp;'.
String Splitting	A regexp can be used to identify where a string should be split into its component fields, e.g. splitting tab-delimited strings.

We present a very brief introduction to regexps, a description of +TQt's regexp language, some code examples, and finally the function +documentation itself. TQRegExp is modeled on Perl's regexp +language, and also fully supports Unicode. TQRegExp can also be +used in the weaker 'wildcard' (globbing) mode which works in a +similar way to command shells. A good text on regexps is Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools by Jeffrey E. Friedl, ISBN 1565922573. +

Experienced regexp users may prefer to skip the introduction and +go directly to the relevant information. +

In case of multi-threaded programming, note that TQRegExp depends on +TQThreadStorage internally. For that reason, TQRegExp should only be +used with threads started with TQThread, i.e. not with threads +started with platform-specific APIs. +

Introduction + +
Characters and Abbreviations for Sets of Characters + +
Sets of Characters + +
Quantifiers + +
Capturing Text + +
Assertions + +
Wildcard Matching (globbing) + +
Notes for Perl Users + +
Code Examples + +

+ + +

Introduction +

Regexps are built up from expressions, quantifiers, and assertions. The simplest form of expression is simply a character, e.g. x or 5. An expression can also be a set of characters. For example, [ABCD] will match an A or a B or a C or a D. As a shorthand we could write this as [A-D]. If we want to match any of the capital letters in the English alphabet we can write [A-Z]. A quantifier tells the regexp engine how many occurrences of the expression we want, e.g. x{1,1} means match an x which occurs at least once and at most once. We'll look at assertions and more complex expressions later.

Note that in general regexps cannot be used to check for balanced brackets or tags. For example if you want to match an opening html <b> and its closing </b> you can only use a regexp if you know that these tags are not nested; the html fragment, <b>bold <b>bolder</b></b> will not match as expected. If you know the maximum level of nesting it is possible to create a regexp that will match correctly, but for an unknown level of nesting, regexps will fail.

We'll start by writing a regexp to match integers in the range 0 to 99. We will require at least one digit so we will start with [0-9]{1,1} which means match a digit exactly once. This regexp alone will match integers in the range 0 to 9. To match one or two digits we can increase the maximum number of occurrences so the regexp becomes [0-9]{1,2} meaning match a digit at least once and at most twice. However, this regexp as it stands will not match correctly: it will match one or two digits anywhere within a string. To ensure that we match against the whole string we must use the anchor assertions. We need ^ (caret) which, when it is the first character in the regexp, means that the regexp must match from the beginning of the string. And we also need $ (dollar) which, when it is the last character in the regexp, means that the regexp must match until the end of the string. So now our regexp is ^[0-9]{1,2}$. Note that assertions, such as ^ and $, do not match any characters.

If you've seen regexps elsewhere they may have looked different from the ones above. This is because some sets of characters and some quantifiers are so common that they have special symbols to represent them. [0-9] can be replaced with the symbol \d. The quantifier to match exactly one occurrence, {1,1}, can be replaced with the expression itself. This means that x{1,1} is exactly the same as x alone. So our 0 to 99 matcher could be written ^\d{1,2}$. Another way of writing it would be ^\d\d{0,1}$, i.e. from the start of the string match a digit followed by zero or one digits. In practice most people would write it ^\d\d?$. The ? is a shorthand for the quantifier {0,1}, i.e. a minimum of no occurrences and a maximum of one occurrence. This is used to make an expression optional. The regexp ^\d\d?$ means "from the beginning of the string match one digit followed by zero or one digits and then the end of the string".
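The 0 to 99 matcher can be tried out with a minimal sketch in standard C++. This assumes std::regex (ECMAScript grammar) as a stand-in for TQRegExp so that the snippet builds without TQt; the helper name isZeroTo99 is hypothetical and not part of any TQt API.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if text is an integer written with one or two
// digits (0 to 99). std::regex is an assumed stand-in for TQRegExp.
bool isZeroTo99(const std::string &text)
{
    // ^ and $ anchor the match to the whole string; \d\d? is one digit
    // followed by an optional second digit.
    static const std::regex rx("^\\d\\d?$");
    return std::regex_search(text, rx);
}
```

Calling isZeroTo99("6") or isZeroTo99("42") should succeed, while "123" and "-6" are rejected because the anchors force the whole string to consist of one or two digits.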

Our second example is matching the words 'mail', 'letter' or 'correspondence' but without matching 'email', 'mailman', 'mailer', 'letterbox' etc. We'll start by just matching 'mail'. In full the regexp is m{1,1}a{1,1}i{1,1}l{1,1}, but since each expression itself is automatically quantified by {1,1} we can simply write this as mail; an 'm' followed by an 'a' followed by an 'i' followed by an 'l'. The symbol '|' (bar) is used for alternation, so our regexp now becomes mail|letter|correspondence which means match 'mail' or 'letter' or 'correspondence'. Whilst this regexp will find the words we want it will also find words we don't want such as 'email'. We will start by putting our regexp in parentheses, (mail|letter|correspondence). Parentheses have two effects: firstly they group expressions together and secondly they identify parts of the regexp that we wish to capture. Our regexp still matches any of the three words but now they are grouped together as a unit. This is useful for building up more complex regexps. It is also useful because it allows us to examine which of the words actually matched. We need to use another assertion, this time \b "word boundary": \b(mail|letter|correspondence)\b. This regexp means "match a word boundary followed by the expression in parentheses followed by another word boundary". The \b assertion matches a position rather than a character. A word boundary is the position between a word character and a non-word character (such as a space or a newline), or the beginning or end of the string.
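The whole-word matcher built up above can be sketched as follows. As an assumption, std::regex (ECMAScript grammar) stands in for TQRegExp, and findCorrespondenceWord is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns whichever of 'mail', 'letter' or
// 'correspondence' occurs as a whole word in text, or an empty string.
// std::regex is an assumed stand-in for TQRegExp.
std::string findCorrespondenceWord(const std::string &text)
{
    static const std::regex rx("\\b(mail|letter|correspondence)\\b");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m[1].str();   // text captured by the parentheses
    return std::string();
}
```

With this sketch, "Please write the letter" yields "letter", while "I sent you an email" yields an empty string because there is no word boundary between the 'e' and the 'm'.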

For our third example we want to replace ampersands with the HTML entity '&amp;'. The regexp to match is simple: &, i.e. match one ampersand. Unfortunately this will mess up our text if some of the ampersands have already been turned into HTML entities. So what we really want to say is replace an ampersand providing it is not followed by 'amp;'. For this we need the negative lookahead assertion and our regexp becomes: &(?!amp;). The negative lookahead assertion is introduced with '(?!' and finishes at the ')'. It means that the text it contains, 'amp;' in our example, must not follow the expression that precedes it.
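The effect of the negative lookahead can be checked with a short sketch. It assumes std::regex (ECMAScript grammar) in place of TQRegExp and std::regex_replace in place of TQString::replace(); escapeAmpersands is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every '&' that is not already the start of
// "&amp;" with "&amp;". std::regex is an assumed stand-in for TQRegExp.
std::string escapeAmpersands(const std::string &text)
{
    // (?!amp;) is zero-width: it inspects the following text without
    // consuming it, so only the ampersand itself is replaced.
    static const std::regex rx("&(?!amp;)");
    return std::regex_replace(text, rx, "&amp;");
}
```

Because the lookahead consumes nothing, running the function twice on the same text gives the same result as running it once.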

Regexps provide a rich language that can be used in a variety of +ways. For example suppose we want to count all the occurrences of +'Eric' and 'Eirik' in a string. Two valid regexps to match these +are \b(Eric|Eirik)\b and \bEi?ri[ck]\b. We need +the word boundary '\b' so we don't get 'Ericsson' etc. The second +regexp actually matches more than we want, 'Eric', 'Erik', 'Eiric' +and 'Eirik'. +

We will implement some of the examples above in the code examples section.

Characters and Abbreviations for Sets of Characters +

Element	Meaning
c	Any character represents itself unless it has a special regexp meaning. Thus c matches the character c.
\c	A character that follows a backslash matches the character itself except where mentioned below. For example if you wished to match a literal caret at the beginning of a string you would write \^.
\a	This matches the ASCII bell character (BEL, 0x07).
\f	This matches the ASCII form feed character (FF, 0x0C).
\n	This matches the ASCII line feed character (LF, 0x0A, Unix newline).
\r	This matches the ASCII carriage return character (CR, 0x0D).
\t	This matches the ASCII horizontal tab character (HT, 0x09).
\v	This matches the ASCII vertical tab character (VT, 0x0B).
\xhhhh	This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF).
\0ooo (i.e., \zero ooo)	This matches the ASCII/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).
. (dot)	This matches any character (including newline).
\d	This matches a digit (TQChar::isDigit()).
\D	This matches a non-digit.
\s	This matches a whitespace (TQChar::isSpace()).
\S	This matches a non-whitespace.
\w	This matches a word character (TQChar::isLetterOrNumber() or '_').
\W	This matches a non-word character.
\n (n a number)	The n-th backreference, e.g. \1, \2, etc.

Note that the C++ compiler transforms backslashes in strings so to include a \ in a regexp you will need to enter it twice, i.e. \\. +

Sets of Characters +

Square brackets are used to match any character in the set of +characters contained within the square brackets. All the character +set abbreviations described above can be used within square +brackets. Apart from the character set abbreviations and the +following two exceptions no characters have special meanings in +square brackets. +

^ +	The caret negates the character set if it occurs as the +first character, i.e. immediately after the opening square +bracket. For example, [abc] matches 'a' or 'b' or 'c', +but [^abc] matches anything except 'a' or 'b' or +'c'. +
- +	The dash is used to indicate a range of characters, for +example [W-Z] matches 'W' or 'X' or 'Y' or 'Z'. +

Using the predefined character set abbreviations is more portable +than using character ranges across platforms and languages. For +example, [0-9] matches a digit in Western alphabets but +\d matches a digit in any alphabet. +

Note that in most regexp literature sets of characters are called +"character classes". +

Quantifiers +

By default an expression is automatically quantified by +{1,1}, i.e. it should occur exactly once. In the following +list E stands for any expression. An expression is a +character or an abbreviation for a set of characters or a set of +characters in square brackets or any parenthesised expression. +

E?	Matches zero or one occurrence of E. This quantifier means "the previous expression is optional" since it will match whether or not the expression occurs in the string. It is the same as E{0,1}. For example dents? will match 'dent' and 'dents'.
E+	Matches one or more occurrences of E. This is the same as E{1,MAXINT}. For example, 0+ will match '0', '00', '000', etc.
E*	Matches zero or more occurrences of E. This is the same as E{0,MAXINT}. The * quantifier is often used by mistake. Since it matches zero or more occurrences it will match no occurrences at all. For example if we want to match strings that end in whitespace and use the regexp \s*$ we would get a match on every string. This is because we have said find zero or more whitespace followed by the end of string, so even strings that don't end in whitespace will match. The regexp we want in this case is \s+$ to match strings that have at least one whitespace at the end.
E{n}	Matches exactly n occurrences of the expression. This is the same as repeating the expression n times. For example, x{5} is the same as xxxxx. It is also the same as E{n,n}, e.g. x{5,5}.
E{n,}	Matches at least n occurrences of the expression. This is the same as E{n,MAXINT}.
E{,m}	Matches at most m occurrences of the expression. This is the same as E{0,m}.
E{n,m}	Matches at least n occurrences of the expression and at most m occurrences of the expression.

(MAXINT is implementation dependent but will not be smaller than +1024.) +

If we wish to apply a quantifier to more than just the preceding +character we can use parentheses to group characters together in +an expression. For example, tag+ matches a 't' followed by +an 'a' followed by at least one 'g', whereas (tag)+ matches +at least one occurrence of 'tag'. +

Note that quantifiers are "greedy". They will match as much text as they can. For example, 0+ will match as many zeros as it can from the first zero it finds, e.g. the '000' in '2.0005'. Quantifiers can be made non-greedy, see setMinimal().
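The difference between greedy and non-greedy matching can be illustrated with a small sketch. It assumes std::regex (ECMAScript grammar) as a stand-in for TQRegExp. Note the difference in spelling: std::regex marks an individual quantifier as non-greedy with a trailing ?, whereas setMinimal() switches the whole pattern. The helper firstRunOfZeros and its minimal parameter are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of zeros found in text.
// With minimal == true the quantifier is non-greedy and stops as early
// as it can. std::regex is an assumed stand-in for TQRegExp.
std::string firstRunOfZeros(const std::string &text, bool minimal = false)
{
    // "0+" is greedy; "0+?" is the per-quantifier non-greedy form.
    const std::regex rx(minimal ? "0+?" : "0+");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m[0].str();
    return std::string();
}
```

For the input '2.0005' the greedy form returns all three zeros, and the non-greedy form returns only the first one.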

Capturing Text +

Parentheses allow us to group elements together so that we can +quantify and capture them. For example if we have the expression +mail|letter|correspondence that matches a string we know +that one of the words matched but not which one. Using +parentheses allows us to "capture" whatever is matched within +their bounds, so if we used (mail|letter|correspondence) +and matched this regexp against the string "I sent you some email" +we can use the cap() or capturedTexts() functions to extract the +matched characters, in this case 'mail'. +

We can use captured text within the regexp itself. To refer to the +captured text we use backreferences which are indexed from 1, +the same as for cap(). For example we could search for duplicate +words in a string using \b(\w+)\W+\1\b which means match a +word boundary followed by one or more word characters followed by +one or more non-word characters followed by the same text as the +first parenthesised expression followed by a word boundary. +
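The duplicate-word regexp can be exercised with the following sketch, which assumes std::regex (ECMAScript grammar) as a stand-in for TQRegExp; firstRepeatedWord is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first word that is immediately repeated
// in text, or an empty string if there is none.
// std::regex is an assumed stand-in for TQRegExp.
std::string firstRepeatedWord(const std::string &text)
{
    // \1 must match exactly the text captured by the first parentheses.
    static const std::regex rx("\\b(\\w+)\\W+\\1\\b");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m[1].str();
    return std::string();
}
```

The trailing \b matters: without it, 'the theory' would count as a repeated 'the'.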

If we want to use parentheses purely for grouping and not for +capturing we can use the non-capturing syntax, e.g. +(?:green|blue). Non-capturing parentheses begin '(?:' and +end ')'. In this example we match either 'green' or 'blue' but we +do not capture the match so we only know whether or not we matched +but not which color we actually found. Using non-capturing +parentheses is more efficient than using capturing parentheses +since the regexp engine has to do less book-keeping. +

Both capturing and non-capturing parentheses may be nested. +

Assertions +

Assertions make some statement about the text at the point where +they occur in the regexp but they do not match any characters. In +the following list E stands for any expression. +

^	The caret signifies the beginning of the string. If you wish to match a literal `^` you must escape it by writing \^. For example, ^#include will only match strings which begin with the characters '#include'. (When the caret is the first character of a character set it has a special meaning, see Sets of Characters.)
$	The dollar signifies the end of the string. For example \d\s*$ will match strings which end with a digit optionally followed by whitespace. If you wish to match a literal `$` you must escape it by writing \$.
\b	A word boundary. For example the regexp \bOK\b means match immediately after a word boundary (e.g. start of string or whitespace) the letter 'O' then the letter 'K' immediately before another word boundary (e.g. end of string or whitespace). But note that the assertion does not actually match any whitespace so if we write (\bOK\b) and we have a match it will only contain 'OK' even if the string is "Its OK now".
\B	A non-word boundary. This assertion is true wherever \b is false. For example if we searched for \Bon\B in "Left on" the match would fail (space and end of string aren't non-word boundaries), but it would match in "tonne".
(?=E)	Positive lookahead. This assertion is true if the expression matches at this point in the regexp. For example, const(?=\s+char) matches 'const' whenever it is followed by 'char'; in 'static const char' the matched text is only 'const'. (Compare with const\s+char, for which the matched text is 'const char'.)
(?!E)	Negative lookahead. This assertion is true if the expression does not match at this point in the regexp. For example, const(?!\s+char) matches 'const' except when it is followed by 'char'.
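The lookahead assertions listed above can be compared side by side with a short sketch. It assumes std::regex (ECMAScript grammar), which uses the same (?=E) and (?!E) syntax, as a stand-in for TQRegExp; matchedText is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text matched by pattern in text, or an
// empty string if there is no match.
// std::regex is an assumed stand-in for TQRegExp.
std::string matchedText(const std::string &pattern, const std::string &text)
{
    const std::regex rx(pattern);
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m[0].str();
    return std::string();
}
```

For the input 'static const char', the pattern const(?=\s+char) reports 'const' while const\s+char reports 'const char', which shows that a lookahead checks the following text without including it in the match.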

Wildcard Matching (globbing) +

Most command shells such as bash or cmd.exe support "file +globbing", the ability to identify a group of files by using +wildcards. The setWildcard() function is used to switch between +regexp and wildcard mode. Wildcard matching is much simpler than +full regexps and has only four features: +

c +	Any character represents itself apart from those mentioned +below. Thus c matches the character c. +
? +	This matches any single character. It is the same as +. in full regexps. +
* +	This matches zero or more of any characters. It is the +same as .* in full regexps. +
[...] +	Sets of characters can be represented in square brackets, +similar to full regexps. Within the character class, like +outside, backslash has no special meaning. +

For example if we are in wildcard mode and have strings which +contain filenames we could identify HTML files with *.html. +This will match zero or more characters followed by a dot followed +by 'h', 't', 'm' and 'l'. +
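The relationship between the four wildcard features and full regexps can be sketched as a translation function. This is not how TQRegExp implements wildcard mode; it is an illustration that assumes std::regex (ECMAScript grammar) as the regexp engine, and the names wildcardToRegex and wildcardMatch are hypothetical. It also assumes that any [...] set in the pattern is properly closed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translates a wildcard pattern into regexp syntax.
std::string wildcardToRegex(const std::string &glob)
{
    // Characters that are special in a regexp but ordinary in a wildcard.
    static const std::string special = "$()+.\\]^{|}";
    std::string rx;
    bool inSet = false;              // true while copying a [...] set
    for (char c : glob) {
        if (inSet) {
            if (c == '\\')
                rx += '\\';          // backslash is ordinary in wildcard mode
            rx += c;
            if (c == ']')
                inSet = false;
        } else if (c == '?') {
            rx += '.';               // any single character
        } else if (c == '*') {
            rx += ".*";              // zero or more of any characters
        } else if (c == '[') {
            rx += c;
            inSet = true;
        } else {
            if (special.find(c) != std::string::npos)
                rx += '\\';
            rx += c;
        }
    }
    return rx;
}

// Hypothetical helper: true if the whole of name matches the wildcard.
bool wildcardMatch(const std::string &glob, const std::string &name)
{
    return std::regex_match(name, std::regex(wildcardToRegex(glob)));
}
```

Under these assumptions wildcardToRegex("*.html") produces .*\.html, so wildcardMatch("*.html", "index.html") succeeds and "default.htm" does not.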

Notes for Perl Users +

Most of the character class abbreviations supported by Perl are supported by TQRegExp, see Characters and Abbreviations for Sets of Characters.

In TQRegExp, apart from within character classes, ^ always +signifies the start of the string, so carets must always be +escaped unless used for that purpose. In Perl the meaning of caret +varies automagically depending on where it occurs so escaping it +is rarely necessary. The same applies to $ which in +TQRegExp always signifies the end of the string. +

TQRegExp's quantifiers are the same as Perl's greedy quantifiers. Non-greedy matching cannot be applied to individual quantifiers, but can be applied to all the quantifiers in the pattern. For example, to match the Perl regexp ro+?m requires:

+    TQRegExp rx( "ro+m" );
+    rx.setMinimal( TRUE );
+

+ +

The equivalent of Perl's /i option is +setCaseSensitive(FALSE). +

Perl's /g option can be emulated using a loop. +

In TQRegExp . matches any character, therefore all TQRegExp +regexps have the equivalent of Perl's /s option. TQRegExp +does not have an equivalent to Perl's /m option, but this +can be emulated in various ways for example by splitting the input +into lines or by looping with a regexp that searches for newlines. +

Because TQRegExp is string oriented there are no \A, \Z or \z +assertions. The \G assertion is not supported but can be emulated +in a loop. +

Perl's $& is cap(0) or capturedTexts()[0]. There are no TQRegExp +equivalents for $`, $' or $+. Perl's capturing variables, $1, $2, +... correspond to cap(1) or capturedTexts()[1], cap(2) or +capturedTexts()[2], etc. +

To substitute a pattern use TQString::replace(). +

Perl's extended /x syntax is not supported, nor are +directives, e.g. (?i), or regexp comments, e.g. (?#comment). On +the other hand, C++'s rules for literal strings can be used to +achieve the same: +

+    TQRegExp mark( "\\b" // word boundary
+                  "[Mm]ark" // the word we want to match
+                );
+

+ +

Both zero-width positive and zero-width negative lookahead +assertions (?=pattern) and (?!pattern) are supported with the same +syntax as Perl. Perl's lookbehind assertions, "independent" +subexpressions and conditional expressions are not supported. +

Non-capturing parentheses are also supported, with the same +(?:pattern) syntax. +

See TQStringList::split() and TQStringList::join() for equivalents +to Perl's split and join functions. +

Note: because C++ transforms \'s they must be written twice in +code, e.g. \b must be written \\b. +

Code Examples +

+    TQRegExp rx( "^\\d\\d?$" );  // match integers 0 to 99
+    rx.search( "123" );         // returns -1 (no match)
+    rx.search( "-6" );          // returns -1 (no match)
+    rx.search( "6" );           // returns 0 (matched at position 0)
+

+ +

The third string matches '6'. This is a simple validation +regexp for integers in the range 0 to 99. +

+    TQRegExp rx( "^\\S+$" );     // match strings without whitespace
+    rx.search( "Hello world" ); // returns -1 (no match)
+    rx.search( "This_is-OK" );  // returns 0 (matched at position 0)
+

+ +

The second string matches 'This_is-OK'. We've used the +character set abbreviation '\S' (non-whitespace) and the anchors +to match strings which contain no whitespace. +

In the following example we match strings containing 'mail' or 'letter' or 'correspondence' but only match whole words, i.e. not 'email'.

+    TQRegExp rx( "\\b(mail|letter|correspondence)\\b" );
+    rx.search( "I sent you an email" );     // returns -1 (no match)
+    rx.search( "Please write the letter" ); // returns 17
+

+ +

The second string matches at the word 'letter' in "Please write the letter". The word 'letter' is also captured (because of the parentheses). We can see what text we've captured like this:

+    TQString captured = rx.cap( 1 ); // captured == "letter"
+

+ +

This will capture the text from the first set of capturing +parentheses (counting capturing left parentheses from left to +right). The parentheses are counted from 1 since cap( 0 ) is the +whole matched regexp (equivalent to '&' in most regexp engines). +

+    TQRegExp rx( "&(?!amp;)" );      // match ampersands but not &amp;
+    TQString line1 = "This & that";
+    line1.replace( rx, "&amp;" );
+    // line1 == "This &amp; that"
+    TQString line2 = "His &amp; hers & theirs";
+    line2.replace( rx, "&amp;" );
+    // line2 == "His &amp; hers &amp; theirs"
+

+ +

Here we've passed the TQRegExp to TQString's replace() function to +replace the matched text with new text. +

+    TQString str = "One Eric another Eirik, and an Ericsson."
+                    " How many Eiriks, Eric?";
+    TQRegExp rx( "\\b(Eric|Eirik)\\b" ); // match Eric or Eirik
+    int pos = 0;    // where we are in the string
+    int count = 0;  // how many Eric and Eirik's we've counted
+    while ( pos >= 0 ) {
+        pos = rx.search( str, pos );
+        if ( pos >= 0 ) {
+            pos++;      // move along in str
+            count++;    // count our Eric or Eirik
+        }
+    }
+

+ +

We've used the search() function to repeatedly match the regexp in the string. Note that instead of moving forward by one character at a time with pos++ we could have written pos += rx.matchedLength() to skip over the already matched string. The count will equal 3, matching 'Eric', 'Eirik' and the final 'Eric' in 'One Eric another Eirik, and an Ericsson. How many Eiriks, Eric?'; it doesn't match 'Ericsson' or 'Eiriks' because in those words the name is not followed by a word boundary.
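The same count can be reproduced with a sketch that resumes each search after the previous match, which is the effect of advancing by matchedLength(). It assumes std::regex (ECMAScript grammar) and std::sregex_iterator as stand-ins for TQRegExp and the search() loop; countNames is a hypothetical helper.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts whole-word occurrences of 'Eric' or 'Eirik'.
// std::regex is an assumed stand-in for TQRegExp.
int countNames(const std::string &text)
{
    static const std::regex rx("\\b(Eric|Eirik)\\b");
    // The iterator restarts each search just past the previous match.
    auto first = std::sregex_iterator(text.begin(), text.end(), rx);
    auto last = std::sregex_iterator();
    return static_cast<int>(std::distance(first, last));
}
```

For the example string used above, this sketch is expected to return 3.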

One common use of regexps is to split lines of delimited data into +their component fields. +

+    str = "Trolltech AS\twww.trolltech.com\tNorway";
+    TQString company, web, country;
+    rx.setPattern( "^([^\t]+)\t([^\t]+)\t([^\t]+)$" );
+    if ( rx.search( str ) != -1 ) {
+        company = rx.cap( 1 );
+        web = rx.cap( 2 );
+        country = rx.cap( 3 );
+    }
+

+ +

In this example our input lines have the format company name, web +address and country. Unfortunately the regexp is rather long and +not very versatile -- the code will break if we add any more +fields. A simpler and better solution is to look for the +separator, '\t' in this case, and take the surrounding text. The +TQStringList split() function can take a separator string or regexp +as an argument and split a string accordingly. +

+    TQStringList field = TQStringList::split( "\t", str );
+

+ +

Here field[0] is the company, field[1] the web address and so on. +

To imitate the matching of a shell we can use wildcard mode. +

+    TQRegExp rx( "*.html" );         // invalid regexp: * doesn't quantify anything
+    rx.setWildcard( TRUE );         // now it's a valid wildcard regexp
+    rx.exactMatch( "index.html" );  // returns TRUE
+    rx.exactMatch( "default.htm" ); // returns FALSE
+    rx.exactMatch( "readme.txt" );  // returns FALSE
+

+ +

Wildcard matching can be convenient because of its simplicity, but +any wildcard regexp can be defined using full regexps, e.g. +.*\.html$. Notice that we can't match both .html and .htm files with a wildcard unless we use *.htm* which will +also match 'test.html.bak'. A full regexp gives us the precision +we need, .*\.html?$. +

TQRegExp can match case insensitively using setCaseSensitive(), and +can use non-greedy matching, see setMinimal(). By default TQRegExp +uses full regexps but this can be changed with setWildcard(). +Searching can be forward with search() or backward with +searchRev(). Captured text can be accessed using capturedTexts() +which returns a string list of all captured strings, or using +cap() which returns the captured string for the given index. The +pos() function takes a match index and returns the position in the +string where the match was made (or -1 if there was no match). +

+ +

Member Type Documentation

TQRegExp::CaretMode

+ +

The CaretMode enum defines the different meanings of the caret +(^) in a regular expression. The possible values are: +

TQRegExp::CaretAtZero - +The caret corresponds to index 0 in the searched string. +
TQRegExp::CaretAtOffset - +The caret corresponds to the start offset of the search. +
TQRegExp::CaretWontMatch - +The caret never matches. +

Member Function Documentation

TQRegExp::TQRegExp () +

+Constructs an empty regexp. +

See also isValid() and errorString(). + +

TQRegExp::TQRegExp ( const TQString & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE ) +

+Constructs a regular expression object for the given pattern +string. The pattern must be given using wildcard notation if wildcard is TRUE (default is FALSE). The pattern is case +sensitive, unless caseSensitive is FALSE. Matching is greedy +(maximal), but can be changed by calling setMinimal(). +

See also setPattern(), setCaseSensitive(), setWildcard(), and setMinimal(). + +

TQRegExp::TQRegExp ( const TQRegExp & rx ) +

+Constructs a regular expression as a copy of rx. +

TQRegExp::~TQRegExp () +

+Destroys the regular expression and cleans up its internal data. + +

TQString TQRegExp::cap ( int nth = 0 ) +

+Returns the text captured by the nth subexpression. The entire +match has index 0 and the parenthesized subexpressions have +indices starting from 1 (excluding non-capturing parentheses). +

+    TQRegExp rxlen( "(\\d+)(?:\\s*)(cm|inch)" );
+    int pos = rxlen.search( "Length: 189cm" );
+    if ( pos > -1 ) {
+        TQString value = rxlen.cap( 1 ); // "189"
+        TQString unit = rxlen.cap( 2 );  // "cm"
+        // ...
+    }
+

+ +

The order of elements matched by cap() is as follows. The first +element, cap(0), is the entire matching string. Each subsequent +element corresponds to the next capturing open left parentheses. +Thus cap(1) is the text of the first capturing parentheses, cap(2) +is the text of the second, and so on. +

+Some patterns may lead to a number of matches which cannot be +determined in advance, for example: +

+    TQRegExp rx( "(\\d+)" );
+    str = "Offsets: 12 14 99 231 7";
+    TQStringList list;
+    pos = 0;
+    while ( pos >= 0 ) {
+        pos = rx.search( str, pos );
+        if ( pos > -1 ) {
+            list += rx.cap( 1 );
+            pos  += rx.matchedLength();
+        }
+    }
+    // list contains "12", "14", "99", "231", "7"
+

+ +

See also capturedTexts(), pos(), exactMatch(), search(), and searchRev(). + +

Examples: network/archivesearch/archivedialog.ui.h and regexptester/regexptester.cpp. +

TQStringList TQRegExp::capturedTexts () +

+Returns a list of the captured text strings. +

The first string in the list is the entire matched string. Each +subsequent list element contains a string that matched a +(capturing) subexpression of the regexp. +

For example: +

+        TQRegExp rx( "(\\d+)(\\s*)(cm|inch(es)?)" );
+        int pos = rx.search( "Length: 36 inches" );
+        TQStringList list = rx.capturedTexts();
+        // list is now ( "36 inches", "36", " ", "inches", "es" )
+

+ +

The above example also captures elements that may be present but +which we have no interest in. This problem can be solved by using +non-capturing parentheses: +

+        TQRegExp rx( "(\\d+)(?:\\s*)(cm|inch(?:es)?)" );
+        int pos = rx.search( "Length: 36 inches" );
+        TQStringList list = rx.capturedTexts();
+        // list is now ( "36 inches", "36", "inches" )
+

+ +

Note that if you want to iterate over the list, you should iterate +over a copy, e.g. +

+        TQStringList list = rx.capturedTexts();
+        TQStringList::Iterator it = list.begin();
+        while( it != list.end() ) {
+            myProcessing( *it );
+            ++it;
+        }
+

+ +

Some regexps can match an indeterminate number of times. For +example if the input string is "Offsets: 12 14 99 231 7" and the +regexp, rx, is (\d+)+, we would hope to get a list of +all the numbers matched. However, after calling +rx.search(str), capturedTexts() will return the list ( "12", +"12" ), i.e. the entire match was "12" and the first subexpression +matched was "12". The correct approach is to use cap() in a loop. +

The order of elements in the string list is as follows. The first +element is the entire matching string. Each subsequent element +corresponds to the next capturing open left parentheses. Thus +capturedTexts()[1] is the text of the first capturing parentheses, +capturedTexts()[2] is the text of the second and so on +(corresponding to $1, $2, etc., in some other regexp languages). +

See also cap(), pos(), exactMatch(), search(), and searchRev(). + +

bool TQRegExp::caseSensitive () const +

+Returns TRUE if case sensitivity is enabled; otherwise returns +FALSE. The default is TRUE. +

TQString TQRegExp::errorString () +

+Returns a text string that explains why a regexp pattern is +invalid, if that is the case; otherwise returns "no error occurred". +

TQString TQRegExp::escape ( const TQString & str ) `[static]` +

+Returns the string str with every regexp special character +escaped with a backslash. The special characters are $, (, ), *, +, +., ?, [, \, ], ^, {, | and }. +

Example: +

+     s1 = TQRegExp::escape( "bingo" );   // s1 == "bingo"
+     s2 = TQRegExp::escape( "f(x)" );    // s2 == "f\\(x\\)"
+

+ +

This function is useful to construct regexp patterns dynamically: +

+    TQRegExp rx( "(" + TQRegExp::escape(name) +
+                "|" + TQRegExp::escape(alias) + ")" );
+

+ + +
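To make the escaping rule concrete, here is a minimal standalone sketch in plain C++. It is not the TQt implementation: `escapeRegexp` is a hypothetical name, and the sketch assumes 8-bit std::string input rather than TQString. It simply prefixes each of the special characters listed above with a backslash.

```cpp
#include <string>

// Hypothetical helper, not part of TQt: put a backslash in front of each
// regexp special character and copy every other character unchanged.
std::string escapeRegexp(const std::string &input)
{
    static const std::string special = "$()*+.?[\\]^{|}";
    std::string escaped;
    escaped.reserve(input.size() * 2);
    for (char c : input) {
        if (special.find(c) != std::string::npos)
            escaped += '\\';
        escaped += c;
    }
    return escaped;
}
```

As with the escape() examples above, "bingo" passes through unchanged and "f(x)" gains a backslash before each parenthesis.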

bool TQRegExp::exactMatch ( const TQString & str ) const +

+Returns TRUE if str is matched exactly by this regular expression; otherwise returns FALSE. You can determine how much of +the string was matched by calling matchedLength(). +

For a given regexp string, R, exactMatch("R") is the equivalent of +search("^R$") since exactMatch() effectively encloses the regexp +in the start of string and end of string anchors, except that it +sets matchedLength() differently. +

For example, if the regular expression is blue, then +exactMatch() returns TRUE only for input blue. For inputs bluebell, blutak and lightblue, exactMatch() returns FALSE +and matchedLength() will return 4, 3 and 0 respectively. +
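The difference between matching the whole string and searching inside it can be illustrated with std::regex, used here only as a stand-in so that the sketch compiles without TQt. The helper names are hypothetical, and std::regex has no counterpart to the partial matchedLength() values described above, so the sketch covers only the TRUE/FALSE outcome and the search offset.

```cpp
#include <regex>
#include <string>

// Whole-string match: succeeds only if the pattern covers all of `text`.
bool matchesExactly(const std::string &pattern, const std::string &text)
{
    return std::regex_match(text, std::regex(pattern));
}

// Substring search: returns the offset of the first match, or -1 if none.
int firstMatchOffset(const std::string &pattern, const std::string &text)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)))
        return -1;
    return static_cast<int>(m.position(0));
}
```

With the pattern blue, matchesExactly() accepts only the input blue, while firstMatchOffset() still finds the pattern inside lightblue.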

Although const, this function sets matchedLength(), +capturedTexts() and pos(). +

See also search(), searchRev(), and TQRegExpValidator. + +

bool TQRegExp::isEmpty () const +

+Returns TRUE if the pattern string is empty; otherwise returns +FALSE. +

If you call exactMatch() with an empty pattern on an empty string +it will return TRUE; otherwise it returns FALSE since it operates +over the whole string. If you call search() with an empty pattern +on any string it will return the start offset (0 by default) +because the empty pattern matches the 'emptiness' at the start of +the string. In this case the length of the match returned by +matchedLength() will be 0. +

See also TQString::isEmpty(). + +

bool TQRegExp::isValid () const +

+Returns TRUE if the regular expression is valid; otherwise returns +FALSE. An invalid regular expression never matches. +

The pattern [a-z is an example of an invalid pattern, since +it lacks a closing square bracket. +

Note that the validity of a regexp may also depend on the setting +of the wildcard flag, for example *.html is a valid +wildcard regexp but an invalid full regexp. +

int TQRegExp::match ( const TQString & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const +

This function is obsolete. It is provided to keep old source working. We strongly advise against using it in new code. +

Attempts to match in str, starting from position index. +Returns the position of the match, or -1 if there was no match. +

The length of the match is stored in *len, unless len is a +null pointer. +

If indexIsStart is TRUE (the default), the position index in +the string will match the start of string anchor, ^, in the +regexp, if present. Otherwise, position 0 in str will match. +

Use search() and matchedLength() instead of this function. +

See also TQString::mid() and TQConstString. + +

Example: qmag/qmag.cpp. +

int TQRegExp::matchedLength () const +

+Returns the length of the last matched string, or -1 if there was +no match. +

See also exactMatch(), search(), and searchRev(). + +

Examples: network/archivesearch/archivedialog.ui.h and regexptester/regexptester.cpp. +

bool TQRegExp::minimal () const +

+Returns TRUE if minimal (non-greedy) matching is enabled; +otherwise returns FALSE. +

int TQRegExp::numCaptures () const +

+Returns the number of captures contained in the regular expression. + +

Example: regexptester/regexptester.cpp. +

bool TQRegExp::operator!= ( const TQRegExp & rx ) const +

+ +

Returns TRUE if this regular expression is not equal to rx; +otherwise returns FALSE. +

TQRegExp & TQRegExp::operator= ( const TQRegExp & rx ) +

+Copies the regular expression rx and returns a reference to the +copy. The case sensitivity, wildcard and minimal matching options +are also copied. + +

bool TQRegExp::operator== ( const TQRegExp & rx ) const +

+Returns TRUE if this regular expression is equal to rx; +otherwise returns FALSE. +

Two TQRegExp objects are equal if they have the same pattern +strings and the same settings for case sensitivity, wildcard and +minimal matching. + +

TQString TQRegExp::pattern () const +

+Returns the pattern string of the regular expression. The pattern +has either regular expression syntax or wildcard syntax, depending +on wildcard(). +

int TQRegExp::pos ( int nth = 0 ) +

+Returns the position of the nth captured text in the searched +string. If nth is 0 (the default), pos() returns the position +of the whole match. +

Example: +

+    TQRegExp rx( "/([a-z]+)/([a-z]+)" );
+    rx.search( "Output /dev/null" );    // returns 7 (position of /dev/null)
+    rx.pos( 0 );                        // returns 7 (position of /dev/null)
+    rx.pos( 1 );                        // returns 8 (position of dev)
+    rx.pos( 2 );                        // returns 12 (position of null)
+

+ +

For zero-length matches, pos() always returns -1. (For example, if +cap(4) would return an empty string, pos(4) returns -1.) This is +due to an implementation tradeoff. +

See also capturedTexts(), exactMatch(), search(), and searchRev(). + +

int TQRegExp::search ( const TQString & str, int offset = 0, CaretMode caretMode = CaretAtZero ) const +

+Attempts to find a match in str from position offset (0 by +default). If offset is -1, the search starts at the last +character; if -2, at the next to last character; etc. +

Returns the position of the first match, or -1 if there was no +match. +

The caretMode parameter specifies whether ^ +should match at index 0 (CaretAtZero), at offset (CaretAtOffset), or not at all (CaretWontMatch). +

You might prefer to use TQString::find(), TQString::contains() or +even TQStringList::grep(). To replace matches use +TQString::replace(). +

Example: +

+        TQString str = "offsets: 1.23 .50 71.00 6.00";
+        TQRegExp rx( "\\d*\\.\\d+" );    // primitive floating point matching
+        int count = 0;
+        int pos = 0;
+        while ( (pos = rx.search(str, pos)) != -1 ) {
+            count++;
+            pos += rx.matchedLength();
+        }
+        // search() returns 9, 14, 18 and finally 24, then -1; count ends up as 4
+

+ +

Although const, this function sets matchedLength(), +capturedTexts() and pos(). +

See also searchRev() and exactMatch(). + +

Examples: network/archivesearch/archivedialog.ui.h and regexptester/regexptester.cpp. +

int TQRegExp::searchRev ( const TQString & str, int offset = -1, CaretMode caretMode = CaretAtZero ) const +

+Attempts to find a match backwards in str from position offset. If offset is -1 (the default), the search starts at the +last character; if -2, at the next to last character; etc. +

Returns the position of the first match, or -1 if there was no +match. +

The caretMode parameter specifies whether ^ +should match at index 0 (CaretAtZero), at offset (CaretAtOffset), or not at all (CaretWontMatch). +

Although const, this function sets matchedLength(), +capturedTexts() and pos(). +

Warning: Searching backwards is much slower than searching +forwards. +

void TQRegExp::setCaseSensitive ( bool sensitive ) +

+Sets case sensitive matching to sensitive. +

If sensitive is TRUE, \.txt$ matches readme.txt but +not README.TXT. +

void TQRegExp::setMinimal ( bool minimal ) +

+Enables or disables minimal matching. If minimal is FALSE, +matching is greedy (maximal) which is the default. +

For example, suppose we have the input string "We must be +<b>bold</b>, very <b>bold</b>!" and the pattern +<b>.*</b>. With the default greedy (maximal) matching, +the match is "<b>bold</b>, very <b>bold</b>", which runs from the first <b> to the last </b>. +But with minimal (non-greedy) matching the +first match is the first "<b>bold</b>" and the second match is the second +"<b>bold</b>". In practice we might use the pattern +<b>[^<]+</b> instead, although this will still fail for +nested tags. +
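The effect of greedy versus minimal matching can be reproduced in a standalone sketch. It uses std::regex rather than TQRegExp (an assumption made so that it compiles without TQt), and it expresses minimal matching with the ECMAScript lazy quantifier `.*?`, whereas TQRegExp switches the whole pattern with setMinimal(). The helper names are hypothetical, and the input is assumed to be "We must be <b>bold</b>, very <b>bold</b>!".

```cpp
#include <regex>
#include <string>

// Return the text of the first match of `pattern` in `text`, or "" if none.
std::string firstMatch(const std::string &pattern, const std::string &text)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)))
        return "";
    return m.str(0);
}

// Greedy: ".*" runs on to the last "</b>" in the input.
std::string greedyBold(const std::string &text)
{
    return firstMatch("<b>.*</b>", text);
}

// Minimal: ".*?" stops at the first "</b>" it can reach.
std::string minimalBold(const std::string &text)
{
    return firstMatch("<b>.*?</b>", text);
}
```

On that input the greedy form returns everything from the first opening tag to the last closing tag, while the minimal form returns only the first tagged word.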

void TQRegExp::setPattern ( const TQString & pattern ) +

+Sets the pattern string to pattern. The case sensitivity, +wildcard and minimal matching options are not changed. +

void TQRegExp::setWildcard ( bool wildcard ) +

+Sets the wildcard mode for the regular expression. The default is +FALSE. +

Setting wildcard to TRUE enables simple shell-like wildcard +matching. (See wildcard matching + (globbing).) +

For example, r*.txt matches the string readme.txt in +wildcard mode, but does not match readme. +
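One way to picture wildcard mode is as a translation from glob syntax to an ordinary regexp. The sketch below is a simplified, standalone illustration, not how TQt implements it: `globToRegexp` and `globMatches` are hypothetical helpers, only `*` and `?` are treated as wildcards (bracketed character sets are deliberately left out), and std::regex is used so that the example compiles without TQt. The match is applied to the whole string, in the manner of exactMatch().

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translate a glob into an equivalent regexp string.
// Only '*' and '?' act as wildcards here; every other character is matched
// literally, so regexp metacharacters are backslash-escaped.
std::string globToRegexp(const std::string &glob)
{
    static const std::string special = "$()+.[\\]^{|}";
    std::string rx;
    for (char c : glob) {
        if (c == '*') {
            rx += ".*";          // any run of characters, possibly empty
        } else if (c == '?') {
            rx += '.';           // exactly one character
        } else {
            if (special.find(c) != std::string::npos)
                rx += '\\';
            rx += c;
        }
    }
    return rx;
}

// Whole-string glob match built on the translation above.
bool globMatches(const std::string &glob, const std::string &text)
{
    return std::regex_match(text, std::regex(globToRegexp(glob)));
}
```

Here r*.txt becomes r.*\.txt, which accepts readme.txt and rejects readme, in line with the example above.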

bool TQRegExp::wildcard () const +

+Returns TRUE if wildcard mode is enabled; otherwise returns FALSE. +The default is FALSE. +
