Diffstat (limited to 'tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook')
-rw-r--r-- | tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook | 1167 |
1 files changed, 217 insertions, 950 deletions
diff --git a/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook b/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook index c692da92cd5..5adc38a3f0c 100644 --- a/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook +++ b/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook @@ -1,491 +1,162 @@ <appendix id="regular-expressions"> <appendixinfo> <authorgroup> -<author ->&Anders.Lund; &Anders.Lund.mail;</author> -<othercredit role="translator" -><firstname ->Malcolm</firstname -><surname ->Hunter</surname -><affiliation -><address -><email ->malcolm.hunter@gmx.co.uk</email -></address -></affiliation -><contrib ->Conversion to British English</contrib -></othercredit -> +<author>&Anders.Lund; &Anders.Lund.mail;</author> +<othercredit role="translator"><firstname>Malcolm</firstname><surname>Hunter</surname><affiliation><address><email>malcolm.hunter@gmx.co.uk</email></address></affiliation><contrib>Conversion to British English</contrib></othercredit> </authorgroup> </appendixinfo> -<title ->Regular Expressions</title> +<title>Regular Expressions</title> -<synopsis ->This Appendix contains a brief but hopefully sufficient and -covering introduction to the world of <emphasis ->regular -expressions</emphasis ->. It documents regular expressions in the form +<synopsis>This Appendix contains a brief but hopefully sufficient and +covering introduction to the world of <emphasis>regular +expressions</emphasis>. 
It documents regular expressions in the form available within &kate;, which is not compatible with the regular expressions of perl, nor with those of for example -<command ->grep</command ->.</synopsis> +<command>grep</command>.</synopsis> <sect1> -<title ->Introduction</title> - -<para -><emphasis ->Regular Expressions</emphasis -> provides us with a way to describe some possible contents of a text string in a way understood by a small piece of software, so that it can investigate if a text matches, and also in the case of advanced applications with the means of saving pieces or the matching text.</para> - -<para ->An example: Say you want to search a text for paragraphs that starts with either of the names <quote ->Henrik</quote -> or <quote ->Pernille</quote -> followed by some form of the verb <quote ->say</quote ->.</para> - -<para ->With a normal search, you would start out searching for the first name, <quote ->Henrik</quote -> maybe followed by <quote ->sa</quote -> like this: <userinput ->Henrik sa</userinput ->, and while looking for matches, you would have to discard those not being the beginning of a paragraph, as well as those in which the word starting with the letters <quote ->sa</quote -> was not either <quote ->says</quote ->, <quote ->said</quote -> or so. And then of cause repeat all of that with the next name...</para> - -<para ->With Regular Expressions, that task could be accomplished with a single search, and with a larger degree of preciseness.</para> - -<para ->To achieve this, Regular Expressions defines rules for expressing in details a generalisation of a string to match. 
Our example, which we might literally express like this: <quote ->A line starting with either <quote ->Henrik</quote -> or <quote ->Pernille</quote -> (possibly following up to 4 blanks or tab characters) followed by a whitespace followed by <quote ->sa</quote -> and then either <quote ->ys</quote -> or <quote ->id</quote -></quote -> could be expressed with the following regular expression:</para -> <para -><userinput ->^[ \t]{0,4}(Henrik|Pernille) sa(ys|id)</userinput -></para> - -<para ->The above example demonstrates all four major concepts of modern Regular Expressions, namely:</para> +<title>Introduction</title> + +<para><emphasis>Regular Expressions</emphasis> provides us with a way to describe some possible contents of a text string in a way understood by a small piece of software, so that it can investigate if a text matches, and also in the case of advanced applications with the means of saving pieces or the matching text.</para> + +<para>An example: Say you want to search a text for paragraphs that starts with either of the names <quote>Henrik</quote> or <quote>Pernille</quote> followed by some form of the verb <quote>say</quote>.</para> + +<para>With a normal search, you would start out searching for the first name, <quote>Henrik</quote> maybe followed by <quote>sa</quote> like this: <userinput>Henrik sa</userinput>, and while looking for matches, you would have to discard those not being the beginning of a paragraph, as well as those in which the word starting with the letters <quote>sa</quote> was not either <quote>says</quote>, <quote>said</quote> or so. And then of cause repeat all of that with the next name...</para> + +<para>With Regular Expressions, that task could be accomplished with a single search, and with a larger degree of preciseness.</para> + +<para>To achieve this, Regular Expressions defines rules for expressing in details a generalisation of a string to match. 
Our example, which we might literally express like this: <quote>A line starting with either <quote>Henrik</quote> or <quote>Pernille</quote> (possibly following up to 4 blanks or tab characters) followed by a whitespace followed by <quote>sa</quote> and then either <quote>ys</quote> or <quote>id</quote></quote> could be expressed with the following regular expression:</para> <para><userinput>^[ \t]{0,4}(Henrik|Pernille) sa(ys|id)</userinput></para> + +<para>The above example demonstrates all four major concepts of modern Regular Expressions, namely:</para> <itemizedlist> -<listitem -><para ->Patterns</para -></listitem> -<listitem -><para ->Assertions</para -></listitem> -<listitem -><para ->Quantifiers</para -></listitem> -<listitem -><para ->Back references</para -></listitem> +<listitem><para>Patterns</para></listitem> +<listitem><para>Assertions</para></listitem> +<listitem><para>Quantifiers</para></listitem> +<listitem><para>Back references</para></listitem> </itemizedlist> -<para ->The caret (<literal ->^</literal ->) starting the expression is an assertion, being true only if the following matching string is at the start of a line.</para> - -<para ->The stings <literal ->[ \t]</literal -> and <literal ->(Henrik|Pernille) sa(ys|id)</literal -> are patterns. 
The first one is a <emphasis ->character class</emphasis -> that matches either a blank or a (horizontal) tab character; the other pattern contains first a subpattern matching either <literal ->Henrik</literal -> <emphasis ->or</emphasis -> <literal ->Pernille</literal ->, then a piece matching the exact string <literal -> sa</literal -> and finally a subpattern matching either <literal ->ys</literal -> <emphasis ->or</emphasis -> <literal ->id</literal -></para> - -<para ->The string <literal ->{0,4}</literal -> is a quantifier saying <quote ->anywhere from 0 up to 4 of the previous</quote ->.</para> - -<para ->Because regular expression software supporting the concept of <emphasis ->back references</emphasis -> saves the entire matching part of the string as well as sub-patterns enclosed in parentheses, given some means of access to those references, we could get our hands on either the whole match (when searching a text document in an editor with a regular expression, that is often marked as selected) or either the name found, or the last part of the verb.</para> - -<para ->All together, the expression will match where we wanted it to, and only there.</para> - -<para ->The following sections will describe in details how to construct and use patterns, character classes, assertions, quantifiers and back references, and the final section will give a few useful examples.</para> +<para>The caret (<literal>^</literal>) starting the expression is an assertion, being true only if the following matching string is at the start of a line.</para> + +<para>The stings <literal>[ \t]</literal> and <literal>(Henrik|Pernille) sa(ys|id)</literal> are patterns. 
The first one is a <emphasis>character class</emphasis> that matches either a blank or a (horizontal) tab character; the other pattern contains first a subpattern matching either <literal>Henrik</literal> <emphasis>or</emphasis> <literal>Pernille</literal>, then a piece matching the exact string <literal> sa</literal> and finally a subpattern matching either <literal>ys</literal> <emphasis>or</emphasis> <literal>id</literal></para> + +<para>The string <literal>{0,4}</literal> is a quantifier saying <quote>anywhere from 0 up to 4 of the previous</quote>.</para> + +<para>Because regular expression software supporting the concept of <emphasis>back references</emphasis> saves the entire matching part of the string as well as sub-patterns enclosed in parentheses, given some means of access to those references, we could get our hands on either the whole match (when searching a text document in an editor with a regular expression, that is often marked as selected) or either the name found, or the last part of the verb.</para> + +<para>All together, the expression will match where we wanted it to, and only there.</para> + +<para>The following sections will describe in details how to construct and use patterns, character classes, assertions, quantifiers and back references, and the final section will give a few useful examples.</para> </sect1> <sect1 id="regex-patterns"> -<title ->Patterns</title> +<title>Patterns</title> -<para ->Patterns consists of literal strings and character classes. Patterns may contain sub-patterns, which are patterns enclosed in parentheses.</para> +<para>Patterns consists of literal strings and character classes. Patterns may contain sub-patterns, which are patterns enclosed in parentheses.</para> <sect2> -<title ->Escaping characters</title> +<title>Escaping characters</title> -<para ->In patterns as well as in character classes, some characters have a special meaning. 
To literally match any of those characters, they must be marked or <emphasis ->escaped</emphasis -> to let the regular expression software know that it should interpret such characters in their literal meaning.</para> +<para>In patterns as well as in character classes, some characters have a special meaning. To literally match any of those characters, they must be marked or <emphasis>escaped</emphasis> to let the regular expression software know that it should interpret such characters in their literal meaning.</para> -<para ->This is done by prepending the character with a backslash (<literal ->\</literal ->).</para> +<para>This is done by prepending the character with a backslash (<literal>\</literal>).</para> -<para ->The regular expression software will silently ignore escaping a character that does not have any special meaning in the context, so escaping for example a <quote ->j</quote -> (<userinput ->\j</userinput ->) is safe. If you are in doubt whether a character could have a special meaning, you can therefore escape it safely.</para> +<para>The regular expression software will silently ignore escaping a character that does not have any special meaning in the context, so escaping for example a <quote>j</quote> (<userinput>\j</userinput>) is safe. If you are in doubt whether a character could have a special meaning, you can therefore escape it safely.</para> -<para ->Escaping of cause includes the backslash character it self, to literally match a such, you would write <userinput ->\\</userinput ->.</para> +<para>Escaping of cause includes the backslash character it self, to literally match a such, you would write <userinput>\\</userinput>.</para> </sect2> <sect2> -<title ->Character Classes and abbreviations</title> - -<para ->A <emphasis ->character class</emphasis -> is an expression that matches one of a defined set of characters. 
In Regular Expressions, character classes are defined by putting the legal characters for the class in square brackets, <literal ->[]</literal ->, or by using one of the abbreviated classes described below.</para> - -<para ->Simple character classes just contains one or more literal characters, for example <userinput ->[abc]</userinput -> (matching either of the letters <quote ->a</quote ->, <quote ->b</quote -> or <quote ->c</quote ->) or <userinput ->[0123456789]</userinput -> (matching any digit).</para> - -<para ->Because letters and digits have a logical order, you can abbreviate those by specifying ranges of them: <userinput ->[a-c]</userinput -> is equal to <userinput ->[abc]</userinput -> and <userinput ->[0-9]</userinput -> is equal to <userinput ->[0123456789]</userinput ->. Combining these constructs, for example <userinput ->[a-fynot1-38]</userinput -> is completely legal (the last one would match, of cause, either of <quote ->a</quote ->,<quote ->b</quote ->,<quote ->c</quote ->,<quote ->d</quote ->, <quote ->e</quote ->,<quote ->f</quote ->,<quote ->y</quote ->,<quote ->n</quote ->,<quote ->o</quote ->,<quote ->t</quote ->, <quote ->1</quote ->,<quote ->2</quote ->,<quote ->3</quote -> or <quote ->8</quote ->).</para> - -<para ->As capital letters are different characters from their non-capital equivalents, to create a caseless character class matching <quote ->a</quote -> or <quote ->b</quote ->, in any case, you need to write it <userinput ->[aAbB]</userinput ->.</para> - -<para ->It is of cause possible to create a <quote ->negative</quote -> class matching as <quote ->anything but</quote -> To do so put a caret (<literal ->^</literal ->) at the beginning of the class: </para> - -<para -><userinput ->[^abc]</userinput -> will match any character <emphasis ->but</emphasis -> <quote ->a</quote ->, <quote ->b</quote -> or <quote ->c</quote ->.</para> - -<para ->In addition to literal characters, some abbreviations are defined, making life still a bit 
easier: <variablelist> +<title>Character Classes and abbreviations</title> + +<para>A <emphasis>character class</emphasis> is an expression that matches one of a defined set of characters. In Regular Expressions, character classes are defined by putting the legal characters for the class in square brackets, <literal>[]</literal>, or by using one of the abbreviated classes described below.</para> + +<para>Simple character classes just contains one or more literal characters, for example <userinput>[abc]</userinput> (matching either of the letters <quote>a</quote>, <quote>b</quote> or <quote>c</quote>) or <userinput>[0123456789]</userinput> (matching any digit).</para> + +<para>Because letters and digits have a logical order, you can abbreviate those by specifying ranges of them: <userinput>[a-c]</userinput> is equal to <userinput>[abc]</userinput> and <userinput>[0-9]</userinput> is equal to <userinput>[0123456789]</userinput>. Combining these constructs, for example <userinput>[a-fynot1-38]</userinput> is completely legal (the last one would match, of cause, either of <quote>a</quote>,<quote>b</quote>,<quote>c</quote>,<quote>d</quote>, <quote>e</quote>,<quote>f</quote>,<quote>y</quote>,<quote>n</quote>,<quote>o</quote>,<quote>t</quote>, <quote>1</quote>,<quote>2</quote>,<quote>3</quote> or <quote>8</quote>).</para> + +<para>As capital letters are different characters from their non-capital equivalents, to create a caseless character class matching <quote>a</quote> or <quote>b</quote>, in any case, you need to write it <userinput>[aAbB]</userinput>.</para> + +<para>It is of cause possible to create a <quote>negative</quote> class matching as <quote>anything but</quote> To do so put a caret (<literal>^</literal>) at the beginning of the class: </para> + +<para><userinput>[^abc]</userinput> will match any character <emphasis>but</emphasis> <quote>a</quote>, <quote>b</quote> or <quote>c</quote>.</para> + +<para>In addition to literal characters, some abbreviations are 
defined, making life still a bit easier: <variablelist> <varlistentry> -<term -><userinput ->\a</userinput -></term> -<listitem -><para ->This matches the <acronym ->ASCII</acronym -> bell character (BEL, 0x07).</para -></listitem> +<term><userinput>\a</userinput></term> +<listitem><para>This matches the <acronym>ASCII</acronym> bell character (BEL, 0x07).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\f</userinput -></term> -<listitem -><para ->This matches the <acronym ->ASCII</acronym -> form feed character (FF, 0x0C).</para -></listitem> +<term><userinput>\f</userinput></term> +<listitem><para>This matches the <acronym>ASCII</acronym> form feed character (FF, 0x0C).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\n</userinput -></term> -<listitem -><para ->This matches the <acronym ->ASCII</acronym -> line feed character (LF, 0x0A, Unix newline).</para -></listitem> +<term><userinput>\n</userinput></term> +<listitem><para>This matches the <acronym>ASCII</acronym> line feed character (LF, 0x0A, Unix newline).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\r</userinput -></term> -<listitem -><para ->This matches the <acronym ->ASCII</acronym -> carriage return character (CR, 0x0D).</para -></listitem> +<term><userinput>\r</userinput></term> +<listitem><para>This matches the <acronym>ASCII</acronym> carriage return character (CR, 0x0D).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\t</userinput -></term> -<listitem -><para ->This matches the <acronym ->ASCII</acronym -> horizontal tab character (HT, 0x09).</para -></listitem> +<term><userinput>\t</userinput></term> +<listitem><para>This matches the <acronym>ASCII</acronym> horizontal tab character (HT, 0x09).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\v</userinput -></term> -<listitem -><para ->This matches the <acronym ->ASCII</acronym -> vertical tab character (VT, 0x0B).</para 
-></listitem> +<term><userinput>\v</userinput></term> +<listitem><para>This matches the <acronym>ASCII</acronym> vertical tab character (VT, 0x0B).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\xhhhh</userinput -></term> - -<listitem -><para ->This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (&ie;, \zero ooo) matches the <acronym ->ASCII</acronym ->/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).</para -></listitem> +<term><userinput>\xhhhh</userinput></term> + +<listitem><para>This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (&ie;, \zero ooo) matches the <acronym>ASCII</acronym>/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->.</userinput -> (dot)</term> -<listitem -><para ->This matches any character (including newline).</para -></listitem> +<term><userinput>.</userinput> (dot)</term> +<listitem><para>This matches any character (including newline).</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\d</userinput -></term> -<listitem -><para ->This matches a digit. Equal to <literal ->[0-9]</literal -></para -></listitem> +<term><userinput>\d</userinput></term> +<listitem><para>This matches a digit. Equal to <literal>[0-9]</literal></para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\D</userinput -></term> -<listitem -><para ->This matches a non-digit. Equal to <literal ->[^0-9]</literal -> or <literal ->[^\d]</literal -></para -></listitem> +<term><userinput>\D</userinput></term> +<listitem><para>This matches a non-digit. Equal to <literal>[^0-9]</literal> or <literal>[^\d]</literal></para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\s</userinput -></term> -<listitem -><para ->This matches a whitespace character. 
Practically equal to <literal ->[ \t\n\r]</literal -></para -></listitem> +<term><userinput>\s</userinput></term> +<listitem><para>This matches a whitespace character. Practically equal to <literal>[ \t\n\r]</literal></para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\S</userinput -></term> -<listitem -><para ->This matches a non-whitespace. Practically equal to <literal ->[^ \t\r\n]</literal ->, and equal to <literal ->[^\s]</literal -></para -></listitem> +<term><userinput>\S</userinput></term> +<listitem><para>This matches a non-whitespace. Practically equal to <literal>[^ \t\r\n]</literal>, and equal to <literal>[^\s]</literal></para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\w</userinput -></term> -<listitem -><para ->Matches any <quote ->word character</quote -> - in this case any letter or digit. Note that underscore (<literal ->_</literal ->) is not matched, as is the case with perl regular expressions. Equal to <literal ->[a-zA-Z0-9]</literal -></para -></listitem> +<term><userinput>\w</userinput></term> +<listitem><para>Matches any <quote>word character</quote> - in this case any letter or digit. Note that underscore (<literal>_</literal>) is not matched, as is the case with perl regular expressions. Equal to <literal>[a-zA-Z0-9]</literal></para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\W</userinput -></term> -<listitem -><para ->Matches any non-word character - anything but letters or numbers. Equal to <literal ->[^a-zA-Z0-9]</literal -> or <literal ->[^\w]</literal -></para -></listitem> +<term><userinput>\W</userinput></term> +<listitem><para>Matches any non-word character - anything but letters or numbers. 
Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></listitem> </varlistentry> @@ -493,69 +164,31 @@ expressions of perl, nor with those of for example </para> -<para ->The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write <userinput ->[\w \.]</userinput -></para -> +<para>The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write <userinput>[\w \.]</userinput></para> -<note -> <para ->The POSIX notation of classes, <userinput ->[:<class name>:]</userinput -> is currently not supported.</para -> </note> +<note> <para>The POSIX notation of classes, <userinput>[:<class name>:]</userinput> is currently not supported.</para> </note> <sect3> -<title ->Characters with special meanings inside character classes</title> +<title>Characters with special meanings inside character classes</title> -<para ->The following characters has a special meaning inside the <quote ->[]</quote -> character class construct, and must be escaped to be literally included in a class:</para> +<para>The following characters has a special meaning inside the <quote>[]</quote> character class construct, and must be escaped to be literally included in a class:</para> <variablelist> <varlistentry> -<term -><userinput ->]</userinput -></term> -<listitem -><para ->Ends the character class. Must be escaped unless it is the very first character in the class (may follow an unescaped caret)</para -></listitem> +<term><userinput>]</userinput></term> +<listitem><para>Ends the character class. Must be escaped unless it is the very first character in the class (may follow an unescaped caret)</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->^</userinput -> (caret)</term> -<listitem -><para ->Denotes a negative class, if it is the first character. 
Must be escaped to match literally if it is the first character in the class.</para -></listitem -> +<term><userinput>^</userinput> (caret)</term> +<listitem><para>Denotes a negative class, if it is the first character. Must be escaped to match literally if it is the first character in the class.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->-</userinput -> (dash)</term> -<listitem -><para ->Denotes a logical range. Must always be escaped within a character class.</para -></listitem> +<term><userinput>-</userinput> (dash)</term> +<listitem><para>Denotes a logical range. Must always be escaped within a character class.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\</userinput -> (backslash)</term> -<listitem -><para ->The escape character. Must always be escaped.</para -></listitem> +<term><userinput>\</userinput> (backslash)</term> +<listitem><para>The escape character. Must always be escaped.</para></listitem> </varlistentry> </variablelist> @@ -566,240 +199,110 @@ expressions of perl, nor with those of for example <sect2> -<title ->Alternatives: matching <quote ->one of</quote -></title> - -<para ->If you want to match one of a set of alternative patterns, you can separate those with <literal ->|</literal -> (vertical bar character).</para> - -<para ->For example to find either <quote ->John</quote -> or <quote ->Harry</quote -> you would use an expression <userinput ->John|Harry</userinput ->.</para> +<title>Alternatives: matching <quote>one of</quote></title> + +<para>If you want to match one of a set of alternative patterns, you can separate those with <literal>|</literal> (vertical bar character).</para> + +<para>For example to find either <quote>John</quote> or <quote>Harry</quote> you would use an expression <userinput>John|Harry</userinput>.</para> </sect2> <sect2> -<title ->Sub Patterns</title> +<title>Sub Patterns</title> -<para -><emphasis ->Sub patterns</emphasis -> are patterns enclosed in 
parentheses, and they have several uses in the world of regular expressions.</para> +<para><emphasis>Sub patterns</emphasis> are patterns enclosed in parentheses, and they have several uses in the world of regular expressions.</para> <sect3> -<title ->Specifying alternatives</title> - -<para ->You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character <quote ->|</quote -> (vertical bar).</para> - -<para ->For example to match either of the words <quote ->int</quote ->, <quote ->float</quote -> or <quote ->double</quote ->, you could use the pattern <userinput ->int|float|double</userinput ->. If you only want to find one if it is followed by some whitespace and then some letters, put the alternatives inside a subpattern: <userinput ->(int|float|double)\s+\w+</userinput ->.</para> +<title>Specifying alternatives</title> + +<para>You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character <quote>|</quote> (vertical bar).</para> + +<para>For example to match either of the words <quote>int</quote>, <quote>float</quote> or <quote>double</quote>, you could use the pattern <userinput>int|float|double</userinput>. If you only want to find one if it is followed by some whitespace and then some letters, put the alternatives inside a subpattern: <userinput>(int|float|double)\s+\w+</userinput>.</para> </sect3> <sect3> -<title ->Capturing matching text (back references)</title> - -<para ->If you want to use a back reference, use a sub pattern to have the desired part of the pattern remembered.</para> - -<para ->For example, it you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write <userinput ->(\w+),\s*\1</userinput ->. 
The sub pattern <literal ->\w+</literal -> would find a chunk of word characters, and the entire expression would match if those were followed by a comma, 0 or more whitespace and then an equal chunk of word characters. (The string <literal ->\1</literal -> references <emphasis ->the first sub pattern enclosed in parentheses</emphasis ->)</para> - -<!-- <para ->See also <link linkend="backreferences" ->Back references</link ->.</para -> --> +<title>Capturing matching text (back references)</title> + +<para>If you want to use a back reference, use a sub pattern to have the desired part of the pattern remembered.</para> + +<para>For example, it you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write <userinput>(\w+),\s*\1</userinput>. The sub pattern <literal>\w+</literal> would find a chunk of word characters, and the entire expression would match if those were followed by a comma, 0 or more whitespace and then an equal chunk of word characters. (The string <literal>\1</literal> references <emphasis>the first sub pattern enclosed in parentheses</emphasis>)</para> + +<!-- <para>See also <link linkend="backreferences">Back references</link>.</para> --> </sect3> <sect3 id="lookahead-assertions"> -<title ->Lookahead Assertions</title> - -<para ->A lookahead assertion is a sub pattern, starting with either <literal ->?=</literal -> or <literal ->?!</literal ->.</para> - -<para ->For example to match the literal string <quote ->Bill</quote -> but only if not followed by <quote -> Gates</quote ->, you could use this expression: <userinput ->Bill(?! Gates)</userinput ->. 
(This would find <quote ->Bill Clinton</quote -> as well as <quote ->Billy the kid</quote ->, but silently ignore the other matches.)</para> - -<para ->Sub patterns used for assertions are not captured.</para> - -<para ->See also <link linkend="assertions" ->Assertions</link -></para> +<title>Lookahead Assertions</title> + +<para>A lookahead assertion is a sub pattern, starting with either <literal>?=</literal> or <literal>?!</literal>.</para> + +<para>For example to match the literal string <quote>Bill</quote> but only if not followed by <quote> Gates</quote>, you could use this expression: <userinput>Bill(?! Gates)</userinput>. (This would find <quote>Bill Clinton</quote> as well as <quote>Billy the kid</quote>, but silently ignore the other matches.)</para> + +<para>Sub patterns used for assertions are not captured.</para> + +<para>See also <link linkend="assertions">Assertions</link></para> </sect3> </sect2> <sect2 id="special-characters-in-patterns"> -<title ->Characters with a special meaning inside patterns</title> +<title>Characters with a special meaning inside patterns</title> -<para ->The following characters have meaning inside a pattern, and must be escaped if you want to literally match them: <variablelist> +<para>The following characters have meaning inside a pattern, and must be escaped if you want to literally match them: <variablelist> <varlistentry> -<term -><userinput ->\</userinput -> (backslash)</term> -<listitem -><para ->The escape character.</para -></listitem> +<term><userinput>\</userinput> (backslash)</term> +<listitem><para>The escape character.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->^</userinput -> (caret)</term> -<listitem -><para ->Asserts the beginning of the string.</para -></listitem> +<term><userinput>^</userinput> (caret)</term> +<listitem><para>Asserts the beginning of the string.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->$</userinput -></term> -<listitem -><para 
->Asserts the end of string.</para -></listitem> +<term><userinput>$</userinput></term> +<listitem><para>Asserts the end of string.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->()</userinput -> (left and right parentheses)</term> -<listitem -><para ->Denotes sub patterns.</para -></listitem> +<term><userinput>()</userinput> (left and right parentheses)</term> +<listitem><para>Denotes sub patterns.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->{}</userinput -> (left and right curly braces)</term> -<listitem -><para ->Denotes numeric quantifiers.</para -></listitem> +<term><userinput>{}</userinput> (left and right curly braces)</term> +<listitem><para>Denotes numeric quantifiers.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->[]</userinput -> (left and right square brackets)</term> -<listitem -><para ->Denotes character classes.</para -></listitem> +<term><userinput>[]</userinput> (left and right square brackets)</term> +<listitem><para>Denotes character classes.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->|</userinput -> (vertical bar)</term> -<listitem -><para ->logical OR. Separates alternatives.</para -></listitem> +<term><userinput>|</userinput> (vertical bar)</term> +<listitem><para>logical OR. 
Separates alternatives.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->+</userinput -> (plus sign)</term> -<listitem -><para ->Quantifier, 1 or more.</para -></listitem> +<term><userinput>+</userinput> (plus sign)</term> +<listitem><para>Quantifier, 1 or more.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->*</userinput -> (asterisk)</term> -<listitem -><para ->Quantifier, 0 or more.</para -></listitem> +<term><userinput>*</userinput> (asterisk)</term> +<listitem><para>Quantifier, 0 or more.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->?</userinput -> (question mark)</term> -<listitem -><para ->An optional character. Can be interpreted as a quantifier, 0 or 1.</para -></listitem> +<term><userinput>?</userinput> (question mark)</term> +<listitem><para>An optional character. Can be interpreted as a quantifier, 0 or 1.</para></listitem> </varlistentry> </variablelist> @@ -811,125 +314,58 @@ expressions of perl, nor with those of for example </sect1> <sect1 id="quantifiers"> -<title ->Quantifiers</title> - -<para -><emphasis ->Quantifiers</emphasis -> allows a regular expression to match a specified number or range of numbers of either a character, character class or sub pattern.</para> - -<para ->Quantifiers are enclosed in curly brackets (<literal ->{</literal -> and <literal ->}</literal ->) and have the general form <literal ->{[minimum-occurrences][,[maximum-occurrences]]}</literal -> </para> - -<para ->The usage is best explained by example: <variablelist> +<title>Quantifiers</title> + +<para><emphasis>Quantifiers</emphasis> allows a regular expression to match a specified number or range of numbers of either a character, character class or sub pattern.</para> + +<para>Quantifiers are enclosed in curly brackets (<literal>{</literal> and <literal>}</literal>) and have the general form <literal>{[minimum-occurrences][,[maximum-occurrences]]}</literal> </para> + +<para>The usage is best 
explained by example: <variablelist> <varlistentry> -<term -><userinput ->{1}</userinput -></term> -<listitem -><para ->Exactly 1 occurrence</para -></listitem> +<term><userinput>{1}</userinput></term> +<listitem><para>Exactly 1 occurrence</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->{0,1}</userinput -></term> -<listitem -><para ->Zero or 1 occurrences</para -></listitem> +<term><userinput>{0,1}</userinput></term> +<listitem><para>Zero or 1 occurrences</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->{,1}</userinput -></term> -<listitem -><para ->The same, with less work;)</para -></listitem> +<term><userinput>{,1}</userinput></term> +<listitem><para>The same, with less work;)</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->{5,10}</userinput -></term> -<listitem -><para ->At least 5 but maximum 10 occurrences.</para -></listitem> +<term><userinput>{5,10}</userinput></term> +<listitem><para>At least 5 but maximum 10 occurrences.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->{5,}</userinput -></term> -<listitem -><para ->At least 5 occurrences, no maximum.</para -></listitem> +<term><userinput>{5,}</userinput></term> +<listitem><para>At least 5 occurrences, no maximum.</para></listitem> </varlistentry> </variablelist> </para> -<para ->Additionally, there are some abbreviations: <variablelist> +<para>Additionally, there are some abbreviations: <variablelist> <varlistentry> -<term -><userinput ->*</userinput -> (asterisk)</term> -<listitem -><para ->similar to <literal ->{0,}</literal ->, find any number of occurrences.</para -></listitem> +<term><userinput>*</userinput> (asterisk)</term> +<listitem><para>similar to <literal>{0,}</literal>, find any number of occurrences.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->+</userinput -> (plus sign)</term> -<listitem -><para ->similar to <literal ->{1,}</literal ->, at least 1 occurrence.</para 
-></listitem> +<term><userinput>+</userinput> (plus sign)</term> +<listitem><para>similar to <literal>{1,}</literal>, at least 1 occurrence.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->?</userinput -> (question mark)</term> -<listitem -><para ->similar to <literal ->{0,1}</literal ->, zero or 1 occurrence.</para -></listitem> +<term><userinput>?</userinput> (question mark)</term> +<listitem><para>similar to <literal>{0,1}</literal>, zero or 1 occurrence.</para></listitem> </varlistentry> </variablelist> @@ -938,98 +374,39 @@ expressions of perl, nor with those of for example <sect2> -<title ->Greed</title> +<title>Greed</title> -<para ->When using quantifiers with no maximum, regular expressions defaults to match as much of the searched string as possible, commonly known as <emphasis ->greedy</emphasis -> behaviour.</para> +<para>When using quantifiers with no maximum, regular expressions defaults to match as much of the searched string as possible, commonly known as <emphasis>greedy</emphasis> behaviour.</para> -<para ->Modern regular expression software provides the means of <quote ->turning off greediness</quote ->, though in a graphical environment it is up to the interface to provide you with access to this feature. For example a search dialogue providing a regular expression search could have a check box labelled <quote ->Minimal matching</quote -> as well as it ought to indicate if greediness is the default behaviour.</para> +<para>Modern regular expression software provides the means of <quote>turning off greediness</quote>, though in a graphical environment it is up to the interface to provide you with access to this feature. 
For example a search dialogue providing a regular expression search could have a check box labelled <quote>Minimal matching</quote> as well as it ought to indicate if greediness is the default behaviour.</para> </sect2> <sect2> -<title ->In context examples</title> +<title>In context examples</title> -<para ->Here are a few examples of using quantifiers</para> +<para>Here are a few examples of using quantifiers</para> <variablelist> <varlistentry> -<term -><userinput ->^\d{4,5}\s</userinput -></term> -<listitem -><para ->Matches the digits in <quote ->1234 go</quote -> and <quote ->12345 now</quote ->, but neither in <quote ->567 eleven</quote -> nor in <quote ->223459 somewhere</quote -></para -></listitem> +<term><userinput>^\d{4,5}\s</userinput></term> +<listitem><para>Matches the digits in <quote>1234 go</quote> and <quote>12345 now</quote>, but neither in <quote>567 eleven</quote> nor in <quote>223459 somewhere</quote></para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\s+</userinput -></term> -<listitem -><para ->Matches one or more whitespace characters</para -></listitem> +<term><userinput>\s+</userinput></term> +<listitem><para>Matches one or more whitespace characters</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->(bla){1,}</userinput -></term> -<listitem -><para ->Matches all of <quote ->blablabla</quote -> and the <quote ->bla</quote -> in <quote ->blackbird</quote -> or <quote ->tabla</quote -></para -></listitem> +<term><userinput>(bla){1,}</userinput></term> +<listitem><para>Matches all of <quote>blablabla</quote> and the <quote>bla</quote> in <quote>blackbird</quote> or <quote>tabla</quote></para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->/?></userinput -></term> -<listitem -><para ->Matches <quote ->/></quote -> in <quote -><closeditem/></quote -> as well as <quote ->></quote -> in <quote -><openitem></quote ->.</para -></listitem> +<term><userinput>/?></userinput></term> 
+<listitem><para>Matches <quote>/></quote> in <quote><closeditem/></quote> as well as <quote>></quote> in <quote><openitem></quote>.</para></listitem> </varlistentry> </variablelist> @@ -1039,164 +416,56 @@ expressions of perl, nor with those of for example </sect1> <sect1 id="assertions"> -<title ->Assertions</title> - -<para -><emphasis ->Assertions</emphasis -> allows a regular expression to match only under certain controlled conditions.</para> - -<para ->An assertion does not need a character to match, it rather investigates the surroundings of a possible match before acknowledging it. For example the <emphasis ->word boundary</emphasis -> assertion does not try to find a non word character opposite a word one at its position, instead it makes sure that there is not a word character. This means that the assertion can match where there is no character, &ie; at the ends of a searched string.</para> - -<para ->Some assertions actually does have a pattern to match, but the part of the string matching that will not be a part of the result of the match of the full expression.</para> - -<para ->Regular Expressions as documented here supports the following assertions: <variablelist> - -<varlistentry -> -<term -><userinput ->^</userinput -> (caret: beginning of string)</term -> -<listitem -><para ->Matches the beginning of the searched string.</para -> <para ->The expression <userinput ->^Peter</userinput -> will match at <quote ->Peter</quote -> in the string <quote ->Peter, hey!</quote -> but not in <quote ->Hey, Peter!</quote -> </para -> </listitem> +<title>Assertions</title> + +<para><emphasis>Assertions</emphasis> allows a regular expression to match only under certain controlled conditions.</para> + +<para>An assertion does not need a character to match, it rather investigates the surroundings of a possible match before acknowledging it. 
For example the <emphasis>word boundary</emphasis> assertion does not try to find a non word character opposite a word one at its position, instead it makes sure that there is not a word character. This means that the assertion can match where there is no character, &ie; at the ends of a searched string.</para> + +<para>Some assertions actually does have a pattern to match, but the part of the string matching that will not be a part of the result of the match of the full expression.</para> + +<para>Regular Expressions as documented here supports the following assertions: <variablelist> + +<varlistentry> +<term><userinput>^</userinput> (caret: beginning of string)</term> +<listitem><para>Matches the beginning of the searched string.</para> <para>The expression <userinput>^Peter</userinput> will match at <quote>Peter</quote> in the string <quote>Peter, hey!</quote> but not in <quote>Hey, Peter!</quote> </para> </listitem> </varlistentry> <varlistentry> -<term -><userinput ->$</userinput -> (end of string)</term> -<listitem -><para ->Matches the end of the searched string.</para> - -<para ->The expression <userinput ->you\?$</userinput -> will match at the last you in the string <quote ->You didn't do that, did you?</quote -> but nowhere in <quote ->You didn't do that, right?</quote -></para> +<term><userinput>$</userinput> (end of string)</term> +<listitem><para>Matches the end of the searched string.</para> + +<para>The expression <userinput>you\?$</userinput> will match at the last you in the string <quote>You didn't do that, did you?</quote> but nowhere in <quote>You didn't do that, right?</quote></para> </listitem> </varlistentry> <varlistentry> -<term -><userinput ->\b</userinput -> (word boundary)</term> -<listitem -><para ->Matches if there is a word character at one side and not a word character at the other.</para> -<para ->This is useful to find word ends, for example both ends to find a whole word. 
The expression <userinput ->\bin\b</userinput -> will match at the separate <quote ->in</quote -> in the string <quote ->He came in through the window</quote ->, but not at the <quote ->in</quote -> in <quote ->window</quote ->.</para -></listitem> +<term><userinput>\b</userinput> (word boundary)</term> +<listitem><para>Matches if there is a word character at one side and not a word character at the other.</para> +<para>This is useful to find word ends, for example both ends to find a whole word. The expression <userinput>\bin\b</userinput> will match at the separate <quote>in</quote> in the string <quote>He came in through the window</quote>, but not at the <quote>in</quote> in <quote>window</quote>.</para></listitem> </varlistentry> <varlistentry> -<term -><userinput ->\B</userinput -> (non word boundary)</term> -<listitem -><para ->Matches wherever <quote ->\b</quote -> does not.</para> -<para ->That means that it will match for example within words: The expression <userinput ->\Bin\B</userinput -> will match at in <quote ->window</quote -> but not in <quote ->integer</quote -> or <quote ->I'm in love</quote ->.</para> +<term><userinput>\B</userinput> (non word boundary)</term> +<listitem><para>Matches wherever <quote>\b</quote> does not.</para> +<para>That means that it will match for example within words: The expression <userinput>\Bin\B</userinput> will match at in <quote>window</quote> but not in <quote>integer</quote> or <quote>I'm in love</quote>.</para> </listitem> </varlistentry> <varlistentry> -<term -><userinput ->(?=PATTERN)</userinput -> (Positive lookahead)</term> -<listitem -><para ->A lookahead assertion looks at the part of the string following a possible match. 
The positive lookahead will prevent the string from matching if the text following the possible match does not match the <emphasis ->PATTERN</emphasis -> of the assertion, but the text matched by that will not be included in the result.</para> -<para ->The expression <userinput ->handy(?=\w)</userinput -> will match at <quote ->handy</quote -> in <quote ->handyman</quote -> but not in <quote ->That came in handy!</quote -></para> +<term><userinput>(?=PATTERN)</userinput> (Positive lookahead)</term> +<listitem><para>A lookahead assertion looks at the part of the string following a possible match. The positive lookahead will prevent the string from matching if the text following the possible match does not match the <emphasis>PATTERN</emphasis> of the assertion, but the text matched by that will not be included in the result.</para> +<para>The expression <userinput>handy(?=\w)</userinput> will match at <quote>handy</quote> in <quote>handyman</quote> but not in <quote>That came in handy!</quote></para> </listitem> </varlistentry> <varlistentry> -<term -><userinput ->(?!PATTERN)</userinput -> (Negative lookahead)</term> - -<listitem -><para ->The negative lookahead prevents a possible match to be acknowledged if the following part of the searched string does match its <emphasis ->PATTERN</emphasis ->.</para> -<para ->The expression <userinput ->const \w+\b(?!\s*&)</userinput -> will match at <quote ->const char</quote -> in the string <quote ->const char* foo</quote -> while it can not match <quote ->const QString</quote -> in <quote ->const QString& bar</quote -> because the <quote ->&</quote -> matches the negative lookahead assertion pattern.</para> +<term><userinput>(?!PATTERN)</userinput> (Negative lookahead)</term> + +<listitem><para>The negative lookahead prevents a possible match to be acknowledged if the following part of the searched string does match its <emphasis>PATTERN</emphasis>.</para> +<para>The expression <userinput>const \w+\b(?!\s*&)</userinput> 
will match at <quote>const char</quote> in the string <quote>const char* foo</quote> while it can not match <quote>const QString</quote> in <quote>const QString& bar</quote> because the <quote>&</quote> matches the negative lookahead assertion pattern.</para> </listitem> </varlistentry> @@ -1208,11 +477,9 @@ expressions of perl, nor with those of for example <!-- TODO sect1 id="backreferences"> -<title ->Back References</title> +<title>Back References</title> -<para -></para> +<para></para> </sect1 --> |
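The quantifier and assertion examples quoted in this diff can be checked with a small sketch. It assumes Python's `re` module as a stand-in engine. That is an assumption made for illustration only: the appendix itself notes that &kate;'s dialect is not fully compatible with other engines, so results inside the editor may differ. The helper name `found` is hypothetical and not part of the documented text.

```python
import re


def found(pattern, text):
    """Return the text of the first match of pattern in text, or None.

    Hypothetical helper for illustration; it uses Python's `re` engine,
    which is only assumed to agree with the dialect described above.
    """
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Quantifiers: 4 or 5 digits at the start of the string, then whitespace.
print(found(r"^\d{4,5}\s", "12345 now"))         # matched, with the trailing space
print(found(r"^\d{4,5}\s", "223459 somewhere"))  # None: six digits in a row

# Word boundary: only the separate word "in" is found, not the one in "window".
sentence = "He came in through the window"
print([m.start() for m in re.finditer(r"\bin\b", sentence)])

# Non word boundary: "in" must have word characters on both sides.
print(found(r"\Bin\B", "window"), found(r"\Bin\B", "integer"))

# Positive lookahead: the character after "handy" is checked but not captured.
print(found(r"handy(?=\w)", "handyman"))
print(found(r"handy(?=\w)", "That came in handy!"))

# Negative lookahead: reject the match when "&" follows the type name.
print(found(r"const \w+\b(?!\s*&)", "const char* foo"))
print(found(r"const \w+\b(?!\s*&)", "const QString& bar"))
print(found(r"Bill(?! Gates)", "Billy the kid"), found(r"Bill(?! Gates)", "Bill Gates"))
```

In both lookahead cases the returned text stops before the asserted part, which is the behaviour the appendix describes as sub patterns used for assertions not being captured.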