-rw-r--r-- docs/library/ure.rst 94
1 files changed, 74 insertions, 20 deletions
diff --git a/docs/library/ure.rst b/docs/library/ure.rst
index 6f9094028..d2615e37d 100644
--- a/docs/library/ure.rst
+++ b/docs/library/ure.rst
@@ -10,47 +10,101 @@ This module implements regular expression operations. Regular expression
syntax supported is a subset of CPython ``re`` module (and actually is
a subset of POSIX extended regular expressions).
-Supported operators are:
+Supported operators and special sequences are:
-``'.'``
+``.``
Match any character.
-``'[...]'``
+``[...]``
Match set of characters. Individual characters and ranges are supported,
including negated sets (e.g. ``[^a-c]``).
-``'^'``
+``^``
Match the start of the string.
-``'$'``
+``$``
Match the end of the string.
-``'?'``
- Match zero or one of the previous entity.
+``?``
+ Match zero or one of the previous sub-pattern.
-``'*'``
- Match zero or more of the previous entity.
+``*``
+ Match zero or more of the previous sub-pattern.
-``'+'``
- Match one or more of the previous entity.
+``+``
+ Match one or more of the previous sub-pattern.
-``'??'``
+``??``
+ Non-greedy version of ``?``, match zero or one, with the preference
+ for zero.
-``'*?'``
+``*?``
+ Non-greedy version of ``*``, match zero or more, with the preference
+ for the shortest match.
-``'+?'``
+``+?``
+ Non-greedy version of ``+``, match one or more, with the preference
+ for the shortest match.
-``'|'``
- Match either the LHS or the RHS of this operator.
+``|``
+ Match either the left-hand side or the right-hand side sub-patterns of
+ this operator.
-``'(...)'``
+``(...)``
Grouping. Each group is capturing (a substring it captures can be accessed
with `match.group()` method).
-**NOT SUPPORTED**: Counted repetitions (``{m,n}``), more advanced assertions
-(``\b``, ``\B``), named groups (``(?P<name>...)``), non-capturing groups
-(``(?:...)``), etc.
+``\d``
+ Matches digit. Equivalent to ``[0-9]``.
+``\D``
+ Matches non-digit. Equivalent to ``[^0-9]``.
+
+``\s``
+ Matches whitespace. Equivalent to ``[ \t-\r]``.
+
+``\S``
+ Matches non-whitespace. Equivalent to ``[^ \t-\r]``.
+
+``\w``
+ Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.
+
+``\W``
+ Matches non "word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``.
+
+``\``
+ Escape character. Any other character following the backslash, except
+ for those listed above, is taken literally. For example, ``\*`` is
+ equivalent to literal ``*`` (not treated as the ``*`` operator).
+ Note that ``\r``, ``\n``, etc. are not handled specially, and will be
+ equivalent to literal letters ``r``, ``n``, etc. Due to this, it's
+ not recommended to use raw Python strings (``r""``) for regular
+ expressions. For example, ``r"\r\n"`` when used as the regular
+ expression is equivalent to ``"rn"``. To match CR character followed
+ by LF, use ``"\r\n"``.
+
+**NOT SUPPORTED**:
+
+* counted repetitions (``{m,n}``)
+* named groups (``(?P<name>...)``)
+* non-capturing groups (``(?:...)``)
+* more advanced assertions (``\b``, ``\B``)
+* special character escapes like ``\r``, ``\n`` - use Python's own escaping
+ instead
+* etc.
+
+Example::
+
+ import ure
+
+ # As ure doesn't support escapes itself, use of r"" strings is not
+ # recommended.
+ regex = ure.compile("[\r\n]")
+
+ regex.split("line1\rline2\nline3\r\n")
+
+ # Result:
+ # ['line1', 'line2', 'line3', '', '']
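Reviewer's sketch (not part of the patch): the snippet below exercises a few of the operators documented above, namely greedy versus non-greedy repetition, the ``\d`` class, and the ``split()`` example. It assumes that CPython's ``re`` behaves the same as ``ure`` for these particular patterns, so it falls back to ``re`` when ``ure`` is unavailable; the variable names are illustrative only. Patterns are written as ordinary (non-raw) strings with doubled backslashes, following the advice in the patch.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # CPython fallback; assumed equivalent for these patterns

# Greedy "+" takes as much as it can; non-greedy "+?" takes as little.
greedy = re.match("<(.+)>", "<a><b>").group(1)
lazy = re.match("<(.+?)>", "<a><b>").group(1)
print(greedy)  # a><b
print(lazy)    # a

# "\\d" in a normal string reaches the regex engine as \d (a digit).
m = re.match("(\\d+)-(\\d+)", "10-20")
print(m.group(1), m.group(2))  # 10 20

# "\r" and "\n" are expanded by Python itself, so the character set
# contains the real CR and LF characters.
parts = re.compile("[\r\n]").split("line1\rline2\nline3\r\n")
print(parts)  # ['line1', 'line2', 'line3', '', '']
```

The two trailing empty strings come from the final ``\r\n`` pair: one empty field between CR and LF, and one after LF.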
Functions
---------