ure – simple regular expressions

This module implements a subset of the corresponding CPython module, as described below. For more information, refer to the original CPython documentation: re.

This module implements regular expression operations. The supported regular expression syntax is a subset of that of the CPython re module (and is in fact a subset of POSIX extended regular expressions).

Supported operators and special sequences are:

.
Match any character.
[...]
Match a set of characters. Individual characters and ranges are supported, including negated sets (e.g. [^a-c]). To include ] in the set, escape it, e.g.: r"[\]]" or "[\\]]". No other escapes are supported inside a set (they lead to an error). To include - in the set, list it as the first item, e.g. [^-0-9].
^
Match the start of the string.
$
Match the end of the string.
?
Match zero or one occurrence of the previous sub-pattern.
*
Match zero or more occurrences of the previous sub-pattern.
+
Match one or more occurrences of the previous sub-pattern.
??
Non-greedy version of ?, match zero or one, with the preference for zero.
*?
Non-greedy version of *, match zero or more, with the preference for the shortest match.
+?
Non-greedy version of +, match one or more, with the preference for the shortest match.
|
Match either the left-hand side or the right-hand side sub-patterns of this operator.
(...)
Capturing group. A substring captured by a group can be accessed with the match.group() method.
(?:...)
Non-capturing group.
\d
Matches a digit. Equivalent to [0-9].
\D
Matches a non-digit. Equivalent to [^0-9].
\s
Matches a whitespace character. Equivalent to [ \t-\r].
\S
Matches a non-whitespace character. Equivalent to [^ \t-\r].
\w
Matches a “word character” (ASCII only). Equivalent to [A-Za-z0-9_].
\W
Matches a non-“word character” (ASCII only). Equivalent to [^A-Za-z0-9_].
\
Escape character. Quotes characters that have special meaning in regex syntax: .*+?[](){}|^$. For example, \* matches a literal * (it is not treated as the * repetition operator). It is an error to escape any character other than these operators and the special sequences described above (\d, etc.). In particular, sequences like \r, \n, etc. are not part of the regular expression syntax; they can (and should) be handled as normal escapes in Python strings. For this reason, using raw Python strings (r"") for ure regular expressions is not recommended. For example, r"\r\n" used as a regular expression leads to an error (unsupported escape sequence in regex). To match a CR character followed by LF, use "\r\n".
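A few of the operators above can be tried out with a short sketch. The fallback to CPython's re module is an assumption made only so the snippet also runs on a desktop interpreter; it is not part of ure. Regex-level backslashes are doubled because the patterns are ordinary (non-raw) Python strings.

```python
try:
    import ure
except ImportError:
    # Assumption: fall back to CPython's re so the sketch runs off-device.
    import re as ure

# Greedy vs. non-greedy repetition on the same input.
greedy = ure.match("<.+>", "<a><b>").group(0)
lazy = ure.match("<.+?>", "<a><b>").group(0)

# A literal '*' needs a regex-level escape, written "\\*" in a normal string.
star = ure.search("2\\*3", "x = 2*3").group(0)

# CR and LF come from Python's own string escapes, not from the regex engine.
crlf = ure.search("\r\n", "line1\r\nline2")

# '-' listed first in a set is taken literally.
dash = ure.match("[-0-9]+", "-42abc").group(0)

print(greedy, lazy, star, crlf is not None, dash)
```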

NOT SUPPORTED:

  • counted repetitions ({m,n})
  • named groups ((?P<name>...))
  • more advanced assertions (\A, \Z, \b, \B)
  • special character escapes like \r, \n - use Python’s own escaping instead
  • etc.
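Since counted repetitions are not available, one possible workaround is to expand them by hand using only supported operators. The sketch below assumes a hypothetical helper named repeat() (not part of ure) and falls back to CPython's re only so it can run on a desktop.

```python
try:
    import ure
except ImportError:
    # Assumption: fall back to CPython's re so the sketch runs off-device.
    import re as ure


def repeat(sub, m, n):
    """Hypothetical helper: expand sub{m,n} into m mandatory copies
    followed by (n - m) optional copies.

    sub must be a single atom (e.g. "\\d") or a group such as "(?:ab)".
    """
    return sub * m + (sub + "?") * (n - m)


# Stands in for the unsupported pattern ^\d{2,4}$
pattern = "^" + repeat("\\d", 2, 4) + "$"
regex = ure.compile(pattern)

results = [regex.match(s) is not None for s in ("7", "42", "2024", "12345")]
print(pattern, results)
```

The expanded pattern grows with n, so this approach is only practical for small repetition counts.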

Example:

import ure

# As ure doesn't support escapes itself, use of r"" strings is not
# recommended.
regex = ure.compile("[\r\n]")

regex.split("line1\rline2\nline3\r\n")

# Result:
# ['line1', 'line2', 'line3', '', '']

Functions

ure.compile(regex_str[, flags])

Compile a regular expression and return a regex object.

ure.match(regex_str, string)

Compile regex_str and match it against string. Matching always starts at the beginning of the string.

ure.search(regex_str, string)

Compile regex_str and search for it in string. Unlike match, this searches string for the first position at which the regex matches (which may still be 0 if the regex is anchored).
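The difference between match() and search() can be illustrated with a small sketch. The CPython re fallback is an assumption for running the snippet off-device.

```python
try:
    import ure
except ImportError:
    # Assumption: fall back to CPython's re so the sketch runs off-device.
    import re as ure

text = "id=42"

# match() only tries position 0, where there is no digit.
m_digits = ure.match("\\d+", text)

# search() scans forward until the pattern matches.
s_digits = ure.search("\\d+", text)

# An anchored pattern can only match at position 0, even with search().
s_anchored = ure.search("^\\d+", text)

print(m_digits, s_digits.group(0), s_anchored)
```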

ure.sub(regex_str, replace, string, count=0, /)

Compile regex_str and search for it in string, replacing all matches with replace, and returning the new string.

replace can be a string or a function. If it is a string then escape sequences of the form \<number> and \g<number> can be used to expand to the corresponding group (or an empty string for unmatched groups). If replace is a function then it must take a single argument (the match object) and should return a replacement string.

If count is specified and non-zero then substitution will stop after this many substitutions are made.

Note: availability of this function depends on the Pycopy port.
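The replacement forms described above might be used as in the following sketch. It assumes a port where sub() is available, and falls back to CPython's re only so it can run on a desktop. Group references in the replacement string are written with doubled backslashes because the strings are not raw.

```python
try:
    import ure
except ImportError:
    # Assumption: fall back to CPython's re so the sketch runs off-device.
    import re as ure

# String replacement using \<number> group references.
swapped = ure.compile("(\\w+)@(\\w+)").sub("\\2 at \\1", "user@host")

# String replacement using \g<number> group references.
reordered = ure.compile("(\\d+)-(\\d+)").sub("\\g<2>-\\g<1>", "10-20")

# Function replacement: receives the match object, returns a string.
doubled = ure.compile("\\d+").sub(
    lambda m: str(int(m.group(0)) * 2), "3 apples, 10 pears"
)

# A non-zero count limits the number of substitutions.
limited = ure.compile("a").sub("-", "banana", 2)

print(swapped, reordered, doubled, limited)
```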

ure.DEBUG

Flag value: display debug information about the compiled expression. (Availability depends on the Pycopy port.)

Regex objects

Compiled regular expression. Instances of this class are created using ure.compile().

regex.match(string)
regex.search(string)
regex.sub(replace, string, count=0, /)

Similar to the module-level functions match(), search() and sub(). Using methods is (much) more efficient if the same regex is applied to multiple strings.
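As a sketch of the reuse pattern, a regex can be compiled once and then applied to many strings. The input lines and the key=value format here are made up for illustration, and the CPython re fallback is an assumption for running off-device.

```python
try:
    import ure
except ImportError:
    # Assumption: fall back to CPython's re so the sketch runs off-device.
    import re as ure

# Compile once, outside the loop.
regex = ure.compile("^(\\w+)=(\\d+)$")

# Hypothetical input lines, for illustration only.
lines = ["a=1", "bad line", "speed=115200"]

parsed = {}
for line in lines:
    m = regex.match(line)
    if m:
        parsed[m.group(1)] = int(m.group(2))

print(parsed)
```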

regex.split(string, max_split=-1, /)

Split a string using regex. If max_split is given, it specifies the maximum number of splits to perform. Returns a list of strings (with up to max_split+1 elements if max_split is specified).
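A brief sketch of split() with and without a split limit follows. The limit is passed positionally, and the CPython re fallback is an assumption for running off-device.

```python
try:
    import ure
except ImportError:
    # Assumption: fall back to CPython's re so the sketch runs off-device.
    import re as ure

# A comma followed by any number of spaces acts as the separator.
regex = ure.compile(", *")

all_parts = regex.split("a, b,c,  d")

# At most 2 splits, so at most 3 elements; the remainder is left unsplit.
limited = regex.split("a, b,c,  d", 2)

print(all_parts, limited)
```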

Match objects

Match objects are returned by the match() and search() methods, and are passed to the replacement function in sub().

match.group(index)

Return the matching (sub)string. index is 0 for the entire match, 1 and above for each capturing group. Only numeric groups are supported.

match.start([index])
match.end([index])

Return the index in the original string of the start or end of the matched substring. index defaults to 0 (the entire match); otherwise it selects a capturing group.

Note: availability of these methods depends on the Pycopy port.
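The match object methods can be sketched as follows, assuming a port where start() and end() are available. The CPython re fallback is an assumption for running off-device.

```python
try:
    import ure
except ImportError:
    # Assumption: fall back to CPython's re so the sketch runs off-device.
    import re as ure

text = "range 10-25 mm"
m = ure.search("(\\d+)-(\\d+)", text)

whole = m.group(0)   # entire match
low = m.group(1)     # first capturing group
high = m.group(2)    # second capturing group

# Positions of the entire match and of group 2 in the original string.
span_whole = (m.start(), m.end())
span_high = (m.start(2), m.end(2))

print(whole, low, high, span_whole, span_high)
```

Slicing the original string with these indices, e.g. text[m.start():m.end()], gives back the same substring as m.group(0).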