|
<!DOCTYPE html>
|
|||
|
<html lang="en">
|
|||
|
<head>
|
|||
|
<meta charset="utf-8">
|
|||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|||
|
<meta name="generator" content="rustdoc">
|
|||
|
<meta name="description" content="API documentation for the Rust `regex` crate.">
|
|||
|
<meta name="keywords" content="rust, rustlang, rust-lang, regex">
|
|||
|
|
|||
|
<title>regex - Rust</title>
|
|||
|
|
|||
|
<link rel="stylesheet" type="text/css" href="../normalize.css">
|
|||
|
<link rel="stylesheet" type="text/css" href="../rustdoc.css">
|
|||
|
<link rel="stylesheet" type="text/css" href="../main.css">
|
|||
|
|
|||
|
|
|||
|
<link rel="shortcut icon" href="https://www.rust-lang.org/favicon.ico">
|
|||
|
|
|||
|
</head>
|
|||
|
<body class="rustdoc mod">
|
|||
|
<!--[if lte IE 8]>
|
|||
|
<div class="warning">
|
|||
|
This old browser is unsupported and will most likely display funky
|
|||
|
things.
|
|||
|
</div>
|
|||
|
<![endif]-->
|
|||
|
|
|||
|
|
|||
|
|
|||
|
<nav class="sidebar">
|
|||
|
<a href='../regex/index.html'><img src='https://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png' alt='logo' width='100'></a>
|
|||
|
<p class='location'>Crate regex</p><div class="block items"><ul><li><a href="#modules">Modules</a></li><li><a href="#structs">Structs</a></li><li><a href="#enums">Enums</a></li><li><a href="#traits">Traits</a></li><li><a href="#functions">Functions</a></li></ul></div><p class='location'></p><script>window.sidebarCurrent = {name: 'regex', ty: 'mod', relpath: '../'};</script>
|
|||
|
</nav>
|
|||
|
|
|||
|
<nav class="sub">
|
|||
|
<form class="search-form js-only">
|
|||
|
<div class="search-container">
|
|||
|
<input class="search-input" name="search"
|
|||
|
autocomplete="off"
|
|||
|
placeholder="Click or press ‘S’ to search, ‘?’ for more options…"
|
|||
|
type="search">
|
|||
|
</div>
|
|||
|
</form>
|
|||
|
</nav>
|
|||
|
|
|||
|
<section id='main' class="content">
|
|||
|
<h1 class='fqn'><span class='in-band'>Crate <a class="mod" href=''>regex</a></span><span class='out-of-band'><span id='render-detail'>
|
|||
|
<a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">
|
|||
|
[<span class='inner'>−</span>]
|
|||
|
</a>
|
|||
|
</span><a class='srclink' href='../src/regex/lib.rs.html#11-606' title='goto source code'>[src]</a></span></h1>
|
|||
|
<div class='docblock'><p>This crate provides a native implementation of regular expressions that is
|
|||
|
heavily based on RE2 both in syntax and in implementation. Notably,
|
|||
|
backreferences and arbitrary lookahead/lookbehind assertions are not
|
|||
|
provided. In return, regular expression searching provided by this package
|
|||
|
has excellent worst-case performance. The specific syntax supported is
|
|||
|
documented further down.</p>
|
|||
|
|
|||
|
<p>This crate's documentation provides some simple examples, describes Unicode
|
|||
|
support and exhaustively lists the supported syntax. For more specific
|
|||
|
details on the API, please see the documentation for the
|
|||
|
<a href="struct.Regex.html"><code>Regex</code></a> type.</p>
|
|||
|
|
|||
|
<h1 id='usage' class='section-header'><a href='#usage'>Usage</a></h1>
|
|||
|
<p>This crate is <a href="https://crates.io/crates/regex">on crates.io</a> and can be
|
|||
|
used by adding <code>regex</code> to your dependencies in your project's <code>Cargo.toml</code>.</p>
|
|||
|
|
|||
|
<pre><code class="language-toml">[dependencies]
|
|||
|
regex = "0.1"
|
|||
|
</code></pre>
|
|||
|
|
|||
|
<p>and this to your crate root:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">extern</span> <span class="kw">crate</span> <span class="ident">regex</span>;</pre>
|
|||
|
|
|||
|
<h1 id='example-find-a-date' class='section-header'><a href='#example-find-a-date'>Example: find a date</a></h1>
|
|||
|
<p>General use of regular expressions in this package involves compiling an
|
|||
|
expression and then using it to search, split or replace text. For example,
|
|||
|
to confirm that some text resembles a date:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">use</span> <span class="ident">regex</span>::<span class="ident">Regex</span>;
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"^\d{4}-\d{2}-\d{2}$"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">re</span>.<span class="ident">is_match</span>(<span class="string">"2014-01-01"</span>));</pre>
|
|||
|
|
|||
|
<p>Notice the use of the <code>^</code> and <code>$</code> anchors. In this crate, every expression
|
|||
|
is executed with an implicit <code>.*?</code> at the beginning and end, which allows
|
|||
|
it to match anywhere in the text. Anchors can be used to ensure that the
|
|||
|
full text matches an expression.</p>
|
|||
|
|
|||
|
<p>This example also demonstrates the utility of
|
|||
|
<a href="https://doc.rust-lang.org/stable/reference.html#raw-string-literals">raw strings</a>
|
|||
|
in Rust, which
|
|||
|
are just like regular strings except they are prefixed with an <code>r</code> and do
|
|||
|
not process any escape sequences. For example, <code>"\\d"</code> is the same
|
|||
|
expression as <code>r"\d"</code>.</p>
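<p>The equivalence can be checked without the regex crate at all, since it is
purely a property of Rust string literals. A minimal sketch (the function name
<code>date_pattern_spellings</code> is made up for illustration):</p>

```rust
/// Returns the date pattern written two ways: with doubled backslashes in a
/// normal string literal, and verbatim in a raw string literal.
fn date_pattern_spellings() -> (&'static str, &'static str) {
    // In a normal literal every backslash has to be escaped.
    let escaped = "^\\d{4}-\\d{2}-\\d{2}$";
    // In a raw literal backslashes are taken as written.
    let raw = r"^\d{4}-\d{2}-\d{2}$";
    (escaped, raw)
}
```

<p>Both spellings produce the same <code>&amp;str</code>, so either can be passed to
<code>Regex::new</code>; the raw form is simply easier to read.</p>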
|
|||
|
|
|||
|
<h1 id='example-avoid-compiling-the-same-regex-in-a-loop' class='section-header'><a href='#example-avoid-compiling-the-same-regex-in-a-loop'>Example: avoid compiling the same regex in a loop</a></h1>
|
|||
|
<p>It is an anti-pattern to compile the same regular expression in a loop
|
|||
|
since compilation is typically expensive. (It takes anywhere from a few
|
|||
|
microseconds to a few <strong>milliseconds</strong> depending on the size of the
|
|||
|
regex.) Not only is compilation itself expensive, but this also prevents
|
|||
|
optimizations that reuse allocations internally to the matching engines.</p>
|
|||
|
|
|||
|
<p>In Rust, it can sometimes be a pain to pass regular expressions around if
|
|||
|
they're used from inside a helper function. Instead, we recommend using the
|
|||
|
<a href="https://crates.io/crates/lazy_static"><code>lazy_static</code></a> crate to ensure that
|
|||
|
regular expressions are compiled exactly once.</p>
|
|||
|
|
|||
|
<p>For example:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="attribute">#[<span class="ident">macro_use</span>]</span> <span class="kw">extern</span> <span class="kw">crate</span> <span class="ident">lazy_static</span>;
|
|||
|
<span class="kw">extern</span> <span class="kw">crate</span> <span class="ident">regex</span>;
|
|||
|
|
|||
|
<span class="kw">use</span> <span class="ident">regex</span>::<span class="ident">Regex</span>;
|
|||
|
|
|||
|
<span class="kw">fn</span> <span class="ident">some_helper_function</span>(<span class="ident">text</span>: <span class="kw-2">&</span><span class="ident">str</span>) <span class="op">-></span> <span class="ident">bool</span> {
|
|||
|
<span class="macro">lazy_static</span><span class="macro">!</span> {
|
|||
|
<span class="kw">static</span> <span class="kw-2">ref</span> <span class="ident">RE</span>: <span class="ident">Regex</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">"..."</span>).<span class="ident">unwrap</span>();
|
|||
|
}
|
|||
|
<span class="ident">RE</span>.<span class="ident">is_match</span>(<span class="ident">text</span>)
|
|||
|
}
|
|||
|
|
|||
|
<span class="kw">fn</span> <span class="ident">main</span>() {}</pre>
|
|||
|
|
|||
|
<p>Specifically, in this example, the regex will be compiled when it is used for
|
|||
|
the first time. On subsequent uses, it will reuse the previous compilation.</p>
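<p>The compile-on-first-use behaviour can be illustrated with the standard
library alone. The sketch below is not how <code>lazy_static</code> is implemented; it
swaps in <code>std::sync::OnceLock</code> (assuming a toolchain that provides it) and
uses a hypothetical <code>compile_pattern</code> function as a stand-in for an expensive
<code>Regex::new</code> call, counting how often it runs:</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive step actually runs.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an expensive `Regex::new(...)` call.
fn compile_pattern(pattern: &str) -> String {
    COMPILATIONS.fetch_add(1, Ordering::SeqCst);
    pattern.to_owned()
}

/// Returns the shared "compiled" pattern, building it on first use only.
fn shared_pattern() -> &'static String {
    static PATTERN: OnceLock<String> = OnceLock::new();
    PATTERN.get_or_init(|| compile_pattern(r"^\d{4}-\d{2}-\d{2}$"))
}
```

<p>However many times <code>shared_pattern</code> is called, <code>COMPILATIONS</code> only
ever reaches 1, which is the property the helper function above relies on.</p>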
|
|||
|
|
|||
|
<h1 id='example-iterating-over-capture-groups' class='section-header'><a href='#example-iterating-over-capture-groups'>Example: iterating over capture groups</a></h1>
|
|||
|
<p>This crate provides convenient iterators for matching an expression
|
|||
|
repeatedly against a search string to find successive non-overlapping
|
|||
|
matches. For example, to find all dates in a string and be able to access
|
|||
|
them by their component pieces:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(\d{4})-(\d{2})-(\d{2})"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">"2012-03-14, 2013-01-01 and 2014-07-05"</span>;
|
|||
|
<span class="kw">for</span> <span class="ident">cap</span> <span class="kw">in</span> <span class="ident">re</span>.<span class="ident">captures_iter</span>(<span class="ident">text</span>) {
|
|||
|
<span class="macro">println</span><span class="macro">!</span>(<span class="string">"Month: {} Day: {} Year: {}"</span>,
|
|||
|
<span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">2</span>).<span class="ident">unwrap_or</span>(<span class="string">""</span>), <span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">3</span>).<span class="ident">unwrap_or</span>(<span class="string">""</span>),
|
|||
|
<span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">1</span>).<span class="ident">unwrap_or</span>(<span class="string">""</span>));
|
|||
|
}
|
|||
|
<span class="comment">// Output:</span>
|
|||
|
<span class="comment">// Month: 03 Day: 14 Year: 2012</span>
|
|||
|
<span class="comment">// Month: 01 Day: 01 Year: 2013</span>
|
|||
|
<span class="comment">// Month: 07 Day: 05 Year: 2014</span></pre>
|
|||
|
|
|||
|
<p>Notice that the year is in the capture group indexed at <code>1</code>. This is
|
|||
|
because the <em>entire match</em> is stored in the capture group at index <code>0</code>.</p>
|
|||
|
|
|||
|
<h1 id='example-replacement-with-named-capture-groups' class='section-header'><a href='#example-replacement-with-named-capture-groups'>Example: replacement with named capture groups</a></h1>
|
|||
|
<p>Building on the previous example, perhaps we'd like to rearrange the date
|
|||
|
formats. This can be done with text replacement. But to make the code
|
|||
|
clearer, we can <em>name</em> our capture groups and use those names as variables
|
|||
|
in our replacement text:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="kw">let</span> <span class="ident">before</span> <span class="op">=</span> <span class="string">"2012-03-14, 2013-01-01 and 2014-07-05"</span>;
|
|||
|
<span class="kw">let</span> <span class="ident">after</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">replace_all</span>(<span class="ident">before</span>, <span class="string">"$m/$d/$y"</span>);
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">after</span>, <span class="string">"03/14/2012, 01/01/2013 and 07/05/2014"</span>);</pre>
|
|||
|
|
|||
|
<p>The <code>replace</code> methods are actually polymorphic in the replacement, which
|
|||
|
provides more flexibility than is seen here. (See the documentation for
|
|||
|
<code>Regex::replace</code> for more details.)</p>
|
|||
|
|
|||
|
<p>Note that if your regex gets complicated, you can use the <code>x</code> flag to
|
|||
|
enable insignificant whitespace mode, which also lets you write comments:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?x)
|
|||
|
(?P<y>\d{4}) # the year
|
|||
|
-
|
|||
|
(?P<m>\d{2}) # the month
|
|||
|
-
|
|||
|
(?P<d>\d{2}) # the day
|
|||
|
"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="kw">let</span> <span class="ident">before</span> <span class="op">=</span> <span class="string">"2012-03-14, 2013-01-01 and 2014-07-05"</span>;
|
|||
|
<span class="kw">let</span> <span class="ident">after</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">replace_all</span>(<span class="ident">before</span>, <span class="string">"$m/$d/$y"</span>);
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">after</span>, <span class="string">"03/14/2012, 01/01/2013 and 07/05/2014"</span>);</pre>
|
|||
|
|
|||
|
<h1 id='example-match-multiple-regular-expressions-simultaneously' class='section-header'><a href='#example-match-multiple-regular-expressions-simultaneously'>Example: match multiple regular expressions simultaneously</a></h1>
|
|||
|
<p>This demonstrates how to use a <code>RegexSet</code> to match multiple (possibly
|
|||
|
overlapping) regular expressions in a single scan of the search text:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">use</span> <span class="ident">regex</span>::<span class="ident">RegexSet</span>;
|
|||
|
|
|||
|
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&</span>[
|
|||
|
<span class="string">r"\w+"</span>,
|
|||
|
<span class="string">r"\d+"</span>,
|
|||
|
<span class="string">r"\pL+"</span>,
|
|||
|
<span class="string">r"foo"</span>,
|
|||
|
<span class="string">r"bar"</span>,
|
|||
|
<span class="string">r"barfoo"</span>,
|
|||
|
<span class="string">r"foobar"</span>,
|
|||
|
]).<span class="ident">unwrap</span>();
|
|||
|
|
|||
|
<span class="comment">// Iterate over and collect all of the matches.</span>
|
|||
|
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op"><</span>_<span class="op">></span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"foobar"</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">matches</span>, <span class="macro">vec</span><span class="macro">!</span>[<span class="number">0</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">6</span>]);
|
|||
|
|
|||
|
<span class="comment">// You can also test whether a particular regex matched:</span>
|
|||
|
<span class="kw">let</span> <span class="ident">matches</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"foobar"</span>);
|
|||
|
<span class="macro">assert</span><span class="macro">!</span>(<span class="op">!</span><span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">5</span>));
|
|||
|
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">6</span>));</pre>
|
|||
|
|
|||
|
<h1 id='pay-for-what-you-use' class='section-header'><a href='#pay-for-what-you-use'>Pay for what you use</a></h1>
|
|||
|
<p>With respect to searching text with a regular expression, there are three
|
|||
|
questions that can be asked:</p>
|
|||
|
|
|||
|
<ol>
|
|||
|
<li>Does the text match this expression?</li>
|
|||
|
<li>If so, where does it match?</li>
|
|||
|
<li>Where are the submatches?</li>
|
|||
|
</ol>
|
|||
|
|
|||
|
<p>Generally speaking, this crate could provide a function to answer only #3,
|
|||
|
which would subsume #1 and #2 automatically. However, it can be
|
|||
|
significantly more expensive to compute the location of submatches, so it's
|
|||
|
best not to do it if you don't need to.</p>
|
|||
|
|
|||
|
<p>Therefore, only use what you need. For example, don't use <code>find</code> if you
|
|||
|
only need to test if an expression matches a string. (Use <code>is_match</code>
|
|||
|
instead.)</p>
|
|||
|
|
|||
|
<h1 id='unicode' class='section-header'><a href='#unicode'>Unicode</a></h1>
|
|||
|
<p>This implementation executes regular expressions <strong>only</strong> on valid UTF-8
|
|||
|
while exposing match locations as byte indices into the search string.</p>
|
|||
|
|
|||
|
<p>Only simple case folding is supported. Namely, when matching
|
|||
|
case-insensitively, the characters are first mapped using the <a href="ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt">simple case
|
|||
|
folding</a> mapping
|
|||
|
before matching.</p>
|
|||
|
|
|||
|
<p>Regular expressions themselves are <strong>only</strong> interpreted as a sequence of
|
|||
|
Unicode scalar values. This means you can use Unicode characters directly
|
|||
|
in your expression:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?i)Δ+"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">re</span>.<span class="ident">find</span>(<span class="string">"ΔδΔ"</span>), <span class="prelude-val">Some</span>((<span class="number">0</span>, <span class="number">6</span>)));</pre>
|
|||
|
|
|||
|
<p>Finally, Unicode general categories and scripts are available as character
|
|||
|
classes. For example, you can match a sequence of numerals, Greek or
|
|||
|
Cherokee letters:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"[\pN\p{Greek}\p{Cherokee}]+"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">re</span>.<span class="ident">find</span>(<span class="string">"abcΔᎠβⅠᏴγδⅡxyz"</span>), <span class="prelude-val">Some</span>((<span class="number">3</span>, <span class="number">23</span>)));</pre>
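<p>The offsets <code>(0, 6)</code> and <code>(3, 23)</code> in the two examples above are byte
positions, not character counts. A small standard-library sketch (the helper
name <code>byte_boundaries</code> is made up for illustration) lists where each character
starts in the UTF-8 encoding:</p>

```rust
/// Returns the UTF-8 byte offset at which each character of `text` starts,
/// followed by the total byte length of `text`.
fn byte_boundaries(text: &str) -> Vec<usize> {
    let mut offsets: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
    offsets.push(text.len());
    offsets
}
```

<p>For <code>"ΔδΔ"</code> this gives 0, 2, 4 and 6, because each of the three Greek
letters occupies two bytes. In the second example the Cherokee letters and
Roman numerals occupy three bytes each, which is why eight matched characters
span bytes 3 to 23.</p>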
|
|||
|
|
|||
|
<h1 id='opt-out-of-unicode-support' class='section-header'><a href='#opt-out-of-unicode-support'>Opt out of Unicode support</a></h1>
|
|||
|
<p>The <code>bytes</code> sub-module provides a <code>Regex</code> type that can be used to match
|
|||
|
on <code>&[u8]</code>. By default, text is interpreted as ASCII compatible text with
|
|||
|
all Unicode support disabled (e.g., <code>.</code> matches any byte instead of any
|
|||
|
Unicode codepoint). Unicode support can be selectively enabled with the
|
|||
|
<code>u</code> flag. See the <code>bytes</code> module documentation for more details.</p>
|
|||
|
|
|||
|
<p>Unicode support can also be selectively <em>disabled</em> with the main <code>Regex</code>
|
|||
|
type that matches on <code>&str</code>. For example, <code>(?-u:\b)</code> will match an ASCII
|
|||
|
word boundary. Note though that invalid UTF-8 is not allowed to be matched
|
|||
|
even when the <code>u</code> flag is disabled. For example, <code>(?-u:.)</code> will return an
|
|||
|
error, since <code>.</code> matches <em>any byte</em> when Unicode support is disabled.</p>
|
|||
|
|
|||
|
<h1 id='syntax' class='section-header'><a href='#syntax'>Syntax</a></h1>
|
|||
|
<p>The syntax supported in this crate corresponds almost exactly
|
|||
|
with the syntax supported by RE2. It is documented below.</p>
|
|||
|
|
|||
|
<p>Note that the regular expression parser and abstract syntax are exposed in
|
|||
|
a separate crate, <a href="../regex_syntax/index.html"><code>regex-syntax</code></a>.</p>
|
|||
|
|
|||
|
<h2 id='matching-one-character' class='section-header'><a href='#matching-one-character'>Matching one character</a></h2>
|
|||
|
<pre class="rust">
|
|||
|
. any character except new line (includes new line with s flag)
|
|||
|
[xyz] A character class matching either x, y or z.
|
|||
|
[^xyz] A character class matching any character except x, y and z.
|
|||
|
[a-z] A character class matching any character in range a-z.
|
|||
|
\d digit (\p{Nd})
|
|||
|
\D not digit
|
|||
|
[:alpha:] ASCII character class ([A-Za-z])
|
|||
|
[:^alpha:] Negated ASCII character class ([^A-Za-z])
|
|||
|
\pN One-letter name Unicode character class
|
|||
|
\p{Greek} Unicode character class (general category or script)
|
|||
|
\PN Negated one-letter name Unicode character class
|
|||
|
\P{Greek} negated Unicode character class (general category or script)
|
|||
|
</pre>
|
|||
|
|
|||
|
<p>Any named character class may appear inside a bracketed <code>[...]</code> character
|
|||
|
class. For example, <code>[\p{Greek}\pN]</code> matches any Greek or numeral
|
|||
|
character.</p>
|
|||
|
|
|||
|
<h2 id='composites' class='section-header'><a href='#composites'>Composites</a></h2>
|
|||
|
<pre class="rust">
|
|||
|
xy concatenation (x followed by y)
|
|||
|
x|y alternation (x or y, prefer x)
|
|||
|
</pre>
|
|||
|
|
|||
|
<h2 id='repetitions' class='section-header'><a href='#repetitions'>Repetitions</a></h2>
|
|||
|
<pre class="rust">
|
|||
|
x* zero or more of x (greedy)
|
|||
|
x+ one or more of x (greedy)
|
|||
|
x? zero or one of x (greedy)
|
|||
|
x*? zero or more of x (ungreedy/lazy)
|
|||
|
x+? one or more of x (ungreedy/lazy)
|
|||
|
x?? zero or one of x (ungreedy/lazy)
|
|||
|
x{n,m} at least n x and at most m x (greedy)
|
|||
|
x{n,} at least n x (greedy)
|
|||
|
x{n} exactly n x
|
|||
|
x{n,m}? at least n x and at most m x (ungreedy/lazy)
|
|||
|
x{n,}? at least n x (ungreedy/lazy)
|
|||
|
x{n}? exactly n x
|
|||
|
</pre>
|
|||
|
|
|||
|
<h2 id='empty-matches' class='section-header'><a href='#empty-matches'>Empty matches</a></h2>
|
|||
|
<pre class="rust">
|
|||
|
^ the beginning of text (or start-of-line with multi-line mode)
|
|||
|
$ the end of text (or end-of-line with multi-line mode)
|
|||
|
\A only the beginning of text (even with multi-line mode enabled)
|
|||
|
\z only the end of text (even with multi-line mode enabled)
|
|||
|
\b a Unicode word boundary (\w on one side and \W, \A, or \z on other)
|
|||
|
\B not a Unicode word boundary
|
|||
|
</pre>
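<p>The word boundary rule can be sketched by hand. The function below is a
simplified, ASCII-only illustration (closer to <code>(?-u:\b)</code> than to the Unicode
<code>\b</code>), and <code>is_ascii_word_boundary</code> is a hypothetical helper, not part of
this crate:</p>

```rust
/// Returns true if byte offset `at` in `text` is an ASCII word boundary:
/// a word byte ([0-9A-Za-z_]) on exactly one side, where the start and the
/// end of the text count as "not a word byte".
fn is_ascii_word_boundary(text: &str, at: usize) -> bool {
    let bytes = text.as_bytes();
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    // Byte just before the position, if there is one.
    let before = at
        .checked_sub(1)
        .and_then(|i| bytes.get(i))
        .map_or(false, |&b| is_word(b));
    // Byte at the position, if there is one.
    let after = bytes.get(at).map_or(false, |&b| is_word(b));
    before != after
}
```

<p>In <code>"$$abc$$"</code> the only boundaries are at offsets 2 and 5, on either side of
<code>abc</code>.</p>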
|
|||
|
|
|||
|
<h2 id='grouping-and-flags' class='section-header'><a href='#grouping-and-flags'>Grouping and flags</a></h2>
|
|||
|
<pre class="rust">
|
|||
|
(exp) numbered capture group (indexed by opening parenthesis)
|
|||
|
(?P<name>exp) named (also numbered) capture group (allowed chars: [_0-9a-zA-Z])
|
|||
|
(?:exp) non-capturing group
|
|||
|
(?flags) set flags within current group
|
|||
|
(?flags:exp) set flags for exp (non-capturing)
|
|||
|
</pre>
|
|||
|
|
|||
|
<p>Flags are each a single character. For example, <code>(?x)</code> sets the flag <code>x</code>
|
|||
|
and <code>(?-x)</code> clears the flag <code>x</code>. Multiple flags can be set or cleared at
|
|||
|
the same time: <code>(?xy)</code> sets both the <code>x</code> and <code>y</code> flags and <code>(?x-y)</code> sets
|
|||
|
the <code>x</code> flag and clears the <code>y</code> flag.</p>
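<p>The set/clear rule can be spelled out in a few lines. This is only a sketch of
the notation, not the crate's parser; <code>split_flags</code> is a hypothetical helper
that takes the text between <code>(?</code> and <code>)</code> and does not validate flag
names:</p>

```rust
/// Splits the flag portion of a group such as `(?x-y)` into the flags it
/// sets and the flags it clears.
fn split_flags(flags: &str) -> (Vec<char>, Vec<char>) {
    let mut set = Vec::new();
    let mut cleared = Vec::new();
    let mut clearing = false;
    for c in flags.chars() {
        if c == '-' {
            // Every flag after the `-` is cleared rather than set.
            clearing = true;
        } else if clearing {
            cleared.push(c);
        } else {
            set.push(c);
        }
    }
    (set, cleared)
}
```

<p>With this reading, <code>xy</code> sets two flags, <code>x-y</code> sets one and clears one, and
<code>-x</code> only clears.</p>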
|
|||
|
|
|||
|
<p>All flags are by default disabled unless stated otherwise. They are:</p>
|
|||
|
|
|||
|
<pre class="rust">
|
|||
|
i case-insensitive
|
|||
|
m multi-line mode: ^ and $ match begin/end of line
|
|||
|
s allow . to match \n
|
|||
|
U swap the meaning of x* and x*?
|
|||
|
u Unicode support (enabled by default)
|
|||
|
x ignore whitespace and allow line comments (starting with `#`)
|
|||
|
</pre>
|
|||
|
|
|||
|
<p>Here's an example that matches case-insensitively for only part of the
|
|||
|
expression:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?i)a+(?-i)b+"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="kw">let</span> <span class="ident">cap</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">captures</span>(<span class="string">"AaAaAbbBBBb"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">0</span>), <span class="prelude-val">Some</span>(<span class="string">"AaAaAbb"</span>));</pre>
|
|||
|
|
|||
|
<p>Notice that the <code>a+</code> matches either <code>a</code> or <code>A</code>, but the <code>b+</code> only matches
|
|||
|
<code>b</code>.</p>
|
|||
|
|
|||
|
<p>Here is an example that uses an ASCII word boundary instead of a Unicode
|
|||
|
word boundary:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?-u:\b).+(?-u:\b)"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="kw">let</span> <span class="ident">cap</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">captures</span>(<span class="string">"$$abc$$"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">0</span>), <span class="prelude-val">Some</span>(<span class="string">"abc"</span>));</pre>
|
|||
|
|
|||
|
<h2 id='escape-sequences' class='section-header'><a href='#escape-sequences'>Escape sequences</a></h2>
|
|||
|
<pre class="rust">
|
|||
|
\* literal *, works for any punctuation character: \.+*?()|[]{}^$
|
|||
|
\a bell (\x07)
|
|||
|
\f form feed (\x0C)
|
|||
|
\t horizontal tab
|
|||
|
\n new line
|
|||
|
\r carriage return
|
|||
|
\v vertical tab (\x0B)
|
|||
|
\123 octal character code (up to three digits)
|
|||
|
\x7F hex character code (exactly two digits)
|
|||
|
\x{10FFFF} any hex character code corresponding to a Unicode code point
|
|||
|
</pre>
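<p>The numeric escapes name a character by its code. As a rough illustration
using only the standard library (<code>escape_to_char</code> is a made-up helper, and it
takes just the digits of the escape, without the backslash, <code>x</code> or braces):</p>

```rust
/// Converts the digits of a numeric escape into the character it denotes.
/// `radix` is 8 for `\123`-style escapes and 16 for `\x7F` and `\x{...}`.
fn escape_to_char(digits: &str, radix: u32) -> Option<char> {
    let code = u32::from_str_radix(digits, radix).ok()?;
    // `None` for surrogates and for values above 10FFFF.
    char::from_u32(code)
}
```

<p>So <code>\123</code> is the letter <code>S</code> (octal 123 is decimal 83), and
<code>\x{10FFFF}</code> is the largest code point that can be written this way.</p>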
|
|||
|
|
|||
|
<h2 id='perl-character-classes-unicode-friendly' class='section-header'><a href='#perl-character-classes-unicode-friendly'>Perl character classes (Unicode friendly)</a></h2>
|
|||
|
<p>These classes are based on the definitions provided in
|
|||
|
<a href="http://www.unicode.org/reports/tr18/#Compatibility_Properties">UTS#18</a>:</p>
|
|||
|
|
|||
|
<pre class="rust">
|
|||
|
\d digit (\p{Nd})
|
|||
|
\D not digit
|
|||
|
\s whitespace (\p{White_Space})
|
|||
|
\S not whitespace
|
|||
|
\w word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control})
|
|||
|
\W not word character
|
|||
|
</pre>
|
|||
|
|
|||
|
<h2 id='ascii-character-classes' class='section-header'><a href='#ascii-character-classes'>ASCII character classes</a></h2>
|
|||
|
<pre class="rust">
|
|||
|
[:alnum:] alphanumeric ([0-9A-Za-z])
|
|||
|
[:alpha:] alphabetic ([A-Za-z])
|
|||
|
[:ascii:] ASCII ([\x00-\x7F])
|
|||
|
[:blank:] blank ([\t ])
|
|||
|
[:cntrl:] control ([\x00-\x1F\x7F])
|
|||
|
[:digit:] digits ([0-9])
|
|||
|
[:graph:] graphical ([!-~])
|
|||
|
[:lower:] lower case ([a-z])
|
|||
|
[:print:] printable ([ -~])
|
|||
|
[:punct:] punctuation ([!-/:-@[-`{-~])
|
|||
|
[:space:] whitespace ([\t\n\v\f\r ])
|
|||
|
[:upper:] upper case ([A-Z])
|
|||
|
[:word:] word characters ([0-9A-Za-z_])
|
|||
|
[:xdigit:] hex digit ([0-9A-Fa-f])
|
|||
|
</pre>
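<p>Most of these classes have a direct counterpart among the ASCII predicates on
<code>char</code> in the standard library. The sketch below maps the class names from the
table above onto those predicates; <code>in_ascii_class</code> is a hypothetical helper
for illustration, not an API of this crate:</p>

```rust
/// Tests `c` against the ASCII class named `class` (for example "alpha"),
/// using the ranges from the table above. Returns `None` for unknown names.
fn in_ascii_class(class: &str, c: char) -> Option<bool> {
    let hit = match class {
        "alnum" => c.is_ascii_alphanumeric(),
        "alpha" => c.is_ascii_alphabetic(),
        "ascii" => c.is_ascii(),
        "blank" => c == '\t' || c == ' ',
        "cntrl" => c.is_ascii_control(),
        "digit" => c.is_ascii_digit(),
        "graph" => c.is_ascii_graphic(),
        "lower" => c.is_ascii_lowercase(),
        "print" => c == ' ' || c.is_ascii_graphic(),
        "punct" => c.is_ascii_punctuation(),
        // Not `is_ascii_whitespace`: that method excludes vertical tab (\x0B).
        "space" => matches!(c, '\t' | '\n' | '\x0B' | '\x0C' | '\r' | ' '),
        "upper" => c.is_ascii_uppercase(),
        "word" => c.is_ascii_alphanumeric() || c == '_',
        "xdigit" => c.is_ascii_hexdigit(),
        _ => return None,
    };
    Some(hit)
}
```

<p>The one mismatch worth noting is <code>[:space:]</code>: it includes vertical tab,
whereas <code>char::is_ascii_whitespace</code> does not, so that arm lists the characters
explicitly.</p>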
|
|||
|
|
|||
|
<h1 id='untrusted-input' class='section-header'><a href='#untrusted-input'>Untrusted input</a></h1>
|
|||
|
<p>This crate can handle both untrusted regular expressions and untrusted
|
|||
|
search text.</p>
|
|||
|
|
|||
|
<p>Untrusted regular expressions are handled by capping the size of a compiled
|
|||
|
regular expression. (See <code>Regex::with_size_limit</code>.) Without this, it would
|
|||
|
be trivial for an attacker to exhaust your system's memory with expressions
|
|||
|
like <code>a{100}{100}{100}</code>.</p>
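<p>To see why such an expression is dangerous, assume (as a simplification) that
each counted repetition is compiled by copying its sub-expression. The nested
counts then multiply. The helper <code>expanded_copies</code> below is hypothetical and only
does that arithmetic:</p>

```rust
/// Number of copies of the inner expression that nested counted repetitions
/// such as `a{100}{100}{100}` expand to. Returns `None` on overflow.
fn expanded_copies(counts: &[u64]) -> Option<u64> {
    counts
        .iter()
        .try_fold(1u64, |acc, &n| acc.checked_mul(n))
}
```

<p>For <code>a{100}{100}{100}</code> the result is 100 × 100 × 100 = 1,000,000 copies of
<code>a</code> from an expression that is only 16 characters long, and every additional
<code>{100}</code> multiplies that by another hundred.</p>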
|
|||
|
|
|||
|
<p>Untrusted search text is allowed because the matching engine(s) in this
|
|||
|
crate have time complexity <code>O(mn)</code> (where <code>m</code> is proportional to the size of the regex and <code>n</code> to the size of the search text), so search text cannot trigger the exponential blow-up seen in
|
|||
|
some other regular expression engines. (We pay for this by disallowing
|
|||
|
features like arbitrary look-ahead and backreferences.)</p>
|
|||
|
|
|||
|
<p>When a DFA is used, pathological cases with exponential state blow-up are
|
|||
|
avoided by constructing the DFA lazily or in an "online" manner. Therefore,
|
|||
|
at most one new state can be created for each byte of input. This satisfies
|
|||
|
our time complexity guarantees, but can lead to unbounded memory growth
|
|||
|
proportional to the size of the input. As a stopgap, the DFA is only
|
|||
|
allowed to store a fixed number of states. (When the limit is reached, its
|
|||
|
states are wiped and it continues on, possibly duplicating previous work. If
|
|||
|
the limit is reached too frequently, it gives up and hands control off to
|
|||
|
another matching engine with fixed memory requirements.)</p>
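<p>The wipe-and-continue strategy can be sketched independently of any real DFA.
Everything in the block below is hypothetical and heavily simplified: string
keys stand in for DFA states, and the names <code>BoundedCache</code>,
<code>get_or_insert</code> and <code>should_give_up</code> are invented for illustration rather
than taken from this crate:</p>

```rust
use std::collections::HashMap;

/// Hypothetical bounded cache: maps a key (standing in for a DFA state's
/// identity) to a small integer id and wipes itself when full.
struct BoundedCache {
    limit: usize,
    ids: HashMap<String, usize>,
    wipes: usize,
}

impl BoundedCache {
    fn new(limit: usize) -> Self {
        BoundedCache { limit, ids: HashMap::new(), wipes: 0 }
    }

    /// Returns the id for `key`, creating it if needed. When the cache is
    /// full, all entries are dropped first, so earlier work may be redone.
    fn get_or_insert(&mut self, key: &str) -> usize {
        if let Some(&id) = self.ids.get(key) {
            return id;
        }
        if self.ids.len() >= self.limit {
            self.ids.clear();
            self.wipes += 1;
        }
        let id = self.ids.len();
        self.ids.insert(key.to_owned(), id);
        id
    }

    /// A caller could fall back to another engine once this returns true.
    fn should_give_up(&self, max_wipes: usize) -> bool {
        self.wipes > max_wipes
    }
}
```

<p>Memory stays bounded by <code>limit</code> regardless of input length. The price is
that a key seen before a wipe has to be inserted again afterwards, which is the
duplicated work mentioned above.</p>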
|
|||
|
</div><h2 id='modules' class='section-header'><a href="#modules">Modules</a></h2>
|
|||
|
<table>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="mod" href="bytes/index.html"
|
|||
|
title='mod regex::bytes'>bytes</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Match regular expressions on arbitrary bytes.</p>
|
|||
|
</td>
|
|||
|
</tr></table><h2 id='structs' class='section-header'><a href="#structs">Structs</a></h2>
|
|||
|
<table>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.CaptureNames.html"
|
|||
|
title='struct regex::CaptureNames'>CaptureNames</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An iterator over the names of all possible captures.</p>
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.Captures.html"
|
|||
|
title='struct regex::Captures'>Captures</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Captures represents a group of captured strings for a single match.</p>
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.FindCaptures.html"
|
|||
|
title='struct regex::FindCaptures'>FindCaptures</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An iterator that yields all non-overlapping capture groups matching a
|
|||
|
particular regular expression.</p>
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.FindMatches.html"
|
|||
|
title='struct regex::FindMatches'>FindMatches</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An iterator over all non-overlapping matches for a particular string.</p>
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.NoExpand.html"
|
|||
|
title='struct regex::NoExpand'>NoExpand</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>NoExpand indicates literal string replacement.</p>
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.Regex.html"
|
|||
|
title='struct regex::Regex'>Regex</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>A compiled regular expression for matching Unicode strings.</p>
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
    <td><a class="struct" href="struct.RegexBuilder.html"
           title='struct regex::RegexBuilder'>RegexBuilder</a></td>
    <td class='docblock-short'>
        <p>A configurable builder for a regular expression.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.RegexSet.html"
           title='struct regex::RegexSet'>RegexSet</a></td>
    <td class='docblock-short'>
        <p>Match multiple (possibly overlapping) regular expressions in a single scan.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.RegexSplits.html"
           title='struct regex::RegexSplits'>RegexSplits</a></td>
    <td class='docblock-short'>
        <p>Yields all substrings delimited by a regular expression match.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.RegexSplitsN.html"
           title='struct regex::RegexSplitsN'>RegexSplitsN</a></td>
    <td class='docblock-short'>
        <p>Yields at most <code>N</code> substrings delimited by a regular expression match.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.SetMatches.html"
           title='struct regex::SetMatches'>SetMatches</a></td>
    <td class='docblock-short'>
        <p>A set of matches returned by a regex set.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.SetMatchesIntoIter.html"
           title='struct regex::SetMatchesIntoIter'>SetMatchesIntoIter</a></td>
    <td class='docblock-short'>
        <p>An owned iterator over the set of matches from a regex set.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.SetMatchesIter.html"
           title='struct regex::SetMatchesIter'>SetMatchesIter</a></td>
    <td class='docblock-short'>
        <p>A borrowed iterator over the set of matches from a regex set.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.SubCaptures.html"
           title='struct regex::SubCaptures'>SubCaptures</a></td>
    <td class='docblock-short'>
        <p>An iterator over capture groups for a particular match of a regular
        expression.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.SubCapturesNamed.html"
           title='struct regex::SubCapturesNamed'>SubCapturesNamed</a></td>
    <td class='docblock-short'>
        <p>An iterator over named capture groups, yielding each as a tuple of the
        group name and its value.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="struct" href="struct.SubCapturesPos.html"
           title='struct regex::SubCapturesPos'>SubCapturesPos</a></td>
    <td class='docblock-short'>
        <p>An iterator over capture group positions for a particular match of a
        regular expression.</p>
    </td>
</tr></table><h2 id='enums' class='section-header'><a href="#enums">Enums</a></h2>
<table>
<tr class=' module-item'>
    <td><a class="enum" href="enum.Error.html"
           title='enum regex::Error'>Error</a></td>
    <td class='docblock-short'>
        <p>An error that occurred during parsing or compiling a regular expression.</p>
    </td>
</tr></table><h2 id='traits' class='section-header'><a href="#traits">Traits</a></h2>
<table>
<tr class=' module-item'>
    <td><a class="trait" href="trait.Replacer.html"
           title='trait regex::Replacer'>Replacer</a></td>
    <td class='docblock-short'>
        <p>Replacer describes types that can be used to replace matches in a string.</p>
    </td>
</tr></table><h2 id='functions' class='section-header'><a href="#functions">Functions</a></h2>
<table>
<tr class=' module-item'>
    <td><a class="fn" href="fn.is_match.html"
           title='fn regex::is_match'>is_match</a></td>
    <td class='docblock-short'>
        <p>Tests if the given regular expression matches somewhere in the text given.</p>
    </td>
</tr>
<tr class=' module-item'>
    <td><a class="fn" href="fn.quote.html"
           title='fn regex::quote'>quote</a></td>
    <td class='docblock-short'>
        <p>Escapes all regular expression meta characters in <code>text</code>.</p>
    </td>
</tr></table></section>
<section id='search' class="content hidden"></section>

<section class="footer"></section>

<aside id="help" class="hidden">
    <div>
        <h1 class="hidden">Help</h1>

        <div class="shortcuts">
            <h2>Keyboard Shortcuts</h2>

            <dl>
                <dt>?</dt>
                <dd>Show this help dialog</dd>
                <dt>S</dt>
                <dd>Focus the search field</dd>
                <dt>⇤</dt>
                <dd>Move up in search results</dd>
                <dt>⇥</dt>
                <dd>Move down in search results</dd>
                <dt>⏎</dt>
                <dd>Go to active search result</dd>
                <dt>+</dt>
                <dd>Collapse/expand all sections</dd>
            </dl>
        </div>

        <div class="infos">
            <h2>Search Tricks</h2>

            <p>
                Prefix searches with a type followed by a colon (e.g.
                <code>fn:</code>) to restrict the search to a given type.
            </p>

            <p>
                Accepted types are: <code>fn</code>, <code>mod</code>,
                <code>struct</code>, <code>enum</code>,
                <code>trait</code>, <code>type</code>, <code>macro</code>,
                and <code>const</code>.
            </p>

            <p>
                Search functions by type signature (e.g.
                <code>vec -> usize</code> or <code>* -> vec</code>)
            </p>
        </div>
    </div>
</aside>

<script>
    window.rootPath = "../";
    window.currentCrate = "regex";
</script>
<script src="../main.js"></script>
<script defer src="../search-index.js"></script>
</body>
</html>