pnmixer-rust/regex/struct.RegexSet.html

269 lines
21 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="rustdoc">
<meta name="description" content="API documentation for the Rust `RegexSet` struct in crate `regex`.">
<meta name="keywords" content="rust, rustlang, rust-lang, RegexSet">
<title>regex::RegexSet - Rust</title>
<link rel="stylesheet" type="text/css" href="../normalize.css">
<link rel="stylesheet" type="text/css" href="../rustdoc.css">
<link rel="stylesheet" type="text/css" href="../main.css">
<link rel="shortcut icon" href="https://www.rust-lang.org/favicon.ico">
</head>
<body class="rustdoc struct">
<!--[if lte IE 8]>
<div class="warning">
This old browser is unsupported and will most likely display funky
things.
</div>
<![endif]-->
<nav class="sidebar">
<a href='../regex/index.html'><img src='https://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png' alt='logo' width='100'></a>
<p class='location'>Struct RegexSet</p><div class="block items"><ul><li><a href="#methods">Methods</a></li><li><a href="#implementations">Trait Implementations</a></li></ul></div><p class='location'><a href='index.html'>regex</a></p><script>window.sidebarCurrent = {name: 'RegexSet', ty: 'struct', relpath: ''};</script><script defer src="sidebar-items.js"></script>
</nav>
<nav class="sub">
<form class="search-form js-only">
<div class="search-container">
<input class="search-input" name="search"
autocomplete="off"
placeholder="Click or press S to search, ? for more options…"
type="search">
</div>
</form>
</nav>
<section id='main' class="content">
<h1 class='fqn'><span class='in-band'>Struct <a href='index.html'>regex</a>::<wbr><a class="struct" href=''>RegexSet</a></span><span class='out-of-band'><span id='render-detail'>
<a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">
[<span class='inner'>&#x2212;</span>]
</a>
</span><a class='srclink' href='../src/regex/re_set.rs.html#105' title='goto source code'>[src]</a></span></h1>
<pre class='rust struct'>pub struct RegexSet(_);</pre><div class='docblock'><p>Match multiple (possibly overlapping) regular expressions in a single scan.</p>
<p>A regex set corresponds to the union of two or more regular expressions.
That is, a regex set will match text where at least one of its
constituent regular expressions matches. A regex set as its formulated here
provides a touch more power: it will also report <em>which</em> regular
expressions in the set match. Indeed, this is the key difference between
regex sets and a single <code>Regex</code> with many alternates, since only one
alternate can match at a time.</p>
<p>For example, consider regular expressions to match email addresses and
domains: <code>[a-z]+@[a-z]+\.(com|org|net)</code> and <code>[a-z]+\.(com|org|net)</code>. If a
regex set is constructed from those regexes, then searching the text
<code>foo@example.com</code> will report both regexes as matching. Of course, one
could accomplish this by compiling each regex on its own and doing two
searches over the text. The key advantage of using a regex set is that it
will report the matching regexes using a <em>single pass through the text</em>.
If one has hundreds or thousands of regexes to match repeatedly (like a URL
router for a complex web application or a user agent matcher), then a regex
set can realize huge performance gains.</p>
<h1 id='example' class='section-header'><a href='#example'>Example</a></h1>
<p>This shows how the above two regexes (for matching email addresses and
domains) might work:</p>
<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[
<span class="string">r&quot;[a-z]+@[a-z]+\.(com|org|net)&quot;</span>,
<span class="string">r&quot;[a-z]+\.(com|org|net)&quot;</span>,
]).<span class="ident">unwrap</span>();
<span class="comment">// Ask whether any regexes in the set match.</span>
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">&quot;foo@example.com&quot;</span>));
<span class="comment">// Identify which regexes in the set match.</span>
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">&quot;foo@example.com&quot;</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="macro">vec</span><span class="macro">!</span>[<span class="number">0</span>, <span class="number">1</span>], <span class="ident">matches</span>);
<span class="comment">// Try again, but with text that only matches one of the regexes.</span>
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">&quot;example.com&quot;</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="macro">vec</span><span class="macro">!</span>[<span class="number">1</span>], <span class="ident">matches</span>);
<span class="comment">// Try again, but with text that doesn&#39;t match any regex in the set.</span>
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">&quot;example&quot;</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">matches</span>.<span class="ident">is_empty</span>());</pre>
<p>Note that it would be possible to adapt the above example to using <code>Regex</code>
with an expression like:</p>
<pre class="rust rust-example-rendered">
(<span class="question-mark">?</span><span class="ident">P</span><span class="op">&lt;</span><span class="ident">email</span><span class="op">&gt;</span>[<span class="ident">a</span><span class="op">-</span><span class="ident">z</span>]<span class="op">+</span>@(<span class="question-mark">?</span><span class="ident">P</span><span class="op">&lt;</span><span class="ident">email_domain</span><span class="op">&gt;</span>[<span class="ident">a</span><span class="op">-</span><span class="ident">z</span>]<span class="op">+</span>[.](<span class="ident">com</span><span class="op">|</span><span class="ident">org</span><span class="op">|</span><span class="ident">net</span>)))<span class="op">|</span>(<span class="question-mark">?</span><span class="ident">P</span><span class="op">&lt;</span><span class="ident">domain</span><span class="op">&gt;</span>[<span class="ident">a</span><span class="op">-</span><span class="ident">z</span>]<span class="op">+</span>[.](<span class="ident">com</span><span class="op">|</span><span class="ident">org</span><span class="op">|</span><span class="ident">net</span>))</pre>
<p>After a match, one could then inspect the capture groups to figure out
which alternates matched. The problem is that it is hard to make this
approach scale when there are many regexes since the overlap between each
alternate isn&#39;t always obvious to reason about.</p>
<h1 id='limitations' class='section-header'><a href='#limitations'>Limitations</a></h1>
<p>Regex sets are limited to answering the following two questions:</p>
<ol>
<li>Does any regex in the set match?</li>
<li>If so, which regexes in the set match?</li>
</ol>
<p>As with the main <code>Regex</code> type, it is cheaper to ask (1) instead of (2)
since the matching engines can stop after the first match is found.</p>
<p>Other features like finding the location of successive matches or their
sub-captures aren&#39;t supported. If you need this functionality, the
recommended approach is to compile each regex in the set independently and
selectively match them based on which regexes in the set matched.</p>
<h1 id='performance' class='section-header'><a href='#performance'>Performance</a></h1>
<p>A <code>RegexSet</code> has the same performance characteristics as <code>Regex</code>. Namely,
search takes <code>O(mn)</code> time, where <code>m</code> is proportional to the size of the
regex set and <code>n</code> is proportional to the length of the search text.</p>
</div><h2 id='methods'>Methods</h2><h3 class='impl'><span class='in-band'><code>impl <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span><span class='out-of-band'><div class='ghost'></div><a class='srclink' href='../src/regex/re_set.rs.html#107-207' title='goto source code'>[src]</a></span></h3>
<div class='impl-items'><h4 id='method.new' class="method"><span id='new.v' class='invisible'><code>fn <a href='#method.new' class='fnname'>new</a>&lt;I, S&gt;(exprs: I) -&gt; <a class="enum" href="https://doc.rust-lang.org/nightly/core/result/enum.Result.html" title="enum core::result::Result">Result</a>&lt;<a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a>, <a class="enum" href="../regex/enum.Error.html" title="enum regex::Error">Error</a>&gt; <span class="where fmt-newline">where<br>&nbsp;&nbsp;&nbsp;&nbsp;S: <a class="trait" href="https://doc.rust-lang.org/nightly/core/convert/trait.AsRef.html" title="trait core::convert::AsRef">AsRef</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.str.html">str</a>&gt;,<br>&nbsp;&nbsp;&nbsp;&nbsp;I: <a class="trait" href="https://doc.rust-lang.org/nightly/core/iter/traits/trait.IntoIterator.html" title="trait core::iter::traits::IntoIterator">IntoIterator</a>&lt;Item = S&gt;,&nbsp;</span></code></span></h4>
<div class='docblock'><p>Create a new regex set with the given regular expressions.</p>
<p>This takes an iterator of <code>S</code>, where <code>S</code> is something that can produce
a <code>&amp;str</code>. If any of the strings in the iterator are not valid regular
expressions, then an error is returned.</p>
<h1 id='example-1' class='section-header'><a href='#example-1'>Example</a></h1>
<p>Create a new regex set from an iterator of strings:</p>
<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[<span class="string">r&quot;\w+&quot;</span>, <span class="string">r&quot;\d+&quot;</span>]).<span class="ident">unwrap</span>();
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">&quot;foo&quot;</span>));</pre>
</div><h4 id='method.is_match' class="method"><span id='is_match.v' class='invisible'><code>fn <a href='#method.is_match' class='fnname'>is_match</a>(&amp;self, text: &amp;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.str.html">str</a>) -&gt; <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.bool.html">bool</a></code></span></h4>
<div class='docblock'><p>Returns true if and only if one of the regexes in this set matches
the text given.</p>
<p>This method should be preferred if you only need to test whether any
of the regexes in the set should match, but don&#39;t care about <em>which</em>
regexes matched. This is because the underlying matching engine will
quit immediately after seeing the first match instead of continuing to
find all matches.</p>
<p>Note that as with searches using <code>Regex</code>, the expression is unanchored
by default. That is, if the regex does not start with <code>^</code> or <code>\A</code>, or
end with <code>$</code> or <code>\z</code>, then it is permitted to match anywhere in the
text.</p>
<h1 id='example-2' class='section-header'><a href='#example-2'>Example</a></h1>
<p>Tests whether a set matches some text:</p>
<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[<span class="string">r&quot;\w+&quot;</span>, <span class="string">r&quot;\d+&quot;</span>]).<span class="ident">unwrap</span>();
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">&quot;foo&quot;</span>));
<span class="macro">assert</span><span class="macro">!</span>(<span class="op">!</span><span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">&quot;&quot;</span>));</pre>
</div><h4 id='method.matches' class="method"><span id='matches.v' class='invisible'><code>fn <a href='#method.matches' class='fnname'>matches</a>(&amp;self, text: &amp;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.str.html">str</a>) -&gt; <a class="struct" href="../regex/struct.SetMatches.html" title="struct regex::SetMatches">SetMatches</a></code></span></h4>
<div class='docblock'><p>Returns the set of regular expressions that match in the given text.</p>
<p>The set returned contains the index of each regular expression that
matches in the given text. The index is in correspondence with the
order of regular expressions given to <code>RegexSet</code>&#39;s constructor.</p>
<p>The set can also be used to iterate over the matched indices.</p>
<p>Note that as with searches using <code>Regex</code>, the expression is unanchored
by default. That is, if the regex does not start with <code>^</code> or <code>\A</code>, or
end with <code>$</code> or <code>\z</code>, then it is permitted to match anywhere in the
text.</p>
<h1 id='example-3' class='section-header'><a href='#example-3'>Example</a></h1>
<p>Tests which regular expressions match the given text:</p>
<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[
<span class="string">r&quot;\w+&quot;</span>,
<span class="string">r&quot;\d+&quot;</span>,
<span class="string">r&quot;\pL+&quot;</span>,
<span class="string">r&quot;foo&quot;</span>,
<span class="string">r&quot;bar&quot;</span>,
<span class="string">r&quot;barfoo&quot;</span>,
<span class="string">r&quot;foobar&quot;</span>,
]).<span class="ident">unwrap</span>();
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">&quot;foobar&quot;</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">matches</span>, <span class="macro">vec</span><span class="macro">!</span>[<span class="number">0</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">6</span>]);
<span class="comment">// You can also test whether a particular regex matched:</span>
<span class="kw">let</span> <span class="ident">matches</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">&quot;foobar&quot;</span>);
<span class="macro">assert</span><span class="macro">!</span>(<span class="op">!</span><span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">5</span>));
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">6</span>));</pre>
</div><h4 id='method.len' class="method"><span id='len.v' class='invisible'><code>fn <a href='#method.len' class='fnname'>len</a>(&amp;self) -&gt; <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a></code></span></h4>
<div class='docblock'><p>Returns the total number of regular expressions in this set.</p>
</div></div><h2 id='implementations'>Trait Implementations</h2><h3 class='impl'><span class='in-band'><code>impl <a class="trait" href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html" title="trait core::clone::Clone">Clone</a> for <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span><span class='out-of-band'><div class='ghost'></div><a class='srclink' href='../src/regex/re_set.rs.html#104' title='goto source code'>[src]</a></span></h3>
<div class='impl-items'><h4 id='method.clone' class="method"><span id='clone.v' class='invisible'><code>fn <a href='https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#tymethod.clone' class='fnname'>clone</a>(&amp;self) -&gt; <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span></h4>
<div class='docblock'><p>Returns a copy of the value. <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#tymethod.clone">Read more</a></p>
</div><h4 id='method.clone_from' class="method"><span id='clone_from.v' class='invisible'><code>fn <a href='https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#method.clone_from' class='fnname'>clone_from</a>(&amp;mut self, source: &amp;Self)</code><div class='since' title='Stable since Rust version 1.0.0'>1.0.0</div></span></h4>
<div class='docblock'><p>Performs copy-assignment from <code>source</code>. <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#method.clone_from">Read more</a></p>
</div></div><h3 class='impl'><span class='in-band'><code>impl <a class="trait" href="https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html" title="trait core::fmt::Debug">Debug</a> for <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span><span class='out-of-band'><div class='ghost'></div><a class='srclink' href='../src/regex/re_set.rs.html#331-335' title='goto source code'>[src]</a></span></h3>
<div class='impl-items'><h4 id='method.fmt' class="method"><span id='fmt.v' class='invisible'><code>fn <a href='https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html#tymethod.fmt' class='fnname'>fmt</a>(&amp;self, f: &amp;mut <a class="struct" href="https://doc.rust-lang.org/nightly/core/fmt/struct.Formatter.html" title="struct core::fmt::Formatter">Formatter</a>) -&gt; <a class="type" href="https://doc.rust-lang.org/nightly/core/fmt/type.Result.html" title="type core::fmt::Result">Result</a></code></span></h4>
<div class='docblock'><p>Formats the value using the given formatter.</p>
</div></div></section>
<section id='search' class="content hidden"></section>
<section class="footer"></section>
<aside id="help" class="hidden">
<div>
<h1 class="hidden">Help</h1>
<div class="shortcuts">
<h2>Keyboard Shortcuts</h2>
<dl>
<dt>?</dt>
<dd>Show this help dialog</dd>
<dt>S</dt>
<dd>Focus the search field</dd>
<dt>&larrb;</dt>
<dd>Move up in search results</dd>
<dt>&rarrb;</dt>
<dd>Move down in search results</dd>
<dt>&#9166;</dt>
<dd>Go to active search result</dd>
<dt>+</dt>
<dd>Collapse/expand all sections</dd>
</dl>
</div>
<div class="infos">
<h2>Search Tricks</h2>
<p>
Prefix searches with a type followed by a colon (e.g.
<code>fn:</code>) to restrict the search to a given type.
</p>
<p>
Accepted types are: <code>fn</code>, <code>mod</code>,
<code>struct</code>, <code>enum</code>,
<code>trait</code>, <code>type</code>, <code>macro</code>,
and <code>const</code>.
</p>
<p>
Search functions by type signature (e.g.
<code>vec -> usize</code> or <code>* -> vec</code>)
</p>
</div>
</div>
</aside>
<script>
window.rootPath = "../";
window.currentCrate = "regex";
</script>
<script src="../main.js"></script>
<script defer src="../search-index.js"></script>
</body>
</html>