<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="rustdoc">
<meta name="description" content="API documentation for the Rust `RegexSet` struct in crate `regex`.">
<meta name="keywords" content="rust, rustlang, rust-lang, RegexSet">

<title>regex::RegexSet - Rust</title>

<link rel="stylesheet" type="text/css" href="../normalize.css">
<link rel="stylesheet" type="text/css" href="../rustdoc.css">
<link rel="stylesheet" type="text/css" href="../main.css">

<link rel="shortcut icon" href="https://www.rust-lang.org/favicon.ico">

</head>
<body class="rustdoc struct">
<!--[if lte IE 8]>
<div class="warning">
This old browser is unsupported and will most likely display funky
things.
</div>
<![endif]-->

<nav class="sidebar">
<a href='../regex/index.html'><img src='https://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png' alt='logo' width='100'></a>
<p class='location'>Struct RegexSet</p><div class="block items"><ul><li><a href="#methods">Methods</a></li><li><a href="#implementations">Trait Implementations</a></li></ul></div><p class='location'><a href='index.html'>regex</a></p><script>window.sidebarCurrent = {name: 'RegexSet', ty: 'struct', relpath: ''};</script><script defer src="sidebar-items.js"></script>
</nav>

<nav class="sub">
<form class="search-form js-only">
<div class="search-container">
<input class="search-input" name="search"
autocomplete="off"
placeholder="Click or press ‘S’ to search, ‘?’ for more options…"
type="search">
</div>
</form>
</nav>

<section id='main' class="content">
<h1 class='fqn'><span class='in-band'>Struct <a href='index.html'>regex</a>::<wbr><a class="struct" href=''>RegexSet</a></span><span class='out-of-band'><span id='render-detail'>
<a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">
[<span class='inner'>−</span>]
</a>
</span><a class='srclink' href='../src/regex/re_set.rs.html#105' title='goto source code'>[src]</a></span></h1>
<pre class='rust struct'>pub struct RegexSet(_);</pre><div class='docblock'><p>Match multiple (possibly overlapping) regular expressions in a single scan.</p>

<p>A regex set corresponds to the union of two or more regular expressions.
That is, a regex set will match text where at least one of its
constituent regular expressions matches. A regex set as it is formulated here
provides a touch more power: it will also report <em>which</em> regular
expressions in the set match. Indeed, this is the key difference between
regex sets and a single <code>Regex</code> with many alternates, since with a
single <code>Regex</code> only one alternate can match at a time.</p>

<p>For example, consider regular expressions to match email addresses and
domains: <code>[a-z]+@[a-z]+\.(com|org|net)</code> and <code>[a-z]+\.(com|org|net)</code>. If a
regex set is constructed from those regexes, then searching the text
<code>foo@example.com</code> will report both regexes as matching. Of course, one
could accomplish this by compiling each regex on its own and doing two
searches over the text. The key advantage of using a regex set is that it
will report the matching regexes using a <em>single pass through the text</em>.
If one has hundreds or thousands of regexes to match repeatedly (like a URL
router for a complex web application or a user agent matcher), then a regex
set can realize huge performance gains.</p>

<h1 id='example' class='section-header'><a href='#example'>Example</a></h1>
<p>This shows how the above two regexes (for matching email addresses and
domains) might work:</p>

<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[
    <span class="string">r"[a-z]+@[a-z]+\.(com|org|net)"</span>,
    <span class="string">r"[a-z]+\.(com|org|net)"</span>,
]).<span class="ident">unwrap</span>();

<span class="comment">// Ask whether any regexes in the set match.</span>
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">"foo@example.com"</span>));

<span class="comment">// Identify which regexes in the set match.</span>
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"foo@example.com"</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="macro">vec</span><span class="macro">!</span>[<span class="number">0</span>, <span class="number">1</span>], <span class="ident">matches</span>);

<span class="comment">// Try again, but with text that only matches one of the regexes.</span>
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"example.com"</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="macro">vec</span><span class="macro">!</span>[<span class="number">1</span>], <span class="ident">matches</span>);

<span class="comment">// Try again, but with text that doesn't match any regex in the set.</span>
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"example"</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">matches</span>.<span class="ident">is_empty</span>());</pre>

<p>Note that it would be possible to adapt the above example to use <code>Regex</code>
with an expression like:</p>

<pre class="rust rust-example-rendered">
(<span class="question-mark">?</span><span class="ident">P</span><span class="op">&lt;</span><span class="ident">email</span><span class="op">&gt;</span>[<span class="ident">a</span><span class="op">-</span><span class="ident">z</span>]<span class="op">+</span>@(<span class="question-mark">?</span><span class="ident">P</span><span class="op">&lt;</span><span class="ident">email_domain</span><span class="op">&gt;</span>[<span class="ident">a</span><span class="op">-</span><span class="ident">z</span>]<span class="op">+</span>[.](<span class="ident">com</span><span class="op">|</span><span class="ident">org</span><span class="op">|</span><span class="ident">net</span>)))<span class="op">|</span>(<span class="question-mark">?</span><span class="ident">P</span><span class="op">&lt;</span><span class="ident">domain</span><span class="op">&gt;</span>[<span class="ident">a</span><span class="op">-</span><span class="ident">z</span>]<span class="op">+</span>[.](<span class="ident">com</span><span class="op">|</span><span class="ident">org</span><span class="op">|</span><span class="ident">net</span>))</pre>

<p>After a match, one could then inspect the capture groups to figure out
which alternates matched. The problem is that this approach is hard to
scale when there are many regexes, since the overlap between alternates
isn't always easy to reason about.</p>

<h1 id='limitations' class='section-header'><a href='#limitations'>Limitations</a></h1>
<p>Regex sets are limited to answering the following two questions:</p>

<ol>
<li>Does any regex in the set match?</li>
<li>If so, which regexes in the set match?</li>
</ol>

<p>As with the main <code>Regex</code> type, it is cheaper to ask (1) instead of (2)
since the matching engines can stop after the first match is found.</p>

<p>Other features like finding the location of successive matches or their
sub-captures aren't supported. If you need this functionality, the
recommended approach is to compile each regex in the set independently and
selectively match them based on which regexes in the set matched.</p>

<h1 id='performance' class='section-header'><a href='#performance'>Performance</a></h1>
<p>A <code>RegexSet</code> has the same performance characteristics as <code>Regex</code>. Namely,
search takes <code>O(mn)</code> time, where <code>m</code> is proportional to the size of the
regex set and <code>n</code> is proportional to the length of the search text.</p>
</div><h2 id='methods'>Methods</h2><h3 class='impl'><span class='in-band'><code>impl <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span><span class='out-of-band'><div class='ghost'></div><a class='srclink' href='../src/regex/re_set.rs.html#107-207' title='goto source code'>[src]</a></span></h3>
<div class='impl-items'><h4 id='method.new' class="method"><span id='new.v' class='invisible'><code>fn <a href='#method.new' class='fnname'>new</a>&lt;I, S&gt;(exprs: I) -&gt; <a class="enum" href="https://doc.rust-lang.org/nightly/core/result/enum.Result.html" title="enum core::result::Result">Result</a>&lt;<a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a>, <a class="enum" href="../regex/enum.Error.html" title="enum regex::Error">Error</a>&gt; <span class="where fmt-newline">where<br> S: <a class="trait" href="https://doc.rust-lang.org/nightly/core/convert/trait.AsRef.html" title="trait core::convert::AsRef">AsRef</a>&lt;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.str.html">str</a>&gt;,<br> I: <a class="trait" href="https://doc.rust-lang.org/nightly/core/iter/traits/trait.IntoIterator.html" title="trait core::iter::traits::IntoIterator">IntoIterator</a>&lt;Item = S&gt;, </span></code></span></h4>
<div class='docblock'><p>Create a new regex set with the given regular expressions.</p>

<p>This takes an iterator of <code>S</code>, where <code>S</code> is something that can produce
a <code>&amp;str</code>. If any of the strings in the iterator are not valid regular
expressions, then an error is returned.</p>

<h1 id='example-1' class='section-header'><a href='#example-1'>Example</a></h1>
<p>Create a new regex set from an iterator of strings:</p>

<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[<span class="string">r"\w+"</span>, <span class="string">r"\d+"</span>]).<span class="ident">unwrap</span>();
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">"foo"</span>));</pre>
</div><h4 id='method.is_match' class="method"><span id='is_match.v' class='invisible'><code>fn <a href='#method.is_match' class='fnname'>is_match</a>(&amp;self, text: &amp;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.str.html">str</a>) -&gt; <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.bool.html">bool</a></code></span></h4>
<div class='docblock'><p>Returns true if and only if at least one of the regexes in this set
matches the text given.</p>

<p>This method should be preferred if you only need to test whether any
of the regexes in the set match, but don't care about <em>which</em>
regexes matched. This is because the underlying matching engine will
quit immediately after seeing the first match instead of continuing to
find all matches.</p>

<p>Note that as with searches using <code>Regex</code>, the expression is unanchored
by default. That is, if the regex does not start with <code>^</code> or <code>\A</code>, or
end with <code>$</code> or <code>\z</code>, then it is permitted to match anywhere in the
text.</p>

<h1 id='example-2' class='section-header'><a href='#example-2'>Example</a></h1>
<p>Tests whether a set matches some text:</p>

<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[<span class="string">r"\w+"</span>, <span class="string">r"\d+"</span>]).<span class="ident">unwrap</span>();
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">"foo"</span>));
<span class="macro">assert</span><span class="macro">!</span>(<span class="op">!</span><span class="ident">set</span>.<span class="ident">is_match</span>(<span class="string">"☃"</span>));</pre>
</div><h4 id='method.matches' class="method"><span id='matches.v' class='invisible'><code>fn <a href='#method.matches' class='fnname'>matches</a>(&amp;self, text: &amp;<a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.str.html">str</a>) -&gt; <a class="struct" href="../regex/struct.SetMatches.html" title="struct regex::SetMatches">SetMatches</a></code></span></h4>
<div class='docblock'><p>Returns the set of regular expressions that match in the given text.</p>

<p>The set returned contains the index of each regular expression that
matches in the given text. The index is in correspondence with the
order of regular expressions given to <code>RegexSet</code>'s constructor.</p>

<p>The set can also be used to iterate over the matched indices.</p>

<p>Note that as with searches using <code>Regex</code>, the expression is unanchored
by default. That is, if the regex does not start with <code>^</code> or <code>\A</code>, or
end with <code>$</code> or <code>\z</code>, then it is permitted to match anywhere in the
text.</p>

<h1 id='example-3' class='section-header'><a href='#example-3'>Example</a></h1>
<p>Tests which regular expressions match the given text:</p>

<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&amp;</span>[
    <span class="string">r"\w+"</span>,
    <span class="string">r"\d+"</span>,
    <span class="string">r"\pL+"</span>,
    <span class="string">r"foo"</span>,
    <span class="string">r"bar"</span>,
    <span class="string">r"barfoo"</span>,
    <span class="string">r"foobar"</span>,
]).<span class="ident">unwrap</span>();
<span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"foobar"</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">matches</span>, <span class="macro">vec</span><span class="macro">!</span>[<span class="number">0</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">6</span>]);

<span class="comment">// You can also test whether a particular regex matched:</span>
<span class="kw">let</span> <span class="ident">matches</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"foobar"</span>);
<span class="macro">assert</span><span class="macro">!</span>(<span class="op">!</span><span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">5</span>));
<span class="macro">assert</span><span class="macro">!</span>(<span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">6</span>));</pre>
</div><h4 id='method.len' class="method"><span id='len.v' class='invisible'><code>fn <a href='#method.len' class='fnname'>len</a>(&amp;self) -&gt; <a class="primitive" href="https://doc.rust-lang.org/nightly/std/primitive.usize.html">usize</a></code></span></h4>
<div class='docblock'><p>Returns the total number of regular expressions in this set.</p>
</div></div><h2 id='implementations'>Trait Implementations</h2><h3 class='impl'><span class='in-band'><code>impl <a class="trait" href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html" title="trait core::clone::Clone">Clone</a> for <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span><span class='out-of-band'><div class='ghost'></div><a class='srclink' href='../src/regex/re_set.rs.html#104' title='goto source code'>[src]</a></span></h3>
<div class='impl-items'><h4 id='method.clone' class="method"><span id='clone.v' class='invisible'><code>fn <a href='https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#tymethod.clone' class='fnname'>clone</a>(&amp;self) -&gt; <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span></h4>
<div class='docblock'><p>Returns a copy of the value. <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#tymethod.clone">Read more</a></p>
</div><h4 id='method.clone_from' class="method"><span id='clone_from.v' class='invisible'><code>fn <a href='https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#method.clone_from' class='fnname'>clone_from</a>(&amp;mut self, source: &amp;Self)</code><div class='since' title='Stable since Rust version 1.0.0'>1.0.0</div></span></h4>
<div class='docblock'><p>Performs copy-assignment from <code>source</code>. <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#method.clone_from">Read more</a></p>
</div></div><h3 class='impl'><span class='in-band'><code>impl <a class="trait" href="https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html" title="trait core::fmt::Debug">Debug</a> for <a class="struct" href="../regex/struct.RegexSet.html" title="struct regex::RegexSet">RegexSet</a></code></span><span class='out-of-band'><div class='ghost'></div><a class='srclink' href='../src/regex/re_set.rs.html#331-335' title='goto source code'>[src]</a></span></h3>
<div class='impl-items'><h4 id='method.fmt' class="method"><span id='fmt.v' class='invisible'><code>fn <a href='https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html#tymethod.fmt' class='fnname'>fmt</a>(&amp;self, f: &amp;mut <a class="struct" href="https://doc.rust-lang.org/nightly/core/fmt/struct.Formatter.html" title="struct core::fmt::Formatter">Formatter</a>) -&gt; <a class="type" href="https://doc.rust-lang.org/nightly/core/fmt/type.Result.html" title="type core::fmt::Result">Result</a></code></span></h4>
<div class='docblock'><p>Formats the value using the given formatter.</p>
</div></div></section>

<section id='search' class="content hidden"></section>

<section class="footer"></section>

<aside id="help" class="hidden">
<div>
<h1 class="hidden">Help</h1>

<div class="shortcuts">
<h2>Keyboard Shortcuts</h2>

<dl>
<dt>?</dt>
<dd>Show this help dialog</dd>
<dt>S</dt>
<dd>Focus the search field</dd>
<dt>⇤</dt>
<dd>Move up in search results</dd>
<dt>⇥</dt>
<dd>Move down in search results</dd>
<dt>⏎</dt>
<dd>Go to active search result</dd>
<dt>+</dt>
<dd>Collapse/expand all sections</dd>
</dl>
</div>

<div class="infos">
<h2>Search Tricks</h2>

<p>
Prefix searches with a type followed by a colon (e.g.
<code>fn:</code>) to restrict the search to a given type.
</p>

<p>
Accepted types are: <code>fn</code>, <code>mod</code>,
<code>struct</code>, <code>enum</code>,
<code>trait</code>, <code>type</code>, <code>macro</code>,
and <code>const</code>.
</p>

<p>
Search functions by type signature (e.g.
<code>vec -&gt; usize</code> or <code>* -&gt; vec</code>)
</p>
</div>
</div>
</aside>

<script>
window.rootPath = "../";
window.currentCrate = "regex";
</script>
<script src="../main.js"></script>
<script defer src="../search-index.js"></script>
</body>
</html>