Partial matches #127
Comments
That's essentially what I was thinking when I wrote "hacking the DFA execution engine to handle input strings in chunks" in #126 (comment). In theory, you'd just have to pass around a … How would this fit into the current interface and implementation? I'm struggling to imagine how the public API should look, for example. And if the caller wants to know where the regular expression matched, not just whether it matched, then you'd have to run the DFA backwards in order to find the start of the match. What if you need to read previous chunks? What if the caller doesn't have previous chunks? Then there's the whole problem of elegantly plumbing and wiring the needful. Also, because the DFA execution engine permits concurrent use of a … Sorry, this isn't going to happen in the foreseeable future. :( (Ping, @rsc and @BurntSushi.)
Agreed.
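To make the chunking idea concrete, here is a toy sketch. It is not RE2's DFA, and every name in it is hypothetical: a hand-built DFA that accepts any input containing the literal `abc`, fed one chunk at a time. The only thing carried from one chunk to the next is the current state, which is roughly what passing state across chunk boundaries would amount to. It deliberately sidesteps the harder questions raised above, such as finding where the match starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy DFA that accepts any input containing "abc".
   It is NOT RE2's DFA; it only shows that a DFA can consume input in
   chunks as long as the caller keeps the current state between calls. */
enum dfa_state { START, SAW_A, SAW_AB, MATCHED };

/* One transition of the DFA. */
static enum dfa_state step(enum dfa_state s, char c) {
    switch (s) {
    case START:   return c == 'a' ? SAW_A : START;
    case SAW_A:   return c == 'b' ? SAW_AB : (c == 'a' ? SAW_A : START);
    case SAW_AB:  return c == 'c' ? MATCHED : (c == 'a' ? SAW_A : START);
    case MATCHED: break;
    }
    return MATCHED; /* the accepting state is absorbing */
}

/* Consumes one chunk starting from state s and returns the state to pass
   to the next call; stops early once a match is known. */
static enum dfa_state feed_chunk(enum dfa_state s, const char *chunk, size_t len) {
    assert(chunk != NULL || len == 0);
    for (size_t i = 0; i < len && s != MATCHED; i++)
        s = step(s, chunk[i]);
    return s;
}
```

Feeding `"xxab"` and then `"cyy"` ends in `MATCHED`, even though neither chunk contains `abc` on its own.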
Thanks for the detailed and thoughtful reply. Ultimately, my goal is to be able to use RE2 in Emacs; it already works nicely for strings, but not for buffer contents, as Emacs stores buffer text in a gap buffer. One can think of this as essentially two strings, and indeed in the past glibc had a … Ideally, a solution for this use case would also cover similar ones, like ropes or piece tables. There are multiple possible interfaces for this, but here are the three that I can think of:
It looks like 1 would be pretty complex to implement, and 2 would be prohibitively costly, so 3 may be the only option left?
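A minimal sketch of the "two strings" view, assuming a simplified gap buffer. The struct and function names are hypothetical and do not reflect Emacs's actual internals. The text before the gap and the text after it are each contiguous, so a search interface that accepted two (pointer, length) pairs could scan the buffer without first moving the gap or copying the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified gap buffer: the logical text occupies
   data[0, gap_start) followed by data[gap_end, cap); the bytes in
   between are the unused gap. */
struct gap_buffer {
    char *data;
    size_t cap;
    size_t gap_start;
    size_t gap_end;
};

/* The buffer's logical text, viewed as two contiguous strings. */
struct two_strings {
    const char *s1; size_t len1;
    const char *s2; size_t len2;
};

static struct two_strings gap_buffer_halves(const struct gap_buffer *gb) {
    assert(gb->gap_start <= gb->gap_end && gb->gap_end <= gb->cap);
    struct two_strings v = {
        gb->data, gb->gap_start,                 /* text before the gap */
        gb->data + gb->gap_end, gb->cap - gb->gap_end /* text after it */
    };
    return v;
}

/* Returns the byte at logical position pos, skipping over the gap. */
static char gap_buffer_at(const struct gap_buffer *gb, size_t pos) {
    return pos < gb->gap_start ? gb->data[pos]
                               : gb->data[gb->gap_end + (pos - gb->gap_start)];
}
```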
FWIW, glibc still has the …
You'd have to do this for the general case, not just for the specific case of two strings, and you'd have to use Google's … Again, this isn't going to happen in the foreseeable future.
P.S. Having now seen k-takata/Onigmo#83 and kkos/oniguruma#45, I'm rather curious about your ultimate goal.
No, not really:

```c
static int
internal_function
re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
                  int length1, const char *string2, int length2, int start,
                  int range, struct re_registers *regs,
                  int stop, int ret_len)
{
  const char *str;
  int rval;
  int len = length1 + length2;
  char *s = NULL;
  …
  /* Concatenate the strings.  */
  if (length2 > 0)
    if (length1 > 0)
      {
        s = re_malloc (char, len);
        if (BE (s == NULL, 0))
          return -2;
        memcpy (s, string1, length1);
        memcpy (s + length1, string2, length2);
        str = s;
      }
    else
      str = string2;
  else
    str = string1;

  rval = re_search_stub (bufp, str, len, start, range, stop, regs, ret_len);
  re_free (s);
  return rval;
}
```
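For comparison, here is a sketch of the same concatenate-then-search strategy that glibc's `re_search_2_stub` uses, written against the standard POSIX `regcomp`/`regexec` interface instead of glibc internals. The helper name `search_two_strings` is hypothetical. The copy is linear in the total length on every call, which is why this does not really search two strings in place.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the strategy of glibc's re_search_2_stub:
   copy both halves into one temporary buffer, then do an ordinary search.
   Returns the byte offset of the first match, -1 if there is no match,
   or -2 on error. Text containing NUL bytes is not handled, because
   regexec stops at the first NUL. */
static long search_two_strings(const char *pattern,
                               const char *s1, size_t len1,
                               const char *s2, size_t len2) {
    assert(pattern != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;

    /* This copy costs O(len1 + len2) time and memory on every search. */
    char *joined = malloc(len1 + len2 + 1);
    if (joined == NULL) {
        regfree(&re);
        return -2;
    }
    memcpy(joined, s1, len1);
    memcpy(joined + len1, s2, len2);
    joined[len1 + len2] = '\0'; /* regexec expects a NUL-terminated string */

    regmatch_t m;
    int rc = regexec(&re, joined, 1, &m, 0);
    long result = rc == 0 ? (long)m.rm_so : (rc == REG_NOMATCH ? -1 : -2);

    free(joined);
    regfree(&re);
    return result;
}
```

For example, searching for `lo w` across the halves `"hel"` and `"lo world"` returns 3, a match that straddles the boundary. Unlike the glibc stub, this sketch copies even when one half is empty.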
Thanks for taking the time to clarify!
As an alternative to matching on arbitrary input streams, how hard would it be to support partial matches (in PCRE's sense)?
That is, how hard would it be to extend the API so that one could write something along the lines of:
Such a feature would make it easy to use RE2 to search Emacs' gap buffer.
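To illustrate what a partial-match result could give a gap-buffer search, here is a deliberately simplified sketch restricted to a literal needle rather than a regular expression. The enum and function are hypothetical and are not part of RE2 or PCRE. When the haystack ends partway through a possible match, the function reports `PARTIAL_MATCH` and where the candidate starts, so a caller could resume from that position using the text after the gap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type, loosely modelled on the idea behind PCRE's
   partial matching; not an actual RE2 or PCRE API. */
enum match_result { NO_MATCH, PARTIAL_MATCH, FULL_MATCH };

/* Searches hay for a literal needle. A full match is preferred; if none
   exists but the haystack ends with a proper prefix of the needle,
   reports PARTIAL_MATCH. *pos receives the candidate's start offset. */
static enum match_result find_literal(const char *hay, size_t hay_len,
                                      const char *needle, size_t needle_len,
                                      size_t *pos) {
    assert(needle_len > 0 && pos != NULL);
    for (size_t i = 0; i < hay_len; i++) {
        size_t avail = hay_len - i;
        /* Near the end, only compare the bytes that are available. */
        size_t n = avail < needle_len ? avail : needle_len;
        if (memcmp(hay + i, needle, n) == 0) {
            *pos = i;
            return n == needle_len ? FULL_MATCH : PARTIAL_MATCH;
        }
    }
    return NO_MATCH;
}
```

Because every position with room for the whole needle is checked before any truncated one, a full match always wins over a partial one. Searching `"hello wor"` for `"world"` yields `PARTIAL_MATCH` at offset 6, telling the caller to continue from there into the second half of the buffer.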