Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacement of internal RegEx with PCRE2 #10148

Merged
merged 1 commit into from
Aug 31, 2017
Merged

Replacement of internal RegEx with PCRE2 #10148

merged 1 commit into from
Aug 31, 2017

Conversation

leezh
Copy link
Contributor

@leezh leezh commented Aug 7, 2017

Turns out PCRE2 was not as heavy as I thought and was fairly easy to integrate into Godot. This is the minimal feature-parity-ish commit. Plans to introduce serialised compilations and JIT speed-ups would come separately later.

That said, discussion is needed for modules/regex/SCsub. Due to how wchar_t can be 32 bit on *nix but 16-bit on Windows, it is necessary to know at compile-time its width. Unfortunately, I'm not sure the tests in modules/regex/SCsub is the right place (or even the right method).

If someone more familiar with SCons can help me out, that'd be great.

Notable minor breaking changes:

  • The pattern and replacement matching behaviour has been changed purely due to the nature of switching to a standards-compliant library.
  • One mistake in the previous behaviour was that named groups didn't have a number. This has been corrected.
  • As names are actually just an alias of numbered groups, RegExMatch::get_name_dict() has been removed. Also, for clarity, RegExMatch::get_group_array() has been renamed.
  • Due the lack of a suitable equivalent in PCRE2, RegExMatch::expand() was removed.

@ghost
Copy link

ghost commented Aug 7, 2017

re2 or hyperscan perform much better than pcre, so are there any good reasons that I'm missing for choosing pcre over them? Its performance with JIT becomes comparable to them, apparently, but JIT may not be available in all platforms. Also re2 has a linear-time processing (which comes at the cost of dropping back-references).

@leezh
Copy link
Contributor Author

leezh commented Aug 7, 2017

The biggest motivation for picking PCRE is because I can easily choose between 16-bit character or 32-bit character strings at compile time. In doing so I can get it to play nice with Godot's built-in String type rather than converting back and forth between UTF-8.

@ghost
Copy link

ghost commented Aug 7, 2017

Ah, does this String class use UCS-4?

@leezh
Copy link
Contributor Author

leezh commented Aug 7, 2017 via email

@ghost
Copy link

ghost commented Aug 8, 2017

I'm guessing godot's string class isn't using utf8 for historical reasons; maybe someone knows if there are any plans for a switch?

@leezh
Copy link
Contributor Author

leezh commented Aug 8, 2017 via email

}

void RegEx::_bind_methods() {

ClassDB::bind_method(D_METHOD("clear"), &RegEx::clear);
ClassDB::bind_method(D_METHOD("compile", "pattern"), &RegEx::compile);
ClassDB::bind_method(D_METHOD("search:RegExMatch", "text", "start", "end"), &RegEx::search, DEFVAL(0), DEFVAL(-1));
ClassDB::bind_method(D_METHOD("sub", "text", "replacement", "all", "start", "end"), &RegEx::sub, DEFVAL(false), DEFVAL(0), DEFVAL(-1));
ClassDB::bind_method(D_METHOD("search:RegExMatch", "subject", "offset", "end"), &RegEx::search, DEFVAL(0), DEFVAL(-1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting today, the :type suffix must not be used anymore, except for virtual method binds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Also fixed the merge conflicts.

@akien-mga
Copy link
Member

There seems to be a linking issue on Windows (at least old mingw version on Ubuntu 14.04): https://travis-ci.org/godotengine/godot/jobs/263722203

@leezh
Copy link
Contributor Author

leezh commented Aug 14, 2017

Yeah, realised it was missing the -DPCRE_STATIC flag and a few others. It should be fixed now.

env_pcre2.Append(CPPFLAGS=thirdparty_flags)
env_pcre2.Append(CPPFLAGS=["-DPCRE2_CODE_UNIT_WIDTH=" + width])

if (env['builtin_pcre2'] != 'no'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the above should go inside the if actually, when building against the system pcre2 (typically on Linux), we only want to build the module files themselves, not reference anything in thirdparty_dir.

Could you also add the platform/x11/detect.py detection of pcre2 with pkgconfig, as done for other opt-out thirdparty deps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If you're not on Linux, I could likely add it myself after merge if need be)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've moved as much as possible into the if block to allow external linking.

On X11, builtin_pcre2 is still by default set to yes, as it's not available on Ubuntu Trusty (and thus Travis-CI). Is there a way to automatically select yes/no if the native lib is not-found/found?

The pattern and replacement matching behaviour has been changed purely
due to the nature of switching to a standards-compliant library.

One mistake in the previous behaviour was that named groups didn't have
a number. This has been corrected.

As names are actually just an alias of numbered groups,
RegExMatch::get_name_dict() is now get_names() and is a dict
referring to the group number it represents.

Duplicate names are enabled and the with the first matching instance
used.

Due the lack of a suitable equivalent in PCRE2, RegExMatch::expand() was
removed.
@akien-mga akien-mga merged commit 0cee288 into godotengine:master Aug 31, 2017
@leezh leezh deleted the pcre2 branch September 1, 2017 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants