Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: expose 'with_flags' to allow compilation of unicode patterns #51

Closed
sirosen opened this issue Nov 2, 2024 · 3 comments · Fixed by #52
Closed

Feature request: expose 'with_flags' to allow compilation of unicode patterns #51

sirosen opened this issue Nov 2, 2024 · 3 comments · Fixed by #52
Labels
enhancement New feature or request

Comments

@sirosen
Copy link
Contributor

sirosen commented Nov 2, 2024

Context

I got to spend a few hours working on getting regress support for pattern into check-jsonschema, and it works great! ... On version 0.4.5! 😅

I don't fully understand the issue yet (need more time to research), but I think that I need to pass the "u" flag explicitly to Regress.

Purely if you're curious, my current work is here and is passing in CI with v0.4.5 pinned: https://github.com/python-jsonschema/check-jsonschema/compare/use-regress-for-patterns

MRE

The following python sample demonstrates a change in behavior since 0.4.5:

import regress
pattern_str = '^\\p{L}\\p{Z}\\p{N}$'
pattern = regress.Regex(pattern_str)
print(pattern.find("a 0"))

On 0.4.5 I get a match, and on 2024.8.1 I get None.

To save anyone the effort of looking up those character classes, roughly: L is "letters", Z is "spaces", and N is "numbers".

Change proposal

I'm not sure if the API in Python needs to reflect the regress crate perfectly.
I was thinking that the ideal might be...

pattern = regress.Regex(pattern_str, flags=regress.U)

So that we nicely match the re module's API.

I'd like to work on this when I can throw some cycles at this, if this seems like a good path.

@Julian
Copy link
Member

Julian commented Nov 4, 2024

That API looks reasonable to me! And your branch in check-jsonschema looks great!

@Julian Julian added the enhancement New feature or request label Nov 4, 2024
@sirosen
Copy link
Contributor Author

sirosen commented Nov 4, 2024

I've been putting together a branch with this and ran into quite a bit of difficulty trying to expose a bitwise-or friendly flag enum. I'm sure a PyO3 expert could solve it, but I'm somewhat stumped.

Even if I can figure it out, I feel like it adds disproportionately much new rust code just to expose a minor nicety.
For contrast, I had it working in no time just accepting flags as a string.

So I'm now proposing:

pattern = regress.Regex(pattern_str, flags="u")

Which would also accept flags=None (default) and flags="iu" (and whatever other string flags regress supports).

I'll have a PR up for this when I get back to my desk, later today, and I might share some of what I tried and struggled with back here.

sirosen added a commit to sirosen/regress that referenced this issue Nov 4, 2024
resolves crate-py#51

There is no support for `Flags` structs as provided by the `regress`
crate. But a string can be passed down and is converted to Flags by
the `with_flags` constructor.
Exposing flags as a struct/enum adds significant complexity deemd not
worth the addition code at this time. (See crate-py#51 for more details.)

A python test verifies that the alternate constructor pattern works
and provides support for the `"u"` flag.
sirosen added a commit to sirosen/regress that referenced this issue Nov 4, 2024
resolves crate-py#51

There is no support for `Flags` structs as provided by the `regress`
crate. But a string can be passed down and is converted to Flags by
the `with_flags` constructor.
Exposing flags as a struct/enum adds significant complexity deemed
not worth the additional code at this time.
(See crate-py#51 for more details.)

A python test verifies that the alternate constructor pattern works
and provides support for the `"u"` flag.
@sirosen
Copy link
Contributor Author

sirosen commented Nov 4, 2024

Okay, I put up #52 with my initial solution. Here's a patch on top of it which... didn't quite work:

flags-enum.diff
diff --git a/src/lib.rs b/src/lib.rs
index e2df317..ca0f1a3 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,5 +1,5 @@
 use pyo3::{create_exception, prelude::*, types::PySlice};
-use regress::{Match, Range, Regex};
+use regress::{Flags, Match, Range, Regex};
 
 create_exception!(regress, RegressError, pyo3::exceptions::PyException);
 
@@ -13,12 +13,15 @@ struct RegexPy {
 impl RegexPy {
     #[new]
     #[pyo3(signature = (value, flags=None))]
-    fn init(value: &str, flags: Option<&str>) -> PyResult<Self> {
+    fn init(value: &str, flags: Option<&FlagTypes>) -> PyResult<Self> {
         match flags {
-            Some(f) => match Regex::with_flags(value, f) {
-                Ok(inner) => Ok(RegexPy { inner }),
-                Err(e) => Err(RegressError::new_err(e.to_string())),
-            },
+            Some(f) => {
+                let regress_flags: Flags = Flags::from(f);
+                match Regex::with_flags(value, regress_flags) {
+                    Ok(inner) => Ok(RegexPy { inner }),
+                    Err(e) => Err(RegressError::new_err(e.to_string())),
+                }
+            }
             None => match Regex::new(value) {
                 Ok(inner) => Ok(RegexPy { inner }),
                 Err(e) => Err(RegressError::new_err(e.to_string())),
@@ -38,6 +41,42 @@ impl RegexPy {
     }
 }
 
+#[repr(transparent)]
+#[pyclass(name = "Flags", module = "regress", frozen)]
+#[derive(Clone)]
+struct FlagsPy {
+    inner: Flags,
+}
+
+#[pymethods]
+impl FlagsPy {
+    #[new]
+    fn init(value: &str) -> Self {
+        FlagsPy {
+            inner: Flags::from(value),
+        }
+    }
+}
+
+#[derive(FromPyObject)]
+enum FlagTypes {
+    #[pyo3(annotation = "regress.Flags")]
+    FlagsPy(FlagsPy),
+    #[pyo3(transparent, annotation = "str")]
+    String(String),
+}
+
+impl From<&FlagTypes> for Flags {
+    // convert any Flag type (FlagsPy or str) to regress Flags
+    #[inline]
+    fn from(f: &FlagTypes) -> Self {
+        match f {
+            FlagTypes::FlagsPy(flags_py) => flags_py.inner,
+            FlagTypes::String(s) => Flags::from(s.as_str()),
+        }
+    }
+}
+
 #[repr(transparent)]
 #[pyclass(name = "Match", module = "regress", frozen)]
 struct MatchPy {
@@ -70,6 +109,7 @@ impl MatchPy {
 fn regress_py(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
     m.add_class::<MatchPy>()?;
     m.add_class::<RegexPy>()?;
+    m.add_class::<FlagsPy>()?;
     m.add("RegressError", py.get_type_bound::<RegressError>())?;
     Ok(())
 }
diff --git a/tests/test_regress.py b/tests/test_regress.py
index c8e4f1d..a146425 100644
--- a/tests/test_regress.py
+++ b/tests/test_regress.py
@@ -72,3 +72,17 @@ def test_with_flags():
     flagless_match = flagless_regex.find("a 0")
     assert match is not None
     assert flagless_match is None
+
+
+def test_with_flags_struct():
+    # same as the above flags test, but using the struct variant
+    # "L" for letters, "Z" for spaces, "N" for numerics
+    pattern = r"^\p{L}\p{Z}\p{N}$"
+
+    regex = regress.Regex(pattern, flags=regress.Flags("u"))
+    flagless_regex = regress.Regex(pattern)
+
+    match = regex.find("a 0")
+    flagless_match = flagless_regex.find("a 0")
+    assert match is not None
+    assert flagless_match is None

Basically, I tried to add a FlagsPy construct, which would be exposed as regress.Flags. This was meant to be my first step towards a more sophisticated implementation. But it somewhat bombs out due to the way I defined the FlagTypes enum -- trying to put that into the init signature somehow blows up on me, with errors to the tune of

16  |     fn init(value: &str, flags: Option<&FlagTypes>) -> PyResult<Self> {
    |                                 ^^^^^^ the trait `PyClass` is not implemented for `&FlagTypes`

This surprised me. I was trying to make that enum express the Python types.UnionType of str | regress.Flags. I'm getting a bit out of my comfort zone with this, and given that the pure string version works and is sufficient for my needs... #52 is all I really want/need for the foreseeable future.

@Julian Julian closed this as completed in #52 Nov 8, 2024
@Julian Julian closed this as completed in 1dab439 Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants