I think this may be intended behaviour, but I didn't find it mentioned anywhere in the docs. Sorry if it has already been discussed somewhere I haven't looked ...
It seems that for Unicode patterns like ur"..." regex implicitly sets the Unicode flag (?u), while re doesn't seem to do that.
Ah, yes, if the pattern is a Unicode string then the matching defaults to Unicode, and if the pattern is a bytestring then the matching defaults to ASCII.
You can be explicit with regex.UNICODE or "(?u)" and regex.ASCII or "(?a)".
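For illustration, the same default and the same explicit overrides can be tried with the standard library's `re` under Python 3, where `str` patterns also default to Unicode matching and `bytes` patterns to ASCII. This is only a sketch under that assumption: it uses `re.ASCII`, `(?a)` and `(?u)` from `re`, not the `regex` module itself, and the sample text is made up.

```python
import re

text = "příliš žluťoučký"

# str pattern: \w defaults to Unicode matching (Python 3 re)
unicode_default = re.findall(r"\w+", text)
# → ['příliš', 'žluťoučký']

# bytes pattern: \w only matches [a-zA-Z0-9_]
bytes_default = re.findall(rb"\w+", "příliš".encode("utf-8"))
# → [b'p', b'li']

# Explicit ASCII matching on a str pattern, via flag or inline
ascii_flag = re.findall(r"\w+", text, re.ASCII)
ascii_inline = re.findall(r"(?a)\w+", text)
# → ['p', 'li', 'lu', 'ou', 'k'] in both cases

# Explicit Unicode matching (redundant for str patterns in Python 3)
unicode_inline = re.findall(r"(?u)\w+", text)
```

Spelling out the flag either way keeps the result independent of which default a given module or Python version applies.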
The justification is that if you're using Unicode strings then you probably want Unicode matching too. I'll make a note to update the docs at some point (I don't have any other changes planned).
I would be willing to make it the same as the 're' module if the general consensus is that it should be.
Thanks for the confirmation; I was just a bit surprised to see different results in a script (using re) and my general app (normally using regex), where I didn't expect a difference between these regular-expression engines.
I am happy with either behaviour. The (?u) can simply be added if needed and is more explicit. On the other hand, the Unicode flag is global and cannot be switched off: if one needed a Unicode string pattern with special sequences interpreted as ASCII, [a-zA-Z0-9_] would be necessary instead of \w (if I understand correctly).
That being said, I have no strong personal preference, now that it is documented. It would depend on the inclusion policy for the standard library (e.g. whether to include this behaviour in the NEW flag).
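A minimal sketch of that workaround, again using the standard library's `re` under Python 3 as a stand-in (an assumption, since `str` patterns there are Unicode by default): `\w` stays Unicode-aware for one part of the pattern while an explicit character class restricts another part to ASCII. The `key:value` format and the sample strings are hypothetical.

```python
import re

# Unicode-aware \w for the key, explicit ASCII-only class for the value
pattern = re.compile(r"\w+:[a-zA-Z0-9_]+")

# Non-ASCII letters are accepted in the key ...
ascii_value_ok = pattern.fullmatch("klíč:value") is not None   # True

# ... but rejected in the value, because the class lists ASCII only
unicode_value_rejected = pattern.fullmatch("klíč:kůň") is None  # True
```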
Original report by Anonymous.
Hi,
Python 2.7.1, Win XP SP3, 32 bit Czech; regex r902c02d44f
regards,
Vlastimil Brom