add PyString::chars #2451
pub fn chars(&self) -> impl ExactSizeIterator<Item = PyResult<char>> + '_ {
    unsafe {
        let len = ffi::PyUnicode_GetLength(self.as_ptr());
        (0..len).map(move |i| {
Since the implementation is based on indexing, would it make sense to expose this in the interface as well? Something like char_at(&self, index: usize) -> Option<char>, so that the iterator can be produced on the outside?
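To make the suggestion concrete, here is a minimal sketch of an index-based accessor with the iterator built on top of it from the outside. It does not use PyO3 or the CPython FFI: `FixedWidthStr` is a hypothetical stand-in that only models the 1-, 2- and 4-byte fixed-width storage kinds of a Python string, and the free function `chars` is likewise an assumption for illustration.

```rust
/// Hypothetical, simplified model of a fixed-width Python string buffer.
/// This is an illustration only and not a PyO3 type.
pub enum FixedWidthStr {
    Ucs1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}

impl FixedWidthStr {
    /// Number of code points stored.
    pub fn len(&self) -> usize {
        match self {
            FixedWidthStr::Ucs1(data) => data.len(),
            FixedWidthStr::Ucs2(data) => data.len(),
            FixedWidthStr::Ucs4(data) => data.len(),
        }
    }

    /// Returns `None` if `index` is out of bounds or if the stored value is
    /// not a valid Rust `char` (for example a lone surrogate).
    pub fn char_at(&self, index: usize) -> Option<char> {
        let code_point = match self {
            FixedWidthStr::Ucs1(data) => u32::from(*data.get(index)?),
            FixedWidthStr::Ucs2(data) => u32::from(*data.get(index)?),
            FixedWidthStr::Ucs4(data) => *data.get(index)?,
        };
        char::from_u32(code_point)
    }
}

/// The iterator is produced outside the type, using only `len` and `char_at`.
pub fn chars(s: &FixedWidthStr) -> impl ExactSizeIterator<Item = Option<char>> + '_ {
    (0..s.len()).map(move |i| s.char_at(i))
}
```

One trade-off in this sketch: with `Option<char>` the caller cannot tell an out-of-bounds index apart from an invalid code point, which may be one reason to keep a `PyResult<char>` item type as in the PR.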
Thanks for the PR :)
I've benchmarked this method and found that it generally is around twice as slow on ascii strings compared to |
In my benchmarks this is slightly faster, but my use case is quite specific. I have to modify some Unicode chars, so any ASCII data just gets returned as is. All strings passed in are newly allocated and thus have no UTF-8 string cached. Since it is slower in the general case, maybe it's best to close this.
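The use case described here can be sketched roughly as follows. This is plain Rust over `&str`, not the PyO3 code from the PR, and `map_non_ascii` is a hypothetical helper name: ASCII-only input is handed back without allocating, and only input containing non-ASCII chars is rebuilt.

```rust
use std::borrow::Cow;

/// Hypothetical helper: applies `f` to every non-ASCII char.
/// ASCII-only input is returned borrowed, so no new string is allocated.
pub fn map_non_ascii<F>(input: &str, mut f: F) -> Cow<'_, str>
where
    F: FnMut(char) -> char,
{
    // Fast path: nothing to modify, hand the input back as is.
    if input.is_ascii() {
        return Cow::Borrowed(input);
    }

    // Slow path: rebuild the string, leaving ASCII chars untouched.
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        out.push(if c.is_ascii() { c } else { f(c) });
    }
    Cow::Owned(out)
}
```

Returning `Cow` keeps the "returned as is" behaviour visible in the type: callers can check for `Cow::Borrowed` to see whether anything was changed.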
I see a use case where it forwards to |
Ah yes, the way I check for ASCII is indeed using the data method. What I don't completely understand is why the method is unsafe. The C bitfield is decoded using functions in Python, and I would (maybe wrongly) assume that all functions provided by Python are safe and cross-platform. The docs don't seem to mention anything: https://docs.python.org/3/c-api/unicode.html
Unfortunately the "function" for checking this ( |
Useful helper method to avoid allocating anything while iterating over a PyUnicode string. The AsUTF8 methods will allocate and cache the UTF-8 str on the Python heap.