It's not a bug, just an inefficiency. The comment in the code excerpt I posted calls it out specifically:
In the worst case, we're reading the unread part of self._buffer twice here, once in the if condition and once when calling index. This is sub-optimal, but better than the alternative: wrapping .index in a try..except, because that is slower.
The call to `__contains__` scans the buffer until it reaches EOL or the end of the buffer, and if it finds EOL, `index` then scans all of the same bytes a second time. Even though both are built-ins, that second pass is still a cost, especially for large buffers. Using `find` removes the extra pass: it scans once and returns -1 when the terminator is absent instead of raising ValueError, so no try..except is needed either.
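To make the trade-off concrete, here is a rough sketch of the three ways of locating the end of a line that come up in this thread: `in` followed by `index`, `index` wrapped in try..except, and `find`. The function names and the `TERMINATOR` constant are hypothetical, not smart_open's code. Each function returns the offset just past the first terminator, or -1 if there is none. The timing harness only prints what your own machine measures; it makes no claim about which variant wins.

```python
import timeit

TERMINATOR = b"\n"  # hypothetical: a single-byte line terminator


def end_with_in_and_index(chunk: bytes) -> int:
    # Two scans when the terminator is present: `in`, then `index`.
    if TERMINATOR in chunk:
        return chunk.index(TERMINATOR) + len(TERMINATOR)
    return -1


def end_with_try_index(chunk: bytes) -> int:
    # One scan, but a missing terminator costs a raised ValueError.
    try:
        return chunk.index(TERMINATOR) + len(TERMINATOR)
    except ValueError:
        return -1


def end_with_find(chunk: bytes) -> int:
    # One scan; `find` reports "not found" as -1 instead of raising.
    pos = chunk.find(TERMINATOR)
    return pos + len(TERMINATOR) if pos != -1 else -1


if __name__ == "__main__":
    # Worst case for the two-scan version: the newline is at the very end.
    chunk = b"x" * 1_000_000 + TERMINATOR
    for fn in (end_with_in_and_index, end_with_try_index, end_with_find):
        seconds = timeit.timeit(lambda: fn(chunk), number=200)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

All three agree on the result; they differ only in how many passes they make over the bytes and in how a missing terminator is signalled.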
Hi,
`BufferedInputBase` may run through the buffer multiple times while running `readline`: https://github.com/RaRe-Technologies/smart_open/blob/1770807b0c8da2a34f749afa17ff70a715bc85de/smart_open/s3.py#L307-L315
I think we can remove this inefficiency if we use `find` instead of `index`. E.g.:
Does this make sense or am I missing something?
-- Eric
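A minimal sketch of what a `find`-based `readline` loop could look like, assuming a simplified reader that keeps its unread bytes in a `bytes` buffer and refills it from an iterator of chunks. The `readline(buffer, chunks)` helper and its `(line, leftover)` return shape are hypothetical, not smart_open's API. It also assumes a single-byte terminator such as `b"\n"`, so a terminator can never be split across two chunks.

```python
import io
from typing import Iterator, Tuple


def readline(buffer: bytes, chunks: Iterator[bytes],
             terminator: bytes = b"\n") -> Tuple[bytes, bytes]:
    """Read one line, refilling from `chunks` when `buffer` runs out.

    Hypothetical helper: returns (line, leftover_buffer). An empty chunk
    (or an exhausted iterator) is treated as EOF.
    """
    line = io.BytesIO()
    while True:
        pos = buffer.find(terminator)  # the only scan of this buffer
        if pos != -1:
            end = pos + len(terminator)
            line.write(buffer[:end])
            return line.getvalue(), buffer[end:]
        line.write(buffer)             # no terminator: consume all of it
        buffer = next(chunks, b"")     # refill; b"" means EOF
        if not buffer:
            return line.getvalue(), b""
```

When `find` fails, the whole buffer is consumed rather than rescanned, so each unread byte is examined once per call.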