gh-74028: `concurrent.futures.Executor.map`: introduce `buffersize` param for lazier behavior #125663
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
6a58c7d
to
21f7b8d
Compare
Thanks for the PR. First, I think this is a big behavior change for Executor, so I think we need to discuss it on https://discuss.python.org/ first. In my personal opinion, I think this is not a good choice to add the
9eef605
to
e5c867a
Compare
Hi @Zheaoli, thank you for your comment!
You mean big alternative behavior, right? (the default behavior when omitting
Fair, I will start a thread there and ping you.
I'm not sure I get it, could you detail that point? 🙏🏻
You are completely right, makes more sense! I have fixed that (commit) |
8bf7be7
to
769060e
Compare
For me, the basic |
Hi @Zheaoli
There may be a misunderstanding here, the goal of this PR is precisely to make I will recap the behaviors so that everybody is on the same page: built-in
|
769060e
to
be419ed
Compare
hey @rruuaanng, fyi I have applied your requested changes regarding the integration of unit tests into existing class 🙏🏻 |
Lib/concurrent/futures/_base.py
Outdated
    args_iter = iter(zip(*iterables))
    if buffersize:
        fs = collections.deque(
            self.submit(fn, *args) for args in islice(args_iter, buffersize)
Isn't buffersize empty? Can you introduce it? (Forgive me for not understanding it).
Absolutely np, thank you for taking the time to review my proposal. To be sure I understand the question, what do you mean by "Isn't buffersize empty?"
Hey @rruuaanng , I have reworked the PR's description, I hope it makes things clearer!
e28a0f0
to
bb0e747
Compare
Hey @NewUserHa @AA-Turner @serhiy-storchaka, this may interest you given your recent activity on #14221 🙏🏻 |
Misc/NEWS.d/next/Library/2024-10-18-10-27-54.gh-issue-74028.4d4vVD.rst
Outdated
Lib/concurrent/futures/_base.py
Outdated
    if (
        buffersize
        and (executor := executor_weakref())
        and (args := next(args_iter, None))
`args` may be empty, so you need to check for `args is not None`
Are you referring to the case where one calls `executor.map(func)` without any input iterable?
Yes. You can't always assume that `func` needs an input (or do you?)
You are right! But in such a case we don't enter the `while fs:` (`fs` being empty in that case), right?
@picnixz I have added unit tests checking the behavior with multiple input iterables and without any input iterables.
> But in such a case we don't enter the `while fs`.
Not necessarily. What I meant is that you call `executor.map` with an input iterable that yields `args = ()` every time.
Note that it also doesn't hurt to check `is not None` because it's probably slightly faster since otherwise you need to call `__bool__` on the args being yielded.
So for example a call like `executor.map(func, [()])`? In such a call we get `iterables = ([()],)` and `args_iter = iter(zip(*([()],)))` and `next(args_iter,)` will be `((),)` (not `()`). You may have missed the `zip`ing in your reasoning?
In terms of pure readability of the code I struggle to have an opinion, do you feel that `(args := next(args_iter, None)) is not None` is more natural?
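The zipping argument in this comment can be checked directly with plain `zip`, without any executor. This is only an illustrative sketch: the variable names below are made up for the example, and `args_iter` simply mirrors the name used in the diff under review.

```python
# Each item yielded by zip(*iterables) is a tuple with one entry per input
# iterable, so with at least one input iterable the item is never empty.
no_iterables = list(zip())                # map(func) case: nothing is yielded
one_iterable = list(zip([()]))            # map(func, [()]) case
two_iterables = list(zip([1, 2], "ab"))   # two input iterables

# Same pattern as in the diff: next() with None as the "exhausted" sentinel.
args_iter = iter(zip([()]))
first = next(args_iter, None)    # a 1-tuple wrapping the empty tuple
second = next(args_iter, None)   # the sentinel, since the iterator is exhausted

print(no_iterables, one_iterable, two_iterables, first, second)
```

Because `first` is a non-empty tuple, it is truthy, which is why the plain truthiness test in the diff does not stop early in this case.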
> You may have missed the `zip`ing in your reasoning?
I did :) Sorry, my bad!
> do you feel that `(args := next(args_iter, None)) is not None` is more natural?
I feel it would at least help avoid questions like mine! (And it would probably still be slightly better performance-wise, but this claim is just my gut feeling.)
@picnixz oh yes I see... I have renamed `args_iter` into a more self-explanatory `zipped_iterables`, do you think it would be enough to avoid the confusion?
(Because I am scared that the addition of `is not None` may mislead some of our fellow pythonistas wondering "wait, why is this not None check necessary here, what am I missing here 🤔?")
Personally, I like having the `is not None` just so that I don't have to wonder what `args_iter` is precisely yielding. I can assume that it's yielding a tuple-like object, but I don't necessarily know the shape of that tuple. So `is not None` discriminates between probable items and the sentinel value. So I'd say it's still pythonic.
Performance-wise it should be roughly the same (one checks that the tuple's size != 0 and the other just compares if it's the None singleton but both are essentially a single comparison).
Now up to you. If others didn't observe (like me) that `args_iter` never yields an empty tuple, then it's probably better to keep the `is not None` check for clarity.
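To make the sentinel point concrete, here is a small sketch of the general pitfall being discussed. It assumes an iterator that yields empty tuples directly (which, as established in this thread, the zipped iterator in the PR never does); `drain_truthy` and `drain_not_none` are hypothetical helpers written only for this comparison.

```python
def drain_truthy(items):
    """Collect items until next() returns something falsy."""
    it = iter(items)
    out = []
    while (item := next(it, None)):
        out.append(item)
    return out


def drain_not_none(items):
    """Collect items until next() returns the None sentinel."""
    it = iter(items)
    out = []
    while (item := next(it, None)) is not None:
        out.append(item)
    return out


data = [(), (1,)]
# The truthiness test stops at the first empty tuple,
# while the explicit sentinel test only stops on exhaustion.
print(drain_truthy(data), drain_not_none(data))
```

With zipped inputs both variants behave the same, so the choice is about readability rather than correctness.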
Many thanks for your review @picnixz 🙏🏻 !
1ccc07f
to
5d63a05
Compare
bd0b9b4
to
579ba31
Compare
No problem, but please refrain from force pushing. Everything will be squash-merged in the end.
https://devguide.python.org/getting-started/pull-request-lifecycle/#quick-guide Thank you! |
@hugovk ok, will merge main instead of rebasing next time, thanks for the pointer! 🙏🏻 |
Hi @gpshead, whenever you get a chance, your feedback on this would be really appreciated 🙏🏻 |
Context recap (#74028)
Let's consider that we have an input `iterable` and `N = len(iterable)`.
Current `concurrent.futures.Executor.map` is $O(N)$ in space (unnecessarily expensive on large iterables, completely impossible to use on infinite iterables):
The call `results: Iterator = executor.map(func, iterable)` iterates over all the elements of the `iterable`, submitting $N$ tasks to the `executor` (futures collected into a list of size $N$). Following calls to `next(results)` take the oldest future from the list (FIFO), then wait for its result and return it.
Proposal: add an optional `buffersize` param
With this proposal, the call `results: Iterator = executor.map(func, iterable, buffersize=b)` will iterate only over the first $b$ elements of `iterable`, submitting $b$ tasks to the `executor` (futures stored in the buffer `deque`) and then will return the results iterator.
Calls to `next(results)` will get the next input element from `iterable` and submit a task to the `executor` for it (enqueuing another future), then wait for the oldest future in the buffer queue to complete (FIFO), then return the result.
Benefits:
With `buffersize`, the client code takes back control over the speed of iteration over the input `iterable`: after an initial spike of calls to `func` to fill the buffer, the iteration over the input `iterable` will follow the rate of the iteration over the `results` (controlled by the client), which is critical when `func` involves talking to services that you don't want to overload.
Why a new PR
It turns out it is very similar to the initial work of @MojoVampire in #707 back in 2017 (followed up by @graingert in #18566 and @Jason-Y-Z in #114975): use a queue of fixed size to hold the not-yet-yielded future results.
In addition this PR:
`buffersize=None` (default)
📚 Documentation preview 📚: https://cpython-previews--125663.org.readthedocs.build/
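The difference between the two behaviors described in the recap can be sketched with a small standalone helper. This is a hedged illustration under stated assumptions, not the code of this PR: `buffered_map`, `counting_source`, `square` and `demo` are hypothetical names, and the helper only handles a single input iterable and ignores timeouts and cancellation.

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor


def buffered_map(executor, fn, iterable, buffersize):
    """Hypothetical helper: lazy map keeping about `buffersize` pending futures."""
    if buffersize < 1:
        raise ValueError("buffersize must be at least 1")
    args_iter = iter(iterable)
    # Eagerly submit only the first `buffersize` tasks (FIFO buffer).
    buffer = collections.deque(
        executor.submit(fn, arg) for arg in itertools.islice(args_iter, buffersize)
    )

    def result_iterator():
        while buffer:
            # Pull at most one more input element and submit a task for it...
            for arg in itertools.islice(args_iter, 1):
                buffer.append(executor.submit(fn, arg))
            # ...then wait for the oldest pending future and yield its result.
            yield buffer.popleft().result()

    return result_iterator()


def counting_source(n, pulled):
    """Yield 0..n-1 and record every element that the consumer pulls."""
    for i in range(n):
        pulled.append(i)
        yield i


def square(x):
    return x * x


def demo():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Default Executor.map: the whole input is consumed during the call.
        eager_pulled = []
        eager_results = executor.map(square, counting_source(10, eager_pulled))
        eager_count = len(eager_pulled)

        # Buffered sketch: only `buffersize` elements are consumed up front.
        lazy_pulled = []
        lazy_results = buffered_map(
            executor, square, counting_source(10, lazy_pulled), buffersize=3
        )
        after_call = len(lazy_pulled)
        first = next(lazy_results)
        after_first = len(lazy_pulled)
        lazy_values = [first, *lazy_results]
        eager_values = list(eager_results)
    return eager_count, after_call, after_first, lazy_values, eager_values


if __name__ == "__main__":
    print(demo())
```

In this sketch the input is consumed one element per `next(results)` call after the initial fill, which is the rate-control property the description emphasizes. Results stay in input order because the oldest future is always awaited first.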