optimize io module #9

Closed
cgarciae opened this issue Sep 20, 2018 · 11 comments

Comments

@cgarciae
Owner

http://www.artificialworlds.net/blog/2017/06/12/making-100-million-requests-with-python-aiohttp/comment-page-1/#comment-186106

@andybalaam

I used the benchmark described at http://www.artificialworlds.net/blog/2017/06/12/making-100-million-requests-with-python-aiohttp/ to run this program:

#!/usr/bin/env python3.6

from aiohttp import ClientSession
from pypeln import io
import asyncio
import sys

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def main():
    r = int(sys.argv[1])
    url = "http://localhost:8080/{}"

    async with ClientSession() as session:
        data = range(r)
        await io.each(lambda i: fetch(url.format(i), session), data, workers=1000, run=False)


loop = asyncio.get_event_loop()
loop.run_until_complete(main())

It runs much slower than the async client described in that article:

$ ./timed ./client-pypeln 10000
Memory usage: 42960KB	Time: 157.83 seconds

@andybalaam

This is still the case this morning even though I ran

pip3 install -U git+https://github.com/cgarciae/pypeln@develop

at about 2018-09-21 08:00 UTC.

Comparison with the numbers from that article:

$ ./timed ./client-async-sem 10000
Memory usage: 77912KB	Time: 18.10 seconds
$ ./timed ./client-async-as-completed 10000
Memory usage: 46780KB	Time: 17.86 seconds

@andybalaam

$ ./timed python3.6 -m cProfile -o prf.txt ./client-pypeln 10000
Memory usage: 45884KB	Time: 156.17 seconds
$ python3 -c "from pstats import Stats; Stats('prf.txt').sort_stats('cumulative').print_stats()" | head -n 50
Fri Sep 21 09:37:20 2018    prf.txt

         8954144 function calls (8928893 primitive calls) in 156.028 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    288/1    0.012    0.000  156.052  156.052 {built-in method builtins.exec}
        1    0.000    0.000  156.052  156.052 ./client-pypeln:3(<module>)
        1    0.000    0.000  155.820  155.820 /usr/lib/python3.6/asyncio/base_events.py:433(run_until_complete)
        1    0.024    0.024  155.820  155.820 /usr/lib/python3.6/asyncio/base_events.py:405(run_forever)
    29866    0.348    0.000  155.796    0.005 /usr/lib/python3.6/asyncio/base_events.py:1336(_run_once)
    29866    0.100    0.000  146.737    0.005 /usr/lib/python3.6/selectors.py:428(select)
    29866  146.595    0.005  146.595    0.005 {method 'poll' of 'select.epoll' objects}
    67027    0.141    0.000    8.547    0.000 /usr/lib/python3.6/asyncio/events.py:143(_run)
    27701    0.069    0.000    6.944    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/io.py:218(f_task)
    27701    0.060    0.000    6.847    0.000 ./client-pypeln:8(fetch)
    27694    0.069    0.000    6.599    0.000 /usr/lib/python3/dist-packages/aiohttp/client.py:778(__aenter__)
    27694    0.366    0.000    6.526    0.000 /usr/lib/python3/dist-packages/aiohttp/client.py:179(_request)
    10000    0.098    0.000    1.340    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:162(__init__)
    40002    0.122    0.000    1.326    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:144(__init__)
    30000    0.063    0.000    1.305    0.000 /usr/lib/python3/dist-packages/idna/core.py:286(ulabel)
    50000    0.074    0.000    1.300    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:48(__get__)
    30000    0.278    0.000    1.184    0.000 /usr/lib/python3/dist-packages/idna/core.py:231(check_label)
    20000    0.039    0.000    1.121    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:419(host)
    10000    0.023    0.000    0.976    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:644(_make_netloc)
    20000    0.064    0.000    0.975    0.000 /usr/lib/python3/dist-packages/idna/core.py:364(decode)
    10000    0.033    0.000    0.953    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:626(_encode_host)
    17694    0.077    0.000    0.912    0.000 /usr/lib/python3/dist-packages/aiohttp/connector.py:360(connect)
    10000    0.141    0.000    0.789    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:464(send)
    10000    0.042    0.000    0.731    0.000 /usr/lib/python3/dist-packages/idna/core.py:335(encode)
    10007    0.031    0.000    0.710    0.000 /usr/lib/python3.6/asyncio/selector_events.py:719(_read_ready)
    10000    0.021    0.000    0.693    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:236(update_host)
    10000    0.028    0.000    0.684    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:220(connection_key)
    20000    0.146    0.000    0.679    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:662(start)
   270000    0.292    0.000    0.592    0.000 /usr/lib/python3/dist-packages/idna/intranges.py:38(intranges_contain)
    10000    0.009    0.000    0.582    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:224(host)
    10007    0.086    0.000    0.521    0.000 /usr/lib/python3/dist-packages/aiohttp/client_proto.py:140(data_received)
    10000    0.014    0.000    0.466    0.000 /usr/lib/python3/dist-packages/idna/core.py:258(alabel)
     7401    0.003    0.000    0.415    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/io.py:216(_each)
     7401    0.023    0.000    0.412    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/io.py:99(_run_tasks)
    10000    0.130    0.000    0.393    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:272(update_auto_headers)
    10000    0.009    0.000    0.359    0.000 /usr/lib/python3/dist-packages/aiohttp/streams.py:142(on_eof)
    10000    0.041    0.000    0.353    0.000 /usr/lib/python3/dist-packages/aiohttp/http_writer.py:94(write_headers)
    10000    0.035    0.000    0.350    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:717(_response_eof)
    10007    0.237    0.000    0.341    0.000 {method 'feed_data' of 'aiohttp._http_parser.HttpParser' objects}
    60101    0.103    0.000    0.270    0.000 /usr/lib/python3.6/urllib/parse.py:154(hostname)
    10000    0.025    0.000    0.269    0.000 /usr/lib/python3/dist-packages/aiohttp/connector.py:110(release)
    50101    0.040    0.000    0.263    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:408(raw_host)
    17398    0.030    0.000    0.255    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/utils_async.py:15(put)

Here is the raw profile: prf.txt
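
In the listing above, about 146.6 of the 156 seconds are spent inside `poll` on the epoll object, which suggests the client is mostly idle waiting on the network rather than burning CPU. Sorting by `tottime` instead of `cumulative` makes that kind of hotspot stand out. The one-liner can also be written as a small script; this is only a sketch, and `format_stats` and `busy_work` are hypothetical names used for illustration:

```python
import cProfile
import io
import pstats


def format_stats(source, sort_key="cumulative", limit=10):
    """Return a pstats report as a string.

    `source` is either the path of a dump written by `python -m cProfile -o`
    (such as prf.txt) or a cProfile.Profile object.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(source, stream=buffer)
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return buffer.getvalue()


def busy_work(n):
    # Stand-in workload so the sketch runs without a server.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(100_000)
    profiler.disable()
    # "tottime" ranks functions by time spent in their own body.
    print(format_stats(profiler, sort_key="tottime", limit=5))
```

For the dump in this thread, `format_stats("prf.txt", sort_key="tottime")` would be the equivalent call.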

@cgarciae
Owner Author

cgarciae commented Sep 21, 2018

@andybalaam thanks for looking into this. I am not able to reproduce your published results. I get the following numbers: all 3 clients perform similarly, and they are consistent with the pypeln benchmark you are showing:

client-async-sem

➜ bash timed.sh python client-async-sem.py 10_000
Memory usage: 73760KB   Time: 151.98 seconds    CPU usage: 5%

Uses more memory, as expected.

client-async-as-completed

➜ bash timed.sh python client-async-as-completed.py 10_000
Memory usage: 48472KB   Time: 154.81 seconds    CPU usage: 100%

Uses a lot of CPU, possibly because of the asyncio.sleep(0) polling. It is slightly slower but more memory efficient.
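
If the CPU usage does come from polling with `asyncio.sleep(0)`, one way to keep a bounded number of requests in flight without spinning is `asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED)`, which suspends until a task actually finishes. This is a sketch of that alternative, not the code used in the benchmark; `bounded_map` and `make_coro` are hypothetical names:

```python
import asyncio


async def bounded_map(make_coro, items, limit):
    """Run make_coro(item) for every item with at most `limit` tasks in flight.

    Results are collected in completion order, not input order.
    """
    pending = set()
    results = []
    for item in items:
        if len(pending) >= limit:
            # Sleep until at least one task finishes instead of polling.
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            results.extend(task.result() for task in done)
        pending.add(asyncio.ensure_future(make_coro(item)))
    if pending:
        # Drain whatever is still running.
        done, _ = await asyncio.wait(pending)
        results.extend(task.result() for task in done)
    return results
```

With the `fetch` coroutine from earlier in the thread, `make_coro` could be something like `lambda i: fetch(url.format(i), session)`.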

client-pypeln-io

➜ bash timed.sh python client-pypeln-io.py 10_000
Memory usage: 50720KB   Time: 151.93 seconds    CPU usage: 5%

By the numbers, it's pretty well balanced.

As I said before in a comment, your original code had to be slightly modified because ClientSession now has to be used as an async context manager. I made minimal modifications so the clients would run; you can check the sources here:

https://github.com/cgarciae/pypeln/tree/develop/benchmarks/100_million_downloads

I am using

➜ python --version
Python 3.6.3 :: Anaconda, Inc.

@cgarciae
Owner Author

cgarciae commented Sep 21, 2018

@andybalaam I don't know if it's a coincidence, but I am getting times in the 17-20 second range on all 3 clients with 1_000 requests:

➜ bash timed.sh python client-async-sem.py 1_000
Memory usage: 45752KB   Time: 17.57 seconds     CPU usage: 5%
➜ bash timed.sh python client-async-as-completed.py 1_000
Memory usage: 45084KB   Time: 18.17 seconds     CPU usage: 100%
➜ bash timed.sh python client-pypeln-io.py 1_000
Memory usage: 46476KB   Time: 17.60 seconds     CPU usage: 4%

Maybe you added an extra 0 by accident on the blog and published 10000 instead of 1000?

@cgarciae
Owner Author

@andybalaam figured it out! It was the limit in aiohttp.TCPConnector, which is 100 by default.
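
For reference, a minimal sketch of raising that limit, assuming the same `fetch` coroutine and local test server as the program earlier in this thread. `make_session` is a hypothetical helper, and `asyncio.run` needs Python 3.7+ (on 3.6, use `loop.run_until_complete` as above). It uses plain `asyncio.gather` rather than pypeln to stay focused on the connector:

```python
import asyncio
import sys

from aiohttp import ClientSession, TCPConnector


async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()


def make_session(limit=1000):
    # Call this from inside a running event loop.
    # TCPConnector caps simultaneous connections; its default is 100,
    # so asking for 1000 workers has no effect unless it is raised.
    return ClientSession(connector=TCPConnector(limit=limit))


async def main(n_requests):
    url = "http://localhost:8080/{}"
    async with make_session(limit=1000) as session:
        await asyncio.gather(
            *(fetch(url.format(i), session) for i in range(n_requests))
        )


if __name__ == "__main__":
    asyncio.run(main(int(sys.argv[1])))
```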


@andybalaam

Awesome! By the way I am usually referred to as "Andy".

@andybalaam

Woohoo!

$ ./timed ./client-pypeln-io 10
Memory usage: 29080KB	Time: 3.29 seconds
$ ./timed ./client-pypeln-io 100
Memory usage: 30532KB	Time: 3.32 seconds
$ ./timed ./client-pypeln-io 1000
Memory usage: 48720KB	Time: 4.90 seconds
$ ./timed ./client-pypeln-io 10000
Memory usage: 51940KB	Time: 18.19 seconds
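
These timings are consistent with a cap on in-flight connections. With a cap of 100, 10,000 requests go through in roughly 100 "waves" and 1,000 requests in roughly 10; with a cap of 1000, 10,000 requests also need only about 10 waves. That would explain why 1_000 requests at the default limit and 10000 requests at the raised limit both land near 18 seconds. The simulation below is a rough sketch that assumes every request takes the same time, which the real server may not do; `simulate` and `waves` are hypothetical names and the semaphore merely stands in for the connection pool:

```python
import asyncio
import time


def waves(n_requests, pool_limit):
    # Ceiling division: how many full rounds the pool has to serve.
    return -(-n_requests // pool_limit)


async def fake_request(pool, latency):
    # Hold one "connection" for the duration of the request.
    async with pool:
        await asyncio.sleep(latency)


async def simulate(n_requests, pool_limit, latency=0.01):
    """Return the wall-clock time for n_requests behind a capped pool."""
    pool = asyncio.Semaphore(pool_limit)
    start = time.perf_counter()
    await asyncio.gather(
        *(fake_request(pool, latency) for _ in range(n_requests))
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    for limit in (100, 1000):
        elapsed = asyncio.run(simulate(10_000, limit))
        print(f"limit={limit}: {waves(10_000, limit)} waves, {elapsed:.2f}s")
```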

I updated my blog post to refer to yours. Fantastic work!

@cgarciae
Owner Author

Excellent @andybalaam !
Changed Balaam to Andy on the post :)
Thanks for all the feedback and reference on your blog!

@andybalaam

Thank you for improving on what I tried to do :-)
