optimize io module #9

cgarciae · 2018-09-20T17:33:59Z

http://www.artificialworlds.net/blog/2017/06/12/making-100-million-requests-with-python-aiohttp/comment-page-1/#comment-186106

andybalaam · 2018-09-21T08:25:41Z

I ran the program below using the benchmark described here: http://www.artificialworlds.net/blog/2017/06/12/making-100-million-requests-with-python-aiohttp/

#!/usr/bin/env python3.6

from aiohttp import ClientSession
from pypeln import io
import asyncio
import sys

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def main():
    r = int(sys.argv[1])
    url = "http://localhost:8080/{}"

    async with ClientSession() as session:
        data = range(r)
        await io.each(lambda i: fetch(url.format(i), session), data, workers=1000, run=False)


loop = asyncio.get_event_loop()
loop.run_until_complete(main())

It runs much slower than the async client described in that article:

$ ./timed ./client-pypeln 10000
Memory usage: 42960KB	Time: 157.83 seconds

andybalaam · 2018-09-21T08:27:57Z

This is still the case this morning even though I ran

pip3 install -U git+https://github.com/cgarciae/pypeln@develop

at about 2018-09-21 08:00 UTC.

Comparison with the numbers from that article:

$ ./timed ./client-async-sem 10000
Memory usage: 77912KB	Time: 18.10 seconds
$ ./timed ./client-async-as-completed 10000
Memory usage: 46780KB	Time: 17.86 seconds

andybalaam · 2018-09-21T08:42:40Z

$ ./timed python3.6 -m cProfile -o prf.txt ./client-pypeln 10000
Memory usage: 45884KB	Time: 156.17 seconds
$ python3 -c "from pstats import Stats; Stats('prf.txt').sort_stats('cumulative').print_stats()" | head -n 50
Fri Sep 21 09:37:20 2018    prf.txt

         8954144 function calls (8928893 primitive calls) in 156.028 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    288/1    0.012    0.000  156.052  156.052 {built-in method builtins.exec}
        1    0.000    0.000  156.052  156.052 ./client-pypeln:3(<module>)
        1    0.000    0.000  155.820  155.820 /usr/lib/python3.6/asyncio/base_events.py:433(run_until_complete)
        1    0.024    0.024  155.820  155.820 /usr/lib/python3.6/asyncio/base_events.py:405(run_forever)
    29866    0.348    0.000  155.796    0.005 /usr/lib/python3.6/asyncio/base_events.py:1336(_run_once)
    29866    0.100    0.000  146.737    0.005 /usr/lib/python3.6/selectors.py:428(select)
    29866  146.595    0.005  146.595    0.005 {method 'poll' of 'select.epoll' objects}
    67027    0.141    0.000    8.547    0.000 /usr/lib/python3.6/asyncio/events.py:143(_run)
    27701    0.069    0.000    6.944    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/io.py:218(f_task)
    27701    0.060    0.000    6.847    0.000 ./client-pypeln:8(fetch)
    27694    0.069    0.000    6.599    0.000 /usr/lib/python3/dist-packages/aiohttp/client.py:778(__aenter__)
    27694    0.366    0.000    6.526    0.000 /usr/lib/python3/dist-packages/aiohttp/client.py:179(_request)
    10000    0.098    0.000    1.340    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:162(__init__)
    40002    0.122    0.000    1.326    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:144(__init__)
    30000    0.063    0.000    1.305    0.000 /usr/lib/python3/dist-packages/idna/core.py:286(ulabel)
    50000    0.074    0.000    1.300    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:48(__get__)
    30000    0.278    0.000    1.184    0.000 /usr/lib/python3/dist-packages/idna/core.py:231(check_label)
    20000    0.039    0.000    1.121    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:419(host)
    10000    0.023    0.000    0.976    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:644(_make_netloc)
    20000    0.064    0.000    0.975    0.000 /usr/lib/python3/dist-packages/idna/core.py:364(decode)
    10000    0.033    0.000    0.953    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:626(_encode_host)
    17694    0.077    0.000    0.912    0.000 /usr/lib/python3/dist-packages/aiohttp/connector.py:360(connect)
    10000    0.141    0.000    0.789    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:464(send)
    10000    0.042    0.000    0.731    0.000 /usr/lib/python3/dist-packages/idna/core.py:335(encode)
    10007    0.031    0.000    0.710    0.000 /usr/lib/python3.6/asyncio/selector_events.py:719(_read_ready)
    10000    0.021    0.000    0.693    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:236(update_host)
    10000    0.028    0.000    0.684    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:220(connection_key)
    20000    0.146    0.000    0.679    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:662(start)
   270000    0.292    0.000    0.592    0.000 /usr/lib/python3/dist-packages/idna/intranges.py:38(intranges_contain)
    10000    0.009    0.000    0.582    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:224(host)
    10007    0.086    0.000    0.521    0.000 /usr/lib/python3/dist-packages/aiohttp/client_proto.py:140(data_received)
    10000    0.014    0.000    0.466    0.000 /usr/lib/python3/dist-packages/idna/core.py:258(alabel)
     7401    0.003    0.000    0.415    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/io.py:216(_each)
     7401    0.023    0.000    0.412    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/io.py:99(_run_tasks)
    10000    0.130    0.000    0.393    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:272(update_auto_headers)
    10000    0.009    0.000    0.359    0.000 /usr/lib/python3/dist-packages/aiohttp/streams.py:142(on_eof)
    10000    0.041    0.000    0.353    0.000 /usr/lib/python3/dist-packages/aiohttp/http_writer.py:94(write_headers)
    10000    0.035    0.000    0.350    0.000 /usr/lib/python3/dist-packages/aiohttp/client_reqrep.py:717(_response_eof)
    10007    0.237    0.000    0.341    0.000 {method 'feed_data' of 'aiohttp._http_parser.HttpParser' objects}
    60101    0.103    0.000    0.270    0.000 /usr/lib/python3.6/urllib/parse.py:154(hostname)
    10000    0.025    0.000    0.269    0.000 /usr/lib/python3/dist-packages/aiohttp/connector.py:110(release)
    50101    0.040    0.000    0.263    0.000 /usr/lib/python3/dist-packages/yarl/__init__.py:408(raw_host)
    17398    0.030    0.000    0.255    0.000 /home/andrebal/.local/lib/python3.6/site-packages/pypeln/utils_async.py:15(put)

Here is the raw profile: prf.txt

cgarciae · 2018-09-21T16:27:37Z

@andybalaam thanks for looking into this. I am not able to reproduce your published results. I am getting the following numbers, in which all 3 clients perform similarly and are consistent with the pypeln benchmark you are showing:

client-async-sem

➜ bash timed.sh python client-async-sem.py 10_000
Memory usage: 73760KB   Time: 151.98 seconds    CPU usage: 5%

Uses more memory, as expected.

client-async-as-completed

➜ bash timed.sh python client-async-as-completed.py 10_000
Memory usage: 48472KB   Time: 154.81 seconds    CPU usage: 100%

Uses a lot of CPU, possibly because of the use of asyncio.sleep(0). Slightly slower but more memory efficient.

client-pypeln-io

➜ bash timed.sh python client-pypeln-io.py 10_000
Memory usage: 50720KB   Time: 151.93 seconds    CPU usage: 5%

By the numbers, it's pretty well balanced.

As I said before in a comment, your original code had to be slightly modified because ClientSession can now only be used as an async context manager. I made minimal modifications to be able to run the clients; you can check the sources here:

https://github.com/cgarciae/pypeln/tree/develop/benchmarks/100_million_downloads

I am using

➜ python --version
Python 3.6.3 :: Anaconda, Inc.

cgarciae · 2018-09-21T16:55:32Z

@andybalaam I don't know if it's a coincidence, but I am getting times in the 17-20 second range on all 3 clients with 1_000 requests:

➜ bash timed.sh python client-async-sem.py 1_000
Memory usage: 45752KB   Time: 17.57 seconds     CPU usage: 5%

➜ bash timed.sh python client-async-as-completed.py 1_000
Memory usage: 45084KB   Time: 18.17 seconds     CPU usage: 100%

➜ bash timed.sh python client-pypeln-io.py 1_000
Memory usage: 46476KB   Time: 17.60 seconds     CPU usage: 4%

Maybe you added an extra 0 by accident on the blog and published 10000 instead of 1000?

cgarciae · 2018-09-22T21:56:10Z

@andybalaam figured it out! It was the connection limit in aiohttp.TCPConnector, which is 100 by default.
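
A rough way to see why a cap of 100 connections would matter even with workers=1000: the sketch below is a stdlib-only simulation, not the actual aiohttp or pypeln code. It stands in an asyncio.Semaphore for the connection pool and asyncio.sleep for the network round trip. The names fake_request, run_batch, pool_limit and latency are hypothetical, chosen only for illustration. In aiohttp itself the cap is the limit argument of TCPConnector, which is passed to ClientSession through its connector argument.

```python
import asyncio
import time


async def fake_request(pool: asyncio.Semaphore, latency: float) -> None:
    # Hold one simulated "connection" from the pool while the request is in flight.
    async with pool:
        await asyncio.sleep(latency)


async def run_batch(total: int, pool_limit: int, latency: float = 0.02) -> float:
    """Return the wall-clock seconds needed to finish `total` simulated requests."""
    pool = asyncio.Semaphore(pool_limit)  # stands in for the connector's limit
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(pool, latency) for _ in range(total)))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Same number of requests, two different pool sizes.
    capped = asyncio.run(run_batch(total=1000, pool_limit=100))
    wide = asyncio.run(run_batch(total=1000, pool_limit=1000))
    print(f"pool of 100: {capped:.2f}s, pool of 1000: {wide:.2f}s")
```

With the smaller pool, at most 100 simulated requests are in flight at once, so the batch needs roughly total / pool_limit round trips no matter how many workers are waiting. That is in line with the roughly tenfold gap between the 150+ second and 18 second runs reported in this thread.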

cgarciae · 2018-09-23T15:51:52Z

I made a post about this:
https://medium.com/@cgarciae/making-an-infinite-number-of-requests-with-python-aiohttp-pypeln-3a552b97dc95

andybalaam · 2018-09-24T11:07:40Z

Awesome! By the way I am usually referred to as "Andy".

andybalaam · 2018-09-24T11:18:07Z

Woohoo!

$ ./timed ./client-pypeln-io 10
Memory usage: 29080KB	Time: 3.29 seconds
$ ./timed ./client-pypeln-io 100
Memory usage: 30532KB	Time: 3.32 seconds
$ ./timed ./client-pypeln-io 1000
Memory usage: 48720KB	Time: 4.90 seconds
$ ./timed ./client-pypeln-io 10000
Memory usage: 51940KB	Time: 18.19 seconds

I updated my blog post to refer to yours. Fantastic work!

cgarciae · 2018-09-24T15:51:53Z

Excellent @andybalaam !
Changed Balaam to Andy on the post :)
Thanks for all the feedback and reference on your blog!

andybalaam · 2018-09-25T15:12:53Z

Thank you for improving on what I tried to do :-)

cgarciae closed this as completed Sep 24, 2018
