Scorecard: Use Sanic, cache fully-shaped GitHub response #5242
… bit faster than Flask, but this is mostly a dry run testing ease of conversion from Flask. Templates have been modified to avoid the favicon.ico 404, and user.html has been reformatted to an 80-character line width
…hey aren't imported. I would rather they make those imports an explicit requirement...
<title>Scorecard: User {{ user }}</title>
<link rel="shortcut icon" href="#" />
Again, add the favicon fix. The rest is HTML formatting; not critical if you prefer the style you already had.
…ry would run twice. Don't cache the timestamp diff, just the timestamp; remove unnecessary use of globals from the Flask version; add comments clarifying race conditions shared between the Flask and current versions; improve favicon.ico handling for Chrome
Fixed a few issues; should be ready to go. Works great, very fast, and I think it is evidence that we can start using this approach for our other Python web server efforts. The only small trickiness is with Kubernetes, but there are known solutions (run_thread_pool_executor or deque).
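For context on the Kubernetes point: a minimal sketch of the thread-pool-executor workaround alluded to above, where `fetch_pods` is a hypothetical stand-in for a blocking client call, not this PR's code:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def fetch_pods():
    # Stand-in for a blocking call (e.g. a synchronous Kubernetes client).
    time.sleep(0.1)
    return ["pod-a", "pod-b"]

async def handler():
    loop = asyncio.get_event_loop()
    # The blocking call runs on a worker thread; the event loop stays free
    # to serve other requests in the meantime.
    pods = await loop.run_in_executor(executor, fetch_pods)
    return pods

print(asyncio.get_event_loop().run_until_complete(handler()))
```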
Two comments, and a meta-comment:
- I had looked over async HTTP libraries and had preferred aiohttp over Sanic because (1) aiohttp is a "blessed" aio library, (2) performance seemed comparable, and (3) aiohttp seemed like a simpler solution, which was attractive because the microservices are looking more and more like services and less like web servers (even more so as all the rendering moves to the front end with the web app; the legacy version of scorecard using jinja is not the representative case). Did you look at aiohttp?
- From the code:

  "Global variables that are modified ..."

  I don't want to have to think about shared state and locking. I want a shared-nothing architecture in the microservices, where the only globals are true constants and threads communicate by sending immutable data through queues (see the sketch after this list).
- Finally, a meta-comment. I started reviewing this when it was just ujson; I did a bit of research about JSON packages to understand your choices, and when I came back, the PR had expanded with all the async stuff. I would have approved the ujson stuff; the async stuff could have been a separate PR. Nobody wants to review a moving target, so the scope of a change should be roughly frozen when you assign a reviewer to a PR, and additional changes should be minimized and restricted to that scope. You're welcome to have an open PR with no reviewer if you're still fleshing out the scope, of course. Thanks!
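To make the shared-nothing constraint in the second bullet concrete, a minimal sketch (my illustration, not code from this repo) where the only cross-thread traffic is immutable values on queues:

```python
import queue
import threading
from collections import namedtuple

# Immutable message type: workers never share mutable state.
Result = namedtuple("Result", ["user", "score"])

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        user = tasks.get()
        if user is None:  # sentinel: shut down
            break
        results.put(Result(user=user, score=len(user)))  # toy computation
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for u in ["alice", "bob"]:
    tasks.put(u)
tasks.put(None)
t.join()
while not results.empty():
    print(results.get())
```

No locks are needed because nothing mutable crosses the thread boundary; the queues do all the synchronization.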
I also created a Starlette branch, which may be preferable, as Sanic brings with it a bit of controversy, and a bunch of errors were generated on the Techempower benchmarks. I took a brief look at the bench source and didn't see an immediate issue, so I worry a bit about Sanic. Starlette is a light layer on top of Uvicorn, one of the leading ASGI web servers, with an interface similar to Sanic/Flask: https://www.techempower.com/benchmarks/#section=data-r17&hw=ph&test=fortune&l=zijzen-1 Branch here; I can issue a separate PR and close this one: https://github.com/akotlar/hail/tree/scorecard-starlette
In response:
Meta-comment: OK. I didn't think it had been looked at, and I expanded what it did pretty quickly as I realized that ujson wasn't helping much. I can make a ujson-specific PR, but my goal was to test async library implementations in a simple application, since we need a long-term strategy for Python web stuff that isn't Flask (or not just Flask).
Can you point me to the benchmarks? The only head-to-head one I found was this: https://github.com/samuelcolvin/aiohttp-vs-sanic-vs-japronto, where Sanic is faster for smaller examples but aiohttp performs MUCH better (3x better latency, 10x better throughput) when interacting with a database.
OK, great, thanks for clarifying!
Sure, this is something I am focusing on improving moving forward. Thanks for your feedback.
Sanic was chosen because it's a near drop-in for Flask and is, as far as I know, the most popular library built around asyncio: it has 2x as many stars as aiohttp and slightly more forks.

The original motivation for considering aiohttp: in short, one of the creators of asyncio discusses uvloop performance relative to other libraries. The key line is: "However, the performance bottleneck in aiohttp turned out to be its HTTP parser, which is so slow, that it matters very little how fast the underlying I/O library is."

As an aside, I've spent some time reading about this over the last ~month, and besides relatively consistent messaging about the messiness of Python's ecosystem, performance and user experience are deeply important to me. So when I read things like the following from one of the creators of aiohttp (aio-libs/aiohttp#2902), I'm not encouraged about the long-term health of the project: "I don't think performance matter. I think asgi does not matter in 2018 in general. Usability and complexity matters. Python is not very good choice for high performance system in any case... For me high performance python is a fantasy, but i don't do aiohttp/python anymore. In the end it is up to @asvetlov"

In the second branch related to this pull request, linked above, I chose Starlette. It is a thin abstraction, with nearly identical performance, over Uvicorn + httptools, both written by Yury Selivanov, the asyncio person mentioned above. Starlette and Uvicorn are currently the fastest options (Sanic isn't tested), by a relatively large margin, on Techempower's benchmarks. If there is a reference standard benchmark of HTTP library performance, Techempower is it: https://www.techempower.com/benchmarks/#section=data-r17 . Starlette is at something like base Go performance (though 1/5th-1/10th the performance of Go's fasthttp library for simple responses, and much closer for anything involving database calls). Sanic also uses httptools and uvloop but has more stuff, so it may be a bit slower than Starlette, or not; the difference will probably be small.

Regarding the benchmark you linked: it is benchmarking the power of sleep, and there is something deeply wrong with their results. Sanic has 1800 timeouts vs. 200 for aiohttp, and 3x the connection errors. Fine, so Sanic is super slow. But look at their non-db tests: Sanic is >2x as fast, with 0 timeouts. They aren't using anything Sanic-specific to query the database, and both use the same event loop. Adding asyncio Postgres to two programs that fundamentally differ mainly in how they handle HTTP requests and responses shows the one that is faster at HTTP requests/responses (Sanic) becoming much slower, in fact reversing its relationship to aiohttp. This is strange, to say the least.

I was really curious about this, so I ran the bench myself. First, I upgraded Sanic to a recent version. Then I ran their test. In short, their results were not what I found: Sanic is 50% faster, and the timeouts are what you'd expect, 26 for Sanic vs. 45 for aiohttp.

(Output screenshots for Sanic Runs 1-3 and aiohttp Runs 1-3 were attached; Sanic Run 3 had a very large background-task spike in the last 1-2s of the run.)

Runs were interleaved to reduce the chance that the programs would benefit from caching across runs. The first run for each had somewhat more background tasks open. Starlette will be run later.
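For concreteness, roughly the shape of the endpoints such head-to-head runs compare; a sketch under my assumptions, not the benchmark repo's actual source:

```python
# Sanic variant: async handler returning a JSON body.
from sanic import Sanic
from sanic.response import json as sanic_json

sanic_app = Sanic("bench")

@sanic_app.route("/json")
async def sanic_handler(request):
    return sanic_json({"message": "hello"})

# aiohttp variant: same endpoint, same event-loop model.
from aiohttp import web

async def aiohttp_handler(request):
    return web.json_response({"message": "hello"})

aiohttp_app = web.Application()
aiohttp_app.router.add_get("/json", aiohttp_handler)

# Run one at a time, e.g.:
#   sanic_app.run(host="0.0.0.0", port=8000, workers=1)
#   web.run_app(aiohttp_app, port=8001)
```

The handler bodies are deliberately identical; any measured difference comes down to how each framework parses requests and writes responses.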
Thanks for mentioning me, very interesting reading. You know, framework comparison is a very biased matter.

Monthly download count is different: 4.7M for aiohttp vs 60K for Sanic. The Sanic team is a champion in library promotion; the guys do their job perfectly well.

Performance comparison is even harder. Sanic used to run multiple workers by default; aiohttp uses only one. On a real server it doesn't matter, because usually the server is explicitly configured to run multiple web workers by gunicorn and/or nginx config. Now Sanic has switched to a single worker by default, IIRC. aiohttp uses the standard …

The very famous Tech Empower Benchmark demonstrates that Sanic is faster than aiohttp on JSON serialization but cannot pass other tests. Please decide whether that is important or not.

The last cherry: Sanic has a super fast URL router because it caches matching results. The feature is extremely useful for getting awesome numbers with …

P.S. I don't know the best criteria for choosing. My advice is: look at the user API and choose what you like.

P.P.S. …
@asvetlov thanks for your reply, and for your work on the Sanic project! I was really curious about the Techempower issue. Do you know why Sanic, in past rounds, failed to complete test subsets without error? I haven't had much of a chance to look into it yet, but sanic-org/sanic#53 doesn't divulge much, and my own attempts to give Sanic problems haven't yielded anything worrisome (e.g., asyncpg works great under a 2000-simultaneous-connection load, request standard deviation is about as tight as aiohttp's, and the number of extremes/timeouts is smaller than aiohttp's). The Techempower benchmark was on version 0.7, if not earlier (the linked file in the Techempower issue is 0.7), and that version may have been affected by the issue described here: sanic-org/sanic#1176, which seems to have been largely addressed. Edit: furthermore, other recent tests showed no significant issues with Sanic: https://fgimian.github.io/blog/2018/06/05/python-api-framework-benchmarks/. Still, addressing the Techempower issues may help people feel more confident.
To be clear, I'm working on the aiohttp project, not Sanic.
Gotcha, thanks; that's the trouble with reading your reply on the go. I had assumed from your reply that you contributed to both. Glad to hear you're taking aiohttp performance seriously. Regarding flow control: yep, that PR was merged (sanic-org/sanic#1179). I haven't seen any further conversation from you or the Sanic devs on the issue.
Part of my concerns over Sanic came from reading your post at https://www.reddit.com/r/Python/comments/876msl/sanic_python_web_server_thats_written_to_die_fast/; this now seems outdated, and it would be interesting to hear the reply of a Sanic contributor. Re: "The last cherry: Sanic has a super fast URL router because it caches matching results. The feature is extremely useful for getting awesome numbers with …"
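To make the routing point concrete: a toy illustration of match-result caching (the general technique being described, not Sanic's actual router code). A benchmark hammering one URL pays the matching cost once and hits the cache thereafter; traffic spread over many distinct paths does not benefit the same way:

```python
from functools import lru_cache

# Hypothetical routing table: pattern -> handler name.
ROUTES = {"/json": "json_handler", "/users/<user>": "user_handler"}

@lru_cache(maxsize=1024)
def match(path):
    # Exact match first, then a (naive) segment-count check for
    # parameterized routes.
    if path in ROUTES:
        return ROUTES[path]
    for pattern, handler in ROUTES.items():
        if "<" in pattern and len(path.split("/")) == len(pattern.split("/")):
            return handler
    return None

print(match("/json"))  # computed once...
print(match("/json"))  # ...served from the lru_cache on every repeat hit
```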
Edit: To be clear, the bench mentioned above used 1 worker each for Sanic and aiohttp, both using uvloop. Bench attached.
This seems like a direction we don't want to go in at the moment; closing.
Use Sanic + ujson instead of Flask + flask.json, cache more, and fix the missing favicon in the HTML templates. Sanic should be 3x+ faster than Flask, and ujson ~2-3x faster than the standard json lib.
I also optimize away the unnecessary re-generation of user_data (via get_users()) and of its JSON equivalent for /json.
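A minimal sketch of the fully-shaped-response caching described above; `get_users` comes from the PR description, while the cache layout and handler body are my assumptions:

```python
import ujson
from sanic import Sanic, response

app = Sanic("scorecard")

# Cache both the shaped Python object and its serialized form, so neither
# /json nor the HTML views re-shape or re-serialize on every request.
cache = {"users": None, "users_json": None}

def get_users():
    # Stand-in for shaping the GitHub API response into per-user data.
    return {"alice": {"open_prs": 3}}

def refresh_cache():
    users = get_users()
    cache["users"] = users
    cache["users_json"] = ujson.dumps(users)

@app.route("/json")
async def json_endpoint(request):
    # Serve pre-serialized bytes; no per-request shaping or dumps() call.
    return response.text(cache["users_json"],
                         content_type="application/json")

refresh_cache()
# app.run(host="0.0.0.0", port=5000)
```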
This is currently deployed on scorecard.hail.is and improves performance by ~20%, even for a single connection on this simple workload (response time from ~50ms to ~40ms).
Besides performance (https://fgimian.github.io/blog/2018/06/05/python-api-framework-benchmarks/), Sanic also builds in a production-oriented web server, so there is no need for WSGI or ASGI adapters, Gunicorn, etc. It can easily run multiple workers if desired.
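For illustration, the launch idiom the built-in server supports (a sketch, not this PR's actual entry point):

```python
from sanic import Sanic
from sanic.response import text

app = Sanic("example")

@app.route("/healthz")
async def healthz(request):
    return text("ok")

if __name__ == "__main__":
    # Sanic's built-in server: no WSGI/ASGI adapter or Gunicorn required.
    # workers=N forks N processes that share the listening socket.
    app.run(host="0.0.0.0", port=8000, workers=4)
```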
This serves as a demonstration of migration to faster web frameworks, and in particular to uvloop-based asyncio implementations.