Add tracemalloc utility view to API and improve k6 scripts for memory leak debugging #3046
Conversation
It does seem to kind of stabilise around 982MiB on the first run, but running the script a second time gets it creeping up again; I've got it up to 1GB now. There are further improvements that could be made to the script to bring it closer to production behaviour. It'd be good to add things like searching for heaps of sources the way the Gutenberg integration does, for example.
The trace view seems valuable enough to me to make permanent: take it out of the `if`, move it to its own file (instead of the URL config), and keep it even after this project (enabled conditionally with the env var).
```
@@ -9,6 +9,15 @@
import os

if os.getenv("ENABLE_TRACE_VIEW", "0") == "1":
```
Using `decouple.config` would allow a direct cast to `bool`:

```python
from decouple import config

TRACING = config('ENABLE_TRACE_VIEW', default=False, cast=bool)
```
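Putting both review suggestions together, the gating could look something like the sketch below. This is only an illustration: the setting name reuses `ENABLE_TRACE_VIEW` from the diff, while the `api.views.trace` module and `trace_view` name are hypothetical.

```python
# urls.py (sketch): register the trace route only when the env var is set.
from decouple import config
from django.urls import path

ENABLE_TRACE_VIEW = config("ENABLE_TRACE_VIEW", default=False, cast=bool)

urlpatterns = [
    # ... the existing API routes ...
]

if ENABLE_TRACE_VIEW:
    # Hypothetical module: per the suggestion above, the view would live
    # in its own file rather than inline in the URL config.
    from api.views.trace import trace_view

    urlpatterns += [path("_trace/", trace_view)]
```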
LGTM, and I've confirmed the trace route doesn't work when the setting is disabled, but does work as expected when enabled! I'm not quite sure how to read the tracemalloc output; maybe we could implement something like the pretty top example from the docs?
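For reference, the "pretty top" example from the tracemalloc documentation is roughly the following (lightly adapted here); the view could format its output this way instead of returning raw statistics:

```python
import linecache
import tracemalloc


def display_top(snapshot, key_type="lineno", limit=10):
    """Print the `limit` biggest allocation sites in a readable form."""
    snapshot = snapshot.filter_traces(
        (
            tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
            tracemalloc.Filter(False, "<unknown>"),
        )
    )
    top_stats = snapshot.statistics(key_type)

    print(f"Top {limit} lines")
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        print(f"#{index}: {frame.filename}:{frame.lineno}: {stat.size / 1024:.1f} KiB")
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print(f"    {line}")

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print(f"{len(other)} other: {size / 1024:.1f} KiB")
    total = sum(stat.size for stat in top_stats)
    print(f"Total allocated size: {total / 1024:.1f} KiB")
```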
```
k6-local:
    @API_TOKEN="" just k6 http://localhost:50280/v1/ --net=host
```
This is excellent, thanks for adding it!
```diff
@@ -1,6 +1,6 @@
 import { group } from "k6"
 import { searchBy } from "./search.js"
-import { getProvider, getRandomWord } from "./utils.js"
+import { getProvider } from "./utils.js"
```
Lines 40-41:

```js
provider_image_page_20: createScenario("images", "20", "searchByProvider"),
provider_image_page_500: createScenario("images", "500", "searchByProvider"),
```
Converting this to a draft and setting priority to low. It'd be nice to get the load testing script working reliably, but the quick job I've done here covers only a very small amount of what would actually make these scripts more useful today. Because we think we have a good handle on what caused the memory leak (#3047), this isn't as high a priority anymore.
Fixes
Related to #3028
Description
Two things in this PR help to reproduce and debug the memory leak:

1. Improvements to the k6 scripts for memory leak debugging.
2. A new `/_trace/` view in the API that makes it easy to see tracemalloc changes over time.

With the k6 script I'm able to consistently reproduce the leak locally; the API's memory usage now climbs seemingly indefinitely. To do so:

1. Run `just down -v && just api/init`. Deleting the volumes ensures you do not have any dead links cached in Redis, so that dead link filtering is exercised to the fullest extent.
2. `cd` into `utilities/load_testing` and run `just k6-local`.
3. Run `docker stats` and observe the memory usage of `openverse_web_1`.
You can run multiple `just k6-local` scripts in different terminals to increase the load, but I haven't noticed a big difference when doing this, probably because it's a single worker locally. Speaking of which, you can increase the number of workers to 4 by removing `-w 1` in `docker-compose.yml` for the `web` service's `command`. I've been able to observe jumps as big as 20MiB within 2 seconds when I do this locally.

If you enable the trace view (set `ENABLE_TRACE_VIEW=1` in your `api/.env`), you can watch `docker stats` for when you see a jump in memory usage and then make a request to `/_trace/`
on your local API. With multiple workers this isn't reliable, so reduce the workers back to 1 if you increased them before (or tweak the implementation if you have ideas for how to make it work well).
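For context on what such a view involves, here is a minimal sketch of a tracemalloc trace view. It is not the PR's actual implementation; the `trace_view` name, the plain-text response format, and the module-global snapshot are all assumptions made for illustration:

```python
import tracemalloc

from django.http import HttpResponse

# Start tracing at import time so the first request has data to report.
if not tracemalloc.is_tracing():
    tracemalloc.start()

_last_snapshot = None


def trace_view(request):
    """Report the biggest allocation changes since the previous request."""
    global _last_snapshot
    snapshot = tracemalloc.take_snapshot()
    if _last_snapshot is None:
        stats = snapshot.statistics("lineno")
    else:
        stats = snapshot.compare_to(_last_snapshot, "lineno")
    _last_snapshot = snapshot
    body = "\n".join(str(stat) for stat in stats[:25])
    return HttpResponse(body, content_type="text/plain")
```

Note that a module-global snapshot like `_last_snapshot` is per-process, which would explain the unreliability with multiple workers: consecutive requests can land on different workers, each holding its own snapshot.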
Testing Instructions

See above.
Checklist

- My pull request has a descriptive title (not a vague title like `Update index.md`).
- My pull request targets the default branch of the repository (`main`) or a parent feature branch.

Developer Certificate of Origin