
Possible memory-leak investigation #276

Closed · gi0baro opened this issue Apr 17, 2024 · Discussed in #268 · 14 comments
Labels: asgi (Issue related to ASGI protocol), bug (Something isn't working), help wanted (Extra attention is needed)
Milestone: 1.3

@gi0baro
Member

gi0baro commented Apr 17, 2024

Currently suspected protocol: ASGI.
Verification on RSGI and WSGI should be done as well.

Discussed in #268

Originally posted by cpoppema April 12, 2024
For two projects I replaced uvicorn (running with uvloop) with granian, and my monitoring tools show an increase in memory usage. I am wondering whether this is something I have to accept with granian (I don't know if it will flatten out at some point) or whether there is an underlying issue I might be able to help debug. The benchmarks are all about request numbers and don't mention memory at all, so I suspect an issue.

Figures below are from 2 services running fastapi + pydantic + sqlalchemy.

service 1, graph is from 7 days:

cpu before (uvicorn)/after (granian): [screenshot]
requests before (uvicorn)/after (granian): [screenshot]
memory before (uvicorn)/after (granian): [screenshot]

service 2, graph is from 2 days:

cpu before (uvicorn)/after (granian): [screenshot]
requests before (uvicorn)/after (granian): [screenshot]
memory before (uvicorn)/after (granian): [screenshot]

@gi0baro gi0baro added bug Something isn't working asgi Issue related to ASGI protocol labels Apr 17, 2024
@gi0baro gi0baro added this to the 1.3 milestone Apr 17, 2024
@gi0baro gi0baro added the help wanted Extra attention is needed label Apr 17, 2024
@gi0baro gi0baro changed the title Possible memory-leak analysis Possible memory-leak investigation Apr 17, 2024
@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema JFYI, I did some investigation on this over the weekend using https://github.com/bloomberg/memray, but I wasn't able to reproduce on my machine (macOS) the memory-increase trend you observed. I need to schedule a similar test on a Linux machine, as it might be a target-specific issue.

@cpoppema

> @cpoppema JFYI, I did some investigation on this over the weekend using https://github.com/bloomberg/memray, but I wasn't able to reproduce on my machine (macOS) the memory-increase trend you observed. I need to schedule a similar test on a Linux machine, as it might be a target-specific issue.

If you tell me what flags to run, I can run memray no problem. As a quick example, if I wrap my service in memray:

docker compose run --rm -p 8000:8000 server python -m memray run --native --trace-python-allocators --follow-fork -o /app/memray.bin  /usr/local/bin/granian --interface asgi service.server:app --host=0.0.0.0 --loop uvloop --reload

and then use siege to run for 5 minutes.

$ siege -b -t 5m -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8000/some_endpoint
{	"transactions":			        2833,
	"availability":			      100.00,
	"elapsed_time":			      299.22,
	"data_transferred":		       10.96,
	"response_time":		        2.63,
	"transaction_rate":		        9.47,
	"throughput":			        0.04,
	"concurrency":			       24.88,
	"successful_transactions":	        2833,
	"failed_transactions":		           0,
	"longest_transaction":		       33.97,
	"shortest_transaction":		        0.81
}

For 1.2.2 this prints 1.0 GiB

docker-compose run --rm --no-deps server python -m memray stats /app/memray.bin
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Location                                                                                          ┃  <Total Memory> ┃  Total Memory % ┃  Own Memory ┃  Own Memory % ┃  Allocation Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ _PyEval_Vector at Python/ceval.c                                                                  │         1.000GB │          99.40% │      0.000B │         0.00% │                 3 │
...

For 1.2.3 this prints only 5.x MiB

docker compose run --rm --no-deps server python -m memray table /app/memray.bin
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Location                                                                                          ┃  <Total Memory> ┃  Total Memory % ┃  Own Memory ┃  Own Memory % ┃  Allocation Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ _PyEval_Vector at Python/ceval.c                                                                  │         5.662MB │          90.83% │      0.000B │         0.00% │             46325 │
...

More stats:

docker compose run --rm --no-deps server python -m memray stats /app/memray.bin
📏 Total allocations:
	452844

📦 Total memory allocated:
	78.369MB

📊 Histogram of allocation size:
	min: 0.000B
	---------------------------------------------
	< 6.000B   :   1410 ▇
	< 18.000B  :  30607 ▇▇▇▇
	< 57.000B  : 195094 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 174.000B : 133406 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 533.000B :  82120 ▇▇▇▇▇▇▇▇▇▇▇
	< 1.593KB  :   7292 ▇
	< 4.869KB  :   1131 ▇
	< 14.886KB :    999 ▇
	< 45.502KB :    696 ▇
	<=139.087KB:     89 ▇
	---------------------------------------------
	max: 139.088KB

📂 Allocator type distribution:
	 PYMALLOC_MALLOC: 375169
	 PYMALLOC_CALLOC: 37466
	 PYMALLOC_REALLOC: 24259
	 MALLOC: 9738
	 REALLOC: 6012
	 MMAP: 192
	 CALLOC: 6
	 POSIX_MEMALIGN: 2

🥇 Top 5 largest allocating locations (by size):
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 12.287MB
	- <stack trace unavailable> -> 10.930MB
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 5.457MB
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 5.451MB
	- get_data:<frozen importlib._bootstrap_external>:1131 -> 3.858MB

🥇 Top 5 largest allocating locations (by number of allocations):
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 81023
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 41467
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 38391
	- <stack trace unavailable> -> 15700
	- _path_stat:<frozen importlib._bootstrap_external>:147 -> 14385                                                                      

but RES memory in htop did keep growing: it started at 134M and ended up at around 612M by the time siege finished after 5 minutes.

[htop screenshots]
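As a side note, RES/RSS growth like this can also be sampled outside of htop. A minimal sketch (a hypothetical helper, assuming psutil is installed and the granian worker PID is known):

# rss_watch.py -- hypothetical helper: sample a process' RSS (htop's RES column) once per second
import sys
import time

import psutil  # pip install psutil

def watch(pid: int, interval: float = 1.0) -> None:
    proc = psutil.Process(pid)
    while True:
        rss_mib = proc.memory_info().rss / (1024 * 1024)  # resident set size in MiB
        print(f"{time.strftime('%H:%M:%S')} RSS={rss_mib:.1f} MiB", flush=True)
        time.sleep(interval)

if __name__ == "__main__":
    watch(int(sys.argv[1]))  # e.g. python rss_watch.py <worker-pid>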

@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema the difference between 1.2.2 and 1.2.3 should be due to jemalloc being used in place of mimalloc (the latter probably reserves a bigger amount of memory for its arenas).

Running memray that way is perfectly fine. It is quite strange to me that it doesn't detect the difference in memory reported by htop.
Can you attach the .bin files so I can take a look at the data?
Also, current master uses PyO3 0.21, which probably changed some memory-related behaviour. It might be worth trying with that as well and dumping the relevant memray output.

Thank you for the patience 🙏
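As a side note, the dumps can also be inspected locally for memory that was never freed. Since the run above already uses --trace-python-allocators, memray's leaks view applies; the output path below is just an example:

python -m memray flamegraph --leaks -o /app/memray-leaks.html /app/memray.bin
python -m memray summary /app/memray.bin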

@cpoppema

It generates 3 .bin files per run (~3M, ~88M, 1.9G); here are the smallest ones from each run (1.2.3 and 1.3.0).

@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema I'm afraid those files only contain data from the main process, not the actual workers. It would be nice to have the other dumps as well (maybe compressed?).
It would also be helpful to remove the --reload option when running with memray, to avoid all the watchfiles allocations and related threads; 30s/1m of data would probably be enough to spot leaks.
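For example, reusing the earlier command without --reload and with a 30-second siege run (same image, paths and endpoint as above):

docker compose run --rm -p 8000:8000 server python -m memray run --native --trace-python-allocators --follow-fork -o /app/memray.bin /usr/local/bin/granian --interface asgi service.server:app --host=0.0.0.0 --loop uvloop
siege -b -t 30S -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8000/some_endpoint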

@cpoppema

@gi0baro Right. I've re-uploaded them as .tar.xz with just 30 seconds of siege. Looks like there are only two files without the --reload flag 🙂

@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema so, given that we trust the memray data:

 ❯ memray stats -n 25 ~/Downloads/memray.bin.8
📏 Total allocations:
	21511327

📦 Total memory allocated:
	4.234GB

📊 Histogram of allocation size:
	min: 0.000B
	-----------------------------------------------
	< 4.000B   :    49702 ▇
	< 21.000B  :   439303 ▇
	< 96.000B  : 12719302 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 445.000B :  7996012 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 1.998KB  :   247030 ▇
	< 9.184KB  :    24465 ▇
	< 42.195KB :    23474 ▇
	< 193.859KB:    12005 ▇
	< 890.660KB:       24 ▇
	<=3.996MB  :       10 ▇
	-----------------------------------------------
	max: 3.996MB

📂 Allocator type distribution:
	 PYMALLOC_MALLOC: 18461721
	 MALLOC: 1872067
	 PYMALLOC_REALLOC: 783705
	 PYMALLOC_CALLOC: 366234
	 MMAP: 17411
	 REALLOC: 10163
	 CALLOC: 26

🥇 Top 25 largest allocating locations (by size):
	- __init__:/usr/local/lib/python3.11/site-packages/pydantic/main.py:171 -> 1.344GB
	- _iter_file_finder_modules:/usr/local/lib/python3.11/pkgutil.py:168 -> 127.991MB
	- digest:/usr/local/lib/python3.11/hmac.py:159 -> 105.208MB
	- _spawn_asgi_lifespan_worker:/usr/local/lib/python3.11/site-packages/granian/server.py:230 -> 104.763MB
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 70.211MB
	- validate_core_schema:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:570 -> 47.243MB
	- _init_hmac:/usr/local/lib/python3.11/hmac.py:67 -> 44.418MB
	- _walk:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:202 -> 44.399MB
	- <stack trace unavailable> -> 44.268MB
	- get_data:<frozen importlib._bootstrap_external>:1131 -> 43.663MB
	- dump_python:/usr/local/lib/python3.11/site-packages/pydantic/type_adapter.py:333 -> 37.683MB
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 34.613MB
	- search:/usr/local/lib/python3.11/re/__init__.py:176 -> 33.504MB
	- model_validate:/usr/local/lib/python3.11/site-packages/pydantic/main.py:509 -> 30.636MB
	- walk:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:199 -> 30.465MB
	- digest:/usr/local/lib/python3.11/hmac.py:158 -> 28.764MB
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 27.915MB
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:246 -> 25.330MB
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:272 -> 23.599MB
	- sub:/usr/local/lib/python3.11/re/__init__.py:185 -> 23.234MB
	- <listcomp>:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py:226 -> 22.577MB
	- new:/usr/local/lib/python3.11/hmac.py:184 -> 22.522MB
	- create_schema_validator:/usr/local/lib/python3.11/site-packages/pydantic/plugin/_schema_validator.py:49 -> 21.853MB
	- __init__:/usr/local/lib/python3.11/typing.py:830 -> 21.201MB
	- <genexpr>:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:732 -> 20.580MB

🥇 Top 25 largest allocating locations (by number of allocations):
	- digest:/usr/local/lib/python3.11/hmac.py:159 -> 1475280
	- _spawn_asgi_lifespan_worker:/usr/local/lib/python3.11/site-packages/granian/server.py:230 -> 1117338
	- _init_hmac:/usr/local/lib/python3.11/hmac.py:67 -> 656003
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:246 -> 531626
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 468663
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:272 -> 347222
	- __init__:/usr/local/lib/python3.11/site-packages/pydantic/main.py:171 -> 261995
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 229360
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 196035
	- <genexpr>:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:732 -> 167120
	- new:/usr/local/lib/python3.11/hmac.py:184 -> 164001
	- digest:/usr/local/lib/python3.11/hmac.py:158 -> 163921
	- _walk:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:202 -> 159390
	- get:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/identity.py:222 -> 159268
	- validate_core_schema:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:570 -> 155330
	- <listcomp>:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py:226 -> 140308
	- _populate_full:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py:1323 -> 131341
	- _filter_on_values:/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/row.py:176 -> 119448
	- search:/usr/local/lib/python3.11/re/__init__.py:176 -> 116985
	- from_model:/app/service/core/entities.py:57 -> 113077
	- _op:/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/row.py:206 -> 108072
	- visit_has_cache_key_list:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:732 -> 107191
	- __eq__:/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/row.py:236 -> 102385
	- _path_stat:<frozen importlib._bootstrap_external>:147 -> 99823
	- isdir:<frozen genericpath>:42 -> 94415

my suspicion for why this happens only in Granian

__init__:/usr/local/lib/python3.11/site-packages/pydantic/main.py:171 -> 1.344GB

is that Python is probably not deallocating objects between requests. That is super weird, but at least now I have a starting point for some additional checks. I'll post updates as soon as I make new discoveries.
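To isolate that starting point, one minimal check (a hypothetical app, not the service from the report) is to serve a bare ASGI callable that builds pydantic models on every request, run it under granian (granian --interface asgi leak_check:app), and watch whether worker RSS stays flat under sustained siege load:

# leak_check.py -- hypothetical minimal ASGI app allocating pydantic models per request
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    values: list[int]

async def app(scope, receive, send):
    if scope["type"] != "http":
        return
    # build a batch of models; if objects are released between requests,
    # worker RSS should stay roughly flat while siege runs
    items = [Item(name=f"item-{i}", values=list(range(50))) for i in range(1000)]
    body = f"built {len(items)} models".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})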

@Zerotask

Zerotask commented Apr 23, 2024

Interestingly, gunicorn also has a max-requests option: https://docs.gunicorn.org/en/stable/settings.html#max-requests

(...) This is a simple method to help limit the damage of memory leaks.

We also noticed memory leaks in our Django application (with both WSGI and ASGI), but I haven't had time to investigate them yet. Our Kubernetes pods just restart from time to time; restarting only granian instead would be faster and more efficient.

Related to #34
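For reference, a typical invocation of the gunicorn setting mentioned above looks like the following (the app module path is only an example); each worker is recycled after roughly that many requests, which caps the damage of a leak:

gunicorn myproject.wsgi:application --workers 4 --max-requests 1000 --max-requests-jitter 100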

@sciyoshi

Possibly related to #252 (comment).

@gi0baro
Member Author

gi0baro commented Apr 26, 2024

@cpoppema this commit might be worth a test round, at least to see if @sciyoshi is right about the root cause.
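One way to try that commit is an install straight from git; the SHA placeholder below stands for the commit linked above, and a local Rust toolchain is assumed since this builds the extension from source:

pip install "git+https://github.com/emmett-framework/granian.git@<commit-sha>"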

@cpoppema

cpoppema commented Apr 26, 2024

> @cpoppema this commit might be worth a test round, at least to see if @sciyoshi is right about the root cause.

That looks very promising indeed! I'm not sure about the implications of that change, but comparing observed memory usage, it seems to have fixed it 💪, see the graphs below 🎉🎉🎉 To generate the graphs I used mprof run --include-children --multiprocess <server> and mprof plot (pip install memory-profiler matplotlib), which shows memory over time during 5 minutes of sending requests with siege.
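Spelled out, that profiling sequence looks like this (the server command shown is the granian invocation used earlier in the thread; substitute your own):

pip install memory-profiler matplotlib
mprof run --include-children --multiprocess granian --interface asgi service.server:app --host=0.0.0.0 --loop uvloop
mprof plot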

baseline granian 1.2.3: [graph]

in contrast with uvicorn: [graph]

with granian master: [graph]

@gi0baro
Member Author

gi0baro commented Apr 27, 2024

Closing this as per the changes in 1.3.
Thanks to everybody who participated in the debugging!

@gi0baro gi0baro closed this as completed Apr 27, 2024
@Faolain

Faolain commented Apr 29, 2024

Apologies in advance for commenting on an already-closed issue, but I am wondering whether this should address the memory leak (Edit: updated to 1.3 and still see the memory-leak issue described above) that is being caused by memory not being released by Python itself, described here w.r.t. malloc:

Should I create a new issue for this in case there is something that can be done in granian?

@gi0baro
Member Author

gi0baro commented Apr 29, 2024

> Apologies in advance for commenting on an already-closed issue, but I am wondering whether this should address the memory leak that is being caused by memory not being released by Python itself, described here w.r.t. malloc:

Both of those seem unrelated to this issue (I also participated in the uvicorn discussion you mentioned). I also frankly doubt the uvicorn discussion has anything to do with the CPython one, given that no SSL/TLS is used. But I definitely have no time to investigate those.

> Edit: updated to 1.3 and still see the memory-leak issue described above

That seems a bit unrelated as well. Can you please open a new issue with numbers, tested alternatives, and an MRE?
