
Possible memory-leak investigation #276

Closed · gi0baro opened this issue Apr 17, 2024 · Discussed in #268 · 14 comments
Labels: asgi (Issue related to ASGI protocol), bug (Something isn't working), help wanted (Extra attention is needed)
Milestone: 1.3

@gi0baro
Member

gi0baro commented Apr 17, 2024

Currently suspected protocol: ASGI.
Verification on RSGI and WSGI should be done as well.

Discussed in #268

Originally posted by cpoppema April 12, 2024
For two projects I replaced uvicorn (running with uvloop) with granian, and my monitoring tools show an increase in memory usage. I am wondering whether this is something I have to accept with granian (I don't know if it will flatten out at some point) or whether there is an underlying issue I might be able to help debug. The benchmarks are all about request numbers and don't mention memory at all, so I suspect an issue.

Figures below are from 2 services running fastapi + pydantic + sqlalchemy.

service 1, graph is from 7 days:

cpu before (uvicorn)/after (granian): [screenshot]
requests before (uvicorn)/after (granian): [screenshot]
memory before (uvicorn)/after (granian): [screenshot]

service 2, graph is from 2 days:

cpu before (uvicorn)/after (granian): [screenshot]
requests before (uvicorn)/after (granian): [screenshot]
memory before (uvicorn)/after (granian): [screenshot]

@gi0baro gi0baro added bug Something isn't working asgi Issue related to ASGI protocol labels Apr 17, 2024
@gi0baro gi0baro added this to the 1.3 milestone Apr 17, 2024
@gi0baro gi0baro added the help wanted Extra attention is needed label Apr 17, 2024
@gi0baro gi0baro changed the title Possible memory-leak analysis Possible memory-leak investigation Apr 17, 2024
@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema JFYI, I did some investigation on this over the weekend using https://github.com/bloomberg/memray, but I wasn't able to reproduce on my machine (macOS) the memory-increase trend you observed. I need to schedule a similar test on a Linux machine, as it might be a target-specific issue.

@cpoppema

> @cpoppema JFYI, I did some investigation on this over the weekend using https://github.com/bloomberg/memray, but I wasn't able to reproduce on my machine (macOS) the memory-increase trend you observed. I need to schedule a similar test on a Linux machine, as it might be a target-specific issue.

If you tell me what flags to run, I can run memray no problem. As a quick example, if I wrap my service in memray:

docker compose run --rm -p 8000:8000 server python -m memray run --native --trace-python-allocators --follow-fork -o /app/memray.bin  /usr/local/bin/granian --interface asgi service.server:app --host=0.0.0.0 --loop uvloop --reload

and then use siege to run for 5 minutes.

$ siege -b -t 5m -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8000/some_endpoint
{	"transactions":			        2833,
	"availability":			      100.00,
	"elapsed_time":			      299.22,
	"data_transferred":		       10.96,
	"response_time":		        2.63,
	"transaction_rate":		        9.47,
	"throughput":			        0.04,
	"concurrency":			       24.88,
	"successful_transactions":	        2833,
	"failed_transactions":		           0,
	"longest_transaction":		       33.97,
	"shortest_transaction":		        0.81
}

For 1.2.2 this prints 1.0 GiB

docker-compose run --rm --no-deps server python -m memray stats /app/memray.bin
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Location                                                                                          ┃  <Total Memory> ┃  Total Memory % ┃  Own Memory ┃  Own Memory % ┃  Allocation Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ _PyEval_Vector at Python/ceval.c                                                                  │         1.000GB │          99.40% │      0.000B │         0.00% │                 3 │
...

For 1.2.3 this prints only 5.x MiB

docker compose run --rm --no-deps server python -m memray table /app/memray.bin
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Location                                                                                          ┃  <Total Memory> ┃  Total Memory % ┃  Own Memory ┃  Own Memory % ┃  Allocation Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ _PyEval_Vector at Python/ceval.c                                                                  │         5.662MB │          90.83% │      0.000B │         0.00% │             46325 │
...

More stats:

docker compose run --rm --no-deps server python -m memray stats /app/memray.bin
📏 Total allocations:
	452844

📦 Total memory allocated:
	78.369MB

📊 Histogram of allocation size:
	min: 0.000B
	---------------------------------------------
	< 6.000B   :   1410 ▇
	< 18.000B  :  30607 ▇▇▇▇
	< 57.000B  : 195094 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 174.000B : 133406 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 533.000B :  82120 ▇▇▇▇▇▇▇▇▇▇▇
	< 1.593KB  :   7292 ▇
	< 4.869KB  :   1131 ▇
	< 14.886KB :    999 ▇
	< 45.502KB :    696 ▇
	<=139.087KB:     89 ▇
	---------------------------------------------
	max: 139.088KB

📂 Allocator type distribution:
	 PYMALLOC_MALLOC: 375169
	 PYMALLOC_CALLOC: 37466
	 PYMALLOC_REALLOC: 24259
	 MALLOC: 9738
	 REALLOC: 6012
	 MMAP: 192
	 CALLOC: 6
	 POSIX_MEMALIGN: 2

🥇 Top 5 largest allocating locations (by size):
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 12.287MB
	- <stack trace unavailable> -> 10.930MB
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 5.457MB
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 5.451MB
	- get_data:<frozen importlib._bootstrap_external>:1131 -> 3.858MB

🥇 Top 5 largest allocating locations (by number of allocations):
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 81023
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 41467
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 38391
	- <stack trace unavailable> -> 15700
	- _path_stat:<frozen importlib._bootstrap_external>:147 -> 14385                                                                      

but RES memory in htop did keep growing: it started at 134M and ended up at around 612M by the time siege finished after 5 minutes.

[htop screenshots]
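As a side note, RES/RSS growth like this can also be sampled outside of htop. A minimal sketch (a hypothetical helper, assuming psutil is installed and the granian worker PID is known):

# rss_watch.py -- hypothetical helper: sample a process' RSS (htop's RES column) once per second
import sys
import time

import psutil  # pip install psutil

def watch(pid: int, interval: float = 1.0) -> None:
    proc = psutil.Process(pid)
    while True:
        rss_mib = proc.memory_info().rss / (1024 * 1024)  # resident set size in MiB
        print(f"{time.strftime('%H:%M:%S')} RSS={rss_mib:.1f} MiB", flush=True)
        time.sleep(interval)

if __name__ == "__main__":
    watch(int(sys.argv[1]))  # e.g. python rss_watch.py <worker-pid>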

@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema the difference between 1.2.2 and 1.2.3 should be due to jemalloc being used in place of mimalloc (the latter probably reserves a bigger amount of memory for its arenas).

Running memray that way is perfectly fine. It is quite strange to me that it doesn't detect the difference in memory reported by htop.
Can you attach the .bin files so I can take a look at the data?
Also, current master uses PyO3 0.21, which probably changed some memory-related behaviour. It might be worth trying with that as well and dumping the relevant memray output.

Thank you for the patience 🙏
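As a side note, the dumps can also be inspected locally for memory that was never freed. Since the run above already uses --trace-python-allocators, memray's leaks view applies; the output path below is just an example:

python -m memray flamegraph --leaks -o /app/memray-leaks.html /app/memray.bin
python -m memray summary /app/memray.bin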

@cpoppema

It generates 3 .bin files per run (~3M, ~88M, 1.9G); here are the smallest ones from each run (1.2.3 and 1.3.0).

@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema I'm afraid those files only contain data from the main process, not the actual workers. It would be nice to have the other dumps as well (maybe compressed?).
It would also be helpful to remove the --reload option when running with memray, to avoid all the watchfiles allocations and related threads; 30s/1m of data would probably be enough to spot leaks.
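For example, reusing the earlier command without --reload and with a 30-second siege run (same image, paths and endpoint as above):

docker compose run --rm -p 8000:8000 server python -m memray run --native --trace-python-allocators --follow-fork -o /app/memray.bin /usr/local/bin/granian --interface asgi service.server:app --host=0.0.0.0 --loop uvloop
siege -b -t 30S -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8000/some_endpoint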

@cpoppema

@gi0baro Right. I've re-uploaded them as .tar.xz with just 30 seconds of siege. Looks like there are only two files without the --reload flag 🙂

@gi0baro
Member Author

gi0baro commented Apr 22, 2024

@cpoppema so, given that we trust the memray data:

 ❯ memray stats -n 25 ~/Downloads/memray.bin.8
📏 Total allocations:
	21511327

📦 Total memory allocated:
	4.234GB

📊 Histogram of allocation size:
	min: 0.000B
	-----------------------------------------------
	< 4.000B   :    49702 ▇
	< 21.000B  :   439303 ▇
	< 96.000B  : 12719302 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 445.000B :  7996012 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 1.998KB  :   247030 ▇
	< 9.184KB  :    24465 ▇
	< 42.195KB :    23474 ▇
	< 193.859KB:    12005 ▇
	< 890.660KB:       24 ▇
	<=3.996MB  :       10 ▇
	-----------------------------------------------
	max: 3.996MB

📂 Allocator type distribution:
	 PYMALLOC_MALLOC: 18461721
	 MALLOC: 1872067
	 PYMALLOC_REALLOC: 783705
	 PYMALLOC_CALLOC: 366234
	 MMAP: 17411
	 REALLOC: 10163
	 CALLOC: 26

🥇 Top 25 largest allocating locations (by size):
	- __init__:/usr/local/lib/python3.11/site-packages/pydantic/main.py:171 -> 1.344GB
	- _iter_file_finder_modules:/usr/local/lib/python3.11/pkgutil.py:168 -> 127.991MB
	- digest:/usr/local/lib/python3.11/hmac.py:159 -> 105.208MB
	- _spawn_asgi_lifespan_worker:/usr/local/lib/python3.11/site-packages/granian/server.py:230 -> 104.763MB
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 70.211MB
	- validate_core_schema:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:570 -> 47.243MB
	- _init_hmac:/usr/local/lib/python3.11/hmac.py:67 -> 44.418MB
	- _walk:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:202 -> 44.399MB
	- <stack trace unavailable> -> 44.268MB
	- get_data:<frozen importlib._bootstrap_external>:1131 -> 43.663MB
	- dump_python:/usr/local/lib/python3.11/site-packages/pydantic/type_adapter.py:333 -> 37.683MB
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 34.613MB
	- search:/usr/local/lib/python3.11/re/__init__.py:176 -> 33.504MB
	- model_validate:/usr/local/lib/python3.11/site-packages/pydantic/main.py:509 -> 30.636MB
	- walk:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:199 -> 30.465MB
	- digest:/usr/local/lib/python3.11/hmac.py:158 -> 28.764MB
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 27.915MB
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:246 -> 25.330MB
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:272 -> 23.599MB
	- sub:/usr/local/lib/python3.11/re/__init__.py:185 -> 23.234MB
	- <listcomp>:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py:226 -> 22.577MB
	- new:/usr/local/lib/python3.11/hmac.py:184 -> 22.522MB
	- create_schema_validator:/usr/local/lib/python3.11/site-packages/pydantic/plugin/_schema_validator.py:49 -> 21.853MB
	- __init__:/usr/local/lib/python3.11/typing.py:830 -> 21.201MB
	- <genexpr>:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:732 -> 20.580MB

🥇 Top 25 largest allocating locations (by number of allocations):
	- digest:/usr/local/lib/python3.11/hmac.py:159 -> 1475280
	- _spawn_asgi_lifespan_worker:/usr/local/lib/python3.11/site-packages/granian/server.py:230 -> 1117338
	- _init_hmac:/usr/local/lib/python3.11/hmac.py:67 -> 656003
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:246 -> 531626
	- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 468663
	- _gen_cache_key:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:272 -> 347222
	- __init__:/usr/local/lib/python3.11/site-packages/pydantic/main.py:171 -> 261995
	- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 229360
	- _create_fn:/usr/local/lib/python3.11/dataclasses.py:433 -> 196035
	- <genexpr>:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:732 -> 167120
	- new:/usr/local/lib/python3.11/hmac.py:184 -> 164001
	- digest:/usr/local/lib/python3.11/hmac.py:158 -> 163921
	- _walk:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:202 -> 159390
	- get:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/identity.py:222 -> 159268
	- validate_core_schema:/usr/local/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:570 -> 155330
	- <listcomp>:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py:226 -> 140308
	- _populate_full:/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py:1323 -> 131341
	- _filter_on_values:/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/row.py:176 -> 119448
	- search:/usr/local/lib/python3.11/re/__init__.py:176 -> 116985
	- from_model:/app/service/core/entities.py:57 -> 113077
	- _op:/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/row.py:206 -> 108072
	- visit_has_cache_key_list:/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/cache_key.py:732 -> 107191
	- __eq__:/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/row.py:236 -> 102385
	- _path_stat:<frozen importlib._bootstrap_external>:147 -> 99823
	- isdir:<frozen genericpath>:42 -> 94415

my suspicion for why this happens only in Granian

__init__:/usr/local/lib/python3.11/site-packages/pydantic/main.py:171 -> 1.344GB

is that Python is probably not deallocating objects between requests. That is super weird, but at least now I have a starting point for some additional checks. I'll post updates as soon as I make new discoveries.
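To isolate that starting point, one minimal check (a hypothetical app, not the service from the report) is to serve a bare ASGI callable that builds pydantic models on every request, run it under granian (granian --interface asgi leak_check:app), and watch whether worker RSS stays flat under sustained siege load:

# leak_check.py -- hypothetical minimal ASGI app allocating pydantic models per request
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    values: list[int]

async def app(scope, receive, send):
    if scope["type"] != "http":
        return
    # build a batch of models; if objects are released between requests,
    # worker RSS should stay roughly flat while siege runs
    items = [Item(name=f"item-{i}", values=list(range(50))) for i in range(1000)]
    body = f"built {len(items)} models".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})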

@Zerotask

Zerotask commented Apr 23, 2024

Interestingly, gunicorn also has a max-requests option: https://docs.gunicorn.org/en/stable/settings.html#max-requests

(...) This is a simple method to help limit the damage of memory leaks.

We also noticed memory leaks in our Django application (with both WSGI and ASGI), but I haven't had time to investigate them yet. Our Kubernetes pods just restart from time to time; restarting only granian instead would be faster and more efficient.

Related to #34
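For reference, a typical invocation of the gunicorn setting mentioned above looks like the following (the app module path is only an example); each worker is recycled after roughly that many requests, which caps the damage of a leak:

gunicorn myproject.wsgi:application --workers 4 --max-requests 1000 --max-requests-jitter 100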

@sciyoshi

Possibly related to #252 (comment).

@gi0baro
Member Author

gi0baro commented Apr 26, 2024

@cpoppema this commit might be worth a test round, at least to see if @sciyoshi is right about the root cause.
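One way to try that commit is an install straight from git; the SHA placeholder below stands for the commit linked above, and a local Rust toolchain is assumed since this builds the extension from source:

pip install "git+https://github.com/emmett-framework/granian.git@<commit-sha>"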

@cpoppema

cpoppema commented Apr 26, 2024

> @cpoppema this commit might be worth a test round, at least to see if @sciyoshi is right about the root cause.

That looks very promising indeed! I'm not sure about the implications of that change, but comparing observed memory usage, it seems to have fixed it 💪, see the graphs below 🎉🎉🎉 To generate the graphs I used mprof run --include-children --multiprocess <server> and mprof plot (pip install memory-profiler matplotlib), which shows memory over time during 5 minutes of sending requests with siege.
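Spelled out, that profiling sequence looks like this (the server command shown is the granian invocation used earlier in the thread; substitute your own):

pip install memory-profiler matplotlib
mprof run --include-children --multiprocess granian --interface asgi service.server:app --host=0.0.0.0 --loop uvloop
mprof plot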

baseline granian 1.2.3: [graph]

in contrast with uvicorn: [graph]

with granian master: [graph]

@gi0baro
Member Author

gi0baro commented Apr 27, 2024

Closing this as per the changes in 1.3.
Thanks to everybody who participated in the debugging!

@gi0baro gi0baro closed this as completed Apr 27, 2024
@Faolain

Faolain commented Apr 29, 2024

Apologies in advance for commenting on an already-closed issue, but I am wondering whether this should address the memory leak (Edit: updated to 1.3 and still see the memory-leak issue described above) that is being caused by memory not being released by Python itself, described here w.r.t. malloc:

Should I create a new issue for this in case there is something that can be done in granian?

@gi0baro
Member Author

gi0baro commented Apr 29, 2024

> Apologies in advance for commenting on an already-closed issue, but I am wondering whether this should address the memory leak that is being caused by memory not being released by Python itself, described here w.r.t. malloc:

Both of those seem unrelated to this issue (I also participated in the uvicorn discussion you mentioned). I also frankly doubt the uvicorn discussion has anything to do with the CPython one, given that no SSL/TLS is used. But I definitely have no time to investigate those.

> Edit: updated to 1.3 and still see the memory-leak issue described above

That seems a bit unrelated as well. Can you please open a new issue with numbers, tested alternatives, and an MRE?
