
help request: 3.2.1 memory leak #10618

Closed
wklken opened this issue Dec 8, 2023 · 24 comments


wklken commented Dec 8, 2023

Description

After being deployed online for about 2 weeks, we rescheduled the pods and got the chart below: memory grew from 3.7 GB to 6 GB.

We have no ext-plugins.

About 45,000 routes.

image


I suspect it is caused by the prometheus plugin, but once all routes have been hit, shouldn't the set of Prometheus keys be stable?

Is there any tool to analyze this? We don't have XRay.

Environment

  • APISIX version (run apisix version): 3.2.1
  • Operating system (run uname -a):
  • OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.21.4.1
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):

wklken commented Dec 8, 2023

The /metrics response data after being online for about two weeks:

  • size: 7.5M
  • lines: 49323
  • Data distribution
16098 api_request_duration_milliseconds_bucket
3674 api_request_duration_milliseconds_count
3674 api_request_duration_milliseconds_sum
4990 api_requests_total
7948 bapp_requests_total
12846 bandwidth
11 etcd_modify_indexes
1 etcd_reachable 1
1 http_requests_total 460517663
6 nginx_http_current_connections
1 nginx_metric_errors_total 0
1 node_info
24 shared_dict_capacity_bytes
24 shared_dict_free_space_bytes
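
For reference, a per-metric line count like the one above can be produced straight from the exporter output. A minimal sketch, assuming the default APISIX Prometheus export address (127.0.0.1:9091, uri /apisix/prometheus/metrics); adjust the URL to your deployment:

# Count exported lines per metric name (HELP/TYPE comment lines are skipped).
curl -s http://127.0.0.1:9091/apisix/prometheus/metrics \
  | grep -v '^#' \
  | awk -F'{' '{print $1}' \
  | sort | uniq -c | sort -rn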


boekkooi-lengoo commented Dec 8, 2023

Hey @wklken

I have noticed a similar issue and was able to eliminate it by forcing the consumer label to always be empty.

I use the following in my Dockerfile to patch the issue.

# Patch https://github.com/apache/apisix/blob/3.7.0/apisix/plugins/prometheus/exporter.lua#L228 to avoid metrics per consumer.
RUN sed -i \
    -e 's/ctx.consumer_name or ""/""/g' \
    /usr/local/apisix/apisix/plugins/prometheus/exporter.lua

Hope this helps.
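
A quick way to confirm the patch actually landed in the image (the path is the one used in the RUN command above; if the sed matched every occurrence, the grep finds nothing and "patched" is printed):

# Verify the substitution was applied.
grep -n 'ctx.consumer_name' /usr/local/apisix/apisix/plugins/prometheus/exporter.lua || echo "patched"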


wklken commented Dec 8, 2023

Thanks @boekkooi-lengoo

We have patched some settings to disable the official metrics, which caused 100% CPU when too many records were present.

https://github.com/TencentBlueKing/blueking-apigateway-apisix/blob/master/src/build/patches/001_change_prometheus_default_buckets.patch

Currently only the bandwidth metric is left.


I’m not certain whether the increasing memory usage is caused by the Prometheus plugin or not, nor do I understand why it is consuming so much memory.

@moonming moonming moved this to 🏗 In progress in Apache APISIX backlog Dec 10, 2023
@monkeyDluffy6017

@wklken have you solved your problem?


wklken commented Dec 28, 2023

image

Not yet; we are waiting for the line (memory usage) to stabilize (about 4 weeks). If it does not show an increase, then perhaps the Prometheus plugin is the cause. Otherwise, we will need to investigate other plugins.

Any advice or tools for measuring the memory usage of each part of APISIX?

@monkeyDluffy6017

@monkeyDluffy6017

Please check whether the memory leak happens in Lua or in C. You can create a diagnostic route like this:

curl http://127.0.0.1:9180/apisix/admin/routes/test \
  -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/lua_memory_stat",
    "plugins": {
        "serverless-pre-function": {
            "phase": "rewrite",
            "functions" : ["return function() local mem = collectgarbage(\"count\") ngx.say(\"the memory allocated by lua is \", mem, \" kb\"); end"]
        }
    }
}'
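
Once the route is created it can be queried through the data plane. A minimal usage sketch, assuming the default proxy port 9080; note that collectgarbage("count") only reports the Lua VM of the single worker that serves the request, so it is worth calling it a few times:

# Repeat a few times: each request may land on a different worker process.
curl http://127.0.0.1:9080/lua_memory_stat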


wklken commented Dec 28, 2023

@monkeyDluffy6017

  • the memory usage of the pod is 4.12 GB
  • the /lua_memory_stat result from the APISIX in that pod: the memory allocated by lua is 291922.04101562 kb (about 285 MB)
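
One thing to keep in mind when comparing these two numbers: collectgarbage("count") only covers the LuaJIT VM of the one worker that served the request, while the pod RSS also includes the other workers, the nginx shared memory zones (lua_shared_dict), and allocations made outside the Lua VM. A companion route for the shared dicts can be created the same way as /lua_memory_stat above; a minimal sketch, where the route id and the /shm_stat path are illustrative assumptions:

curl http://127.0.0.1:9180/apisix/admin/routes/shm_stat \
  -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/shm_stat",
    "plugins": {
        "serverless-pre-function": {
            "phase": "rewrite",
            "functions" : ["return function() for name, dict in pairs(ngx.shared) do ngx.say(name, \": capacity \", dict:capacity(), \" bytes, free \", dict:free_space(), \" bytes\") end end"]
        }
    }
}'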

@monkeyDluffy6017 monkeyDluffy6017 self-assigned this Dec 29, 2023
@Vacant2333

You can assign the issue to me; I will follow it.


Vacant2333 commented Jan 2, 2024

Do you think it's possible that this is the issue? Would you try this method to resolve it? @wklken


theweakgod commented Jan 15, 2024

@wklken I think this will help you: #9545 (nginx-lua-prometheus memory leak fix). You can solve this problem by upgrading the version.
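
If you go the upgrade route, a rough sketch for checking and bumping the bundled library (the target version 0.20230607 is an assumption: it is the nginx-lua-prometheus release that includes the lookup-table limit; confirm the exact version pinned by #9545 before applying):

# Show the nginx-lua-prometheus rock currently installed alongside APISIX.
luarocks list nginx-lua-prometheus
# Install the release that contains the "limit lookup table size" fix
# (version is an assumption; verify against the PR / upstream changelog).
luarocks install nginx-lua-prometheus 0.20230607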


wklken commented Jan 15, 2024

Do you think it's possible that this is the issue? Would you try this method to resolve it? @wklken

@Vacant2333 we don't use service discovery in production.


wklken commented Jan 15, 2024

@wklken I think this will help you: https://github.com/apache/apisix/pull/9545 (nginx-lua-prometheus memory leak fix: https://github.com/knyar/nginx-lua-prometheus/pull/151). You can solve this problem by upgrading the version.

Thanks @theweakgod, I will check that. (APISIX 3.2.1 uses nginx-lua-prometheus = 0.20220527.)

@Vacant2333

It does seem to be one of the reasons. Is it possible to test this possibility (upgrade nginx-lua-prometheus) and see if memory continues to grow? How long will this take?

@theweakgod


Has the problem been solved?


wenj91 commented Jan 22, 2024

@wklken Is there any progress on this issue?


wklken commented Jan 22, 2024

@wklken Is there any progress on this issue?

We cannot verify this in production, and for now we don't have an equivalent environment to verify it in. We need to find a way to replay production-like traffic for a while (I may not have time to work on this in the near future; I will update this issue once it is verified).
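
For anyone who does get a staging environment, a very rough reproduction sketch: spread traffic across many distinct routes (and consumers, if auth plugins are enabled) so that the Prometheus label-value combinations keep growing, then watch the size of the exported metrics and the worker RSS over time. The route paths, ports and loop bound below are purely illustrative assumptions:

# Illustrative only: hit many different routes to grow label cardinality.
for i in $(seq 1 45000); do
  curl -s -o /dev/null "http://127.0.0.1:9080/route-$i"
done
# Then check how many metric lines the exporter is now holding.
curl -s http://127.0.0.1:9091/apisix/prometheus/metrics | grep -vc '^#'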

@theweakgod

@wklken Has the problem been solved?


wenj91 commented Feb 23, 2024


Here's a clue: this phenomenon is especially likely to occur on image-upload and file-upload endpoints.


wklken commented Feb 23, 2024

image

We rolled out another release (rolling update), and memory did not increase for about 1 week afterwards.


@theweakgod I still can't reproduce the memory increase on my own cluster yet; I will try again later.

@theweakgod

@theweakgod I still can't reproduce the memory increase on my own cluster yet; I will try again later.

👌


theweakgod commented Feb 23, 2024

@theweakgod I still can't reproduce the memory increase.

You need a huge number of metrics to reproduce it.


wklken commented Feb 23, 2024


From the provided chart:

  • During the first 7 days, there is an increase in memory usage, with all Prometheus label value combinations present.
  • Over the following 20 days, there are no new label value combinations, and the memory usage remains stable.

If the pull request "bugfix: limit lookup table size" is effective, the memory usage should not exceed 5.59 GB and should only show an increase during roughly the first 7 days.
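
To make the suspected mechanism concrete, here is a purely illustrative Lua sketch (not the library's actual code) of why unbounded label cardinality keeps growing per-worker memory before the lookup-table limit: each new label-value combination adds an entry to a cache table that is never evicted, so memory only stabilizes once no new combinations appear.

-- Illustrative sketch only: a per-worker cache keyed by label values.
local lookup = {}

local function full_metric_name(metric, route, consumer, status)
  local key = table.concat({ metric, route, consumer, status }, "\0")
  local name = lookup[key]
  if not name then
    name = string.format('%s{route="%s",consumer="%s",status="%s"}',
                         metric, route, consumer, status)
    -- Before the fix this table grows without bound; the fix caps its size
    -- so it cannot accumulate forever.
    lookup[key] = name
  end
  return name
end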



github-actions bot commented Feb 7, 2025

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 7, 2025

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 21, 2025
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Apache APISIX backlog Feb 21, 2025