bug: APISIX memory management - expired monitoring keys are not released #9627
Comments
The metrics are (eventually) flushed to shared memory, and that shared memory is a fixed-size LRU cache (i.e. entries are evicted when the cache is full). It is not counted as part of the nginx worker memory, so there is no need to worry about OOM. Which version of APISIX do you use?
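For background, nginx-lua-prometheus keeps metric samples in an nginx lua_shared_dict rather than in per-worker Lua memory. Below is a minimal sketch of how such an exporter is wired up; the dict name, metric, and labels are illustrative, not APISIX's actual plugin code:

```lua
-- nginx.conf must declare the shared dict backing the metrics, e.g.:
--   lua_shared_dict prometheus_metrics 800m;
local prometheus = require("prometheus").init("prometheus_metrics")

-- Every distinct label combination becomes its own key in the shared dict,
-- so high-cardinality labels (such as upstream node IPs) grow the key set.
local http_requests = prometheus:counter(
    "http_requests_total", "Number of HTTP requests", {"route", "node"})

-- Called per request (e.g. in the log phase); a new node IP creates a new key.
local function record(route_id, node_ip)
    http_requests:inc(1, {route_id, node_ip})
end
```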
version: 3.3.0
apisix: # universal configurations
  node_listen:
    - port: 9080                  # APISIX listening port
      enable_http2: false
    - port: 9081
      enable_http2: true
  enable_heartbeat: true
  enable_admin: true
  enable_admin_cors: true
  enable_debug: false
  enable_dev_mode: false          # Sets nginx worker_processes to 1 if set to true
  enable_reuseport: true          # Enable nginx SO_REUSEPORT switch if set to true.
  enable_ipv6: true               # Enable nginx IPv6 resolver
  enable_server_tokens: false     # Whether the APISIX version number should be shown in the Server header
  # proxy_protocol:               # Proxy Protocol configuration
  #   listen_http_port: 9181      # The port with proxy protocol for http; it differs from node_listen and admin_listen.
  #                               # This port can only receive http requests with proxy protocol, while node_listen & admin_listen
  #                               # can only receive plain http requests. If you enable proxy protocol, you must use this port to
  #                               # receive http requests with proxy protocol.
  #   listen_https_port: 9182     # The port with proxy protocol for https
  #   enable_tcp_pp: true         # Enable the proxy protocol for tcp proxy; it works for the stream_proxy.tcp option
  #   enable_tcp_pp_to_upstream: true  # Enables the proxy protocol to the upstream server
  proxy_cache:                    # Proxy Caching configuration
    cache_ttl: 10s                # The default caching time if the upstream does not specify the cache time
    zones:                        # The parameters of a cache
      - name: disk_cache_one      # The name of the cache; the administrator can specify
                                  # which cache to use by name in the Admin API
        memory_size: 50m          # The size of shared memory, used to store the cache index
        disk_size: 1G             # The size of disk, used to store the cache data
        disk_path: "/tmp/disk_cache_one"  # The path to store the cache data
        cache_levels: "1:2"       # The hierarchy levels of a cache
      # - name: disk_cache_two
      #   memory_size: 50m
      #   disk_size: 1G
      #   disk_path: "/tmp/disk_cache_two"
      #   cache_levels: "1:2"
  router:
    http: radixtree_uri           # radixtree_uri: match route by uri (based on radixtree)
                                  # radixtree_host_uri: match route by host + uri (based on radixtree)
                                  # radixtree_uri_with_parameter: match route by uri with parameters
    ssl: 'radixtree_sni'          # radixtree_sni: match route by SNI (based on radixtree)
  stream_proxy:                   # TCP/UDP proxy
    only: false
    tcp:                          # TCP proxy port list
      - 8001
  # dns_resolver:
  #   - 127.0.0.1
  #   - 172.20.0.10
  #   - 114.114.114.114
  #   - 223.5.5.5
  #   - 1.1.1.1
  #   - 8.8.8.8
  dns_resolver_valid: 30
  resolver_timeout: 5
  ssl:
    enable: true
    listen:
      - port: 9443
        enable_http2: true
    ssl_protocols: "TLSv1.2 TLSv1.3"
    ssl_ciphers: "xxxxx"
    ssl_trusted_certificate: "/etcd-ssl/ca.pem"

nginx_config:                     # config to render the template and generate nginx.conf
  http_server_configuration_snippet: |
    proxy_ignore_client_abort on;
  error_log: "/dev/stderr"
  error_log_level: "error"        # warn, error
  worker_processes: "8"
  enable_cpu_affinity: true
  worker_rlimit_nofile: 102400    # the number of files a worker process can open; should be larger than worker_connections
  event:
    worker_connections: 65535
  http:
    enable_access_log: true
    access_log: "/dev/stdout"
    access_log_format: '{\"timestamp\":\"$time_iso8601\",\"server_addr\":\"$server_addr\",\"remote_addr\":\"$remote_addr\",\"remote_port\":\"$realip_remote_port\",\"all_cookie\":\"$http_cookie\",\"http_host\":\"$http_host\",\"query_string\":\"$query_string\",\"request_method\":\"$request_method\",\"uri\":\"$uri\",\"service\":\"apisix_backend\",\"request_uri\":\"$request_uri\",\"status\":\"$status\",\"body_bytes_sent\":\"$body_bytes_sent\",\"request_time\":\"$request_time\",\"upstream_response_time\":\"$upstream_response_time\",\"upstream_addr\":\"$upstream_addr\",\"upstream_status\":\"$upstream_status\",\"http_referer\":\"$http_referer\",\"http_user_agent\":\"$http_user_agent\",\"http_x_forwarded_for\":\"$http_x_forwarded_for\",\"spanId\":\"$http_X_B3_SpanId\",\"http_token\":\"$http_token\",\"http_authorizationv2\":\"$http_authorizationv2\",\"content-type\":\"$content_type\",\"content-length\":\"$content_length\",\"traceId\":\"$http_X_B3_TraceId\"}'
    access_log_format_escape: json
    lua_shared_dict:
      prometheus-metrics: 800m
      discovery: 300m
      kubernetes: 200m
    keepalive_timeout: 60s        # timeout during which a keep-alive client connection will stay open on the server side
    client_header_timeout: 60s    # timeout for reading the client request header, after which a 408 (Request Time-out) error is returned to the client
    client_body_timeout: 60s      # timeout for reading the client request body, after which a 408 (Request Time-out) error is returned to the client
    send_timeout: 10s             # timeout for transmitting a response to the client, after which the connection is closed
    underscores_in_headers: "on"  # enables the use of underscores in client request header fields by default
    real_ip_header: "X-Forwarded-For"  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_recursive: on         # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
    # real_ip_from:               # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
    #   - 127.0.0.1
    #   - 'unix:'
    real_ip_from:
      - 127.0.0.1/24
      - 'unix:'
      - 10.28.0.0/14
      - 10.32.0.0/17

discovery:
  kubernetes:
    client:
      token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    service:
      host: ${KUBERNETES_SERVICE_HOST}
      port: ${KUBERNETES_SERVICE_PORT}
      schema: https

plugins:                          # plugin list
  - api-breaker
  - authz-keycloak
  - basic-auth
  - batch-requests
  - consumer-restriction
  - cors
  - client-control
  - echo
  - fault-injection
  - file-logger
  - grpc-transcode
  - grpc-web
  - hmac-auth
  - http-logger
  - ip-restriction
  - ua-restriction
  - jwt-auth
  - kafka-logger
  - key-auth
  - limit-conn
  - limit-count
  - limit-req
  - node-status
  - openid-connect
  - authz-casbin
  - prometheus
  - proxy-cache
  - proxy-mirror
  - proxy-rewrite
  - redirect
  - referer-restriction
  - request-id
  - request-validation
  - response-rewrite
  - serverless-post-function
  - serverless-pre-function
  - sls-logger
  - syslog
  - tcp-logger
  - udp-logger
  - uri-blocker
  - wolf-rbac
  - zipkin
  - traffic-split
  - gzip
  - real-ip
  - ext-plugin-pre-req
  - ext-plugin-post-req

stream_plugins:
  - mqtt-proxy
  - ip-restriction
  - limit-conn

plugin_attr:
  prometheus:
    enable_export_server: true
    export_addr:
      ip: 0.0.0.0
      port: 9091
    export_uri: /apisix/prometheus/metrics
    metric_prefix: apisix_

deployment:
  role: traditional
  role_traditional:
    config_provider: etcd
  admin:
    allow_admin:                  # http://nginx.org/en/docs/http/ngx_http_access_module.html#allow
      - 127.0.0.1/24
      - 172.16.174.0/24
      # - "::/64"
    admin_listen:
      ip: 0.0.0.0
      port: 9180
    # Default token used when calling the Admin API.
    # *NOTE*: It is highly recommended to modify this value to protect APISIX's Admin API.
    # Disabling this configuration item means that the Admin API does not
    # require any authentication.
    admin_key:
      # admin: can do everything with configuration data
      - name: "admin"
        key: xxxxx
        role: admin
      # viewer: can only view configuration data
      - name: "viewer"
        key: xxxxx
        role: viewer
    https_admin: false
    admin_api_mtls:
      admin_ssl_ca_cert: "/etcd-ssl/ca.pem"
      admin_ssl_cert: "/etcd-ssl/etcd.pem"
      admin_ssl_cert_key: "/etcd-ssl/etcd-key.pem"
  etcd:
    host:                         # it's possible to define multiple etcd host addresses of the same etcd cluster
      - "https://xx.xx:2379"      # multiple etcd addresses
    prefix: "/apisix"             # configuration prefix in etcd
    timeout: 30                   # 30 seconds
    tls:
      ssl_trusted_certificate: "/etcd-ssl/ca.pem"
      cert: "/etcd-ssl/etcd.pem"
      key: "/etcd-ssl/etcd-key.pem"
      verify: true
      sni: "xxx.com"
It is obvious that there are problems with the mechanism of this Prometheus exporter, which can be seen from four aspects:
In fact, nginx-lua-prometheus provides counter:del() and gauge:del() methods to delete labels. The APISIX Prometheus plugin may need to delete Prometheus metric data at certain times. Our current approach is similar, but more aggressive: we retain only the type-level and route-level data and remove everything else.

before:

    metrics.latency = prometheus:histogram("http_latency",
        "HTTP request latency in milliseconds per service in APISIX",
        {"type", "route", "service", "consumer", "node", unpack(extra_labels("http_latency"))},
        buckets)

after:

    metrics.latency = prometheus:histogram("http_latency",
        "HTTP request latency in milliseconds per service in APISIX",
        {"type", "route", unpack(extra_labels("http_latency"))},
        buckets)
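Since the comment above leans on counter:del() and gauge:del(), here is a minimal sketch of deleting one label combination with nginx-lua-prometheus; the metric name and label values are made up for illustration:

```lua
local prometheus = require("prometheus").init("prometheus_metrics")

local http_status = prometheus:counter(
    "http_status", "HTTP status codes per route", {"code", "route", "node"})

-- record a sample for a node that later disappears from the upstream
http_status:inc(1, {"200", "route-1", "10.0.0.15"})

-- remove exactly that label combination from the shared dict
http_status:del({"200", "route-1", "10.0.0.15"})

-- reset() is the heavier alternative: it drops all label combinations at once
-- http_status:reset()
```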
@hansedong Well said 👍 We are trying to find a general proposal, for example: set the TTL of these Prometheus metrics in the LRU to 10 minutes (it can of course be adjusted; this is just an example), so that this memory issue can be solved. What do you think?
This is a good idea. As I understand it, the TTL mechanism can preserve data for specific metrics (which are updated regularly) while also allowing expired metrics to be deleted.
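A rough sketch of the TTL idea under discussion, assuming the plugin tracks a last-update timestamp per label combination and sweeps expired ones with del(); this is only an illustration, not the implementation that was eventually merged:

```lua
local ngx_now = ngx.now

local TTL = 600  -- 10 minutes, as suggested above; would need to be configurable

-- last-update timestamp per label combination, keyed by a joined label string
local last_seen = {}

-- call this whenever a metric is updated for a given label combination
local function touch(key)
    last_seen[key] = ngx_now()
end

-- run periodically, e.g. via ngx.timer.every(60, ...), to drop expired keys;
-- `tracked` maps each key to the label-value table originally passed to inc()
local function sweep(metric, tracked)
    local now = ngx_now()
    for key, labels in pairs(tracked) do
        if now - (last_seen[key] or 0) > TTL then
            metric:del(labels)          -- release the expired shared-dict entry
            last_seen[key] = nil
            tracked[key] = nil
        end
    end
end
```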
Hi, this case has been reproduced by a test case. Please take a look.
I think this TTL solution is a bit troublesome, because the upstream Prometheus library does not provide a place to set a TTL parameter when setting the value. If you want the TTL solution, you need to change the upstream library. The more important point is that latency is a histogram data type, and a TTL cannot be used to automatically reclaim its resources.
I have inspected Kong, and it looks like it has the same problem.
Thank you for your continued attention. After discussion with @membphis, I found that my previous understanding of the metrics was wrong. We use the TTL scheme to recycle metrics that have not been updated for a long time, which has no impact on Grafana's display.
Do we have a plan for the TTL?
APISIX uses the knyar/nginx-lua-prometheus library to record the metrics. The TTL solution would be better if it were supported by the underlying library. This is currently being discussed with the maintainer of knyar/nginx-lua-prometheus in knyar/nginx-lua-prometheus#164; in any case, this issue is already being advanced.
@hansedong The TTL feature is merged; would you like to do some testing?
Yes, I'd love to. I plan to upgrade one APISIX gateway in a microservice scenario to test the effect of the new feature.
@moonming Hello, this problem has been fixed in version 3.9.0. When will the fix be available in version 3.2.2?
No, we will keep new features and bug fixes in the master branch.
If it is only fixed in the new version, how should we, who run a long-term support version, deal with this kind of problem that affects production stability? Should we consider upgrading?
APISIX version 3.9 still has the issue of the http_status / upstream_status keys not being updated. When can this be resolved?
Current Behavior
If the prometheus plugin is enabled, and the upstream uses Kubernetes service discovery or the upstream IPs change with each release, APISIX accumulates too many monitoring keys. Memory keeps growing, and if APISIX is not restarted it will eventually OOM.
Expected Behavior
I expect an automatic detection mechanism that, when the upstream IPs change, releases the keys of node entries in the in-memory metrics that no longer exist.
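One way the expected behaviour could be sketched, assuming the plugin knows both the label sets it has recorded and the current upstream node set (the function and table names here are hypothetical):

```lua
-- Release node-level metric keys whose node no longer exists in the upstream.
-- `recorded` maps a key to its label-value table, e.g. {"200", "route-1", "10.0.0.15"};
-- `alive_nodes` is the current set of upstream IPs, e.g. { ["10.0.0.15"] = true }.
local function release_stale_node_keys(metric, recorded, alive_nodes)
    for key, labels in pairs(recorded) do
        local node_ip = labels[3]       -- the "node" label in this example layout
        if not alive_nodes[node_ip] then
            metric:del(labels)          -- free the shared-dict entry for the removed node
            recorded[key] = nil
        end
    end
end
```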
Error Logs
No response
Steps to Reproduce
I worked around the problem of too many keys being generated by upstream IP changes during releases by disabling the node dimension in the metrics.

Environment
- apisix version:
- uname -a:
- openresty -V or nginx -V:
- curl http://127.0.0.1:9090/v1/server_info:
- luarocks --version: