Release v3.6.2/v3.6.3 runs into timeouts for some panels #389
Comments
I know such a comment usually does not help, but just to say you're not alone.
Hi, thanks for reporting it!
The only change introduced from
Any thoughts @spinillos? 🤔 Maybe it's worth trying to reproduce it, see what happens when de/applying these changes, and see if we can at least identify the set of changes that caused it.
@feuerrot @jgournet Thanks!
We are also having a similar issue. Every render of every panel on the specific dashboard succeeds except one: the only state-timeline panel in the dashboard. When we reverted back to v3.6.1, the panel rendered perfectly.
Another thing that happened was a MAJOR speed-up: on v3.6.2 it easily took 3-7 m for the entire dashboard to render (and it failed without completing, so it might even take longer); on v3.6.1 it only takes 1-2 m and succeeds flawlessly!
[Resource graphs: v3.6.2 CPU and memory at a 1 h scale, plus memory over the full ~3 h lifetime, vs. v3.6.1 CPU and memory at a 5 m scale (!!!)]
Debug: when the panel failed, we observed this error log in the image renderer (v3.6.2):
{"level":"error","message":"Request failed","stack":"TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r1036745 is guaranteed to work.\n at Timeout.onTimeout (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:301:20)\n at listOnTimeout (node:internal/timers:559:17)\n at processTimers (node:internal/timers:502:7)","url":"/render?deviceScaleFactor=1.000000&domain=grafana.monitorns&encoding=&height=440&renderKey=xxxxxxxxxxxxxxxxxxxxxxx&timeout=60&timezone=&url=http%3A%2F%2Fgrafana.monitorns%3A80%2Fd-solo%2Fdashuid%2F_%3Ffrom%3Dnow-24h%26height%3D440%26panelId%3D111%26theme%3Dlight%26to%3Dnow%26width%3D560%26render%3D1&width=560"}
{"level":"error","message":"x.x.x.x - - [Z] \"GET /render?deviceScaleFactor=1.000000&domain=grafana.monitorns&encoding=&height=440&renderKey=xxxxxxxxxxxxxxxxxxxxxxx&timeout=60&timezone=&url=http%3A%2F%2Fgrafana.monitorns%3A80%2Fd-solo%2Fdashuid%2F_%3Ffrom%3Dnow-24h%26height%3D440%26panelId%3D111%26theme%3Dlight%26to%3Dnow%26width%3D560%26render%3D1&width=560 HTTP/1.1\" 500 96 \"-\" \"Grafana/9.3.1\"\n"} On grafana we observed these logs
Seeing the timeout, I tried to run the request with
For now, using v3.6.1 is working for us.
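In case it helps anyone narrow this down, below is a minimal sketch (not taken from this report) of replaying a single-panel render through Grafana's /render endpoint with a longer timeout, to see whether the panel ever finishes or always dies at the ~30 s browser-connect error. The Grafana URL, API token, dashboard UID and panel ID are placeholders loosely based on the logged URL above, and the timeout query parameter is assumed to be forwarded to the image renderer.

```typescript
// replay-render.ts — sketch only, assuming Node 18+ (built-in fetch) and a
// Grafana service-account token that can view the dashboard.
// GRAFANA_URL, GRAFANA_TOKEN, "dashuid" and panelId=111 are placeholders.
async function main() {
  const grafanaUrl = process.env.GRAFANA_URL ?? "http://grafana.monitorns:80";
  const token = process.env.GRAFANA_TOKEN ?? "";

  const params = new URLSearchParams({
    panelId: "111",
    from: "now-24h",
    to: "now",
    width: "560",
    height: "440",
    theme: "light",
    timeout: "120", // seconds; assumed to be passed through to the renderer
  });

  const res = await fetch(`${grafanaUrl}/render/d-solo/dashuid/_?${params}`, {
    headers: { Authorization: `Bearer ${token}` },
  });

  // 200 + image/png means the renderer finished; a 500 after ~30 s matches the
  // "Timed out ... while trying to connect to the browser" error above.
  console.log(res.status, res.headers.get("content-type"));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```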
@feuerrot Are you using the grafana helm chart?
Not OP, but: we use the prometheus-kube-prometheus-stack helm chart. We don't have restart issues (but we do have the issues from this ticket).
We actually use the exact same helm chart! Sadly, the chart doesn't allow us to change the liveness probe timeout directly. A timeout value of 10 s seemed to guarantee that the containers stay up. As an extra, we also saw that increasing the replicas seemed to improve stability without increasing the resource requirements, with a small (~20%) speed-up. Will test whether increasing the replicas with the default liveness probe works.
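For anyone hitting the same limitation, a minimal sketch of a strategic-merge patch (applied e.g. via kustomize on top of the rendered chart) is below. The Deployment/container name and namespace are assumptions, not taken from the chart; adjust them to whatever your release actually creates.

```yaml
# liveness-timeout-patch.yaml — sketch only; names and namespace are guesses.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-image-renderer
  namespace: monitoring
spec:
  replicas: 2                  # extra replicas reportedly improved stability
  template:
    spec:
      containers:
        - name: grafana-image-renderer
          livenessProbe:
            timeoutSeconds: 10   # Kubernetes default is 1 s; 10 s kept the pods alive here
```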
@jgournet
Well, now that I am checking ... turns out that I forgot:
so yeah ... ignore that part for me :)
@WesselAtWork no, we just use puppet to deploy the image renderer container on the same host as grafana:
I also tried the current version 3.6.3, which fails too:
At least for our test cases, the problem seems to be fixed in the latest version (v3.6.4).
It happens again with 3.7.1, @feuerrot.
For reference, r1036745 (the revision mentioned in the timeout error) can be looked up at https://storage.googleapis.com/chromium-find-releases-static/index.html#r1036745, and the version shipped in the 3.7.1 image is Chromium 112.0.5615.165.
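If it helps with debugging, here is a small sketch for checking, from inside the renderer container, which browser puppeteer actually launches, so it can be compared against the revision in the timeout message. It assumes puppeteer is resolvable from the container's app directory and that CHROME_BIN, if set in the image, points at the bundled Chromium; both are assumptions about the image, not confirmed here.

```typescript
// check-browser.ts — sketch only; run from the renderer's app directory
// (e.g. /usr/src/app), where puppeteer is installed.
import puppeteer from "puppeteer";

async function main() {
  const browser = await puppeteer.launch({
    // Fall back to puppeteer's bundled browser if CHROME_BIN isn't set.
    executablePath: process.env.CHROME_BIN || undefined,
    args: ["--no-sandbox"], // often required when running inside a container
  });
  // Prints something like "HeadlessChrome/112.0.5615.165"; compare this with
  // the r1036745 revision the bundled puppeteer says is "guaranteed to work".
  console.log(await browser.version());
  await browser.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```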
@astrolemonade I haven't been able to test v3.7.1 yet.
What happened:
Renovate upgraded our grafana-image-renderer docker container from version
3.6.1@sha256:b78baa730828d2f64e2d6a41f0314124147c3405c3890a444529bab6b1cebb6c
to version
3.6.2@sha256:cbb508cee8a55c9f2f850e33bfcf7e487757e309a5a417df82d2d590b1333698.
Afterwards, some of our automated render requests for reports started to fail: the chrome process inside the container runs at 100% CPU until the configured timeout. I can reproduce it by switching between the two container versions.
What you expected to happen:
Dashboards render without any problems whether we're using the v3.6.1 or v3.6.2 docker container.
How to reproduce it (as minimally and precisely as possible):
Docker Container env:
Grafana configuration:
Panel JSON:
Container Logs for both versions:
Environment: