Release v3.6.2/v3.6.3 runs into timeouts for some panels #389
Comments
I know such a comment usually does not help, but just to say you're not alone.
Hi, thanks for reporting it!
The only change introduced from
Any thoughts @spinillos? 🤔 Maybe it's worth trying to reproduce it, see what happens when de/applying these changes, and see if we can at least identify the set of changes that caused it.
@feuerrot @jgournet Thanks!
We are also having a similar issue. Every render of every panel on the specific dashboard succeeds except one: the only state-timeline panel in the dashboard. When we reverted back to v3.6.1, the panel rendered perfectly.
Another thing that happened was a MAJOR speed-up: on v3.6.2 it easily took 3-7 m for the entire dashboard to render (and it failed without completing, so it might even take longer); on v3.6.1 it only takes 1-2 m and succeeds flawlessly!
[Resource graphs: v3.6.2 CPU and memory at a 1 h scale, plus memory over the full ~3 h lifetime, vs. v3.6.1 CPU and memory at a 5 m scale (!!!)]
Debug: when the panel failed, we observed this error log in the image renderer (v3.6.2):
{"level":"error","message":"Request failed","stack":"TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r1036745 is guaranteed to work.\n at Timeout.onTimeout (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:301:20)\n at listOnTimeout (node:internal/timers:559:17)\n at processTimers (node:internal/timers:502:7)","url":"/render?deviceScaleFactor=1.000000&domain=grafana.monitorns&encoding=&height=440&renderKey=xxxxxxxxxxxxxxxxxxxxxxx&timeout=60&timezone=&url=http%3A%2F%2Fgrafana.monitorns%3A80%2Fd-solo%2Fdashuid%2F_%3Ffrom%3Dnow-24h%26height%3D440%26panelId%3D111%26theme%3Dlight%26to%3Dnow%26width%3D560%26render%3D1&width=560"}
{"level":"error","message":"x.x.x.x - - [Z] \"GET /render?deviceScaleFactor=1.000000&domain=grafana.monitorns&encoding=&height=440&renderKey=xxxxxxxxxxxxxxxxxxxxxxx&timeout=60&timezone=&url=http%3A%2F%2Fgrafana.monitorns%3A80%2Fd-solo%2Fdashuid%2F_%3Ffrom%3Dnow-24h%26height%3D440%26panelId%3D111%26theme%3Dlight%26to%3Dnow%26width%3D560%26render%3D1&width=560 HTTP/1.1\" 500 96 \"-\" \"Grafana/9.3.1\"\n"} On grafana we observed these logs
Seeing the timeout, I tried to run the request with
For now, using v3.6.1 is working for us.
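In case it helps anyone narrow this down, below is a minimal sketch (not taken from this report) of replaying a single-panel render through Grafana's /render endpoint with a longer timeout, to see whether the panel ever finishes or always dies at the ~30 s browser-connect error. The Grafana URL, API token, dashboard UID and panel ID are placeholders loosely based on the logged URL above, and the timeout query parameter is assumed to be forwarded to the image renderer.

```typescript
// replay-render.ts — sketch only, assuming Node 18+ (built-in fetch) and a
// Grafana service-account token that can view the dashboard.
// GRAFANA_URL, GRAFANA_TOKEN, "dashuid" and panelId=111 are placeholders.
async function main() {
  const grafanaUrl = process.env.GRAFANA_URL ?? "http://grafana.monitorns:80";
  const token = process.env.GRAFANA_TOKEN ?? "";

  const params = new URLSearchParams({
    panelId: "111",
    from: "now-24h",
    to: "now",
    width: "560",
    height: "440",
    theme: "light",
    timeout: "120", // seconds; assumed to be passed through to the renderer
  });

  const res = await fetch(`${grafanaUrl}/render/d-solo/dashuid/_?${params}`, {
    headers: { Authorization: `Bearer ${token}` },
  });

  // 200 + image/png means the renderer finished; a 500 after ~30 s matches the
  // "Timed out ... while trying to connect to the browser" error above.
  console.log(res.status, res.headers.get("content-type"));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```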
@feuerrot Are you using the grafana helm chart?
Not OP, but: we use the prometheus-kube-prometheus-stack helm chart. We don't have restart issues (but we do have the issues from this ticket).
We actually use the exact same helm chart! Sadly, the chart doesn't allow us to change the liveness probe timeout directly. A timeout value of 10 s seemed to guarantee that the containers stay up. As an extra, we also saw that increasing the replicas seemed to improve stability without increasing the resource requirements, with a small (~20%) speed-up. Will test whether increasing the replicas with the default liveness probe works.
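For anyone hitting the same limitation, a minimal sketch of a strategic-merge patch (applied e.g. via kustomize on top of the rendered chart) is below. The Deployment/container name and namespace are assumptions, not taken from the chart; adjust them to whatever your release actually creates.

```yaml
# liveness-timeout-patch.yaml — sketch only; names and namespace are guesses.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-image-renderer
  namespace: monitoring
spec:
  replicas: 2                  # extra replicas reportedly improved stability
  template:
    spec:
      containers:
        - name: grafana-image-renderer
          livenessProbe:
            timeoutSeconds: 10   # Kubernetes default is 1 s; 10 s kept the pods alive here
```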
@jgournet
Well, now that I am checking ... turns out that I forgot:
so yeah ... ignore that part for me :)
@WesselAtWork no, we just use puppet to deploy the image renderer container on the same host as grafana:
I also tried the current version 3.6.3, which fails too:
At least for our test cases, the problem seems to be fixed in the latest version (v3.6.4).
It happens again with 3.7.1, @feuerrot.
For reference, r1036745 (the revision mentioned in the timeout error) can be looked up at https://storage.googleapis.com/chromium-find-releases-static/index.html#r1036745, and the version shipped in the 3.7.1 image is Chromium 112.0.5615.165.
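If it helps with debugging, here is a small sketch for checking, from inside the renderer container, which browser puppeteer actually launches, so it can be compared against the revision in the timeout message. It assumes puppeteer is resolvable from the container's app directory and that CHROME_BIN, if set in the image, points at the bundled Chromium; both are assumptions about the image, not confirmed here.

```typescript
// check-browser.ts — sketch only; run from the renderer's app directory
// (e.g. /usr/src/app), where puppeteer is installed.
import puppeteer from "puppeteer";

async function main() {
  const browser = await puppeteer.launch({
    // Fall back to puppeteer's bundled browser if CHROME_BIN isn't set.
    executablePath: process.env.CHROME_BIN || undefined,
    args: ["--no-sandbox"], // often required when running inside a container
  });
  // Prints something like "HeadlessChrome/112.0.5615.165"; compare this with
  // the r1036745 revision the bundled puppeteer says is "guaranteed to work".
  console.log(await browser.version());
  await browser.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```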
@astrolemonade I haven't been able to test v3.7.1 yet.
What happened:
Renovate upgraded our grafana-image-renderer docker container from version
3.6.1@sha256:b78baa730828d2f64e2d6a41f0314124147c3405c3890a444529bab6b1cebb6c
to version
3.6.2@sha256:cbb508cee8a55c9f2f850e33bfcf7e487757e309a5a417df82d2d590b1333698.
Afterwards, some of our automated render requests for reports started to fail: the chrome process inside the container runs at 100% CPU until the configured timeout. I can reproduce it by switching between the two container versions.
What you expected to happen:
Dashboards render without any problems whether we're using the v3.6.1 or v3.6.2 docker container.
How to reproduce it (as minimally and precisely as possible):
Docker Container env:
Grafana configuration:
Panel JSON:
Container Logs for both versions:
Environment: