Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated the PostgreSQL Prometheus documentation and dashboard #581

Merged

Conversation

algchoo
Copy link
Contributor

@algchoo algchoo commented Jul 7, 2023

Changes

Details

As a result of downloading the json, editing that, and re-uploading, a few extra keys are now present, unsure if removing them is a good idea (ie. breakdowns, dimensions, measures/etc)

During testing, we learned that

  • the postgres dashboard json had errors, on top of the changes that were needed as a result of changes in the exporter. It was all just incorrect metrics in various charts, or duplicates, should be good to go now
  • Certain metrics coming from the exporter are now enabled/disabled with a flag and so the flag was added as an argument in the documentation.
  • Various metrics had a _total suffix added to them and updates were made to the dashboard json to account for this.

@github-actions github-actions bot requested a review from EvanSimpson July 7, 2023 18:43
Copy link
Collaborator

@yqlu yqlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EvanSimpson let's discuss this next week -- of the current wave of exporter doc updates this one involves changing the metric names outright, thereby breaking the dashboards.

It feels wrong to bump the db version, exporter version, update the dashboard to a new working state but everyone who set up the older version of the integration against postgres 14 now has a non-functional dashboard (especially since the integration detection using the characteristic metric pg_settings_max_connections will continue to work against both old and new setups).

@@ -7,7 +7,7 @@ platforms:
exporter_metadata:
name: GKE Prometheus Exporter
doc_url: https://github.com/prometheus-community/postgres_exporter
minimum_supported_version: v0.11.1
minimum_supported_version: v0.13.1
default_metrics:
- name: prometheus.googleapis.com/pg_stat_database_numbackends/gauge
prometheus_name: pg_stat_database_numbackends
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the list of metrics below need to be updated as well

@EvanSimpson
Copy link
Collaborator

@EvanSimpson let's discuss this next week -- of the current wave of exporter doc updates this one involves changing the metric names outright, thereby breaking the dashboards.

It feels wrong to bump the db version, exporter version, update the dashboard to a new working state but everyone who set up the older version of the integration against postgres 14 now has a non-functional dashboard (especially since the integration detection using the characteristic metric pg_settings_max_connections will continue to work against both old and new setups).

Yeah, this looks like a breaking change for existing users. I've updated the integration guide to include criteria for breaking changes and how to version dashboards and integrations to keep all users in a working state. It looks like this warrants a version split. https://docs.google.com/document/d/16luyU9r9YBPOxTL1F5T-n1m0RiCh22qf-oH_p8K_A34/edit

@algchoo
Copy link
Contributor Author

algchoo commented Jul 18, 2023

@yqlu @EvanSimpson

After looking into this further, I learned that I am mistaken about the changes to the metric names. PostgreSQL prometheus metrics were flowing into the same project from another system, which created confusion. After stopping that and looking at the release notes for the exporter, I found this in the 0.13.0 notes:

  • Convert pg_stat_database to new collector #685

Tested PostgreSQL 15.3 against the previous exporter versions (0.12 and 0.11) and the integration/dashboard is working as expected, but with version 0.13.x I don't get the pg_stat_database metrics I am expecting to see, even after including a few different flags (the database and stat_database collectors should be enabled by default, in theory, they shouldn't be required)

        image: quay.io/prometheuscommunity/postgres-exporter:v0.13.0
        args:
        - --collector.stat_statements
        - --collector.stat_database
        - --collector.database

I'm thinking either there's an issue with how I'm configuring the collector to run, or perhaps there is something I missed with the new version of PostgreSQL. Based on the change in the release notes, I don't think the names for the metrics did change, so we might not have to split versions.

@yqlu
Copy link
Collaborator

yqlu commented Jul 19, 2023

Thanks for investigating! Could this be related to prometheus-community/postgres_exporter#818 ? From the comments on that thread, that issue can be spotted via the logs on the exporter.

If so, maybe we should hold off on updating the documentation to support 0.13.x for now?

@Sticksman
Copy link

let me add a logger to it, maybe it'll spit out some errors

@algchoo
Copy link
Contributor Author

algchoo commented Jul 26, 2023

@yqlu I think that for now we should hold off, as you suggested. Would you like this PR closed for now? I left a comment on this thread regarding what I'm seeing for metrics when using PostgreSQL 15.3 and version 0.13.2 of the exporter. I was still unable to see the expected pg_stat_database metrics.

@algchoo
Copy link
Contributor Author

algchoo commented Aug 2, 2023

@yqlu It was determined that there is a race condition that causes some collectors to fail in the exporter. The latest information is here. I went ahead and tested out the exporter after letting PostgreSQL start completely and after that, I saw the expected metrics. That means the dashboard doesn't have to be updated and with a fix for the problem on the way I am wondering how we'd like to handle that in the docs, if at all?

@yqlu
Copy link
Collaborator

yqlu commented Aug 2, 2023

As discussed, let's:

  • leave minimum_supported_version untouched in both documentation.yaml and prometheus_metadata.yaml
  • if the metrics are not going to change, leave the dashboard untouched
  • hold off on the version bumps in the recommended yaml snippet, but keep an eye on that thread... once we have an operational example (maybe not v0.13.2, but a future version that includes this fix?) then update the yaml snippet hosted on the GMP repo directly -- the official docs point to that copy.

@Sticksman
Copy link

Watch for this to be merged and released

@algchoo
Copy link
Contributor Author

algchoo commented Aug 24, 2023

As discussed, let's:

  • leave minimum_supported_version untouched in both documentation.yaml and prometheus_metadata.yaml
  • if the metrics are not going to change, leave the dashboard untouched
  • hold off on the version bumps in the recommended yaml snippet, but keep an eye on that thread... once we have an operational example (maybe not v0.13.2, but a future version that includes this fix?) then update the yaml snippet hosted on the GMP repo directly -- the official docs point to that copy.

The PR that was mentioned has been merged, but the tag for the container that includes the fix is only on master and not a specific version. I'll continue to keep an eye out for the next release and will update the yaml snippet when that happens.

@algchoo
Copy link
Contributor Author

algchoo commented Sep 21, 2023

@yqlu The release of 0.14.0 has happened. Here's the PR for updating the yaml snippet. Did we want to also get this PR updated/merged?

@yqlu yqlu merged commit 6161f53 into GoogleCloudPlatform:master Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants