postgres/prometheus couch2pg errors: 'converting NULL to int64 is unsupported' and more #70

Closed
mrjones-plip opened this issue Jun 24, 2023 · 3 comments · Fixed by #79

@mrjones-plip

When running the couch2pg data setup, I'm getting errors and the data appears not to be imported.

Here's the output from docker logs cht-monitoring-postgres-exporter-1. The real cht instance name has been replaced with cht_postgres_name_here:

ts=2023-06-24T15:31:03.368Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=172.17.0.1:5432
ts=2023-06-24T15:31:03.633Z caller=collector.go:190 level=error target=postgresql://172.17.0.1:5432/cht_postgres_name_here msg="collector failed" name=postmaster duration_seconds=0.265314435 err="sql: Scan error on column index 0, name \"pg_postmaster_start_time\": converting driver.Value type time.Time (\"2023-05-17 10:11:24.021895 +0000 UTC\") to a float64: invalid syntax"
ts=2023-06-24T15:31:04.343Z caller=postgres_exporter.go:622 level=info msg="Semantic version changed" server=172.17.0.1:5432 from=0.0.0 to=9.6.13
ts=2023-06-24T15:31:04.476Z caller=collector.go:190 level=error target=postgresql://172.17.0.1:5432/cht_postgres_name_here msg="collector failed" name=statio_user_tables duration_seconds=1.107925937 err="sql: Scan error on column index 5, name \"idx_blks_read\": converting NULL to int64 is unsupported"
ts=2023-06-24T15:31:04.741Z caller=collector.go:190 level=error target=postgresql://172.17.0.1:5432/cht_postgres_name_here msg="collector failed" name=process_idle duration_seconds=1.37263144 err="sql: Scan error on column index 3, name \"seconds\": unsupported Scan, storing driver.Value type []uint8 into type *[]int64"
ts=2023-06-24T15:31:05.276Z caller=collector.go:190 level=error target=postgresql://172.17.0.1:5432/cht_postgres_name_here msg="collector failed" name=stat_user_tables duration_seconds=1.9078570670000001 err="sql: Scan error on column index 5, name \"idx_scan\": converting NULL to int64 is unsupported"
ts=2023-06-24T15:31:05.541Z caller=collector.go:190 level=error target=postgresql://172.17.0.1:5432/cht_postgres_name_here msg="collector failed" name=replication duration_seconds=2.172937022 err="sql: Scan error on column index 0, name \"pg_postmaster_start_time\": converting driver.Value type time.Time (\"2023-05-17 10:11:24.021895 +0000 UTC\") to a float64: invalid syntax"
@mrjones-plip added the Type: Bug label on Jun 24, 2023
@mrjones-plip commented Jun 29, 2023

OK!!! After a deep dive into this I have a path forward, but no specific answer just yet.

tl;dr

We should figure out how to turn off ALL metrics aside from what is in the queries YAML. We tested both --disable-default-metrics and --disable-settings-metrics, and neither worked (see the sketch below for how they were passed).
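
For reference, here is a minimal sketch of how those flags get passed, assuming the exporter runs as a compose service; the service name, image, and queries path below are illustrative, not the exact cht-watchdog config:

```yaml
# Sketch only: passing the disable flags to postgres_exporter via compose.
# Service name, image tag, and the queries path are assumptions.
services:
  postgres-exporter:
    image: quay.io/prometheuscommunity/postgres-exporter:${POSTGRES_EXPORTER_VERSION}
    command:
      - '--extend.query-path=/config/queries.yaml'
      - '--disable-default-metrics'
      - '--disable-settings-metrics'
```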

discoveries & workarounds

The big takeaway comes down to two points:

  1. The errors above don't actually stop the postgres scrape from running! If you expose port 9090 for the prometheus service and check Targets (http://localhost:9090/targets), you actually see this context deadline exceeded error for the postgres target:

    Get "http://postgres-exporter:9187/probe?auth_module=postgress_server_here%3A5434%2Fdatabase_here&target=postgresql%3A%2F%2Fpostgress_server_here%3A5434%2Fdatabase_here": context deadline exceeded

  2. You can work around context deadline exceeded by updating the scrape config file to add scrape_timeout: 1m on line 3 (a sketch follows this list). The scrape will then succeed after you down/up the services.
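
For reference, a minimal sketch of what that scrape config looks like with the timeout in place; the job name, auth module, and target below are placeholders patterned on the probe URL above, not the literal cht-watchdog config:

```yaml
# Sketch only: prometheus scrape config with the timeout workaround.
# job_name, auth_module, and target values are placeholders.
scrape_configs:
  - job_name: 'postgres-exporter'
    scrape_timeout: 1m   # the workaround; the default 10s was being exceeded
    metrics_path: /probe
    params:
      auth_module: ['postgres_server_here:5432/database_here']
      target: ['postgresql://postgres_server_here:5432/database_here']
    static_configs:
      - targets: ['postgres-exporter:9187']
```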

scrape_timeout is just a workaround, though, because when you DO get it to work you find out that:

  • it pulls down 4k+ lines of metrics
  • that's over 350 KB of data
  • it takes tens of seconds to run (an average of about 13 s, n ≈ 10)

This is because the exporter fetches dozens of metrics for each database, and we have 20+ databases in the RDBMS; 4,000+ lines across 20+ databases works out to roughly 200 metric lines per database. Multiply that out and you get the huge stats above.

helpful points

  • Here is a copy of the output of /metrics on the exporter (Medic Google Drive).
  • If you want to test what the /metrics endpoint for postgres is going to return, but don't want to bump the scrape timeout as above, you can:
    1. enable postgres metrics, per the docs
    2. figure out the IP of your postgres exporter service (it changes on every down/up of Docker). It's likely called cht-watchdog-postgres-exporter-1: docker inspect $(docker ps -q ) --format='{{ printf "%-50s" .Name}} {{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}'
    3. run this curl command using the IP from the prior step, ensuring that both postgres_server and cht are updated to match your auth_modules: curl -v http://192.168.192.3:9187/probe\?auth_module\=postgres_server%3A5432%2Fcht\&target\=postgresql%3A%2F%2Fpostgres_server%3A5432%2Fcht > postgres.metrics.txt
  • If you want to get rid of the original converting NULL to int64 errors, you can update the compose file to use master per this issue, which presumably will make its way to latest soon. Do that by editing your .env file (copied from the example) to have POSTGRES_EXPORTER_VERSION=master and restarting.
  • Per the fix in the last postgres issue, you can make changes in the Docker file to pass more flags when the exporter is called.
  • It's possible that setting all the [no-]collector.* flags individually will work, instead of the blanket disable-*-metrics ones we tried above (a sketch follows this list).
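
For example, here is a sketch of what that could look like in the compose file. The collector names below are taken from the failing collectors in the logs above (postmaster, statio_user_tables, process_idle, stat_user_tables, replication); whether each one has a matching --[no-]collector.* flag depends on the exporter version, so check postgres_exporter --help before relying on these:

```yaml
# Sketch only: disabling individual collectors instead of the blanket
# disable-*-metrics flags. Collector names come from the error logs above;
# verify the exact flags against `postgres_exporter --help` for your version.
services:
  postgres-exporter:
    command:
      - '--no-collector.postmaster'
      - '--no-collector.statio_user_tables'
      - '--no-collector.process_idle'
      - '--no-collector.stat_user_tables'
      - '--no-collector.replication'
```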

@mrjones-plip

@m5r - it would be great if you could solve this next week while @jkuester and I are out!

cc @tatilepizs who can assist.

@medic-ci

🎉 This issue has been resolved in version 1.8.2 🎉

The release is available as a GitHub release.

Your semantic-release bot 📦🚀
