Panic SIGSEGV while emitting metrics #10703

Closed
andrejvanderzee opened this issue Jan 15, 2021 · 4 comments

@andrejvanderzee
Contributor

andrejvanderzee commented Jan 15, 2021

One of our Vaults (v1.6.1) panicked while emitting metrics:

Jan 15 01:14:11 ip-172-23-21-4 vault[1319]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3bfbd39]
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]: goroutine 64705 [running]:
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]: github.com/hashicorp/vault/vault.(*Core).findKvMounts(0xc000918000, 0x0, 0x0, 0x0)
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]:         /gopath/src/github.com/hashicorp/vault/vault/core_metrics.go:245 +0xb9
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]: github.com/hashicorp/vault/vault.(*Core).kvSecretGaugeCollector(0xc000918000, 0x5843480, 0xc03136e720, 0xc030fe0e30, 0x18643d3, 0xbff858f8dd5ce625, 0x23080fd8ad89, 0x82ad3a0)
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]:         /gopath/src/github.com/hashicorp/vault/vault/core_metrics.go:332 +0x32
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]: github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).collectAndFilterGauges(0xc0061a75f0)
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]:         /gopath/src/github.com/hashicorp/vault/helper/metricsutil/gauge_process.go:162 +0x1e4
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]: github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).Run(0xc0061a75f0)
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]:         /gopath/src/github.com/hashicorp/vault/helper/metricsutil/gauge_process.go:244 +0x9f
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]: created by github.com/hashicorp/vault/vault.(*Core).emitMetrics
Jan 15 01:14:11 ip-172-23-21-4 vault[1319]:         /gopath/src/github.com/hashicorp/vault/vault/core_metrics.go:222 +0xaaa
Jan 15 01:14:12 ip-172-23-21-4 systemd[1]: vault.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

So far this has happened only once.

@andrejvanderzee changed the title from "Panic SIGSEGV while emiting metirs" to "Panic SIGSEGV while emitting metrics" on Jan 15, 2021
@HridoyRoy
Contributor

Hi @andrejvanderzee, we need more information about this particular Vault setup to reproduce and debug the issue. Is there anything more you can provide, such as the cluster setup and additional logs? Thanks!

@andrejvanderzee
Contributor Author

andrejvanderzee commented Jan 18, 2021

There was nothing more to see in the logs. It's a three-node cluster with the following config:

storage "raft" {
  path    = "/mnt/vault/storage/"
  node_id = "node_0"
  performance_multiplier = 1
  retry_join { leader_api_addr = "http://172.23.20.4:8200" }
  retry_join { leader_api_addr = "http://172.23.21.4:8200" }
  retry_join { leader_api_addr = "http://172.23.22.4:8200" }
}

disable_mlock = true

listener "tcp" {
  address = "172.23.20.4:8200"
  tls_disable = 1
}

listener "tcp" {
  address = "127.0.0.1:8200"
  tls_disable = 1
}

telemetry {
  prometheus_retention_time = "30s",
  disable_hostname = true
}

plugin_directory = "/etc/vault/vault-plugins"
api_addr = "http://172.23.20.4:8200"
cluster_addr = "https://172.23.20.4:8201"

Note that the Vault node had already been running for a while, most likely days, so this did not happen "during post-seal" as in the linked issue.

@HridoyRoy
Contributor

Hi @andrejvanderzee, the above PR should fix the issue. We will also be backporting it to 1.6.

@HridoyRoy
Contributor

The PR has been merged and backported, so the issue should be fixed in 1.7 as well as the upcoming 1.6.2 release. I'm going to close this issue, but please feel free to reopen if the problem persists with those versions.

Thanks so much!
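
For readers who arrive here from a similar trace: the fault address in the panic (addr=0x10) is consistent with reading a field through a nil pointer inside findKvMounts, which runs on a background goroutine started by emitMetrics. The sketch below is not Vault's code and not the merged fix. It is a minimal illustration, under the assumption that the mount table can be nil while a metrics collector is still running, of how such a collector might guard against that. All names in it (core, mountTable, mountEntry, findKVMountPaths) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// mountEntry and mountTable are hypothetical stand-ins, not Vault's types.
type mountEntry struct {
	Type string
	Path string
}

type mountTable struct {
	Entries []*mountEntry
}

// core is a hypothetical stand-in for a server object whose mount table
// is assumed to be nil whenever the table has not been set up or has been
// torn down.
type core struct {
	mu     sync.RWMutex
	mounts *mountTable
}

// findKVMountPaths returns the paths of all "kv" mounts. It is meant to be
// safe to call from a background goroutine at any time, so it checks for a
// nil table and nil entries instead of assuming setup has completed.
func (c *core) findKVMountPaths() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()

	paths := []string{}
	if c.mounts == nil {
		// Nothing to report; returning early avoids a nil dereference.
		return paths
	}
	for _, e := range c.mounts.Entries {
		if e == nil || e.Type != "kv" {
			continue
		}
		paths = append(paths, e.Path)
	}
	return paths
}

func main() {
	// A core with no mount table: the collector returns an empty result.
	sealed := &core{}
	fmt.Println(len(sealed.findKVMountPaths()))

	// A core with a populated table: only the "kv" mount is reported.
	ready := &core{mounts: &mountTable{Entries: []*mountEntry{
		{Type: "kv", Path: "secret/"},
		{Type: "pki", Path: "pki/"},
	}}}
	fmt.Println(ready.findKVMountPaths())
}
```

The read lock matters as much as the nil check in this sketch: without it, the table could be replaced or cleared between the check and the loop.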
