Envoy public listener bound to incorrect port #11630

Closed
agrahamlincoln opened this issue Nov 23, 2021 · 1 comment
Assignees
Labels
needs-investigation The issue described is detailed and complex. theme/envoy/xds Related to Envoy support theme/mesh-gw Track mesh gateway work type/bug Feature does not function as expected

Comments

@agrahamlincoln


Overview of the Issue

Background on our install:

We run the consul k8s helm chart to deploy consul on our EKS clusters. In addition, we run consul agents on EC2 instances that join the mesh. We manage our consul service config on EC2 via consul and run our envoy sidecars via a systemd unit template. We've run into this issue repeatedly as we add more services to our EC2 instances.

We observed a few cases recently in consul service mesh where the envoy-sidecar process was attempting to bind the public_listener, but failed because the port was already used. When inspecting what was bound to the requested port, we found another envoy sidecar using the port. This "other" envoy sidecar was bound to a port that is different from what consul has configured.
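
For anyone trying to confirm the same symptom, here is a minimal sketch of checking whether a port is already taken before a sidecar tries to bind it. The helper name `port_in_use` and the default host `127.0.0.1` are assumptions made for illustration. It only reports that the port is held, not which process holds it.

```python
import errno
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails because the address is in use.

    Hypothetical helper: it makes the same kind of bind attempt a listener
    would, then releases the socket immediately.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # permission errors etc. are not "port in use"
    return False


if __name__ == "__main__":
    # 21000 is the port from the report; adjust as needed.
    print(port_in_use(21000))
```

This only tells you the bind would fail. Mapping the port back to a specific envoy process still needs a tool that lists listening sockets per process.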

Most of these occurrences seem to relate to the addition of new services on a host.

Reproduction Steps

One case where we saw this occur recently:

  1. An EC2 instance (call it api-0) has a consul service registered called promtail
  2. The operator installs the postgres_exporter service by adding a json file to /opt/consul/services and calling consul reload
  3. The operator starts the envoy sidecar for postgres_exporter by running systemctl start consul-sidecar@postgres_exporter.service, which starts an envoy sidecar with a command like consul-sidecar start postgres_exporter
  4. The public_listener for postgres_exporter fails to bind: promtail's sidecar is already bound to port 21000, and postgres_exporter's sidecar expects to bind to 21000
  5. The operator finds which envoy is bound to 21000 and restarts it. promtail's sidecar is now bound to 21001
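
The mismatch in steps 4 and 5 can be checked roughly like this: read the sidecar ports the local agent has registered (the agent HTTP API's `/v1/agent/services` endpoint) and compare them with the ports the running envoys are actually bound to. This is a sketch, not what we run in production. The function names, the default agent URL, and the `observed` mapping (filled in by hand from whatever you use to list listening sockets) are all assumptions.

```python
import json
import urllib.request


def registered_sidecar_ports(services: dict) -> dict:
    """Map each connect-proxy service ID to the port the agent has registered."""
    return {
        service_id: svc.get("Port")
        for service_id, svc in services.items()
        if svc.get("Kind") == "connect-proxy"
    }


def find_mismatches(registered: dict, observed: dict) -> list:
    """Return (service_id, registered_port, observed_port) for every sidecar
    whose running envoy is bound to a different port than the one registered.

    `observed` is a hypothetical mapping of sidecar ID -> bound port that the
    operator collects separately; sidecars missing from it are skipped.
    """
    return sorted(
        (service_id, port, observed[service_id])
        for service_id, port in registered.items()
        if service_id in observed and observed[service_id] != port
    )


def fetch_agent_services(agent_url: str = "http://127.0.0.1:8500") -> dict:
    """Fetch the local agent's service registrations (agent URL is assumed)."""
    with urllib.request.urlopen(f"{agent_url}/v1/agent/services") as resp:
        return json.load(resp)
```

In the state described in step 4, promtail's sidecar would show up as a mismatch (registered on 21001, still bound to 21000), which is the envoy that has to be restarted.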

Consul info for both Client and Server

Client info
agent:
	check_monitors = 1
	check_ttls = 0
	checks = 19
	services = 12
build:
	prerelease = 
	revision = ee4911a9
	version = 1.10.3
consul:
	acl = disabled
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 244
	max_procs = 4
	os = linux
	version = go1.16.7
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 138
	failed = 16
	health_score = 0
	intent_queue = 0
	left = 127
	member_time = 655889
	members = 179
	query_queue = 0
	query_time = 2
Server info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = ee4911a9
	version = 1.10.3
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 172.60.1.144:8300
	server = true
raft:
	applied_index = 5308002
	commit_index = 5308002
	fsm_pending = 0
	last_contact = 54.237714ms
	last_log_index = 5308002
	last_log_term = 846
	last_snapshot_index = 5294239
	last_snapshot_term = 846
	latest_configuration = [{Suffrage:Voter ID:08fd2f61-3b9e-ca48-6760-8527d5e5b9b5 Address:172.60.1.144:8300} {Suffrage:Voter ID:8206ee0b-96b9-b389-dcb5-34afdd544aa2 Address:172.60.1.152:8300} {Suffrage:Voter ID:9e27d86a-f0a3-bca2-d4df-03864b61d241 Address:172.60.0.18:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 846
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 642
	max_procs = 4
	os = linux
	version = go1.16.7
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 138
	failed = 1
	health_score = 2
	intent_queue = 0
	left = 112
	member_time = 655888
	members = 149
	query_queue = 0
	query_time = 2
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 250
	members = 3
	query_queue = 0
	query_time = 1

Operating system and Environment details

We're using EKS 1.18 for servers and k8s workloads
Our EC2 instances run either CentOS 7 or Amazon Linux 2 (AL2)

I've built custom binaries based on 1.10.3 with a patch for #8283 and #11422 - nothing else was changed


@Amier3 added the type/bug and theme/envoy/xds labels Nov 23, 2021
@Amier3 added the theme/mesh-gw label Dec 6, 2021
@dhiaayachi self-assigned this Dec 6, 2021
@Amier3 added the needs-investigation label Dec 6, 2021
@agrahamlincoln

#8254 appears to be the cause of this. I'll close this issue in favor of that one.
