Metric "server_interactive_sessions_total" missing IoT sessions #3755

Closed
travelton opened this issue May 21, 2020 · 6 comments · Fixed by #4344
Labels: bug, c-sn (Internal Customer Reference), IoT (Issues related to Teleport IoT/node tunnelling functionality), observability (Used for metrics and insight into Teleport)

@travelton
Contributor

Description

What happened:
The metric server_interactive_sessions_total, exported via --diag-addr, does not count interactive sessions on IoT nodes.

What you expected to happen:
IoT sessions should be counted in server_interactive_sessions_total metric.

How to reproduce it (as minimally and precisely as possible):

  1. Enable --diag-addr=127.0.0.1:3000 on the Teleport cluster.
  2. Ensure an IoT node is joined.
  3. Run: curl http://127.0.0.1:3000/metrics | grep server_interactive_sessions_total
  4. Expect the session count to be 0.
  5. Connect to the IoT node.
  6. Expect the session count to be 1. (It is not; it stays at 0.)

Environment

  • Teleport version (use teleport version): 4.2.9
@travelton travelton added bug observability Used for metrics and insight into Teleport. labels May 21, 2020
@benarent benarent added the IoT Issues related to Teleport IoT/node tunnelling functionality label May 21, 2020
@webvictim
Contributor

webvictim commented May 21, 2020

While validating this report, I also noticed that the server_interactive_sessions_total gauge on my personal Teleport installation (1 directly connected node, ~6 IoT nodes) reports values one lower than expected.

No sessions connected:

$ curl -s http://localhost:3434/metrics | grep sessions
# HELP server_interactive_sessions_total Number of active sessions
# TYPE server_interactive_sessions_total gauge
server_interactive_sessions_total -1

Start a session (to the non-IoT node), then check again:

$ curl -s http://localhost:3434/metrics | grep sessions
# HELP server_interactive_sessions_total Number of active sessions
# TYPE server_interactive_sessions_total gauge
server_interactive_sessions_total 0

Disconnect session and check again:

$ curl -s http://localhost:3434/metrics | grep sessions
# HELP server_interactive_sessions_total Number of active sessions
# TYPE server_interactive_sessions_total gauge
server_interactive_sessions_total -1

The value for sessions shouldn't be able to go below zero.
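
For anyone who wants to check this automatically rather than by eye, here is a minimal sketch in Go that pulls one unlabeled sample out of Prometheus text-format output and flags a negative value. The helper name gaugeValue is hypothetical and not part of Teleport; the sample text is the output pasted above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue is a hypothetical helper (not a Teleport API). It scans
// Prometheus text-format output and returns the value of the first
// unlabeled sample called name. ok is false if no such sample is found.
func gaugeValue(metricsText, name string) (value float64, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(metricsText))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	sample := `# HELP server_interactive_sessions_total Number of active sessions
# TYPE server_interactive_sessions_total gauge
server_interactive_sessions_total -1
`
	v, ok := gaugeValue(sample, "server_interactive_sessions_total")
	if ok && v < 0 {
		fmt.Println("gauge is negative:", v)
	}
}
```

In practice the text would come from the --diag-addr /metrics endpoint instead of a string literal.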

@russjones russjones added this to the 5.0 Codename TBD milestone May 26, 2020
@webvictim
Contributor

[Image: Screenshot 2020-07-10 at 17 06 02]

:(

@benarent benarent added the c-sn Internal Customer Reference label Jul 10, 2020
@russjones
Contributor

Best: 2
Worst: 3

@russjones russjones assigned awly and fspmarshall and unassigned awly Jul 21, 2020
@russjones russjones assigned awly and unassigned fspmarshall Aug 18, 2020
awly pushed a commit that referenced this issue Aug 19, 2020
`session.Close` can get called multiple times, from different deferred
cleanups. The associated metric decrement should only happen on the
first call, to map 1:1 with increments.
Without this, we could end up with negative
`server_interactive_sessions_total` counts.

Fixes #3755
@awly
Contributor

awly commented Aug 19, 2020

Figured out the issue @webvictim was having: #4228

As for IoT nodes: the sessions counter works as expected on the node itself. @travelton, which Teleport process were you watching those metrics on? Node or proxy?
We only count sessions on the node (it doesn't matter whether it is an IoT or a regular node).
Maybe we should keep an aggregate counter on the proxy?

@zmb3
Collaborator

zmb3 commented Aug 19, 2020

Pretty sure @travelton filed this on my behalf. We were watching metrics on the proxy. Would love to see an aggregate counter maintained by the proxy, as the nodes themselves are often deployed in hard-to-reach network segments, making scraping them directly more difficult than scraping the proxy.

@awly
Contributor

awly commented Sep 18, 2020

@zmb3: added proxy_ssh_sessions_total, a new metric on the proxy that acts as an aggregate session counter.
It will appear in 4.4.
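
To make the proxy-side idea concrete, here is a rough Go sketch of a gauge that counts every session the proxy forwards, whether the target is an IoT or a regular node. It is only an illustration under assumptions: proxyGauge and forward are hypothetical names, and this is not how proxy_ssh_sessions_total is actually implemented.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// proxyGauge is a hypothetical stand-in for a proxy-side aggregate gauge.
type proxyGauge struct {
	active int64
}

// forward counts a session as active for as long as run is executing.
// The deferred decrement pairs with exactly one increment.
func (p *proxyGauge) forward(run func()) {
	atomic.AddInt64(&p.active, 1)
	defer atomic.AddInt64(&p.active, -1)
	run()
}

// value returns the current number of sessions passing through the proxy.
func (p *proxyGauge) value() int64 {
	return atomic.LoadInt64(&p.active)
}

func main() {
	var g proxyGauge
	// The target node kind (IoT or regular) does not matter here,
	// because the count is taken at the proxy.
	g.forward(func() {
		fmt.Println("during:", g.value()) // during: 1
	})
	fmt.Println("after:", g.value()) // after: 0
}
```

Counting at the proxy means a single scrape target covers nodes that sit in hard-to-reach network segments.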
