NETOBSERV-1334: DNS metrics and dashboards #489
Conversation
@jotak: This pull request references NETOBSERV-1334 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Codecov Report
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #489      +/-   ##
==========================================
+ Coverage   62.63%   64.50%    +1.87%
==========================================
  Files          56       58        +2
  Lines        6816     7007      +191
==========================================
+ Hits         4269     4520      +251
+ Misses       2233     2179       -54
+ Partials      314      308        -6

Flags with carried forward coverage won't be shown.
New images:
They will expire after two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:3ffe39c make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-3ffe39c

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-3ffe39c
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
New images:
They will expire after two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:9d7011d make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-9d7011d

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-9d7011d
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
New images:
They will expire after two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:8e94fc8 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-8e94fc8

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-8e94fc8
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
includeList:
  - node_ingress_bytes_total
  - workload_ingress_bytes_total
  - namespace_flows_total
# includeList:
#   - "node_ingress_bytes_total"
#   - "workload_ingress_bytes_total"
#   - "namespace_flows_total"
#   - "namespace_drop_packets_total"
#   - "namespace_rtt_seconds"
@jotak can you explain this change? we've already started updating our CRDs to include those three defaults
Yes, there is a subtlety. The default metrics are still these ones, but in the CRD the default is to not have includeList specified at all. The reason is that when you set values in includeList, even just keeping this exact same default metrics list, it is considered a setting that was explicitly set. Imagine that a following release adds new metrics to the default list: people starting from scratch would get the new defaults, but people upgrading would still have the explicit list in their settings, hence they would not get them.
In addition, this also fixes a problem when converting from beta2 to beta1 (NETOBSERV-1405), where setting default values in this sample prevented getting the updated defaults.
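To make the upgrade implication concrete, here is a minimal sketch, not the actual operator code (the helper name and signature are made up for illustration): an unset includeList keeps following whatever the current release defines as defaults, while an explicitly set list is honored verbatim and therefore freezes the metric set across upgrades.

package main

import "fmt"

// effectiveMetrics is a hypothetical helper illustrating the behavior
// described above: a nil includeList means "use the current release's
// defaults", while a non-nil list (even one identical to today's defaults)
// is kept as-is, so metrics added to the defaults later would not appear
// after an upgrade.
func effectiveMetrics(includeList []string, currentDefaults []string) []string {
	if includeList == nil {
		return currentDefaults
	}
	return includeList
}

func main() {
	defaults := []string{"node_ingress_bytes_total", "workload_ingress_bytes_total", "namespace_flows_total"}
	fmt.Println(effectiveMetrics(nil, defaults))      // follows evolving defaults
	fmt.Println(effectiveMetrics(defaults, defaults)) // frozen explicit list
}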
But you can leave them in your CR: these are still the default metrics that we expect, and they are defined in code: https://github.com/netobserv/network-observability-operator/blob/main/pkg/metrics/predefined_metrics.go#L37-L42 (the drop and RTT metrics are only generated when the corresponding features are enabled).
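For reference, a hedged sketch of how the defaults plus the feature-gated drop/RTT metrics could be assembled; it is assumed from the description above, not copied from the linked predefined_metrics.go, and the names are illustrative.

package metrics

// Illustrative only: the real list lives in pkg/metrics/predefined_metrics.go.
var defaultIncludeList = []string{
	"node_ingress_bytes_total",
	"workload_ingress_bytes_total",
	"namespace_flows_total",
}

// DefaultMetrics returns the default include list, adding the drop and RTT
// metrics only when their corresponding features are enabled.
func DefaultMetrics(dropEnabled, rttEnabled bool) []string {
	list := append([]string{}, defaultIncludeList...)
	if dropEnabled {
		list = append(list, "namespace_drop_packets_total")
	}
	if rttEnabled {
		list = append(list, "namespace_rtt_seconds")
	}
	return list
}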
- Add DNS metrics as predefined metrics (per node/namespace/workload); namespace_dns_latency_seconds is enabled by default
- Update the metrics doc
- Add dashboards: per-code DNS request rate, and DNS latencies histogram
- To add the histogram feature to the dashboards, I had to refactor the dashboards builder:
  - define a clear dashboard model that abstracts Grafana and hopefully should be reusable later for Perses (see the sketch after this list)
  - separate the business logic (dashboard.go) from that model (model.go)
- Since histograms are now there, I could add the missing RTT dashboard
- Fixed a bug where the drops dashboards were displayed even though the feature is disabled
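As a rough idea of what such a Grafana-agnostic dashboard model can look like, here is a minimal sketch; the type names are illustrative and not the actual model.go definitions.

package dashboards

// PanelType distinguishes the chart kinds mentioned above; histograms are
// what the DNS latency and RTT panels need.
type PanelType string

const (
	PanelTypeTimeSeries PanelType = "timeseries"
	PanelTypeHistogram  PanelType = "histogram"
)

// Panel, Row and Dashboard form a small backend-neutral model: a separate
// builder (the business logic) decides which panels exist, and an exporter
// turns the model into Grafana JSON (or, later, Perses resources).
type Panel struct {
	Title   string
	Type    PanelType
	Queries []string // PromQL expressions
}

type Row struct {
	Title  string
	Panels []Panel
}

type Dashboard struct {
	Title string
	Rows  []Row
}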
New images:
They will expire after two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:1539e7f make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-1539e7f

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-1539e7f
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
{Key: "DnsId", Type: flpapi.PromFilterPresence}, | ||
}, | ||
Labels: dnsLabels, | ||
ValueScale: 1000, // ms => s |
I don't think it's a problem in the metric definition itself (it's recommended to use base units), but I agree that it's debatable in the dashboard.
I wish the scale would auto-adapt, but this isn't the case :-(
My rationale for using seconds is that we're more interested in tracking high latencies than low ones... but I agree that ms would be better most of the time. So yeah, let me see about that...
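A minimal sketch of the conversion being discussed, assuming the flow field carries DNS latency in milliseconds (the helper name is made up): dividing by ValueScale=1000 stores the Prometheus metric in base-unit seconds, and a dashboard can still render milliseconds by converting in the query or via the panel's unit setting.

package main

import "fmt"

// valueScale mirrors the ValueScale: 1000 setting above (ms => s).
const valueScale = 1000.0

// toSeconds converts a raw latency reported in milliseconds into the
// base-unit seconds that the namespace_dns_latency_seconds metric exposes.
func toSeconds(latencyMs float64) float64 {
	return latencyMs / valueScale
}

func main() {
	fmt.Println(toSeconds(250)) // 0.25s, shown as 250ms if the panel converts back
}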
/ok-to-test
New images:
They will expire after two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:75120c0 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-75120c0

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-75120c0
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
/label qe-approved
thanks @Amoghrd!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jotak
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Description
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.