From a5e89ac54ce323da854ffd836f90f434ad407fd0 Mon Sep 17 00:00:00 2001 From: Mahamed Date: Thu, 19 Aug 2021 18:05:38 +0100 Subject: [PATCH 1/8] update metrics docs --- docs/admin/collecting-metrics/README.md | 86 +++++++++++++------------ 1 file changed, 46 insertions(+), 40 deletions(-) diff --git a/docs/admin/collecting-metrics/README.md b/docs/admin/collecting-metrics/README.md index 8c076048389..6e3d645f0f7 100644 --- a/docs/admin/collecting-metrics/README.md +++ b/docs/admin/collecting-metrics/README.md @@ -1,6 +1,47 @@ -# Collecting Metrics with OpenTelemetry +# Collecting Metrics in Knative -You can set up the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) to receive metrics from Knative components and distribute them to Prometheus. +Knative offers two solutions for collecting metrics: +- [Prometheus](https://prometheus.io/) +- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) + +[Grafana](https://grafana.com/oss/) dashboards are available for metrics collected directly with Prometheus. + +You can also set up the OpenTelemetry Collector to receive metrics from Knative components and distribute them to other metrics providers that support OpenTelemetry. + +## About Prometheus + +[Prometheus](https://prometheus.io/) is an open-source tool for collecting and +aggregating timeseries metrics. It can be used to scrape the OpenTelemetry collector that you created in the previous step. + +## Setting up Prometheus + +1. Install the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator/helm) by entering the following command: + + ```bash + helm repo add prometheus-community https://prometheus-community.github.io/helm-charts + helm repo update + helm install prometheus prometheus-community/kube-prometheus-stack -n default + ``` + + !!! caution + You will need to ensure that the helm chart has following values configured, otherwise the ServiceMonitors/Podmonitors will not work. 
+ ```yaml + kube-state-metrics: + metricLabelsAllowlist: + - pods=[*] + - deployments=[app.kubernetes.io/name,app.kubernetes.io/component,app.kubernetes.io/instance] + prometheus: + prometheusSpec: + serviceMonitorSelectorNilUsesHelmValues: false + podMonitorSelectorNilUsesHelmValues: false + +1. Apply the ServiceMonitors/PodMonitors to collect metrics from Knative. + + ```bash + kubectl apply -f https://raw.githubusercontent.com/knative-sandbox/monitoring/main/servicemonitor.yaml + ``` +1. Grafana dashboards can be imported from https://github.com/knative-sandbox/monitoring/tree/main/grafana. + ## About OpenTelemetry @@ -18,6 +59,9 @@ In the following example, you can configure a single collector instance using a !!! tip For more complex deployments, you can automate some of these steps by using the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator). + +!!! caution + The Grafana dashboards at https://github.com/knative-sandbox/monitoring/tree/main/grafana don't work with metrics scraped from OpenTelemetry Collector. ![Diagram of components reporting to collector, which is scraped by Prometheus](system-diagram.svg) @@ -68,41 +112,3 @@ In the following example, you can configure a single collector instance using a 1. Fetch `http://localhost:8889/metrics` to see the exported metrics. -## About Prometheus - -[Prometheus](https://prometheus.io/) is an open-source tool for collecting and -aggregating timeseries metrics. It can be used to scrape the OpenTelemetry collector that you created in the previous step. - -## Setting up Prometheus - -1. Install the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator) by entering the following command: - - ```bash - kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/bundle.yaml - ``` - - !!! caution - The manifest provided installs the Prometheus Operator into the `default` namespace. 
If you want to install the Operator in a different namespace, you must download the [YAML manifest](https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/bundle.yaml) and update any namespace references to your target namespace. - -1. Create a `ServiceMonitor` object to track the OpenTelemetry collector. -1. Create a `ServiceAccount` object with the ability to read Kubernetes services and pods, so that Prometheus can track the resource endpoints. -1. Apply the `prometheus.yaml` file to create a Prometheus instance, by entering the following command: - - ```bash - kubectl apply -f prometheus.yaml - ``` - - -### Make the Prometheus instance public - -By default, the Prometheus instance is only exposed on a private service named `prometheus-operated`. - -To access the console in your web browser: - -1. Enter the command: - - ```bash - kubectl port-forward --namespace metrics service/prometheus-operated 9090 - ``` - -1. Access the console in your browser via http://localhost:9090. 
From a59d8a10ea71dd64f021087827872fc0f1dbc90d Mon Sep 17 00:00:00 2001 From: upodroid Date: Fri, 15 Oct 2021 19:49:59 +0100 Subject: [PATCH 2/8] add skonto's feedback --- docs/admin/collecting-metrics/README.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/docs/admin/collecting-metrics/README.md b/docs/admin/collecting-metrics/README.md index 6e3d645f0f7..69d51241a29 100644 --- a/docs/admin/collecting-metrics/README.md +++ b/docs/admin/collecting-metrics/README.md @@ -1,6 +1,6 @@ # Collecting Metrics in Knative -Knative offers two solutions for collecting metrics: +Knative offers two popular architectures for collecting metrics: - [Prometheus](https://prometheus.io/) - [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) @@ -10,19 +10,20 @@ You can also set up the OpenTelemetry Collector to receive metrics from Knative ## About Prometheus -[Prometheus](https://prometheus.io/) is an open-source tool for collecting and -aggregating timeseries metrics. It can be used to scrape the OpenTelemetry collector that you created in the previous step. +[Prometheus](https://prometheus.io/) is an open-source tool for collecting, +aggregating timeseries metrics and alerting. It can be used to scrape the OpenTelemetry Collector that you created in the previous step when Prometheus is used a standalone monitoring and alerting system. ## Setting up Prometheus -1. Install the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator/helm) by entering the following command: +1. 
Install the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator/helm) by using [Helm](https://helm.sh/docs/intro/using_helm/): ```bash helm repo add prometheus-community https://prometheus-community.github.io/helm-charts helm repo update - helm install prometheus prometheus-community/kube-prometheus-stack -n default + helm install prometheus prometheus-community/kube-prometheus-stack -n default -f values.yaml + # values.yaml contains at minimum the configuration below ``` - + !!! caution You will need to ensure that the helm chart has following values configured, otherwise the ServiceMonitors/Podmonitors will not work. ```yaml @@ -42,6 +43,11 @@ aggregating timeseries metrics. It can be used to scrape the OpenTelemetry colle ``` 1. Grafana dashboards can be imported from https://github.com/knative-sandbox/monitoring/tree/main/grafana. +1. If you are using the Grafana Helm Chart with the Dashboard Sidecar configured, you can load the dashboards by applying the following configmap. + + ```bash + kubectl apply -f https://raw.githubusercontent.com/knative-sandbox/monitoring/main/grafana/dashboards.yaml + ``` ## About OpenTelemetry From 02511fe457e38d47e5aca44d4195c05b8e413d2a Mon Sep 17 00:00:00 2001 From: upodroid Date: Fri, 15 Oct 2021 19:56:55 +0100 Subject: [PATCH 3/8] adjust indentation --- docs/admin/collecting-metrics/README.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/admin/collecting-metrics/README.md b/docs/admin/collecting-metrics/README.md index 4af023e1750..ca8fbc9a8ee 100644 --- a/docs/admin/collecting-metrics/README.md +++ b/docs/admin/collecting-metrics/README.md @@ -26,17 +26,17 @@ aggregating timeseries metrics and alerting. It can be used to scrape the OpenTe !!! caution You will need to ensure that the helm chart has following values configured, otherwise the ServiceMonitors/Podmonitors will not work. 
- ```yaml - kube-state-metrics: - metricLabelsAllowlist: - - pods=[*] - - deployments=[app.kubernetes.io/name,app.kubernetes.io/component,app.kubernetes.io/instance] - prometheus: - prometheusSpec: - serviceMonitorSelectorNilUsesHelmValues: false - podMonitorSelectorNilUsesHelmValues: false + ```yaml + kube-state-metrics: + metricLabelsAllowlist: + - pods=[*] + - deployments=[app.kubernetes.io/name,app.kubernetes.io/component,app.kubernetes.io/instance] + prometheus: + prometheusSpec: + serviceMonitorSelectorNilUsesHelmValues: false + podMonitorSelectorNilUsesHelmValues: false -1. Apply the ServiceMonitors/PodMonitors to collect metrics from Knative. +1. Apply the ServiceMonitors/PodMonitors to cqollect metrics from Knative. ```bash kubectl apply -f https://raw.githubusercontent.com/knative-sandbox/monitoring/main/servicemonitor.yaml From e004bea28d47913db9de2e7406d860e341ff5fd3 Mon Sep 17 00:00:00 2001 From: upodroid Date: Fri, 15 Oct 2021 19:59:41 +0100 Subject: [PATCH 4/8] fix trailing spaces --- docs/admin/collecting-metrics/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/admin/collecting-metrics/README.md b/docs/admin/collecting-metrics/README.md index ca8fbc9a8ee..7a8717278ae 100644 --- a/docs/admin/collecting-metrics/README.md +++ b/docs/admin/collecting-metrics/README.md @@ -20,7 +20,7 @@ aggregating timeseries metrics and alerting. It can be used to scrape the OpenTe ```bash helm repo add prometheus-community https://prometheus-community.github.io/helm-charts helm repo update - helm install prometheus prometheus-community/kube-prometheus-stack -n default -f values.yaml + helm install prometheus prometheus-community/kube-prometheus-stack -n default -f values.yaml # values.yaml contains at minimum the configuration below ``` @@ -35,14 +35,14 @@ aggregating timeseries metrics and alerting. 
It can be used to scrape the OpenTe prometheusSpec: serviceMonitorSelectorNilUsesHelmValues: false podMonitorSelectorNilUsesHelmValues: false - + 1. Apply the ServiceMonitors/PodMonitors to cqollect metrics from Knative. ```bash kubectl apply -f https://raw.githubusercontent.com/knative-sandbox/monitoring/main/servicemonitor.yaml ``` 1. Grafana dashboards can be imported from https://github.com/knative-sandbox/monitoring/tree/main/grafana. - + 1. If you are using the Grafana Helm Chart with the Dashboard Sidecar configured, you can load the dashboards by applying the following configmap. ```bash @@ -79,7 +79,7 @@ In the following example, you can configure a single collector instance using a !!! tip For more complex deployments, you can automate some of these steps by using the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator). - + !!! caution The Grafana dashboards at https://github.com/knative-sandbox/monitoring/tree/main/grafana don't work with metrics scraped from OpenTelemetry Collector. 
From 6f0f4e4f9e8c6061398abcb9a1ba1d241dba2c6a Mon Sep 17 00:00:00 2001 From: upodroid Date: Wed, 20 Oct 2021 23:37:54 +0100 Subject: [PATCH 5/8] apply more recommendations --- docs/admin/collecting-metrics/README.md | 4 +- docs/admin/collecting-metrics/prometheus.yaml | 144 ------------------ 2 files changed, 2 insertions(+), 146 deletions(-) delete mode 100644 docs/admin/collecting-metrics/prometheus.yaml diff --git a/docs/admin/collecting-metrics/README.md b/docs/admin/collecting-metrics/README.md index 7a8717278ae..f711b9ce5ee 100644 --- a/docs/admin/collecting-metrics/README.md +++ b/docs/admin/collecting-metrics/README.md @@ -1,6 +1,6 @@ # Collecting Metrics in Knative -Knative offers two popular architectures for collecting metrics: +Knative supports different popular tools for collecting metrics: - [Prometheus](https://prometheus.io/) - [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) @@ -11,7 +11,7 @@ You can also set up the OpenTelemetry Collector to receive metrics from Knative ## About Prometheus [Prometheus](https://prometheus.io/) is an open-source tool for collecting, -aggregating timeseries metrics and alerting. It can be used to scrape the OpenTelemetry Collector that you created in the previous step when Prometheus is used a standalone monitoring and alerting system. +aggregating timeseries metrics and alerting. It can also be used to scrape the OpenTelemetry Collector that is demonstrated below when Prometheus is used. 
## Setting up Prometheus diff --git a/docs/admin/collecting-metrics/prometheus.yaml b/docs/admin/collecting-metrics/prometheus.yaml deleted file mode 100644 index 3b6695daac7..00000000000 --- a/docs/admin/collecting-metrics/prometheus.yaml +++ /dev/null @@ -1,144 +0,0 @@ -apiVersion: v1 -kind: ServiceAccount -metadata: - name: prometheus - namespace: metrics ---- -# Note: For general cluster use, you may want to use a ClusteRole and -# ClusterRoleBinding to grant Prometheus the ability to list all services and -# pods in the cluster. For this use case, we only need to grant access to the -# same namespace, and can use a Role and RoleBinding. -apiVersion: rbac.authorization.k8s.io/v1 -kind: Role -metadata: - name: watch-services-and-pods - namespace: metrics -rules: -- apiGroups: - - "" - resources: - - services - - endpoints - - pods - verbs: ["get", "list", "watch"] ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - name: prom-watch-services-and-pods - namespace: metrics -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: watch-services-and-pods -subjects: - - kind: ServiceAccount - name: prometheus ---- -apiVersion: v1 -kind: ConfigMap -metadata: - name: prom-config - namespace: metrics -data: - prometheus.yaml: | - global: - scrape_interval: 30s - scrape_timeout: 10s - evaluation_interval: 30s - - rule_files: - - /etc/prometheus/config/prometheus-rules-*.yaml - - scrape_configs: - - job_name: otel-collector - honor_labels: true - honor_timestamps: true - metrics_path: /metrics - # Note that we *don't want* to use relabel to collect labels here, - # because these are the labels of the opentelemetry collector. 
- relabel_configs: - - action: keep - source_labels: [__meta_kubernetes_service_label_app] - regex: otel-export - - action: keep - source_labels: [__meta_kubernetes_endpoint_port_name] - regex: prom-export - kubernetes_sd_configs: - - role: endpoints - namespaces: - names: - - metrics - prometheus-rules-example.yaml: | - groups: - - name: example - rules: - - record: pod:http_requests:irate5m - expr: label_replace(rate(knative_dev_internal_serving_revision_app_request_latencies_count[5m]), "service", "$1", "pod_name", "(.*)-deployment-.+-.+") - - record: service:http_requests:irate5m - expr: sum(pod:http_requests:irate5m) by (service) - - record: pod:http_latency:buckets5m - expr: sum(label_replace(rate(knative_dev_internal_serving_revision_app_request_latencies_bucket[5m]), "service", "$1", "pod_name", "(.*)-deployment-.+-.+")) by (pod_name,service,le) - - record: service:http_latency:buckets5m - expr: sum by (service,le)(pod:http_latency:buckets5m) / ignoring(le) group_left service:http_requests:irate5m ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: prometheus - namespace: metrics -spec: - selector: - matchLabels: - app: prometheus - replicas: 1 # Each replica will hold all data in memory. - template: - metadata: - labels: - app: prometheus - spec: - containers: - - name: prometheus - image: quay.io/prometheus/prometheus - args: - - --config.file=/etc/prometheus/config/prometheus.yaml - - --storage.tsdb.path=/prometheus - - --storage.tsdb.retention.time=24h - - --storage.tsdb.no-lockfile - - --web.console.templates=/etc/prometheus/consoles - - --web.console.libraries=/etc/prometheus/console_libraries - - --web.enable-admin-api - - --web.enable-lifecycle - - --web.route-prefix=/ - resources: - # This is a small sizing; adjust as needed for your environment. 
- requests: - memory: 200Mi - cpu: 50m - ports: - - name: ui - containerPort: 9090 - volumeMounts: - - name: config - mountPath: etc/prometheus/config - - name: prometheus-emptydir - mountPath: /prometheus - volumes: - - name: config - configMap: - name: prom-config - - name: prometheus-emptydir - emptyDir: {} ---- -apiVersion: v1 -kind: Service -metadata: - name: prometheus - namespace: metrics -spec: - selector: - app: prometheus - ports: - - name: ui - port: 9090 - targetPort: 9090 From 0cf580e34faf16fce1c41164a52b6062d3a83cdc Mon Sep 17 00:00:00 2001 From: upodroid Date: Wed, 20 Oct 2021 23:39:42 +0100 Subject: [PATCH 6/8] close yaml block --- docs/admin/collecting-metrics/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/admin/collecting-metrics/README.md b/docs/admin/collecting-metrics/README.md index f711b9ce5ee..b2081df82b3 100644 --- a/docs/admin/collecting-metrics/README.md +++ b/docs/admin/collecting-metrics/README.md @@ -35,6 +35,7 @@ aggregating timeseries metrics and alerting. It can also be used to scrape the O prometheusSpec: serviceMonitorSelectorNilUsesHelmValues: false podMonitorSelectorNilUsesHelmValues: false + ``` 1. Apply the ServiceMonitors/PodMonitors to cqollect metrics from Knative. 
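A minimal sketch of the `values.yaml` that the `helm install ... -f values.yaml` command in this series refers to. It assumes the file needs only the keys listed in the caution block. The nesting shown is an assumption, because the indentation is not recoverable from the patch text. Writing the file with a heredoc is one convenient option, not something the docs prescribe.

```bash
# Write the minimum Helm values from the caution block to values.yaml
# in the current directory (assumption: no other overrides are needed).
cat > values.yaml <<'EOF'
kube-state-metrics:
  metricLabelsAllowlist:
    - pods=[*]
    - deployments=[app.kubernetes.io/name,app.kubernetes.io/component,app.kubernetes.io/instance]
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
EOF

# The install command from the docs would then pick the file up:
#   helm install prometheus prometheus-community/kube-prometheus-stack -n default -f values.yaml
```

The heredoc delimiter is quoted so that the shell does not expand anything inside the YAML. The `helm install` line is left as a comment, so the snippet can be run without a cluster.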
From a6908a2296a52b248d3f84c64e492955ca0ef50a Mon Sep 17 00:00:00 2001 From: upodroid Date: Thu, 21 Oct 2021 12:45:15 +0100 Subject: [PATCH 7/8] readd deleted file --- .../collecting-metrics/collector.yaml | 96 +++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 docs/admin/observability/collecting-metrics/collector.yaml diff --git a/docs/admin/observability/collecting-metrics/collector.yaml b/docs/admin/observability/collecting-metrics/collector.yaml new file mode 100644 index 00000000000..43a499ac40b --- /dev/null +++ b/docs/admin/observability/collecting-metrics/collector.yaml @@ -0,0 +1,96 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: otel-collector-config + namespace: metrics +data: + collector.yaml: | + receivers: + opencensus: + endpoint: "0.0.0.0:55678" + + exporters: + logging: + prometheus: + endpoint: "0.0.0.0:8889" + extensions: + health_check: + pprof: + zpages: + service: + extensions: [health_check, pprof, zpages] + pipelines: + metrics: + receivers: [opencensus] + processors: [] + exporters: [prometheus] +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: otel-collector + namespace: metrics + labels: + app: otel-collector +spec: + selector: + matchLabels: + app: otel-collector + replicas: 1 # This can be increased for a larger system. + template: + metadata: + labels: + app: otel-collector + spec: + containers: + - name: collector + args: + - --config=/conf/collector.yaml + image: otel/opentelemetry-collector:latest + resources: + requests: # Note: these are suitable for a small instance, but may need to be increased for a large instance. 
+ memory: 100Mi + cpu: 50m + ports: + - name: otel + containerPort: 55678 + - name: prom-export + containerPort: 8889 + - name: zpages # A /debug page + containerPort: 55679 + volumeMounts: + - mountPath: /conf + name: config + volumes: + - name: config + configMap: + name: otel-collector-config + items: + - key: collector.yaml + path: collector.yaml +--- +apiVersion: v1 +kind: Service +metadata: + name: otel-collector + namespace: metrics +spec: + selector: + app: "otel-collector" + ports: + - port: 55678 + name: otel +--- +apiVersion: v1 +kind: Service +metadata: + name: otel-export + namespace: metrics + labels: + app: otel-export +spec: + selector: + app: otel-collector + ports: + - port: 8889 + name: prom-export From 89a9332f9ed154319857d3e935ec007a219f76f2 Mon Sep 17 00:00:00 2001 From: upodroid Date: Thu, 21 Oct 2021 12:53:16 +0100 Subject: [PATCH 8/8] fix errant typo --- .../observability/collecting-metrics/collecting-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/observability/collecting-metrics/collecting-metrics.md b/docs/admin/observability/collecting-metrics/collecting-metrics.md index b0eeeda0b69..62713827baa 100644 --- a/docs/admin/observability/collecting-metrics/collecting-metrics.md +++ b/docs/admin/observability/collecting-metrics/collecting-metrics.md @@ -37,7 +37,7 @@ aggregating timeseries metrics and alerting. It can also be used to scrape the O podMonitorSelectorNilUsesHelmValues: false ``` -1. Apply the ServiceMonitors/PodMonitors to cqollect metrics from Knative. +1. Apply the ServiceMonitors/PodMonitors to collect metrics from Knative. ```bash kubectl apply -f https://raw.githubusercontent.com/knative-sandbox/monitoring/main/servicemonitor.yaml