Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: metrics #2

Open
wants to merge 45 commits into
base: main
Choose a base branch
from
Open

feat: metrics #2

wants to merge 45 commits into from

Conversation

MichaelThamm
Copy link
Contributor

@MichaelThamm MichaelThamm commented Feb 6, 2025

This PR will enable the features:

bundle: kubernetes
applications:
  av:
    charm: avalanche-k8s
    channel: latest/stable
    revision: 42
    base: [email protected]/stable
    resources:
      avalanche-image: 18
    scale: 1
    trust: true
  otel:
    charm: ./[email protected]
    resources:
      opentelemetry-collector-image: ghcr.io/canonical/opentelemetry-collector:dev
    scale: 1
    trust: true
  prom:
    charm: prometheus-k8s
    channel: latest/stable
    revision: 226
    base: [email protected]/stable
    resources:
      prometheus-image: 151
    scale: 1
    trust: true
relations:
- - av:metrics-endpoint
  - otel:metrics-endpoint
- - prom:receive-remote-write
  - otel:send-remote-write

The metrics from avalanche are consumed by otel-collector and remote-written to prometheus
image

Further tests

  • Relate otelcol to multiple prometheus units for remote-writing
  • juju refresh otelcol to check that the alerts are refreshed without re-relation
  • juju refresh otelcol to check that the existing alert rules are removed and not leftover

@MichaelThamm
Copy link
Contributor Author

MichaelThamm commented Feb 6, 2025

@lucabello I branched off of your feature/first-implementation branch so I can pull in your changes later. I noticed there were some changes regarding the config builder that you did not push. I am pinging you to show you my intentions and also as a request for an initial review to agree on direction.

This PR mainly contributes this in charm.py

@MichaelThamm MichaelThamm marked this pull request as ready for review February 13, 2025 21:34
@sed-i sed-i changed the base branch from main to feature/first-implementation February 13, 2025 22:31
@sed-i sed-i requested a review from a team as a code owner February 13, 2025 22:31
src/charm.py Outdated
"endpoint": self.remote_write.endpoints[0][
"url"
], # TODO Fix this for scalability
"tls": {"insecure": True},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we plan to support toggling this for when the tls relation is added?

src/charm.py Outdated
rule_file = Path(self.metrics_rules_paths.dest) / f"juju_{topology_identifier}.rules"
rule_file.write_text(yaml.safe_dump(rule))
logger.debug(f"updated alert rules file {rule_file.as_posix()}")
self.remote_write.reload_alerts()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything before this line doesn't really need self and could be a free function. This could also help with not needing to keep remote_write in self.

src/charm.py Outdated
@@ -76,6 +105,70 @@ def _pebble_checks(self) -> Dict[str, Any]:
}
return checks

def _prometheus_scrape(self, config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think _prometheus_scrape and _prometheus_remote_write should be in charm.py; the whole point of having a config manager is to let it handle config stuff and keep it out of the charm

As a consequence, I would move updating alert rules outside this function, and call it directly in reconcile instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand config.py is a "generic" config builder, and most of this method is the receiver config, but if it doesn't belong to either maybe we should think about a third place to put this in?

I think we only need to discuss this in a meeting for me to better understand the design

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the method just handled adding to the config then I would agree but it is a bit more involved. I am happy to discuss in a meeting!

@MichaelThamm MichaelThamm changed the base branch from feature/first-implementation to main February 19, 2025 15:36
Comment on lines +57 to +58
src=charm_root.joinpath(*self._rules_source_path.split("/")),
dest=charm_root.joinpath(*self._rules_destination_path.split("/")),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: self._rules_*_path is used only once. Should we put it here as a string literal instead?

self, alert_rules_path=self.metrics_rules_paths.dest
)

self._reconcile()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to instantiate remote_write and metrics_consumer inside the reconciler (and without making them part of self)?

@@ -8,6 +8,7 @@
class Ports(Enum):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SimpleNamespace a better fit here?

Comment on lines 45 to 57
av_app_name = "avalanche-k8s"
otel_app_name = "otel-collector-k8s"
prom_app_name = "prometheus-k8s"
await ops_test.model.deploy(av_app_name)
await ops_test.model.deploy(charm, otel_app_name, resources=_charm_resources())
await ops_test.model.deploy(prom_app_name, trust=True)
# WHEN they are related to scrape and remote-write
await ops_test.model.integrate(
f"{av_app_name}:metrics-endpoint", f"{otel_app_name}:metrics-endpoint"
)
await ops_test.model.integrate(
f"{otel_app_name}:send-remote-write", f"{prom_app_name}:receive-remote-write"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace this with a literal bundle?

@MichaelThamm
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants