-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: metrics #2
base: main
Are you sure you want to change the base?
Conversation
@lucabello I branched off of your This PR mainly contributes this in charm.py |
…to feature/metrics
…to feature/metrics
…orter add methods
…to feature/metrics
src/charm.py
Outdated
"endpoint": self.remote_write.endpoints[0][ | ||
"url" | ||
], # TODO Fix this for scalability | ||
"tls": {"insecure": True}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should we plan to support toggling this for when the tls relation is added?
src/charm.py
Outdated
rule_file = Path(self.metrics_rules_paths.dest) / f"juju_{topology_identifier}.rules" | ||
rule_file.write_text(yaml.safe_dump(rule)) | ||
logger.debug(f"updated alert rules file {rule_file.as_posix()}") | ||
self.remote_write.reload_alerts() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything before this line doesn't really need self
and could be a free function. This could also help with not needing to keep remote_write
in self
.
src/charm.py
Outdated
@@ -76,6 +105,70 @@ def _pebble_checks(self) -> Dict[str, Any]: | |||
} | |||
return checks | |||
|
|||
def _prometheus_scrape(self, config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think _prometheus_scrape
and _prometheus_remote_write
should be in charm.py
; the whole point of having a config manager is to let it handle config stuff and keep it out of the charm
As a consequence, I would move updating alert rules outside this function, and call it directly in reconcile
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand config.py
is a "generic" config builder, and most of this method is the receiver config, but if it doesn't belong to either maybe we should think about a third place to put this in?
I think we only need to discuss this in a meeting for me to better understand the design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the method just handled adding to the config then I would agree but it is a bit more involved. I am happy to discuss in a meeting!
src=charm_root.joinpath(*self._rules_source_path.split("/")), | ||
dest=charm_root.joinpath(*self._rules_destination_path.split("/")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: self._rules_*_path
is used only once. Should we put it here as a string literal instead?
self, alert_rules_path=self.metrics_rules_paths.dest | ||
) | ||
|
||
self._reconcile() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to instantiate remote_write
and metrics_consumer
inside the reconciler (and without making them part of self
)?
@@ -8,6 +8,7 @@ | |||
class Ports(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is SimpleNamespace a better fit here?
tests/integration/test_metrics.py
Outdated
av_app_name = "avalanche-k8s" | ||
otel_app_name = "otel-collector-k8s" | ||
prom_app_name = "prometheus-k8s" | ||
await ops_test.model.deploy(av_app_name) | ||
await ops_test.model.deploy(charm, otel_app_name, resources=_charm_resources()) | ||
await ops_test.model.deploy(prom_app_name, trust=True) | ||
# WHEN they are related to scrape and remote-write | ||
await ops_test.model.integrate( | ||
f"{av_app_name}:metrics-endpoint", f"{otel_app_name}:metrics-endpoint" | ||
) | ||
await ops_test.model.integrate( | ||
f"{otel_app_name}:send-remote-write", f"{prom_app_name}:receive-remote-write" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace this with a literal bundle?
Some useful metrics here: |
This PR will enable the features:
The metrics from avalanche are consumed by otel-collector and remote-written to prometheus
data:image/s3,"s3://crabby-images/0786f/0786f6180383810d681fb56e885c801371b72faa" alt="image"
Further tests
juju refresh otelcol
to check that the alerts are refreshed without re-relationjuju refresh otelcol
to check that the existing alert rules are removed and not leftover