-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add edge observability #713
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.OpenSSF Scorecard
Scanned Files
|
6e19fe1
to
466eba7
Compare
.with_boundaries(vec![ | ||
1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 100.0, 200.0, 300.0, 400.0, 500.0, 750.0, | ||
1000.0, 1500.0, 2000.0, | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one used 0.0 as its first bucket, which since we're operating with time, we don't have negative numbers.
337e957
to
e064c77
Compare
server/src/http/instance_data.rs
Outdated
.unleash_client | ||
.send_instance_data(observed_data, &instance_data_sender.token) | ||
.await; | ||
match status { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal taste but I think if let Err
is a nice construct than an empty Ok match. You could replace this with
if let Err(e) = status {
match e {
EdgeError::EdgeMetricsRequestError(status, message) => {
warn!("Failed to post instance data with status {status} and {message:?}");
if status == StatusCode::NOT_FOUND {
debug!("Upstream edge metrics not found, clearing our data about downstream instances to avoid growing to infinity (and beyond!).");
empty = true;
do_the_work = false;
} else if status == StatusCode::FORBIDDEN {
warn!("Upstream edge metrics rejected our data, clearing our data about downstream instances to avoid growing to infinity (and beyond!)");
empty = true;
do_the_work = false;
}
}
_ => {
warn!("Failed to post instance data due to unknown error {e:?}");
empty = false;
}
}
}
server/src/http/instance_data.rs
Outdated
loop { | ||
let mut empty = true; | ||
tokio::time::sleep(std::time::Duration::from_secs(60)).await; | ||
if let Some(instance_data_sender) = instance_data_sender.clone() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function is pretty nested. Again personal opinion, but if we never stored the data in if we couldn't send it, then we could use
let Some(instance_data_sender) = instance_data_sender.clone() else {
continue;
};
to flatten this out a little
} | ||
} | ||
|
||
pub async fn send_instance_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give this a different name? We now have two functions called the same thing doing different things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
Ok(()) | ||
} else { | ||
match result.status() { | ||
StatusCode::BAD_REQUEST => Err(EdgeMetricsRequestError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this specific check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because copy-pasta from earlier request. I still like it though, because it exposes full message if we receive 400 from upstream, so we know what we need to fix.
server/src/metrics/edge_metrics.rs
Outdated
pub p99: f64, | ||
} | ||
|
||
impl LatencyMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default is already implemented for this struct, this shouldn't be needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
server/src/metrics/edge_metrics.rs
Outdated
started: Utc::now(), | ||
traffic: InstanceTraffic::default(), | ||
latency_upstream: UpstreamLatency::default(), | ||
connected_edges: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not married to it but I'm so used to seeing the macro invocation over explicit new here
connected_edges: Vec::new(), | |
connected_edges: vec![], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, fixed
f3060e5
to
c802591
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Draft state: get latency for own endpoints from prometheus get latency for upstream endpoints from prometheus get process stats from prometheus instantiate on startup
9b8636c
to
1f2778d
Compare
No description provided.