Dan/top infra errors #3195

Merged
merged 6 commits into from
Mar 27, 2020

@danielsdeleo danielsdeleo commented Mar 24, 2020

🔩 Description: What code changed, and why?

Adds an API for the top Chef Infra errors. It aggregates the most common errors, grouped by class and message, among the most recent Chef runs across the fleet. The nodes considered in the aggregation can be filtered with the same filters as the other node APIs, except status: that filter is not allowed because the aggregation only considers nodes with "failed" status.
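
For reviewers who want a mental model of the result, here is a minimal Python sketch of the grouping described above. It is not the implementation in this PR (which runs as an Elasticsearch aggregation); the function name `top_errors`, the input shape, and the output keys are assumptions made for illustration. The size handling follows the comment on the `Errors` message: zero means the default of 10, a negative value means unlimited.

```python
from collections import Counter

DEFAULT_SIZE = 10  # default stated in the Errors message comment


def top_errors(failed_runs, size=0):
    """Count failed runs per (error class, error message), most common first.

    failed_runs: iterable of dicts shaped like
        {"error": {"class": ..., "message": ...}}
    (hypothetical shape: one entry per node's most recent failed run).
    size: 0 uses the default of 10; a negative value returns every group.
    """
    counts = Counter(
        (run["error"]["class"], run["error"]["message"]) for run in failed_runs
    )
    # Highest count first; ties broken by class then message so output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if size == 0:
        size = DEFAULT_SIZE
    if size > 0:
        ranked = ranked[:size]
    # Output keys are illustrative, not the API's actual field names.
    return [{"class": c, "message": m, "count": n} for (c, m), n in ranked]


runs = [
    {"error": {"class": "Chef::ExampleError0", "message": "Error 1 occurred"}},
    {"error": {"class": "Chef::ExampleError0", "message": "Error 1 occurred"}},
    {"error": {"class": "Chef::ExampleError1", "message": "Error 1 occurred"}},
]
print(top_errors(runs, size=1))
```

Note that the two runs with the same message but different classes land in separate groups, since the grouping key is the class and message together.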

⛓️ Related Resources

👍 Definition of Done

👟 How to Build and Test the Change

Generating usable test data is easiest with the changes in #3213. The instructions here assume that PR is merged, or that you have made a combined branch or picked up those changes some other way.

  • Restart your studio, or at least source .studio/automate-gateway; this is needed for the data generation step.
  • Build the automate-gateway, ingest, and config-management services, then start all services.
  • You need sample data with different error classes and messages. The following script generates usable data in a randomized way:
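# Send 500 failed runs with randomized error classes and messages.
for _ in {1..500} ; do
  m=$((RANDOM % 4))   # error class index, 0-3
  n=$((RANDOM % 25))  # error message index, 0-24
  generate_chef_run_failure_example | \
    jq --arg msg "Error $n occurred" --arg type "Chef::ExampleError$m" '.error.message = $msg | .error.class = $type' | \
    send_chef_data_raw
done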
  • You can then call the API with: curl -f --insecure -H "api-token: $(get_admin_token)" "$GATEWAY_URL/cfgmgmt/errors" | jq .
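
To get a rough idea of what the generated data looks like before calling the API, the loop above can be mimicked offline. This is only a sketch: `simulate_generated_errors` is a hypothetical helper, and it assumes every generated run is counted by the aggregation, which depends on how the studio helpers assign nodes. The one firm takeaway is that the script can produce at most 4 × 25 = 100 distinct class and message pairs.

```python
import random
from collections import Counter


def simulate_generated_errors(runs=500, classes=4, messages=25, seed=None):
    """Mimic the shell loop: each run draws a class index and a message index."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        m = rng.randrange(classes)   # plays the role of RANDOM % 4
        n = rng.randrange(messages)  # plays the role of RANDOM % 25
        counts[(f"Chef::ExampleError{m}", f"Error {n} occurred")] += 1
    return counts


counts = simulate_generated_errors(seed=1)
print(len(counts), "distinct class+message pairs")
for (error_class, message), count in counts.most_common(10):
    print(count, error_class, message)
```

With 500 runs spread over at most 100 pairs, each pair averages about five occurrences, so the top entries should have small, similar counts.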

✅ Checklist

@danielsdeleo danielsdeleo force-pushed the dan/top-infra-errors branch 4 times, most recently from d4f5e25 to c03071f on March 26, 2020 18:43
message Errors {
// The number of results to return.
// If set to zero, the default size of 10 will be used. Set to a negative
// value for unlimited results.

⭐️

// },
// "aggs": {
// "group_by_error_type_and_message": {
// "composite": {

THANK YOU for all the descriptive comments, this is awesome!

@vjeffrey vjeffrey left a comment

⭐️ thank you!!

@lancewf lancewf left a comment

I manually tested this and got through most of the code.

3 participants