Setup alerting for connector builds #4736

Closed
sherifnada opened this issue Jul 13, 2021 · 2 comments
Labels: area/connectors (Connector related issues), autoteam, priority/high (High priority), team/extensibility, type/enhancement (New feature or request)

Comments

@sherifnada
Contributor

Tell us about the problem you're trying to solve

#4720 is a perfect illustration of what this issue is trying to solve: periodic connector builds were not running reliably for most connectors (but were still running via the /test and /publish commands). This resulted in a few obvious issues:

  1. One PR changed the destination acceptance tests and caused the Snowflake build to fail (no bugs introduced, just a failing build). Two weeks later, when we needed to release a fix for Snowflake, the one-line change was delayed by a few hours just to understand and debug the underlying issue. If this had been a critical bug, it would have been a nightmare.
  2. Another PR made a backwards-incompatible change to a private method in the Python CDK. It turned out that this private method was used by a connector (square), so that connector's build failed. It was only fixed when a developer happened to notice it.

The crucial thing we need to solve here is that a connector's build pipeline should never be blocked for a prolonged amount of time, because a blocked pipeline prevents us from rolling out critical fixes in a timely manner. It also wastes dev time and allows tech debt in connector builds to accumulate.

Describe the solution you’d like

I want to receive an alert proactively if a connector build is failing or missing repeatedly.

Alerting Triggers

We should have two levels of alerting on our connector builds:

  1. If a connector's build has not run in more than 2 days, we should be alerted
  2. If a connector's build has been failing for more than 2 days, we should be alerted
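
A minimal sketch of how these two triggers could be evaluated for one connector. The `BuildRun` type and the `classify` function are hypothetical names, not part of any existing tooling, and the sketch assumes that "failing for more than 2 days" means the current unbroken streak of failures started more than 2 days ago.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Threshold taken from the two triggers above.
THRESHOLD = timedelta(days=2)


@dataclass
class BuildRun:
    """One periodic build of a connector (hypothetical record type)."""
    finished_at: datetime
    succeeded: bool


def classify(runs: List[BuildRun], now: datetime,
             threshold: timedelta = THRESHOLD) -> Optional[str]:
    """Return "missing", "failing", or None if no alert is needed."""
    if not runs:
        return "missing"
    ordered = sorted(runs, key=lambda r: r.finished_at)
    latest = ordered[-1]
    # Trigger 1: no build has run within the threshold.
    if now - latest.finished_at > threshold:
        return "missing"
    if latest.succeeded:
        return None
    # Trigger 2: walk back to the start of the current failure streak.
    streak_start = latest.finished_at
    for run in reversed(ordered):
        if run.succeeded:
            break
        streak_start = run.finished_at
    if now - streak_start > threshold:
        return "failing"
    return None


if __name__ == "__main__":
    now = datetime(2021, 7, 13, tzinfo=timezone.utc)
    runs = [
        BuildRun(now - timedelta(days=3), succeeded=False),
        BuildRun(now - timedelta(hours=1), succeeded=False),
    ]
    print(classify(runs, now))
```

A single recent failure does not trigger an alert under this reading, which keeps the report from flagging one-off flaky runs.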

Alerting format

Ideally, the alerts take the form of a report. In the future, we'll probably want to auto-create tickets, but we need to iterate on this report first to fine-tune it before spamming GitHub issues.

Cadence: I suggest we report on a fixed cadence, e.g. every 12 or 24 hours. This is probably the most actionable interval on which we can report. Anything more frequent will be too noisy and we'll stop listening to the alert.

Delivery mechanism: Slack might be the easiest place to receive this to ensure visibility for the whole team. If we go with Slack, we should create a dedicated channel.

Report format
It should highlight:

  • which connector builds have been failing or missing
  • for how long they've been failing
  • a link to the build page for each failing connector

Posting this as a vanilla Slack message is fine. It doesn't need to have fancy formatting.
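
As a rough illustration of such a plain-text report, here is a sketch that renders one line per affected connector. `ConnectorAlert` and `format_report` are hypothetical names, the exact wording of each line is an assumption, and the URLs in the usage example are placeholders.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class ConnectorAlert:
    """One row of the report (hypothetical record type)."""
    connector: str
    status: str          # "failing" or "missing"
    duration: timedelta  # how long the build has been failing or missing
    build_url: str       # link to the connector's build page


def format_report(alerts: List[ConnectorAlert]) -> str:
    """Render the alerts as a plain-text message, longest-standing first."""
    if not alerts:
        return "All connector builds are healthy."
    lines = ["Connector build report:"]
    for alert in sorted(alerts, key=lambda a: a.duration, reverse=True):
        days, remainder = divmod(int(alert.duration.total_seconds()), 86400)
        hours = remainder // 3600
        lines.append(
            f"- {alert.connector}: {alert.status} for {days}d {hours}h "
            f"({alert.build_url})"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report([
        ConnectorAlert("destination-snowflake", "failing",
                       timedelta(days=3, hours=5),
                       "https://example.com/builds/destination-snowflake"),
        ConnectorAlert("source-square", "missing",
                       timedelta(days=2, hours=1),
                       "https://example.com/builds/source-square"),
    ]))
```

Sorting by duration puts the longest-blocked pipelines at the top of the message, where they are most likely to be read.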

Implementation

This can be implemented as a periodic GitHub action that runs once or twice per day. The GitHub action can probably run a Python script that performs this monitoring and sends the alerts.
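
One possible shape for the delivery step of such a script, assuming the report is posted through a Slack incoming webhook (which accepts a JSON body with a `text` field). The function names and the `SLACK_WEBHOOK_URL` environment variable are assumptions made for this sketch; in a GitHub action the URL would presumably come from a repository secret.

```python
import json
import os
import urllib.request
from typing import Optional


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_report(text: str, webhook_url: Optional[str] = None) -> int:
    """Post the report and return the HTTP status code."""
    # SLACK_WEBHOOK_URL is a hypothetical variable name for this sketch.
    url = webhook_url or os.environ["SLACK_WEBHOOK_URL"]
    request = build_slack_request(url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Only send when a webhook is configured, so the script is safe to run locally.
    if "SLACK_WEBHOOK_URL" in os.environ:
        send_report("Connector build report: nothing to report.")
```

Separating request construction from sending keeps the payload easy to check without network access.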

Acceptance Criteria

The report described above is sent to a dedicated Slack channel once every 24 hours.

@sherifnada sherifnada added type/enhancement New feature or request area/connectors Connector related issues priority/high High priority labels Jul 13, 2021
@sherifnada
Contributor Author

@midavadim should we close this issue?

@girarda
Contributor

girarda commented Dec 22, 2022

closing because we have daily reports

@girarda girarda closed this as completed Dec 22, 2022

4 participants