Set up alerting for connector builds #4736
Labels
area/connectors
Connector related issues
autoteam
priority/high
High priority
team/extensibility
type/enhancement
New feature or request
Tell us about the problem you're trying to solve
#4720 is a perfect illustration of what this issue is trying to solve: periodic connector builds were not running reliably for most connectors (but were still running via the /test and /publish commands). This resulted in a few obvious issues:

The crucial thing we need to solve here is that a build pipeline for a connector should never be blocked for a prolonged period, because that prevents us from rolling out critical fixes in a timely manner. It also wastes dev time and allows tech debt in connector builds to accumulate.
Describe the solution you’d like
I want to receive an alert proactively if a connector build is failing or missing repeatedly.
Alerting Triggers
We should have two levels of alerting on our connector builds:
Alerting format
Ideally, the alerts take the form of a report. In the future, we'll probably want to auto-create tickets, but we need to iterate on this report and fine-tune it first, before spamming GitHub issues.
Cadence: I suggest we report on a fixed cadence, e.g. every 12 or 24 hours. This is probably the most actionable interval on which we can report. Anything more frequent will be too noisy and we'll stop listening to the alert.
Delivery mechanism: Slack might be the easiest place to receive this to ensure visibility for the whole team. If we go with Slack, we should create a dedicated channel.
Report format
It should highlight:
Posting this as a vanilla Slack message is fine. It doesn't need to have fancy formatting.
Implementation
This can be implemented as a periodic GitHub action that runs once or twice per day. The GitHub action can probably run a Python script which performs this monitoring as needed and sends the alerts.
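A minimal sketch of what such a script could look like, to make the proposal concrete. Everything here is an assumption for illustration rather than a settled design: the `Build` record, the thresholds (`MAX_CONSECUTIVE_FAILURES`, `MAX_BUILD_AGE`), and the function names are hypothetical, and how build history is actually fetched is left out. The only external behavior relied on is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import json
import urllib.request

# Hypothetical thresholds; this issue does not specify them.
MAX_CONSECUTIVE_FAILURES = 3
MAX_BUILD_AGE = timedelta(hours=24)


@dataclass
class Build:
    """One finished build of a connector (hypothetical record shape)."""
    connector: str
    finished_at: datetime
    succeeded: bool


def find_alerts(builds, connectors, now):
    """Return {connector: reason} for connectors whose builds are failing or missing."""
    alerts = {}
    for name in connectors:
        # Newest build first.
        history = sorted(
            (b for b in builds if b.connector == name),
            key=lambda b: b.finished_at,
            reverse=True,
        )
        # Missing: no build at all, or the newest build is too old.
        if not history or now - history[0].finished_at > MAX_BUILD_AGE:
            alerts[name] = "missing: no recent build"
            continue
        # Failing: the most recent N builds all failed.
        recent = history[:MAX_CONSECUTIVE_FAILURES]
        if len(recent) == MAX_CONSECUTIVE_FAILURES and not any(b.succeeded for b in recent):
            alerts[name] = f"failing: last {MAX_CONSECUTIVE_FAILURES} builds failed"
    return alerts


def format_report(alerts):
    """Render the alerts as a plain-text message, one connector per line."""
    if not alerts:
        return "Connector build report: all builds healthy."
    lines = ["Connector build report:"]
    lines += [f"- {name}: {reason}" for name, reason in sorted(alerts.items())]
    return "\n".join(lines)


def post_to_slack(webhook_url, text):
    """Send the report to a Slack incoming webhook (URL supplied by the caller)."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A scheduled workflow would call `find_alerts` with the current UTC time, pass the result to `format_report`, and hand the text to `post_to_slack` using a webhook URL stored as a repository secret. Keeping detection and formatting separate from delivery makes the report easy to iterate on before any ticket auto-creation is added.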
Acceptance Criteria
The report mentioned above is sent to a dedicated Slack channel once every 24 hours.