AlertNotifications: Translate notifications IDs to UIDs in Rule builder #19882

Merged
merged 13 commits into from
Mar 18, 2020

Conversation

aSapien
Copy link
Contributor

@aSapien aSapien commented Oct 17, 2019

What this PR does / why we need it:
Fixes alert notifications for dashboards that were saved with notification IDs

Which issue(s) this PR fixes:
Fixes #19771

Special notes for your reviewer:
I wasn't sure if rule.go is the correct place for this fix, but it seemed the easiest to put there. Please let me know if there's a better approach to fixing this.

Bug trace:
Currently, IDs are parsed, converted to strings and included in the same slice (of UIDs):

if id, err := jsonModel.Get("id").Int64(); err == nil {
    model.Notifications = append(model.Notifications, fmt.Sprintf("%09d", id))
} else {
    uid, err := jsonModel.Get("uid").String()
    if err != nil {
        return nil, ValidationError{Reason: "Neither id nor uid is specified in 'notifications' block, " + err.Error(), DashboardID: model.DashboardID, AlertID: model.ID, PanelID: model.PanelID}
    }
    model.Notifications = append(model.Notifications, uid)
}

At a later stage they are passed to:

func (n *notificationService) getNeededNotifiers(orgID int64, notificationUids []string, evalContext *EvalContext) (notifierStateSlice, error) {
    query := &models.GetAlertNotificationsWithUidToSendQuery{OrgId: orgID, Uids: notificationUids}

This in turn triggers a query. However, the SQL query expects UIDs and does not account for "stringified" IDs:

sql.WriteString(` OR alert_notification.uid IN (?` + strings.Repeat(",?", len(query.Uids)-1) + ")")

As a result, notification entities that are referenced by ID are not returned by the query and are therefore never triggered.
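
A minimal sketch of the translation idea, not the code in this PR: each entry of the 'notifications' block is resolved to a UID before it reaches the UID-only query. The names `notificationRef`, `uidLookup`, `newMapLookup` and `translateToUIDs` are hypothetical, and the map stands in for the alert_notification table.

```go
package main

import (
	"errors"
	"fmt"
)

// notificationRef is a hypothetical mirror of one entry in a dashboard's
// "notifications" block: legacy entries carry an ID, newer ones a UID.
type notificationRef struct {
	ID  int64
	UID string
}

// uidLookup resolves a notification ID to its UID (hypothetical stand-in
// for a database query).
type uidLookup func(orgID, id int64) (string, error)

// newMapLookup builds a lookup backed by an in-memory map, for illustration.
func newMapLookup(table map[int64]string) uidLookup {
	return func(orgID, id int64) (string, error) {
		uid, ok := table[id]
		if !ok {
			return "", fmt.Errorf("no notification with id %d in org %d", id, orgID)
		}
		return uid, nil
	}
}

// translateToUIDs returns one UID per ref. As in the snippet above, an ID
// takes precedence over a UID; here the ID is translated, not stringified.
func translateToUIDs(orgID int64, refs []notificationRef, lookup uidLookup) ([]string, error) {
	uids := make([]string, 0, len(refs))
	for _, ref := range refs {
		switch {
		case ref.ID != 0:
			uid, err := lookup(orgID, ref.ID)
			if err != nil {
				return nil, fmt.Errorf("translating notification id %d: %w", ref.ID, err)
			}
			uids = append(uids, uid)
		case ref.UID != "":
			uids = append(uids, ref.UID)
		default:
			return nil, errors.New("neither id nor uid is specified in 'notifications' block")
		}
	}
	return uids, nil
}

func main() {
	// What the current code stores for id 1: a zero-padded string that
	// matches no UID.
	fmt.Printf("%09d\n", int64(1)) // prints 000000001

	lookup := newMapLookup(map[int64]string{1: "slack-ops"})
	uids, err := translateToUIDs(1, []notificationRef{{ID: 1}, {UID: "pagerduty"}}, lookup)
	fmt.Println(uids, err) // prints [slack-ops pagerduty] <nil>
}
```

Failing on an unknown ID, rather than silently skipping it, is a design choice of this sketch; the real fix may handle that case differently.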

@CLAassistant
Copy link

CLAassistant commented Oct 17, 2019

CLA assistant check
All committers have signed the CLA.

@aSapien
Copy link
Contributor Author

aSapien commented Oct 17, 2019

@torkelo

@torkelo
Copy link
Member

torkelo commented Oct 22, 2019

How are you getting alert rules with notification IDs? New rules should use UIDs. Only legacy rules should still use IDs, and we convert those to UIDs the same way we migrated them in the DB.

@aSapien
Copy link
Contributor Author

aSapien commented Oct 22, 2019

@torkelo We have a system that generates dashboards with ID-based notification channels.

@aSapien
Copy link
Contributor Author

aSapien commented Oct 22, 2019

Is there a migration for dashboards that point to notifications by the old IDs? I'm only seeing the migration of the alert_notification table:

mg.AddMigration("Add column uid in alert_notification", NewAddColumnMigration(alert_notification, &Column{
    Name: "uid", Type: DB_NVarchar, Length: 40, Nullable: true,
}))
mg.AddMigration("Update uid column values in alert_notification", new(RawSqlMigration).

@papagian papagian added the pr/external This PR is from external contributor label Oct 31, 2019
@stale
Copy link

stale bot commented Dec 12, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 2 weeks. It will be closed in 30 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale Issue with no recent activity label Dec 12, 2019
@Tim-Schwalbe
Copy link

Tim-Schwalbe commented Jan 6, 2020

Are there any plans to solve this issue in the code base? I think a lot of people will run into this problem.
I updated from 6.0.2 and luckily noticed the issue, but it could be a huge trap for others.

I am also not sure how to solve the issue for all dashboards. Can someone point out which ID and UID need to be used for the migration?

@stale stale bot removed the stale Issue with no recent activity label Jan 6, 2020
@stale
Copy link

stale bot commented Jan 20, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 2 weeks. It will be closed in 30 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale Issue with no recent activity label Jan 20, 2020
@marefr marefr requested a review from papagian January 29, 2020 14:30
@stale stale bot removed the stale Issue with no recent activity label Jan 29, 2020
Copy link
Contributor

@papagian papagian left a comment

Thank you for your contribution!
Please check my comments for some suggested improvements.

@aSapien
Copy link
Contributor Author

aSapien commented Feb 6, 2020

@papagian thank you for the comments. I'll try to complete the needed changes soon.

@aSapien aSapien requested a review from papagian February 9, 2020 14:24
Copy link
Contributor

@papagian papagian left a comment

Great! Thanks for the modifications.
I have a minor comment.

@aSapien aSapien requested a review from papagian February 10, 2020 14:50
Copy link
Contributor

@papagian papagian left a comment

Great!
I have a minor request before merging, please check my inline comment!

Additionally, it would be nice to do some UID caching.
I would therefore recommend changing GetAlertNotifications to save its result in the cache and to check the cache before running the actual query. Here you can find an example of using the cache.
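
The caching suggestion above can be sketched roughly as follows. This is an illustration using only the standard library, not Grafana's cache service: `uidCache`, `newUIDCache` and `UIDByID` are hypothetical names, and the `fetch` callback stands in for the actual query. The expiry time bounds how long a changed value can stay stale.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheEntry holds a translated UID and the moment it stops being valid.
type cacheEntry struct {
	uid     string
	expires time.Time
}

// uidCache memoizes id -> uid translations per organisation (hypothetical).
type uidCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
	fetch   func(orgID, id int64) (string, error)
}

func newUIDCache(ttl time.Duration, fetch func(orgID, id int64) (string, error)) *uidCache {
	return &uidCache{ttl: ttl, entries: map[string]cacheEntry{}, fetch: fetch}
}

// UIDByID returns the cached UID when a fresh entry exists and otherwise
// calls fetch and stores the result. The lock is held during fetch, which
// keeps the sketch simple at the cost of serialising lookups.
func (c *uidCache) UIDByID(orgID, id int64) (string, error) {
	key := fmt.Sprintf("%d:%d", orgID, id)

	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.entries[key]; ok && time.Now().Before(e.expires) {
		return e.uid, nil
	}
	uid, err := c.fetch(orgID, id)
	if err != nil {
		return "", err // errors are not cached
	}
	c.entries[key] = cacheEntry{uid: uid, expires: time.Now().Add(c.ttl)}
	return uid, nil
}

func main() {
	calls := 0
	cache := newUIDCache(5*time.Minute, func(orgID, id int64) (string, error) {
		calls++ // counts how often the backing query would run
		return fmt.Sprintf("uid-%d", id), nil
	})

	for i := 0; i < 3; i++ {
		uid, _ := cache.UIDByID(1, 42)
		fmt.Println(uid) // prints uid-42
	}
	fmt.Println("fetch calls:", calls) // prints fetch calls: 1
}
```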

@aSapien
Copy link
Contributor Author

aSapien commented Feb 13, 2020

@papagian are you referring to caching the id->uid translation result? This makes perfect sense.

@papagian
Copy link
Contributor

> @papagian are you referring to caching the id->uid translation result? This makes perfect sense.

yes! sorry for not being clear

@aSapien
Copy link
Contributor Author

aSapien commented Feb 13, 2020

Great! I was concerned about caching values that might change at runtime 😅

@aSapien
Copy link
Contributor Author

aSapien commented Feb 21, 2020

@papagian I'm not sure why mysql-integration-test failed but it looks like a timeout issue. Is there a way to run it locally? I can't re-run from CircleCI

@papagian
Copy link
Contributor

@aSapien
It seems to be a timeout. I have rerun the CI workflow and hopefully it will be OK.
For your information, to run the MySQL tests locally you can try:
./scripts/circle-test-mysql.sh
which sets the environment variable GRAFANA_TEST_DB=mysql and additionally uses the integration build tag to run the integration tests.

@aSapien
Copy link
Contributor Author

aSapien commented Feb 24, 2020

@papagian Yep, it's fine now, but failing on grafana-docker-ubuntu-pr which has nothing to do with this PR. Please review :)

@papagian
Copy link
Contributor

papagian commented Feb 24, 2020

@aSapien
FYI, there is a fix for grafana-docker-ubuntu-pr in the master branch, so to get past the failure you need to sync your branch with master.
But I will review it anyway tomorrow.

Copy link
Contributor

@papagian papagian left a comment

Great work! Thank you for the modifications.
I have some suggestions for improving the tests.
Let me know if you want to work on them or you would prefer me to do it.

@@ -321,5 +321,54 @@ func TestAlertNotificationSQLAccess(t *testing.T) {
            So(len(query.Result), ShouldEqual, 4)
        })
    })

    Convey("Notification Uid by Id Caching", func() {
        ss.CacheService.Flush()
Copy link
Contributor

I would prefer mocking the actual cache (and reverting after execution) instead of flushing:

actualCache := ss.CacheService
ss.CacheService = localcache.New(5*time.Minute, 10*time.Minute)
defer func() {
    ss.CacheService = actualCache
}()
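
The swap-and-restore pattern suggested above can be sketched in isolation like this. It is a simplified illustration: the `store` type and `withFreshCache` helper are hypothetical, and a plain map stands in for the cache service.

```go
package main

import "fmt"

// store is a hypothetical stand-in for the SQL store that owns a cache.
type store struct {
	cache map[string]string
}

// withFreshCache runs fn against an empty cache and restores the original
// cache afterwards, even if fn panics, because the restore is deferred.
func withFreshCache(s *store, fn func()) {
	actual := s.cache
	s.cache = map[string]string{}
	defer func() {
		s.cache = actual
	}()
	fn()
}

func main() {
	s := &store{cache: map[string]string{"existing": "value"}}

	withFreshCache(s, func() {
		fmt.Println(len(s.cache)) // prints 0
		s.cache["temporary"] = "only visible inside fn"
	})

	fmt.Println(len(s.cache), s.cache["existing"]) // prints 1 value
}
```

Compared with flushing, this leaves entries written by other tests untouched once the test finishes.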

Copy link
Contributor Author

@papagian I saw that some tests initialize a separate TestDb for isolated test scopes. Perhaps I should use this technique instead? I like it because it's less verbose.

Convey("Should enable all users", func() {
    ss = InitTestDB(t)
    createFiveTestUsers(func(i int) *models.CreateUserCommand {

Copy link
Contributor Author

Then there's not even a need to flush :)

Copy link
Contributor

yes, you can do that instead

Copy link
Contributor

Not that important but my last impression was that you would use a separate test database for these tests and you would remove flushing.

@aSapien
Copy link
Contributor Author

aSapien commented Feb 27, 2020

@papagian thanks for the comments! I'll do the modifications. They help me learn idiomatic Golang :)

@aSapien aSapien requested a review from a team as a code owner March 11, 2020 11:40
Copy link
Contributor

@papagian papagian left a comment

The modifications look good to me; just check my last comment.

@aSapien
Copy link
Contributor Author

aSapien commented Mar 15, 2020

@papagian Thanks for pointing it out! It must've slipped my mind somehow... I made the necessary change

@papagian papagian added this to the 7.0 milestone Mar 17, 2020
@aknuds1 aknuds1 closed this Mar 17, 2020
@aknuds1 aknuds1 reopened this Mar 17, 2020
@papagian
Copy link
Contributor

Thank you for contributing!

@papagian
Copy link
Contributor

@aSapien could you synchronise your branch with master so that the new build-pipeline workflow runs and we are able to merge it?
I can do it myself if you prefer, but I wasn't sure since the PR is targeting your repository's master.

@aSapien
Copy link
Contributor Author

aSapien commented Mar 18, 2020

@papagian Yeah, working on a fork's master was a poor choice. I synced the branches :)

@papagian papagian merged commit 44b7f3e into grafana:master Mar 18, 2020
bergquist added a commit that referenced this pull request Mar 18, 2020
* master: (113 commits)
  AlertNotifications: Translate notifications IDs to UIDs in Rule builder (#19882)
  CircleCI: Don't build Storybook on PR (#22865)
  Graphite: Rollup Indicator (#22738)
  Plugins: Return jsondetails as an json object instead of raw json on datasource healthchecks. (#22859)
  Backend plugins: Exclude plugin metrics in Grafana's metrics endpoint (#22857)
  Graphite: Fixed issue with query editor and next select metric now showing after selecting metric node  (#22856)
  Webpack: Fix webpack for enterprise (#22863)
  Metrics: Storybook documented components  (#22854)
  Search: Improve tags layout , #22804 (#22830)
  Stackdriver: Fix GCE auth bug when creating new data source (#22836)
  @grafana/runtime: Add cancellation of queries to DataSourceWithBackend (#22818)
  Rich history: Test coverage (#22852)
  Chore: Support Volta in package.json (#22849)
  CircleCI: Skip enterprise builds for forked PRs (#22851)
  Toolkit: docker image for plugin CI process (#22790)
  Revert "Explore: Add test coverage for Rich history (#22722)" (#22850)
  Datasource config was not mapped for datasource healthcheck (#22848)
  upgrades plugin sdk to 0.30.0 (#22846)
  Explore: Add test coverage for Rich history (#22722)
  Rich History: UX adjustments and fixes (#22729)
  ...
bergquist added a commit to AndrewBurian/grafana that referenced this pull request Mar 19, 2020
* master: (98 commits)
  NewPanelEdit: Refactor value mappings UI to work better with new panel edit (grafana#22808)
  FieldOverrides: Move FieldConfigSource from fieldOptions to PanelModel.fieldConfig (grafana#22600)
  Reporting: Update docs with correct logger name (grafana#22892)
  Design tweaks (grafana#22886)
  Remove duplicated localStorage mock (grafana#22872)
  Rich history UX fixes (grafana#22783)
  Storybook: Solve deployment issues (grafana#22873)
  Docs: Update templating.md (grafana#22881)
  AzureMonitor: support workspaces function for template variables (grafana#22882)
  Check if the datasource is of type loki using meta.id instead of name. (grafana#22877)
  Prometheus: Render missing labels in legend formats as an empty string (grafana#22355)
  BarGauge: Fix strict null that breaks Storybook build (grafana#22871)
  SQLStore: Add migration for adding index on annotation.alert_id (grafana#22876)
  Docs: Update export-pdf.md (grafana#22767)
  Variables: adds missing feature toggle in DashboardModel (grafana#22868)
  Devenv: adds grafana block with a customizeable version (grafana#22867)
  Alerting: support alerting on data.Frame (that can be time series) (grafana#22812)
  AlertNotifications: Translate notifications IDs to UIDs in Rule builder (grafana#19882)
  CircleCI: Don't build Storybook on PR (grafana#22865)
  Graphite: Rollup Indicator (grafana#22738)
  ...
Successfully merging this pull request may close these issues.

Dashboards saved with numeric IDs for alert notifications are not triggering notifications
7 participants