Test dataset downloadability on a schedule #2617

Closed
pmeier opened this issue Aug 25, 2020 · 11 comments · Fixed by #2665 or #2675

Comments

@pmeier
Collaborator

pmeier commented Aug 25, 2020

Currently we rely on user feedback to detect failing dataset downloads. In #2610 I've included a downloadability test that asserts that a HEAD request to every URL used in the dataset is successful. We should expand these tests to all downloadable datasets.

That being said, I don't think we should run this as part of every PR or push, but rather on a schedule (for example daily). In that case we need some way to inform us about failed tests.
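
A minimal sketch of the kind of HEAD-request check described above, using only the Python standard library. The function name `url_is_downloadable` and the injectable `opener` parameter are assumptions made for illustration; this is not the helper used in #2610.

```python
import urllib.error
import urllib.request


def url_is_downloadable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` succeeds with a 2xx status.

    `opener` is a hypothetical injection point so the check can be
    exercised without network access; it defaults to urllib's urlopen.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError; OSError also
        # covers socket timeouts and refused connections.
        return False
```

A scheduled test could then assert `url_is_downloadable(url)` for each URL a dataset uses, so that a dead link shows up as a failed test rather than a user report.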

cc @seemethere @pmeier

@fmassa
Member

fmassa commented Aug 25, 2020

@seemethere what do you think would be a good way for a scheduled CI job to signal to us that a test stopped working (due to an external server going down or no longer responding)? Sending an e-mail is one option, but is there a better one?

@seemethere
Member

seemethere commented Sep 9, 2020

We could also possibly have it just send a message in a Slack channel.

That way community members as well as maintainers can see the signal in a more public space.

Another good middle ground is to have a GitHub Action that posts a comment to a specific issue every time a dataset cannot be downloaded.

@pmeier
Collaborator Author

pmeier commented Sep 9, 2020

I'm currently looking into probot to do this. @seemethere Do you know if CircleCI can fire a webhook if it fails?

@seemethere
Member

It's entirely possible, but we haven't had much success trying to create an alerting system through CircleCI.

@pmeier
Collaborator Author

pmeier commented Sep 10, 2020

A simple solution could be https://github.com/JasonEtco/create-an-issue. If we test the download as part of a GitHub Actions workflow, this could simply create an issue from a given template if the workflow fails.

@seemethere @fmassa Is GitHub Actions an alternative to CircleCI for this?

@pmeier
Collaborator Author

pmeier commented Sep 10, 2020

I've created a proof of concept repo. With this minimal setup it will create an issue like this https://github.com/pmeier/test-issue-on-fail/issues/9 every time the workflow fails. I think that might already be enough.

@pmeier
Collaborator Author

pmeier commented Sep 14, 2020

Re-opening this, since #2665 only laid the groundwork, but did not add the actual testing on a schedule.

@pmeier pmeier reopened this Sep 14, 2020
@fmassa
Member

fmassa commented Sep 14, 2020

> I've created a proof of concept repo. With this minimal setup it will create an issue like this pmeier/test-issue-on-fail#9 every time the workflow fails. I think that might already be enough.

I think this looks pretty good! One thing to check is whether the bot creates a duplicate issue every day if the CI is not fixed on the day it fails.
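
One way a bot could avoid the repeated issue raised above is to look for an already open issue with the same title before creating a new one. The sketch below is only an illustration and is not part of the proof-of-concept repo; the function name `should_open_issue` is hypothetical, and the input is assumed to be a list of dicts carrying the `title` and `state` fields found in GitHub REST API issue objects.

```python
def should_open_issue(open_issues, title):
    """Return True only if no open issue already carries `title`.

    `open_issues` is assumed to be a list of dicts with "title" and
    "state" keys (hypothetical input, e.g. parsed from an API response).
    """
    return not any(
        issue.get("state") == "open" and issue.get("title") == title
        for issue in open_issues
    )
```

With such a check, a failure that persists for several days would map to a single open issue instead of one new issue per scheduled run.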

@pmeier
Collaborator Author

pmeier commented Sep 14, 2020

Unfortunately, right now it will. I have more time at the end of September / early October to build a proper bot. Until then, we have to stick to closing the issues manually. That being said, with the retry functionality and the wait time between the individual requests, I don't think this will fail often. Actual dead links or broken downloads are quite rare.
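
As a rough illustration of the retry-and-wait idea mentioned above, here is a small sketch. The helper name `retry` and its `attempts`, `wait`, and `sleep` parameters are assumptions for this example, not the names used in the actual tests.

```python
import time


def retry(func, *, attempts=3, wait=1.0, sleep=time.sleep):
    """Call `func()` up to `attempts` times, pausing `wait` seconds between tries.

    Returns the first truthy result, or a falsy value once all attempts
    are exhausted. An exception counts as a failed attempt.
    """
    result = False
    for attempt in range(attempts):
        try:
            result = func()
        except Exception:
            result = False
        if result:
            return result
        if attempt < attempts - 1:
            sleep(wait)  # give a flaky server time to recover
    return result
```

Wrapping each URL check in such a helper means a single dropped connection does not fail the scheduled run; only a server that stays unreachable across all attempts does.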

Should I send a PR for this?

@fmassa
Member

fmassa commented Sep 14, 2020

Hmm, it would be a bit annoying to be spammed by the bot about a known problem. But I suppose an initial PR would be good to have.

@pmeier
Collaborator Author

pmeier commented Sep 14, 2020

Given that I'm responsible for the datasets, it will mostly spam me 😉

3 participants