Test dataset downloadability on a schedule #2617

pmeier · 2020-08-25T13:28:47Z

Currently we rely on user feedback to detect failing dataset downloads. In #2610 I've included a downloadability test that asserts that a HEAD request to every URL used in the dataset is successful. We should expand these tests to all downloadable datasets.
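As a rough illustration of the kind of check described above, here is a minimal sketch using only the Python standard library. It is not the code from #2610: the function name `url_is_downloadable`, the `timeout` default, and the injectable `opener` parameter are assumptions made for this example, and treating only HTTP 200 as success is a simplification.

```python
import urllib.request


def url_is_downloadable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to ``url`` is answered with HTTP 200.

    ``opener`` is a hypothetical hook (not part of any library API) that
    lets a test replace the real network call.
    """
    # A HEAD request asks for the headers only, so nothing is downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

A test could then loop over every URL a dataset uses and assert that `url_is_downloadable(url)` holds for each of them.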

That being said, I don't think we should run this as part of every PR or push, but rather on a schedule (for example daily). In that case we need some way to inform us about failed tests.

cc @seemethere @pmeier

fmassa · 2020-08-25T14:13:54Z

@seemethere what do you think would be a good way for a scheduled CI job to signal to us that a test stopped working (due to an external server going down or no longer responding)? Sending an e-mail is one option, but is there a better one?

seemethere · 2020-09-09T17:37:26Z

We could also have it just send a message in a Slack channel.

That way community members as well as maintainers can have a way to see the signal in a more public space.

Another good middle ground is to have a GitHub Action that posts a comment to a specific issue every time a dataset cannot be downloaded.

pmeier · 2020-09-09T17:55:58Z

I'm currently looking into probot to do this. @seemethere Do you know if CircleCI can fire a webhook if it fails?

seemethere · 2020-09-09T18:46:53Z

It's entirely possible, but we haven't had much success trying to create an alerting system through CircleCI.

pmeier · 2020-09-10T05:18:07Z

A simple solution could be this https://github.com/JasonEtco/create-an-issue. If we test the download as part of a GitHub Actions workflow, this could simply create an issue from a given template if the workflow fails.

@seemethere @fmassa Is GitHub Actions an alternative to CircleCI for this?

pmeier · 2020-09-10T06:52:09Z

I've created a proof of concept repo. With this minimal setup it will create an issue like this https://github.com/pmeier/test-issue-on-fail/issues/9 every time the workflow fails. I think that might already be enough.

pmeier · 2020-09-14T10:27:47Z

Re-opening this, since #2665 only laid the groundwork but did not add the actual testing on a schedule.

fmassa · 2020-09-14T12:17:53Z

> I've created a proof of concept repo. With this minimal setup it will create an issue like this pmeier/test-issue-on-fail#9 every time the workflow fails. I think that might already be enough.

I think this looks pretty good! One thing to check is to see if the bot creates a repeated issue every day if the CI is not fixed the day it fails.
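One possible way to avoid repeated issues is to compare the title of the issue about to be created against the titles of the issues that are already open. The sketch below shows only that decision step. The function name `should_open_issue` is hypothetical, and fetching the open titles (for example through GitHub's REST endpoint for listing repository issues) is assumed to happen elsewhere.

```python
def should_open_issue(open_issue_titles, title):
    """Return True only if no open issue already carries ``title``.

    Titles are compared after stripping surrounding whitespace and
    ignoring case, so small formatting differences do not count as new.
    """
    normalized = {existing.strip().lower() for existing in open_issue_titles}
    return title.strip().lower() not in normalized
```

With this in place, a daily run that fails for an already reported reason would skip creating a second issue.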

pmeier · 2020-09-14T12:39:46Z

Unfortunately, right now it will. I will have more time at the end of September / early October to build a proper bot. Until then, we have to stick to closing it manually. That being said, with the retry functionality and the wait time between the individual requests, I don't think this will fail often. Actual dead links or broken downloads are quite rare.
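For readers unfamiliar with the retry-and-wait idea mentioned here, a minimal sketch could look like the following. It is not the implementation used in the test suite: the name `retry`, the default of three attempts, the one-second wait, and the injectable `sleep` parameter are all assumptions for illustration.

```python
import time


def retry(check, attempts=3, wait=1.0, sleep=time.sleep):
    """Call ``check()`` until it returns True or ``attempts`` runs out.

    ``check`` is any zero-argument callable returning a bool, for example
    a function that sends one request. ``sleep`` is a hypothetical hook
    that lets a test avoid real waiting.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            # Pause before the next attempt so the server is not hammered.
            sleep(wait)
    return False
```

Retrying this way filters out short-lived server hiccups, so a failure is only reported when every attempt in a run has failed.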

Should I send a PR for this?

fmassa · 2020-09-14T12:43:50Z

Hmm, it would be a bit annoying to be spammed by the bot about a known problem. But I suppose an initial PR would be good to have.

pmeier · 2020-09-14T12:46:04Z

Given that I'm responsible for the datasets, it will mostly spam me 😉

pmeier added the enhancement, module: ci, module: datasets, module: tests, and needs discussion labels on Aug 25, 2020

pmeier mentioned this issue Sep 10, 2020

Split off dataset download tests #2665

Merged

2 tasks

fmassa closed this as completed in #2665 Sep 14, 2020

pmeier reopened this Sep 14, 2020

pmeier mentioned this issue Sep 14, 2020

run download tests on a daily schedule #2675

Merged

fmassa closed this as completed in #2675 Sep 22, 2020
