This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Adds the pretrained module from allennlp-hub. #42

Merged
merged 11 commits into master from Pretrained on May 8, 2020

Conversation

dirkgr
Member

@dirkgr dirkgr commented May 4, 2020

These tests might not succeed because of the disk space issue.

Finishes allenai/allennlp#3302.
Closes allenai/allennlp#3965.

@dirkgr dirkgr requested a review from epwalsh May 4, 2020 23:57
@dirkgr
Member Author

dirkgr commented May 5, 2020

Our organization of pre-trained models is still a total mess. We now keep the URLs in three places: here, and twice in allennlp-models. They don't all agree, and they don't all list the same models. We will have to fix this soon. It's confusing.

@epwalsh
Member

epwalsh commented May 8, 2020

Looks like the tests you wrote aren't found by pytest. I think by default pytest assumes test classes are named Test* instead of *Test. But if we just add this line to the pytest.ini file it will find them.
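The comment doesn't show the line itself. A plausible sketch, assuming the repo's pytest.ini uses the standard [pytest] section, is the python_classes option (whose default is Test*):

```ini
[pytest]
python_classes = Test* *Test
```

With both patterns listed, pytest collects classes named either Test* or *Test.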

@epwalsh
Member

epwalsh commented May 8, 2020

Another thing to consider is that we'll be re-downloading these models on each CI run, which will run up our cloud bill. If we used a self-hosted runner for the pretrained tests we wouldn't have this problem since the models would be cached locally.

return predictor


def named_entity_recognition_with_elmo_peters_2018() -> SentenceTaggerPredictor:
Member

I think we'll need to import the crf tagger module for this to work

# By default we don't run these tests
@pytest.mark.skipif(
not os.environ.get("ALLENNLP_MODELS_RUN_PRETRAINED_TEST"), reason="requires massive downloads"
)
Member Author

The GPU tests are selected in a different way, aren't they? Why isn't this done the same way?

Member

Oh yea, we should probably run those on the self-hosted runner as well

Member

@epwalsh epwalsh May 8, 2020

Sorry - I realize your question now. I felt like there should be an explicit skip condition here, because if you were running tests locally you might not want to download these massive files, and I didn't know what that condition could be other than an environment variable flag.

Member

But we could also add a custom mark like we do for GPU tests and then exclude those by default. That might be cleaner.
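As a sketch of that alternative (the marker name pretrained is an assumption, not something from this PR), the mark could be registered in pytest.ini and deselected by default:

```ini
[pytest]
markers =
    pretrained: tests that download large pretrained models
addopts = -m "not pretrained"
```

Tests decorated with @pytest.mark.pretrained would then be skipped in a plain pytest run, and selected explicitly with pytest -m pretrained, since a -m given on the command line overrides the one in addopts.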

Member Author

Looks like we're inconsistent about how we handle GPU tests. Some have @pytest.mark.gpu, others use skipif and detect how many GPUs there are. Still others have both.

Member

The skipif ensures they're never accidentally run if you don't have a GPU available. The mark.gpu is just so that we can select only the GPU tests to run.
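The belt-and-braces pattern described here can be sketched as follows. This is illustrative only: checking for nvidia-smi on PATH stands in for whatever device check the project actually uses (e.g. counting CUDA devices), and the test body is a placeholder.

```python
import shutil

import pytest

# skipif guard: the test is never run accidentally on a machine without a GPU.
# Detecting nvidia-smi on PATH is an illustrative stand-in for a real device check.
requires_gpu = pytest.mark.skipif(
    shutil.which("nvidia-smi") is None, reason="No GPU available"
)


# mark.gpu lets CI select exactly the GPU tests with `pytest -m gpu`;
# the skipif guard still applies even when the test is selected.
@pytest.mark.gpu
@requires_gpu
def test_runs_only_on_gpu():
    # Placeholder body; a real test would move a model to the GPU here.
    assert True
```

Both decorators attach marks to the test, so selection (-m gpu) and the safety skip work independently of each other.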

@dirkgr
Member Author

dirkgr commented May 8, 2020

I think this is good to go then?

Member

@epwalsh epwalsh left a comment


I think this is good to go. I'll follow up with a PR to add a GPU testing workflow.

@dirkgr dirkgr merged commit 96d623c into master May 8, 2020
@dirkgr dirkgr deleted the Pretrained branch May 8, 2020 20:52
Successfully merging this pull request may close these issues.

Move pretrained stuff from -hub to -models