Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add automated unit testing on pull requests with GitHub Actions #356

Merged

Conversation

carlthome
Copy link
Contributor

What?

Introduce automated testing on PRs by GitHub Actions.

Why?

To make PR reviews easier by checking that unit tests pass automatically on fresh installations of mir_eval, as discussed in #354

@carlthome carlthome force-pushed the add-github-actions-test-workflow branch 2 times, most recently from 3bf0152 to 376a227 Compare June 9, 2023 13:11
@carlthome
Copy link
Contributor Author

@carlthome carlthome changed the title Migrate pull request automated testing from Travis CI to GitHub Actions Add automated unit testing on pull requests with GitHub Actions Jun 9, 2023
@bmcfee
Copy link
Collaborator

bmcfee commented Jun 9, 2023

Getting a error about missing decorator: https://github.com/craffel/mir_eval/actions/runs/5222270760/jobs/9427537252#step:7:27

Looks like a dependency problem with the decorator package. In general, I think we'd do best to set up a proper miniconda environment here, rather than relying on pip.

@carlthome carlthome force-pushed the add-github-actions-test-workflow branch from bc98222 to c216467 Compare June 11, 2023 18:33
@carlthome
Copy link
Contributor Author

carlthome commented Jun 11, 2023

In general, I think we'd do best to set up a proper miniconda environment here, rather than relying on pip.

Can definitely update this to follow https://github.com/librosa/librosa/blob/main/.github/workflows/ci.yml instead but think there's value in getting a standard pip setup to work since conda is a bit more involved and harder to convince people to use.

@carlthome
Copy link
Contributor Author

@bmcfee
Copy link
Collaborator

bmcfee commented Jun 12, 2023

but think there's value in getting a standard pip setup to work since conda is a bit more involved and harder to convince people to use.

Distributing via conda (convincing people to use it) and using conda for internal testing environments are two different things. I was only advocating for the latter here, and the librosa example you point to actually uses pip to install the package being tested.

yield based tests are skipped by pytest:

Yes - migrating from nose to pytest is going to be a pretty hefty endeavor for this reason, and I think should be handled in a separate PR.

@carlthome
Copy link
Contributor Author

Distributing via conda (convincing people to use it) and using conda for internal testing environments are two different things. I was only advocating for the latter here, and the librosa example you point to actually uses pip to install the package being tested.

Got it! I'll update this PR to use miniconda accordingly.

@carlthome carlthome force-pushed the add-github-actions-test-workflow branch from 1b69d70 to 0c0f6c1 Compare June 12, 2023 23:03
@carlthome carlthome force-pushed the add-github-actions-test-workflow branch from 0c0f6c1 to 41d42c0 Compare June 12, 2023 23:04
@carlthome
Copy link
Contributor Author

carlthome commented Jun 13, 2023

Initial conda done! 🐍

Changing from actions/setup-python to conda-incubator/setup-miniconda made build times go up quite a bit however:

Questions:

  1. Should the build matrix include Linux, macOS and Windows?
  2. Should multiple conda environments and dependency resolution strategies be tested (as in librosa/.github)?

Or high-level rather: how closely should mir_eval follow librosa? Identically? Loosely? This package doesn't use Numba or Cython so might be less susceptible to breaking. I'd consider using actions/setup-python because of that, but probably missing some important details about maintenance over time.

@carlthome carlthome marked this pull request as ready for review June 13, 2023 00:05
@bmcfee
Copy link
Collaborator

bmcfee commented Jun 13, 2023

Awesome, thanks for the effort on this!

Questions:

1. Should the build matrix include Linux, macOS and Windows?

In general, cross-platform tests are a good idea. However, I also see your point that there's not a lot going on in mir_eval that could break on different platforms (eg cython builds or numba/llvmlite dependencies).

There are some points in the tests that might need a little tweaking to run on windows, mainly to do with file globbing and the like, but I'm not too worried about it.

My vote for now would be to just test on linux until we have sufficient cause to expand. @craffel what do you think?

2. Should multiple conda environments and dependency resolution strategies be tested (as in [librosa/.github](https://github.com/librosa/librosa/tree/main/.github))?

Or high-level rather: how closely should mir_eval follow librosa? Identically? Loosely?

To be clear, are you referring to the different python versions (3.7, 3.8, etc) and the minimal environment here? In general, I think it's good to have at least two test builds here: one using pinned minimum versions of the dependencies, and one using the latest stable releases (no upper pin on versions).

In librosa, we also fan this out across python versions, but this has more to do with the fact that librosa ships conda packages as well as pypi; because conda packages are explicitly linked to a python version, we need to ensure that the entire dependency stack works in each version, hence separate tests for 3.7, 3.8, 3.9, etc. If mir_eval is not shipping conda packages, this is less important. (That said, I think mir_eval should ship conda packages, but I digress.)

We also have a handful of other envs and actions for things like doc builds (we can't use RTD for various reasons), release deployment, etc. These are mostly unrelated to testing, with the exception of the linting action. I made this one a separate action in librosa because it doesn't require installing librosa (and all of its dependencies), only the tools to do static analysis on the code, so the environment is much lighter than the full test env.

mir_eval has linting rolled into the main test action (ie in the same env). I think this step should probably be split off into its own job (if not a separate action entirely) so that we can disentangle style checking from correctness checking, but it's not a big deal to leave it as is. I'm happy to go with whatever's easiest for now and refine the workflow later on.

@craffel
Copy link
Collaborator

craffel commented Jun 13, 2023

Yeah, I'm comfortable with only testing on Linux. OTOH I don't even think we have any filesystem access stuff in the core library (just in the tests).

@bmcfee
Copy link
Collaborator

bmcfee commented Jun 13, 2023

However, I also see your point that there's not a lot going on in mir_eval that could break on different platforms (eg cython builds or numba/llvmlite dependencies).

Actually, I'm going to reverse myself on this. We should definitely include cross-platform tests.

While mir_eval itself doesn't have any platform-specific funny business, there have been incidents in the past where platform changes surfaced some nasty reproducibility issues that took me quite a bit of time to pin down. These mainly had to do with differences in the underlying BLAS package between linux and mac (and maybe architecture dependent), causing slight deviations in numerical stability for bss eval.

We eventually resolved this by relaxing the tests to the point where these deviations didn't matter - remember kids, never trust decibel measurements past the first decimal place - but having cross-platform tests in place are handy for detecting this kind of thing quickly.

@carlthome carlthome force-pushed the add-github-actions-test-workflow branch 2 times, most recently from 9221f9d to bb19bff Compare June 13, 2023 22:15
1. Use both flexible and strict channel priority for conda environment.
2. Include macOS and Windows for latest Python version.
SciPy is already part of install_requires so not necessary to have it
in extras_require as well.
@carlthome carlthome force-pushed the add-github-actions-test-workflow branch from fd2c383 to 4ba2070 Compare June 13, 2023 23:38
@carlthome
Copy link
Contributor Author

Actually, I'm going to reverse myself on this. We should definitely include cross-platform tests.

Well, that might have been a good call actually because two unit tests fails with OOM on Windows despite running similar code with similar memory availability (hardware specs according to github.com):

https://github.com/craffel/mir_eval/actions/runs/5261403240/jobs/9509466272?pr=356#step:6:592

Doesn't feel obvious to me that this isn't an actual error but could consider tweaking the sizes in here and here to something even shorter but 2147042648 is curiously large so need more digging. Aiming to look more into this tomorrow!

@bmcfee
Copy link
Collaborator

bmcfee commented Jun 14, 2023

Doesn't feel obvious to me that this isn't an actual error but could consider tweaking the sizes in here and here to something even shorter but 2147042648 is curiously large so need more digging.

That looks pretty obviously like an integer overflow, eg triggered here:

 a = array([    441000,     441001,     441002, ..., 2147483645, 2147483646,
       2147483647])

These look like sample indices to me. It may also be related to the issue identified in #355 where the length field of time_frequency was not handled correctly.

@carlthome
Copy link
Contributor Author

But why would this only fail on Windows runners? Didn't think conda packages for NumPy/SciPy could behave that differently. Assuming some OpenBLAS vs MKL or gcc vs MSVC differences that a curious soul could dig into. 🤔

Guess this PR is blocked by figuring this out proper first.

@bmcfee
Copy link
Collaborator

bmcfee commented Jun 14, 2023

But why would this only fail on Windows runners?

Is it possible the windows build is running 32bit while the others are 64bit, and the integer overflow we're hitting just happens to land in between?

@carlthome
Copy link
Contributor Author

Is it possible the windows build is running 32bit while the others are 64bit, and the integer overflow we're hitting just happens to land in between?

Hmm, perhaps but supposedly ubuntu-latest and windows-latest both run on x86_64 as per the guide so would that be if miniconda is installed differently with conda-incubator/setup-miniconda depending on OS?

@carlthome
Copy link
Contributor Author

carlthome commented Jun 15, 2023

Well, it states "win32" here actually so perhaps, but x64 is specified here so perhaps something funky with the setup-miniconda action.

Since miniconda hasn't released Python 3.11 support yet as per conda/conda#11170 (comment)
@carlthome
Copy link
Contributor Author

Fishing for ideas in conda-incubator/setup-miniconda#287

@bmcfee
Copy link
Collaborator

bmcfee commented Jun 16, 2023

This might be easier to debug if you add a conda info step to the job, as that will explicitly report the archspec of the environment.

It might also help to add a step that runs:

python -c "import numpy; numpy.show_config()"

just to get the full blas / lapack setup in the test log.

@carlthome
Copy link
Contributor Author

Hmm, didn't spot anything obviously wrong in the debug logging.

How would it feel to mark the Windows runner in the build matrix as excluded for now, with a TODO, and still proceed with merging this PR? Not optimal, but also nice with some incremental progress.

@bmcfee
Copy link
Collaborator

bmcfee commented Jun 21, 2023

We could also just leave it in (failing) and merge a fix (#355) later.

@carlthome
Copy link
Contributor Author

Sounds good! I'm happy enough with this PR for now so if you're up for it feel free to merge.

@craffel
Copy link
Collaborator

craffel commented Jun 22, 2023

Thank you Carl!

@craffel craffel merged commit 93f2741 into mir-evaluation:master Jun 22, 2023
@carlthome carlthome deleted the add-github-actions-test-workflow branch June 22, 2023 18:47
@bmcfee bmcfee added this to the 0.8 milestone Feb 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants