Add automated unit testing on pull requests with GitHub Actions #356
Conversation
Force-pushed 3bf0152 → 376a227
Getting an error about missing …
Looks like a dependency problem with the decorator package. In general, I think we'd do best to set up a proper miniconda environment here, rather than relying on pip.
Force-pushed bc98222 → c216467
Can definitely update this to follow https://github.com/librosa/librosa/blob/main/.github/workflows/ci.yml instead, but I think there's value in getting a standard …
yield-based tests are skipped by pytest: https://github.com/craffel/mir_eval/actions/runs/5236939084/jobs/9454797085#step:5:215
Distributing via conda (convincing people to use it) and using conda for internal testing environments are two different things. I was only advocating for the latter here, and the librosa example you point to actually uses pip to install the package being tested.
Yes - migrating from nose to pytest is going to be a pretty hefty endeavor for this reason, and I think it should be handled in a separate PR.
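For reference, this is roughly what the migration looks like: nose collected yield-based generator tests, while pytest silently skips them and expects `pytest.mark.parametrize` instead. The metric check below is a hypothetical illustration, not one of mir_eval's actual tests.

```python
import pytest

# Hypothetical check used for illustration; mir_eval's real tests
# iterate over many (reference, estimate) pairs in a similar way.
def check_score_in_range(score):
    assert 0.0 <= score <= 1.0

# nose-style yield test (collected by nose, silently skipped by pytest):
#
#     def test_scores():
#         for score in [0.0, 0.5, 1.0]:
#             yield check_score_in_range, score

# pytest equivalent: each parameter value becomes its own test case.
@pytest.mark.parametrize("score", [0.0, 0.5, 1.0])
def test_scores(score):
    check_score_in_range(score)
```

Each `yield`-style loop in the old tests would need to be lifted into a decorator like this, which is why it is a sizable change.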
Got it! I'll update this PR to use miniconda accordingly.
Force-pushed 1b69d70 → 0c0f6c1
Force-pushed 0c0f6c1 → 41d42c0
Initial conda done! 🐍 Changing from …
Questions:
Or, at a higher level: how closely should mir_eval follow librosa? Identically? Loosely? This package doesn't use Numba or Cython, so it might be less susceptible to breaking. I'd consider using …
Awesome, thanks for the effort on this!
In general, cross-platform tests are a good idea. However, I also see your point that there's not a lot going on in mir_eval that could break on different platforms (eg cython builds or numba/llvmlite dependencies). There are some points in the tests that might need a little tweaking to run on windows, mainly to do with file globbing and the like, but I'm not too worried about it. My vote for now would be to just test on linux until we have sufficient cause to expand. @craffel what do you think?
To be clear, are you referring to the different Python versions (3.7, 3.8, etc.) and the minimal environment here? In general, I think it's good to have at least two test builds: one using pinned minimum versions of the dependencies, and one using the latest stable releases (no upper pin on versions).

In librosa, we also fan this out across Python versions, but that has more to do with the fact that librosa ships conda packages as well as PyPI packages; because conda packages are explicitly linked to a Python version, we need to ensure that the entire dependency stack works in each version, hence separate tests for 3.7, 3.8, 3.9, etc. If mir_eval is not shipping conda packages, this is less important. (That said, I think mir_eval should ship conda packages, but I digress.)

We also have a handful of other envs and actions for things like doc builds (we can't use RTD for various reasons), release deployment, etc. These are mostly unrelated to testing, with the exception of the linting action. I made this one a separate action in librosa because it doesn't require installing librosa (and all of its dependencies), only the tools to do static analysis on the code, so the environment is much lighter than the full test env.

mir_eval has linting rolled into the main test action (i.e. in the same env). I think this step should probably be split off into its own job (if not a separate action entirely) so that we can disentangle style checking from correctness checking, but it's not a big deal to leave it as is. I'm happy to go with whatever's easiest for now and refine the workflow later on.
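A rough sketch of the two-build matrix described above (the environment file names and version numbers are assumptions for illustration, not mir_eval's actual configuration):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          # Oldest supported stack: dependencies pinned at minimum versions.
          - python-version: "3.7"
            envfile: ".github/environment-minimal.yml"
          # Latest stable stack: no upper pins, catches upstream breakage early.
          - python-version: "3.11"
            envfile: ".github/environment.yml"
    steps:
      - uses: actions/checkout@v3
      - uses: conda-incubator/setup-miniconda@v2
        with:
          python-version: ${{ matrix.python-version }}
          environment-file: ${{ matrix.envfile }}
```

The `include` form keeps each environment file explicitly tied to a Python version, which is the property the conda-packaging discussion above hinges on.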
Yeah, I'm comfortable with only testing on Linux. OTOH I don't even think we have any filesystem access stuff in the core library (just in the tests).
Actually, I'm going to reverse myself on this. We should definitely include cross-platform tests. While mir_eval itself doesn't have any platform-specific funny business, there have been incidents in the past where platform changes surfaced some nasty reproducibility issues that took me quite a bit of time to pin down. These mainly had to do with differences in the underlying BLAS package between Linux and Mac (and maybe architecture dependent), causing slight deviations in numerical stability for bss eval. We eventually resolved this by relaxing the tests to the point where these deviations didn't matter - remember kids, never trust decibel measurements past the first decimal place - but having cross-platform tests in place is handy for detecting this kind of thing quickly.
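The "relax the tests" fix described above can be sketched like this; the SDR values and the 0.1 dB tolerance are made-up illustrations, not mir_eval's actual numbers.

```python
import numpy as np

# Hypothetical SDR values (in dB) computed on two BLAS backends.
# Tiny numerical differences past the first decimal place are expected.
sdr_linux = np.array([12.3456, 8.9012])
sdr_macos = np.array([12.3461, 8.9007])

# Absolute tolerance absorbs cross-platform numerical noise,
# following the "never trust dB past the first decimal place" rule.
np.testing.assert_allclose(sdr_linux, sdr_macos, atol=0.1)
```

An exact-equality assertion here would pass on one runner and fail on another, which is exactly the class of flakiness a cross-platform matrix surfaces quickly.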
Force-pushed 9221f9d → bb19bff
1. Use both flexible and strict channel priority for the conda environment.
2. Include macOS and Windows for the latest Python version.
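For context, channel priority is an input on the setup-miniconda action; a minimal sketch of where it would be set (the channel list here is an assumption):

```yaml
- uses: conda-incubator/setup-miniconda@v2
  with:
    channel-priority: strict   # or "flexible"
    channels: conda-forge,defaults
```

Strict priority resolves every package from the highest-priority channel that carries it, which avoids accidental mixing of conda-forge and defaults builds at the cost of occasionally failing to solve.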
SciPy is already part of install_requires, so it's not necessary to have it in extras_require as well.
Force-pushed fd2c383 → 4ba2070
Well, that might have been a good call, actually, because two unit tests fail with OOM on Windows despite running similar code with similar memory availability (hardware specs according to github.com). It's not obvious to me that this isn't an actual error. We could consider tweaking the sizes here and here to something even shorter, but 2147042648 is curiously large, so this needs more digging. Aiming to look more into this tomorrow!
That looks pretty obviously like an integer overflow, e.g. triggered here:
These look like sample indices to me. It may also be related to the issue identified in #355, where the …
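The overflow hypothesis is easy to reproduce: 2147042648 sits just below the int32 maximum of 2147483647, and on Windows NumPy's default integer type is 32-bit (C `long`) even on a 64-bit interpreter, while Linux and macOS default to 64-bit. The sample counts below are illustrative, not the ones from the failing tests.

```python
import numpy as np

# Sample-index arithmetic that silently wraps in 32-bit integers.
# An explicit int32 dtype reproduces on any platform what Windows
# runners get by default.
idx = np.array([60 * 44100], dtype=np.int32)  # one minute at 44.1 kHz
total = idx * 1000                            # exceeds int32 range and wraps

print(total[0])                 # -1648967296 after wraparound
print(np.iinfo(np.int32).max)   # 2147483647, just above 2147042648 from the log
```

A wrapped index like this can turn into a huge allocation size once it is reinterpreted, which would explain an OOM that only appears on Windows.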
But why would this only fail on Windows runners? I didn't think conda packages for NumPy/SciPy could behave that differently. I'm assuming some OpenBLAS vs MKL or gcc vs MSVC differences that a curious soul could dig into. 🤔 I guess this PR is blocked on figuring this out properly first.
Is it possible the Windows build is running 32-bit while the others are 64-bit, and the integer overflow we're hitting just happens to land in between?
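A quick way to check this from a CI step, as a debugging sketch:

```python
import struct
import sys

# Pointer width of the interpreter: 64 on a 64-bit build, 32 on 32-bit.
print(struct.calcsize("P") * 8)

# Equivalent check via sys.maxsize.
print(sys.maxsize > 2**32)   # True on 64-bit builds
```

Note that even a 64-bit Python on Windows doesn't rule out the overflow: NumPy's default integer there is still 32-bit (C `long`), so the interpreter's bitness and the array dtype can disagree.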
Hmm, perhaps but supposedly |
Since miniconda hasn't released Python 3.11 support yet as per conda/conda#11170 (comment)
Fishing for ideas in conda-incubator/setup-miniconda#287
This might be easier to debug if you add a … It might also help to add a step that dumps the full BLAS / LAPACK setup into the test log.
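The exact command was stripped from the comment, but a plausible version of such a step (an assumption, not the original) would be:

```shell
# Print the BLAS/LAPACK configuration NumPy and SciPy were built against,
# so the CI log shows which backend (OpenBLAS, MKL, ...) each runner uses.
python -c "import numpy; numpy.show_config()"
python -c "import scipy; scipy.show_config()"
```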
Hmm, I didn't spot anything obviously wrong in the debug logging. How would it feel to mark the Windows runner in the build matrix as excluded for now, with a TODO, and still proceed with merging this PR? Not optimal, but it would be nice to make some incremental progress.
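A sketch of what that exclusion could look like in the matrix (versions here are illustrative):

```yaml
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    python-version: ["3.10"]
    exclude:
      # TODO(#355): re-enable once the suspected int32 overflow is fixed.
      - os: windows-latest
```

An `exclude` entry removes matching combinations after the matrix is expanded, so macOS and Linux coverage is unaffected.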
We could also just leave it in (failing) and merge a fix (#355) later. |
Sounds good! I'm happy enough with this PR for now so if you're up for it feel free to merge.
Thank you Carl! |
What?
Introduce automated testing on PRs via GitHub Actions.
Why?
To make PR reviews easier by automatically checking that unit tests pass on fresh installations of mir_eval, as discussed in #354.