Pre-commit (dev workflow) #1563

ivirshup · 2021-01-04T05:01:20Z

I'd like to start using pre-commit with scanpy and anndata. Pre-commit is essentially a tool that manages scripts we'd like to run before each commit, e.g. linting and formatting, so it becomes essentially impossible to forget these.

I think this can allow PRs to progress faster since it gives us a way to codify formatting requirements – so we don't have to remember them – and have these checks happen locally – so we don't have to wait on CI. Of course, having these checks run depends on developers installing pre-commit, so we can also run these checks on CI (example ci script, example run).

There is a question of what things we'd like to add here. For sure: black. I think import checks (e.g. no unused imports) and flake8 would be good too. We can also add custom checks for things like slow imports.

I think this would be a good time to run black over the whole codebase so we don't have exempted files any more.

My questions for the dev team:

Does this sound good?
Do you have more ideas for checks/ tools?

@michalk8, I saw you added this to squidpy. How's the experience been there – that is, any major foot guns we should look out for? Also, are there any tools you're using (beyond the basic black, isort, flake8) you'd especially recommend?

The text was updated successfully, but these errors were encountered:

LuckyMD · 2021-01-04T12:23:24Z

I like this idea a lot. I recently learned about automated linting and blacking of code per commit and have started using it for the single cell open problems project upon the suggestion of @scottgigante.

michalk8 · 2021-01-04T16:11:09Z

@ivirshup

I'd recommend these:

mypy for type checking (I've caught a few bugs using it)
autoflake to remove unused variables/imports (by default removes only imports from Python's standard library)
yesqa to automatically remove unused # noqa for flake8
check-yaml and pretty-format-yaml I also found useful
requirements-txt-fixer to sort the reqs.
check-added-large-files have prevented my from commiting some .h5ad files that shouldn't be there
rstcheck to check the syntax of .rst files
some flake8: pytest-style (enforces some pytest conventions, like non-returning fixtures beginning with _), blind-except (no blanket exceptions) and comprehensions (helps you rewrite comprehensions)

Here's also an exhaustive list from which I picked the ones I use: https://pre-commit.com/hooks.html

As for any problems, some of them came from rstcheck as
docs/source/classes.rst:9: (INFO/1) No directive entry for "autoclass" in module "docutils.parsers.rst.languages.en"., that's why .rstcheck.cfg might be necessary. Also fixing types for mypy takes a while, I'd do it as last.
Also flake8-comprehensions forces you to abandon dict(foo=..., bar=...) calls, unless you disable it.
And [tool.isort] in pyproject.toml should have profile = "black".

I had also planed to use pylint to enforce naming conventions, etc., but it has large overlap with flake8 + the config file can be rather large, see https://gist.github.com/dkatz23238/08d79a76911ddaaa6171dff9cddd5772

Koncopd · 2021-01-04T16:14:10Z

I really don't want it to be mandatory. It is not a rational objection, i just prefer my code to be committed as i wrote it and then run some formatting scripts separately if needed.

ivirshup · 2021-01-05T02:49:08Z

@michalk8 thanks for the extensive recommendations!

I think I'd like to keep the number of tools used small. It's the worst when you want to fix a bug, but instead have to learn about configuring a linter. More tools means more configurations people need to be familiar with, and the goal is reducing cognitive load.

Also fixing types for mypy takes a while, I'd do it as last.

Yeah, I figured this would be the case. Does mypy allow partial typing these days? Also, I haven't found the numpy or pandas type stubs to always be great. Have you run into problems around this?

I think this would also need to wait at least until we can drop python 3.6 for anndata, since adding types there currently means circular dependencies.

rstcheck to check the syntax of .rst files

I would particularly like a linter for rst. I noticed you also had doc8, but you'd recommend rstcheck check over this? I'm a little worried, considering its last release was over a year ago.

Spell check for prose in doc-strings could also be great, but I could see this being overzealous (is there a good way to notify about misspelled words, while not being annoying about technical terms?)

I'm a little worried about some custom sphinx extensions we have, and conflicting with this, any experience here?

@Koncopd, I think I agree with your concern, as I said above: it's the worst when you want to fix a bug, but instead have to learn about configuring a linter. I also think it's very easy to add new checks, so someone complaining about new ones is valuable.

Per commit, this should always be an option with git commit --no-verify, though you could also just not install pre-commit.

I would like to keep the required checks limited, ideally formatting tasks that can be automated as opposed "this is poor style" warnings. I also know these tools can be wrong (e.g. black when expression's have many operators, sometimes with chaining) so it would be good to document the escape hatches for this (# fmt: off for black).

That said, we already do require that merged code goes through black before it gets merged, and a benefit of using this would be to not have commit messages like "formatting", "remove unused import", etc. The pre-commit checks would be a part of CI as well, so it would be eventually mandatory – just not on your machine.

Does this address your concerns?

LuckyMD · 2021-01-05T10:25:39Z

I think I misunderstood... I thought you were suggesting things like autoblack in github actions for commits.

giovp · 2021-01-11T09:50:42Z

If I can jump on this, talking by personal experience, it would be a very very useful tool for contributors, especially young/inexperienced ones (like me!).

In squidpy @michalk8 put together a very comprehensive check list in pre-commits, and I'm appreciating it more and more as I get familiar with it.
yes, there is a lot of cognitive load at the beginning, and yes it can be very (very) painful, but when you get used to it, it soon becomes essential and actually really useful.

Only concern of course is that it highers the bar for contributions in the repo, but honestly I'm seeing it being adopted in other large bio-related oss (e.g. https://github.com/napari/napari ). I think this can be simplified by having an extensive contributors guide, and the explicit mention on how to skip pre-commits and submit the PR anyway (and then otehr scanpy dev can jump in and give suggestions on why precommits failed).

flying-sheep · 2021-01-11T12:01:36Z

@Koncopd pre-commit doesn’t have to be configured, you can choose not to enable it. Of course tests will fail then.

regarding mypy: I guess it’s possible to make everything return -> t.Any: # TODO fix typing

I like isort!

Also we should get #1527 in before doing any big restructuring: It’s been through too many rounds of delays and I had to resolve conflicts so many times.

ivirshup · 2021-02-11T06:07:53Z

I think it would be nice to start small, then gradually add to this. I'd nominate black as the first step, then using pre-commit on CI, then adding more tools. I like black for a starting place since we can just run it over both dev and stable branches and be confident in nothing breaking. For anything that requires more manual intervention, I think we'll have to wait until close to a new stable branch being cut so we don't have to worry about conflicts during backports.

I had also thought isort could be a good starting place, but it might actually be some work to turn on due to "partially initialized module" errors (imperative programming strikes again!).

Some initial settings for `isort`

[tool.isort]
profile = "black"
multi_line_output = 3
skip = "scanpy/__init__.py"

giovp · 2021-02-23T13:00:38Z

can we get at least flake8 also?

scottgigante · 2021-02-23T14:32:44Z

I just added black, isort, and autopep8 to scprep which worked pretty seamlessly.
pre-commit config: https://github.com/KrishnaswamyLab/scprep/blob/dev/.pre-commit-config.yaml
precommit github action: https://github.com/KrishnaswamyLab/scprep/blob/dev/.github/workflows/pre-commit.yml

ivirshup · 2021-02-24T07:59:24Z

I've just merged initial pre-commit stuff via #1684 (just black). Once the doc builds propagate, there will be a section under: "Getting set up" in the dev docs on how to install.

I would like to see a flake8 PR, though it might take a bit of time to hash out configuration. Maybe @giovp or @Zethson would be interested in looking into this?

Zethson · 2021-02-24T09:04:44Z

Yeah, I can add flake8 and some more stuff

Koncopd · 2021-02-24T13:46:19Z

I'd prefer to have only black on pre-commit for some time.

Zethson · 2021-02-24T13:48:03Z

I'd prefer to have only black on pre-commit for some time.

flake8 just checks and does not format. It makes aware of some nasty code and these things should get fixed immediately. I strongly advocate for it.

Zethson · 2021-02-24T13:55:30Z

I had also thought isort could be a good starting place, but it might actually be some work to turn on due to "partially initialized module" errors (imperative programming strikes again!).

Yeah, I ran into this stuff when creating the flake8 PR (#1689). isort is dangerous with Scanpy and requires good testing and many exceptions.

ivirshup · 2021-02-28T07:55:45Z

@Koncopd, I would agree, but there have a been a few bugs recently that flake8 definitely would have caught. This is stuff like "parameters are never used".

Rules that I think are useful:

Catch bugs
Can be automatically applied/ prevent "remove whitespace/ formatting" commits.

If you run into any rules you don't like, you can just add them to the ignore and we can deal with it during review.

Does that sound reasonable?

flying-sheep · 2021-03-01T10:37:32Z

Flake8 can indeed be buggy, but the rules are very modular and can be turned off.

Even if we end up with only “unused import” and “unused parameter” being left, it’s a win.

flying-sheep · 2021-03-01T10:41:29Z

One thing I noticed: The pre-commit GH workflow for a commit I made seems to be queued for quite a while, doesn’t appear in the list of checks, and the PR thinks all checks have run.

This doesn’t seem ideal, as PRs can slip in where it will only run (and then fail) after the PR is merged.

Can we configure it to be mandatory? Will that change anything?

ivirshup · 2021-03-01T11:18:54Z

Github actions are down. It seems like they have problems at least once a week: https://twitter.com/githubstatus.

I'd be fine with having pre-commit be a required check. I'm a little antsy about having something that goes down frequently be required though.

flying-sheep · 2021-05-17T12:47:24Z

double-quote-string-fixer could be used to keep single quotes with black!

Zethson · 2021-05-17T12:57:05Z

double-quote-string-fixer could be used to keep single quotes with black!

While true, I think we should just stick to Black's default (although the 88 line limit is still killing me).
I would add:

check-added-large-files
end-of-file-fixer
trailing-whitespace
check-yaml
check-toml

and maaaaybe
https://github.com/prettier/pre-commit

ivirshup · 2021-05-18T03:32:20Z

I'd be up for all of the numbered ones. IIRC, I had some issues with the trailing whitespace/ end of file fixers and some binary files/ csvs in the test suite. I'm a bit worried about false positives with check-large-files, but so long as it's easy to allow certain things (e.g. intentionally added test data) it should be fine.

In terms of breaking these things down into small tasks/ PRs how about: (1), (2, 3), (4, 5)?

prettier looks a bit heavy and like it's targeting a lot of stuff we don't use, so you'd have to make a good case.

ivirshup · 2021-05-26T09:30:14Z

I think we've done enough to mark this as closed. For any additional checks, please open a new issue.

flying-sheep · 2021-05-26T10:37:49Z

I think we should just stick to Black's default

we’re not, we skip string normalization entirely: https://github.com/theislab/scanpy/blob/4dd8de9e355ce8d59009c15522670a3d0970462c/pyproject.toml#L133

giovp · 2021-06-29T09:20:03Z

does anybody knows of a pre-commit check that checks for spelling in docstrings? I think that would be very cool to have.

ivirshup added the enhancement label Jan 4, 2021

ivirshup added this to the 1.8.0 milestone Feb 4, 2021

giovp mentioned this issue Feb 10, 2021

Improve documentation for developers #1419

Closed

giovp mentioned this issue Feb 23, 2021

enables highly_variable_genes_seurat_v3 to accept pseudocounts #1679

Merged

Zethson mentioned this issue May 21, 2021

Add more pre-commit checks #1848

Merged

Zethson closed this as completed May 26, 2021

flying-sheep removed the Enhancement ✨ label Dec 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pre-commit (dev workflow) #1563

Pre-commit (dev workflow) #1563

ivirshup commented Jan 4, 2021

LuckyMD commented Jan 4, 2021

michalk8 commented Jan 4, 2021 •

edited

Loading

Koncopd commented Jan 4, 2021

ivirshup commented Jan 5, 2021

LuckyMD commented Jan 5, 2021

giovp commented Jan 11, 2021

flying-sheep commented Jan 11, 2021

ivirshup commented Feb 11, 2021 •

edited

Loading

giovp commented Feb 23, 2021

scottgigante commented Feb 23, 2021

ivirshup commented Feb 24, 2021

Zethson commented Feb 24, 2021

Koncopd commented Feb 24, 2021

Zethson commented Feb 24, 2021

Zethson commented Feb 24, 2021 •

edited by ivirshup

Loading

ivirshup commented Feb 28, 2021

flying-sheep commented Mar 1, 2021

flying-sheep commented Mar 1, 2021 •

edited

Loading

ivirshup commented Mar 1, 2021

flying-sheep commented May 17, 2021

Zethson commented May 17, 2021

ivirshup commented May 18, 2021

ivirshup commented May 26, 2021

flying-sheep commented May 26, 2021

giovp commented Jun 29, 2021 •

edited

Loading

Pre-commit (dev workflow) #1563

Pre-commit (dev workflow) #1563

Comments

ivirshup commented Jan 4, 2021

LuckyMD commented Jan 4, 2021

michalk8 commented Jan 4, 2021 • edited Loading

Koncopd commented Jan 4, 2021

ivirshup commented Jan 5, 2021

LuckyMD commented Jan 5, 2021

giovp commented Jan 11, 2021

flying-sheep commented Jan 11, 2021

ivirshup commented Feb 11, 2021 • edited Loading

giovp commented Feb 23, 2021

scottgigante commented Feb 23, 2021

ivirshup commented Feb 24, 2021

Zethson commented Feb 24, 2021

Koncopd commented Feb 24, 2021

Zethson commented Feb 24, 2021

Zethson commented Feb 24, 2021 • edited by ivirshup Loading

ivirshup commented Feb 28, 2021

flying-sheep commented Mar 1, 2021

flying-sheep commented Mar 1, 2021 • edited Loading

ivirshup commented Mar 1, 2021

flying-sheep commented May 17, 2021

Zethson commented May 17, 2021

ivirshup commented May 18, 2021

ivirshup commented May 26, 2021

flying-sheep commented May 26, 2021

giovp commented Jun 29, 2021 • edited Loading

michalk8 commented Jan 4, 2021 •

edited

Loading

ivirshup commented Feb 11, 2021 •

edited

Loading

Zethson commented Feb 24, 2021 •

edited by ivirshup

Loading

flying-sheep commented Mar 1, 2021 •

edited

Loading

giovp commented Jun 29, 2021 •

edited

Loading