Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codespell pre-commit and fix revealed errors #2819

Merged
merged 1 commit into from
May 3, 2024

Conversation

LecrisUT
Copy link
Contributor

@LecrisUT LecrisUT commented Apr 4, 2024

When comparing with the sphinx spellchecker (PyEnchant), that one is way more aggressive and does not handle dictionary additions well. On the other hand, this one only checks common spellchecking errors and does not check against an extensive dictionary. I've seen projects use it as a minimum spellchecker. What do you guys think? Check the pre-commit failure for what it picked up

Related to #2661

@lukaszachy
Copy link
Collaborator

If there was something which could handle US English + custom dictionary well. But as a minimal bar it is OK, IMO.

@happz
Copy link
Collaborator

happz commented Apr 4, 2024

A lot of false negatives that would have to be amended somehow, but it seems to be require only. One or two issues with files that shall remain unreadable. Everything else seems to be spot on.

    182 requre ==> require
     12 successfull ==> successful
      9 overriden ==> overridden
      8 re-use ==> reuse
      6 occured ==> occurred
      6 CheckT ==> checked
      6 bellow ==> below
      5 SUCCESSFULL ==> SUCCESSFUL
      4 occurences ==> occurrences
      4 aditional ==> additional
      3 contraint ==> constraint
      2 Unfortunatelly ==> Unfortunately
      2 taylored ==> tailored
      2 reponsible ==> responsible
      2 prefered ==> preferred
      2 originaly ==> originally
      2 discove ==> discover
      2 contraints ==> constraints
      2 consistenly ==> consistently
      2 betwen ==> between
      1 want's ==> wants
      1 visualy ==> visually
      1 unsed ==> unused, unset, used
      1 unparseable ==> unparsable
      1 uniqe ==> unique
      1 Trying next encoding "iso-8859-1"
      1 trough ==> through
      1 tansition ==> transition
      1 tabl ==> table
      1 swaped ==> swapped
      1 supressed ==> suppressed
      1 suppport ==> support
      1 Suport ==> Support
      1 suport ==> support
      1 Successfull ==> Successful
      1 succesful ==> successful
      1 storys ==> stories, storeys
      1 should't ==> shouldn't
      1 shoud ==> should
      1 seprated ==> separated
      1 Requre ==> Require
      1 Reenable ==> Re-enable
      1 readibility ==> readability
      1 priviledged ==> privileged
      1 preferrably ==> preferably
      1 precendence ==> precedence
      1 potentialy ==> potentially
      1 permision ==> permission
      1 parametr ==> parameter
      1 overwritting ==> overwriting
      1 Othwerise ==> Otherwise
      1 offical ==> official
      1 occurence ==> occurrence
      1 noveau ==> nouveau
      1 neccessary ==> necessary
      1 namepace ==> namespace
      1 manualy ==> manually
      1 isues ==> issues
      1 Inteface ==> Interface
      1 intallation ==> installation
      1 instace ==> instance
      1 indention ==> indentation
      1 implemeted ==> implemented
      1 identifer ==> identifier
      1 hierachy ==> hierarchy
      1 HAX ==> HEX
      1 generaly ==> generally
      1 forbiden ==> forbidden
      1 fo ==> of, for, to, do, go
      1 fillowing ==> following
      1 experimeting ==> experimenting
      1 essentialy ==> essentially
      1 environemnt ==> environment
      1 environement ==> environment
      1 Doest ==> Does, Doesn't
      1 documention ==> documentation
      1 direcotory ==> directory
      1 Dicovery ==> Discovery
      1 dependecies ==> dependencies
      1 cought ==> caught, cough, fought
      1 continous ==> continuous
      1 compatiblity ==> compatibility
      1 colorfull ==> colorful, colorfully
      1 clonning ==> cloning
      1 Clenup ==> Cleanup
      1 Cannot decode file using encoding "utf-8": tests/execute/weird/data/weird.txt
      1 burried ==> buried
      1 behing ==> behind, being
      1 availabe ==> available
      1 authetication ==> authentication
      1 authentification ==> authentication
      1 atributes ==> attributes
      1 assignes ==> assigns
      1 arbitratry ==> arbitrary
      1 aquire ==> acquire
      1 alowed ==> allowed
      1 additon ==> addition
      1 Additionaly ==> Additionally
      1 accomodate ==> accommodate

@LecrisUT LecrisUT requested a review from martinhoyer as a code owner April 4, 2024 13:45
@LecrisUT
Copy link
Contributor Author

LecrisUT commented Apr 4, 2024

If there was something which could handle US English + custom dictionary well

That would be more along the lines of PyEnchant. To be fair I only checked the integration with sphinx-spellcheck, which is unusable , e.g. cannot add words with -. Using pure PyEnchant might not be that bad. But at the same time, maybe we should look at a rust-based one since the main dev of the PyEnchant moved to such as well, and there could be a better ruff-like tool in the rust ecosystem. One contenders seem to be skyspell

@martinhoyer
Copy link
Collaborator

I don't know much about spell checking, but one thing I'm curious about is integration of custom dictionaries to IDEs.

In essence, IDE marks a word as potentially wrong and user can mark it as false-positive, which saves the word to a dictionary. That file is then being distributed within the repo, so other people can take advantage of it, as well as add to it and of course it could be used for CI/pre-commit checks.

At least that's how I imagine it. Does it make sense? Can codespell or other tool facilitate it?

@LecrisUT
Copy link
Contributor Author

LecrisUT commented Apr 4, 2024

IDE -> codespell/other tool would be tricky because the IDE would need to know and use the specific tool for that spell checking. I don't see any obvious integration for that in my PyCharm. Technically if everyone is on pycharm the .idea folder can be synced which would contain those, but I think it would be safer to have it in pre-commit so that it is IDE independent.

I have noticed PyCharm has the option to include a specific word-list file, but I can't get it to open the docs/codespell.txt. It also supports hunspell as its configuration, maybe worth looking at what that is?

@martinhoyer
Copy link
Collaborator

As for pycharm, it looks like I can add text file like the codespell.txt you have here, but it needs to to have .dic extension. Don't know how VSCode works.

@LecrisUT
Copy link
Contributor Author

LecrisUT commented Apr 4, 2024

As for pycharm, it looks like I can add text file like the codespell.txt you have here, but it needs to to have .dic extension. Don't know how VSCode works.

Oh interesting, we can definitely rename it. Did you check if it had any effect?

@martinhoyer
Copy link
Collaborator

As for pycharm, it looks like I can add text file like the codespell.txt you have here, but it needs to to have .dic extension. Don't know how VSCode works.

Oh interesting, we can definitely rename it. Did you check if it had any effect?

It works. I cannot add words to it through the Pycharm UI, but it does use those words for spell checking.

sidenote: I've tried to mark the .txt as "spellchecking dictionary file type", but that didn't work. It has to be .dic

@LecrisUT
Copy link
Contributor Author

LecrisUT commented Apr 4, 2024

Oh, also about the configuration:

  --builtin BUILTIN-LIST
                        comma-separated list of builtin dictionaries to include (when "-D -" or no "-D" is passed). Current options are:
                        - 'clear' for unambiguous errors
                        - 'rare' for rare (but valid) words that are likely to be errors
                        - 'informal' for making informal words more formal
                        - 'usage' for replacing phrasing with recommended terms
                        - 'code' for words from code and/or mathematics that are likely to be typos in other contexts (such as uint)
                        - 'names' for valid proper names that might be typos
                        - 'en-GB_to_en-US' for corrections from en-GB to en-US
                        The default is 'clear,rare'.

@happz happz added the documentation Improvements or additions to documentation label Apr 11, 2024
@happz happz added this to the 1.33 milestone Apr 11, 2024
Copy link
Collaborator

@martinhoyer martinhoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it :) Briefly look if there is any suitable alternative to codespell, but haven't found any, so tool choice is great from my PoV.
Added just a couple of comments.

@psss psss changed the title ci: Add codespell pre-commit Add codespell pre-commit Apr 11, 2024
@psss
Copy link
Collaborator

psss commented Apr 11, 2024

@LecrisUT, just a nitpick hint: Could you please not use prefixes for the commits and pull request names? Just to keep the changelog consistent. For pull request categorization labels can be used instead. Thanks.

@LecrisUT
Copy link
Contributor Author

@LecrisUT, just a nitpick hint: Could you please not use prefixes for the commits and pull request names? Just to keep the changelog consistent. For pull request categorization labels can be used instead. Thanks.

Sure. It's a reflex from other projects that prefer such a style. I think for commits I don't have them, but if I do, can you point them out? External contributors do not have access to labels, so would probably have to be other that add the appropriate labels.

@lukaszachy
Copy link
Collaborator

@happz
Copy link
Collaborator

happz commented Apr 25, 2024

+1 from me, all that remains to be resolved is the question of changes to the changelog in the spec file, on which I have no opinion, actually, so hoping for wiser ones to appear, comment and decide :)

@martinhoyer
Copy link
Collaborator

+1 from me, all that remains to be resolved is the question of changes to the changelog in the spec file, on which I have no opinion, actually, so hoping for wiser ones to appear, comment and decide :)

#2880 I'd like to bring the changelog up at nearest opportunity.

@LecrisUT
Copy link
Contributor Author

#2880 I'd like to bring the changelog up at nearest opportunity.

For sure. Do we keep this PR waiting on the specfile to change to %autochangelog? Running codespell after rebasing seems trivial, so I don't mind doing that every now and then until that PR is resolved.

@lukaszachy
Copy link
Collaborator

For sure. Do we keep this PR waiting on the specfile to change to %autochangelog? Running codespell after rebasing seems trivial, so I don't mind doing that every now and then until that PR is resolved.

IMO it would be better to merge this first, spec changes might slip into next release (or not).

Copy link
Collaborator

@thrix thrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LecrisUT nice work, so many typos missed by poor human reviewers!

@martinhoyer
Copy link
Collaborator

For sure. Do we keep this PR waiting on the specfile to change to %autochangelog? Running codespell after rebasing seems trivial, so I don't mind doing that every now and then until that PR is resolved.

IMO it would be better to merge this first, spec changes might slip into next release (or not).

+1, the specfile changes PR, while ready for review, is not trivial and needs a lot of eyes (and testing).

Copy link
Collaborator

@psss psss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick.

@psss psss requested review from martinhoyer and happz May 3, 2024 06:27
@psss psss added the ci | full test Pull request is ready for the full test execution label May 3, 2024
@psss psss changed the title Add codespell pre-commit Add codespell pre-commit and fix revealed errors May 3, 2024
@psss
Copy link
Collaborator

psss commented May 3, 2024

/packit build

@psss psss merged commit 0980bc8 into teemtee:main May 3, 2024
20 checks passed
@psss psss self-assigned this May 3, 2024
@LecrisUT LecrisUT deleted the ci/codespell branch May 6, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci | full test Pull request is ready for the full test execution documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants