Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible incorrect formatting from remove tabs #60

Closed
SteveBarnes-BH opened this issue Jan 20, 2023 · 13 comments
Closed

Possible incorrect formatting from remove tabs #60

SteveBarnes-BH opened this issue Jan 20, 2023 · 13 comments

Comments

@SteveBarnes-BH
Copy link

If lines contain spaces followed by tabs then replacing with a fixed number of spaces will miss-align which can be a problem for some file types, e.g. fi a file contains 2 lines such as:
"
Line 1
Line 2
"
Then visually, with tab spacing of 4, it will be:

    Line 1
    Line 2

But after tab replacement will become:

    Line 1
     Line 2

Suggest adding:

def line_tab_strip(line: bytes, whitespaces_count: int) -> bytes:
    """Remove tabs and replace with whitespaces_count spaces maintaining alignment"""
    spaces = b" " * whitespaces_count
    return spaces.join(sect.strip(b' ') for sect in line.split(b"\t"))

And changing the line:

    lines = [line.replace(b'\t', b' ' * whitespaces_count) for line in lines]

With something like:

    lines = [line_tab_strip(line, whitespaces_count) for line in lines]

This should give the expected alignment even on multi column lines such as: \tCol 1\t Col 2 \tCol 3 \n

@Lucas-C
Copy link
Owner

Lucas-C commented Jan 20, 2023

Hi @SteveBarnes-BH!

Interestng, thank you for reporting this.

Would you like to submit a PR with a small unit test covering this case,
in order to fix this bug?

@SteveBarnes-BH
Copy link
Author

Patch for possible test to show the issue:

--- a/tests/remove_tabs_test.py                                                
+++ b/tests/remove_tabs_test.py                                                
@@ -6,8 +6,9 @@ from pre_commit_hooks.remove_tabs import main as remove_tabs   
 @pytest.mark.parametrize(                                                     
     ('input_s', 'expected'),                                                  
     (                                                                         
-        ('foo \t\nbar', 'foo     \nbar'),                                     
+        ('foo \t\nbar', 'foo    \nbar'),                                      
         ('bar\n\tbaz\n', 'bar\n    baz\n'),                                   
+        ('foo \tfoo2\nbar', 'foo    foo2\nbar'),                              
     ),                                                                        
 )                                                                             
 def test_remove_tabs(input_s, expected, tmpdir):                              

@Lucas-C
Copy link
Owner

Lucas-C commented Jan 20, 2023

Alright, thanks!

I can apply those patchs myself, thank you very much for providing them 😊
But would you like to contribute directly through a Pull Request on this GitHub repo?

Some contributors sometimes like to have their name in the repo commits history,
and I just want to let you decide what you prefer.

@SteveBarnes-BH
Copy link
Author

Since this is a work account I an raise a bug report or suggestion but contributing code requires a pile of paperwork :-( in advance hence the patch in the issue report ;-) I hope that you understand.

@Lucas-C
Copy link
Owner

Lucas-C commented Jan 20, 2023

Sure, I can perfectly understand! No worries 😊
I'm going to apply your suggestions

@Lucas-C
Copy link
Owner

Lucas-C commented Jan 20, 2023

I checked more closely your suggested patch on tests/remove_tabs_test.py,
and I think that in fact there may be no bug there.

If you consider the input string foo \t\nbar,
you can see that a whitespace character is present before the \t,
explaining why the output string is foo \nbar with 5 whitespace characters,
after running remove-tabs on it.

@SteveBarnes-BH
Copy link
Author

But visually foo \tbar and foo\tbar will be identical in most editors so the result of foo bar with 5 spaces will be unexpected. This is due to tab alignment.

@SteveBarnes-BH
Copy link
Author

Strictly each tab should align the text that follows it to the next whitespaces_count after its position so my solution above, while it will work for many cases is still not perfect.

@SteveBarnes-BH
Copy link
Author

I think that a much better approach would be:

lines = lines.expandtabs(whitespaces_count)

This should align tabs correctly so that: b' \tA\ttabbed \tstring\twith\tspaces\n\tAn\tother\ttabbed\tline\there!\n'
Gives: b' A tabbed string with spaces\n An other tabbed line here!\n'
Which renders to:

    A   tabbed  string  with    spaces
    An  other   tabbed  line    here! 

@Lucas-C
Copy link
Owner

Lucas-C commented Jan 20, 2023

I think I now better understand your point.

.expandtabs() is a very good suggestion, sadly 'foo \t\nbar'.expandtabs(4)
produces 'foo \nbar' with 5 whitespace characters

@Lucas-C
Copy link
Owner

Lucas-C commented Jan 20, 2023

I opened PR #61: what do you think of it @SteveBarnes-BH?

@GadgetSteve
Copy link
Contributor

Now that I am on my own machine & time I can go into a little more detail. As an old timer who first learnt to use tabs on a mechanical typewriter, (at school I was made to do secretarial studies in an attempt to improve my spelling & learnt a lot just not spelling), my understanding of what is expected with tabs matches the explanation here.
So if you have a tab size of 4, (the most common for programming), this corresponds to setting tab positions like:
---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---
So a test sample such as:
b'No leading\ttab\n\tleading\ttab\n \tSpace then\tTab\nTabs\tbetween\tevery\tword\nSpace \tthen \ttab \tbetween \tevery \tword.'
Would be visually in your editor or word processor (with fixed width fonts) rendered as below _putting the ruler in as well:

---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---T---
No leading  tab
    leading tab
    Space then  Tab
Tabs    between every   word    in  the line.
Space   then    tab     between     every   word    in  the     line.

This is what the built in string/bytes method .expandtabs(4) gives.

In my personal opinion it is best if the hook leaves the file visually unchanged as, if the originator of the file used a mix of spaces and tabs to align the words in a specific way, then they would expect the output of the hook to look the same.

GadgetSteve added a commit to GadgetSteve/pre-commit-hooks-1 that referenced this issue Jan 21, 2023
@Lucas-C
Copy link
Owner

Lucas-C commented Jan 21, 2023

Thank you for the detailed explanation & PR! 😊

I now understand why .expandtabs() behaves this way:

  • input 'foo \t' -> output 'foo '
  • input 'fo \t' -> output 'fo '

Alright, I'm going to merge your PR #62

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants