Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 CRLF line endings are stripped from diff #754

Open
plgruener opened this issue Oct 29, 2021 · 11 comments · May be fixed by #1000
Open

🐛 CRLF line endings are stripped from diff #754

plgruener opened this issue Oct 29, 2021 · 11 comments · May be fixed by #1000

Comments

@plgruener
Copy link

I'm trying to show CRLF line endings in my diffs (on Linux, to quickly spot if someone committed a file with CRLF instead of LF). Less has the option -u/--underline-special to do just that, so I could just use that as pager for delta. However, delta always seems to strip the carriage return byte from diffs, see:

delta-CRLF

Here is a hexdump of both diffs. You can see the CR-bytes (0x0d) are missing.
delta-CRLF-hexdump2

Even if delta strips the ANSI color escape codes (which is turned off here with --raw), it should never just delete any other character from the output. Let the pager decide to show those control chars or not.

@dandavison
Copy link
Owner

Thanks @plgruener. Do you have time to look at #664 and help me decide what the correct course of action is? In that PR, code was added that deliberately strips trailing CR. cc @norisio in case you have time to advise also.

@yoichi
Copy link
Contributor

yoichi commented Feb 27, 2022

#664 is not the only cause. byte_lines may also strip CR.
The correct course will be

  • prevent byte_lines removing CR
  • strip CR only if delta writes to console

@yoichi
Copy link
Contributor

yoichi commented Feb 27, 2022

strip CR only if delta writes to console

I've tried this by applying sample code in https://rosettacode.org/wiki/Check_output_device_is_a_terminal#Rust
But it didn't work with git add -p.

To find a solution, I think it would be better if I understand the cause of #664 a little more.

CR will be emitted directly into a terminal (at least when delta used through git add -p). Thus the rendered result will be blank (CR erases the line).

This (=CR erases the line) does not happen with the default pager. Why does this happen with delta?

@dandavison
Copy link
Owner

dandavison commented Feb 27, 2022

This (=CR erases the line) does not happen with the default pager. Why does this happen with delta?

I think this is because:

Suppose we have a line ending with CR LF.

Git changes that to CR ANSI LF.

With delta, bytelines strips the LF but leaves the CR (due to ANSI escape added by git). So now the line ends with CR ANSI but no LF. I think this causes the line to be overwritten by the terminal emulator.

Whereas with the usual pager the line ends CR ANSI LF.

@yoichi
Copy link
Contributor

yoichi commented Mar 1, 2022

This (=CR erases the line) does not happen with the default pager. Why does this happen with delta?

I found right_fill_background_color's output overwrites the text before CR.
Without interactive.diffFilter, there is only reset (ESC[m) after CR, the text before CR remains intact.

I've tried this by applying sample code in https://rosettacode.org/wiki/Check_output_device_is_a_terminal#Rust
But it didn't work with git add -p.

git add -p (git-add--interactive.perl) sends diff to pipe (not directly to the terminal) then print it.
So the terminal detection did not work.
git diff with delta send output to less --RAW-CONTROL-CHARS, and less replaces control characters with "^%c" (the caret notation), but the feature is effective only for terminal output (https://github.com/gwsw/less/blob/4f5e45b4262009c6fcd87c0efc077725271c8c57/main.c#L222). Therefore it is not usable with git add -p.

I think one solution is to define delta's new command line option to replace control characters by caret notations (like less --RAW-CONTROL-CHARS on terminal) and use the option in the interactive.diffFilter value.

@yoichi
Copy link
Contributor

yoichi commented Mar 3, 2022

I think one solution is to define delta's new command line option to replace control characters by caret notations (like less --RAW-CONTROL-CHARS on terminal) and use the option in the interactive.diffFilter value.

I've made a fix in this way and created a pull request. Would you please take a look? @norisio @dandavison
#1000

@dandavison
Copy link
Owner

dandavison commented Mar 4, 2022

Is it true that the problem is occurring because

(1) git is placing an ANSI character between CR and LF: CR ANSI LF
(2) bytelines is stripping the LF, leaving CR ANSI

?

If so, would one solution be for delta to strip trailing CR if it is followed by the trailing ANSI character? Presumably that would be very rare in real input.

In any case, delta's output is for humans to look at, not for machines to read, so it's not the end of the world if we occasionally remove a byte someone wanted as long as it really is rare.

@norisio
Copy link
Contributor

norisio commented Mar 4, 2022

git is placing an ANSI character between CR and LF

Only when used with -p, according to the images in #754 (comment), which show there are no CR-ANSI-LF .

@norisio
Copy link
Contributor

norisio commented Mar 4, 2022

Therefore this issue is not related to the CR-ANSI-LF problem, I guess.
Simply because using bytelines to split the input stream. It always erases CR before LF.

@dandavison
Copy link
Owner

Only when used with -p

Thanks!

Therefore this issue is not related to the CR-ANSI-LF problem

Right. OK, so:

I think it is correct that delta's internal code paths work with a stripped line. So maybe the solution is we update bytelines so that it does not strip the line endings, or strips them but returns them to us with the line. Then, when delta is ready to print a line, it adds back the original line ending.

I do think it's worth bearing in mind that delta's output is for humans to look at, not for machines to parse: if you want machine-readable diff output then that's delta's input, not its output. And indeed, if you redirect git's output into a pipe or file, then delta doesn't even get invoked (git doesn't invoke core.pager).

HOWEVER:

  1. It seems a reasonable goal that delta prettifies a line with ANSI colors but nevertheless retains the original line ending
  2. delta has --raw, and it is very reasonable to expect that to retain the line endings even if delta didn't by default.

@Atemu
Copy link

Atemu commented Dec 3, 2023

What is the status on this?

it's worth bearing in mind that delta's output is for humans to look at, not for machines to parse

This is true but us humans also need to understand the output. If it show a line removed but the same line added again, that's extremely confusing.

You might be able to deduce it's due to line endings but then you're left wondering what changed about line endings.
Were they the correct ones but aren't anymore? The other way around? No idea, all you see is the same characters.

Line endings are already complex enough in git as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants