-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Highlighting improvements #1244
Highlighting improvements #1244
Conversation
edits::tests::test_infer_edits_12 passes so there is no need to ignore it.
Reduce the amount of repetition and specify the Levenshtein distance explicitly in preparation for changing the Levenshtein calculation and adding more tests in a future commit. As there are quite a few inputs to each test use a struct rather than a plain function for the tests as this effectively provides named parameters for the inputs.
When a token has moved within a line try to highlight it as a deletion followed by an insertion. Currently -'b ' +' b' is highlighted as -'b[ ]' +'[ ]b' as if the space has moved to the left, instead highlight it as -'[b] ' +' [b]' as if the "b" has moved to the right. As the changes to the tests show this also tends to favor a longer match at the start of the line.
Only unchanged space between changed tokens should be highlighted, if the line ends with an unchanged space then the space should not be highlighted.
Sometimes the highlighting of a change is not as clear as it could be. For example the change -printf "%s\n" s y y ... +test_write_lines s y n ... is highlighted as -[printf "%]s[\n"] [s ]y y ... +[test_write_lines ]s y y ... rather than -[printf "%s\n"] s y n ... +[test_write_lines] s y y ... This is because the Levenshtein distance calculation only checks if the current tokens match without considering if the previous tokens matched or not. Adding a small penalty for starting a run of changed tokens forces the changes to be grouped together whenever possible. A knock on effect of adding the penalty is that the cost a substitution needs to be increased relative to the cost of an insertion/deletion otherwise the lowest cost for "ab" -> "ba" is two substitutions rather than an insertion, match and deletion. There are several changes to the tests - the Levenshtein distance is updated to reflect the new calculation in tests::align::* - several new tests are added to tests::align to check the grouping of different combinations of insertions and deletions - tests::edits::test_infer_edits_10 shows an improvement in the highlighting as the unchanged space at the end of the changed tokens is no longer highlighted. This is because it is no longer considered to be deleted by the Levenshtein matching as deleting the space between the deleted words now has a lower cost. - there is a new test in tests::edits using the example in this message.
Now that the lowest cost edit path groups deletions and insertions together there seems to be little point in keeping substitutions. Indeed the lower cost of substitutions compared to a deletion followed by an insertion can adversely affect the highlighting. If the length of the line does not change delta will prefer to use substitutions over deletions and insertions even if it results in unchanged tokens being marked as changed. For example in -[a] a a a a a [b b b] +[c] a a a a a [a c c] unchanged tokens are marked as deleted and inserted. Without substitutions this is highlighted as -a a a a a a [b b b] +[c ]a a a a a a [c c] which highlights the insertion at the beginning of the line and the change at the end of the line more clearly. There is a change to the distance calculation as substitutions only contributed the length of the deletion whereas now the same change will add the length of both the deletion and the insertion. This is addressed by doubling the contribution of unchanged sections so that the denominator equals the sum of the distance contributions of the plus line and minus line.
3c12268
to
ef4a6aa
Compare
@phillipwood this looks like one of the best improvements to delta that has been made since the project started. Thank you very much for working on this. I'm very short of time currently unfortunately, but I'm certainly going to run your branch as my local delta and I've collected some screenshots of differences below. If some other delta contributors would like to try this branch out that would be great. Unless someone's aware of a major breaking change that some users are likely not to like, then I'm optimistic that we can merge and release this. @phillipwood Is there any chance you'd be able to do the following? Look through some of the open issues that contain criticism of the details of delta's highlighting and comment on whether this branch fixes them / could fix them with modification / whether you think the issue should stay open? Have you formed any view on what the default value of
|
@dandavison Thanks, I'll try and take a look at the issues this week. I'm not sure what to do with A couple of changes that get highlighted when they weren't before A couple of examples where we lose the highlighting Here are a couple examples of improvements which show changes being grouped together and the benefit of removing substitutions. This is an improvement I've not seen anything terrible but these are slightly worse now (due to the prefix matching which improved the example above). In the future I'd like to play with sliding the highlights to optimize their position in the same way git's diff.indentHeurstic does for diffs. [edited to add some comments to explain the source of the change in highlighting] |
@dandavison I've been though all the open highlighting issues I could find and commented on the ones improved by this PR. Most of the issues seem to stem from problems pairing up added and deleted lines which are not addressed by this series. #521 is the closest match but that is already closed. #345 & #1181 can be closed if this is merged I think. |
LGTM! And thanks very much for going through those issues -- that's extremely useful. |
I know the v0.15.0 release is a few days old at this point, but the highlighting improvement is so noticeable I just had to come back to this issue and say WOW. Thank you for making these changes! |
While using delta I have noticed a few cases where the highlighting
was not quite as good as it could have been. This PR attempts to
address that. The most significant change is that where possible
insertions and deletions are grouped together to minimize the number of
groups of changes within a line. For example the change
is now highlighted as
rather than
There are several other changes:
the line is unchanged.
changed when it is preceded by a changed word.
I was unsure whether to squash the last two commits together. The
diffstat of the combined patch is quite a bit smaller than the two
separate patches but I think it is clearer to show the changes
separately.