Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support git format-patch diffs #1

Closed
ObserverOfTime opened this issue Sep 14, 2022 · 5 comments
Closed

Support git format-patch diffs #1

ObserverOfTime opened this issue Sep 14, 2022 · 5 comments

Comments

@ObserverOfTime
Copy link

ObserverOfTime commented Sep 14, 2022

git format-patch uses --- to separate the commit body from the summary.
This parser expects that to be part of an old_file node and returns an error.

Here's a random kernel patch and the syntax tree it produces:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=da3b1c2

source [0, 0] - [36, 0]
  context [0, 0] - [0, 70]
  context [1, 0] - [1, 33]
  context [2, 0] - [2, 36]
  context [3, 0] - [3, 71]
  context [4, 0] - [4, 21]
  context [6, 0] - [6, 63]
  context [7, 0] - [7, 67]
  context [8, 0] - [8, 7]
  context [10, 0] - [10, 102]
  context [11, 0] - [11, 42]
  context [12, 0] - [12, 39]
  context [13, 0] - [13, 67]
  context [14, 0] - [14, 44]
  ERROR [15, 0] - [15, 3]
  context [16, 0] - [16, 77]
  context [17, 0] - [17, 46]
  command [19, 0] - [19, 154]
    filename [19, 11] - [19, 154]
  index [20, 0] - [20, 41]
    commit [20, 6] - [20, 19]
    commit [20, 21] - [20, 34]
    mode [20, 35] - [20, 41]
  old_file [21, 0] - [21, 75]
    filename [21, 4] - [21, 75]
  new_file [22, 0] - [22, 75]
    filename [22, 4] - [22, 75]
  location [23, 0] - [23, 29]
    linerange [23, 3] - [23, 8]
    linerange [23, 9] - [23, 14]
  context [24, 0] - [24, 64]
  context [26, 0] - [26, 18]
  deletion [27, 0] - [27, 22]
  addition [28, 0] - [28, 28]
  context [29, 0] - [29, 17]
  context [31, 0] - [31, 10]
  context [32, 0] - [32, 3]
  context [33, 0] - [33, 5]
@the-mikedavis
Copy link
Owner

This grammar focuses only on diff syntax. Patch syntax should be covered in a separate grammar which injects this diff grammar.

@the-mikedavis the-mikedavis closed this as not planned Won't fix, can't repro, duplicate, stale Sep 14, 2022
@ObserverOfTime
Copy link
Author

ObserverOfTime commented Sep 14, 2022

Patch syntax is diff syntax with --- in a context line.
(and some metadata which don't have to be handled by this grammar)

@the-mikedavis
Copy link
Owner

I can't find a formal definition for patch and/or diff (if you can find these, please do link them).

My understanding is that patch syntax is a superset of diff - it's additional metadata surrounding a diff section. There is no way to not handle part of a document with tree-sitter: everything must be parsed as something, so adding support for patch would mean parsing all of the patch syntax.

A separate tree-sitter-patch grammar would be a reasonable way to approach this, and it could parse the metadata.

@ObserverOfTime
Copy link
Author

I can't find a formal definition for patch and/or diff

GNU diffutils supports several distinct formats:

Ideally, this parser would handle the first three formats. (maybe in separate grammars?)


My understanding is that patch syntax is a superset of diff

I suppose you're right as per the Kernel documentation.
It's in fact a combination of mbox and unified diff.


There is no way to not handle part of a document with tree-sitter

I know, but you can parse the --- marker as a context node to avoid the error.
At least until someone (maybe me?) writes a grammar for the patch format.

the-mikedavis added a commit that referenced this issue Sep 16, 2022
Deleted lines may start with the leading dash "-" but then the line
itself may contain an arbitrary number of dashes. This change parses
any number of leading dashes as a deletion except for the triple
leading dash which is considered to be an old file marker.

These cases should have been valid parse examples before but were
marked as errors because of the contention between the deletion
and old_file rules and error recovery.

Connects #1
@the-mikedavis
Copy link
Owner

Yeah, the error node should be cleaned up. I have fixed the parsing to produce a deletion node though rather than context in 6d7c499, since --- may mark the deletion of a line containing only "--".

I don't use patch format myself so I am uninterested in maintaining a grammar for it. This grammar should serve as a decent example if you would like to tackle it yourself though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants