-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subtree: Fix handling of complex history #493
base: maint
Are you sure you want to change the base?
Conversation
Welcome to GitGitGadgetHi @tqc, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that this Pull Request has a good description, as it will be used as cover letter. Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions. If you want to see what email(s) would be sent for a submit request, add a PR comment If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox ("raw") file corresponding to the mail you want to reply to from the Git mailing list. If you use GMail, you can upload that raw mbox file via: curl -g --user "<EMailAddress>:<Password>" --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt Need help?New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, |
/allow |
User tqc is now allowed to use GitGitGadget. |
/preview |
Preview email sent as [email protected] |
fa2cc8f
to
c0de1ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, comment intended for the first patch in the series went here by accident.
Friendly ping @tqc. What would be necessary to get this PR ready for submisson? |
Probably just one thing: the Git mailing list insists on continuing in lower-case after the After that, I think it'll be Also, the first comment (which will be used as a cover letter) mentions that documentation is missing, and I would also suggest to add a footer |
Sorry guys, got hopelessly distracted between updating the docs and submitting the PR. The tech piece is all done, I should be able to find time to do the final cleanup tomorrow |
b93871b
to
19db9cf
Compare
/preview |
Preview email sent as [email protected] |
/submit |
Submitted as [email protected] |
GitHub UI fail, that (+ve review) was meant to apply to 74fa670 |
contrib/subtree/git-subtree.sh
Outdated
main= | ||
sub= | ||
local grep_format="^git-subtree-dir: $dir/*\$" | ||
git log --reverse --grep="$grep_format" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why the grep format isn't just expanded inline
contrib/subtree/git-subtree.sh
Outdated
main= | ||
sub= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are main
and sub
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An artifact of starting with a copy of find_existing splits that should be removed. Since you’re building on top of this, is it better to rewrite the history or fix in a new commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewriting history won't bother me, my WIP is going to need to be edited and rebased significantly anyway.
contrib/subtree/git-subtree.sh
Outdated
@@ -161,7 +162,7 @@ command="$1" | |||
shift | |||
|
|||
case "$command" in | |||
add|merge|pull) | |||
add|merge|pull|map) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pedantic nit, might be useful to keep these in alpha order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was assuming logical rather than alphabetical ordering, where at least in the docs the more arcane options go towards the end. Happy to make the internal bits alphabetical though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fair, it so happens the existing options are both logically & alphabetically ordered. Not a significant comment either way
Signed-off-by: Tom Clarkson <[email protected]>
Rebased on current maint and updated with the minor fixes to find_mainline ref from Ed's review. Will submit the updated version if the tests all pass and no other minor fixes come up |
Please ignore the vs-build failure: this has been fixed already (in e58e405, but that fix did not make it to |
/preview |
Preview email sent as [email protected] |
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
@@ -238,7 +238,7 @@ cache_miss () { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ed Maste wrote (reply to this):
On Tue, 6 Oct 2020 at 18:05, Tom Clarkson via GitGitGadget
<[email protected]> wrote:
>
> From: Tom Clarkson <[email protected]>
>
> Signed-off-by: Tom Clarkson <[email protected]>
Reviewed-by: Ed Maste <[email protected]>
@@ -238,13 +238,13 @@ cache_miss () { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Tom,
On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:
> From: Tom Clarkson <[email protected]>
>
> Include recursion depth in debug logs so we can see when the recursion is
> getting out of hand.
>
> Making the cache handle null mappings correctly and adding older commits
> to the cache allows the recursive algorithm to terminate at any point on
> mainline rather than needing to reach either the add point or the initial
> commit.
Makes sense.
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index 9867718503..160bad95c1 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -244,7 +244,7 @@ check_parents () {
> do
> if ! test -r "$cachedir/notree/$miss"
> then
> - debug " incorrect order: $miss"
> + debug " unprocessed parent commit: $miss ($indent)"
Without any context, it is hard to understand what the `$indent` variable
is supposed to mean, so it is unclear why we need to print it here.
I _guess_ it is the degree removed from the first-parent lineage?
In any case, it does not hurt here, so I trust that it is good to include
it in the debug output.
> process_split_commit "$miss" "" "$indent"
> fi
> done
> @@ -392,6 +392,24 @@ find_existing_splits () {
> done
> }
>
> +find_mainline_ref () {
> + debug "Looking for first split..."
> + dir="$1"
> + revs="$2"
The `git-subtree` script seems to rely on the `local` construct, using it
in plenty of other circumstances. How about using it here, too?
> +
> + git log --reverse --grep="^git-subtree-dir: $dir/*\$" \
> + --no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |
Since all you are interested in is the `git-subtree-mainline:` trailer,
wouldn't a format like `%(trailers:key=git-subtree-mainline)` instead of
`START %H%n%s%n%n%b%nEND%n`?
See
https://git-scm.com/docs/git-log#Documentation/git-log.txt-emtrailersoptionsem
for more information about pretty formats.
BTW I am super unfamiliar with `git subtree`'s inner workings, and
therefore it would help me incredibly if the commit message talked a bit
about the commit message layout (with a particular eye on
`git-subtree-dir` and `git-subtree-mainline` which I guess are trailers
added by `git subtree`?)...
> + while read a b junk
> + do
> + case "$a" in
> + git-subtree-mainline:)
> + echo "$b"
> + return
> + ;;
> + esac
> + done
> +}
> +
> copy_commit () {
> # We're going to set some environment vars here, so
> # do it in a subshell to get rid of them safely later
> @@ -646,9 +664,9 @@ process_split_commit () {
>
> progress "$revcount/$revmax ($createcount) [$extracount]"
>
> - debug "Processing commit: $rev"
> + debug "Processing commit: $rev ($indent)"
> exists=$(cache_get "$rev")
> - if test -n "$exists"
> + if test -z "$(cache_miss "$rev")"
> then
> debug " prior: $exists"
I do not see the `exists` variable being used other than for the debug
statement. Maybe better something like this?
debug " prior found for $rev"
> return
> @@ -773,6 +791,17 @@ cmd_split () {
>
> unrevs="$(find_existing_splits "$dir" "$revs")"
>
> + mainline="$(find_mainline_ref "$dir" "$revs")"
> + if test -n "$mainline"
> + then
> + debug "Mainline $mainline predates subtree add"
> + git rev-list --topo-order --skip=1 $mainline |
> + while read rev
> + do
> + cache_set "$rev" ""
Ah, so they are not really "null mappings", but mapped to an empty string.
Makes sense. Maybe adjust the commit message?
> + done || exit $?
> + fi
> +
> # We can't restrict rev-list to only $dir here, because some of our
> # parents have the $dir contents the root, and those won't match.
> # (and rev-list --follow doesn't seem to solve this)
> --
> gitgitgadget
>
>
@@ -27,6 +27,7 @@ b,branch= create a new branch from the split subtree | |||
ignore-joins ignore prior --rejoin commits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Tom,
On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:
> @@ -48,6 +49,7 @@ annotate=
> squash=
> message=
> prefix=
> +clearcache=
It might be more consistent to call it `clear_cache` (i.e. with an
underscore), just like `ignore_joins`.
>
> debug () {
> if test -n "$debug"
> @@ -131,6 +133,9 @@ do
> --no-rejoin)
> rejoin=
> ;;
> + --clear-cache)
> + clearcache=1
> + ;;
> --ignore-joins)
> ignore_joins=1
> ;;
> @@ -206,9 +211,13 @@ debug "opts: {$*}"
> debug
>
> cache_setup () {
> - cachedir="$GIT_DIR/subtree-cache/$$"
> - rm -rf "$cachedir" ||
> - die "Can't delete old cachedir: $cachedir"
> + cachedir="$GIT_DIR/subtree-cache/$prefix"
Excellent, the `prefix` should be "unique enough".
> + if test -n "$clearcache"
> + then
> + debug "Clearing cache"
> + rm -rf "$cachedir" ||
> + die "Can't delete old cachedir: $cachedir"
> + fi
> mkdir -p "$cachedir" ||
> die "Can't create new cachedir: $cachedir"
> mkdir -p "$cachedir/notree" ||
> @@ -266,6 +275,16 @@ cache_set () {
> echo "$newrev" >"$cachedir/$oldrev"
> }
>
> +cache_set_if_unset () {
> + oldrev="$1"
> + newrev="$2"
`local`? ;-)
> + if test -e "$cachedir/$oldrev"
> + then
> + return
> + fi
> + echo "$newrev" >"$cachedir/$oldrev"
So that directory contains commit mappings, a file for each mapped
revision.
Thinking back to patch 2/11, I am now no longer that sure that it makes
sense to fill it up with every commit in that commit range: performance
suffers when directories contain too many files.
For example, I had a case in the past where it took a minute just to
enumerate a directory, and even looking whether a file existed in that
directory was not exactly fun.
In any case, I would write it slightly shorter:
test -e "$cachedir/$oldrev" ||
echo "$newrev" >"$cachedir/$oldrev"
> +}
> +
> rev_exists () {
> if git rev-parse "$1" >/dev/null 2>&1
> then
> @@ -375,13 +394,13 @@ find_existing_splits () {
> then
> # squash commits refer to a subtree
> debug " Squash: $sq from $sub"
> - cache_set "$sq" "$sub"
> + cache_set_if_unset "$sq" "$sub"
> fi
> if test -n "$main" -a -n "$sub"
> then
> debug " Prior: $main -> $sub"
> - cache_set $main $sub
> - cache_set $sub $sub
> + cache_set_if_unset $main $sub
> + cache_set_if_unset $sub $sub
> try_remove_previous "$main"
> try_remove_previous "$sub"
> fi
> @@ -688,6 +707,8 @@ process_split_commit () {
> if test -n "$newparents"
> then
> cache_set "$rev" "$rev"
> + else
> + cache_set "$rev" ""
Was this hunk intended to be snuck in here? I can understand the
s/cache_set/cache_set_if_unset/ changes, of course, but not this hunk.
> fi
> return
> fi
> @@ -785,7 +806,7 @@ cmd_split () {
> # the 'onto' history is already just the subdir, so
> # any parent we find there can be used verbatim
> debug " cache: $rev"
> - cache_set "$rev" "$rev"
> + cache_set_if_unset "$rev" "$rev"
> done
> fi
>
> @@ -798,7 +819,7 @@ cmd_split () {
> git rev-list --topo-order --skip=1 $mainline |
> while read rev
> do
> - cache_set "$rev" ""
> + cache_set_if_unset "$rev" ""
Okay. A quite interesting question now would be: are there any callers of
`cache_set` left? If so, why?
Thanks,
Dscho
> done || exit $?
> fi
>
> --
> gitgitgadget
>
>
@@ -9,12 +9,15 @@ then | |||
set -- -h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Tom,
On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:
> @@ -796,20 +810,60 @@ cmd_add_commit () {
> }
>
> cmd_map () {
> - oldrev="$1"
> - newrev="$2"
>
> - if test -z "$oldrev"
> + if test -z "$1"
I'd like to keep the nice name. Maybe if it is `local`, there is no longer
a need to replace `$oldrev` by `$1`?
> then
> die "You must provide a revision to map"
> fi
>
> + oldrev=$(git rev-parse --revs-only "$1") || exit $?
> + newrev=
> +
> + if test -n "$2"
> + then
> + newrev=$(git rev-parse --revs-only "$2") || exit $?
> + fi
> +
Would it not make more sense to validate the parameters before calling
`cmd_map`?
In any case, this strikes me like a subject for another commit.
Thanks,
Dscho
P.S.: I'll have to stop reviewing here for the moment, not sure whether
I'll come back to it later today or maybe tomorrow.
> cache_setup || exit $?
> cache_set "$oldrev" "$newrev"
>
> say "Mapped $oldrev => $newrev"
> }
>
> +cmd_ignore () {
> + revs=$(git rev-parse $default --revs-only "$@") || exit $?
> + ensure_single_rev $revs
> +
> + say "Ignoring $revs"
> +
> + cache_setup || exit $?
> +
> + git rev-list $revs |
> + while read rev
> + do
> + cache_set "$rev" ""
> + done
> +
> + echo "$revs" >>"$cachedir/processed"
> +}
> +
> +cmd_use () {
> + revs=$(git rev-parse $default --revs-only "$@") || exit $?
> + ensure_single_rev $revs
> +
> + say "Using existing subtree $revs"
> +
> + cache_setup || exit $?
> +
> + git rev-list $revs |
> + while read rev
> + do
> + cache_set "$rev" "$rev"
> + done
> +
> + echo "$revs" >>"$cachedir/processed"
> +}
> +
> cmd_split () {
> debug "Splitting $dir..."
> cache_setup || exit $?
> @@ -827,7 +881,7 @@ cmd_split () {
> done
> fi
>
> - unrevs="$(find_existing_splits "$dir" "$revs")"
> + unrevs="$(find_existing_splits "$dir" "$revs") $(exclude_processed_refs)"
>
> mainline="$(find_mainline_ref "$dir" "$revs")"
> if test -n "$mainline"
> --
> gitgitgadget
>
>
On the Git mailing list, Johannes Schindelin wrote (reply to this):
|
@@ -9,12 +9,15 @@ then | |||
set -- -h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi,
On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:
> From: Tom Clarkson <[email protected]>
>
> Prevent a mainline commit without $dir being treated as a subtree
> commit and pulling in the entire mainline history. Any valid subtree
> commit will have only valid subtree commits as parents, which will be
> unchanged by check_parents.
I feel like this is only half the picture because I have a hard time
stitching these two sentences together.
After studying the code and your patch a bit, it appears to me that
`process_split_commit()` calls `check_parents()` first, which will call
`process_split_commit()` for all as yet unmapped parents. So basically, it
recurses until it found a commit all of whose parents are already mapped,
then permeates that information all the way back.
Doesn't this cause serious issues with stack overflows and all for long
commit histories?
> Signed-off-by: Tom Clarkson <[email protected]>
> ---
> contrib/subtree/git-subtree.sh | 24 +++++++++++-------------
> 1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index e56621a986..fa6293b372 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -224,8 +224,6 @@ cache_setup () {
> fi
> mkdir -p "$cachedir" ||
> die "Can't create new cachedir: $cachedir"
> - mkdir -p "$cachedir/notree" ||
> - die "Can't create new cachedir: $cachedir/notree"
It might make sense to talk about this a bit in the commit message.
Essentially, you are replacing the `notree/<rev>` files by mapping `<rev>`
to the empty string.
This makes me wonder, again, whether the file system layout of the cache
can hold up to the demands. If a main project were to merge a subtree
with, say, 10 million commits, wouldn't that mean that `git subtree` would
now fill one directory with 10 million files? I cannot imagine that this
performs well, still.
> debug "Using cachedir: $cachedir" >&2
> }
>
> @@ -255,18 +253,11 @@ check_parents () {
> local indent=$(($2 + 1))
> for miss in $missed
> do
> - if ! test -r "$cachedir/notree/$miss"
> - then
> - debug " unprocessed parent commit: $miss ($indent)"
> - process_split_commit "$miss" "" "$indent"
> - fi
> + debug " unprocessed parent commit: $miss ($indent)"
> + process_split_commit "$miss" "" "$indent"
That makes sense to me, as the `missed` variable only contains as yet
unmapped commits, therefore we do not have to have an equivalent `test -r`
check.
Ciao,
Dscho
> done
> }
>
> -set_notree () {
> - echo "1" > "$cachedir/notree/$1"
> -}
> -
> cache_set () {
> oldrev="$1"
> newrev="$2"
> @@ -719,11 +710,18 @@ process_split_commit () {
> # vs. a mainline commit? Does it matter?
> if test -z "$tree"
> then
> - set_notree "$rev"
> if test -n "$newparents"
> then
> - cache_set "$rev" "$rev"
> + if test "$newparents" = "$parents"
> + then
> + # if all parents were subtrees, this can be a subtree commit
> + cache_set "$rev" "$rev"
> + else
> + # a mainline commit with tree missing is equivalent to the initial commit
> + cache_set "$rev" ""
> + fi
> else
> + # no parents with valid subtree mappings means a commit prior to subtree add
> cache_set "$rev" ""
> fi
> return
> --
> gitgitgadget
>
>
@@ -52,6 +52,12 @@ useful elsewhere, you can extract its entire history and publish | |||
that as its own git repository, without accidentally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Tom,
On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:
> From: Tom Clarkson <[email protected]>
>
> Signed-off-by: Tom Clarkson <[email protected]>
> ---
> contrib/subtree/git-subtree.txt | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/contrib/subtree/git-subtree.txt b/contrib/subtree/git-subtree.txt
> index 352deda69d..a5a76e8ce6 100644
> --- a/contrib/subtree/git-subtree.txt
> +++ b/contrib/subtree/git-subtree.txt
> @@ -52,6 +52,12 @@ useful elsewhere, you can extract its entire history and publish
> that as its own git repository, without accidentally
> intermingling the history of your application project.
>
> +Although the relationship between subtree and mainline commits is stored
As far as I can see, this is the first time the term "mainline commit" is
used in that file, and it has not really be defined what you mean by that.
I *guess* you are referring to commits in the main project that did not
come from any subtree project.
Maybe this can be described without needing a new term?
Ciao,
Dscho
> +in regular git history, it is also cached between subtree runs. In most
> +cases this is merely a performance improvement, but for projects with
> +large and complex histories the cache can be manipulated directly
> +with the use, ignore and map commands.
> +
> [TIP]
> In order to keep your commit messages clean, we recommend that
> people split their commits between the subtrees and the main
> @@ -120,6 +126,21 @@ and friends will work as expected.
> Note that if you use '--squash' when you merge, you should usually not
> just '--rejoin' when you split.
>
> +ignore::
> + Mark a commit and all of its history as irrelevant to subtree split.
> + In most cases this would be handled automatically based on metadata
> + from subtree join commits. Intended for improving performance on
> + extremely large repos and excluding complex history that turns out
> + to be otherwise problematic.
> +
> +use::
> + Mark a commit and all of its history as part of an existing subtree.
> + In normal circumstances this would be handled based on the metadata
> + from the subtree join commit. Similar to the --onto option of split.
> +
> +map::
> + Manually override the normal output of split for a particular commit.
> + Extreme flexibility for advanced troubleshooting purposes only.
>
> OPTIONS
> -------
> @@ -142,6 +163,9 @@ OPTIONS
> This option is only valid for add, merge and pull (unsure).
> Specify <message> as the commit message for the merge commit.
>
> +--clear-cache::
> + Reset the subtree cache and recalculate all subtree mappings from the
> + commit history
>
> OPTIONS FOR add, merge, push, pull
> ----------------------------------
> --
> gitgitgadget
>
On the Git mailing list, Johannes Schindelin wrote (reply to this):
|
(Unfortunately my git mailing list subscription is on my inbox from my previous job, so I can't reply directly just yet, though I'm working on it) @tqc Do you know what's blocking this from landing? Is it something that others can help with? For context, we're using |
You don't need to be subscribed to post. You only need to be careful to send plain text mail, not even an alternate HTML or it will bounce. For details, see: https://github.com/gitgitgadget/gitgitgadget/wiki/ReplyToThis |
I was experiencing occasional segfaults on It's fine that I temporarily take point with my patched git to manage anything where we need to perform splits, but I'd like my teammates to be able learn how to manage subtrees as well without bumping into this bug (which almost turned us away from subtrees altogether). |
@jzelinskie this is currently stuck in @tqc's court (I provided the prerequisite review on the Git mailing list, but haven't seen any activity in response). |
For what it's worth. I've just used the changes on this PR to get subtree data from our repository and push it back upstream. This version is much faster and actually delivers, the original in 2.32 was just failing with |
I also used this to detangle my company's changes to a library we had stored in a complicated monorepo. Worked great, where mainline |
I reported the same issue described in original thread which I found only after. I've used git-filter-repo as a work around and it works great and way faster then original subtree split. Haven't compared with implementation from here though. |
Doesn't look like @tqc is still following here, so I guess now the question is whether someone else is willing and able to pick this up? None of the subtrees used in the Rust repository would have any chance of actually working without this patch. So all the people maintaining those subtrees have had to use a forked git for years, which is not a great situation. |
@RalfJung This is also my guess. However, this patch series has languished for so long that it will be quite challenging: I just gave this a try, and after an hour of rebasing and resolving merge conflicts and noticing that there are no added regression tests to build confidence in the patches, I have to shift my attention elsewhere. If anyone wants to pick up from where I left off: here is the rebased branch, and here are the reviews I provided on the Git mailing list. |
Fixes several issues that could occur when running subtree split on large repos with more complex history.
A merge commit could bypass the known start point of the subtree, which would cause the entire history to be processed recursively, leading to a stack overflow / segfault after reading a few hundred commits. Older commits are now explicitly recorded as irrelevant so that the recursive process can terminate on any mainline commit rather than only on subtree joins and initial commits.
It is possible for a repo to contain subtrees that lack the metadata that is usually present in add/join commit messages (git-svn at least can produce such a structure). The new use/ignore/map commands allow the user to provide that information for any problematic commits.
A mainline commit that does not contain the subtree folder could be erroneously identified as a subtree commit, which would add the entire mainline history to the subtree. Commits will now only be used as is if all their parents are already identified as subtree commits. While the new code can still be tripped up by unusual folder structures, the completely unambiguous solution turned out to involve a significant performance penalty, and the new ignore / use commands provide a workaround for that scenario.
Cc: Avery Pennarun [email protected]
cc: Ed Maste [email protected]
cc: Johannes Schindelin [email protected]