-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --no-rewrite
flag to migrate import
command
#3029
Conversation
Allow `migrate import` command to create a new commit without modifying git history when using the --no-rewrite flag
Allow the `migrate import` command to convert blobs to git-lfs pointers by rewriting the tree in a new commit without touching git history
Change the `migrate import` command to take a list of files to be imported when the `--no-rewrite` flag is used.
Add a new test script, test-migrate-import-no-rewrite.sh, that runs a series of integration tests using the `migrate import` command and the `--no-rewrite` flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to Git LFS! This is a stellar first contribution and I am so excited to land these changes in v2.5.0. I left a handful of notes on various things, some little and some not.
Let me know if you have any thoughts/questions:
@@ -287,6 +294,8 @@ func init() { | |||
importCmd := NewCommand("import", migrateImportCommand) | |||
importCmd.Flags().BoolVar(&migrateVerbose, "verbose", false, "Verbose logging") | |||
importCmd.Flags().StringVar(&objectMapFilePath, "object-map", "", "Object map file") | |||
importCmd.Flags().BoolVar(&migrateNoRewrite, "no-rewrite", false, "Add new history without rewriting previous") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that these are great short descriptions of what the --no-rewrite
and --message
flags do, which is exactly what is appropriate for these.
Could you also add a longer explanation of what these flags are in git-lfs-migrate(1)
? The docs/man
tree is used to automatically generate commands/mancontent.go
, and this in turn is what is shown when you run git lfs migrate --help
, or git lfs help migrate
, for instance.
The existing prose should be a good place to start in terms of what level of detail is appropriate.
commands/command_migrate_import.go
Outdated
@@ -29,6 +30,67 @@ func migrateImportCommand(cmd *cobra.Command, args []string) { | |||
} | |||
defer db.Close() | |||
|
|||
if migrateNoRewrite { | |||
if len(args) == 0 { | |||
ExitWithError(errors.Errorf("fatal: the migrate import command requires a list of files to import when using the --no-rewrite flag")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great; I think dying here is the right thing to do. Let's shorten this message a little bit while we're here. For instance, the phrase "the migrate import command" might be unnecessary, since this will be the first line of output after a user runs git lfs migrate import
.
What about: "fatal: expected one or more file(s) with --no-rewrite"?
ExitWithError(errors.Wrap(err, "fatal: unable to find current reference")) | ||
} | ||
|
||
sha, _ := hex.DecodeString(ref.Sha) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great; throwing this error away is reasonable since we trust git.CurrentRef()
to always give back a valid SHA-256 ref.Sha
.
commands/command_migrate_import.go
Outdated
|
||
filter := git.GetAttributeFilter(cfg.LocalWorkingDir(), cfg.LocalGitDir()) | ||
if len(filter.Include()) == 0 && len(filter.Exclude()) == 0 { | ||
ExitWithError(errors.Errorf("fatal: no git lfs filters setup in .gitattributes")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use "found" instead of "setup".
commands/command_migrate_import.go
Outdated
ExitWithError(errors.Errorf("fatal: no git lfs filters setup in .gitattributes")) | ||
} | ||
|
||
for _, file := range args { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one interesting consequence of rewriteTree()
processing one file at a time is that there are some penalties that we'll pay if we get part of the way through this loop. In particular, if there are m
arguments where m
is large, and the m-1
th of them doesn't match a pattern in .gitattributes
, we'll have created a lot of tree objects that are dangling in the object database.
So, I think that we could run this loop twice, once to check that all given filenames match at least one pattern, and then once more to call rewriteTree()
, which would mean we spend more time here. I think that's an OK tradeoff for not creating more objects than we need.
Of course, we could also look into ways to make rewriteTree()
take more than one file and therefore create less dangling tree objects, but I'll leave it up to you if you want to go down that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, thanks! I didn't think about the possibility of dangling objects.
I agree with you on running the loop twice and think that's the best option, since we won't be able to avoid checking files against the .gitattributes
filters.
commands/command_migrate_import.go
Outdated
} | ||
} | ||
|
||
// findEntry searches a tree for the desired entry, and returns the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; This line is wrapped at fewer than 80 characters.
git/attribs.go
Outdated
patterns := make([]filepathfilter.Pattern, len(paths)) | ||
|
||
for i, path := range paths { | ||
patterns[i] = filepathfilter.NewPattern(path.Path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case we want to change this later in a way that takes away the 1-1 correspondence between paths and patterns, we can minimize that diff by calling append()
instead of using the slicing operator []
. In order to make that work as-expected, you'll want to change L108 to initialize patterns
with a capacity of len(paths)
, instead of a len
of len(paths)
.
# - Commit 'A' has 120, in a.txt, and a corresponding entry in .gitattributes. There is also | ||
# 140 in a.md, with no corresponding entry in .gitattributes. | ||
# It also has 140 in subtree/a.md, and a corresponding entry in subtree/.gitattributes | ||
setup_local_branch_with_nested_gitattrs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✨ Awesome!
assert_local_object "$txt_oid" "120" | ||
|
||
# Ensure the git history remained the same | ||
new_commit_oid="$(git rev-parse HEAD~1)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✨ Nice.
setup_multiple_local_branches | ||
|
||
# Ensure command fails if no .gitattributes files are present | ||
[ !"$(git lfs migrate import --no-rewrite *.txt *.md)" ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could check that the error message this invocation gives is what we expect it to be? I think one way to do this would be to write something of the ilk:
git lfs migrate import --no-rewrite *.txt *.md 2>&1 | tee migrate.log
if [ ${PIPESTATUS[0]} -eq 0 ]; then
echo >&2 "fatal: expected git lfs migrate import --no-rewrite to fail, didn't"
fi
grep "..." migrate.log
Add documentation about the `--no-rewrite` flag to the man page for the `migrate` command
* Update error message text to be more concise/descriptive * Fix casing/formatting issues * Inline variables to reduce scope * Cache and reuse GitFilter instead of creating one every time a clean is performed
With the `--no-rewrite` flag to the `migrate import`, check all provided files match `.gitattributes` filters before any tree rewrite operations.
Append elements to the array rather than assigning them to indices in GetAttributeFilter
Update `--no-rewrite` error messages to correctly case Git LFS and use the term `filters`
In the `--no-rewrite` tests that cover error cases, test that the provided error message is as expected
@ttaylorr Thank you for the review! I've gone ahead and pushed up some new commits addressing your feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, @PastelMobileSuit! I left a few minor nit-picks, but besides that I think we are ready to go.
commands/command_migrate_import.go
Outdated
root := commit.TreeID | ||
|
||
filter := git.GetAttributeFilter(cfg.LocalWorkingDir(), cfg.LocalGitDir()) | ||
if len(filter.Include()) == 0 && len(filter.Exclude()) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid checking len(filter.Exclude())
? My thought is that if filter.Include()
is empty, but filter.Exclude()
is not, we'll get past this check, but not pick up any files.
|
||
# Ensure command fails if no .gitattributes files are present | ||
git lfs migrate import --no-rewrite *.txt *.md 2>&1 | tee migrate.log | ||
if [ ${PIPESTATUS[0]} -eq 0 ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we exit 1
here?
# Failure should occur when trying to import a.md as no entry exists in | ||
# top-level .gitattributes file | ||
git lfs migrate import --no-rewrite a.md 2>&1 | tee migrate.log | ||
if [ ${PIPESTATUS[0]} -eq 0 ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we exit 1
here as well?
exit 1 | ||
fi | ||
) | ||
end_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: missing trailing LF.
docs/man/git-lfs-migrate.1.ronn
Outdated
a different argument list, specialized options will become available, and the core | ||
`migrate` options will be ignored. See [IMPORT (NO REWRITE)]. | ||
|
||
If `--no-rewrite` is not provided and `--include` or `--exclude` (`-I`, `-X`, respectively) are given, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it looks like a few lines in this file are longer than 80 columns. Can you make sure that the wrapping is set properly for .ronn
files?
docs/man/git-lfs-migrate.1.ronn
Outdated
|
||
The `--no-rewrite` sub-mode supports the following options and arguments: | ||
|
||
* `-m` <message> `--message`=<message> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the backticks should wrap from the first -
to the last >
in each example.
docs/man/git-lfs-migrate.1.ronn
Outdated
The `import` mode has a special sub-mode enabled by the `--no-rewrite` flag. This sub-mode | ||
will migrate large objects to pointers as in the base `import` mode, but will do so in a new commit | ||
without rewriting Git history. When using this sub-mode, the base `migrate` options, | ||
such as `--include-ref`, will be ignored, as will those for the base `import` mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that ignoring --include-ref
is totally OK here, but we should perhaps add a note in the documentation that all interested branches will have to merge new commits generated by git lfs migrate import --no-rewrite
if they also want to take new commits in.
docs/man/git-lfs-migrate.1.ronn
Outdated
You can also migrate files without modifying the existing history of your respoitory: | ||
|
||
``` | ||
# Without a specified commit message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move these comments out of the code block, and replace the shell with 4 spaces, and then git lfs migrate ...
(and without the $
).
Ensure `/` is used as the path separator when parsing Git LFS filters in nested directories.
git/attribs.go
Outdated
@@ -73,7 +73,7 @@ func GetAttributePaths(workingDir, gitDir string) []AttributePath { | |||
fields := strings.Fields(line) | |||
pattern := fields[0] | |||
if len(reldir) > 0 { | |||
pattern = filepath.Join(reldir, pattern) | |||
pattern = filepath.ToSlash(filepath.Join(reldir, pattern)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The necessity of this change was prompted by a consistent Windows test failure I was seeing on a VM. Every time I ran the nested .gitattributes
test, I saw a failure.
It turns out the problem is that filepath.Join()
uses \
as the separator character on Windows, rather than /
. As a result, a *.txt
pattern in a folder subtree
would be parsed as subtree\*.txt
, and instead of acting as a path separator, the \
would escape the *
character, so the pattern would then only match a file with the name subtree*.txt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great fix! One nit is that filepath.Join
calls filepath.Clean
(see [1]), which can have unintended consequences with certain pattern types (i.e., escape sequences are removed, etc.)
How about we shorten this to:
pattern = fmt.Sprintf("%s/%s", reldir, pattern)
or:
pattern = strings.Join([]string{reldir, pattern}, "/")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ttaylorr Hey! I believe this code was already being used as-is with the track
command, so I think that was already removing escape characters from patterns. I did some investigation to confirm this, and git lfs track "*\*.mov"
gets converted to */*.mov
in the .gitattributes
file, and if you manually change it to *\*.mov
, that's not matched by track
on a file with the name a*.mov
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PastelMobileSuit Oh! That makes sense; it looks like as a historical artifact we don't disambiguate between *\*.dat
meaning "any directory name and any filename, ending in '.dat'" (on Unix) and "any filename, ending with '*.dat' (literal '*')" (on Windows).
I think it's safer to leave this behavior as-is, in which case it makes sense to me to keep using filepath.ToSlash(filepath.Join(...))
. Merge away!
* Fix various formatting issues * Add note about `--no-rewrite` commits needing to be merged into other branches
* Exit `migrate import --no-rewrite` if attributes filter has no elements in its Include() list * In `--no-rewrite` tests checking errors, fail out of test if the command doesn't fail
Shorten the repo/folder names in the nested .gitattributes test to prevent AppVeyor failure
Convert path separators to `/` when creating a pattern from a path to a .gitattributes filter instead of on creating the path to the filter
Okay I've pushed up some additional commits to address feedback and have finally fixed all the AppVeyor test failures, so this should be ready to go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One tiny change requested, but everything looks great! Well done.
6cb4b36
to
7caa122
Compare
@PastelMobileSuit @ttaylorr Can you please elaborate on when to use this feature? Sorry if I am missing something obvious, but the manual for migrate did not help me understand it and there is no further documentation that I can find. Thanks a lot! |
Hi, @hoeler. Thanks for writing in, and I'd be more than happy to answer
your question. It seems to me that the central point is:
Then I found the announcement of the "--no-rewrite" feature and
thought it may provide a way of moving all required files to lfs
without changing the history (maybe by leaving the SHA intact although
the content of the commits has changed).
Which isn't quite the case. The '--no-rewrite' flag works by _adding_
new commits, instead of _rewriting_ existing ones.
The SHA-1 of a commit is dependent upon the contents of the files in
that commit's root tree, the parents, message, and etc., of the commit
itself. So, there is no way to "rewrite" the contents of a commit
without also changing its SHA-1.
'--no-rewrite' is used in a case where (1) you have files in Git that
should have been committed with Git LFS, but weren't, and (2) those
files are under 100 MiB (thus, don't prevent pushing to github.com),
and/or (3) history rewriting is not an option.
Per (3), when you run 'git lfs migrate <import | export>', you have to
_also_ ensure that each clone of the repository has (1) fetched and (2)
reset their local references to match the remote's. When running 'git
lfs migrate import' with '--no-rewrite', the repository's cloners need
only run 'git pull' to receive the new changes.
The central difference between the two from the standpoint of "what are
my references?" is: 'git lfs migrate import' (with or without
'--no-rewrite') changes your references, the difference is whether or
not the old values are still valid locations in your repository's line
of history or not.
|
Hi @ttaylorr, thank you for the explanation.
|
1) So basically "--no-rewrite" only works with the tip of a branch, by
replacing existing files passed as arguments with LFS pointers and
commiting that as a new change? Isn't this the same as manually
removing the files from git and simply re-adding them under LFS
control?
I think that the gist of that is correct, but 'git lfs migrate import
--no-rewrite' has two attributes that make it different than 'git rm
--cached', which are:
1. Change each file in a separate commit, and
2. Perform the change in Git LFS, so that it will work even if 'git
lfs install' hasn't been run.
2) As I understand it now the "--no-rewrite" option does not help with
getting rid of large files referenced somewhere in the git history,
because this will _always_ require a rewrite of the history, correct?
Exactly.
|
Thanks @ttaylorr for clearing it up for me! |
This PR adds a new
--no-rewrite
flag to themigrate import
command. When this flag is given,migrate import
takes a list of files as arguments and imports these files into Git LFS. It does this by rewriting the git tree in a new commit, without modifying git history.Some of the code in this PR was based off of code previously written by @ttaylorr. Also, a big thank you to @ttaylorr for pairing with me on parts of this PR!