Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --no-rewrite flag to migrate import command #3029

Merged
merged 16 commits into from
Jun 11, 2018

Conversation

PastelMobileSuit
Copy link
Contributor

This PR adds a new --no-rewrite flag to the migrate import command. When this flag is given, migrate import takes a list of files as arguments and imports these files into Git LFS. It does this by rewriting the git tree in a new commit, without modifying git history.

Some of the code in this PR was based off of code previously written by @ttaylorr. Also, a big thank you to @ttaylorr for pairing with me on parts of this PR!

Allow `migrate import` command to create a new commit without
modifying git history when using the --no-rewrite flag
Allow the `migrate import` command to convert blobs to git-lfs
pointers by rewriting the tree in a new commit without touching
git history
Change the `migrate import` command to take a list of files to be
imported when the `--no-rewrite` flag is used.
Add a new test script, test-migrate-import-no-rewrite.sh,
that runs a series of integration tests using the `migrate import`
command and the `--no-rewrite` flag
Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to Git LFS! This is a stellar first contribution and I am so excited to land these changes in v2.5.0. I left a handful of notes on various things, some little and some not.

Let me know if you have any thoughts/questions:

@@ -287,6 +294,8 @@ func init() {
importCmd := NewCommand("import", migrateImportCommand)
importCmd.Flags().BoolVar(&migrateVerbose, "verbose", false, "Verbose logging")
importCmd.Flags().StringVar(&objectMapFilePath, "object-map", "", "Object map file")
importCmd.Flags().BoolVar(&migrateNoRewrite, "no-rewrite", false, "Add new history without rewriting previous")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these are great short descriptions of what the --no-rewrite and --message flags do, which is exactly what is appropriate for these.

Could you also add a longer explanation of what these flags are in git-lfs-migrate(1)? The docs/man tree is used to automatically generate commands/mancontent.go, and this in turn is what is shown when you run git lfs migrate --help, or git lfs help migrate, for instance.

The existing prose should be a good place to start in terms of what level of detail is appropriate.

@@ -29,6 +30,67 @@ func migrateImportCommand(cmd *cobra.Command, args []string) {
}
defer db.Close()

if migrateNoRewrite {
if len(args) == 0 {
ExitWithError(errors.Errorf("fatal: the migrate import command requires a list of files to import when using the --no-rewrite flag"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great; I think dying here is the right thing to do. Let's shorten this message a little bit while we're here. For instance, the phrase "the migrate import command" might be unnecessary, since this will be the first line of output after a user runs git lfs migrate import.

What about: "fatal: expected one or more file(s) with --no-rewrite"?

ExitWithError(errors.Wrap(err, "fatal: unable to find current reference"))
}

sha, _ := hex.DecodeString(ref.Sha)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great; throwing this error away is reasonable since we trust git.CurrentRef() to always give back a valid SHA-256 ref.Sha.


filter := git.GetAttributeFilter(cfg.LocalWorkingDir(), cfg.LocalGitDir())
if len(filter.Include()) == 0 && len(filter.Exclude()) == 0 {
ExitWithError(errors.Errorf("fatal: no git lfs filters setup in .gitattributes"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use "found" instead of "setup".

ExitWithError(errors.Errorf("fatal: no git lfs filters setup in .gitattributes"))
}

for _, file := range args {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one interesting consequence of rewriteTree() processing one file at a time is that there are some penalties that we'll pay if we get part of the way through this loop. In particular, if there are m arguments where m is large, and the m-1th of them doesn't match a pattern in .gitattributes, we'll have created a lot of tree objects that are dangling in the object database.

So, I think that we could run this loop twice, once to check that all given filenames match at least one pattern, and then once more to call rewriteTree(), which would mean we spend more time here. I think that's an OK tradeoff for not creating more objects than we need.

Of course, we could also look into ways to make rewriteTree() take more than one file and therefore create less dangling tree objects, but I'll leave it up to you if you want to go down that path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, thanks! I didn't think about the possibility of dangling objects.

I agree with you on running the loop twice and think that's the best option, since we won't be able to avoid checking files against the .gitattributes filters.

}
}

// findEntry searches a tree for the desired entry, and returns the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; This line is wrapped at fewer than 80 characters.

git/attribs.go Outdated
patterns := make([]filepathfilter.Pattern, len(paths))

for i, path := range paths {
patterns[i] = filepathfilter.NewPattern(path.Path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we want to change this later in a way that takes away the 1-1 correspondence between paths and patterns, we can minimize that diff by calling append() instead of using the slicing operator []. In order to make that work as-expected, you'll want to change L108 to initialize patterns with a capacity of len(paths), instead of a len of len(paths).

# - Commit 'A' has 120, in a.txt, and a corresponding entry in .gitattributes. There is also
# 140 in a.md, with no corresponding entry in .gitattributes.
# It also has 140 in subtree/a.md, and a corresponding entry in subtree/.gitattributes
setup_local_branch_with_nested_gitattrs() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ Awesome!

assert_local_object "$txt_oid" "120"

# Ensure the git history remained the same
new_commit_oid="$(git rev-parse HEAD~1)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ Nice.

setup_multiple_local_branches

# Ensure command fails if no .gitattributes files are present
[ !"$(git lfs migrate import --no-rewrite *.txt *.md)" ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could check that the error message this invocation gives is what we expect it to be? I think one way to do this would be to write something of the ilk:

git lfs migrate import --no-rewrite *.txt *.md 2>&1 | tee migrate.log
if [ ${PIPESTATUS[0]} -eq 0 ]; then
  echo >&2 "fatal: expected git lfs migrate import --no-rewrite to fail, didn't"
fi
grep "..." migrate.log

Add documentation about the `--no-rewrite` flag to the man page
for the `migrate` command
* Update error message text to be more concise/descriptive
* Fix casing/formatting issues
* Inline variables to reduce scope
* Cache and reuse GitFilter instead of creating one every time
  a clean is performed
With the `--no-rewrite` flag to the `migrate import`, check all
provided files match `.gitattributes` filters before any tree
rewrite operations.
Append elements to the array rather than assigning them to
indices in GetAttributeFilter
Update `--no-rewrite` error messages to correctly case Git LFS
and use the term `filters`
In the `--no-rewrite` tests that cover error cases, test that
the provided error message is as expected
@PastelMobileSuit
Copy link
Contributor Author

@ttaylorr Thank you for the review! I've gone ahead and pushed up some new commits addressing your feedback

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @PastelMobileSuit! I left a few minor nit-picks, but besides that I think we are ready to go.

root := commit.TreeID

filter := git.GetAttributeFilter(cfg.LocalWorkingDir(), cfg.LocalGitDir())
if len(filter.Include()) == 0 && len(filter.Exclude()) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid checking len(filter.Exclude())? My thought is that if filter.Include() is empty, but filter.Exclude() is not, we'll get past this check, but not pick up any files.


# Ensure command fails if no .gitattributes files are present
git lfs migrate import --no-rewrite *.txt *.md 2>&1 | tee migrate.log
if [ ${PIPESTATUS[0]} -eq 0 ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we exit 1 here?

# Failure should occur when trying to import a.md as no entry exists in
# top-level .gitattributes file
git lfs migrate import --no-rewrite a.md 2>&1 | tee migrate.log
if [ ${PIPESTATUS[0]} -eq 0 ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we exit 1 here as well?

exit 1
fi
)
end_test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing trailing LF.

a different argument list, specialized options will become available, and the core
`migrate` options will be ignored. See [IMPORT (NO REWRITE)].

If `--no-rewrite` is not provided and `--include` or `--exclude` (`-I`, `-X`, respectively) are given, the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it looks like a few lines in this file are longer than 80 columns. Can you make sure that the wrapping is set properly for .ronn files?


The `--no-rewrite` sub-mode supports the following options and arguments:

* `-m` <message> `--message`=<message>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backticks should wrap from the first - to the last > in each example.

The `import` mode has a special sub-mode enabled by the `--no-rewrite` flag. This sub-mode
will migrate large objects to pointers as in the base `import` mode, but will do so in a new commit
without rewriting Git history. When using this sub-mode, the base `migrate` options,
such as `--include-ref`, will be ignored, as will those for the base `import` mode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ignoring --include-ref is totally OK here, but we should perhaps add a note in the documentation that all interested branches will have to merge new commits generated by git lfs migrate import --no-rewrite if they also want to take new commits in.

You can also migrate files without modifying the existing history of your respoitory:

```
# Without a specified commit message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move these comments out of the code block, and replace the shell with 4 spaces, and then git lfs migrate ... (and without the $).

Ensure `/` is used as the path separator when parsing Git LFS
filters in nested directories.
git/attribs.go Outdated
@@ -73,7 +73,7 @@ func GetAttributePaths(workingDir, gitDir string) []AttributePath {
fields := strings.Fields(line)
pattern := fields[0]
if len(reldir) > 0 {
pattern = filepath.Join(reldir, pattern)
pattern = filepath.ToSlash(filepath.Join(reldir, pattern))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The necessity of this change was prompted by a consistent Windows test failure I was seeing on a VM. Every time I ran the nested .gitattributes test, I saw a failure.

It turns out the problem is that filepath.Join() uses \ as the separator character on Windows, rather than /. As a result, a *.txt pattern in a folder subtree would be parsed as subtree\*.txt, and instead of acting as a path separator, the \ would escape the * character, so the pattern would then only match a file with the name subtree*.txt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great fix! One nit is that filepath.Join calls filepath.Clean (see [1]), which can have unintended consequences with certain pattern types (i.e., escape sequences are removed, etc.)

How about we shorten this to:

pattern = fmt.Sprintf("%s/%s", reldir, pattern)

or:

pattern = strings.Join([]string{reldir, pattern}, "/")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttaylorr Hey! I believe this code was already being used as-is with the track command, so I think that was already removing escape characters from patterns. I did some investigation to confirm this, and git lfs track "*\*.mov" gets converted to */*.mov in the .gitattributes file, and if you manually change it to *\*.mov, that's not matched by track on a file with the name a*.mov.

Copy link
Contributor

@ttaylorr ttaylorr Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PastelMobileSuit Oh! That makes sense; it looks like as a historical artifact we don't disambiguate between *\*.dat meaning "any directory name and any filename, ending in '.dat'" (on Unix) and "any filename, ending with '*.dat' (literal '*')" (on Windows).

I think it's safer to leave this behavior as-is, in which case it makes sense to me to keep using filepath.ToSlash(filepath.Join(...)). Merge away!

* Fix various formatting issues
* Add note about `--no-rewrite` commits needing to be merged into
other branches
* Exit `migrate import --no-rewrite` if attributes filter
has no elements in its Include() list
* In `--no-rewrite` tests checking errors, fail out of test if
the command doesn't fail
Shorten the repo/folder names in the nested .gitattributes test
to prevent AppVeyor failure
Convert path separators to `/` when creating a pattern from a
path to a .gitattributes filter instead of on creating the path
to the filter
@PastelMobileSuit
Copy link
Contributor Author

Okay I've pushed up some additional commits to address feedback and have finally fixed all the AppVeyor test failures, so this should be ready to go!

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny change requested, but everything looks great! Well done.

@ttaylorr ttaylorr merged commit caf1e9a into git-lfs:master Jun 11, 2018
@ttaylorr ttaylorr added this to the v2.5.0 milestone Jun 11, 2018
@ghost
Copy link

ghost commented Sep 17, 2018

@PastelMobileSuit @ttaylorr
Hi!
I am not sure I grasp the use-case for the new "--no-rewrite" feature of git lfs import:
I was looking for a way to migrate our huge repository to lfs withouth destroying all of our (numerous) references to commit SHAs. I came to the conclusion that this is not possible and we will need to do a clean start with a rewritten history.
Then I found the announcement of the "--no-rewrite" feature and thought it may provide a way of moving all required files to lfs without changing the history (maybe by leaving the SHA intact although the content of the commits has changed). But this does not seem to be the use-case for this feature.

Can you please elaborate on when to use this feature? Sorry if I am missing something obvious, but the manual for migrate did not help me understand it and there is no further documentation that I can find.

Thanks a lot!

@ttaylorr
Copy link
Contributor

ttaylorr commented Sep 17, 2018 via email

@ghost
Copy link

ghost commented Sep 17, 2018

Hi @ttaylorr, thank you for the explanation.
I am still not sure if i got it right though:

  1. So basically "--no-rewrite" only works with the tip of a branch, by replacing existing files passed as arguments with LFS pointers and commiting that as a new change? Isn't this the same as manually removing the files from git and simply re-adding them under LFS control?

  2. As I understand it now the "--no-rewrite" option does not help with getting rid of large files referenced somewhere in the git history, because this will always require a rewrite of the history, correct?

@ttaylorr
Copy link
Contributor

ttaylorr commented Sep 17, 2018 via email

@ghost
Copy link

ghost commented Sep 18, 2018

Thanks @ttaylorr for clearing it up for me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants