Add link traversal option #96

Merged 1 commit on Dec 7, 2018

Conversation

Contributor

@djdv djdv commented Apr 9, 2018

This allows us to follow filesystem links (symlinks, junctions, etc.).
The logic should be good, but I need suggestions on what the flag(s) should be. traverse-links is a placeholder.

This is needed in go-ipfs so that we can traverse the link and add the target instead of adding the link itself.

@ghost ghost assigned djdv Apr 9, 2018
@ghost ghost added the status/in-progress In progress label Apr 9, 2018
@djdv djdv force-pushed the feat/resolve-links branch 2 times, most recently from e7f2f8b to d04eb43 Compare April 9, 2018 14:54
@djdv djdv requested a review from keks April 9, 2018 17:19
Member

@Stebalien Stebalien left a comment

Nice! I don't have many opinions on the flag (maybe --dereference and -L like in the cp command?).

cli/parse.go Outdated
@@ -475,6 +480,13 @@ func appendFile(fpath string, argDef *cmdkit.Argument, recursive, hidden bool) (

fpath = filepath.ToSlash(filepath.Clean(fpath))

if travLinks {
fpath, err := filepath.EvalSymlinks(fpath)
Member

So, calling that function for every file added is going to be slow. It recursively evaluates symlinks for every component in the path.

I'd write this as:

stat, err := os.Lstat(fpath)
if err != nil {...}
if travLinks {
    for stat.Mode()&os.ModeSymlink != 0 {
        // read link
        stat, err = ...
        // handle error.
    }
}
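
The loop suggested in this comment could be fleshed out roughly as follows. This is only a sketch under stated assumptions: `resolveArg`, `maxHops`, and `demo` are names made up for illustration and are not part of the PR, and the hop limit is an arbitrary guard against link loops. The symlink check is written as a mask against `os.ModeSymlink`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// maxHops is an arbitrary limit for this sketch; it stops link loops.
const maxHops = 32

// resolveArg (hypothetical helper) follows fpath while its final component
// is a symlink and returns the resolved path and its FileInfo.
func resolveArg(fpath string) (string, os.FileInfo, error) {
	stat, err := os.Lstat(fpath)
	if err != nil {
		return "", nil, err
	}
	for hops := 0; stat.Mode()&os.ModeSymlink != 0; hops++ {
		if hops >= maxHops {
			return "", nil, errors.New("too many levels of symbolic links")
		}
		target, err := os.Readlink(fpath)
		if err != nil {
			return "", nil, err
		}
		// A relative target is interpreted relative to the link's directory.
		if !filepath.IsAbs(target) {
			target = filepath.Join(filepath.Dir(fpath), target)
		}
		fpath = target
		if stat, err = os.Lstat(fpath); err != nil {
			return "", nil, err
		}
	}
	return fpath, stat, nil
}

// demo builds link2 -> link1 -> data.txt in a temp dir and resolves link2.
func demo() (string, int64, error) {
	dir, err := os.MkdirTemp("", "resolve-demo")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "data.txt"), []byte("hello"), 0o644); err != nil {
		return "", 0, err
	}
	if err := os.Symlink("data.txt", filepath.Join(dir, "link1")); err != nil {
		return "", 0, err
	}
	if err := os.Symlink("link1", filepath.Join(dir, "link2")); err != nil {
		return "", 0, err
	}
	path, stat, err := resolveArg(filepath.Join(dir, "link2"))
	if err != nil {
		return "", 0, err
	}
	return filepath.Base(path), stat.Size(), nil
}

func main() {
	fmt.Println(demo())
}
```

Unlike `filepath.EvalSymlinks`, this only inspects the final path component, which is what makes it cheaper per file.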

Member

Also, as far as I can tell, this won't conserve the filename (the first argument to NewSerialFile below).

Contributor

I don't know about the speed, but I agree that the filename probably isn't preserved. @djdv could you add tests that verify that the code works as intended?

@djdv
Contributor Author

djdv commented Apr 20, 2018

Sorry for the delay, I had no notification on this for some reason.

@Stebalien, @keks
Good call on dropping EvalSymlinks.

--dereference and -L

cp seems to dereference ALL links. With this patch, we only traverse the arguments provided, not their children.

I'd be interested in adding -L in addition to this, or coalescing the 2 into 1 by adding an optional depth parameter if we have the ability to do so.
i.e.
ipfs add -L would traverse all links like cp
ipfs add -L=1 would traverse only the base argument, as it does now.
ipfs add -L=2 would traverse the base root and 1 level under that; symlinks at a depth of 2 will be added AS links.
...
Otherwise we may want to use a different parameter name.
Thoughts on this?

could you add tests

For sure, I'll try and come up with some automated tests after we decide on the above^

@Stebalien
Member

cp seems to dereference ALL links. With this patch, we only traverse the arguments provided, not their children.

Ah. For some reason I thought this was traversing all links. This would be the cp -H option. Given that, ignore my comment about EvalSymlinks being slow. As top-level files, I believe we have to recursively eval the symlinks. I assumed that the EvalSymlinks call on . evaluated the top-level directory and all the other calls were evaluating symlinks on children.

Although... actually, I believe the fast and correct way to do this is to simply call Stat instead of Lstat here with no manual link traversal. If we pass a stat object to NewSerialFile that says the file is either a directory or a regular file, NewSerialFile will treat it as such (and follow any symlinks).
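
The difference between Stat and Lstat that this comment relies on can be seen in a small standalone program. It is a sketch only: `kind` and `statBoth` are illustrative names, and nothing here touches NewSerialFile itself.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// kind classifies a FileInfo as "symlink", "dir" or "file".
func kind(fi os.FileInfo) string {
	switch {
	case fi.Mode()&os.ModeSymlink != 0:
		return "symlink"
	case fi.IsDir():
		return "dir"
	default:
		return "file"
	}
}

// statBoth creates sym-to-real -> all-real in a temp dir and reports how
// Lstat and Stat each classify the link.
func statBoth() (string, string, error) {
	root, err := os.MkdirTemp("", "stat-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "all-real"), 0o755); err != nil {
		return "", "", err
	}
	link := filepath.Join(root, "sym-to-real")
	if err := os.Symlink("all-real", link); err != nil {
		return "", "", err
	}
	li, err := os.Lstat(link) // describes the link itself
	if err != nil {
		return "", "", err
	}
	si, err := os.Stat(link) // follows the link to its target
	if err != nil {
		return "", "", err
	}
	return kind(li), kind(si), nil
}

func main() {
	fmt.Println(statBoth())
}
```

On a Unix-like system Lstat reports the link as a symlink while Stat reports the directory it points to, so a caller that receives the Stat result sees a plain directory.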


I'd add -L in a different PR and I wouldn't allow levels. I'd just have -L evaluate all symlinks recursively (although deduplicating identical subtrees would be a neat trick).

Contributor

@keks keks left a comment

Since this PR only dereferences links in the command line and not those inside the tree, I'd change the flags to -H (as in cp, ls, chown, ...) and maybe --dereference-command-line (as in ls).

@djdv djdv force-pushed the feat/resolve-links branch from b202622 to 4160477 Compare April 25, 2018 19:58
@djdv
Contributor Author

djdv commented Apr 25, 2018

@Stebalien, @keks
-H seems appropriate but it's used in ipfs add already for hidden files. Should we just provide the long form or use something else?

Using Stat over Lstat seems to work the same, except it messes up the progress bar for ipfs add for some reason.
Above is Stat, below is when we resolve links manually. Any idea why?
[screenshot: progress bar]

Edit:
My assumption is that somewhere we're picking up the size of the link when we should be picking up the size of the target

@keks
Contributor

keks commented May 2, 2018

@djdv

Should we just provide the long form or use something else?

I think I'd go for long-form only right now and see if we come up with a good short flag later.

My assumption is that somewhere we're picking up the size of the link when we should be picking up the size of the target

Yes, that's what I'm thinking, too. The progress bar gets the total file size here; you could log the file size there and see if that's what you expect. Also, using Printf("%T", sizeFile) you could find out what kind of file that is to further debug the cause.


One minor nitpick: In cli/parse.go:478 you have two declarations in a row - I'd use a var () block instead.

@djdv
Contributor Author

djdv commented May 4, 2018

@keks, @Stebalien
Thanks for that!
The types I'm getting are serialFiles.

I have something working but I'm not sure if I'm approaching this right, I'd like some input on this.

I managed to get accurate sizes by modifying serialFile.Size() to either deep-resolve the size of symlinks or not, depending on their depth. Currently they're just ignored and sized as 0.
With -H, root links get sized, and deeper links do not. If we use -L we just ignore the depth limit and size all links.

However, I did this by hard coding a depth limit.
Passing variables through to serialFile.Size() would require adding a member to its struct, and changing NewSerialFile to accept it.
The current signature is

func NewSerialFile(name, path string, hidden bool, stat os.FileInfo) (File, error)

I would likely need something like this

func NewSerialFile(name, path string, hidden, deref, derefAll bool, stat os.FileInfo) (File, error)

I figured if this has to change anyway it may be best to do something like this instead:

type Options struct {
	handleHiddenFiles bool
	resolveRootLinks bool
	resolveAllLinks bool
}
func NewSerialFile(name, path string, options Options, stat os.FileInfo) (File, error)

In either case, I could just switch off of them inside of Size() like this

//if file is a symlink {
// -H
if f.options.resolveRootLinks {...conditionalySize(link)...}
// -L
if f.options.resolveAllLinks {...alwaysSize(link)...}
// neither
// just return the size of the link itself
//}
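
The switch sketched in this comment could look something like the following. This is a hedged illustration, not the code from the PR: the exported field names, the `linkSize` helper, and the convention that depth 0 means "passed on the command line" are all assumptions made for the example.

```go
package main

import "fmt"

// Options mirrors the struct proposed in the comment; the exported field
// names are an assumption made for this sketch.
type Options struct {
	HandleHiddenFiles bool
	ResolveRootLinks  bool // -H style: only links given as arguments
	ResolveAllLinks   bool // -L style: links at every depth
}

// linkSize (hypothetical) picks the size to report for a symlink.
// depth 0 means the link was passed as an argument.
func linkSize(opts Options, depth int, ownSize, targetSize int64) int64 {
	switch {
	case opts.ResolveAllLinks:
		return targetSize
	case opts.ResolveRootLinks && depth == 0:
		return targetSize
	default:
		// Neither flag applies: report the size of the link itself.
		return ownSize
	}
}

func main() {
	h := Options{ResolveRootLinks: true}
	fmt.Println(linkSize(h, 0, 8, 4096))                              // 4096
	fmt.Println(linkSize(h, 1, 8, 4096))                              // 8
	fmt.Println(linkSize(Options{ResolveAllLinks: true}, 3, 8, 4096)) // 4096
	fmt.Println(linkSize(Options{}, 0, 8, 4096))                      // 8
}
```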

@parkan

parkan commented Sep 4, 2018

+1 to get some eyes on this again!

@Stebalien
Member

@djdv (please bug me if I don't respond)

I see why you'd need this for the recursive case but I think the issue here is really walk. That is, serialFile can stat f.FullPath() and, if it's a symlink while f.stat says it should be a directory, it can dereference the symlink. Sound reasonable?

@djdv
Contributor Author

djdv commented Sep 17, 2018

@Stebalien
I think I understand, however, getting the size is not so much an issue. I had a patch previously that resolved the links inside of size as appropriate, but the problem came in deciding when to resolve.

For a recursive approach, we would need to know inside of Size() what flags were passed in, so that we either dereference links or not, (and to what depth if we choose to support that).

I have to revisit this, though. The patch I had working before no longer works because the types have been split since then (I need to change the method for SliceFile.Size()).

I believe we're going to have that same requirement though, since the only other option is to dereference only the commandline (during initial parsing).

That is to say, I think we can have -H without further modifications, but -L and/or -L=int would likely require Size() to be aware of some kind of resolve depth limit, or an "alwaysResolveLinks" type of bool. But in either case, it seems like we'd need to pass something through.

@djdv djdv force-pushed the feat/resolve-links branch from 4160477 to b6c0dc4 Compare September 17, 2018 18:49
@djdv
Contributor Author

djdv commented Sep 17, 2018

I pushed a current WIP that depends on these:
go-ipfs/feat/link-traverse
go-ipfs-cmdkit/fix/serial-size

But it seems to work.

I need to test it more, but we also have to come to a decision on what we want to support, and how to change the NewSerialFile options.
In the WIP commits, I just extended it, but I don't think this is ideal.

Edit:
Invocation examples:
ipfs add -r --dereference-command-line=0 symlink-to-dir adds the link, as a link (same as ipfs add with no deref args)
ipfs add -r --dereference-command-line=1 symlink-to-dir resolves the link and adds the target (default if --dereference-command-line is provided with no value)
ipfs add -r --dereference-command-line=2 symlink-to-dir resolves the link, and links 1 level underneath it
...
ipfs add -r --dereference-command-line=-1 symlink-to-dir resolves all links at every level

Note: I forgot to implement -1 in the current patchset.
At the moment, this is only related to sizing, not the actual adding itself.

@djdv djdv force-pushed the feat/resolve-links branch from b6c0dc4 to 790b550 Compare September 18, 2018 16:45
@djdv
Contributor Author

djdv commented Sep 18, 2018

Latest patch works, but is inaccurate on the progressbar for 1 of my test scenarios.
I have a tree like this

.
├── mostly-real
│   ├── File1 -> ../all-real/File1
│   ├── File2
│   ├── File3
│   └── File4
├── all-real
│   ├── File1
│   ├── File2
│   ├── File3
│   └── File4
├── sym-to-real -> all-real
└── sym-to-mostly-real -> mostly-real

Calling ipfs add -r --dereference-command-line=2 "sym-to-mostly-real", adds as expected.
However, you end up in a situation where the link for File1 is being sized as a link, so your progress goes over 100%. (e.g. 100MB/75MB added).

Semi-related, I may need to add link type-detection up front, in go-ipfs.
This call ipfs add --dereference-command-line=2 "sym-to-mostly-real" succeeds without the -r flag, when it should probably fail.
I'm not actually sure.
"sym-to-mostly-real" isn't a directory, but is added recursively with --deref...
I'm not sure if there should be special handling around this.
Likely, I would check if --deref is set, and then check the type and if -r was set or not, returning the typical error.

@djdv djdv force-pushed the feat/resolve-links branch from 790b550 to bceb240 Compare September 18, 2018 19:46
@djdv
Contributor Author

djdv commented Sep 18, 2018

Everything seems to be in order on this end now. I still need to fix the -r issue in go-ipfs though.
[screenshot: link traversal]

I'll have to rewrite the tests as well, however I'm not sure how to best do that here.
Testing in go-ipfs would simply be testing add against this tree. But inside cmds itself I'm not too sure.

@djdv
Contributor Author

djdv commented Sep 18, 2018

The names of flags need to be decided on as well.
To recap: -H is in use on ipfs add for adding hidden files.

This is my current thought:
Have a --resolve-links=# parameter, with --dereference-command-line and -L being aliases to arguments resolve-links=1 and resolve-links=-1 respectively.

Something like dereference-links and link-resolve-depth, seems fine too if not too verbose.
Open to suggestions on any of these.

@Stebalien
Member

I'm not sure depth-limited link traversal is really all that useful, from a user's perspective. I'd really just make these two different options:

  1. Something that resolves commandline arguments. (like ls -H).
  2. Something that resolves all symlinks recursively (like ls -L).

The second feature will require an additional option to NewSerialFile constructor but I'd do that in a followup patch. Note: we'll also have to detect symlink cycles in this case.

@Stebalien
Member

Note: to avoid breaking everything, we may want to add a new constructor that takes options using the "functional option" pattern (like we do here: https://github.com/libp2p/go-libp2p-kad-dht/blob/master/opts/options.go). That way, we can stop breaking this every time we need a new option.
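
For readers unfamiliar with the pattern, a minimal sketch of functional options follows. Every name in it (`fileOptions`, `Option`, `WithHidden`, `WithDerefArgs`, `newFileOptions`) is hypothetical and chosen for illustration; it is not the constructor that was eventually added, and it omits the error return that some implementations of the pattern use.

```go
package main

import "fmt"

// fileOptions holds settings for a hypothetical constructor.
type fileOptions struct {
	hidden    bool
	derefArgs bool
}

// Option mutates fileOptions; new options can be added later without
// changing the constructor's signature.
type Option func(*fileOptions)

// WithHidden and WithDerefArgs are illustrative option constructors.
func WithHidden(v bool) Option    { return func(o *fileOptions) { o.hidden = v } }
func WithDerefArgs(v bool) Option { return func(o *fileOptions) { o.derefArgs = v } }

// newFileOptions applies opts over the zero-value defaults.
func newFileOptions(opts ...Option) fileOptions {
	var o fileOptions
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", newFileOptions())                    // {hidden:false derefArgs:false}
	fmt.Printf("%+v\n", newFileOptions(WithDerefArgs(true))) // {hidden:false derefArgs:true}
}
```

Existing callers that pass no options keep compiling when a new option is introduced, which is the property the comment is after.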

@djdv
Contributor Author

djdv commented Sep 19, 2018

I'm not sure depth-limited link traversal is really all that useful, from a user's perspective.

I'm wondering this myself. It's only exposed at the moment for testing, but seemed like it might be useful if people wanted to import complex datasets. However, I don't really know if anyone structures data like this.

In implementation, I don't think it will change much if we omit it. But I have to look at it.
In the meantime, if anyone monitoring the thread has examples of nested levels of symlinks being used for something, please speak on it.

Note: we'll also have to detect symlink cycles in this case.

I need to double check, but I think Go's EvalSymlinks handles this internally. After enough cycles it should return an error.
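
One way to check this claim is a small program that builds two links pointing at each other and calls `filepath.EvalSymlinks` on one of them. The helper name `evalLoop` is made up for this sketch, and it covers only a direct a <-> b loop, not a link that points back at one of its own ancestor directories.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// evalLoop creates a -> b and b -> a in a temp dir and returns the error
// from filepath.EvalSymlinks on a. setupErr reports failures while
// preparing the links.
func evalLoop() (evalErr, setupErr error) {
	dir, err := os.MkdirTemp("", "loop-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	a, b := filepath.Join(dir, "a"), filepath.Join(dir, "b")
	if err := os.Symlink("b", a); err != nil {
		return nil, err
	}
	if err := os.Symlink("a", b); err != nil {
		return nil, err
	}
	_, evalErr = filepath.EvalSymlinks(a)
	return evalErr, nil
}

func main() {
	evalErr, setupErr := evalLoop()
	fmt.Println("setup error:", setupErr)
	fmt.Println("EvalSymlinks error:", evalErr)
}
```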

functional options

👍

@djdv djdv force-pushed the feat/resolve-links branch from bceb240 to e156fdf Compare September 19, 2018 18:04
@Stebalien
Member

@djdv can we try adding a sharness test in go-ipfs for a recursive link (A/B/C -> A). I don't think FollowLinks will catch this cycle.
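
The A/B/C -> A case is different from a link loop: each individual link resolves fine, and the cycle only appears once a recursive walk follows it. A rough sketch of one way to detect it is below, tracking the resolved paths of the directories currently being descended into. `walkFollow` and `cycleDemo` are hypothetical names, and this is not how cmds or go-ipfs handle it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// walkFollow (hypothetical) visits files under path, following symlinks.
// onStack holds the resolved paths of the directories currently being
// descended into, so re-entering one of them signals a cycle.
func walkFollow(path string, onStack map[string]bool, visit func(string)) error {
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		return err
	}
	fi, err := os.Stat(resolved)
	if err != nil {
		return err
	}
	if !fi.IsDir() {
		visit(path)
		return nil
	}
	if onStack[resolved] {
		return fmt.Errorf("symlink cycle: %s resolves to an ancestor", path)
	}
	onStack[resolved] = true
	defer delete(onStack, resolved)
	entries, err := os.ReadDir(resolved)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if err := walkFollow(filepath.Join(path, e.Name()), onStack, visit); err != nil {
			return err
		}
	}
	return nil
}

// cycleDemo builds A/B/C -> A in a temp dir and walks A.
func cycleDemo() (walkErr, setupErr error) {
	root, err := os.MkdirTemp("", "cycle-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	b := filepath.Join(root, "A", "B")
	if err := os.MkdirAll(b, 0o755); err != nil {
		return nil, err
	}
	// ".." relative to A/B is A, so C points back at an ancestor.
	if err := os.Symlink("..", filepath.Join(b, "C")); err != nil {
		return nil, err
	}
	walkErr = walkFollow(filepath.Join(root, "A"), map[string]bool{}, func(string) {})
	return walkErr, nil
}

func main() {
	fmt.Println(cycleDemo())
}
```

Because only ancestors are tracked, two sibling links to the same directory are still walked twice rather than reported as a cycle.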

Contributor

@keks keks left a comment

I'm not sure I get all of this. How are the two other branches related to this PR?
Also can we name the option --dereference-links?

@djdv
Contributor Author

djdv commented Sep 20, 2018

@keks
Sorry, I kind of hastily dumped that here.

--dereference-command-line support in go-ipfs would be implemented with these changes:
master...djdv:feat/resolve-links
https://github.com/djdv/go-ipfs/tree/feat/link-traverse
This resolves arguments only.

--dereference-links support in go-ipfs would be implemented with these changes, but will go into a separate PR (it's here just for reference):
master...djdv:feat/deep-resolve-links
ipfs/go-ipfs-cmdkit@master...djdv:feat/resolve-links
ipfs/kubo@master...djdv:feat/deep-link-traverse
This resolves all links, including arguments and all descendants.

Resolving N levels deep doesn't seem useful, so it's been removed to make the --dereference-links implementation a little simpler.


I believe this should all be good. But the --dereference-links PR should be dependent on functional-options, so that will have to be resolved before it can be merged. Not sure if someone wants to handle adding those independently or not.

There are also no tests at the moment. The Go tests I had previously are no longer valid.
I've been testing using a local file tree with go-ipfs.
I'm not sure of a good way to test this in isolation from go-ipfs, as the multihash output is what I've been checking.

All in all, I guess we need to lay out whether there are any more unfulfilled requirements for --dereference-command-line.
If not, I can open a PR for it in go-ipfs to accompany this one.
Then I'd open a new PR for --dereference-links and follow the same pattern.
See what's left to do (functional-options, go-ipfs tests, ??), and open a go-ipfs PR to add the parameter.

@Stebalien
Member

(ping me when there is a PR against go-ipfs and @keks has reviewed it).

cli/parse.go Outdated
func resolveCommandLine(req *cmds.Request) bool {
linkOpt, ok := req.Options[cmds.DerefLong].(bool)
return linkOpt && ok
}
Contributor

linkOpt will automatically be false if ok is false. So you could just

linkOpt, _ := ...
return linkOpt

Member

^^ this comment is still valid (really, I'm not sure if we even need a function for this).

Contributor Author

@djdv djdv Nov 28, 2018

Shortened that and another section.
Edit:
Hold on this. The change is causing problems.

Contributor Author

@Stebalien
Sorry about that, the semantic difference between casting the return from the map if req.Options[cmds.DerefLong].(bool), and casting the return of a type assertion
safeVar, _ := req.Options[cmds.DerefLong].(bool) tripped me up.
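
The difference between the two forms can be shown with a plain map. This is a sketch: the key `dereference-args` and the helpers `derefSet` and `derefSetUnsafe` are placeholders, and a plain `map[string]interface{}` stands in for `req.Options`.

```go
package main

import "fmt"

// derefKey is a placeholder option name for this sketch.
const derefKey = "dereference-args"

// derefSet uses the two-value ("comma ok") assertion: a missing key or a
// non-bool value yields the zero value false instead of panicking.
func derefSet(opts map[string]interface{}) bool {
	v, _ := opts[derefKey].(bool)
	return v
}

// derefSetUnsafe uses the single-value assertion, which panics when the
// value is absent (nil) or not a bool. The panic is recovered here only so
// that it can be reported.
func derefSetUnsafe(opts map[string]interface{}) (v bool, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return opts[derefKey].(bool), false
}

func main() {
	empty := map[string]interface{}{}
	set := map[string]interface{}{derefKey: true}
	wrong := map[string]interface{}{derefKey: "yes"}

	fmt.Println(derefSet(empty), derefSet(set), derefSet(wrong)) // false true false
	fmt.Println(derefSetUnsafe(empty))                           // false true
	fmt.Println(derefSetUnsafe(set))                             // true false
}
```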

Contributor

@keks keks left a comment

Sorry for the long period of silence.

Regarding the code in this PR, it looks pretty solid, but I left two comments. Also, tests would be really nice.
We need a better test infrastructure for this repo, but for now you could add a command to http/handler_test.go and add a test case in http/http_test.go. This will not test the local case, but at least it tests something, and when we sit down to build better testing, we already have a bunch of test cases that we can re-use and generalize.

I also have some more questions on the deep-resolve-links branch, but let's discuss that in a new PR :)

opts.go Outdated
)

// options that are used by this package
var OptionEncodingType = cmdkit.StringOption(EncLong, EncShort, "The encoding type the output should be encoded with (json, xml, or text)").WithDefault("text")
var OptionRecursivePath = cmdkit.BoolOption(RecLong, RecShort, "Add directory paths recursively").WithDefault(false)
var OptionStreamChannels = cmdkit.BoolOption(ChanOpt, "Stream channel output")
-var OptionTimeout = cmdkit.StringOption(TimeoutOpt, "set a global timeout on the command")
+var OptionTimeout = cmdkit.StringOption(TimeoutOpt, "Set a global timeout on the command")
+var OptionDerefArgs = cmdkit.BoolOption(DerefLong, "Resolve link arguments, instead of adding links as links").WithDefault(false)
Contributor

.WithDefault(false) doesn't do anything because the zero value is implicit default.

@djdv djdv force-pushed the feat/resolve-links branch 2 times, most recently from 5889674 to 8923429 Compare November 28, 2018 18:22
@djdv
Contributor Author

djdv commented Nov 28, 2018

@keks
Sorry for my extreme delay, a few unexpected events came up. 👀

I rebased this patch, and removed some default bool calls.
I also added tests of this in go-ipfs here: ipfs/kubo#5801
Relevant test

Do you think this is sufficient?

Edit: FWIW I couldn't figure out a good way to handle tests in http/... since the tests there seem to be based around operations rather than their arguments. In this case, we need to test that the input (a symlink) is transformed, before it gets to the operation.
Or in the case of sharness, just compare known outputs.

@djdv djdv force-pushed the feat/resolve-links branch from 8923429 to 4504227 Compare November 28, 2018 19:15
Member

@Stebalien Stebalien left a comment

Two nits but otherwise LGTM.

opts.go Outdated
var OptionStreamChannels = cmdkit.BoolOption(ChanOpt, "Stream channel output")
-var OptionTimeout = cmdkit.StringOption(TimeoutOpt, "set a global timeout on the command")
+var OptionTimeout = cmdkit.StringOption(TimeoutOpt, "Set a global timeout on the command")
+var OptionDerefArgs = cmdkit.BoolOption(DerefLong, "Resolve link arguments, instead of adding links as links")
Member

Nit: "Dereference symlinks appearing in arguments instead of adding them as symlinks".

Not sure about the "appearing in" part but I think it's slightly better.

Contributor Author

Opinions on "Symlinks supplied in arguments, are dereferenced"?

Member

SGTM, minus, the, comma 😄.

Contributor Author

Ahh — more of an em dash kind of person I see.
Changed.

cli/parse.go Outdated
func resolveCommandLine(req *cmds.Request) bool {
linkOpt, ok := req.Options[cmds.DerefLong].(bool)
return linkOpt && ok
}
Member

^^ this comment is still valid (really, I'm not sure if we even need a function for this).

@djdv djdv force-pushed the feat/resolve-links branch 4 times, most recently from 9847a9f to 0e426e5 Compare November 29, 2018 01:36
Contributor

@kevina kevina left a comment

I may be a little late in the game here, but could we consider renaming --dereference-command-line to the less tedious --dereference-args? As far as I can tell, --dereference-command-line is only used by GNU ls as a long name for -H. Here we do not have the option to use the shorter -H, so I think it warrants more careful consideration of something less tedious to use.

@Stebalien @keks what do you think? If you don't agree feel free to dismiss my review.

@Stebalien
Member

I'd like to be as consistent with other tools as possible (but really, they're both a mouthful).

(at the end of the day, I don't really care much either way)

@kevina
Contributor

kevina commented Nov 30, 2018

I'd like to be as consistent with other tools as possible (but really, they're both a mouthful).

GNU du uses --dereference-args and GNU ls uses --dereference-command-line, so this is not really consistent between tools. I cannot seem to find these options used anywhere else.

@Stebalien
Member

Ah, well, in that case, I'd be happy either way.

@djdv
Contributor Author

djdv commented Dec 2, 2018

Likewise, I have no opinion on this.
Even if we were to break the non-existing convention(s), and use something original.

Options like this seem very script heavy to me, I doubt it will get much interactive (shell) use. So I'm not against verbosity. Tab completion is also a thing anyway.

But there's nothing wrong with brevity either.

@kevina
Contributor

kevina commented Dec 3, 2018

Yeah, unless someone is against it I think it is better to go with the shorter --dereference-args.

Sorry for the trouble and being late with my review.

License: MIT
Signed-off-by: Dominic Della Valle <[email protected]>
@djdv djdv force-pushed the feat/resolve-links branch from 0e426e5 to be09a40 Compare December 3, 2018 15:06
Contributor

@kevina kevina left a comment

LGTM.

@Stebalien Stebalien merged commit 4263ae6 into ipfs:master Dec 7, 2018
@ghost ghost removed the status/in-progress In progress label Dec 7, 2018

5 participants