Recursive add, and symlinks #5421
I fully agree that symlink support needs to be resolved. (It's causing issues for me as well, and as you said, there are already open issues about it.)
In this case it points outside the root. Magically replacing all symlinks on add could be considered a bug (though it would also be a nice feature behind an optional argument). The gateway code (and a few other parts) needs to gain support for understanding these symlinks; as long as a link is "in tree" that should be possible. Maybe the symlink format should be extended with a hash of the symlink destination.
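As a rough illustration of the "in tree" idea, here is a minimal sketch (a hypothetical helper, not go-ipfs code) of deciding whether a symlink's target stays inside the root being added:

```python
import posixpath

def within_root(root: str, link_dir: str, target: str) -> bool:
    """Return True if a symlink's target stays inside `root`.

    `link_dir` is the directory containing the symlink; `target` is the
    raw (possibly relative) link text. All paths are assumed absolute
    POSIX paths for the sake of the sketch.
    """
    # Relative targets resolve against the symlink's own directory.
    resolved = posixpath.normpath(posixpath.join(link_dir, target))
    return resolved == root or resolved.startswith(root + "/")

# A link community/pkg -> ../pool/pkg stays in tree:
print(within_root("/srv/mirror", "/srv/mirror/community", "../pool/go-ipfs.pkg"))  # True
# while -> ../../etc/passwd escapes the root:
print(within_root("/srv/mirror", "/srv/mirror/community", "../../etc/passwd"))     # False
```

Only the first kind of link could safely be resolved in tree; the second would need the user to opt in to dereferencing.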
Related: ipfs/go-ipfs-cmds#96
So, @djdv is adding a […]. For relative symlinks that don't leave the root, we could probably follow them on […].
I just ran into this problem myself. What I want to do is set up an Arch Linux mirror in IPFS. The problem is that upstream mirrors heavily use symlinks to central pool directories. Currently, IPFS stores a symlink just as a symlink. What I would like to happen is that safe symlinks (in the rsync sense) become a pointer to the other object: the IPFS object would store both the symlink information (what path the symlink points to) and a reference to the IPFS object it would have pointed to.

Example: the object would store the symlink (../../../pool/community/go-ipfs-0.4.17-1-x86_64.pkg.tar.xz) as well as the IPFS object it points to (QmXLuLzshCejKHqP4fUdv7W7x5aVZKySjE8m8CKjLPut8f).

If a user does ipfs get or cat on the symlink itself, it should resolve to the contents of the file, since the symlink by itself is useless. If a user does a get on a directory structure that contains both the symlink and its target, it should download as a symlink. Symlinks pointing outside the directory structure being added or retrieved should never resolve unless the user explicitly asks to dereference.
@georgyo there is a project to do Arch mirroring, https://github.com/VictorBjelkholm/arch-mirror, and it has its own workaround for symlinks. I fully agree with your points.
Yeah, that's a reasonable approach.
I'd like to tackle this specific issue and am looking for the consensus solution. I've noted a few things from the conversation so far.
I actually really like @georgyo's idea of storing both the link itself and the object it points to. It would solve a long-standing objection I have to resolving symbolic links in the gateway, as mentioned in the unfinished PR here: #3508 (comment).
Ultimately, I think mimicking all the symlink flags rsync has is the best idea, but that is a lot of scope creep for getting something working. @michaelavila Here are my opinions on the questions you asked. I think you flipped when we would store a pointer to the object and when we would not.
Another major point is what happens when I pull down the tree vs. what happens when I pull down the object alone. In all these cases the symlink object has both the symlink information and the reference to the IPFS object it was pointing to.

Case 1: The symlink object points to a file in the tree that is being fetched. It should be written out as an ordinary symlink, since its target is also present.

Case 2: The symlink object points to a file outside of the tree that is being fetched. It should not resolve unless the user explicitly asks to dereference it, in which case the stored reference is fetched.
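The two cases could be sketched as a small decision helper (the names here are hypothetical, not the go-ipfs API):

```python
def symlink_action(target_in_tree: bool, explicit_dereference: bool) -> str:
    """Decide what a get should do with a stored symlink object.

    Toy model of the two cases: in-tree targets are written back out as
    ordinary symlinks; out-of-tree targets resolve only on explicit
    request, via the object reference stored alongside the link text.
    """
    if target_in_tree:
        return "materialize as symlink"
    if explicit_dereference:
        return "fetch referenced object"
    return "leave unresolved"

print(symlink_action(True, False))    # materialize as symlink
print(symlink_action(False, False))   # leave unresolved
print(symlink_action(False, True))    # fetch referenced object
```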
@Stebalien after a little research, I better understand @georgyo's proposal, but I have some questions. First, I think pairing a path with a hash for a symlink is a nontrivial change, as it has to either extend the unixfs protobuf or change how the DagPB node's links are used.
Questions: in particular, what approach should we take when storing the symlink path + hash?
@georgyo we commented at the same time, it looks like, but I've read your comment and it makes a lot of sense, thank you. Also, good point about using the concept of maxdepth instead of cycle detection, as it covers both cases. I think I understand the proposal and am willing to move forward with it if @Stebalien is OK with that and also weighs in on my question of what approach to take when storing the symlink path + hash.
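A toy model of the maxdepth idea, assuming a flat map of link names to targets rather than real unixfs nodes: a depth cap bounds chains and loops alike, so no separate cycle detection is needed.

```python
def resolve(links: dict, name: str, max_depth: int = 32) -> str:
    """Follow symlink entries in `links` (name -> target) until a real
    entry is reached, giving up after `max_depth` hops. A hypothetical
    sketch, not go-ipfs code.
    """
    for _ in range(max_depth):
        if name not in links:
            return name            # a real (non-link) entry
        name = links[name]
    raise RuntimeError("max link depth exceeded (cycle?)")

chain = {"a": "b", "b": "c"}       # a -> b -> c, where c is a real file
print(resolve(chain, "a"))          # c

loop = {"x": "y", "y": "x"}        # x <-> y never terminates
try:
    resolve(loop, "x")
except RuntimeError as e:
    print(e)
```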
I also like @georgyo's solution. @michaelavila I was worried about that and thought this might have a unixfs-v2 dependency but, on further consideration, I think this is pretty trivial: we can just add a "target" link to the links array in the DagPB node (we don't need to modify the unixfs protobuf at all). Older clients will simply ignore this link. The tricky part will be updating the link; I'm not actually sure there is an efficient way to do this unless we create some kind of link index. For adding and retrieval, this is what I'm thinking: on add, when we encounter a symlink, record the link text and, where it resolves inside the add root, a "target" link to the object it points to.
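A rough sketch of what such a node might look like, shown here as plain JSON for readability (the shape is illustrative, not the actual dag-pb wire format, and the CID is an elided placeholder):

```python
import json

# Hypothetical shape: a symlink node whose Links array carries an extra
# "target" entry. Old clients that only read the unixfs Data field would
# simply ignore the extra link.
symlink_node = {
    "Data": {"Type": "Symlink", "Data": "../../pool/community/pkg.tar.xz"},
    "Links": [
        {"Name": "target", "Hash": "Qm...", "Tsize": 0},
    ],
}
print(json.dumps(symlink_node, indent=2))
```

The hard part noted above remains: if the target object ever changes, every symlink node referencing it must be rewritten, which is expensive without some kind of link index.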
This should just work in all cases. On get, when we encounter a symlink, we'll have to provide a few options. I believe @georgyo is suggesting: resolve the symlink to its contents when the user gets/cats it directly; write it out as a symlink when its target is inside the tree being fetched; and never resolve targets outside that tree unless the user explicitly asks to dereference.
The tricky part will be figuring out how to present all this to the user; we'll probably need some new options for that.
Summarizing, the tricky parts will be updating the "target" link efficiently and presenting these options to the user.
@michaelavila I believe the simplest thing to start with is resolution: that is, the ability to run ipfs cat on a path that goes through a symlink and have it resolve.
Note: while not currently on our OKRs for this quarter, I'd still consider this a priority because proper symlink support is necessary for a 1.0 release.
Trying to catch up here since this was referenced in ipld/legacy-unixfs-v2#16. I'm not sure encoding the CID is the right approach. A symlink copied from one filesystem to another may not resolve to the same file; this is one of the hazards of symlinks. If we want to solve this issue then we probably shouldn't use symlinks :)

If the goal is to have this node always point to the same data referenced by the symlink at encode time, then just make it the same unixfs file object. You're effectively trying to fix one of the hazards of symlinks and, in that case, you might as well go all the way and modify the actual representation to fit what you wish the filesystem was. If we at any point need to preserve all the hazards of symlinks, then we shouldn't try to encode the CID into them at all. Just encode where it points to and try to resolve it when you get the path in MFS.

What I see becoming increasingly problematic is trying to preserve several aspects of traditional filesystems while selectively solving specific issues, all at the same format layer. IMHO, the format layer should preserve as much about the original filesystem as possible, warts and all.
The issue here was that IPFS isn't rooted, so users may share a subtree whose symlinks point outside of it and therefore can't be resolved.
I need a resolution; what's the point in doing "nothing"? People are using IPFS to store massive amounts of data. My personal opinion is that you should duplicate the file in full in place, and do the same thing you should always be doing: ensure it's only really stored once in IPFS. Choosing which file is the "hard file" if both were restored doesn't matter to me at this particular moment, though it does matter.

I am trying to back up a database file that would, importantly, be on another partition (not my IPFS backup drive) if it were live. It's not, so forget it. I have to copy the largest dataset I've ever seen "over and over" now because IPFS doesn't make anything immutable.

Edit: even paying for and utilizing every provider you have available (Estuary, Pinata, … Eternum, Inferno …), I mean "infer" that there's a data-retention-in-the-universe problem. I just noticed your Wikipedia … problems will be handled. Right now, IPFS has a problem: nobody is offering pinning services that are reliable, and you aren't offering a service that doesn't force me to ensure I have my data stored somewhere else also, which I do. Peace.

(My personal solution will always be the same: if I don't have an SLA I will use IPFS as "hopeful" forever storage, and anything I need to have in more than just my own datacenter, or on more than just S3 buckets, will be in three places, not just on FIL.)
Version information: ipfs version 0.4.17-rc1

Type: Bug?

Description:

I'm trying to upload the distribution directory for the Archive.org interface to IPFS so it can be accessed via IPFS (or via IPNS). The add process is

ipfs add -Qr dist

which appears to work. However, if you look at https://ipfs.io/ipfs/QmehDxtj8UjivCXFSV4XMXCh8zVxER4D57LguXogX8rsRD/ you'll see that the symlinks to the bundles aren't uploaded correctly, and clicking on them gets the

ipfs cat /ipfs/QmehDxtj8UjivCXFSV4XMXCh8zVxER4D57LguXogX8rsRD/dweb-objects-bundle.js: cannot currently read symlinks

message. I understand that this raises issues (there are a number of open issues that relate to symlinks). I'm just wondering what IPFS thinks the "correct" behavior should be, i.e. is a failure to upload to be expected (which needs a permanent workaround), or is it a bug in ipfs add -r that will at some point get fixed (needing a short-term workaround until it is)?

The problem is that it's non-trivial to work around. There is a reason these links are symlinks, and there is no (non-hacky) way I can figure out to fix just those files in the script that includes ipfs add. For example, if I ipfs add dweb-transports-bundle.js from the place that file is sitting, then I can't easily insert that link into the directory structure produced by the ipfs add -r.

Personal opinion (and there may be good reasons I'm wrong :) would be that ipfs add -r should have an option that copies the destination of the symlinks it finds.