deprecate ipfs tar add / ipfs tar cat #7951

Closed
willscott opened this issue Feb 26, 2021 · 16 comments · Fixed by #8849
Labels
kind/enhancement A net-new feature or improvement to an existing feature status/proposed

Comments

@willscott
Contributor

Currently, go-ipfs exposes a pair of commands, tar add and tar cat, which do not seem to be widely used and which import tar files into an unusual representation.

There are some issues with this implementation (e.g. it loads the entire tar archive into memory), and the representation doesn't play nice with unixfs.

We should provide tar import/export using the mfs package to represent a unixfs subtree, and deprecate the tarfmt ipld codec.
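
As a rough illustration of the memory concern above, the sketch below reads a tar archive as a forward-only stream and processes each member in fixed-size blocks, so only one block is held in memory at a time. It uses Python's standard `tarfile` module purely for illustration and is not how go-ipfs implements anything; the names `stream_member_digests` and `build_demo_tar` are hypothetical.

```python
import hashlib
import io
import tarfile


def stream_member_digests(fileobj, block_size=64 * 1024):
    """Yield (name, sha256 hex digest) for each regular file in a tar stream.

    Mode "r|" treats the archive as a non-seekable stream, so members are
    visited strictly in order and the archive is never loaded as a whole.
    """
    with tarfile.open(fileobj=fileobj, mode="r|") as archive:
        for member in archive:
            if not member.isfile():
                continue
            payload = archive.extractfile(member)
            digest = hashlib.sha256()
            while True:
                block = payload.read(block_size)
                if not block:
                    break
                digest.update(block)
            yield member.name, digest.hexdigest()


def build_demo_tar():
    """Build a tiny in-memory archive (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, data in [("a/README.md", b"hello"), ("b/README.md", b"world")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, digest in stream_member_digests(build_demo_tar()):
        print(name, digest)
```

An importer built this way could hand each member to a unixfs builder as it arrives, rather than buffering the archive first.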

@willscott willscott added kind/enhancement A net-new feature or improvement to an existing feature status/proposed labels Feb 26, 2021
@aschmahmann
Contributor

aschmahmann commented Mar 1, 2021

This can be deprecated, but people may be using the codec so we probably need to maintain it so people don't lose access to their data. I assume that the ask here is around the go-ipld-prime migration and converting codecs to the new form.

Probably the best bet is to make a not particularly performant go-ipld-format -> go-ipld-prime wrapper, but I'm not sure what the workload for this is.

EDIT: I removed the incorrect references to codecs. The tar commands use DagPb; it's basically just a custom type with an importer/exporter.

@willscott
Contributor Author

we probably need to maintain it so people don't lose access to their data

The proposal, I guess, is that we first ship a version in which import to unixfs is supported.
At that time we add a warning on use of this tar command that it is deprecated and that data in this format should be exported and then imported using the new one.
Then in a subsequent version we should get rid of this command & code path.

@RubenKelevra
Contributor

Well, UnixFS 1.0 can't handle any attributes, like users or groups IIRC. Importing a tar into UnixFS would basically destroy this information.

To make this limitation clear, a flag such as --no-metadata could be required, while an add without this flag would fail until UnixFS can handle all stored metadata.
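
To make concrete which attributes are at stake, here is a minimal sketch that lists the per-file metadata a tar header carries. It uses Python's standard `tarfile` module; the helper names and the demo values (`alice`, `staff`, uid/gid 1000) are hypothetical. Whether UnixFS can store any of these fields is the claim made in this comment, not something the code checks.

```python
import io
import tarfile


def describe_metadata(fileobj):
    """Return the ownership and permission fields stored for each tar member."""
    rows = []
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive.getmembers():
            rows.append({
                "name": member.name,
                "mode": oct(member.mode),   # permission bits, e.g. 0o750
                "uid": member.uid,
                "gid": member.gid,
                "uname": member.uname,
                "gname": member.gname,
                "mtime": member.mtime,
            })
    return rows


def build_demo_tar():
    """Build a one-file archive with explicit owner and mode (demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        info = tarfile.TarInfo("bin/tool")
        info.size = 4
        info.mode = 0o750
        info.uid, info.gid = 1000, 1000
        info.uname, info.gname = "alice", "staff"
        info.mtime = 1614297600
        archive.addfile(info, io.BytesIO(b"data"))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for row in describe_metadata(build_demo_tar()):
        print(row)
```

An importer that keeps only names and contents would drop every field here except `name`.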

@willscott
Contributor Author

@RubenKelevra This is a good point. Are you making use of this representation of tar in order to store metadata?
The counterpoint might be that if you want metadata you should store the entire tar archive as a unixFS file.

@RubenKelevra
Contributor

RubenKelevra commented Mar 2, 2021

@willscott my point wasn't that I'm using it that way. But when we deprecate the tar add/tar cat functionality and expect users to switch, we should make clear that there are limitations.

When I'm extracting a tar to a filesystem - and IPFS UnixFS is exactly this - the expectation is that the user/group read/write/execute rights are set as well.

I'm running an ipfs cluster that stores compressed tars, but since the checksum must match I don't extract them and import them as tar. It's a bit sad since I think there would be quite a lot of redundancies between the files, but the compression of each file makes them inaccessible for IPFS.

@lidel
Member

lidel commented Apr 30, 2021

Would ipfs dag export|import be a safer replacement for ipfs tar add|cat that works with all DAG types?
(I feel in most cases people want to have a "single-file-copy-of-a-CID" and don't really care if it's TAR or CAR)

If so, we should do something similar to #8098

@aschmahmann
Contributor

aschmahmann commented Apr 30, 2021

@lidel these are pretty different use cases. ipfs dag import/export is used for extracting any IPLD graph into a portable format.

IIUC ipfs tar is meant for converting tar files into a DAG, but in a way that is friendlier to deduplication than what you would get by just using UnixFS.

(I feel in most cases people want to have a "single-file-copy-of-a-CID" and don't really care if it's TAR or CAR)

It's surprising to me that someone would notice that the HTTP API output of ipfs get is a TAR file and then try and import it with the ipfs tar command. We don't even talk too much about how ipfs get exports a TAR because the frequently used utilities (e.g. the js http client, go http client and go-ipfs CLI) un-tar the object and just give you a directory with files.

We can add more guardrails and help text to clarify, but that command has been around a long time and today is the first I'm hearing of this confusion.


Separately, we may want to remove support for the ipfs tar commands since they're not commonly used and carry maintenance and user-learning costs. However, unlike with #8098, ipfs tar does not have a 1:1 replacement available.

@RubenKelevra
Contributor

RubenKelevra commented Jan 23, 2022

I don't think we want to remove the ipfs tar support.

Tar is still the go-to tool to ship software. IPFS uses it itself to ship its binaries and source code.

We should be able to import a tar file, keep all attributes in an efficient manner, and recover them all on export (if the user's rights are sufficient).

In the future this would allow us to extract .tar.xz/.tar.gz/.tar.bz2 etc. and import the tar file itself to make use of deduplication.

If ipfs implements compression itself, we can reach the same transfer speeds as a compressed tar file, while still being able to deduplicate the content.

This would also let us support reproducible builds with signatures of the actual binaries, while the tar part handles the file and folder attributes and compression is applied transparently on top of that.

@kallisti5

Oof. I didn't know about this feature. I'd use it extensively.

Is it documented?

@RubenKelevra
Contributor

RubenKelevra commented Jan 23, 2022

@aschmahmann
Contributor

@RubenKelevra @kallisti5 the ipfs tar command is almost certainly not what you're looking for. It's creating a custom DAG-PB IPLD format that's just for tar files. It's not UnixFS so it won't render on gateways or work with ipfs get (which is why there's ipfs tar cat).

Probably what you'd rather have is something like ipfs add --chunker=tar which would create a UnixFS file where the chunking occurs on the boundaries of the internal objects of the tar file. Some inspiration could come from https://github.com/bmwiedemann/ipfs-iso-jigsaw. If this logic gets implemented it's usable with go-ipfs even before it's added to go-ipfs by piping the output into go-ipfs via a CAR file, e.g. ipfs-tar-chunker my.tar | ipfs dag import.
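
The boundary-aligned chunking idea can be sketched as follows. This is only an illustration under stated assumptions: no `--chunker=tar` option exists in the thread's description of go-ipfs, the function names are hypothetical, and the code only computes cut points with Python's standard `tarfile` module; it does not build UnixFS nodes or a CAR file.

```python
import io
import tarfile


def tar_aware_boundaries(fileobj):
    """Return sorted byte offsets at which a tar-aware chunker could cut.

    Cuts are placed at each header start, each payload start, and each
    payload end, so a file's payload becomes its own chunk, separate from
    the tar header and the padding that follows it.
    """
    fileobj.seek(0, io.SEEK_END)
    total = fileobj.tell()
    fileobj.seek(0)
    cuts = {0, total}
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive:
            cuts.add(member.offset)                        # header start
            if member.isfile() and member.size > 0:
                cuts.add(member.offset_data)               # payload start
                cuts.add(member.offset_data + member.size)  # payload end
    return sorted(cuts)


def split_at(data, cuts):
    """Split a bytes object at the given sorted offsets."""
    return [data[start:end] for start, end in zip(cuts, cuts[1:])]


def build_demo_tar():
    """Build a two-file archive in memory (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for name, data in [("a.txt", b"hello"), ("b.txt", b"world!")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    data = build_demo_tar()
    for chunk in split_at(data, tar_aware_boundaries(io.BytesIO(data))):
        print(len(chunk))
```

Because each payload chunk is byte-identical to the standalone file, it would hash to the same block as that file chunked on its own (at least for files that fit in one chunk), which is where the deduplication would come from.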

@RubenKelevra
Contributor

@RubenKelevra @kallisti5 the ipfs tar command is almost certainly not what you're looking for. It's creating a custom DAG-PB IPLD format that's just for tar files. It's not UnixFS so it won't render on gateways or work with ipfs get (which is why there's ipfs tar cat).

Probably what you'd rather have is something like ipfs add --chunker=tar which would create a UnixFS file where the chunking occurs on the boundaries of the internal objects of the tar file. Some inspiration could come from https://github.com/bmwiedemann/ipfs-iso-jigsaw. If this logic gets implemented it's usable with go-ipfs even before it's added to go-ipfs by piping the output into go-ipfs via a CAR file, e.g. ipfs-tar-chunker my.tar | ipfs dag import.

Well as far as I can see --chunker=tar is completely undocumented.

I don't want to rely on an undocumented function.

But on the other hand, why is not every tar file automatically split with this chunker? 🤔

@aschmahmann
Contributor

That chunker doesn't currently exist; someone would have to build it. Again, you can build something that does this even without adding code to go-ipfs. My point is that you almost certainly want UnixFS chunking here rather than a custom IPLD format.

If it's all merged in you could figure out the UX for things like type detection and default format chunkers. However, that's super off topic for this issue, which is "let's kill the unused and largely not useful ipfs tar command", not planning out alternative features which could have done its proposed job better. If you want to talk about that I'd start a thread on discuss.ipfs.io or open a new feature request.

lidel added a commit that referenced this issue Apr 5, 2022
@lidel
Member

lidel commented Apr 5, 2022

FYSA: I'm officially marking them as deprecated in #8849.

lidel added a commit that referenced this issue Apr 5, 2022
lidel added a commit that referenced this issue Apr 6, 2022
@michel47

My use case for the ipfs tar command is backing up files that share the same name, which is not allowed with plain "ipfs add":

find /somedir -name 'README.md' | ipfs tar add -

What would be the new way of doing this once ipfs tar is deprecated?

@Jorropo
Contributor

Jorropo commented Aug 19, 2023

find /somedir -name 'README.md' | tar -c --no-recursion -T - | ipfs add -w --stdin-name readme.tar

ipfs tar used to have a custom tar specific encoding instead of being encoded as files.

Adding tars with ipfs add is better because the tars are then just unixfs files, which means clients don't need to implement a whole new custom thing.
This may be less efficient, but I want to add a content-aware chunker so that they would be deduplicated efficiently (the tar mode of the chunker would deduplicate both the tar and the underlying files by carefully chunking at the tar content boundaries, so that the CID of the tar links to the tar's metadata and the original file CIDs).
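
For readers who want to see why the `find | tar` step at the top of this comment handles same-named files, here is a minimal Python approximation of that step alone. It is a sketch under assumptions: the function names and the demo directory layout are hypothetical, and the final `ipfs add` step is left out, so the resulting bytes would still have to be piped to it.

```python
import io
import os
import tarfile
import tempfile


def find_named(root, filename):
    """Collect every path under root whose basename equals filename."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


def tar_named_files(paths):
    """Archive the given paths without recursing, keeping their full paths.

    Each member keeps its directory path (minus the leading slash), so
    several files called README.md can coexist in one archive.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for path in paths:
            archive.add(path, recursive=False)
    return buf.getvalue()


def demo():
    """Create two same-named files, archive them, and return member names."""
    with tempfile.TemporaryDirectory() as root:
        for sub in ("a", "b"):
            os.makedirs(os.path.join(root, sub))
            with open(os.path.join(root, sub, "README.md"), "w") as handle:
                handle.write(sub)
        blob = tar_named_files(find_named(root, "README.md"))
    with tarfile.open(fileobj=io.BytesIO(blob)) as archive:
        return archive.getnames()


if __name__ == "__main__":
    print(demo())
```

The name collision goes away because the archive records paths rather than bare file names, and the archive itself is then added as a single file.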

7 participants