Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gateway): TAR response format #9029

Merged
merged 30 commits into from
Nov 9, 2022
Merged

feat(gateway): TAR response format #9029

merged 30 commits into from
Nov 9, 2022

Conversation

hacdias
Copy link
Member

@hacdias hacdias commented Jun 9, 2022

Summary

How to use it

HTTP clients can request TAR response by passing the ?format=tar URL
parameter, or setting Accept: application/x-tar HTTP header:

$ export DIR_CID=bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq
$ curl -H "Accept: application/x-tar" "http://127.0.0.1:8080/ipfs/$DIR_CID" > dir.tar
$ curl "http://127.0.0.1:8080/ipfs/$DIR_CID?format=tar" | tar xv
bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq
bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq/1 - Barrel - Part 1 - alt.txt
bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq/1 - Barrel - Part 1 - transcript.txt
bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq/1 - Barrel - Part 1.png

Tasks

Things to note:

Closes #7746

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for driving this @hacdias

Looks sensible, next steps:

  • Update http-gateways/PATH_GATEWAY.md specs
    • Rationale: downloading UnixFS directory as TAR massively increases utility of gateways in use cases where CAR is not necessary, so this should be part of the spec
    • PR against my PR (Add HTTP Gateway Specs specs#283) for now
  • needs sharness test that downloads a directory as TAR and saves it to disk (curl piped to tar command should be enough)

core/corehttp/gateway_handler_tar.go Outdated Show resolved Hide resolved
core/corehttp/gateway_handler_tar.go Outdated Show resolved Hide resolved
@lidel lidel added this to the Best Effort Track milestone Jun 9, 2022
@hacdias hacdias force-pushed the feat/download-tar branch 2 times, most recently from 04248e4 to 39f1cb4 Compare June 10, 2022 12:13
@hacdias
Copy link
Member Author

hacdias commented Jun 10, 2022

@lidel thanks for the review. I added two tests (not sure if on the right file).

  1. This only adds support for .tar.
  2. Specs on IPIP-288: TAR Gateway Response Format specs#288
  3. .tar.gz discussed on feat: gateway support for tar.gz #9034
  4. Codecov is saying that the tests do not cover any of the code which I don't understand why.

@hacdias hacdias requested a review from lidel June 10, 2022 12:27
@hacdias hacdias marked this pull request as ready for review June 10, 2022 12:27
@hacdias hacdias changed the title feat(gateway): .tar response format feat(gateway): TAR response format Jun 10, 2022
@hacdias hacdias force-pushed the feat/download-tar branch from 3e895e2 to d82f396 Compare June 10, 2022 12:57
@lidel lidel added the status/blocked Unable to be worked further until needs are met label Jun 14, 2022
@lidel lidel self-assigned this Jul 26, 2022
@hacdias hacdias force-pushed the feat/download-tar branch from d82f396 to a3b08be Compare August 15, 2022 14:38
@lidel lidel removed the status/blocked Unable to be worked further until needs are met label Sep 1, 2022
@BigLep BigLep assigned hacdias and unassigned lidel Sep 1, 2022
@lidel lidel assigned lidel and unassigned hacdias Sep 1, 2022
@lidel
Copy link
Member

lidel commented Sep 1, 2022

(I have unfinished review of this pending and some local test code, re-assigning to myself, as some notes for implementers need to be bubbled up to IPIP)

@BigLep BigLep assigned hacdias and unassigned lidel Sep 30, 2022
@BigLep BigLep mentioned this pull request Sep 30, 2022
@hacdias hacdias force-pushed the feat/download-tar branch from 96bcd7e to 54fbf94 Compare October 3, 2022 11:53
@hacdias
Copy link
Member Author

hacdias commented Oct 4, 2022

To test the security concern raised by @lidel in this comment regarding overwriting files by having custom relative paths inside a custom made UnixFS DAG, I... built my own DAG that tries to exploit that same issue.

CID: bafybeieq5yvoaqyrfvyiwqvi7hqhi54hyqhyatkn3pgbjxylpxdm5cj5t4 (CAR here)

└─dir
  │ file
  │ ../../../malicious-file		<- Would rewrite a file outside the root directory
└─../../malicious-dir			<- Would rewrite a directory outside the root directory
  │ file 

I just pushed a WIP version of a "Base Dir TAR Writer" which is essentially a copy of the TAR Writer in go-ipfs-files. The difference is that we always ensure that all files are under a certain base directory.

	if !strings.HasPrefix(fpath, w.baseDir) {
		fpath = strings.Replace(fpath, ".", "", -1)
		fpath = strings.Replace(fpath, "..", "", -1)
		fpath = path.Join(w.baseDir, fpath)
	}

It does what it is supposed to do (only tested locally, no testcase). Some things I'm still unhappy with:

  1. We do not keep the intended file tree of the directory. Since all files that would fall outside of the base directory are now at the root of the base directory, the tree isn't maintained. A way to maintain the structure would be to traverse the entire file tree first, and "calculate" the correct root directory in order to preserve the entire file tree.
    • But maybe this would take more valuable processing time?
    • One of the reasons why I did not write a test case yet because it will change slightly depending on whether we want to preserve the whole tree or just truncate the odd files to the root directory.
  2. It is just a copy paste of the original TAR Writer with 5 lines added. There may be an easier way to reuse the original code that I can't see right now.

Note that the original TAR Writer is also used in ipfs get --archive, which means that that command also suffers from the same security 'concern'. I assume that it is not a real concern since for a command line utility, we assume the user knows what they are downloading and its their responsibility. Let me know if I'm wrong here.

Thoughts? /cc @lidel

@BigLep
Copy link
Contributor

BigLep commented Oct 5, 2022

I haven't been following the details closely, but I know we have had security issues with tar issues over the years. Can we make sure the security hardening is all in one place and shared across the stack (rather than copy and pasted)?

@hacdias
Copy link
Member Author

hacdias commented Oct 5, 2022

I haven't been following the details closely, but I know we have had security issues with tar issues over the years. Can we make sure the security hardening is all in one place and shared across the stack (rather than copy and pasted)?

I definitely agree and I think it'd be better to just update this on go-ipfs-files. However, I also want to ensure that everyone agrees with the strategy. Right now, all files that would otherwise fall outside the root directory are forced to stay within the root directory. This does not keep the tree structure.

Yesterday during the colo @lidel mentioned that it could also be a great idea to just return an error and fail building the TAR. That is the behaviour of ipfs get, as ipfs get fetches a TAR and unpacks it . The unpacking fails if there are files that point outside the root directory. This error is triggered in tar-utils:

https://github.com/ipfs/tar-utils/blob/91e41e72cd61595a086c48862277c419a994d971/extractor.go#L189-L207

And also:

https://github.com/ipfs/tar-utils/blob/91e41e72cd61595a086c48862277c419a994d971/extractor.go#L156-L167

Instead of failing during the TAR unpacking, we can make it fail during the TAR creation, covering all cases where malicious TARs may be created. However, as @lidel mentioned yesterday, this may break someone else's use case (any idea?). But so would truncating the paths. Nowadays we always have CAR files where the user can always download the RAW, unmodified data if they please.

So two options:

  1. Fail during creation.
  2. Truncate file paths.

In either case, I will be updating the original TarWriter in https://github.com/ipfs/go-ipfs-files.

/cc @lidel

@lidel
Copy link
Member

lidel commented Oct 5, 2022

  • I'd go with (1): an explicit error is better than silently mutating directory tree
  • TAR processing should error when DAG has items with paths named ../ that go beyond the /ipfs/cid
    • error should note that relative UnixFS paths beyond /ipfs/cid root are not allowed for TAR, and suggest use of CAR instead
  • create test fixtures for /ipfs/cid/sub/../dir (no error) and /ipfs/cid/../dir (should error) and include their CIDs in specs (IPIP-288: TAR Gateway Response Format specs#288)

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @hacdias, let's include this in Kubo 0.17-rc1 (will merge after CI passes).

(pushed some cosmetic changes around Etag + added entry to changelog)

@lidel lidel enabled auto-merge (squash) November 9, 2022 18:16
@lidel lidel merged commit a210abd into master Nov 9, 2022
@hacdias hacdias deleted the feat/download-tar branch November 9, 2022 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

add tarfile/zipfile downloads to HTTP gateway
3 participants