Allow file caching by URL, not (just) filename #111

pjf · 2014-10-21T01:00:00Z

It should be possible for multiple CKAN files to reference the same URL. This is particularly useful if a mod has optional components: they can be bundled in the same file, but each can have its own identifier, which allows complex relationships to be expressed.

pjf · 2014-10-27T01:10:16Z

Upgrading the priority on this, because it means we can finally deprecate bundles and other troublesome parts of our metadata (#113).

As an example use case, a dependency graph that requires RSS (RealSolarSystem) to get its own TACLS config could look like this:

TACLS depends upon TACLS-Config
TACLS-Kerbal provides TACLS-Config, depends upon TACLS, conflicts with TACLS-Config
TACLS-RSS provides TACLS-Config, depends upon TACLS, conflicts with TACLS-Config
RealSolarSystem depends upon TACLS-RSS
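To make the relationships above concrete, here is a minimal Python sketch. CKAN itself is not written in Python, and the dictionary layout and the `problems` helper are hypothetical. It treats "provides" as a virtual package and assumes a package never conflicts with itself, which is what lets TACLS-Kerbal and TACLS-RSS both provide TACLS-Config while excluding each other.

```python
# Hypothetical, simplified model of the relationships listed above.
PACKAGES = {
    "TACLS": {"depends": {"TACLS-Config"}, "provides": set(), "conflicts": set()},
    "TACLS-Kerbal": {"depends": {"TACLS"}, "provides": {"TACLS-Config"},
                     "conflicts": {"TACLS-Config"}},
    "TACLS-RSS": {"depends": {"TACLS"}, "provides": {"TACLS-Config"},
                  "conflicts": {"TACLS-Config"}},
    "RealSolarSystem": {"depends": {"TACLS-RSS"}, "provides": set(), "conflicts": set()},
}


def satisfies(pkg, name):
    """True if pkg is `name` itself or provides it as a virtual package."""
    return pkg == name or name in PACKAGES[pkg]["provides"]


def problems(install_set):
    """List unmet dependencies and conflicts for a proposed install set."""
    issues = []
    for pkg in sorted(install_set):
        for dep in sorted(PACKAGES[pkg]["depends"]):
            if not any(satisfies(other, dep) for other in install_set):
                issues.append(f"{pkg} needs {dep}")
        for bad in sorted(PACKAGES[pkg]["conflicts"]):
            # Skip pkg itself: "provides X, conflicts with X" means
            # "I must be the only provider of X that is installed".
            clashes = [o for o in install_set if o != pkg and satisfies(o, bad)]
            if clashes:
                issues.append(f"{pkg} conflicts with {', '.join(sorted(clashes))}")
    return issues
```

With this model, `problems({"TACLS", "TACLS-RSS", "RealSolarSystem"})` returns an empty list, while adding TACLS-Kerbal to that set reports the two configs clashing with each other.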

In this instance TACLS-Kerbal would live inside the same archive as TACLS (because it does), and TACLS-RSS would live inside the same archive as RSS (because it does). This avoids all the awfulness of having to overwrite files from other packages, simply by allowing zipfiles to essentially contain multiple packages.
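A minimal sketch of why caching by URL helps here, assuming hypothetical module-to-URL metadata (the URLs are placeholders): when the cache is keyed by the download URL rather than by module, TACLS and TACLS-Kerbal resolve to the same archive, so installing both fetches it only once.

```python
# Hypothetical metadata: modules shipped in the same archive share one URL.
MODULE_URLS = {
    "TACLS": "https://example.com/tacls/TACLS.zip",
    "TACLS-Kerbal": "https://example.com/tacls/TACLS.zip",
    "RealSolarSystem": "https://example.com/rss/RealSolarSystem.zip",
    "TACLS-RSS": "https://example.com/rss/RealSolarSystem.zip",
}


class UrlKeyedCache:
    """Downloads each URL at most once, however many modules point at it."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: url -> downloaded file (path or bytes)
        self._entries = {}       # url -> cached result
        self.download_count = 0

    def get(self, url):
        if url not in self._entries:
            self._entries[url] = self._fetch(url)
            self.download_count += 1
        return self._entries[url]


def install(modules, cache):
    """Return the cached archive for each module; shared archives are reused."""
    return {name: cache.get(MODULE_URLS[name]) for name in modules}
```

Installing all four modules with `install(list(MODULE_URLS), cache)` triggers two downloads, not four.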

The gotcha here is that caching by URL forces us to do something we don't like. The options are: using long URLs as filenames (unique, but ugly, and may not work on all systems); computing a hash of each URL (unique and even MORE ugly, but works on all filesystems); using hard links (likely a cross-platform nightmare); or keeping a registry of which URL each file came from (which means the filenames on disk may not be what the user expects them to be).
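The first two options can be compared with a short Python sketch. SHA-1 is an assumption here; any stable hash would do. Percent-encoding keeps the URL recoverable from the filename but makes the name grow with the URL (single filename components are commonly capped at around 255 bytes or characters), whereas the hash is always 40 hex characters.

```python
import hashlib
from urllib.parse import quote


def url_as_filename(url: str) -> str:
    """Option 1: percent-encode the whole URL into a single path component."""
    # safe="" also escapes "/" so the result contains no directory separators.
    return quote(url, safe="")


def url_as_hash(url: str) -> str:
    """Option 2: a fixed-length SHA-1 hex digest of the URL (assumed hash)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()
```

For example, `url_as_filename("https://example.com/mods/TACLS/download/0.10.1")` gives `https%3A%2F%2Fexample.com%2Fmods%2FTACLS%2Fdownload%2F0.10.1`: unique, but hardly something a user would recognise in a directory listing.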

I'm undecided as to which of these options is least bad. :/

Refers to #110.

pjf · 2014-11-07T03:13:24Z

Sane way of handling caching by URL:

1. Save our downloads to a temporary file.
2. Take the URL, hash it, and extract the first 8 bytes.
3. Rename our downloaded file to "hash-original-filename.zip".
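The three steps above can be sketched in Python. This is a hypothetical helper, not CKAN's implementation: SHA-1 is an assumed hash, and "the first 8 bytes" is read here as the first 8 hex characters of the digest (set `HASH_PREFIX_LEN = 16` to keep 8 raw bytes instead). The temporary file is created in the cache directory itself so that the final `os.replace` is a same-filesystem rename.

```python
import hashlib
import os
import tempfile

# Assumption: "first 8 bytes" read as 8 hex characters of a SHA-1 digest.
HASH_PREFIX_LEN = 8


def url_hash_prefix(url):
    """Short, stable prefix derived from the download URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:HASH_PREFIX_LEN]


def cache_filename(url, original_filename):
    """Build the "hash-original-filename.zip" style name."""
    return f"{url_hash_prefix(url)}-{original_filename}"


def store_download(cache_dir, url, original_filename, chunks):
    """Write downloaded chunks to a temp file, then rename it into the cache.

    `chunks` is any iterable of bytes (a stand-in for the network stream).
    """
    # Step 1: save to a temporary file in the same directory as the cache.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # Steps 2 and 3: hash the URL and rename to "<hash>-<original name>".
        final_path = os.path.join(cache_dir, cache_filename(url, original_filename))
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # A failed download is discarded and never reaches its cache name.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the rename happens only after the last chunk is written, a download that fails midway leaves nothing behind under a cache name.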

This means that files don't enter the cache until they finish downloading, so a failed or interrupted download never leaves behind a partial file that we might try to use later. It also means we don't have to worry about remote files with the same name (I see this a lot), because we prepend the hash. And filenames stay human-readable, which is great when you want to pop into the cache directory and see what's really in a zip file.

An alternative is to append the hash instead, but I don't think that looks anywhere near as pretty.

We should also try to use the HTTP headers to find out filenames. I seem to recall that both GitHub and KerbalStuff have URLs with just the release number at the end, but supply suggested filename information in the headers.
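The standard place for a server-suggested filename is the `Content-Disposition` response header (RFC 6266); whether GitHub and KerbalStuff send it is the recollection above, not something this sketch verifies. A minimal Python sketch that parses the header with the standard library's `email.message.Message` and falls back to the last URL path segment, which for a release URL may be just a version number. The `suggested_filename` helper is hypothetical.

```python
from email.message import Message
from urllib.parse import unquote, urlsplit


def suggested_filename(url, content_disposition=None):
    """Prefer the server's Content-Disposition filename, else the URL's last segment."""
    name = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        # get_filename() reads the `filename` parameter (including RFC 2231 forms).
        name = msg.get_filename()
    if not name:
        name = unquote(urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1])
    # Never trust a server-supplied name to contain directories.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    return name or "download"
```

Without the header, a URL ending in `/download/1.2` yields just `1.2`, which is exactly why the header is worth reading; the result can then be passed as the original filename when building the hash-prefixed cache name.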


pjf added the Core (ckan.dll), Enhancement, and ★★☆ labels and removed the Core (ckan.dll) label on Oct 21, 2014

pjf added this to the v1.00 - Usable Release milestone Oct 27, 2014

pjf added ★★★ and removed ★★☆ labels Oct 27, 2014

pjf self-assigned this Oct 30, 2014

pjf added the In progress and ★★★ labels and removed the ★★★ and In progress labels on Oct 30, 2014

pjf removed their assignment Oct 30, 2014

AlexanderDzhoganov self-assigned this Nov 7, 2014

AlexanderDzhoganov mentioned this issue in NetFileCache and LocalRepo #282 (closed) on Nov 7, 2014

pjf added pull request and removed ★★★ labels Nov 8, 2014

pjf mentioned this issue in Cache entirely based upon URLs. #296 (merged) on Nov 9, 2014

AlexanderDzhoganov closed this as completed in #296 Nov 9, 2014

AlexanderDzhoganov removed the pull request label Nov 9, 2014

RichardLake pushed a commit to RichardLake/CKAN that referenced this issue May 30, 2015

Merge pull request KSP-CKAN#111 from mgsdk/show_downloaded_filename

fd22b49

Changed hash function from private to public.
