Allow file caching by URL, not (just) filename #111

Closed
pjf opened this issue Oct 21, 2014 · 2 comments · Fixed by #296
Labels: Core (ckan.dll) (issues affecting the core part of CKAN); Enhancement (new features or functionality)

Comments

@pjf
Member

pjf commented Oct 21, 2014

It may be possible for multiple CKAN files to reference the same URL. This is particularly useful if a mod has optional components; they can be bundled in the same file, but may have their own identifiers which allow for complex relationships to be expressed.

@pjf pjf added Core (ckan.dll) Issues affecting the core part of CKAN Enhancement New features or functionality ★★☆ and removed Core (ckan.dll) Issues affecting the core part of CKAN labels Oct 21, 2014
@pjf
Member Author

pjf commented Oct 27, 2014

Upgrading the priority on this, because it means we can finally deprecate bundles and other troublesome parts of our metadata (#113).

As an example use-case, a dependency graph which requires that RSS gets its own TACLS config could look like this:

  • TACLS depends upon TACLS-Config
  • TACLS-Kerbal provides TACLS-Config, depends upon TACLS, conflicts with TACLS-Config
  • TACLS-RSS provides TACLS-Config, depends upon TACLS, conflicts with TACLS-Config
  • RealSolarSystem depends upon TACLS-RSS

In this instance TACLS-Kerbal would live inside the same archive as TACLS (because it does), and TACLS-RSS would live inside the same archive as RSS (because it does). This avoids all the awfulness of having to overwrite files from other packages, simply by allowing zipfiles to essentially contain multiple packages.
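As a rough illustration of why this needs caching by URL, here is a minimal sketch that groups the modules from the example above by their download URL. The dictionary layout, the `download` key, and the `example.invalid` URLs are placeholders invented for this sketch, not actual CKAN metadata.

```python
from collections import defaultdict

# Hypothetical metadata: identifiers and relationships come from the example
# above; the field names and URLs are placeholders for illustration only.
MODULES = [
    {"identifier": "TACLS",
     "download": "https://example.invalid/tacls.zip",
     "depends": ["TACLS-Config"]},
    {"identifier": "TACLS-Kerbal",
     "download": "https://example.invalid/tacls.zip",
     "provides": ["TACLS-Config"],
     "depends": ["TACLS"],
     "conflicts": ["TACLS-Config"]},
    {"identifier": "TACLS-RSS",
     "download": "https://example.invalid/rss.zip",
     "provides": ["TACLS-Config"],
     "depends": ["TACLS"],
     "conflicts": ["TACLS-Config"]},
    {"identifier": "RealSolarSystem",
     "download": "https://example.invalid/rss.zip",
     "depends": ["TACLS-RSS"]},
]


def group_by_download(modules):
    """Map each download URL to the identifiers packaged inside it."""
    groups = defaultdict(list)
    for module in modules:
        groups[module["download"]].append(module["identifier"])
    return dict(groups)


if __name__ == "__main__":
    for url, identifiers in group_by_download(MODULES).items():
        print(url, "->", ", ".join(identifiers))
```

Four identifiers map onto two archives here, so a cache keyed on the URL would only need to fetch and store each archive once.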

The gotcha here is that caching by URL means we have to do something we don't like: use long URLs as filenames (unique, but ugly, and may not work on all systems), compute a hash of each URL (unique and even MORE ugly, but works on all filesystems), use hard links (likely a cross-platform nightmare), or keep a registry of which URL each file came from (which means the filenames on disk may not be what the user expects).

I'm undecided as to which of these options is least bad. :/

Refers to #110.

@pjf pjf added this to the v1.00 - Usable Release milestone Oct 27, 2014
@pjf pjf added ★★★ and removed ★★☆ labels Oct 27, 2014
@pjf pjf self-assigned this Oct 30, 2014
@pjf pjf added In progress We're still working on this ★★★ and removed ★★★ In progress We're still working on this labels Oct 30, 2014
@pjf pjf removed their assignment Oct 30, 2014
@pjf
Member Author

pjf commented Nov 7, 2014

Sane way of handling caching by URL:

  1. Save our downloads to a temporary file.
  2. Take the URL, hash it, and extract the first 8 bytes.
  3. Rename our downloaded file to "hash-original-filename.zip"

This means that files don't enter the cache until they finish downloading, so on error we won't have written files that we might try to use later. It means we're not worried about remote files with the same name (I see this a lot), because we're prepending the hash. And it means that filenames are still human-readable, which is great when you want to pop into the cache directory and see what's really in a zip file.

An alternative to this is to append the hash instead, but I don't think that looks anywhere near as pretty.
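The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was merged: the issue names neither a hash function nor an encoding, so SHA-1 and the first 8 hex characters are choices made for this sketch, and `cache_name` and `store_in_cache` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def cache_name(url, filename, digits=8):
    """Build "hash-original-filename" from the URL and the remote filename.

    Assumption: SHA-1 and 8 hex characters; the issue does not specify either.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"{digest[:digits]}-{filename}"


def store_in_cache(cache_dir, url, filename, data):
    """Write data to a temporary file, then rename it into its cache name."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, cache_name(url, filename))
    # The temporary file sits in the cache directory so the rename stays on
    # one filesystem; its ".part" suffix keeps it from looking like an entry.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # The final name only appears once the content is completely written.
        os.replace(tmp_path, final_path)
    except BaseException:
        # On any failure, remove the partial file so it is never reused.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache:
        path = store_in_cache(cache, "https://example.invalid/mod/1.0",
                              "mod.zip", b"zip bytes would go here")
        print(os.path.basename(path))
```

Two different URLs that both serve a file called `mod.zip` end up under different names, while the readable part of the name is preserved.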

We should also try to use the HTTP headers to find out filenames. I seem to recall that both Github and KerbalStuff have URLs with just the release number at the end, but supply suggested filename information in the headers.
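One way to pick up a suggested filename is the `Content-Disposition` response header, falling back to the last URL path segment when it is absent. The sketch below uses Python's standard `email.message.Message` to parse the header value; `suggested_filename` and the `"download"` fallback are hypothetical, and which headers GitHub or KerbalStuff actually send is not verified here.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def suggested_filename(url, content_disposition=None):
    """Prefer the filename from Content-Disposition, else use the URL path."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Keep only the final component so a hostile header cannot
            # point outside the cache directory.
            name = posixpath.basename(name.replace("\\", "/"))
            if name not in ("", ".", ".."):
                return name
    # Fall back to the last path segment of the URL.
    name = posixpath.basename(urlsplit(url).path)
    return name or "download"  # "download" is an arbitrary placeholder


if __name__ == "__main__":
    print(suggested_filename("https://example.invalid/releases/1.2",
                             'attachment; filename="Mod-1.2.zip"'))
    print(suggested_filename("https://example.invalid/releases/1.2"))
```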

@AlexanderDzhoganov AlexanderDzhoganov self-assigned this Nov 7, 2014
@pjf pjf added pull request and removed ★★★ labels Nov 8, 2014
RichardLake pushed a commit to RichardLake/CKAN that referenced this issue May 30, 2015
Changed hash function from private to public.