support for non-amazonaws endpoint urls (e.g. s3.<company>.com) #436
Yes, we'll soon support custom endpoints. At the moment you can select a profile to be used with the s3://profile@bucket/... syntax.
Thanks! No - that environment variable was a fictional creation on my part. Sticking to a well-known standard is best. Or perhaps parsing additional information from the supplied s3 (default or named) profile? I will play a bit with the s3://profile@bucket/... syntax! Happy to provide additional feedback and/or some basic documentation regarding htslib / s3 interaction.
Allow the user to specify an endpoint other than s3.amazonaws.com. This can be set using ~/.s3cfg's host_base setting (only; we ignore host_bucket); when there's a blessed setting key for .aws/credentials, we'll support it there too (perhaps endpoint_url; cf aws/aws-cli#1270). Fixes (part of) #436.
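As a concrete (but hypothetical) illustration of that setting, a minimal ~/.s3cfg might look like this; the host name and keys are placeholders rather than values from this thread:

```ini
# ~/.s3cfg (s3cmd-style configuration)
[default]
access_key = PLACEHOLDER_ACCESS_KEY
secret_key = PLACEHOLDER_SECRET_KEY
# host_base redirects requests away from s3.amazonaws.com;
# host_bucket is ignored, per the change described above
host_base = s3.example.com
```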
I can see in samtools 1.4 there is a hfile_s3.c that seems to have all the necessary code for getting the correct domain from ~/.s3cfg, but is it actually supposed to be working now? I tried installing by:
But when I try to view a file in s3 (that I can get with s3cmd):
I don't have a ~/.aws/credentials file, which btw should not stop it reading ~/.s3cfg to get the host_base. How do I debug this?
You can get more debugging output by using the
Htslib doesn't understand v4 signatures yet. Depending on where your data is, this may explain the problem.
I'm using Ceph Object Gateway, which sort of works with v4, but is best used with v2 signatures.
So I guess my first question is, how are you supposed to build samtools and get it to use your existing configuration/build of htslib in the sub dir? I rebuilt htslib:
And tried again:
So, it's not reading my config file:
Could you please test pulling in this PR #506 to see if it fixes the issue? It's sounding much like you're hitting the same problem with s3cfg not being read.
Probably you also have a ~/.aws/credentials file, which htslib reads first and so has already provided settings to be used. And this very issue is about not being able to specify the endpoint in ~/.aws/credentials at present; we should just bless a setting key for it (perhaps endpoint_url, as mentioned above). Remove the default section of ~/.aws/credentials or delete the file entirely. Alternatively use a distinctive profile that's defined in your ~/.s3cfg: introduce it with the s3://profile@bucket/... syntax.
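A minimal sketch of that suggestion, assuming a named section in ~/.s3cfg and the profile-in-URL selection mentioned earlier (the profile name, bucket, and host here are all placeholders):

```ini
# ~/.s3cfg with a named profile alongside [default]
[ceph]
access_key = PLACEHOLDER_ACCESS_KEY
secret_key = PLACEHOLDER_SECRET_KEY
host_base = gateway.example.com

# Then select it from the URL, e.g.:  s3://ceph@mybucket/path/to/file.bam
```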
My reading of the existing code is that it should be working already if I don't have a ~/.aws/credentials file (and I don't), but I'll give the PR a go...
@sb10: In that case, running under
Trying the PR:
And tried again:
... so somehow the PR helped, but I don't know why. Remaking samtools then also made that work.
Oh, the current code also doesn't parse the config files if the $AWS_ACCESS_KEY_ID env var is set, and I had it set. Hence the PR working.
Yes indeed, and sorry for the confusion. As noted previously on this issue, this S3 stuff needs documentation about how it works and is configured. Probably an HTML page on htslib.org rather than a man page. While implementing it, it became apparent that mixing up config settings from different sources would be hard to implement, impossible to document, and confusing to use. So it takes the first of the following that provides access_key:
Adding visibility from the #506 discussion - it would also be helpful to add a way to force path_style/virtual_style URLs. Our use-case is to always force path_style, e.g. via a key under a profile section in ~/.aws/credentials (see the sketch below).
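For comparison, aws-cli itself exposes this through an addressing_style setting in ~/.aws/config; a hedged sketch of what an equivalent might look like (htslib does not currently read these keys, and the profile name is a placeholder):

```ini
# ~/.aws/config (aws-cli convention, shown only as a possible model)
[profile myprofile]
s3 =
    addressing_style = path
```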
We patch our htslib locally to force path_style URLs via a 'url_mode' key in .s3cfg so that we can access a local Cleversafe store. It would be nice if this mechanism or one like it could be made official.
I've hit this issue again: now that I had a ~/.aws/credentials file, it didn't work until I renamed it.
I disagree. May I suggest you take an "it just works" approach that I use for muxfys:
We are utilizing EMC on-premises S3-compatible storage.
Htslib does not accommodate endpoint URLs other than .s3.amazonaws.com.
We can successfully modify, re-compile, and use htslib/samtools if we swap in our own endpoint URL at line 846 of hfile_libcurl.c.
Can htslib be modified to accept custom endpoint URLs from all potential config locations? e.g.

- s3://id:secret:endpoint@bucket/ (or similar)
- AWS_ENDPOINT_URL= as an environment variable
- aws_endpoint_url= as a config key (or some other standard, extracted from an aws profile?)

Or modify it to utilize an 's3_domain' parameter? (A sketch of the config-key variant is below.)
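To make the config-key option concrete, here is a hypothetical ~/.aws/credentials entry using the proposed aws_endpoint_url key; this is not something htslib recognises today, and all values are placeholders:

```ini
# ~/.aws/credentials -- aws_endpoint_url is only a proposal here
[default]
aws_access_key_id = PLACEHOLDER_ACCESS_KEY
aws_secret_access_key = PLACEHOLDER_SECRET_KEY
aws_endpoint_url = https://s3.example.com
```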
Looks like you've already touched on a few of these related to gcs:
#390
It would also be nice to modify the s3 logic to take a --profile parameter as used in aws-cli.