boto3 incorrectly disallows unicode metadata on S3, despite documented S3 support #3063

Closed
AMZN-hgoffin opened this issue Nov 1, 2021 · 2 comments
Labels
needs-triage This issue or PR still needs to be triaged.

AMZN-hgoffin commented Nov 1, 2021

Attempting to put a non-ASCII string as a user metadata value on an S3 object fails due to an explicit prohibition in the boto3 source code:
https://github.com/boto/botocore/blob/04d1fae43b657952e49b21d16daa86378ddb4253/botocore/handlers.py#L543

Example:

>obj.put(Body="hello",Metadata={"meta":"™"})
Traceback (most recent call last): ...
UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in position 0: ordinal not in range(128)

The documentation cited in the validation function linked above has changed since the code was written, and now states "Amazon S3 allows arbitrary Unicode characters in your metadata values" (https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html).

If I monkey-patch this check out, everything works fine and the PUT succeeds, as the documentation suggests it should. When fetching the object, the metadata is UTF-8 encoded and then base64 encoded for ASCII transmission via HTTP headers, again as documented.

>import boto3
>import botocore.handlers
># Drop the ASCII-only metadata validator from the built-in S3 handlers before creating a session.
>botocore.handlers.BUILTIN_HANDLERS = [elem for elem in botocore.handlers.BUILTIN_HANDLERS if not (elem[0].startswith('before-parameter-build.s3.') and elem[1] == botocore.handlers.validate_ascii_metadata)]

>session = boto3.Session()
>s3 = session.resource('s3')
>obj = s3.Object('[bucket omitted]','testupload.txt')
>obj.put(Body="hello",Metadata={"meta":"™"})
{'ResponseMetadata': ...}
>obj.get()
{ ... 'x-amz-meta-meta': '=?UTF-8?B?w6LChMKi?=' ... }
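
For anyone reading the returned value back: the header above is an RFC 2047 encoded-word, which the standard library can decode. The following is a minimal sketch, not part of boto3; the helper name `decode_rfc2047` is made up for illustration. For this particular sample value, one decoding pass yields the Latin-1 reading of the UTF-8 bytes of "™", so a second round-trip is needed to recover the original character. That observation applies to the sample value shown here only, and other values may not need the second step.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Decode an RFC 2047 encoded-word header value into a str (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Values without encoded words come back as plain str.
            parts.append(chunk)
    return "".join(parts)


raw = "=?UTF-8?B?w6LChMKi?="  # value copied from the obj.get() output above
decoded = decode_rfc2047(raw)

# Assumption for this sample only: the bytes were interpreted as Latin-1
# before being UTF-8 encoded, so undo that extra layer.
repaired = decoded.encode("latin-1").decode("utf-8")

print(repr(decoded))
print(repaired)
```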

I believe that the correct resolution today is to revert boto/botocore#861 or else make it check keys only (instead of values). Now that boto3 only supports Python 3 with native Unicode strings, and all the signing libraries have been updated to support UTF-8, there is no need for this out-of-date explicit "validation" step.
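
To make the keys-only option concrete, here is a rough sketch of what such a check could look like. It is not the botocore implementation: the function name `validate_ascii_metadata_keys` is hypothetical, the `(params, **kwargs)` signature is an assumption about the usual handler shape, and it raises a plain `ValueError` rather than any botocore exception so that it stays self-contained.

```python
def validate_ascii_metadata_keys(params, **kwargs):
    """Hypothetical keys-only check: reject non-ASCII metadata keys, allow any values."""
    metadata = params.get("Metadata")
    if not isinstance(metadata, dict):
        # Nothing to validate, or the wrong type; leave that to other validation.
        return
    for key in metadata:
        try:
            key.encode("ascii")
        except UnicodeEncodeError:
            raise ValueError(f"Non-ASCII metadata key: {key!r}") from None


# A non-ASCII value passes, because only keys are checked.
validate_ascii_metadata_keys({"Metadata": {"meta": "™"}})
```

With this shape, a key such as "méta" would be rejected while Unicode values go through untouched.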

AMZN-hgoffin added the needs-triage label Nov 1, 2021

AMZN-hgoffin commented Nov 1, 2021

Re-filed under botocore instead of boto3, sorry for the spam. boto/botocore#2552

github-actions bot commented Nov 1, 2021

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
