S3: decoding user-defined metadata that contains Unicode

How do I decode S3 user-defined metadata that contains Unicode characters? The documentation is here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html

In the examples there, ÄMÄZÕÑ S3 is converted to =?UTF-8?B?w4PChE3Dg8KEWsODwpXDg8KRIFMz?=.

It looks like MIME encoding, but I can't decode this string back to the original.

I would be grateful if you could tell me how to do it in Go, or help me understand how this string is encoded.

PS. I also ran some experiments, and sometimes the result is very strange. For example, ФывукЕЙÄMÄZÕÑS3ФывукЕÄZÕÑS3 is converted to a string made of several encoded parts: =?UTF-8?B?w5DCpMORwovDkMKyw5HCg8OQwrrDkMKVw5DCmcODwoRNw4PChFrDg8KVw4M=?= =?UTF-8?B?wpFTM8OQwqTDkcKLw5DCssORwoPDkMK6w5DClcODwoRaw4PClcODwpFTMw==?=

  • Not an answer, but from poking around with your example, it looks like the format is =?UTF-8?B? followed by the bytes of the input encoded in base64, followed by ?= to terminate. However, when I decode your first example I get extra bytes in the output. Your input, encoded as UTF-8 in Python, gives me \xc3\x84M\xc3\x84Z\xc3\x95\xc3\x91 S3, while your example, when decoded, gives \xc3\x83\xc2\x84M\xc3\x83\xc2\x84Z\xc3\x83\xc2\x95\xc3\x83\xc2\x91 S3. The 0xc2 0x84 pair appears to be a control character, which may be present in your input and stripped out by re:Post? Hope that helps.

  • Initially I came to the same conclusion. I also tried to detect the encoding with chardet, but for some reason it reports UTF-8 as well. After that I found that it looks very similar to MIME encoding, but decoding it as such didn't return the original string. https://docs.python.org/3/library/email.header.html#email.header.decode_header
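
Following up on the byte dumps in the comments above: every original UTF-8 byte such as 0xc3 0x84 shows up in the decoded output as 0xc3 0x83 0xc2 0x84, which is what you get when UTF-8 bytes are read as Latin-1 characters and then encoded as UTF-8 a second time. Under that assumption, decoding could be two steps: first decode the RFC 2047 encoded words, then map each character back to a single byte and read the result as UTF-8. Below is a minimal sketch in Go. The function name decodeS3Metadata is made up for this example, and the double-encoding theory is inferred only from the two strings in this thread, not from anything S3 documents, so treat it as a heuristic.

```go
package main

import (
	"fmt"
	"mime"
	"unicode/utf8"
)

// decodeS3Metadata is a hypothetical helper (not part of any AWS SDK).
// It assumes the value is RFC 2047 encoded and, possibly, that the UTF-8
// bytes were re-encoded once more as if they were Latin-1 characters.
func decodeS3Metadata(v string) (string, error) {
	// Step 1: decode the =?UTF-8?B?...?= encoded words. DecodeHeader also
	// joins adjacent encoded words separated by whitespace.
	dec := new(mime.WordDecoder)
	s, err := dec.DecodeHeader(v)
	if err != nil {
		return "", err
	}

	// Step 2: try to undo the extra layer by turning every code point
	// back into one byte. Any code point above U+00FF means the value
	// was not double-encoded, so it is returned as decoded in step 1.
	b := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return s, nil
		}
		b = append(b, byte(r))
	}

	// Only accept the result if the recovered bytes are valid UTF-8.
	if !utf8.Valid(b) {
		return s, nil
	}
	return string(b), nil
}

func main() {
	out, err := decodeS3Metadata("=?UTF-8?B?w4PChE3Dg8KEWsODwpXDg8KRIFMz?=")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(out) // ÄMÄZÕÑ S3
}
```

The same function also handles the longer value from the PS, because mime.WordDecoder.DecodeHeader concatenates the two encoded words before step 2 runs. One caveat: a value that really consists of Latin-1 range characters which happen to form valid UTF-8 when read as bytes would be changed by step 2, so this only makes sense if you know the values went through the same path as the examples here.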

asked 2 years ago, 139 views