Read a tar file from s3 and uncompress it

Hi,

I want to read a tar file from S3, uncompress it, and load its contents into another S3 bucket using a Glue job, but I am getting the error "fileobj must implement read".

obj = s3.getObject(bucketname, key)
objbuffer = io.BytesIO(obj["Body"].read())
tarf = tarfile.open(fileobj=objbuffer)
files = tarf.getnames()
for file in files:
    with open(file, 'rb') as f:
        s3.upload_fileobj(f, tgt_bucket, filepath, Config=config)

Note: I am using upload_fileobj to handle multipart uploads, and Config holds the TransferConfig details.

Bharath
asked 20 days ago, 143 views
1 Answer
Hi,

Are you 100% sure that tarf.getnames() returns only "real" files? It can also return the names of symlinks, directories, and other member types.

Look at https://docs.python.org/3/library/tarfile.html#tarfile.TarInfo

TarInfo.type
File type. type is usually one of these constants: REGTYPE, AREGTYPE, LNKTYPE, 
SYMTYPE, DIRTYPE, FIFOTYPE, CONTTYPE, CHRTYPE, BLKTYPE, GNUTYPE_SPARSE. 
To determine the type of a TarInfo object more conveniently, use the is*() methods below.

So, you may want to check the type of the tar member before uploading it.
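
For illustration, here is a minimal sketch of that check. It also reads each member through tarf.extractfile() rather than open(), since the names returned by getnames() refer to entries inside the archive and not to files on local disk. The function name upload_regular_members and the upload callback are hypothetical names I made up for this example; in your Glue job the callback would wrap your own s3.upload_fileobj call.

```python
import io
import tarfile


def upload_regular_members(tar_bytes, upload):
    """Call upload(fileobj, name) for each regular file in the archive.

    `upload` is a hypothetical callback; in a Glue job it could wrap
    s3.upload_fileobj(fileobj, tgt_bucket, name, Config=config).
    """
    uploaded = []
    # mode="r:*" lets tarfile detect gzip/bz2/xz compression on its own.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tarf:
        for member in tarf.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, devices, etc.
            # extractfile() returns a file-like object that implements read()
            fileobj = tarf.extractfile(member)
            upload(fileobj, member.name)
            uploaded.append(member.name)
    return uploaded
```

Assuming s3 is a boto3 S3 client, you could call it along the lines of upload_regular_members(obj["Body"].read(), lambda f, name: s3.upload_fileobj(f, tgt_bucket, name, Config=config)), with obj coming from s3.get_object(Bucket=bucketname, Key=key). I have not run this inside Glue, so treat it as a starting point. It also buffers the whole archive in memory, as your original code does.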

Best,

Didier

AWS EXPERT
answered 20 days ago
AWS EXPERT iBehr
reviewed 20 days ago
