I'm working on a feature to archive old data from our Aurora Postgres database to S3 using the aws_s3 extension. The export takes 20-30 minutes, and sometimes my client gets disconnected and retries. It appears that the aws_s3 export keeps running in the background even after the client disconnects, so on a retry I end up transferring the entire data set twice. I would prefer a reliable way to query the S3 objects and confirm that they are a complete copy of the query's result.

The table partitions I am archiving are 15-20 GB each, and the S3 objects appear to be split into 6 GB chunks. So it seems that if an object following the expected naming convention exists and is significantly smaller than 6 GB, an earlier operation must have completed, but it's hard to be certain.
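A minimal sketch of that heuristic follows. It is a pure-Python check over a list of `(key, size_in_bytes)` pairs, such as you might collect from an S3 listing. It assumes the chunks are named `base`, `base_part2`, `base_part3`, …, and that the chunk size is 6 GiB. Both of those are assumptions, as are the hypothetical `looks_complete` helper and its `margin` parameter. It is still only a heuristic, not a guarantee that the export finished.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed chunk size, based on the ~6 GB objects observed; not an official constant.
CHUNK_BYTES = 6 * 1024**3


def looks_complete(objects: Iterable[Tuple[str, int]], base_key: str,
                   margin: float = 0.9) -> bool:
    """Heuristic check (hypothetical helper): True if the parts for base_key
    are numbered 1..N with no gaps, every part except the last is close to the
    chunk size, and the last part is noticeably smaller than a full chunk."""
    # Assumed naming: "base", "base_part2", "base_part3", ...
    pattern = re.compile(re.escape(base_key) + r"(?:_part(\d+))?")
    parts: Dict[int, int] = {}
    for key, size in objects:
        m = pattern.fullmatch(key)
        if m:
            index = int(m.group(1)) if m.group(1) else 1
            parts[index] = size
    if not parts:
        return False
    # A gap in the numbering means some chunk is missing.
    if sorted(parts) != list(range(1, len(parts) + 1)):
        return False
    threshold = margin * CHUNK_BYTES
    earlier_full = all(parts[i] >= threshold for i in range(1, len(parts)))
    # A short final chunk suggests the export reached the end of the data.
    return earlier_full and parts[len(parts)] < threshold
```

Note that a partition whose size happens to be an exact multiple of the chunk size would be reported as incomplete, which is one reason this check cannot be fully trusted.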