How to delete Glacier vault with more than 700,000 archives


Hello,

I've tried everything, but it seems hopeless right now. The situation is this: I have one S3 Glacier vault, XYZ, with 738,900 archives. I need to delete it to stop paying for unused resources. I found the AWS documentation on this, along with some scripts to delete archives in AWS CloudShell. Over the last week, I've managed to delete no more than 5,000 archives. Every time I run the script, the AWS CloudShell session closes after 20-30 minutes and I have to start over.

Example of the script:

export AWS_ACCOUNT_ID=11111111111
export AWS_REGION=rr-wwwww-2
export AWS_VAULT_NAME=XYZ0

#!/bin/bash

file='./output5.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
    echo "Please set the following environment variables: "
    echo "AWS_ACCOUNT_ID"
    echo "AWS_REGION"
    echo "AWS_VAULT_NAME"
    exit 1
fi

jq -r .ArchiveList[].ArchiveId < $file | xargs -P12 -n1 bash -c 'echo "Deleting: $1"; aws glacier delete-archive --archive-id=$1 --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION}' {}

How can I do this more effectively? Has anyone had a similar problem?

Kapru
asked 3 months ago · 157 views
2 Answers

It sounds like you are using this example from GitHub.

#!/bin/bash

file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
	echo "Please set the following environment variables: "
	echo "AWS_ACCOUNT_ID"
	echo "AWS_REGION"
	echo "AWS_VAULT_NAME"
	exit 1
fi

archive_ids=$(jq .ArchiveList[].ArchiveId < $file)

for archive_id in ${archive_ids}; do
    echo "Deleting Archive: ${archive_id}"
    aws glacier delete-archive --archive-id=${archive_id} --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION}
done

echo "Finished deleting archives"

Judging by the comments there, it runs single-threaded and crashes CloudShell for some users. Further down, someone modified it to run the deletions in parallel.

#!/usr/bin/env bash

file='./output.json'
id_file='./output-archive-ids.txt'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
        echo "Please set the following environment variables: "
        echo "AWS_ACCOUNT_ID"
        echo "AWS_REGION"
        echo "AWS_VAULT_NAME"
        exit 1
fi

echo "Started at $(date)"

echo -n "Getting archive ids from $file..."
if [[ ! -f $id_file ]]; then
  cat $file | jq -r --stream ". | { (.[0][2]): .[1]} | select(.ArchiveId) | .ArchiveId" > $id_file 2> /dev/null
fi
total=$(wc -l $id_file | awk '{print $1}')
echo "got $total"

num=0
while read -r archive_id; do
  num=$((num+1))
  echo "Deleting archive $num/$total at $(date)"
  aws glacier delete-archive --archive-id=${archive_id} --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} &
  [ $( jobs | wc -l ) -ge $( nproc ) ] && wait
done < "$id_file"

wait
echo "Finished at $(date)"
echo "Deleted archive ids are in $id_file"

I do not know of a built-in way to recursively remove all of these archives, but please take care when running code you find on the internet.
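
For completeness, the output.json those scripts read is the vault inventory, which Glacier only hands out through an asynchronous job. A minimal sketch of retrieving it with the standard CLI commands, reusing the same environment variables; "<JOB_ID>" stands for whatever JobId initiate-job returns:

# Request a vault inventory job (this typically takes several hours)
aws glacier initiate-job \
    --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} \
    --vault-name ${AWS_VAULT_NAME} \
    --job-parameters '{"Type": "inventory-retrieval"}'

# Poll until the job reports "Completed": true
aws glacier describe-job \
    --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} \
    --vault-name ${AWS_VAULT_NAME} --job-id "<JOB_ID>"

# Download the inventory listing as output.json
aws glacier get-job-output \
    --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} \
    --vault-name ${AWS_VAULT_NAME} --job-id "<JOB_ID>" output.json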

AWS
EXPERT
David
answered 3 months ago
  • Yes, I'm using that script (or a few other versions). They all work fine, but the time needed to delete 700,000 archives is beyond my imagination. After 4 days of trying, I had deleted no more than 10%. One of the problems is that CloudShell disconnects after 20-30 minutes, while the script needs at least a few hours. Is there any other way? BTW: IMHO the fastest script-based approach is to use GNU parallel, for example like this:

    #!/bin/bash

    file='./outputX.json'

    if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
        echo "Please set the following environment variables: "
        echo "AWS_ACCOUNT_ID"
        echo "AWS_REGION"
        echo "AWS_VAULT_NAME"
        exit 1
    fi

    total_archives=$(jq -r '.ArchiveList | length' $file)

    # --joblog records one line per deletion; column 7 is the exit value
    jq -r .ArchiveList[].ArchiveId < $file | parallel -j8 --bar --joblog ./delete-joblog.txt aws glacier delete-archive --archive-id={} --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION}

    # Count the jobs that exited 0 (skip the job-log header line)
    completed_archives=$(awk 'NR > 1 && $7 == 0' ./delete-joblog.txt | wc -l)
    echo "Completed $completed_archives out of $total_archives archives."

Accepted Answer

I've finally deleted all the archives and the vault itself. The answer from AWS technical support was helpful. Here is the main part of it:

"we don't have the option to delete these resources in a bulk request. However, you can consider using this third-party tool named 'FastGlacier' for interacting with your Glacier vaults from Windows clients. 

Please review the following documentation about this: 
>> https://fastglacier.com/ 

-> Once downloaded, install and open the app, then click on 'Accounts' at the top left.
-> Click 'Add New Account'
-> Enter the Access Key and Secret Access Key for your IAM user that has permissions to interact with Glacier.
-> Click on 'Add new account' at the bottom
-> Then, it may take a few minutes for the vaults list to update
-> Select the correct region from the drop down and it will open the archives in the vault
-> Right click on one of the archives and then click on 'Select All' (or press Ctrl + A)
-> Next, right click again and now click 'Delete' (or press Delete button on keyboard)
-> When the archives are deleted, you can delete your vault. You may need to again let the inventory update in the backend before deleting the vault, which can take up to 24 hours."

The tool, FastGlacier, is fantastic: it works perfectly and can be left running for a long time without any attention. The app remained connected the entire time (more than a week) with no technical problems.

If you need to delete an AWS Glacier vault with a large number of archives, FastGlacier seems to be the best tool.
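
If you prefer to finish the last step from the CLI instead of the GUI: once the inventory has caught up and shows zero archives, the vault itself can be removed with the standard Glacier commands (a sketch, reusing the environment variables from the scripts above):

# Check that the (eventually consistent) inventory reports NumberOfArchives: 0;
# it can lag by up to ~24 hours after the deletions
aws glacier describe-vault \
    --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} \
    --vault-name ${AWS_VAULT_NAME}

# Remove the now-empty vault
aws glacier delete-vault \
    --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} \
    --vault-name ${AWS_VAULT_NAME}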

Kapru
answered 2 months ago
