Back up Elasticsearch from on-premises to AWS with encryption


I have an on-premises Elasticsearch cluster, and I take daily snapshots to NFS.

As we know, Elasticsearch takes a full snapshot first and then makes incremental ones (only new data is written to each new snapshot), so I have one big directory where a year's worth of snapshots is stored (in Elasticsearch, the place where snapshots live is called a snapshot repository).

But now I also want a copy of my snapshots on AWS, and that data must be encrypted (preferably on the client side). The problem is that Elasticsearch cannot work with an encrypted snapshot repository: while a new snapshot is running, Elasticsearch may delete or change older files, and if those files are encrypted the snapshot process fails. So for the client-side encryption case I chose this approach (a sketch automating it follows the list):

  1. Every day, create a new folder named with a (name-date) pattern, then register it as a snapshot repository.
  2. Take a FULL snapshot into that directory.
  3. Encrypt the directory using GPG or a similar tool.
  4. Copy the encrypted directory to AWS S3.
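
Here is roughly what I mean, as a Python sketch of the daily run. The host, NFS path, bucket name, and GPG key below are all placeholders for my real setup, and the repository path would need to be listed under path.repo in elasticsearch.yml:

```python
import datetime
import pathlib
import subprocess

import boto3
import requests

ES = "http://localhost:9200"                     # placeholder cluster address
NFS_ROOT = pathlib.Path("/mnt/nfs/es-backups")   # placeholder NFS mount
BUCKET = "my-es-backups"                         # placeholder S3 bucket
GPG_RECIPIENT = "backup@example.com"             # placeholder GPG key

today = datetime.date.today().isoformat()
repo_name = f"daily-{today}"
repo_path = NFS_ROOT / repo_name
repo_path.mkdir(parents=True, exist_ok=True)

# 1. Register a fresh fs repository for today.
requests.put(
    f"{ES}/_snapshot/{repo_name}",
    json={"type": "fs", "settings": {"location": str(repo_path)}},
).raise_for_status()

# 2. Snapshot into the empty repository; with no prior snapshots there,
#    this is effectively a full copy of the cluster data.
requests.put(
    f"{ES}/_snapshot/{repo_name}/full?wait_for_completion=true"
).raise_for_status()

# 3. Tar the directory and encrypt it client-side with GPG.
archive = NFS_ROOT / f"{repo_name}.tar.gz"
subprocess.run(
    ["tar", "-czf", str(archive), "-C", str(NFS_ROOT), repo_name],
    check=True,
)
encrypted = pathlib.Path(str(archive) + ".gpg")
subprocess.run(
    ["gpg", "--batch", "--yes", "--encrypt",
     "--recipient", GPG_RECIPIENT, "-o", str(encrypted), str(archive)],
    check=True,
)

# 4. Upload the encrypted archive to S3.
boto3.client("s3").upload_file(str(encrypted), BUCKET, encrypted.name)
```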

But I have about 150 GB of data, and it keeps growing. I'm afraid it's redundant to take a full backup every day and copy that much data (and I would have to encrypt 150 GB every day).

Yes, I know about CMKs (customer managed keys) and CloudHSM, but for internal reasons the client-side encryption method is preferable. Is this solution good for my case, or does a better one exist? I think my solution works better when scheduled once a week rather than every day.

asked a year ago · 393 views
3 Answers

Yeah, by the looks of it, it doesn't seem sustainable for you to keep creating full backups.

I would suggest you start doing incremental backups: there is one baseline backup, and every backup after that only stores the data that has been added or altered. This can be done with Elasticsearch, as its snapshots are incremental by design.
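
For illustration, a daily incremental run needs nothing more than a snapshot request against one long-lived repository, since Elasticsearch only copies segment files that earlier snapshots in the repository don't already contain. The repository name and host below are placeholders:

```python
import datetime
import requests

ES = "http://localhost:9200"                 # placeholder cluster address
snap = f"snap-{datetime.date.today().isoformat()}"

# The first run copies everything; each later run writes only new segments.
resp = requests.put(f"{ES}/_snapshot/nfs_repo/{snap}?wait_for_completion=true")
resp.raise_for_status()
print(resp.json()["snapshot"]["state"])      # "SUCCESS" when done
```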

Another approach could be to use a backup solution that supports encryption at rest, such as AWS Backup. AWS Backup can back up Elasticsearch snapshots stored in S3, and it supports encryption at rest using AWS KMS. This gives you encrypted data at rest without the need to perform daily client-side encryption yourself.
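
For example, even a plain client-side upload can request server-side KMS encryption, so objects land in S3 already encrypted at rest. The bucket name and key alias here are placeholders:

```python
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "snapshot.tar.gz",
    "my-es-backups",                           # placeholder bucket
    "snapshots/snapshot.tar.gz",
    ExtraArgs={
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": "alias/es-backup-key",  # placeholder KMS key alias
    },
)
```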

However, the best solution for your use case will depend on your specific requirements and constraints.

I hope this helps, have a good one

AWS_Guy
answered a year ago
  • Yeah, but when I encrypt my snapshot repository on the client side, Elasticsearch can no longer write to or read from that repo :(


If client-side encryption is not mandatory, then you can evaluate the S3 repository plugin to create snapshots directly in S3, without the daily full backups your approach requires. It supports SSE; a sketch follows.
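
For example, registering an S3 snapshot repository with server-side encryption enabled might look like this (it assumes the repository-s3 plugin is installed and credentials are configured; the bucket name and base path are placeholders):

```python
import requests

requests.put(
    "http://localhost:9200/_snapshot/s3_repo",
    json={
        "type": "s3",
        "settings": {
            "bucket": "my-es-backups",       # placeholder bucket
            "base_path": "snapshots",        # placeholder prefix
            # Ask the plugin to use SSE for every object it writes.
            "server_side_encryption": True,
        },
    },
).raise_for_status()
```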

--Syd

Syd
answered a year ago

While your current solution could work, you're right that it might not be the most efficient or cost-effective method, especially when dealing with daily full backups. Here's an alternative approach that could be more suitable for your use case:

  1. Create an initial full snapshot of your Elasticsearch data in the local NFS storage.
  2. Encrypt the snapshot using your preferred client-side encryption method (e.g., GPG).
  3. Upload the encrypted snapshot to AWS S3.
  4. For daily incremental backups:
     a. Create an incremental snapshot of your Elasticsearch data in the local NFS storage.
     b. Encrypt only the new or changed files from the incremental snapshot.
     c. Upload the encrypted files to the corresponding S3 folder, preserving the directory structure.
  5. Maintain an index file or manifest to keep track of the encrypted files and their relationships with the original snapshots.

This approach ensures that you're only encrypting and uploading the new or changed data every day, reducing the storage and bandwidth requirements. You can also consider using AWS S3 Transfer Acceleration to speed up the transfer process. A sketch of steps 4 and 5 appears below.
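
A rough sketch of the incremental step and the manifest idea. The paths, bucket, and GPG recipient are placeholders, and change detection here is a simplistic mtime/size comparison:

```python
import json
import pathlib
import subprocess

import boto3

REPO = pathlib.Path("/mnt/nfs/es-backups/repo")  # placeholder repo path
MANIFEST = REPO.parent / "manifest.json"
BUCKET = "my-es-backups"                         # placeholder bucket
GPG_RECIPIENT = "backup@example.com"             # placeholder GPG key

# The manifest maps relative path -> [mtime, size] so unchanged files skip.
manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
s3 = boto3.client("s3")

for path in REPO.rglob("*"):
    if not path.is_file():
        continue
    rel = str(path.relative_to(REPO))
    st = path.stat()
    stamp = [st.st_mtime, st.st_size]
    if manifest.get(rel) == stamp:
        continue  # unchanged since the last run
    # Encrypt only this new or changed file, then upload it,
    # preserving the directory structure under a "repo/" prefix.
    enc = pathlib.Path(str(path) + ".gpg")
    subprocess.run(
        ["gpg", "--batch", "--yes", "--encrypt",
         "--recipient", GPG_RECIPIENT, "-o", str(enc), str(path)],
        check=True,
    )
    s3.upload_file(str(enc), BUCKET, f"repo/{rel}.gpg")
    enc.unlink()
    manifest[rel] = stamp

MANIFEST.write_text(json.dumps(manifest, indent=2))
```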

Please note that this solution assumes you have the necessary infrastructure and permissions to access and manipulate the Elasticsearch snapshots at the file level.

In addition, you should consider implementing a data retention policy and lifecycle rules on your S3 buckets to automatically transition older snapshots to lower-cost storage classes or delete them when they're no longer needed. This will help you optimize storage costs while maintaining data availability.
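
For example, a lifecycle rule along these lines handles the transition and expiry automatically (the bucket name, prefix, and retention periods are placeholders to adjust):

```python
import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="my-es-backups",                      # placeholder bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "snapshot-retention",
            "Filter": {"Prefix": "snapshots/"},  # placeholder prefix
            "Status": "Enabled",
            # Move to a lower-cost class after 30 days, delete after a year.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```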

Lastly, be sure to test your backup and recovery process to ensure the encrypted snapshots can be decrypted and restored correctly.

ZJon (AWS EXPERT)
answered a year ago
  • Thank you so much for your answer! But Elasticsearch snapshots don't work like that ;(
