1 Answer
The easiest solution is to replicate the table using S3 replication, but the replica could be missing files for a few seconds and fail reads (if it's for DR, or if readers are aware and retry, it might be OK).
Or just recreating the destination table each time (like a snapshot).
The other way I would consider doing that is streaming from the Iceberg table, so it takes the increments and moves them into the other table: https://iceberg.apache.org/docs/latest/spark-structured-streaming/ (assuming you don't do deletions)
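That incremental approach could look roughly like this (a minimal sketch, assuming a SparkSession already configured with an Iceberg catalog; the table names and checkpoint path are placeholders):

```python
# Sketch: incremental copy of an Iceberg table via Spark Structured Streaming.
# Assumes `spark` is a SparkSession with an Iceberg catalog configured;
# "db.source_table", "db.mirror_table" and the checkpoint path are placeholders.
stream = (
    spark.readStream
    .format("iceberg")
    .load("db.source_table")   # new snapshots arrive as micro-batches
)

query = (
    stream.writeStream
    .format("iceberg")
    .outputMode("append")      # append-only: row-level deletes are not propagated
    .trigger(processingTime="60 seconds")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/mirror_table")
    .toTable("db.mirror_table")
)
query.awaitTermination()
```

As noted, this only works for append-only tables, and it needs a long-running Spark job (e.g. on EMR), which is exactly the limitation raised in the comments below.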
I tried the streaming solution, but it's not compatible with Athena PySpark (I got an explicit error about Athena not supporting it). So I guess it would require an EMR cluster running without interruption... (and I also do some deletes through Athena SQL).
The S3 replication solution is more interesting in my situation. However, my requirement is high availability in another region (in case of a region outage), so I would need to simply query a "disaster region" Iceberg table. If I replicate the data, I would need to:
1. Call a Lambda every time a .json file is deposited, and if it comes from replication (verified via userIdentity.principalId), patch the S3 bucket name inside it and rewrite it.
2. Have that Lambda also update the "disaster" Iceberg table's Glue entry to point its "metadata_location" at the current metadata JSON (and ideally patch previous_metadata_location with the previous value), as well as the table schema in case of evolution (new columns, ...).
I think this might work with this solution... But is there anything else I should be aware of? (due to replication delay/...) or maybe there's an easier solution?
If there is latency before the disaster table is queryable, that may not be a problem for me, and neither is being unable to write to the disaster table (eventually I could maybe reverse the replication direction and apply the same logic in the other region if needed).
Because Iceberg doesn't update files in place, you can just delay the table update a bit so the data stays consistent. The issue of a disaster happening mid-replication will always exist, unless when you write to the table you update both at the same time and use Nessie to coordinate it. IMO it's not worth it, and I would reconsider the design.