Our transaction application performs a hard delete in its MySQL database whenever a record is inactivated. For the lakehouse, the RDS CDC stream is exported to S3 through DMS. The problem is that if DMS is restarted for some reason, the delete markers for those records are lost, which leaves orphan records and duplicates in the data lake.
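To make the failure mode concrete, here is a minimal sketch of how CDC events are typically replayed against the lake by primary key. It assumes events follow the DMS `Op` column convention (`I`, `U`, `D`); the function name `apply_cdc` and the in-memory dict standing in for the lake table are hypothetical, for illustration only. If the `D` event never arrives, nothing removes the row, so it stays behind as an orphan.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (op, primary_key, row), where op mirrors the DMS "Op" column.
Event = Tuple[str, int, dict]


def apply_cdc(target: Dict[int, dict], events: Iterable[Event]) -> Dict[int, dict]:
    """Replay CDC events in order against a copy of the target, keyed by primary key."""
    result = dict(target)
    for op, key, row in events:
        if op in ("I", "U"):
            result[key] = row  # upsert the latest image of the row
        elif op == "D":
            result.pop(key, None)  # hard delete in the source -> remove from the lake
        else:
            raise ValueError(f"unknown op {op!r}")
    return result


lake = {1: {"status": "active"}, 2: {"status": "active"}}
print(apply_cdc(lake, [("D", 2, {})]))  # delete marker delivered: key 2 removed
print(apply_cdc(lake, []))              # delete marker lost: key 2 remains as an orphan
```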
What is the best way to handle this scenario? Should the Glue job that transforms data from S3 into the data lake handle it, for example by doing a source-to-target comparison and discarding records that are present in the data lake but not in the source?
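The source-to-target comparison described above is essentially an anti-join on the primary key. The sketch below assumes the job can obtain a full list of primary keys currently in the source (for example, a periodic key-only extract); `reconcile` and its inputs are hypothetical names, not a Glue API. In a Spark-based Glue job, the same idea would map to a DataFrame join with `how="left_anti"`.

```python
from typing import Dict, Iterable, Set, Tuple


def reconcile(lake: Dict[int, dict], source_keys: Iterable[int]) -> Tuple[Dict[int, dict], Set[int]]:
    """Drop lake records whose primary key no longer exists in the source (anti-join)."""
    keep = set(source_keys)
    # Keys present in the lake but absent from the source snapshot are treated as orphans.
    orphans = {k for k in lake if k not in keep}
    cleaned = {k: v for k, v in lake.items() if k in keep}
    return cleaned, orphans


lake = {1: {"status": "active"}, 2: {"status": "active"}, 3: {"status": "active"}}
cleaned, orphans = reconcile(lake, [1, 3])
print(sorted(cleaned), sorted(orphans))
```

One caveat with this approach: the key snapshot has to be consistent with the lake state being reconciled, otherwise rows inserted between the key extract and the lake load could be misclassified.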