Thank you for sharing your code. I understand you're trying to implement incremental data loading from SAP ECC using AWS Glue and an SAP OData connection. Let's address the issue you're facing with fetching all data every time instead of just the new or updated records.
The approach you're using with the state management script for timestamp-based incremental transfers is correct. However, there might be a few things to check and adjust:
- Ensure that the "Kdatb" column you're using as the `timestamp_column` is the column that actually records when each record in your SAP ECC system was created or last modified.
- Verify that `StateManagerFactory` and the related classes are imported correctly and available in your Glue job's environment.
- Make sure the job has the permissions it needs to read and write state information stored in the job tags.
- After processing your data, update the state with the new maximum timestamp by adding this line after your data processing logic: `state_manager.update_state(key, df_1.toDF())`
- If you're using an ODP-enabled entity that supports Change Data Capture (CDC), consider using Delta Token-based incremental transfers instead of timestamp-based ones. This method is more reliable for CDC-enabled entities.
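To see why a missing state update leads to a full reload on every run, here is a plain-Python sketch of the watermark logic that timestamp-based incremental loading relies on. It is not the `sap_odata_state_management` implementation; the function name `incremental_batch` and the record layout are hypothetical, and `Kdatb` is used only as an example column name.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def incremental_batch(
    records: List[Record], column: str, last_watermark: Optional[datetime]
) -> Tuple[List[Record], Optional[datetime]]:
    """Return the records newer than last_watermark and the watermark to save next."""
    if last_watermark is None:
        # No saved state yet: everything counts as new (a full load).
        new_rows = list(records)
    else:
        # Keep only rows strictly newer than the saved watermark.
        new_rows = [r for r in records if r[column] > last_watermark]
    if not new_rows:
        # Nothing new: keep the old watermark unchanged.
        return [], last_watermark
    # The new watermark is the largest timestamp seen in this batch.
    return new_rows, max(r[column] for r in new_rows)


if __name__ == "__main__":
    rows = [
        {"id": 1, "Kdatb": datetime(2024, 1, 1)},
        {"id": 2, "Kdatb": datetime(2024, 1, 5)},
    ]
    batch, watermark = incremental_batch(rows, "Kdatb", None)  # first run
    print(len(batch), watermark)
    batch, watermark = incremental_batch(rows, "Kdatb", watermark)  # second run
    print(len(batch), watermark)
```

If the watermark is never saved, every run starts from `None` and reloads everything, which matches the symptom described in the question. Note also that the comparison is strict: if the timestamp column only has day granularity (as a date field like `Kdatb` may), records added later on the same day as the saved watermark would be skipped, while switching to `>=` would reload that day's records instead.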
Here's a modified version of your code that incorporates these suggestions:
```python
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

from sap_odata_state_management import StateManagerFactory, StateManagerType, StateType

# Standard Glue job setup (provides args and glueContext used below)
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())

# Step 1: Initialize State Manager for Timestamp-based Incremental Transfer
state_manager_opt = {
    "job_name": args['JOB_NAME'],
    "logger": glueContext.get_logger(),
    "timestamp_column": "Kdatb"
}
state_manager = StateManagerFactory.create_manager(
    manager_type=StateManagerType.JOB_TAG,
    state_type=StateType.TIMESTAMP,
    options=state_manager_opt
)

# Get the last known timestamp (if available) to filter the data
key = "SAPODataNode"
connector_options = state_manager.get_connector_options(key)

# Step 2: Read data from SAP OData using Glue's DynamicFrame
sap_options = {
    "connectionName": "Sapodata connection",
    "ENTITY_NAME": "/sap/opu/odata/sap/cencored/cencored",
    **connector_options  # Include filtering based on last timestamp
}
df_1 = glueContext.create_dynamic_frame.from_options(
    connection_type="sapodata",
    connection_options=sap_options,
    transformation_ctx="df_1"
)

# Process your data here...

# Update the state after processing, once the data has been written successfully
state_manager.update_state(key, df_1.toDF())
```
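Because the state manager in this code is created with `StateManagerType.JOB_TAG`, one way to confirm that `update_state` actually saved something is to look at the job's tags after a run. The sketch below uses the boto3 Glue `get_tags` call; the helper names and the region, account ID, and job name you pass in are hypothetical placeholders. The exact tag keys the state manager writes are not stated here, so it simply lists all tags. The job role presumably also needs permission to read and update tags on the job (for example `glue:GetTags` and `glue:TagResource`); treat that as an assumption to check against the AWS Glue documentation.

```python
from typing import Any, Dict


def glue_job_arn(region: str, account_id: str, job_name: str) -> str:
    """Build the ARN of a Glue job (standard AWS partition), as GetTags expects."""
    return f"arn:aws:glue:{region}:{account_id}:job/{job_name}"


def get_job_tags(glue_client: Any, arn: str) -> Dict[str, str]:
    """Return the job's tags as a dict (empty if the job has none)."""
    response = glue_client.get_tags(ResourceArn=arn)
    return response.get("Tags", {})


def print_job_tags(job_name: str, region: str, account_id: str) -> None:
    """Print every tag on the job; run this after a job run has finished."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("glue", region_name=region)
    arn = glue_job_arn(region, account_id, job_name)
    for key, value in get_job_tags(client, arn).items():
        print(f"{key} = {value}")
```

For example, `print_job_tags("my-sap-job", "us-east-1", "123456789012")` with your own values. If nothing resembling saved state shows up after a successful run, the state was never written, and the next run will load everything again.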
If you're still experiencing issues after making these changes, you may want to check the following:
- Verify that the SAP OData connection is properly configured in AWS Glue.
- Ensure that the SAP system is correctly set up for OData access and that the entity you're querying supports incremental transfers.
- Check the Glue job logs for any error messages or warnings that might provide more insight into the problem.
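For the second check above, one practical test is to open the service's `$metadata` document (the service root URL followed by `/$metadata`) and look for entity sets that SAP Gateway marks with the `sap:change-tracking="true"` annotation, which typically indicates delta (change tracking) support. The sketch below parses a metadata document with Python's standard library; the function name is hypothetical, and how you download the metadata (browser, Gateway client, or an HTTP client with your credentials) is left to you.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Namespace used by SAP Gateway for its OData V2 annotations (sap:...)
SAP_NS = "http://www.sap.com/Protocols/SAPData"


def change_tracking_entity_sets(metadata_xml: Union[str, bytes]) -> Dict[str, bool]:
    """Map each EntitySet name to whether it is annotated sap:change-tracking="true"."""
    root = ET.fromstring(metadata_xml)
    result: Dict[str, bool] = {}
    for elem in root.iter():
        # Element tags look like "{namespace}EntitySet"; compare only the local name,
        # because the EDM namespace URI differs between OData V2 schema versions.
        if elem.tag.rsplit("}", 1)[-1] != "EntitySet":
            continue
        flag = elem.get(f"{{{SAP_NS}}}change-tracking", "false")
        result[elem.get("Name", "")] = flag.strip().lower() == "true"
    return result
```

For a saved file, `with open("metadata.xml", "rb") as f: print(change_tracking_entity_sets(f.read()))`. An entity set that comes back `False` is unlikely to support Delta Token-based transfers; in that case, timestamp-based transfers with a genuine change-date column are the remaining option.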
By implementing these changes and verifications, you should be able to achieve incremental data loading from your SAP ECC system using AWS Glue and the SAP OData connector.
Sources
Incremental transfers - AWS Glue
Connecting to SAP OData - AWS Glue
AWS Glue support for SAP OData - AWS Glue
Thanks for the information. How can I make sure that the SAP system is properly prepared for OData access and that the entity we are querying supports incremental transfers?
I have already followed this documentation https://docs.aws.amazon.com/glue/latest/dg/sap-odata-configuring.html
but it is still not working.