Load data incrementally from SAP OData (SAP ECC) using AWS Glue


Hello, I am working on an AWS Glue job that fetches data from SAP ECC through an SAP OData connection. I used the code below, but the incremental load is not working: every run fetches all the data again. Please help!

# Step 1: Initialize State Manager for Timestamp-based Incremental Transfer
state_manager_opt = {
    "job_name": args['JOB_NAME'],
    "logger": glueContext.get_logger(),
    "timestamp_column": "Kdatb"
}

state_manager = StateManagerFactory.create_manager(
    manager_type=StateManagerType.JOB_TAG,
    state_type=StateType.TIMESTAMP,
    options=state_manager_opt
)

# Get the last known timestamp (if available) to filter the data
key = "SAPODataNode"
connector_options = state_manager.get_connector_options(key)

# Step 2: Read data from SAP OData using Glue's DynamicFrame
sap_options = {
    "connectionName": "Sapodata connection",
    "ENTITY_NAME": "/sap/opu/odata/sap/cencored/cencored",
    **connector_options  # Include filtering based on last timestamp
}

df_1 = glueContext.create_dynamic_frame.from_options(
    connection_type="sapodata",
    connection_options=sap_options,
    transformation_ctx="df_1"
)
1 Answer

Thank you for sharing your code. You're implementing incremental loading from SAP ECC with AWS Glue and an SAP OData connection, but each run fetches the full data set instead of only new or updated records.

Using the state management script for timestamp-based incremental transfers is the right approach, but a few things need checking or adjusting:

  1. Confirm that the "Kdatb" column you're using as the timestamp_column actually records when each record was created or last modified in your SAP ECC system. If it holds a business date instead, filtering on it will not isolate new or updated records.

  2. Verify that the StateManagerFactory and related classes are properly imported and available in your Glue job's environment.

  3. Make sure that the job has the necessary permissions to read and write state information using the job tags.

  4. After processing your data, you need to update the state with the new maximum timestamp. Add this line after your data processing logic:

    state_manager.update_state(key, df_1.toDF())

  5. If you're using an ODP-enabled entity that supports Change Data Capture (CDC), you might want to consider using Delta Token-based incremental transfers instead of timestamp-based. This method is more reliable for CDC-enabled entities.
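To see why point 4 matters, here is a minimal sketch of the general high-water-mark pattern behind timestamp-based transfers. It is a plain-Python simulation, not the connector's or the state manager's actual implementation; `SOURCE_ROWS`, `read_since`, `run_job`, and the `changed_at` field are hypothetical names made up for illustration.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins; none of these names come from AWS Glue.
SOURCE_ROWS = [
    {"id": 1, "changed_at": datetime(2024, 1, 1)},
    {"id": 2, "changed_at": datetime(2024, 1, 2)},
    {"id": 3, "changed_at": datetime(2024, 1, 3)},
]


def read_since(rows, watermark):
    """Return rows changed strictly after the watermark (all rows if None)."""
    if watermark is None:
        return list(rows)
    return [r for r in rows if r["changed_at"] > watermark]


def run_job(rows, state, update_state=True):
    """Simulate one run: read with the stored watermark, then optionally save a new one."""
    batch = read_since(rows, state.get("watermark"))
    if update_state and batch:
        state["watermark"] = max(r["changed_at"] for r in batch)
    return batch


# Without a state update, every run re-reads the full data set.
stale_state = {}
first = run_job(SOURCE_ROWS, stale_state, update_state=False)
second = run_job(SOURCE_ROWS, stale_state, update_state=False)

# With a state update, the second run only sees rows added since the first.
fresh_state = {}
initial = run_job(SOURCE_ROWS, fresh_state)
SOURCE_ROWS.append({"id": 4, "changed_at": datetime(2024, 1, 4)})
incremental = run_job(SOURCE_ROWS, fresh_state)

print(len(first), len(second), len(initial), len(incremental))
```

The two runs that skip the state update both return every row, which matches the behavior described in the question. Once the watermark is saved after each run, the next run returns only the row added in between.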

Here's a modified version of your code that incorporates these suggestions:

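import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from sap_odata_state_management import StateManagerFactory, StateManagerType, StateType

# Standard Glue job setup: resolve the job name and create the Glue context
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glueContext = GlueContext(SparkContext.getOrCreate())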

# Step 1: Initialize State Manager for Timestamp-based Incremental Transfer
state_manager_opt = {
    "job_name": args['JOB_NAME'],
    "logger": glueContext.get_logger(),
    "timestamp_column": "Kdatb"
}

state_manager = StateManagerFactory.create_manager(
    manager_type=StateManagerType.JOB_TAG,
    state_type=StateType.TIMESTAMP,
    options=state_manager_opt
)

# Get the last known timestamp (if available) to filter the data
key = "SAPODataNode"
connector_options = state_manager.get_connector_options(key)

# Step 2: Read data from SAP OData using Glue's DynamicFrame
sap_options = {
    "connectionName": "Sapodata connection",
    "ENTITY_NAME": "/sap/opu/odata/sap/cencored/cencored",
    **connector_options  # Include filtering based on last timestamp
}

df_1 = glueContext.create_dynamic_frame.from_options(
    connection_type="sapodata",
    connection_options=sap_options,
    transformation_ctx="df_1"
)

# Process your data here...

# Update the state after processing
state_manager.update_state(key, df_1.toDF())

If you're still experiencing issues after making these changes, you may want to check the following:

  1. Verify that the SAP OData connection is properly configured in AWS Glue.
  2. Ensure that the SAP system is correctly set up for OData access and that the entity you're querying supports incremental transfers.
  3. Check the Glue job logs for any error messages or warnings that might provide more insight into the problem.
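When checking the logs, it can help to record what the state manager handed back before the read, so you can tell a full load from a filtered one. The helper below is a small sketch under one assumption that I have not verified: that `get_connector_options` returns an empty dict when no state has been stored yet. The function name `summarize_connector_options` and the key `example_filter_key` are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("incremental-check")


def summarize_connector_options(connector_options):
    """Describe whether any stored state was turned into connector options."""
    if not connector_options:
        return "no stored state: this run will behave like a full load"
    keys = ", ".join(sorted(connector_options))
    return f"stored state applied via option(s): {keys}"


# Hypothetical values; the real keys depend on what the state manager returns.
message_first_run = summarize_connector_options({})
message_later_run = summarize_connector_options({"example_filter_key": "2024-01-01"})

log.info(message_first_run)
log.info(message_later_run)
```

In the Glue job itself, you could pass the dict from `state_manager.get_connector_options(key)` to this helper and write the result with `glueContext.get_logger().info(...)`. If every run reports no stored state, the state is probably not being saved or read back.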

By implementing these changes and verifications, you should be able to achieve incremental data loading from your SAP ECC system using AWS Glue and the SAP OData connector.
Sources
Incremental transfers - AWS Glue
Connecting to SAP OData - AWS Glue
AWS Glue support for SAP OData - AWS Glue

answered a month ago
EXPERT
reviewed a month ago
