Pass a parameter from one Glue job to another in a Step Function?


I am looking for the best way to pass a parameter from one Glue job to another within a Step Function.

Each day, I will receive a file containing data for certain dates. The first job will normalize the data in the file and write it to an Iceberg table. The second job will read the Iceberg table and write one file for each date that appeared in the original file.

If I only got one type of file, I could do all of this in one job. However, I will get files with similar data, but different schemas, which will necessitate multiple copies of the first job.

Is there a recommended way to pass that date information between the two jobs? I don't think this was possible with Glue jobs in the past, but I know that there have been a lot of recent improvements.

1 Answer

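When you use a Step Function to orchestrate AWS Glue jobs, the output of the state that runs one job can be used as input to the state that runs the next. Keep in mind that a Glue job run does not return custom values of its own, so the date information has to travel through a location both jobs can reach. Here's how you can achieve it: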

  • AWS Glue Job 1: This job normalizes the data in the file and writes it to an Iceberg table. Modify this job to publish the date information that you want to pass to the next job. Because the job cannot return a value directly, have it write the dates to a location the second job can access, such as an object in an S3 bucket whose path is supplied to the job as a job parameter.
  • Step Function State Output: The Step Function state that runs the first AWS Glue job should keep whatever the next state needs in its output. This can be done with the state's output settings in the Step Function definition, such as ResultPath and OutputPath. The task result describes the job run rather than anything the job computed, so what you carry forward is typically the location of the stored date information.
  • Pass the Output to the Next State: You can then pass this output to the next state in the Step Function. The state that invokes the second Glue job should map the relevant field from its input into that job's Arguments.
  • AWS Glue Job 2: The second Glue job reads the job argument supplied by its state. This argument points to the date information you need. The job can then load the dates, read the Iceberg table, and write one file per date.
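
The steps above could be wired together roughly as follows. This is only a sketch, written in Python so the state machine definition can be generated and inspected. The job names, the bucket, the `handoff/` prefix, and the `--dates_uri` argument are all hypothetical. Instead of relying on anything the first job returns, both states compute the same S3 location from the execution name, so the first job knows where to write the dates and the second knows where to read them.

```python
import json


def build_definition(normalize_job, export_job, bucket):
    """Return a Step Functions definition (as a dict) that runs two Glue jobs in order.

    All names are illustrative assumptions. Both jobs receive the same
    --dates_uri argument, built from the execution name, as the handoff location.
    """
    # Intrinsic function evaluated by Step Functions at run time;
    # the {} placeholder is filled with the execution name.
    dates_uri = f"States.Format('s3://{bucket}/handoff/{{}}/dates.json', $$.Execution.Name)"

    def glue_task(job_name):
        return {
            "Type": "Task",
            # .sync makes the state wait until the Glue job run finishes
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {
                "JobName": job_name,
                "Arguments": {"--dates_uri.$": dates_uri},
            },
        }

    normalize = glue_task(normalize_job)
    normalize["ResultPath"] = "$.normalize"  # keep original input, attach run metadata
    normalize["Next"] = "ExportPerDate"

    export = glue_task(export_job)
    export["End"] = True

    return {
        "StartAt": "NormalizeFile",
        "States": {"NormalizeFile": normalize, "ExportPerDate": export},
    }


if __name__ == "__main__":
    definition = build_definition("normalize-schema-a", "export-per-date", "my-handoff-bucket")
    print(json.dumps(definition, indent=2))
```

Inside each job, the argument would typically be read with `getResolvedOptions(sys.argv, ["dates_uri"])` from `awsglue.utils`. Since only the first job differs per schema, the same definition could be reused by changing the first job's name.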

If this has resolved your issue or was helpful, accepting the answer would be greatly appreciated. Thank you!

answered 3 months ago
  • I don't think Glue job runs return any output details.

  • Mina, we were hoping to be able to write from the first Glue job to a parameter, but I haven't found that as an option. Do you know how it can be done? If not, then writing to another location is our best option.
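
On writing to a parameter from inside the job: one possibility, offered as a sketch rather than a confirmed recommendation, is to call Systems Manager Parameter Store from the job script with boto3. The helper names and the parameter name below are hypothetical, and the job's IAM role would need `ssm:PutParameter` and `ssm:GetParameter` permissions. The client is passed in so the functions do not depend on how it is created.

```python
import json


def publish_dates(ssm_client, parameter_name, dates):
    """First job: store the dates found in the file as a JSON list in Parameter Store."""
    ssm_client.put_parameter(
        Name=parameter_name,
        Value=json.dumps(sorted(dates)),  # ISO date strings sort chronologically
        Type="String",
        Overwrite=True,  # replace the value left by a previous run
    )


def load_dates(ssm_client, parameter_name):
    """Second job: read the JSON list back."""
    response = ssm_client.get_parameter(Name=parameter_name)
    return json.loads(response["Parameter"]["Value"])


# In a Glue job (assumed usage, parameter name is hypothetical):
#   import boto3
#   ssm = boto3.client("ssm")
#   publish_dates(ssm, "/daily-file/dates", {"2024-05-01", "2024-05-02"})
```

Standard parameters are limited to 4 KB, so a long list of dates may fit better in an S3 object.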
