How to retrieve the job run ID from the script itself in a Python shell Glue job


I need to print my Python shell Glue job's run ID from the script itself into CloudWatch Logs, but I haven't found a way to get the current job run ID. I found a partial approach using boto3, but it returns all of the job's run IDs. The suggested way to pick out the current run is to filter on job run status, but that only works when there is one run at a time, and in my scenario I expect concurrent runs. Can you help me find the current job run ID?

asked 2 years ago, 5331 views
3 Answers

You can use this in your AWS Glue scripts. See if this helps:

import sys 
from awsglue.utils import getResolvedOptions 
args = getResolvedOptions(sys.argv, ['JOB_NAME']) 
job_run_id = args['JOB_RUN_ID']
answered 2 years ago
  • I have tried this, but the job throws an error (KeyError: JOB_RUN_ID). From your message, it seems the job is trying to retrieve a job parameter that was never passed. Without passing the job run ID in the job parameters, how can we retrieve it from the script?


This function should help you get the job_run_id. Call it right at the beginning of your Python job:

import boto3
import botocore.exceptions


def get_running_job_id(job_name):
    """Return the Id of the first run of job_name in the RUNNING state, or None."""
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        response = glue_client.get_job_runs(JobName=job_name)
        for res in response['JobRuns']:
            print("Job Run id is:" + res.get("Id"))
            print("status is:" + res.get("JobRunState"))
            if res.get("JobRunState") == "RUNNING":
                return res.get("Id")
        # No run in the RUNNING state was found in the returned results.
        return None
    except botocore.exceptions.ClientError as e:
        raise Exception("boto3 client error in get_running_job_id: " + str(e))
    except Exception as e:
        raise Exception("Unexpected error in get_running_job_id: " + str(e))
answered a year ago

Apparently the

getResolvedOptions(sys.argv, ["JOB_NAME","JOB_RUN_ID"]) 

works only for PySpark jobs. You can confirm this by running a Python shell job and a PySpark job, calling print(sys.argv) in each, and comparing the full lists of arguments.
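To make that comparison easier, the argument list can be turned into a dictionary. The following is only a sketch: parse_job_args is a hypothetical helper (not part of awsglue), and it assumes the arguments arrive as `--KEY value` pairs, which you should confirm against your own print(sys.argv) output.

```python
def parse_job_args(argv):
    """Collect `--KEY value` pairs from an argument list into a dict.

    Hypothetical helper; assumes each option is followed by its value.
    An option with no following value is stored as None.
    """
    parsed = {}
    i = 1  # skip argv[0], the script path
    while i < len(argv):
        token = argv[i]
        if token.startswith("--"):
            key = token[2:]
            # A value is present only if the next item is not another option.
            has_value = i + 1 < len(argv) and not argv[i + 1].startswith("--")
            parsed[key] = argv[i + 1] if has_value else None
            i += 2 if has_value else 1
        else:
            i += 1
    return parsed
```

Calling parse_job_args(sys.argv).get("JOB_RUN_ID") then yields None instead of raising a KeyError when the runtime did not pass that argument.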

For the job run ID:

import boto3
glue_client = boto3.client("glue")
response = glue_client.get_job_runs(JobName="<your job name>")
job_run_id = response["JobRuns"][0]["Id"]

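Use this code as early as possible within the Python shell job to get the job run ID of the most recent execution. With concurrent runs, a newer run may already have started by the time this call is made, so the first entry is not guaranteed to be the current run.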
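For concurrent runs, one possible workaround is to have whoever starts the job pass a unique value as a job argument and then look for that value in the Arguments field of each entry in response["JobRuns"]. This is a sketch under assumptions, not something confirmed in this thread: the argument name --run_token and the helper find_run_id_by_token are hypothetical, and it assumes each run's Arguments map contains the arguments that run was started with.

```python
def find_run_id_by_token(job_runs, token, arg_name="--run_token"):
    """Return the Id of the run whose Arguments carry the given token, else None.

    job_runs is a list of dicts shaped like response["JobRuns"].
    arg_name is a hypothetical job argument chosen by the caller.
    """
    for run in job_runs:
        # Runs started without any arguments may lack the "Arguments" key.
        if run.get("Arguments", {}).get(arg_name) == token:
            return run.get("Id")
    return None
```

Inside the job, the script would read its own token from its arguments and call find_run_id_by_token(response["JobRuns"], token). If the job has many runs, the results may span several pages, so the lookup may need to be repeated per page.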

For the job name: there is a programmatic way to derive the job name, explained below.

In a scenario where the job name and the Python script name are the same, we can read the first element of sys.argv and then use the following:

job_name = sys.argv[0].split('/')[-1]

This returns the script file name. Strip the ".py" suffix if you need only the name part.
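The split and the suffix removal can be combined with the standard library. This is a minimal sketch that still assumes the job name matches the script file name; job_name_from_script_path is a hypothetical helper name.

```python
import os


def job_name_from_script_path(script_path):
    """Return the file name of script_path without its directory or extension."""
    base = os.path.basename(script_path)   # drop the directory part
    name, _ext = os.path.splitext(base)    # drop a trailing extension such as ".py"
    return name
```

In the job itself this would be called as job_name_from_script_path(sys.argv[0]).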

answered a year ago
