How can I use a Lambda function to automatically start an AWS Glue job when a crawler run completes?

I want to use an AWS Lambda function to automatically start an AWS Glue job when a crawler run completes.

Short description

To start a job when a crawler run completes, create an AWS Lambda function and an Amazon EventBridge rule.

Note: You can also use AWS Glue workflows to automatically start a job when a crawler run completes. For more information, see How do I use AWS Glue workflows to automatically start a job when a crawler run completes?

Resolution

Prerequisites:

You must have the following:

  • AWS Glue extract, transform, and load (ETL) job
  • AWS Glue crawler
  • AWS Identity and Access Management (IAM) role for Lambda with permission to run AWS Glue jobs. For example, set up an execution role for Lambda that has the AWSGlueServiceRole policy attached to the role.
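If the AWSGlueServiceRole managed policy is broader than you need, a narrower policy is one alternative. The following is a minimal sketch, not the setup this article uses: it assumes the function only needs to start one job and write its own logs. The build_policy helper, the Region, and the account ID are hypothetical placeholders.

```python
import json


def build_policy(region, account_id, job_name):
    """Build an IAM policy document that allows starting one AWS Glue job.

    Assumption: the function needs only glue:StartJobRun on a single job,
    plus the CloudWatch Logs actions that Lambda uses to write logs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glue:StartJobRun",
                "Resource": f"arn:aws:glue:{region}:{account_id}:job/{job_name}",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


# Trust policy that lets the Lambda service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    # Placeholder Region and account ID; replace with your own values.
    print(json.dumps(build_policy("us-east-1", "111122223333", "MyTestJob"), indent=2))
    print(json.dumps(TRUST_POLICY, indent=2))
```

You can paste the printed JSON into the IAM console when you create the role.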

Create the Lambda function

Complete the following steps:

  1. Open the Lambda console.
  2. Choose Create function.
  3. On the Create function page, enter the following information:
    Choose Author from scratch.
    For Function name, enter a name for your function.
    For Runtime, choose a Python option.
    For Architecture, use the default option, x86_64.
    Under Change default execution role, for Execution role, choose Use an existing role.
    For Existing role, select the IAM role that has permissions to run AWS Glue jobs.
  4. Choose Create function.
  5. On the function details page, under Code source, enter the following code:
    # Set up logging
    import json
    import os
    import logging
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    # Import Boto 3 for AWS Glue
    import boto3
    client = boto3.client('glue')
    
    # Variables for the job: 
    glueJobName = "MyTestJob"
    
    # Define Lambda function
    def lambda_handler(event, context):
        logger.info('## INITIATED BY EVENT: ')
        # Use .get() so that a console test event without a 'detail' key doesn't raise KeyError
        logger.info(event.get('detail'))
        response = client.start_job_run(JobName = glueJobName)
        logger.info('## STARTED GLUE JOB: ' + glueJobName)
        logger.info('## GLUE JOB RUN ID: ' + response['JobRunId'])
        return response
    Note: Replace MyTestJob with the name of your AWS Glue ETL job.
  6. Under Code source, choose Test.
  7. Open the AWS Glue console, and then confirm that the job started.

The test invocation starts the job directly. To confirm, check the job runs list of your AWS Glue ETL job. The Run status displays Starting or Running.
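To check the handler logic before you deploy it, you can run it locally against a stand-in client. The following is a minimal sketch that makes no AWS calls. The FakeGlueClient class, the make_handler helper, and the run ID value are hypothetical and exist only for illustration. The sample event includes only the fields that this article's event pattern matches on.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

GLUE_JOB_NAME = "MyTestJob"  # same placeholder job name as in the article


class FakeGlueClient:
    """Hypothetical stand-in for boto3.client('glue'); records calls only."""

    def __init__(self):
        self.started = []

    def start_job_run(self, JobName):
        self.started.append(JobName)
        return {"JobRunId": "jr_example"}  # made-up run ID for illustration


def make_handler(client, job_name=GLUE_JOB_NAME):
    """Return a handler that mirrors the article's lambda_handler logic."""

    def lambda_handler(event, context):
        logger.info("## INITIATED BY EVENT: %s", event.get("detail"))
        response = client.start_job_run(JobName=job_name)
        logger.info("## STARTED GLUE JOB: %s", job_name)
        logger.info("## GLUE JOB RUN ID: %s", response["JobRunId"])
        return response

    return lambda_handler


# Sample event with the fields that the EventBridge pattern matches on.
SAMPLE_EVENT = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "MyTestCrawl", "state": "Succeeded"},
}

if __name__ == "__main__":
    fake = FakeGlueClient()
    handler = make_handler(fake)
    print(handler(SAMPLE_EVENT, None))
    print(fake.started)
```

Passing the client in as a parameter is a design choice for this sketch only. It lets the same handler logic run with the real AWS Glue client in Lambda and with a fake client on your machine.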

Create the EventBridge rule

Complete the following steps:

  1. Open the EventBridge console.
  2. In the navigation pane, choose Rules, and then choose Create rule.
  3. Enter a name and description for the rule, and then select Next.
  4. Use the default values for Event source and Sample event.
  5. Under the Event pattern section, select Custom Patterns (JSON editor).
  6. Enter the following code in the Event pattern field:
    {
        "detail-type": [
            "Glue Crawler State Change"
        ],
        "source": [
            "aws.glue"
        ],
        "detail": {
            "crawlerName": [
                "MyTestCrawl"
            ],
            "state": [
                "Succeeded"
            ]
        }
    }
    Note: Replace MyTestCrawl with the name of your AWS Glue crawler.
  7. For Select targets, enter the following information:
    For Target, select Lambda function.
    For Function, select your Lambda function.
  8. Choose Create.
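The event pattern above matches an event only when every listed field is present and its value is one of the listed values. The following is a simplified sketch of that exact-match rule, useful for reasoning about which crawler events start the job. It is not the EventBridge implementation and ignores advanced operators such as prefix or numeric matching. The matches function and the sample events are hypothetical.

```python
EVENT_PATTERN = {
    "detail-type": ["Glue Crawler State Change"],
    "source": ["aws.glue"],
    "detail": {
        "crawlerName": ["MyTestCrawl"],
        "state": ["Succeeded"],
    },
}


def matches(pattern, event):
    """Simplified exact-value matching.

    Each pattern key must exist in the event. A list in the pattern means
    "the event value is one of these". A dict means "match the nested fields".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def crawler_event(name, state):
    """Build a trimmed-down crawler state change event for illustration."""
    return {
        "source": "aws.glue",
        "detail-type": "Glue Crawler State Change",
        "detail": {"crawlerName": name, "state": state},
    }


if __name__ == "__main__":
    print(matches(EVENT_PATTERN, crawler_event("MyTestCrawl", "Succeeded")))
    print(matches(EVENT_PATTERN, crawler_event("MyTestCrawl", "Failed")))
    print(matches(EVENT_PATTERN, crawler_event("OtherCrawl", "Succeeded")))
```

Under this rule, only a successful run of the named crawler matches. Failed runs and runs of other crawlers don't start the job.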

To test the Lambda function and EventBridge rule together, run your AWS Glue crawler. When the crawler run succeeds, check the job runs list of your AWS Glue ETL job. The Run status displays Starting or Running.
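You can also script the console steps for the rule. The following is a minimal sketch that assumes you use Boto3: it creates the rule, grants EventBridge permission to invoke the function, and adds the function as a target. The console grants that permission for you, but the API requires an explicit add_permission call. The rule name, statement ID, target ID, and the create_rule helper are hypothetical.

```python
import json

RULE_NAME = "start-glue-job-after-crawler"  # hypothetical rule name


def build_event_pattern(crawler_name):
    """Return the same event pattern that the console steps use."""
    return {
        "detail-type": ["Glue Crawler State Change"],
        "source": ["aws.glue"],
        "detail": {"crawlerName": [crawler_name], "state": ["Succeeded"]},
    }


def create_rule(events_client, lambda_client, function_arn, crawler_name,
                rule_name=RULE_NAME):
    """Create the rule, allow EventBridge to invoke the function, add the target."""
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(crawler_name)),
        State="ENABLED",
    )
    # Resource-based permission so that this rule can invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "start-glue-job", "Arn": function_arn}],
    )
    return rule["RuleArn"]


# Usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   create_rule(boto3.client("events"), boto3.client("lambda"),
#               "<ARN of your Lambda function>", "MyTestCrawl")
```

Running add_permission twice with the same statement ID fails, so treat this as a one-time setup sketch rather than an idempotent deployment script.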

Related information

Creating rules that react to events in Amazon EventBridge

AWS OFFICIAL | Updated 9 months ago