How can I trigger an AWS Glue job in one AWS account based on the status of an AWS Glue job in another account?


I want to create a pipeline where the completion of an AWS Glue job in one AWS account starts an AWS Glue job in another account.

Short description

You can create AWS Glue Data Catalog objects called triggers. Triggers can either manually or automatically start one or more crawlers or ETL jobs, but this feature can be used only within one AWS account. You can't use these triggers to start crawlers or ETL jobs residing in another AWS account. To trigger an AWS Glue job in one AWS account based on the status of a job in another account, use Amazon EventBridge and AWS Lambda.

Resolution

The following example gives an overview of how you can use EventBridge and a Lambda function to achieve your use case. Let’s assume you have two AWS Glue jobs, where Job 1 runs in AWS Account A, and Job 2 runs in AWS Account B. Job 2 is dependent on Job 1.

  1. Create a custom event bus in AWS Account B and an EventBridge rule in AWS Account A. The EventBridge rule in Account A watches for the AWS Glue Job 1 in the SUCCEEDED state. Then, the target is the event bus created in AWS Account B.
  2. Create a Lambda function in AWS Account B that triggers AWS Glue ETL Job 2.
  3. Create an EventBridge rule in Account B on the custom event bus that you created in step 1. The rule watches for AWS Glue Job 1 in the SUCCEEDED state and uses the Lambda function from step 2 as its target. When the event arrives, the function starts AWS Glue ETL Job 2 through AWS Glue API calls.

For more information, see the list of Amazon CloudWatch Events generated by AWS Glue that can be used in EventBridge rules.

Create a custom event bus in Account B

1.    In Account B, open EventBridge. Choose Event buses, and then choose Create event bus. Add this resource-based policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow_account_to_put_events",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<Account-A ID>"
      },
      "Action": "events:PutEvents",
      "Resource": "arn:aws:events:<Account-B Region>:<Account-B ID>:event-bus/<Account-B CustomEventBus Name>"
    }
  ]
}

Note: Be sure to replace the example items in <> with your own details. For example, replace <Account-B CustomEventBus Name> with the name of the event bus that you created in Account B.

2.    After creating the event bus, note its ARN.

3.    Choose the custom event bus that you created, and then choose Actions.

4.    Choose Start Discovery.
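
If you prefer to script this step instead of using the console, the following is a minimal sketch with Boto 3, assuming that it runs with credentials for Account B. The function names build_bus_policy and create_bus_with_policy are illustrative and not part of the console procedure. The policy matches the resource-based policy shown earlier.

```python
import json


def build_bus_policy(account_a_id, region, account_b_id, bus_name):
    """Return the resource-based policy from this section as a dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "allow_account_to_put_events",
                "Effect": "Allow",
                "Principal": {"AWS": account_a_id},
                "Action": "events:PutEvents",
                "Resource": (
                    f"arn:aws:events:{region}:{account_b_id}:event-bus/{bus_name}"
                ),
            }
        ],
    }


def create_bus_with_policy(events_client, account_a_id, region, account_b_id, bus_name):
    """Create the custom event bus, attach the policy, and return the bus ARN.

    events_client is assumed to be boto3.client("events") for Account B.
    """
    bus_arn = events_client.create_event_bus(Name=bus_name)["EventBusArn"]
    policy = build_bus_policy(account_a_id, region, account_b_id, bus_name)
    # put_permission accepts the full policy document as a JSON string.
    events_client.put_permission(EventBusName=bus_name, Policy=json.dumps(policy))
    return bus_arn
```

To use the sketch, pass boto3.client("events") as events_client together with your own account IDs, Region, and bus name. The returned ARN is the value that you note in step 2.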

Create the event rule in Account A for Job 1

1.    From Account A, open the EventBridge console.

2.    Choose Rules, and then for Event bus, choose default.

3.    Choose Create rule. Add a Name, and then for Rule type choose Rule with an event pattern.

4.    On the Build event pattern page, under Creation method, choose Custom pattern, and then add this JSON:

{
  "source": ["aws.glue"],
  "detail-type": ["Glue Job State Change"],
  "detail": {
    "jobName": ["<Job 1 name>"],
    "severity": ["INFO"],
    "state": ["SUCCEEDED"]
  }
}

Note: Be sure to replace <Job 1 name> with the name of the AWS Glue job that you're using in Account A.

5.    For Target types, choose EventBridge event bus, and then choose Event bus in another AWS account or Region.

6.    Enter the ARN of the event bus that you created previously in Account B. This ARN is used as the target.

7.    For Execution role, choose Create a new role for this specific resource. If you choose Use existing role instead, then make sure that your AWS Identity and Access Management (IAM) policy has the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "events:PutEvents"
      ],
      "Resource": [
        "arn:aws:events:<Account-B Region>:<Account-B ID>:event-bus/<Account-B CustomEventBus Name>"
      ]
    }
  ]
}

Note: Be sure to replace the <Account-B CustomEventBus Name> in this example with the name of the event bus that you created in Account B.

8.    Choose Next, review your settings, and then choose Create.
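
The same rule can be sketched with Boto 3, assuming credentials for Account A and an existing IAM role that allows events:PutEvents on the Account B event bus. The function names, the target ID "account-b-bus", and the parameter names are illustrative assumptions.

```python
import json


def build_job_succeeded_pattern(job_name):
    """Return the event pattern from step 4 for the given AWS Glue job name."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": [job_name],
            "severity": ["INFO"],
            "state": ["SUCCEEDED"],
        },
    }


def create_forwarding_rule(events_client, rule_name, job_name, target_bus_arn, role_arn):
    """Create a rule on the default bus that forwards matching events to Account B.

    events_client is assumed to be boto3.client("events") for Account A.
    """
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventBusName="default",
        EventPattern=json.dumps(build_job_succeeded_pattern(job_name)),
        State="ENABLED",
    )["RuleArn"]
    events_client.put_targets(
        Rule=rule_name,
        EventBusName="default",
        # A cross-account event bus target needs a role that EventBridge can assume.
        Targets=[{"Id": "account-b-bus", "Arn": target_bus_arn, "RoleArn": role_arn}],
    )
    return rule_arn
```

Here, target_bus_arn is the ARN that you noted when you created the event bus in Account B, and role_arn is the execution role described in step 7.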

Create a Lambda function in Account B with a target that starts AWS Glue Job 2

1.    Open the Lambda console.

2.    Choose Functions, and then choose Create function.

3.    Enter a function name, and for Runtime choose Python 3.x version.

4.    Under Change default execution role, choose Create a new role with basic Lambda permissions.

5.    If you are using an existing role, make sure that it has the required permissions. If it doesn't, add these permissions using the IAM console. First, add the AWSGlueServiceRole AWS managed policy. Then, give Lambda the IAM permissions to run based on events:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "arn:aws:logs:<Account-B Region>:<Account-B ID>:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:<Account-B Region>:<Account-B ID>:log-group:/aws/lambda/<Lambda Function Name>:*"
      ]
    }
  ]
}

Note: Be sure to replace the example items in <> with your own details. For example, replace <Account-B ID> with the account ID for Account B.

6.    Choose Create function.

7.    In the code section of the function that you created, add this code:

# Set up logging
import json
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Import Boto 3 for AWS Glue
import boto3
client = boto3.client('glue')

# Variables for the job:
glueJobName = "<Job 2 Name>"

# Define Lambda function
def lambda_handler(event, context):
    # Log the incoming EventBridge event for troubleshooting
    logger.info('## INITIATED BY EVENT: ' + json.dumps(event))
    # Start Job 2 and log the run ID that AWS Glue returns
    response = client.start_job_run(JobName = glueJobName)
    logger.info('## STARTED GLUE JOB: ' + glueJobName)
    logger.info('## GLUE JOB RUN ID: ' + response['JobRunId'])
    return response

Note: Be sure to replace <Job 2 Name> with the name of the AWS Glue job that you're using in Account B.

8.    Choose Deploy. You can now test the function to check that it triggers Job 2 in Account B.
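
To test the function from the Lambda console, you need a test event. The following sketch builds one shaped like a Glue Job State Change event and shows an optional guard that a handler could call before it starts Job 2. The helper names sample_glue_event and should_start_job are hypothetical, and the account ID, Region, timestamp, and run ID are made-up placeholders.

```python
import json


def sample_glue_event(job_name, state="SUCCEEDED"):
    """Build a test event shaped like a Glue Job State Change event.

    All identifiers below are placeholders, not values from a real job run.
    """
    return {
        "version": "0",
        "id": "00000000-0000-0000-0000-000000000000",
        "detail-type": "Glue Job State Change",
        "source": "aws.glue",
        "account": "111111111111",
        "time": "2024-01-01T00:00:00Z",
        "region": "us-east-1",
        "resources": [],
        "detail": {
            "jobName": job_name,
            "severity": "INFO",
            "state": state,
            "jobRunId": "jr_0000000000000000",
            "message": "Placeholder message",
        },
    }


def should_start_job(event, upstream_job_name):
    """Return True only for a SUCCEEDED event from the expected upstream job."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.glue"
        and detail.get("jobName") == upstream_job_name
        and detail.get("state") == "SUCCEEDED"
    )


if __name__ == "__main__":
    # Print JSON that can be pasted into a Lambda console test event.
    print(json.dumps(sample_glue_event("<Job 1 name>"), indent=2))
```

The guard is a design choice rather than a requirement: the event rule already filters on job name and state, so the check only adds a second line of defense if the function is ever invoked by another source.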

Create an event rule in Account B for Job 1

1.    From Account B, open the EventBridge console.

2.    Choose Rules, and then choose the event bus that you previously created.

3.    Create a new rule under the event bus. Enter a Rule name, and for Rule type, choose Rule with an event pattern.

4.    On the Build event pattern page, under creation method, choose Custom pattern, and then add this JSON:

{
  "source": ["aws.glue"],
  "detail-type": ["Glue Job State Change"],
  "detail": {
    "jobName": ["<Job 1 name>"],
    "severity": ["INFO"],
    "state": ["SUCCEEDED"]
  }
}

Note: Be sure to replace <Job 1 name> with the name of the AWS Glue job that you're using in Account A.

5.    On the Select Targets page, for Target types, choose AWS service.

6.    For Select target, choose or type Lambda function, and then choose the function that you previously created from the dropdown list.

7.    Choose Next, review your settings, and then choose Create.
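
As a scripted alternative, the following sketch creates the rule on the custom event bus and attaches the Lambda function with Boto 3, assuming credentials for Account B. Unlike the console, the API does not add the Lambda resource-based permission for you, so the sketch calls add_permission explicitly. The function name, statement ID, and target ID are illustrative assumptions.

```python
import json


def create_lambda_rule(events_client, lambda_client, bus_name, rule_name,
                       job_name, function_arn):
    """Create a rule on the custom bus that invokes the Lambda function.

    events_client and lambda_client are assumed to be boto3.client("events")
    and boto3.client("lambda") for Account B.
    """
    pattern = {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": [job_name],
            "severity": ["INFO"],
            "state": ["SUCCEEDED"],
        },
    }
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )["RuleArn"]
    # Allow EventBridge to invoke the function, scoped to this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowEventBridgeInvoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events_client.put_targets(
        Rule=rule_name,
        EventBusName=bus_name,
        Targets=[{"Id": "start-job-2", "Arn": function_arn}],
    )
    return rule_arn
```

Note that job_name here is the name of Job 1 in Account A, because the forwarded event still describes the upstream job.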

Test your cross-account AWS Glue job trigger

1.    Run Job 1 in Account A. When the job completes, AWS Glue sends a Glue Job State Change event with the SUCCEEDED state to the default event bus in Account A.

2.    Account A sends the event information to the event bus in Account B.

3.    The event bus in Account B runs the event rule. This event rule triggers the Lambda function in Account B. To check Lambda logs, open the Amazon CloudWatch console, choose Log groups, and then choose your Lambda function group. The function group is in the format /aws/lambda/<LambdaFunctionName>.

4.    The Lambda function triggers Job 2 in Account B.
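
To confirm the last step without opening the AWS Glue console, you can look up the most recent run of Job 2. The following is a minimal sketch, assuming credentials for Account B. The helper name latest_job_run and the shape of its return value are illustrative.

```python
def latest_job_run(glue_client, job_name):
    """Return the ID, state, and start time of the newest run, or None.

    glue_client is assumed to be boto3.client("glue") for Account B.
    """
    runs = glue_client.get_job_runs(JobName=job_name, MaxResults=10)["JobRuns"]
    if not runs:
        return None
    # Pick the newest run explicitly instead of relying on response order.
    newest = max(runs, key=lambda run: run["StartedOn"])
    return {
        "id": newest["Id"],
        "state": newest["JobRunState"],
        "started_on": newest["StartedOn"],
    }
```

If the returned run ID matches the GLUE JOB RUN ID line in the Lambda function's CloudWatch logs, then the cross-account trigger worked end to end.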


Related information

Sending and receiving Amazon EventBridge events between AWS accounts

AWS OFFICIAL
Updated a year ago