Send email from an AWS Glue job


Hi team,

I have an AWS Glue job that reads data from S3 and loads it into RDS MySQL.

I'm trying to send a notification email at the end of the job (from within the PySpark Glue job itself) using the example at the following link:

https://docs.aws.amazon.com/ses/latest/dg/send-an-email-using-sdk-programmatically.html

I also added the following permissions to the role attached to my Glue job:

 "ses:SendEmail",  
 "ses:SendRawEmail"

When I run my job, it runs for a very long time and doesn't do anything (no failure, no success; it keeps running and nothing happens),

so I need to stop it each time.

This is the sample code I'm using to send an email from the Glue script:

import boto3
from botocore.exceptions import ClientError

# Replace test@test.com with your "From" address.
# This address must be verified with Amazon SES.
SENDER = "test@test.com"

# Replace test.test@test.com with a "To" address. If your account
# is still in the sandbox, this address must be verified.
RECIPIENT = "test.test@test.com"

# If necessary, replace us-east-1 with the AWS Region you're using for Amazon SES.
AWS_REGION = "us-east-1"

# The subject line for the email.
SUBJECT = "Amazon SES Test (SDK for Python)"

# The email body for recipients with non-HTML email clients.
BODY_TEXT = ("Amazon SES Test (Python)\r\n"
             "This email was sent with Amazon SES using the "
             "AWS SDK for Python (Boto)."
            )
            
# The HTML body of the email.
BODY_HTML = """<html>
<head></head>
<body>
  <h1>Amazon SES Test (SDK for Python)</h1>
  <p>This email was sent with
    <a href='https://aws.amazon.com/ses/'>Amazon SES</a> using the
    <a href='https://aws.amazon.com/sdk-for-python/'>
      AWS SDK for Python (Boto)</a>.</p>
</body>
</html>
            """            

# The character encoding for the email.
CHARSET = "UTF-8"

# Create a new SES client and specify a region.
client = boto3.client('ses', region_name=AWS_REGION)

# Try to send the email.
try:
    #Provide the contents of the email.
    response = client.send_email(
        Destination={
            'ToAddresses': [
                RECIPIENT,
            ],
        },
        Message={
            'Body': {
                'Html': {
                    'Charset': CHARSET,
                    'Data': BODY_HTML,
                },
                'Text': {
                    'Charset': CHARSET,
                    'Data': BODY_TEXT,
                },
            },
            'Subject': {
                'Charset': CHARSET,
                'Data': SUBJECT,
            },
        },
        Source=SENDER,
    )
# Display an error if something goes wrong.
except ClientError as e:
    print(e.response['Error']['Message'])
else:
    print("Email sent! Message ID:", response['MessageId'])
    

Is there anything wrong that makes the Glue job get stuck?

Thanks for your help.

3 Answers
Accepted Answer

Hi, based on your troubleshooting, the issue is that Glue needs to connect to the SES endpoint, which by default is reached over the internet. To avoid having to open the outbound rule widely, or to find out the SES IP range in order to restrict the rule, you could look at creating a VPC endpoint for SES.
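
While the network path is being sorted out, it may also help to make the SES call fail quickly instead of appearing to hang, since the default timeouts and retries can keep an unreachable endpoint waiting for a long time. The following is a minimal sketch, assuming boto3 is available in the Glue job; the helper names (fail_fast_settings, send_email_fail_fast) and the timeout values are illustrative assumptions, not recommended settings.

```python
def fail_fast_settings(connect_timeout=5, read_timeout=10, max_attempts=2):
    """Return keyword arguments for botocore.config.Config (illustrative values)."""
    return {
        "connect_timeout": connect_timeout,
        "read_timeout": read_timeout,
        "retries": {"max_attempts": max_attempts},
    }


def send_email_fail_fast(request, region="us-east-1"):
    """Send one email via SES; return the message ID, or None if the call fails.

    `request` is the dict of keyword arguments for send_email
    (Source, Destination, Message), as in the question's script.
    """
    # Imported here so the helper above can be used without boto3 installed.
    import boto3
    from botocore.config import Config
    from botocore.exceptions import BotoCoreError, ClientError

    client = boto3.client(
        "ses", region_name=region, config=Config(**fail_fast_settings())
    )
    try:
        response = client.send_email(**request)
    except (BotoCoreError, ClientError) as exc:
        # Connection problems surface as BotoCoreError subclasses;
        # SES-side rejections (e.g. an unverified address) as ClientError.
        print(f"Email not sent: {exc}")
        return None
    return response["MessageId"]
```

With settings like these, a blocked endpoint should show up as an error message in the job logs after a short wait rather than as a run that never finishes.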

The suggestion to split the job remains valid: you might want to exit the job with an error code if the stored procedure fails, and then send the notification in a subsequent job. In this case you would not need to set up outbound rules or VPC endpoints for SES, because the second (Python shell) job would not be linked to the VPC and could reach SES out of the box.
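
As a rough sketch of the "exit with an error" part: an unhandled exception in a Glue script ends the run as failed, so the stored procedure call can simply raise. The example assumes a DB-API style cursor that supports callproc (for example from a MySQL driver); StoredProcedureError and call_stored_procedure are hypothetical names used only for illustration.

```python
class StoredProcedureError(RuntimeError):
    """Raised when the stored procedure call does not complete."""


def call_stored_procedure(cursor, name, args=()):
    """Call a stored procedure through a DB-API cursor; raise on failure."""
    try:
        cursor.callproc(name, args)
    except Exception as exc:
        # Re-raise with a clear message. If nothing catches this in the
        # job script, the exception propagates and the run fails.
        raise StoredProcedureError(
            f"Stored procedure {name} failed: {exc}"
        ) from exc
```

The failed run can then be picked up by whatever sends the notification, without the ETL job itself needing a network path to SES.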

Hope this helps.

AWS
EXPERT
answered 2 years ago
EXPERT
reviewed 2 months ago
  • If I fail the job when the SP fails, I can use the solution above to send an email (EventBridge + Lambda). But this means that every time the SP fails I need to re-run the whole job from scratch, even if only the SP failed and the rest of the job succeeded. Also, you mention that the second (Python shell) job would not be linked to the VPC and could reach SES out of the box, so where would this second job run from? Outside the VPC?


The code looks OK. Have you completed the prerequisites, such as verifying the email address? You could also try sending an email directly from the console. https://docs.aws.amazon.com/ses/latest/dg/event-publishing-cloudwatch-tutorial-prerequisites.html

AWS
answered 2 years ago
  • Thanks for your answer. In fact, I found the issue: I added an outbound rule to the SG of my RDS instance (into which Glue injects its workers?) to allow all traffic, and the email was sent.

    I added the outbound rule (all traffic to 0.0.0.0/0) only as a test, to make sure that the issue comes from the SG.

    Now, what would be the best rule to add to my SG to allow only the necessary outbound traffic to SES?

    Thank you!!


Hello,

Re: "When I run my job, it runs for a very long time and doesn't do anything (no failure, no success; it keeps running and nothing happens)": this could happen for any number of reasons. Could you please check the CloudWatch logs (log group /aws-glue/jobs/error or /aws-glue/python-jobs/error, depending on the job type) for the Glue ETL job run to see whether any error is reported or whether the job is stuck at a certain step?
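
If the logs show nothing, a quick way to check whether the job can reach the SES endpoint at all is a plain TCP probe with a short timeout, run from inside the job. This is only a diagnostic sketch using the standard library; can_reach is a hypothetical helper name, and the SES API host mentioned below (email.us-east-1.amazonaws.com on port 443) is my understanding of the regional endpoint and should be adjusted to your Region.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False
```

For example, printing can_reach("email.us-east-1.amazonaws.com") near the start of the script should write True or False to the job logs within a few seconds, which helps separate a network issue from a problem in the email code.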

My recommendation here would be to separate the email notification script above from your actual ETL script and use an event-driven architecture instead. You can use AWS Lambda and Amazon EventBridge (CloudWatch Events) to capture the Glue ETL job run status and send the notification.

Please refer to the AWS Glue docs, "Automating AWS Glue with CloudWatch Events", for information on the different types of events emitted by AWS Glue. For your use case, I believe the "detail-type": "Glue Job State Change" event will be appropriate.

  • Create an AWS Lambda function which consumes the event and prepares email content based on your requirements (sample script below).
  • Create an event rule in Amazon EventBridge (based on an event pattern) - Ref. "Creating Amazon EventBridge rules that react to events" - with the following configuration:
    • Define pattern: Event pattern
    • Event matching pattern: Custom Pattern
    • Pattern: Use one of the patterns from samples below.
    • Event Bus: AWS default event bus
    • Target: Lambda Function and select the lambda function created.
  • Save Event rule
  • Run a Glue ETL job to test

Sample Event patterns:

To capture all JobRuns:

{
  "source": ["aws.glue"],
  "detail-type": ["Glue Job State Change"]
}

To capture failed/stopped/timeout JobRuns:

{
  "source": ["aws.glue"],
  "detail-type": ["Glue Job State Change"],
  "detail": {
    "state": ["FAILED", "TIMEOUT", "STOPPED"]
  }
}
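
If you prefer to create the rule from code rather than the console, the steps above could look roughly like the sketch below. It assumes a boto3 EventBridge client (boto3.client("events")) is passed in; the function names, the rule name and the target Id "notify-lambda" are hypothetical, chosen only for this example.

```python
import json


def glue_state_pattern(states=None):
    """Build the event pattern; with no states it matches all job runs."""
    pattern = {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
    }
    if states:
        pattern["detail"] = {"state": list(states)}
    return pattern


def create_rule(events_client, rule_name, lambda_arn, states=None):
    """Create (or update) the rule and point it at the Lambda function."""
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(glue_state_pattern(states)),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notify-lambda", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

Note that when the rule is created through the API instead of the console, the Lambda function also needs a resource-based permission that lets events.amazonaws.com invoke it (for example via the Lambda add_permission call).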

Sample Lambda script to handle the Glue Job State Change event:

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# edit the section below to accommodate values available in specific event type. Samples available at https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types
def lambda_handler(event, context):
    logger.info('Event: %s', event)
    payload_vars = {
        "id": event['id'], 
        "detail-type": event['detail-type'], 
        "account": event['account'], 
        "time": event['time'], 
        "region": event['region'], 
        "jobName": event['detail']['jobName'], 
        "severity": event['detail']['severity'], 
        "state": event['detail']['state'], 
        "jobRunId": event['detail']['jobRunId'], 
        "message": event['detail']['message']
    }
    # Edit the following message as per your requirements. you can build HTML messages instead of plaintext message below.
    message = """Job State Changed for AWS Glue Job  :alert:

    Job Name: {jobName}
    Account ID: {accountId}
    Region: {region}
    JobRun ID: {jobRunId}
    Event ID: {eventId}
    Status: {state}
    Message: {message}
    Timestamp: {timestamp}
    """.format(
        jobName=payload_vars['jobName'],
        accountId=payload_vars['account'],
        region=payload_vars['region'],
        jobRunId=payload_vars['jobRunId'],
        eventId=payload_vars['id'],
        state=payload_vars['state'],
        message=payload_vars['message'],
        timestamp=payload_vars['time'],
    )

    # INSERT SES EMAIL SEND LOGIC HERE
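
One possible way to fill in the SES placeholder at the end of the handler, as a sketch only: the helpers below take an SES client (for example boto3.client("ses")) plus the event and the formatted message. The function names and the subject format are assumptions for illustration; the sender must be verified in SES, and the Lambda role needs ses:SendEmail.

```python
def build_ses_request(sender, recipient, subject, body_text, charset="UTF-8"):
    """Build the keyword arguments for the SES send_email call (plain text only)."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Charset": charset, "Data": subject},
            "Body": {"Text": {"Charset": charset, "Data": body_text}},
        },
    }


def send_job_notification(ses_client, event, message, sender, recipient):
    """Email the formatted message for one Glue job state change event."""
    detail = event["detail"]
    subject = f"Glue job {detail['jobName']}: {detail['state']}"
    request = build_ses_request(sender, recipient, subject, message)
    response = ses_client.send_email(**request)
    return response["MessageId"]
```

Inside lambda_handler this would be called after the message is built, for instance send_job_notification(ses, event, message, SENDER, RECIPIENT), with ses, SENDER and RECIPIENT defined at module level.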
AWS
SUPPORT ENGINEER
answered 2 years ago
AWS
EXPERT
reviewed 2 years ago
  • Thank you for your answers. My case is a bit specific: in my Glue job I call an RDS stored procedure, and it can happen that the Glue job itself succeeds but the stored procedure fails. That's why I want to capture the result of the stored procedure from within the Glue job and send an email when the job succeeds but the SP fails. I found the cause: it was the SG (no appropriate outbound rules) that blocked the email sending. Now I just want to know which specific SG outbound rule I should add to allow sending email from within the private subnet where Glue injects its workers. Thanks
