I need to send traces from an AWS Glue job (written as a Python script) to AWS X-Ray. Since X-Ray does not support AWS Glue out of the box, I had to write a little extra code to instrument the script so it can send traces. I found this article from Chariot Solutions and tried to follow the steps, but it isn't working, and it doesn't give an error either. According to the article, it seems we don't even need to spin up the X-Ray daemon, because we have a custom emitter.
Here is the code:
import boto3
import io
import json
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
import aws_xray_sdk.core

aws_xray_sdk.core.patch_all()

class DirectEmitter:
    def __init__(self):
        self.xray_client = None  # lazily initialize

    def send_entity(self, entity):
        if not self.xray_client:
            self.xray_client = boto3.client('xray')
        segment_doc = json.dumps(entity.to_dict())
        self.xray_client.put_trace_segments(TraceSegmentDocuments=[segment_doc])

    def set_daemon_address(self, address):
        pass

    @property
    def ip(self):
        return None

    @property
    def port(self):
        return None

aws_xray_sdk.core.xray_recorder.configure(
    emitter=DirectEmitter(),
    context_missing='LOG_ERROR',
    sampling=False)

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

zip_file = 'data.zip'
bucket_name = 'mybucket-dev-etl-1'
output_folder = 'myfolder/obf/output'
raw_folder = 'myfolder/obf/raw'

segment = aws_xray_sdk.core.xray_recorder.begin_segment('segment_name')
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=zip_file)
zip_data = io.BytesIO(obj['Body'].read())
segment.put_metadata('key', 'krish-dict', 'namespace')
subsegment = aws_xray_sdk.core.xray_recorder.begin_subsegment('subsegment_name')
with aws_xray_sdk.core.xray_recorder.capture('subsegment_name'):
    extracted_files = extract_zip(zip_data)  # this line calls an external library to extract the file, but the import is omitted here for security
    for file_name, file_content in extracted_files.items():
        subsegment.put_annotation('key', 'krish-value')
        s3.put_object(Bucket=bucket_name, Key=f'{raw_folder}/{file_name}', Body=file_content)
print('extracting complete')
job.commit()
aws_xray_sdk.core.xray_recorder.end_subsegment()
aws_xray_sdk.core.xray_recorder.end_segment()
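For context on what the custom emitter is doing: it serializes each segment to JSON and calls the X-Ray `PutTraceSegments` API directly, which is why no daemon should be needed. When debugging silent failures like this, it can help to sanity-check what a segment document looks like before handing it to `put_trace_segments`. Below is a minimal sketch of such a document, following the X-Ray segment document format (the `build_trace_id` helper and the segment name are illustrative, not part of the SDK):

```python
import json
import os
import time

def build_trace_id():
    # X-Ray trace IDs have the form: version "1", then 8 hex digits of the
    # epoch start time, then 24 hex digits of random data.
    return '1-{:08x}-{}'.format(int(time.time()), os.urandom(12).hex())

def minimal_segment_doc(name):
    # Smallest complete segment document PutTraceSegments will accept:
    # name, id (16 hex digits), trace_id, start_time, end_time.
    start = time.time()
    return {
        'name': name,
        'id': os.urandom(8).hex(),
        'trace_id': build_trace_id(),
        'start_time': start,
        'end_time': start + 0.5,
    }

doc = minimal_segment_doc('glue-test')
payload = json.dumps(doc)
# boto3.client('xray').put_trace_segments(TraceSegmentDocuments=[payload])
print(payload)
```

One thing to check against this format: subsegments sent as standalone documents (as `DirectEmitter.send_entity` does for each entity) additionally need `trace_id`, `parent_id`, and `type: "subsegment"` fields to be accepted, which may be relevant to the missing traces.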
Am I missing anything here? It seems the daemon part is not working for some reason, maybe because there is no daemon? But my understanding is that if we have the custom emitter, we don't need a separate daemon running explicitly.
Any comments or advice would be much appreciated.
Thanks so much for the response. I agree with you; I also wondered how it could work without the X-Ray daemon running between Glue and the X-Ray service. What confused me was the first sentence under "A Custom Emitter" here (https://chariotsolutions.com/blog/post/analyzing-glue-jobs-with-aws-x-ray/#:~:text=going%20to%20recommend.-,A%20Custom%20Emitter,-Fortunately%2C%20you%20don%E2%80%99t), together with the paragraph just before that heading, which made me feel we don't really need to run the daemon explicitly. On top of that, the author also didn't configure the X-Ray daemon's address in the following snippet; shouldn't the author have added the daemon address to this configuration?
xray_recorder.configure( emitter=DirectEmitter(), context_missing='LOG_ERROR', sampling=False)
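If you do go the daemon route instead, my understanding is that you would drop the custom emitter and point the SDK at the daemon via `daemon_address` in `configure`. A minimal sketch, assuming the daemon is already running on its default local address (127.0.0.1, UDP port 2000):

```python
from aws_xray_sdk.core import xray_recorder

# Daemon-based configuration (no custom emitter): the SDK's default
# UDP emitter sends segments to the daemon at daemon_address, and the
# daemon batches and forwards them to the X-Ray service.
xray_recorder.configure(
    daemon_address='127.0.0.1:2000',  # assumes the daemon's default port
    context_missing='LOG_ERROR',
    sampling=False)
```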
I am trying to run the daemon to see if that works. I will update here.