AWS Glue - Read a 'local' file in Python


In AWS Glue I use a legacy Python package that reads a constant JSON file from the same package. To simplify things, let's say the testLib package contains a Python module and a data.json file. The module has a function:

def test():
    f = open('data.json')

I uploaded the package to S3 and use it in the AWS Glue job:

from test_lib import test


The AWS Glue job fails with the error: FileNotFoundError: [Errno 2] No such file or directory: 'data.json'
There are many questions about how to manage files on S3 in AWS Glue, but the answers require changing the legacy package, which I don't want to do.
Is there any way to configure the AWS Glue job so that it can open 'local' files?

asked 2 years ago, 3233 views
1 Answer

For a Python shell job, a dependency should be packaged as an .egg or .whl file.

To refer to a file or .zip file in S3 from a Glue ETL or Python shell job, add the file paths under 'Referenced files path' (e.g. s3://bucketName/foldername/depFile1.txt, s3://bucketName/foldername/depFile2.txt) or 'Python library path'.

These files are made available in a directory under '/tmp'. The directory name follows the pattern 'glue-python-libs*'.

The files can be accessed in the job using the following code snippet:

import os

# Referenced files are copied into a directory under /tmp named glue-python-libs-*
dirs = os.listdir('/tmp')
gluelib = [d for d in dirs if d.startswith('glue-python-libs-')]

## to print files/directories added in job configuration
print(os.listdir(os.path.join('/tmp', gluelib[0])))

## to read a file added to job configuration 'Referenced files path'
with open(os.path.join('/tmp', gluelib[0], 'depFile1.txt'), "r") as f:
    lines = f.readlines()
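
If the legacy package cannot be changed, one possible workaround is to switch the working directory to that folder before calling the package, so that the relative open('data.json') resolves there. The following is only a sketch, assuming data.json was added under 'Referenced files path' and ends up in the glue-python-libs-* directory. The helper names find_glue_lib_dir and working_directory are made up for this example and are not part of any Glue API.

```python
import contextlib
import glob
import os


def find_glue_lib_dir(base="/tmp"):
    """Return the first glue-python-libs-* directory under base, or None."""
    matches = sorted(glob.glob(os.path.join(base, "glue-python-libs-*")))
    return matches[0] if matches else None


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


# Hypothetical usage inside the Glue job script (assumes test_lib is importable):
# from test_lib import test
# lib_dir = find_glue_lib_dir()
# if lib_dir is not None:
#     with working_directory(lib_dir):
#         test()  # open('data.json') is now resolved relative to lib_dir
```

The context manager restores the previous working directory afterwards, so the rest of the job is not affected by the change.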
answered 2 years ago
