AWS Glue - Read a 'local' file in Python


In AWS Glue I use a legacy Python package that reads a constant JSON file shipped in the same package. To simplify, let's say the testLib package contains two files, test_lib.py and data.json. test_lib.py has a function:

def test():
    f = open('data.json')

I uploaded testLib.zip to S3 and use it in the AWS Glue job:

from test_lib import test

test()

The AWS Glue job fails with the error: FileNotFoundError: [Errno 2] No such file or directory: 'data.json'
There are many questions about how to manage files on S3 in AWS Glue, but those approaches require changing the legacy package, which I don't want to do.
Is there any way to configure the AWS Glue job so that it can open 'local' files?

Alex
asked 2 years ago, 3232 views
1 Answer

For a pythonshell job, dependencies should be packaged as an .egg or .whl file.

To reference a file or .zip file in S3 from a Glue ETL pythonshell job, add the file paths under 'Referenced files path' (e.g. s3://bucketName/foldername/depFile1.txt, s3://bucketName/foldername/depFile2.txt) or 'Python library path'.

These files are made available in a directory under '/tmp'. The directory name follows the pattern 'glue-python-libs*'.

These files can be referenced in the ETL job using the following code snippet:

import os

# The referenced files are placed in a 'glue-python-libs-*' directory under /tmp
base = '/tmp'
gluelib = [d for d in os.listdir(base) if d.startswith('glue-python-libs-')]
libdir = os.path.join(base, gluelib[0])

# Print the files/directories added in the job configuration
print(os.listdir(libdir))

# Read a file added under 'Referenced files path' in the job configuration
with open(os.path.join(libdir, 'depFile1.txt'), "r") as f:
    lines = f.readlines()
    print(lines)
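
Since the legacy function calls open('data.json') with a relative path, Python resolves it against the current working directory. One possible workaround that leaves the package untouched is to switch the working directory to the folder holding data.json just for the duration of the call. The sketch below is only an illustration under assumptions: data.json has been added under 'Referenced files path' so that it lands as a plain file in the 'glue-python-libs*' directory described above (a file that stays inside a .zip cannot be read by a plain open()), and the helper names working_directory and find_glue_lib_dir are hypothetical, not part of any AWS API.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


def find_glue_lib_dir(base="/tmp", prefix="glue-python-libs"):
    """Return the first directory under `base` whose name starts with `prefix`.

    Returns None when no such directory exists. The default `base` and
    `prefix` are assumptions taken from the answer above.
    """
    for name in sorted(os.listdir(base)):
        candidate = os.path.join(base, name)
        if name.startswith(prefix) and os.path.isdir(candidate):
            return candidate
    return None


# Hypothetical usage inside the Glue job (test() is the legacy function):
#
#     from test_lib import test
#     lib_dir = find_glue_lib_dir()
#     with working_directory(lib_dir):
#         test()  # open('data.json') now resolves inside lib_dir
```

Changing the working directory affects the whole process, so this is best kept around the single legacy call, as the context manager does, instead of being set once for the entire job.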
AWS
answered 2 years ago