AWS Glue - Read a 'local' file in Python


In AWS Glue I use a legacy Python package that reads a constant JSON file bundled in the same package. To simplify things, let's say the testLib package contains test_lib.py and data.json. test_lib.py has a function:

def test():
    f = open('data.json')

I uploaded testLib.zip to S3 and use it in the AWS Glue job:

from test_lib import test

test()

The AWS Glue job fails with the error: FileNotFoundError: [Errno 2] No such file or directory: 'data.json'
There are many questions about how to manage files on S3 in AWS Glue, but those approaches would require changing the legacy package, which I don't want to do.
Is there any way to configure the AWS Glue job so that the package can open 'local' files?

Alex
asked 2 years ago · 3042 views
1 Answer

For a Python shell job, dependencies should be packaged as .egg or .whl files.
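As a rough sketch (assuming the package follows a standard setuptools layout, i.e. a testLib/ directory containing __init__.py, test_lib.py and data.json; the layout and version number below are assumptions, not taken from the question), a setup.py like this bundles the JSON file into the wheel:

# setup.py -- minimal sketch; package layout and version are assumed
from setuptools import setup, find_packages

setup(
    name="testLib",
    version="0.1.0",
    packages=find_packages(),
    # ship data.json inside the wheel so it travels with the code
    package_data={"testLib": ["data.json"]},
)

Building with 'python -m pip wheel .' (or 'python setup.py bdist_wheel') produces a .whl that can be uploaded to S3 and attached to the job.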

To refer to a file or a .zip archive in S3 from a Glue ETL job, add the file paths under 'Referenced files path' (for example s3://bucketName/foldername/depFile1.txt, s3://bucketName/foldername/depFile2.txt) or add the .zip under 'Python library path'.
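These console fields correspond to the '--extra-files' and '--extra-py-files' job arguments, so they can also be set programmatically. The snippet below is only a sketch using boto3; the job name and S3 paths are placeholders:

import boto3

glue = boto3.client("glue")

# Sketch: '--extra-py-files' maps to 'Python library path',
# '--extra-files' maps to 'Referenced files path'.
glue.start_job_run(
    JobName="my-etl-job",  # placeholder job name
    Arguments={
        "--extra-py-files": "s3://bucketName/foldername/testLib.zip",
        "--extra-files": "s3://bucketName/foldername/depFile1.txt,s3://bucketName/foldername/depFile2.txt",
    },
)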

At run time these files are made available in a directory under '/tmp' whose name follows the pattern 'glue-python-libs-*'.

These files can be referenced in the ETL job using the following code snippet:

import os

# Glue copies referenced files and Python libraries into /tmp,
# under a directory named like 'glue-python-libs-<suffix>'
dirs = os.listdir('/tmp')
gluelib = [d for d in dirs if d.startswith('glue-python-libs-')]

## to print files/directories added in job configuration
print(os.listdir(os.path.join('/tmp', gluelib[0])))

## to read a file added to job configuration 'Referenced files path'
with open(os.path.join('/tmp', gluelib[0], 'depFile1.txt'), "r") as f:
    lines = f.readlines()
    print(lines)
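
To use the legacy package without modifying it, one option (a sketch, assuming data.json is uploaded separately under 'Referenced files path' and the .zip under 'Python library path') is to change the working directory to that folder before calling the function, so the relative open('data.json') inside test() resolves:

import os
from test_lib import test

# Sketch: assumes data.json was added under 'Referenced files path',
# so Glue copied it into /tmp/glue-python-libs-<suffix>/
libdir = next(d for d in os.listdir('/tmp') if d.startswith('glue-python-libs-'))
os.chdir(os.path.join('/tmp', libdir))

test()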
AWS
answered 2 years ago
