This document lists the challenges and solution to connecting an Oracle DB from Glue Python shell.
Problem Statement:
Glue Python shell has to use the ‘Oracle’ python package to connect to Oracle database from the python shell jobs and until the ‘cx_Oracle’ version by default Oracle client library had to be present on the server/node for the ‘cx_Oracle’ package to work. Unfortunately installing an Oracle client is not an option due to the serverless nature of Glue and hence the below discussed solution.
Note: The latest version of python package ‘python-oracledb’ version provides a thin client but this version is not yet supported within Glue Python shell.
Work Around for ‘cx_Oracle’ Package:
Fundamental change to make ‘cx_Oracle’ work is to re-point the package to use Oracle client library files from a given set of files instead of the default client installation. Below mentioned activities are required for the workaround.
- Get Oracle client archive, patchelf and cx_Oracle packages.
- Remove symlinks and move the file they point to the one that is going to be looked up by ‘cx_oracle’.
- Patch the ‘rpath’ to point to a static directory
- Add libaio.so.1 in the archive
- Zip the archive
- Upload both client archive and cx-oracle to S3 bucket.
a. Put the S3 URL to the archive in "Referenced files path" configuration parameter
b. Put the S3 URL to the cx-Oracle wheel in "Python library path" configuration parameter
- Configure Glue Job by adding a bit of code to the glue job to set up the libraries. This code must be executed before any usage of cx-Oracle
Code Snippets:
Library Prep:
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/p/patchelf-0.12-1.el7.x86_64.rpm
sudo yum install patchelf-0.12-1.el7.x86_64.rpm
python3 -m venv cxora18
source cxora18/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install cx_Oracle
cd ~/cxora18/lib64/python3.7/site-packages
wget https://download.oracle.com/otn_software/linux/instantclient/185000/instantclient-basic-linux.x64-18.5.0.0.0dbru.zip
unzip instantclient-basic-linux.x64-18.5.0.0.0dbru.zip
mkdir oralib
cp /lib64/libaio.so.1 ./oralib
cp .so ./oralib
cp -r ./instantclient_18_5 ./oralib
cd oralib
patchelf --set-rpath /tmp .so
chmod 755 -R ./instantclient_18_5/.so
patchelf --set-rpath /tmp ./instantclient_18_5/.so
aws s3 cp . s3://YOUR S3 BUCKET*/lib-ora18/ --recursive
Glue Job Code:
import zipfile
import os
from pathlib import Path
import glob
import cx_Oracle
filename = 'instantclient-basic-linuxx64_patched.zip'
oracle_archive = next(Path('/tmp').glob(f'**/{filename}'))
with zipfile.ZipFile(oracle_archive, 'r') as f:
f.extractall('/tmp/libs')
files = glob.glob('/tmp/**/*', recursive = True)
for file in files:
if file.find('/tmp/libs/instantclient_18_5/') > -1 and file.find('network') < 0:
os.rename(file, "/tmp/" + file.split("/")[-1])
cx_Oracle.init_oracle_client(lib_dir='/tmp')