AWS Glue 3.0 via Docker container - local development


Running a basic Glue example inside the latest Amazon Glue container fails with:

An error occurred while calling o47.getDynamicFrame.
: java.io.FileNotFoundException:  (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at com.amazonaws.glue.jdbc.commons.CustomCertificateManager.importCustomJDBCCert(CustomCertificateManager.java:127)
	at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:952)
	at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:738)
	at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:738)
	at com.amazonaws.services.glue.util.JDBCWrapper.tableDF(JDBCUtils.scala:864)
	at com.amazonaws.services.glue.util.NoCondition$.tableDF(JDBCUtils.scala:86)
	at com.amazonaws.services.glue.util.NoJDBCPartitioner$.tableDF(JDBCUtils.scala:172)
	at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:967)
	at com.amazonaws.services.glue.DataSource.getDynamicFrame(DataSource.scala:99)
	at com.amazonaws.services.glue.DataSource.getDynamicFrame$(DataSource.scala:99)
	at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:714)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

Am I supposed to be passing something to what looks like a certificate manager of some sort? If so, how do I do that?

I'm loading the data in a very simple way:

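from awsglue.dynamicframe import DynamicFrame


def spark_sql_query(
    glueContext,
    query,
    mapping) -> DynamicFrame:
    # Register each DynamicFrame as a temporary view under its alias
    for alias, frame in mapping.items():
        frame.toDF().createOrReplaceTempView(alias)

    # Use the session owned by the GlueContext rather than a global `spark`
    result = glueContext.spark_session.sql(query)

    # fromDF also takes a name for the resulting DynamicFrame
    return DynamicFrame.fromDF(result, glueContext, "spark_sql_query_result")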


def jdbc_source(
    glueContext,
    connectionName,
    table,
) -> DynamicFrame:

    connection_options = {
        "useConnectionProperties": "true",
        "dbtable": table,
        "connectionName": connectionName,
        "customJdbcDriverS3Path": "s3://my_path/mysql-connector-java-5.1.49.jar", 
        "customJdbcDriverClassName": "com.mysql.jdbc.Driver"
    }

    return glueContext.create_dynamic_frame.from_options(
        connection_type='mysql',
        connection_options=connection_options
    )


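# glueContext, query and job are assumed to be created at module level
# (their setup is not shown in this post).
def main():
    data_source = jdbc_source(
        glueContext=glueContext,
        connectionName='my_connection_name',
        table="my_table")

    # Expose the JDBC source to the SQL query as the view "myDataSource"
    spark_sql_query(
        glueContext=glueContext,
        query=query,
        mapping={"myDataSource": data_source}
    ).show()

    job.commit()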
asked a year ago, 235 views
2 Answers

Because "useConnectionProperties" is enabled, the properties are taken from the connection named in "connectionName". It seems that this connection points to a certificate by a relative path, so the engine looks for it on the local filesystem instead of in S3.
If you normally provide the certificate file to Glue via --extra-files, you would need to put that file inside the container manually so that Glue can find it.
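
To check whether the connection really carries certificate settings, its properties can be inspected before running the job. The following is a minimal sketch, not a definitive diagnosis: the helper names are hypothetical, the key names are the SSL-related connection properties as I understand them from the Glue Connection API, and the boto3 call assumes valid AWS credentials and a default region are available.

```python
# SSL/certificate-related keys that may appear in a Glue connection's
# ConnectionProperties (assumed list; verify against the Glue API docs).
CERT_KEYS = (
    "JDBC_ENFORCE_SSL",
    "CUSTOM_JDBC_CERT",
    "CUSTOM_JDBC_CERT_STRING",
    "SKIP_CUSTOM_JDBC_CERT_VALIDATION",
)


def cert_related_properties(connection_properties):
    """Return only the properties that concern SSL certificates."""
    return {k: v for k, v in connection_properties.items() if k in CERT_KEYS}


def fetch_connection_properties(connection_name):
    """Fetch ConnectionProperties for a Glue connection via boto3."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    response = glue.get_connection(Name=connection_name)
    return response["Connection"]["ConnectionProperties"]


# Usage (requires AWS credentials):
#   props = fetch_connection_properties("my_connection_name")
#   print(cert_related_properties(props))
```

If a certificate key shows up with an empty or relative value, that would be consistent with the empty file name in the FileNotFoundException.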

AWS EXPERT
answered a year ago

This is how I run my local environment. At the command prompt, run the commands below, noting that:

  1. MyProfile is the AWS profile configured on your local system.
  2. The path, and every other path containing <UserName>, should be replaced as appropriate.
PROFILE_NAME="MyProfile"
JUPYTER_WORKSPACE_LOCATION=/Users/<UserName>/WorkSpace/jupyter_workspace/
WORKSPACE_LOCATION=/Users/<UserName>/WorkSpace/jupyter_workspace/

docker run -it \
  -v ~/.aws:/home/glue_user/.aws \
  -v "$JUPYTER_WORKSPACE_LOCATION":/home/glue_user/workspace/jupyter_workspace/ \
  -e AWS_PROFILE="$PROFILE_NAME" \
  -e DISABLE_SSL=true \
  --rm \
  -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 \
  --name glue_jupyter_lab \
  amazon/aws-glue-libs:glue_libs_3.0.0_image_01-arm64 \
  /home/glue_user/jupyter/jupyter_start.sh

Also follow https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/. Note which system you are running on: I run it on a Mac with an M1 chip and find the arm64 image appropriate.
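
The docker run command above can also be assembled from Python, which makes it easy to add a further bind mount, for example for a certificate file that a Glue connection expects to find on the local filesystem. This is only a sketch: the function name, the extra_mounts parameter and the certificate paths are hypothetical, and the remaining values are copied from the command above.

```python
import os
import shlex


def glue_docker_command(profile_name, workspace_dir, extra_mounts=None,
                        image="amazon/aws-glue-libs:glue_libs_3.0.0_image_01-arm64"):
    """Build the `docker run` argument list for the Glue Jupyter container."""
    cmd = [
        "docker", "run", "-it", "--rm",
        "-v", os.path.expanduser("~/.aws") + ":/home/glue_user/.aws",
        "-v", f"{workspace_dir}:/home/glue_user/workspace/jupyter_workspace/",
    ]
    # Optional read-only mounts, e.g. a certificate file (hypothetical paths)
    for host_path, container_path in (extra_mounts or {}).items():
        cmd += ["-v", f"{host_path}:{container_path}:ro"]
    cmd += ["-e", f"AWS_PROFILE={profile_name}", "-e", "DISABLE_SSL=true"]
    for port in (4040, 18080, 8998, 8888):
        cmd += ["-p", f"{port}:{port}"]
    cmd += ["--name", "glue_jupyter_lab", image,
            "/home/glue_user/jupyter/jupyter_start.sh"]
    return cmd


# Print the command for review instead of running it directly
command = glue_docker_command(
    "MyProfile",
    "/Users/<UserName>/WorkSpace/jupyter_workspace/",
    extra_mounts={"/path/to/cert.pem": "/home/glue_user/workspace/cert.pem"},
)
print(shlex.join(command))
```

Where inside the container the certificate has to live depends on the path stored in the connection, so the target path here is a placeholder.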

answered a year ago
