AWS Glue 3.0 via Docker container - local development


Running a basic Glue example inside the latest Amazon Glue container fails with:

An error occurred while calling o47.getDynamicFrame.
: java.io.FileNotFoundException:  (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at com.amazonaws.glue.jdbc.commons.CustomCertificateManager.importCustomJDBCCert(CustomCertificateManager.java:127)
	at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:952)
	at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:738)
	at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:738)
	at com.amazonaws.services.glue.util.JDBCWrapper.tableDF(JDBCUtils.scala:864)
	at com.amazonaws.services.glue.util.NoCondition$.tableDF(JDBCUtils.scala:86)
	at com.amazonaws.services.glue.util.NoJDBCPartitioner$.tableDF(JDBCUtils.scala:172)
	at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:967)
	at com.amazonaws.services.glue.DataSource.getDynamicFrame(DataSource.scala:99)
	at com.amazonaws.services.glue.DataSource.getDynamicFrame$(DataSource.scala:99)
	at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:714)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

Am I supposed to pass in a certificate manager of some sort? If so, how do I do that?

I'm loading the data in a very simple way:

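from awsglue.dynamicframe import DynamicFrame


def spark_sql_query(
    glueContext,
    query,
    mapping) -> DynamicFrame:

    # Register each DynamicFrame as a temporary view under its alias
    for alias, frame in mapping.items():
        frame.toDF().createOrReplaceTempView(alias)

    # Use the session owned by the GlueContext instead of a global `spark`
    result = glueContext.spark_session.sql(query)

    # fromDF takes a name for the resulting DynamicFrame as its third argument
    return DynamicFrame.fromDF(result, glueContext, "result")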


def jdbc_source(
    glueContext,
    connectionName,
    table,
) -> DynamicFrame:

    connection_options = {
        "useConnectionProperties": "true",
        "dbtable": table,
        "connectionName": connectionName,
        "customJdbcDriverS3Path": "s3://my_path/mysql-connector-java-5.1.49.jar", 
        "customJdbcDriverClassName": "com.mysql.jdbc.Driver"
    }

    return glueContext.create_dynamic_frame.from_options(
        connection_type='mysql',
        connection_options=connection_options
    )


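def main():
    # `glueContext`, `query` and `job` are assumed to be defined at module
    # level (the usual GlueContext / Job setup, not shown here)
    data_source = jdbc_source(
        glueContext=glueContext,
        connectionName='my_connection_name',
        table="my_table")

    spark_sql_query(
        glueContext=glueContext,
        query=query,
        mapping={"myDataSource": data_source}
    ).show()

    job.commit()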
asked a year ago, 262 views
2 Answers

Because "useConnectionProperties" is enabled, the job takes its properties from the Glue connection named in "connectionName". It seems that this connection points to a certificate by a relative path, so the engine looks for it on the local filesystem instead of in S3.
If you normally provide the certificate file to Glue via --extra-files, you need to put that file inside the container manually so that Glue can find it.
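
One way to check whether the connection really carries certificate settings is to read its properties and look only at the SSL-related keys. The following is a minimal sketch, not a definitive diagnosis: the helper names `cert_settings` and `fetch_connection_properties` are hypothetical, and it assumes boto3 is installed and that the AWS profile in use may call `glue:GetConnection`.

```python
from typing import Dict, Mapping

# Property keys that Glue connections use for JDBC SSL and custom certificates.
CERT_RELATED_KEYS = (
    "JDBC_ENFORCE_SSL",
    "CUSTOM_JDBC_CERT",
    "CUSTOM_JDBC_CERT_STRING",
    "SKIP_CUSTOM_JDBC_CERT_VALIDATION",
)


def cert_settings(connection_properties: Mapping[str, str]) -> Dict[str, str]:
    """Return only the SSL/certificate-related entries of a connection."""
    return {
        key: connection_properties[key]
        for key in CERT_RELATED_KEYS
        if key in connection_properties
    }


def fetch_connection_properties(connection_name: str) -> Dict[str, str]:
    """Read the ConnectionProperties of a Glue connection via boto3."""
    import boto3  # imported lazily so cert_settings works without boto3

    glue = boto3.client("glue")
    response = glue.get_connection(Name=connection_name, HidePassword=True)
    return response["Connection"]["ConnectionProperties"]
```

Calling `cert_settings(fetch_connection_properties("my_connection_name"))` from inside the container would show whether `CUSTOM_JDBC_CERT` is set and what path it holds; an empty result would suggest the certificate is not the cause.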

AWS
EXPERT
answered a year ago

This is how I run my local environment. At the command prompt, run the commands below, keeping in mind:

  1. MyProfile is the AWS profile configured on your local system.
  2. Replace <UserName> in every path as appropriate for your system.
PROFILE_NAME="MyProfile"
JUPYTER_WORKSPACE_LOCATION=/Users/<UserName>/WorkSpace/jupyter_workspace/
WORKSPACE_LOCATION=/Users/<UserName>/WorkSpace/jupyter_workspace/

docker run -it -v ~/.aws:/home/glue_user/.aws -v $JUPYTER_WORKSPACE_LOCATION:/home/glue_user/workspace/jupyter_workspace/ -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 --name glue_jupyter_lab amazon/aws-glue-libs:glue_libs_3.0.0_image_01-arm64 /home/glue_user/jupyter/jupyter_start.sh

Also follow: https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/ Make a note of the architecture of the system you are running on. I run it on a Mac with an M1 chip and find the arm64 image appropriate.
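
The choice of image by architecture can be sketched in a few lines of Python. This is only an illustration: `glue_image_for` is a hypothetical helper, and the non-arm64 tag `glue_libs_3.0.0_image_01` is an assumption (the arm64 tag without its suffix), so check it against the linked blog post before relying on it.

```python
import platform

ARM64_TAG = "glue_libs_3.0.0_image_01-arm64"  # tag used in the command above
X86_TAG = "glue_libs_3.0.0_image_01"          # assumption: tag for x86_64 hosts


def glue_image_for(machine: str) -> str:
    """Pick the Glue 3.0 image matching a machine string such as 'arm64'."""
    arch = machine.lower()
    tag = ARM64_TAG if arch in ("arm64", "aarch64") else X86_TAG
    return f"amazon/aws-glue-libs:{tag}"


# Show the image that matches the machine this script runs on
print(glue_image_for(platform.machine()))
```

The printed value can then be substituted for the image name in the docker run command.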

answered a year ago
