Glue running in Docker not able to find com.mysql.cj.jdbc.Driver

Following along with this blog post, I'm trying to debug my Glue jobs with breakpoints in VS Code, using the amazon/aws-glue-libs:glue_libs_3.0.0_image_01 image.

The job starts and I can step through the code until I try to connect to RDS to fetch data. As soon as I do, I get this error:

An error occurred while calling o47.getDynamicFrame.
: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at com.amazonaws.services.glue.util.JDBCUtils.loadDriver(JDBCUtils.scala:214)
	at com.amazonaws.services.glue.util.JDBCUtils.loadDriver$(JDBCUtils.scala:212)
	at com.amazonaws.services.glue.util.MySQLUtils$.loadDriver(JDBCUtils.scala:490)
	at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:746)
	at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:1006)
	at com.amazonaws.services.glue.JDBCDataSource.$anonfun$getJdbcJobBookmark$1(DataSource.scala:878)
	at scala.collection.MapLike.getOrElse(MapLike.scala:131)
	at scala.collection.MapLike.getOrElse$(MapLike.scala:129)
	at scala.collection.AbstractMap.getOrElse(Map.scala:63)
	at com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:878)
	at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:953)
	at com.amazonaws.services.glue.DataSource.getDynamicFrame(DataSource.scala:99)
	at com.amazonaws.services.glue.DataSource.getDynamicFrame$(DataSource.scala:99)
	at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:714)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

I'm not sure how to solve this. The blog post mentions that I can pass in extra libraries, but when I look in /home/glue_user/aws-glue-libs/jars I can see a jar named mssql-jdbc-7.0.0.jre8.jar, so I'm not sure that's the problem. I should mention that this job runs without a problem when deployed to AWS.

I'm currently starting amazon/aws-glue-libs:glue_libs_3.0.0_image_01 with a very basic docker-compose file:
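One way to check whether a jar in that directory really provides the class from the stack trace is to look inside the archives, since a jar is a zip file. The sketch below is only an illustration: `find_jars_with_class` is a hypothetical helper name, and it searches a single directory, so jars elsewhere on the classpath are not covered. Note that mssql-jdbc is the Microsoft SQL Server driver, so it would not be expected to contain `com.mysql.cj.jdbc.Driver`.

```python
import zipfile
from pathlib import Path


def find_jars_with_class(jar_dir, class_name):
    """Return the sorted names of jars in jar_dir that contain class_name.

    Hypothetical helper for illustration; it only inspects *.jar files
    directly inside jar_dir and does not walk subdirectories.
    """
    # A class such as com.mysql.cj.jdbc.Driver is stored in the jar as
    # com/mysql/cj/jdbc/Driver.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar_path in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    matches.append(jar_path.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives
            continue
    return matches
```

Inside the container, calling `find_jars_with_class("/home/glue_user/aws-glue-libs/jars", "com.mysql.cj.jdbc.Driver")` would list any jar in that directory that ships the driver class; an empty list would be consistent with the `ClassNotFoundException` above.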

version: "3.8"
services:
  glue:
    container_name: "glue-local-development"
    image: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
    ports:
      - "4040:4040"
      - "18080:18080"
    environment:
      - DISABLE_SSL=true
      - AWS_PROFILE=my_profile
    volumes:
      - ~/.aws:/home/glue_user/.aws
      - ${PWD}:/home/glue_user/workspace/
    stdin_open: true

I then connect as described in the blog post. Is there something else I have to do here?

Surely I shouldn't have to load the MySQL jars manually?

I've been stuck on this for a while, so I would really appreciate any help or suggestions.

Edit:

Interestingly, when I run amazon/aws-glue-libs:glue_libs_2.0.0_image_01 instead, it fails at the same getDynamicFrame call but with a different error:

: An error occurred while calling o49.getDynamicFrame.
: java.io.FileNotFoundException:  (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at com.amazonaws.glue.jdbc.commons.CustomCertificateManager.importCustomJDBCCert(CustomCertificateManager.java:127)
        at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:947)
        at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:734)
        at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:734)
        at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:747)
        at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:996)
        at com.amazonaws.services.glue.JDBCDataSource$$anonfun$33.apply(DataSource.scala:868)
        at com.amazonaws.services.glue.JDBCDataSource$$anonfun$33.apply(DataSource.scala:868)
        at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
        at scala.collection.AbstractMap.getOrElse(Map.scala:59)
        at com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:868)
        at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:943)
        at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:97)
        at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:707)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
asked a year ago, 977 views
1 Answer

Hi,

I see that you are receiving the following error when connecting to RDS while following the blog post “Develop and test AWS Glue version 3.0 jobs locally using a Docker container”:

==========

An error occurred while calling o47.getDynamicFrame. : java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver

==========

I believe the issue here is with the class name “com.mysql.cj.jdbc.Driver”. You can provide a custom driver by uploading the connector "MySQL Connector/J 5.1.49" to S3, referencing it in the parameter "customJdbcDriverS3Path", and setting the matching class name in "customJdbcDriverClassName". The Glue code snippet would look similar to:

==========

# Replace the <...> placeholders with your own connection details.
connection_mysql_options = {
    "url": "jdbc:mysql://<host>:3306/dbName",
    "dbtable": "<dbTableName>",
    "user": "<userName>",
    "password": "<password>",
    "customJdbcDriverS3Path": "s3://s3bucket/path/mysql-connector-java-5.1.49.jar",
    "customJdbcDriverClassName": "com.mysql.jdbc.Driver",
}

datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options=connection_mysql_options,
    transformation_ctx="datasource0",
)

==========
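The driver class name has to match the connector jar that is supplied: Connector/J 5.1.x registers `com.mysql.jdbc.Driver`, while 8.x registers `com.mysql.cj.jdbc.Driver`, the class named in the error. The following is a minimal sketch of keeping the two in step when building the options. The helper names `mysql_driver_class` and `build_mysql_options` are hypothetical and not part of the Glue API, and whether `customJdbcDriverS3Path` resolves correctly inside the local container is not confirmed here.

```python
def mysql_driver_class(connector_version):
    """Map a MySQL Connector/J version string to its JDBC driver class name.

    Hypothetical helper: only the 5.x line and 8.x or later are handled.
    """
    major = int(connector_version.split(".")[0])
    if major == 5:
        return "com.mysql.jdbc.Driver"
    if major >= 8:
        return "com.mysql.cj.jdbc.Driver"
    raise ValueError(f"Unsupported Connector/J version: {connector_version}")


def build_mysql_options(url, table, user, password, jar_s3_path, connector_version):
    """Build a connection_options dict with a consistent jar and class name.

    The keys are the ones used in the snippet above; jar_s3_path is assumed
    to point at a connector jar of the given version.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "customJdbcDriverS3Path": jar_s3_path,
        "customJdbcDriverClassName": mysql_driver_class(connector_version),
    }
```

The resulting dict can be passed as `connection_options` to `glueContext.create_dynamic_frame.from_options` in the same way as `connection_mysql_options` above.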

If the issue still persists, please open a support case with AWS, providing the connection details and the code snippet used: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

Thank you.

AWS
SUPPORT ENGINEER
answered a year ago
  • It seems a bit odd that, given that driver already exists in the container and I'm connecting to a catalog via glue_context.create_dynamic_frame.from_catalog (which works in production), I'd have to change all my code just to debug a Glue job in a Docker container?
