Set classpath for dependent JARs in AWS Glue


I need to run Java code in my (PySpark) Glue job, and I tried to follow the tutorial. (By the way, there seems to be no Glue-specific documentation on how to work with JARs configured as dependent JARs in the Glue job details.)

I uploaded my JAR to S3 and configured its S3 URI as a dependent JAR in the job details. I also verified with print(spark_context.getConf().getAll()) that it appears as spark.glue.extra-jars in the Spark config.
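Seeing the JAR in the configuration only shows that Glue was told about it, not that the class inside can be loaded. As a hedged sketch, the pure-Python helper below filters the (key, value) pairs returned by `getConf().getAll()` down to JAR-related settings. It assumes that the relevant keys contain the substring `jars` and that their values are comma-separated URIs, as with Spark's `spark.jars`. The helper name `jar_settings` and the bucket in the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def jar_settings(conf_pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return JAR-related Spark settings, each split into a list of URIs.

    Assumption: relevant keys contain "jars" (e.g. spark.jars,
    spark.glue.extra-jars) and their values are comma-separated.
    """
    result: Dict[str, List[str]] = {}
    for key, value in conf_pairs:
        if "jars" in key:
            # Split on commas, strip whitespace, and drop empty fragments.
            result[key] = [uri.strip() for uri in value.split(",") if uri.strip()]
    return result


if __name__ == "__main__":
    # Illustrative input shaped like spark_context.getConf().getAll();
    # the bucket and path are hypothetical placeholders.
    sample = [
        ("spark.app.name", "my-glue-job"),
        ("spark.glue.extra-jars", "s3://my-bucket/jars/SquareTest.jar"),
    ]
    print(jar_settings(sample))
```

Inside the job it would be called as `jar_settings(spark_context.getConf().getAll())`. A non-empty result still says nothing about the layout of the classes inside the JAR.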

But when I try to register my UDF as

spark_session.udf.registerJavaFunction(name="square_test", javaClassName="SquareTest", returnType=pyspark.sql.types.IntegerType())

I get the error

AnalysisException: Can not load class SquareTest, please make sure it is on the classpath.

How do I add my JAR to the classpath?

Here is my Java code:


import org.apache.spark.sql.api.java.UDF1;

public class SquareTest implements UDF1<Integer, Integer> {

    @Override
    public Integer call(Integer number) throws Exception {
        return number * number;
    }
}
And this is how I compiled it:

javac -classpath lib/spark-sql_2.12-3.4.1.jar -d ./build SquareTest.java
jar cvf SquareTest.jar build/*
asked a year ago
1 Answer
Accepted Answer

According to the Spark documentation, javaClassName must be the fully qualified class name, but you are only specifying the class name.
Try with javaClassName =""
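A JVM class loader finds a class inside a JAR by turning its fully qualified (binary) name into an entry path: each `.` becomes a `/` and `.class` is appended. A class without a `package` declaration lives in the default package and must sit at the root of the JAR. The small Python sketch below shows the entry that a given `javaClassName` implies; the function name `class_entry_path` is hypothetical, and it expects the binary name, so nested classes are written with `$` (e.g. `mytest.Outer$Inner`).

```python
def class_entry_path(fully_qualified_name: str) -> str:
    """Map a Java binary class name to the JAR entry the class loader looks for.

    "mytest.SquareTest" -> "mytest/SquareTest.class"
    A name without dots is in the default package, i.e. at the JAR root.
    """
    return fully_qualified_name.replace(".", "/") + ".class"


if __name__ == "__main__":
    for name in ("SquareTest", "mytest.SquareTest"):
        print(f"{name!r} -> {class_entry_path(name)!r}")
```

So `javaClassName="mytest.SquareTest"` only works if the JAR contains exactly `mytest/SquareTest.class`, with no extra leading directory.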

AWS
answered 10 months ago
  • Unfortunately, this does not solve my problem. I get the same error:

    2023-07-20 10:21:23,724 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
      File "/tmp/", line 37, in <module>
        spark_session.udf.registerJavaFunction(name="square_test", javaClassName="", returnType=pyspark.sql.types.IntegerType())
      File "/opt/amazon/spark/python/lib/", line 409, in registerJavaFunction
        self.sparkSession._jsparkSession.udf().registerJava(name, javaClassName, jdt)
      File "/opt/amazon/spark/python/lib/", line 1305, in __call__
        answer, self.gateway_client, self.target_id,
      File "/opt/amazon/spark/python/lib/", line 117, in deco
        raise converted from None
    pyspark.sql.utils.AnalysisException: Can not load class, please make sure it is on the classpath
  • In what package is your Java class?

    Maybe it is not in a package.

  • @acmanjon, I changed my Java code to

    package mytest;
    import org.apache.spark.sql.api.java.UDF1;

    public class SquareTest implements UDF1<Integer, Integer> {
        public Integer call(Integer number) throws Exception {
            return number * number;
        }
    }

    and the Python registration to

    spark_session.udf.registerJavaFunction(name="square_test", javaClassName="mytest.SquareTest", returnType=pyspark.sql.types.IntegerType())

    but I get the same error:

    AnalysisException: Can not load class mytest.SquareTest, please make sure it is on the classpath.
  • Indeed, it was a problem with my packaging. It now works with the latest version of my code above, with the JAR looking like

    $ jar -tf SquareTest.jar

    Thanks for your help and fast replies, @acmanjon and Gonzalo Herreros!
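The poster does not say exactly what was wrong with the packaging. One likely cause is the original command `jar cvf SquareTest.jar build/*`: the jar tool records entries with the paths as given, so the class ends up as `build/mytest/SquareTest.class` instead of `mytest/SquareTest.class`. Running `jar cvf SquareTest.jar -C build .` changes into `build` first and stores paths relative to it. Because a JAR is a ZIP archive, the Python sketch below uses only `zipfile` to check whether a JAR contains the entry a class name needs and to report copies stored under an unexpected prefix. The function name `check_class_in_jar` is hypothetical, and the demo JAR holds placeholder bytes rather than real bytecode.

```python
import zipfile
from typing import List, Tuple


def check_class_in_jar(jar_path: str, class_name: str) -> Tuple[bool, List[str]]:
    """Check whether a JAR stores `class_name` where the class loader expects it.

    Returns (found, misplaced): `found` is True if the exact entry exists;
    `misplaced` lists entries that end with the expected path but carry an
    extra leading directory such as "build/".
    """
    # Binary name -> entry path, e.g. "mytest.SquareTest" -> "mytest/SquareTest.class"
    expected = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    found = expected in names
    misplaced = [n for n in names if n != expected and n.endswith("/" + expected)]
    return found, misplaced


if __name__ == "__main__":
    import os
    import tempfile

    # Build a throwaway JAR laid out the way `jar cvf SquareTest.jar build/*`
    # would lay it out (placeholder bytes, not real bytecode).
    with tempfile.TemporaryDirectory() as tmp:
        bad_jar = os.path.join(tmp, "SquareTest.jar")
        with zipfile.ZipFile(bad_jar, "w") as jar:
            jar.writestr("build/mytest/SquareTest.class", b"placeholder")
        print(check_class_in_jar(bad_jar, "mytest.SquareTest"))
```

Against the real artifact, `check_class_in_jar("SquareTest.jar", "mytest.SquareTest")` should return `(True, [])` once the JAR is built from inside `build`, which matches what `jar -tf SquareTest.jar` lists.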

