Why does running a custom JAR step on AWS EMR fail with "error=2, No such file or directory"?


I'm trying to set up a JupyterHub environment on AWS EMR. I've been following the instructions in the documentation without issue.

I now want to add a step during setup that adds users. The documentation gives a snippet showing how to do so.

Example: Bash script to add multiple users

The following sample bash script ties together the previous steps in this section to create multiple JupyterHub users. The script can be run directly on the main node, or it can be uploaded to Amazon S3 and then run as a step.

# Bulk add users to container and JupyterHub with temp password of username
set -x
USERS=(shirley diego ana richard li john mary anaya)
TOKEN=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)
for i in "${USERS[@]}";
do
   sudo docker exec jupyterhub useradd -m -s /bin/bash -N $i
   sudo docker exec jupyterhub bash -c "echo $i:$i | chpasswd"
   curl -XPOST --silent -k https://$(hostname):9443/hub/api/users/$i \
 -H "Authorization: token $TOKEN" | jq
done

Save the script to a location in Amazon S3 such as s3://mybucket/createjupyterusers.sh. Then you can use script-runner.jar to run it as a step.
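
As a rough illustration of that last step, the Python sketch below builds a step definition in the shape that boto3's `add_job_flow_steps` accepts. The helper name `script_runner_step`, the default region, and the step name are assumptions made for illustration; the JAR location follows the `s3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar` pattern from the EMR documentation.

```python
def script_runner_step(script_s3_uri, region="us-east-1", name="Run script"):
    """Build an EMR step definition that runs a script from S3 via script-runner.jar.

    The `region` and `name` defaults are placeholders, not values from the documentation.
    """
    jar = f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar"
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        # script-runner.jar downloads the script named in Args and executes it on the main node
        "HadoopJarStep": {"Jar": jar, "Args": [script_s3_uri]},
    }


step = script_runner_step("s3://mybucket/createjupyterusers.sh")
print(step["HadoopJarStep"]["Jar"])
```

The resulting dictionary could then be passed as `Steps=[step]` to `add_job_flow_steps` on a boto3 EMR client, together with the cluster's `JobFlowId`.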


After following the procedure above I successfully launched the EMR cluster.

I now want to use my own script in place of the shell script above, but I'm having issues running it.

I get the following error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/tez/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-23QLFU7JXPPM7/./add_users_ERM.sh" (in directory "."): error=2, No such file or directory
	at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:143)
	at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-23QLFU7JXPPM7/./add_users_ERM_Linuxx.sh" (in directory "."): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:96)
	... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 8 more

Could someone explain why using script-runner.jar to run my script fails, when it works fine with the shell script from the tutorial?

add_users_ERM.sh for reference:


import socket
import subprocess


def run(cmd):
    """Run a command and return (returncode, stdout, stderr) as stripped text."""
    p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return p.returncode, p.stdout.decode("utf-8").strip(), p.stderr.decode("utf-8").strip()


# Python does not expand $(...), so run the command and keep the last line of its output
_, token_output, _ = run(["sudo", "docker", "exec", "jupyterhub", "/opt/conda/bin/jupyterhub", "token", "jovyan"])
TOKEN = token_output.splitlines()[-1] if token_output else ""


def users_from_text(file):
    # For each user: create the account in the container, set a temporary password, register with JupyterHub
    output = ""
    for username in file:
        print(f"Adding {username}")
        code, output, error = run(["sudo", "docker", "exec", "jupyterhub", "useradd", "-m", "-s", "/bin/bash", "-N", username])
        if code != 0:
            print(f"Error adding user: {error}")
        else:
            print(f"{username} was added")

        code, output, error = run(["sudo", "docker", "exec", "jupyterhub", "bash", "-c", f"echo {username}:{username} | chpasswd"])
        if code != 0:
            print(f"Error adding password: {error}")
        else:
            print(f"{username} password was added")

        # No shell is involved here, so resolve the hostname in Python and drop the "| jq" pipe
        url = f"https://{socket.gethostname()}:9443/hub/api/users/{username}"
        code, output, error = run(["curl", "-XPOST", "--silent", "-k", url, "-H", f"Authorization: token {TOKEN}"])
        if code != 0:
            print(f"Error adding user to JH: {error}")
        else:
            print(f"{username} was added to JH")

        # To do: Convert api call to subprocess request
    return output

test_data = ["worker_1", "worker_2", "worker_3", "worker_4", "worker_5", "worker_6"]
txt_file = test_data  
print("Attempting add_user.sh script")
output = users_from_text(txt_file)
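
One way to narrow down an `error=2` on a file that does exist is to inspect the script's first line: the operating system can report "No such file or directory" when the interpreter named in the shebang cannot be found, for example when Windows (CRLF) line endings turn `/bin/bash` into `/bin/bash\r`. The sketch below is only a diagnostic aid and not a confirmed explanation of the failure above; the function name `inspect_script` and the sample inputs are hypothetical.

```python
def inspect_script(data: bytes) -> list:
    """Return a list of findings about the first line of a script."""
    findings = []
    first_line = data.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        # Without a shebang, nothing tells the system which interpreter to use
        findings.append("no shebang line")
    if first_line.endswith(b"\r"):
        # A trailing carriage return becomes part of the interpreter path
        findings.append("CRLF line ending on first line")
    return findings


print(inspect_script(b"#!/bin/bash\r\nset -x\r\n"))
print(inspect_script(b"import os\nimport subprocess\n"))
print(inspect_script(b"#!/usr/bin/env python3\nprint('ok')\n"))
```

To check a real file, the bytes could be read with `open(path, "rb").read()` and passed to the function.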
