Hi,
To run a Hadoop Jar file (a MapReduce job) on an AWS EMR cluster in CLI mode, you can follow these steps:
- Open a terminal or command prompt and connect to your AWS EMR cluster using SSH. You can find the SSH command in the EMR console by selecting your cluster and clicking "Connect" -> "SSH".
- Once connected to the cluster, navigate to the directory where your JAR file is located using the `cd` command. For example, `cd /path/to/jar/files/`.
- Use the `hadoop` command to submit your MapReduce job to the cluster. The command syntax is as follows:
  hadoop jar <JAR_FILE> <MAIN_CLASS> [optional arguments]
  Replace `<JAR_FILE>` with the name of your JAR file (e.g., `myjob.jar`) and `<MAIN_CLASS>` with the fully qualified main class of your MapReduce job.
- Provide any additional arguments your job requires. These depend on how your MapReduce job is written; consult the documentation or README file that came with your job for details.
- Execute the hadoop command. For example:
  hadoop jar myjob.jar com.example.MyJob -input s3://input-bucket/input-file -output s3://output-bucket/output-dir
  In this example, `com.example.MyJob` is the main class, and `-input` and `-output` are arguments specific to the job.
- Monitor the progress of your job. Once the job is submitted, an application ID appears in the console output. You can use this ID to track the job with the `yarn` or `hadoop` command, depending on your Hadoop version. For example:
  yarn application -status <application_id>
  Replace `<application_id>` with the actual application ID printed in the previous step.
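Put together, the submission step can be sketched as the shell session below. All names are hypothetical placeholders (your JAR, main class, and S3 paths will differ); the command is printed with a leading `echo` so you can review it, and you drop the `echo` to actually run it on the EMR master node.

```shell
# Hypothetical values -- substitute your own JAR, main class, and S3 paths.
JAR=myjob.jar
MAIN_CLASS=com.example.MyJob
INPUT=s3://input-bucket/input-file
OUTPUT=s3://output-bucket/output-dir

# Print the submission command; remove the leading `echo` to submit for real.
echo hadoop jar "$JAR" "$MAIN_CLASS" -input "$INPUT" -output "$OUTPUT"
```

Keeping the paths in variables also makes it easy to reuse the same snippet for different input/output buckets.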
That's it! Your MapReduce job should now be running on your AWS EMR cluster. You can check the output and logs using the S3 paths or other configured output locations for your specific job.
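If your job writes to S3 as in the example above, one way to inspect the results is with the AWS CLI once the job finishes. A successful MapReduce job writes one `part-r-*` file per reducer plus an empty `_SUCCESS` marker. The bucket and prefix below are the hypothetical ones from the example; the commands are printed with a leading `echo` so the sketch is safe to paste, and you drop the `echo` to run them against your bucket.

```shell
# Hypothetical output path from the example job above.
OUTPUT=s3://output-bucket/output-dir

# List the result files (expect part-r-* files and a _SUCCESS marker).
echo aws s3 ls "$OUTPUT/"
# Stream the first reducer's output to stdout for a quick look.
echo aws s3 cp "$OUTPUT/part-r-00000" -
```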