I want to troubleshoot Amazon EMR Workspace connectivity issues.
Short description
On the Amazon EMR console, EMR Notebooks have been merged into Amazon EMR Studio Workspaces. To use EMR Studio, you must configure an additional AWS Identity and Access Management (IAM) role and permissions. You can also create and configure different Workspaces to run your notebooks.
The following are errors that you might receive when you try to connect an Amazon EMR notebook to an Amazon EMR cluster:
- "Unable to attach to cluster <example-cluster> Reason: Attaching the workspace (notebook) failed. Internal error."
- "Notebook is not supported in the chosen Availability Zone. Please try using a cluster in another availability zone."
- "Attaching the workspace (notebook) failed. Invalid configuration."
- "Workspace (notebook) is stopped. Cluster <example-cluster> does not have JupyterEnterpriseGateway application installed. Please retry with another cluster."
- "Workspace errors: Not able to attached EMR notebook to running cluster. Error starting kernel. HTTP 403: Forbidden (Workspace is not attached to cluster. Click 'Ok' to continue)."
Resolution
Before you begin to troubleshoot your connectivity issues, take the following actions:
Note: For a list of known issues and solutions for Amazon EMR Studio and Workspaces, see Known issues.
Unable to attach to cluster example-cluster Reason: Attaching the workspace (notebook) failed. Internal error
This error occurs when the livy.impersonation.enabled variable is set to true on an Amazon EMR cluster. By default, this variable is set to true on Amazon EMR release 6.4.0. Also, the hadoop-httpfs service doesn't run by default on Amazon EMR 6.4.0. Without HttpFS, the Amazon EMR notebook can't connect to clusters that have Livy user impersonation turned on.
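To confirm whether a cluster is affected, you can check the Livy setting on the primary node. The following is a minimal sketch, not an AWS-provided tool. It assumes that the Livy configuration file is at /etc/livy/conf/livy.conf (an assumption; adjust the path for your release), and the function name livy_impersonation_enabled is hypothetical. The function succeeds only when the last uncommented livy.impersonation.enabled line is set to true; it treats a missing file or key as "off", which might not match your release's defaults.

```shell
#!/usr/bin/env bash
# Sketch: report whether Livy user impersonation is turned on in a Livy config file.
# Assumption: the default path /etc/livy/conf/livy.conf; pass another path as $1 if needed.
# Accepts "key value", "key=value", and "key = value" lines; ignores commented lines.
livy_impersonation_enabled() {
  local conf="${1:-/etc/livy/conf/livy.conf}"
  local value
  # Keep the last matching line, strip the key and separator, trim, and lowercase the value.
  value=$(grep -E '^[[:space:]]*livy\.impersonation\.enabled[[:space:]=]' "$conf" 2>/dev/null \
    | tail -n 1 \
    | sed -E 's/^[[:space:]]*livy\.impersonation\.enabled[[:space:]=]+//; s/[[:space:]]+$//' \
    | tr '[:upper:]' '[:lower:]')
  # Exit status 0 only when the value is exactly "true".
  [ "$value" = "true" ]
}
```

For example, on the primary node: `if livy_impersonation_enabled; then echo "Livy impersonation is on"; else echo "Livy impersonation is off or not set"; fi`.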
To resolve this issue, use an Amazon EMR release earlier or later than 6.4.0 that runs the hadoop-httpfs service. Or, start the hadoop-httpfs service on the cluster.
To start the hadoop-httpfs service, complete the following steps:
- Use SSH to connect to the Amazon EMR primary node.
- Start the hadoop-httpfs service:
   sudo systemctl start hadoop-httpfs
- (Optional) Instead of using SSH, use an Amazon EMR step to start the hadoop-httpfs service:
   ==========
   JAR location: command-runner.jar
   Main class: None
   Arguments: bash -c "sudo systemctl start hadoop-httpfs"
   Action on failure: Continue
   ==========
- Check the status of HttpFS:
   sudo systemctl status hadoop-httpfs
   Example output:
   hadoop-httpfs.service - Hadoop httpfs
   Loaded: loaded (/etc/systemd/system/hadoop-httpfs.service; disabled; vendor preset: disabled)
   Active: active (running)...
- Reattach the Workspace to the Amazon EMR cluster.
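The start and status-check steps above can be wrapped in a small script so that the service is started only when it isn't already active. This is a minimal sketch under stated assumptions: the function names ensure_httpfs and httpfs_step_json and the SYSTEMCTL override are hypothetical, and the script relies on `systemctl is-active --quiet`, which exits 0 only when the unit is active. The second function prints the console step settings as JSON for the AWS CLI `aws emr add-steps --steps` option.

```shell
#!/usr/bin/env bash
# Sketch: start hadoop-httpfs on the primary node if it isn't running, then confirm it's active.
# SYSTEMCTL is a hypothetical override for dry runs or tests; it defaults to "sudo systemctl".
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

ensure_httpfs() {
  # "is-active --quiet" prints nothing and exits 0 only when the unit is active.
  if $SYSTEMCTL is-active --quiet hadoop-httpfs; then
    echo "hadoop-httpfs is already active"
    return 0
  fi
  echo "Starting hadoop-httpfs"
  $SYSTEMCTL start hadoop-httpfs || return 1
  # Re-check after the start request.
  if $SYSTEMCTL is-active --quiet hadoop-httpfs; then
    echo "hadoop-httpfs is active"
  else
    echo "hadoop-httpfs did not start; run: sudo systemctl status hadoop-httpfs" >&2
    return 1
  fi
}

# Prints the EMR step (command-runner.jar, bash -c ..., Continue on failure) as JSON.
httpfs_step_json() {
  cat <<'EOF'
[{"Type":"CUSTOM_JAR","Name":"Start hadoop-httpfs","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Args":["bash","-c","sudo systemctl start hadoop-httpfs"]}]
EOF
}
```

For example, run `ensure_httpfs` on the primary node, or submit the step from your workstation with `aws emr add-steps --cluster-id j-EXAMPLE --steps "$(httpfs_step_json)"`, where j-EXAMPLE is a placeholder for your cluster ID. You can also save the JSON to a file and pass `--steps file://step.json`.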
Workspace errors
The following are common Workspace errors that occur when you try to attach an EMR notebook to your Amazon EMR cluster:
- Not able to attach EMR notebook to a running cluster.
- Error starting kernel.
- HTTP 403: Forbidden (Workspace is not attached to cluster. Choose Ok to continue).
These errors occur when you use the AWS account root user to attach Amazon EMR notebooks to Amazon EMR clusters. Jupyter Enterprise Gateway treats the root user as unauthorized to start kernels: if the value of KERNEL_USERNAME appears in the unauthorized_users list, then the request to start the kernel fails. For more information, see Security features on the Jupyter Enterprise Gateway website.
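To make the rule above concrete, here is a simplified shell restatement of the check. It isn't Jupyter Enterprise Gateway's implementation; the function name kernel_user_allowed is hypothetical, and it assumes that an empty list falls back to root, as the text above describes. In the gateway itself, the list comes from its unauthorized_users setting, and a rejected request surfaces as the HTTP 403 error.

```shell
#!/usr/bin/env bash
# Simplified sketch of the unauthorized_users check; not the gateway's actual code.
# Usage: kernel_user_allowed KERNEL_USERNAME [UNAUTHORIZED_USER ...]
# Returns 0 when the user may start a kernel, 1 when the request would be rejected.
kernel_user_allowed() {
  local user="$1"
  shift
  local denied
  # Assumption: with no explicit list, fall back to "root".
  if [ "$#" -eq 0 ]; then
    set -- root
  fi
  for denied in "$@"; do
    if [ "$user" = "$denied" ]; then
      echo "403 Forbidden: user '$user' is not authorized to start kernels" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `kernel_user_allowed root` fails, while `kernel_user_allowed notebook-user` succeeds.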
To resolve these errors, create an IAM user, sign in as that user instead of the root user, and then attach the cluster to the notebook.
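Before you attach the cluster, you can confirm that you aren't working with root credentials. The sketch below assumes that the AWS CLI is installed and configured, and the function names is_root_arn and check_not_root are hypothetical. It uses `aws sts get-caller-identity`, which returns an ARN of the form arn:aws:iam::<account-id>:root for the account root user. It checks only the CLI's credentials, which can differ from the identity that you use to sign in to the console.

```shell
#!/usr/bin/env bash
# Sketch: refuse to continue when the current AWS credentials belong to the account root user.
# Root user ARNs look like arn:<partition>:iam::<account-id>:root.
is_root_arn() {
  case "$1" in
    arn:*:iam::*:root) return 0 ;;
    *) return 1 ;;
  esac
}

check_not_root() {
  local arn
  # Requires a configured AWS CLI; fails if the call fails.
  arn=$(aws sts get-caller-identity --query Arn --output text) || return 1
  if is_root_arn "$arn"; then
    echo "Signed in as the root user ($arn); use an IAM user before you attach the cluster." >&2
    return 1
  fi
  echo "Caller: $arn"
}
```

For example, run `check_not_root` before you attach the cluster; an IAM user ARN such as arn:aws:iam::123456789012:user/notebook-user passes the check.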