Are you receiving any specific error during installation? Please see the documentation below on installing Python libraries in EMR notebooks:
Unfortunately, the content posted by @Taka_M is outdated.
I had the same question and posted an answer at https://stackoverflow.com/a/77750780/2000548
It also helps automate creating the JupyterLab kernel during EMR provisioning.
Here is a copy:
I found that the JupyterLab Python is separate from the EMR cluster's custom Python version.
I first need to create a new conda Python 3.11 environment for JupyterLab, and then register it as a new kernel.
Because JupyterLab is installed after the bootstrap script runs, I need to add an EMR step with this script:
    #!/usr/bin/env bash
    set -e

    echo "# Install JupyterLab-scoped dependencies"
    PYTHON_VERSION=3.11.7
    sudo /emr/notebook-env/bin/conda create --name="python${PYTHON_VERSION}" python=${PYTHON_VERSION} --yes
    sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m pip install \
      apache-sedona[spark]==1.5.0 \
      attrs==23.1.0 \
      descartes==1.1.0 \
      ipykernel==6.28.0 \
      matplotlib==3.8.2 \
      pandas==2.1.4 \
      shapely==2.0.2

    echo "# Add JupyterLab kernel"
    sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m ipykernel install --name="python${PYTHON_VERSION}"
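One possible way to add this script as an EMR step is through boto3. This is only a minimal sketch and not necessarily how the step was added in the original answer: it assumes the script has already been uploaded to S3 and is launched through the region-specific script-runner.jar. The helper names `build_kernel_step` and `submit_step`, the step name, and the S3 path and cluster ID in the usage note are hypothetical.

```python
def build_kernel_step(script_s3_uri: str, region: str) -> dict:
    """Build an EMR step definition that runs a shell script stored in S3."""
    return {
        "Name": "Install JupyterLab kernel",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the step fails
        "HadoopJarStep": {
            # script-runner.jar downloads the script from S3 and executes it
            "Jar": f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar",
            "Args": [script_s3_uri],
        },
    }


def submit_step(cluster_id: str, script_s3_uri: str, region: str) -> str:
    """Submit the step to a running cluster and return the new step ID."""
    import boto3  # imported here so build_kernel_step works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_kernel_step(script_s3_uri, region)],
    )
    return response["StepIds"][0]
```

For example, `submit_step("j-XXXXXXXXXXXXX", "s3://my-bucket/install-kernel.sh", "us-west-2")` would queue the script on an existing cluster, with placeholder values replaced by your own. The same step definition can also be supplied at cluster creation so that it runs during provisioning.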
Now the new Python 3.11 kernel shows up in JupyterLab:
And it prints the correct Python version:
    import sys

    print(sys.version_info)
    # sys.version_info(major=3, minor=11, micro=7, releaselevel='final', serial=0)
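Beyond the version number, it can also help to confirm that the notebook is really running the interpreter from the new conda environment. The sketch below is an illustration only: `kernel_matches` is a hypothetical helper, and the environment prefix is assumed from the paths used in the script above.

```python
import sys


def kernel_matches(env_prefix, expected_version,
                   executable=sys.executable, version=sys.version_info):
    """Return True if the interpreter lives under env_prefix and has the
    expected (major, minor) version."""
    in_env = executable.startswith(env_prefix.rstrip("/") + "/")
    same_version = tuple(version[:2]) == tuple(expected_version)
    return in_env and same_version


# Assumed prefix, based on the conda environment created by the script
print(kernel_matches("/emr/notebook-env/envs/python3.11.7", (3, 11)))
```

Run in a notebook cell with the new kernel selected, this should print True. If it prints False, the notebook is probably still attached to a different kernel.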
I went through the links above. I am able to install local Python packages using %%local pip install <packagename> in a Jupyter notebook with the PySpark kernel, but I have to repeat this for every notebook session. Can additional local Python packages be installed directly in PySpark kernels using a bootstrap action?