Tab completion is not working for PySpark kernels in EMR Studio

When I select a PySpark kernel for a notebook in EMR Studio, tab completion and tooltips (Shift-Tab) do not work as expected. This is especially noticeable for attribute listing after a dot (.). For example, typing the following in a cell

import os
os.

and pressing Tab does not offer any suggestions. The same goes for the Spark session object, both the automatically created one and one created manually with a SparkSession.builder.appName() call. Everything works fine with a Python3 kernel. I have tried EMR cluster versions emr-6.6.0 and emr-6.8.0.

nikos64
asked 2 years ago, 91 views
1 Answer
Hello

I can fully appreciate how useful tab completion is for writing Python code in Jupyter, and that having the same functionality for PySpark would save a lot of time spent looking at reference material. As you may know, PySpark and Jupyter are open-source projects, so the features available are limited by what their respective communities develop. On the AWS side, EMR Notebooks uses SparkMagic to run Spark jobs on the EMR cluster, and SparkMagic in turn uses Apache Livy to communicate with the cluster. Apache Livy today does not support any API for IntelliSense-style autocompletion, so this functionality is not possible with EMR Notebooks using SparkMagic kernels. For more information, refer to the link for this issue [1]. I saw there is an internal ticket with the service team requesting this feature, but there is no ETA for its release. You can keep an eye out for the announcement here [2] [3].
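To make the path described above more concrete, here is a minimal sketch of how a notebook cell might be packaged for Livy. It assumes Livy's REST statement endpoint (`POST /sessions/{id}/statements` with a JSON body containing `code` and `kind`). The helper name `build_livy_statement`, the host name, the port, and the session id are hypothetical placeholders, and nothing is actually sent over the network. The point is only that the cell travels to the cluster as an opaque string, so the local kernel holds no live `os` or `spark` object to introspect when you press Tab.

```python
import json


def build_livy_statement(livy_url, session_id, code, kind="pyspark"):
    """Return the (url, body) pair for submitting `code` as a Livy statement.

    Hypothetical helper for illustration; it only builds the request and
    does not contact any server.
    """
    url = f"{livy_url.rstrip('/')}/sessions/{session_id}/statements"
    # The cell source is shipped as plain text inside a JSON document.
    body = json.dumps({"code": code, "kind": kind})
    return url, body


# Placeholder endpoint and session id (assumptions, not real values).
url, body = build_livy_statement(
    "http://livy.example.internal:8998", 0, "import os\nos.getcwd()"
)
print(url)
print(body)
```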

As a workaround, EMR supports on-cluster mode [4]. You can enable on-cluster execution mode on the notebook, which allows you to install Spark-native kernels (such as Apache Toree) on the EMR cluster. With these native kernels, autocompletion works.

Another option, which I found in a useful third-party blog post, is to install your own Jupyter on the EMR cluster. You can refer to it here [5] for more information. There is also some discussion here [6] about getting autocomplete in a Jupyter notebook without pressing Tab, using Hinterland or TabNine. As a general recommendation, since these are third-party tools, please test them thoroughly before deploying them to production.
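Until one of these options is in place, a stopgap that needs no extra installation is to list attribute names programmatically from a cell, since Python's built-in `dir()` and `help()` run wherever the cell code runs. The sketch below is only an illustration. The helper name `attrs` is hypothetical, and `os` stands in for any object, such as the `spark` session in a PySpark kernel.

```python
import os


def attrs(obj, prefix=""):
    """List attribute names of `obj` that start with `prefix`.

    Names beginning with an underscore are hidden unless the prefix
    itself starts with an underscore.
    """
    names = dir(obj)
    if not prefix.startswith("_"):
        names = [n for n in names if not n.startswith("_")]
    return sorted(n for n in names if n.startswith(prefix))


# Rough equivalent of typing "os.get" and pressing Tab.
print(attrs(os, "get"))
```

For the tooltip side (Shift-Tab), calling `help(os.getcwd)` in a cell prints the docstring of the object in question.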

I hope the above information helps.

AWS
answered 13 days ago
