How to use Pandas API on Spark (Glue) - ModuleNotFoundError: No module named 'pyspark_pandas'

0

On AWS Glue 2.0, I would like to try out the Pandas API on Spark. I am following this tutorial: https://spark.apache.org/docs/3.2.1/api/python/getting_started/quickstart_ps.html

    import pandas as pd
    import numpy as np
    import pyspark.pandas as ps
    from pyspark.sql import SparkSession

    s = ps.Series([1, 3, 5, np.nan, 6, 8])

I followed the instructions for packaging the pyspark_pandas library (https://pypi.org/project/pyspark-pandas/) in a .zip file, copied it to an S3 bucket, and added its location under "Python library path" in the AWS Glue Job details. It makes no difference whether the pyspark_pandas directory sits in the root of pyspark_pandas.zip or the files themselves are in the root of the .zip file: the import still fails.

Can someone please explain how I can import pyspark.pandas in a Glue job, or tell me if what I am trying to do makes no sense?

wkl3nk
asked a year ago, 927 views
2 Answers
2

Pandas API on Spark was introduced in Spark 3.2. AWS Glue 2.0 runs Spark 2.4, so pyspark.pandas is not available there. You'll need to run your job on AWS Glue 4.0, which comes with Spark 3.3.0.

Steps:

  1. Create a new job using the Spark script editor. Note that AWS Glue interactive sessions are not yet available for AWS Glue 4.0.
  2. Select "Create a new script with boilerplate code" (default)
  3. After the line job.init(), write your code. For example:

    import pyspark.pandas as ps
    import numpy as np

    s = ps.Series([1, 3, 5, np.nan, 6, 8])
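
If you are not sure which Spark version a job is running on, a small guard like the one below can make the failure easier to read than a bare ModuleNotFoundError. This is only a sketch: the helper name spark_has_pandas_api is made up for illustration, and the sample version strings are the Spark versions shipped with Glue 2.0, 3.0 and 4.0.

```python
import re


def spark_has_pandas_api(version: str) -> bool:
    """Return True if a Spark version string is 3.2 or newer.

    Hypothetical helper: pyspark.pandas ships with Spark 3.2+, so an
    older version cannot import it regardless of extra libraries.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (3, 2)


# Spark versions shipped with Glue 2.0, 3.0 and 4.0 respectively.
for spark_version in ("2.4.3", "3.1.1", "3.3.0"):
    print(spark_version, spark_has_pandas_api(spark_version))
```

Inside a Glue job you would pass spark.version (the version string of the job's SparkSession) to the helper before attempting the import.
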
AWS
answered a year ago
0

Take a look at this page: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#glue20-modules-provided

If you have a custom .py file that you put into a zip and upload to the Glue job's script location, make sure the path inside the zip is correct.
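
As a rough illustration of what "the right path" means for a pure-Python module, the sketch below builds a zip whose root contains the package directory and then imports from it locally. The names my_helpers and build_library_zip are hypothetical, and this only covers your own custom code: pyspark.pandas is part of Spark itself, so zipping a library will not add it to an older Glue version.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_library_zip(zip_path: Path, package: str, source: str) -> Path:
    """Write a zip whose root contains `<package>/__init__.py`."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        # The package directory must sit at the root of the archive,
        # because the zip file itself is what gets placed on sys.path.
        zf.writestr(f"{package}/__init__.py", source)
    return zip_path


# Hypothetical package with a single function, written to a temp dir.
workdir = Path(tempfile.mkdtemp())
zip_path = build_library_zip(
    workdir / "my_helpers.zip",
    "my_helpers",
    "def greet():\n    return 'hello from the zip'\n",
)

# Local check: put the zip on sys.path and import the package from it.
sys.path.insert(0, str(zip_path))
my_helpers = importlib.import_module("my_helpers")
print(my_helpers.greet())
```

If the import works locally with the zip on sys.path, the same archive layout should be what the Python library path setting expects.
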

answered a year ago
