How to use Pandas API on Spark (Glue) - ModuleNotFoundError: No module named 'pyspark_pandas'


On AWS Glue 2.0, I would like to try out the Pandas API on Spark. I am following this tutorial: https://spark.apache.org/docs/3.2.1/api/python/getting_started/quickstart_ps.html

    import pandas as pd
    import numpy as np
    import pyspark.pandas as ps
    from pyspark.sql import SparkSession

    s = ps.Series([1, 3, 5, np.nan, 6, 8])

I followed the instructions for packaging the library pyspark_pandas (https://pypi.org/project/pyspark-pandas/) as a .zip file, copied it to an S3 bucket, and added its S3 path under AWS Glue Job Details > Python library path. It makes no difference whether the pyspark_pandas directory sits at the root of pyspark_pandas.zip or its files are placed directly at the root of the .zip file; I get the same error either way.

Can someone please explain how I can import pyspark.pandas in a Glue job, or tell me if what I am trying to do makes no sense?

wkl3nk
asked a year ago, 927 views
2 Answers

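The Pandas API on Spark (pyspark.pandas) was introduced in Spark 3.2, so it is not available on AWS Glue 2.0 (Spark 2.4.3) or AWS Glue 3.0 (Spark 3.1.1). Because pyspark.pandas is a subpackage of PySpark itself, adding a separate library to an older Glue version does not make it importable. You'll need to run your job on AWS Glue 4.0, which comes with Spark 3.3.0.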
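If you want a job to fail with a clear message instead of a ModuleNotFoundError, a minimal sketch is to compare the Spark version against 3.2 before importing. The helper supports_pandas_api is hypothetical, and the version string is assumed to start with "major.minor" (any vendor suffix after the patch number is ignored). It reads pyspark.__version__; inside a job, spark.version on the session should report the same Spark release.

```python
# Minimal sketch: fail fast when the Spark runtime predates pyspark.pandas.
# supports_pandas_api is a hypothetical helper; version strings are assumed to
# start with "major.minor" (any vendor suffix after the patch number is ignored).


def supports_pandas_api(spark_version: str) -> bool:
    """Return True for Spark 3.2 or newer, where pyspark.pandas first shipped."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (3, 2)


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("pyspark is not installed in this environment")
    else:
        version = pyspark.__version__
        if supports_pandas_api(version):
            print(f"Spark {version}: import pyspark.pandas should work")
        else:
            print(f"Spark {version}: pyspark.pandas needs Spark 3.2+ (AWS Glue 4.0)")
```

Comparing (major, minor) tuples avoids the string-comparison trap where "3.10" would sort before "3.2".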

Steps:

  1. Create a new job using the Spark script editor. Note that AWS Glue interactive sessions were not yet available for AWS Glue 4.0 when this answer was written.
  2. Select "Create a new script with boilerplate code" (default)
  3. After the job.init() line, add your code. For example:

    import pyspark.pandas as ps
    import numpy as np

    # On AWS Glue 4.0 (Spark 3.3.0) this creates a pandas-on-Spark Series
    s = ps.Series([1, 3, 5, np.nan, 6, 8])

AWS
answered a year ago

Take a look at this page, which lists the Python modules already provided in AWS Glue 2.0: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#glue20-modules-provided

If you package a custom .py file in a .zip and upload it as a Python library for your Glue job, make sure the path is correct.
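To check the .zip layout locally before uploading, here is a minimal sketch, assuming a pure-Python package (the name my_helpers is hypothetical). It puts the package folder at the root of the archive and confirms that Python can import it with only the .zip on sys.path, which is how Python's zipimport resolves archives; it is a local stand-in, assuming the job places the .zip on its Python path. This only applies to your own modules: it cannot add pyspark.pandas to an older Spark runtime.

```python
# Minimal sketch: package a pure-Python library so the package folder sits at
# the root of the .zip, then check locally that it imports from the archive.
# "my_helpers" is a hypothetical package name used only for illustration.
from __future__ import annotations

import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_library_zip(package_dir: Path, zip_path: Path) -> list[str]:
    """Write every .py file under package_dir into zip_path, keeping the
    package folder name as the top-level entry (e.g. my_helpers/__init__.py)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(package_dir.rglob("*.py")):
            # Archive names are relative to the package's parent directory.
            zf.write(file, file.relative_to(package_dir.parent).as_posix())
        return zf.namelist()


def demo() -> str:
    """Build a throwaway package, zip it, and import it from the .zip."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "my_helpers"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("GREETING = 'hello from the zip'\n")

        zip_path = root / "my_helpers.zip"
        print(build_library_zip(pkg, zip_path))  # ['my_helpers/__init__.py']

        # Mimic a library path locally: put the .zip itself on sys.path.
        entry = str(zip_path)
        sys.path.insert(0, entry)
        try:
            return importlib.import_module("my_helpers").GREETING
        finally:
            # Undo the changes so repeated runs start from a clean state.
            sys.path.remove(entry)
            sys.path_importer_cache.pop(entry, None)
            sys.modules.pop("my_helpers", None)


if __name__ == "__main__":
    print(demo())
```

If the archive instead lists __init__.py at its root (no my_helpers/ prefix), import my_helpers fails, which is the usual symptom of a wrong path inside the .zip.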

answered a year ago
