2 Answers
2
Pandas API on Spark was introduced in Spark 3.2. You'll need to execute your job using AWS Glue 4.0, which comes with Spark 3.3.0.
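If you want the job to fail early with a clear message rather than an import error, a small version guard can help. This is only a sketch: the helper name `supports_pandas_on_spark` is made up for illustration, and of the Glue-to-Spark pairings listed, only the Glue 4.0 entry comes from this answer; the others are added from the Glue release notes for comparison.

```python
def supports_pandas_on_spark(spark_version: str) -> bool:
    """Return True if this Spark version ships pyspark.pandas (Spark 3.2+)."""
    # Only major.minor matter; this also tolerates suffixes such as "3.3.0-amzn-1".
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (3, 2)


# Glue version -> bundled Spark version.
# Only the 4.0 entry is stated in the answer; the rest are assumptions
# included for comparison.
GLUE_TO_SPARK = {"2.0": "2.4.3", "3.0": "3.1.1", "4.0": "3.3.0"}

for glue_version, spark_version in GLUE_TO_SPARK.items():
    ok = supports_pandas_on_spark(spark_version)
    print(f"Glue {glue_version} (Spark {spark_version}): pandas API available = {ok}")
```

Inside a running job you could pass `spark.version` to the same helper before importing `pyspark.pandas`.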
Steps:
- Create a new job using the Spark script editor. Note that AWS Glue interactive sessions are not yet available for AWS Glue 4.0.
- Select "Create a new script with boilerplate code" (default)
- After the job.init() line, write your code. For example:
    import pyspark.pandas as ps
    import numpy as np
    s = ps.Series([1, 3, 5, np.nan, 6, 8])
answered a year ago
0
Take a look at this page: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#glue20-modules-provided
If you have a custom .py file that you are putting into a zip and uploading to the Glue job script location, make sure the paths inside the zip are correct.
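As a rough illustration of "the right path", the sketch below zips a package so that its folder sits at the root of the archive, which is the layout Python needs in order to import from a zip. It assumes the zip is then attached to the job through the Python library path (the `--extra-py-files` job parameter); the package name `my_helpers` and the function `build_library_zip` are hypothetical.

```python
import zipfile
from pathlib import Path


def build_library_zip(package_dir: Path, zip_path: Path) -> list:
    """Zip a package so its folder sits at the archive root.

    Returns the archive member names so the layout can be inspected.
    """
    package_dir = package_dir.resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*.py")):
            # Store names relative to the package's parent directory,
            # e.g. "my_helpers/util.py" rather than an absolute build path.
            zf.write(path, path.relative_to(package_dir.parent).as_posix())
        return zf.namelist()
```

With a local folder `my_helpers/` containing `__init__.py` and `util.py`, calling `build_library_zip(Path("my_helpers"), Path("libs.zip"))` should produce entries starting with `my_helpers/`, so that `import my_helpers.util` can resolve once the zip is on the job's Python path.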
answered a year ago