Loading geopandas library in Glue job


I am trying to load the geopandas library in an AWS Glue job. I have tried the approaches below.

  1. Building a geopandas Python wheel and adding it under Job details --> Libraries --> Python library path --> ${S3_Path_to_geopandas_wheel}
  2. Creating a zip of the geopandas library and loading it in the script as shown below
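import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Resolve the arguments that Glue passes to the script
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

#glueContext.add_file('s3://python-glue-lib/geopandas.zip')

# Specify the path to the ZIP archive containing geopandas
geopandas_path = 's3://python-glue-lib/geopandas.zip'

# Add the custom library to the Python path
sc.addPyFile(geopandas_path)

# Log sys.path to see where Python is looking for modules
logger = glueContext.get_logger()
logger.warn(str(sys.path))

# Attempt the import; this is the line that fails
import geopandas as gpd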
But when I run the job, I still get a "No Module Found" error for geopandas.
Ajit
asked 4 months ago, 142 views
2 Answers

You don't need sc.addPyFile(geopandas_path), and I doubt it will work.
The best way to install it is to add the job parameter --additional-python-modules=geopandas (or point it at the wheel if you don't want to install from PyPI). That way, any required dependencies are installed as well. I just tested it, and it installs correctly this way.
The Python library path box is intended for your own modules. It should work if you package the zip correctly AND the library doesn't require any native bindings.
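To illustrate the job parameter described above, here is a minimal sketch that builds the key/value pair for one or more modules. The helper name `build_glue_arguments` is hypothetical and used only for illustration; the parameter itself takes a comma-separated list of pip-style specifiers.

```python
def build_glue_arguments(modules):
    """Build a Glue job-parameter dict requesting extra Python modules.

    `modules` is a list of pip-style specifiers such as "geopandas".
    This helper is hypothetical, not part of any AWS library.
    """
    # Drop empty entries and surrounding whitespace
    cleaned = [m.strip() for m in modules if m.strip()]
    if not cleaned:
        raise ValueError("at least one module is required")
    # Glue expects a single comma-separated string as the value
    return {"--additional-python-modules": ",".join(cleaned)}


if __name__ == "__main__":
    print(build_glue_arguments(["geopandas"]))
```

Assuming boto3 is used to manage the job, a dict like this could be passed as `DefaultArguments` to `create_job` or as `Arguments` to `start_job_run`. In the console, the same key and value can be entered as a job parameter.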

AWS
Expert
answered 4 months ago

I tested sc.addPyFile(geopandas_path), and it indeed does not work. The best way is to use the --additional-python-modules job parameter, as Gonzalo described.
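As a quick check that the module is actually available inside the job, something like the following sketch could run at the top of the script before the real import. The function name `module_available` is hypothetical; it relies only on the standard library.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    # find_spec returns None when no importable module has this name
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # In a Glue job, this result could be sent to the job logger instead
    print("geopandas available:", module_available("geopandas"))
```

If this reports False, the job parameter was probably not applied to the run, which is easier to diagnose than a failed import deeper in the script.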

AWS
Support Engineer
Chaitu
answered 4 months ago
