Assist with build and install of prerequisite software for GeoPandas on Amazon Linux 2023 for Graviton
Arm processors have traditionally been used in mobile and embedded devices such as smartphones and IoT devices because of their power efficiency. In recent years, Arm has been expanding into the enterprise and server markets, which have been dominated by x86 processors from Intel and AMD. The first 64-bit server processor architecture, ARMv8-A, came out in 2012, and since then companies such as AppliedMicro, Cavium, Qualcomm, Amazon, and Huawei have designed server chips based on Arm.
In 2018, Amazon announced its Arm-based Graviton processor for its cloud servers, promising 45% better performance over x86 chips. With successive generations of the chip, performance and efficiency have continued to grow. The latest Graviton 4 chips, introduced in late 2023, continue to build on these numbers. Today, AWS offers more than 150 different Graviton-powered Amazon EC2 instance types globally at scale, has built more than 2 million Graviton processors, and has more than 50,000 customers using Graviton-based instances to achieve the best price performance for their applications. [1] The key challenge for Arm in the enterprise is the software ecosystem, which is still highly optimized for x86 but growing for Arm.
The number of software packages available for Arm continues to grow, but there are still areas where software is not yet packaged and immediately available to install from repositories. Recently, a customer approached us for help installing the Python library GeoPandas on Amazon Linux 2023 for Graviton. On x86, the software is already compiled and packaged, and installs with a single command from the PyPI repository. The same attempt on Graviton failed because prerequisite packages were not available.
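Because the behavior differs between x86 and Graviton, it can help to confirm which architecture a host reports before deciding whether a source build is needed. The snippet below is a minimal sketch and not part of the original procedure; the helper name `is_arm64` is hypothetical, and it assumes `uname -m` reports `aarch64` (or `arm64`) on Arm hosts.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the given machine string names 64-bit Arm.
is_arm64() {
    case "$1" in
        aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the kernel which machine architecture this host reports.
arch="$(uname -m)"
if is_arm64 "$arch"; then
    echo "Arm64 host ($arch): prerequisites may need to be built from source."
else
    echo "Non-Arm host ($arch): prebuilt packages may already be available."
fi
```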
One of our Principal Solutions Architects from the Sustainability and Graviton team jumped in to help. Based on some research and experimentation, he created the following process to compile the required PROJ and GDAL prerequisites so that GeoPandas can be installed and used on Amazon Linux 2023.
#!/bin/bash
# Install the compiler toolchain and the development headers needed to build PROJ and GDAL.
sudo dnf -y install gcc-c++ cpp sqlite-devel libtiff cmake python3-pip \
python-devel python3.11-devel openssl-devel tcl libtiff-devel libcurl-devel \
swig libpng-devel libjpeg-turbo-devel expat-devel

# Download, build, and install PROJ 9.3.1 under /usr.
wget https://download.osgeo.org/proj/proj-9.3.1.tar.gz
tar zxvf proj-9.3.1.tar.gz
cd proj-9.3.1/
mkdir build
cd build
cmake ..
cmake --build . --parallel $(nproc)
sudo cmake --install . --prefix /usr
cd ~

# Download, build, and install GDAL 3.8.3 under /usr, with the optional drivers turned off.
wget https://github.com/OSGeo/gdal/releases/download/v3.8.3/gdal-3.8.3.tar.gz
tar xvzf gdal-3.8.3.tar.gz
cd gdal-3.8.3/
mkdir build
cd build
cmake -DGDAL_BUILD_OPTIONAL_DRIVERS=OFF -DOGR_BUILD_OPTIONAL_DRIVERS=OFF ..
cmake --build . --parallel $(nproc)
sudo cmake --install . --prefix /usr
cd ~

# Install GeoPandas from PyPI now that its native prerequisites are in place.
pip install geopandas
# Point GDAL at its installed data files.
export GDAL_DATA=/usr/share/gdal

# Smoke test: download a sample GeoJSON file and read it with GeoPandas.
wget https://theunitedstates.io/districts/states/NY/shape.geojson
python3 -c "import geopandas as gp;df = gp.read_file('shape.geojson')"
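After the script finishes, one quick way to confirm that both builds landed on the PATH is to look for a few of their command-line tools. The following is a minimal sketch and not part of the original procedure: `report_tool` is a hypothetical helper, and the tool names assume that the default PROJ and GDAL build options install `projinfo`, `gdalinfo`, and `gdal-config`.

```shell
#!/bin/bash
# Hypothetical helper: print whether a command is on PATH; fail if it is not.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the expected tools are absent.
missing=0
for tool in projinfo gdalinfo gdal-config; do
    report_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

If any tool is reported missing, the corresponding `cmake --install` step is the first place to look.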
With this procedure, our customer was able to install GeoPandas and launch EMR clusters based on Graviton instances, saving cost on their data analytics workload.
Reach out if you run into similar issues with packages that are not yet available for Arm. We are interested in helping ensure your workloads can take advantage of all that Graviton has to offer.
Updated on 4/16/2024: for newer EMR AMIs that also have Python 3.11 installed, the DNF install command needs python3.11-devel added for the GDAL compile to complete.