1 Answer
You can use the Spark section of this EMR best practices guide. Feel free to share it here or create a specReq if the customer has any specific questions. Here are a few basic things to keep in mind:
- Handle data skew
- Make sure no disk spill is happening
- Choose a partition size that keeps the number of tasks from growing too large
- Use the right data format for source and target (preferably Parquet)
- Watch for excessive shuffle; you can confirm it in the Spark UI
- Tune driver/executor size (memory, cores) based on the workload
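
As a rough illustration of the partition-size and executor-sizing points above, here is a minimal sketch in plain Python that derives a shuffle partition count from the input size and assembles a set of Spark properties. The helper names (`suggest_shuffle_partitions`, `build_spark_conf`), the 128 MiB target partition size, and the example cluster numbers are assumptions for illustration, not values from the guide; the property keys themselves are standard Spark settings. Treat the output as a starting point to validate against the Spark UI, not a recommendation.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def suggest_shuffle_partitions(input_bytes, target_partition_bytes=128 * MIB,
                               total_cores=1):
    """Return a partition count that yields roughly target-sized partitions.

    The 128 MiB default is an assumed rule of thumb, not a guide value.
    """
    if input_bytes <= 0:
        return total_cores
    by_size = math.ceil(input_bytes / target_partition_bytes)
    # Do not go below the core count, otherwise some cores would sit idle.
    return max(by_size, total_cores)


def build_spark_conf(input_bytes, executors, cores_per_executor,
                     executor_memory_gib, driver_memory_gib=4):
    """Assemble Spark properties for a hypothetical workload (illustrative)."""
    partitions = suggest_shuffle_partitions(
        input_bytes, total_cores=executors * cores_per_executor
    )
    return {
        # Adaptive query execution can split skewed join partitions and
        # coalesce small shuffle partitions at runtime.
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        "spark.sql.shuffle.partitions": str(partitions),
        # Driver/executor sizing; the numbers are workload-specific.
        "spark.executor.instances": str(executors),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{executor_memory_gib}g",
        "spark.driver.memory": f"{driver_memory_gib}g",
    }


if __name__ == "__main__":
    # Hypothetical job: 100 GiB of input, 10 executors with 4 cores and 16 GiB each.
    conf = build_spark_conf(100 * GIB, executors=10, cores_per_executor=4,
                            executor_memory_gib=16)
    for key, value in sorted(conf.items()):
        print(f"--conf {key}={value}")
```

The printed pairs can be passed to `spark-submit` as `--conf` arguments or set through `SparkSession.builder.config(...)`. Whether they actually reduce spill and shuffle should be checked in the Spark UI after a run.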
answered a year ago