Hello,
I finally got most things working with my EMR Serverless application, but according to the logs there seems to be a problem with this specific line of code:
if (DeltaTable.isDeltaTable(spark, targetDeltaTableURI)):
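For context, the relevant part of processRawData looks roughly like this (a simplified sketch, not the exact code; the source format and the way targetDeltaTableURI is built are illustrative):

from delta.tables import DeltaTable

def processRawData(spark, s3BucketName, objectName, schema, domain):
    # The raw object is read into a DataFrame first; this part succeeds
    # (its show() output appears at the top of the log below).
    rawDF = spark.read.schema(schema).csv(f"s3://{s3BucketName}/{objectName}")  # source format illustrative
    rawDF.show()

    # Illustrative path; the real targetDeltaTableURI is built from the bucket and domain.
    targetDeltaTableURI = f"s3://{s3BucketName}/delta/{domain}"

    # This is the line that raises the TypeError.
    if DeltaTable.isDeltaTable(spark, targetDeltaTableURI):
        deltaTable = DeltaTable.forPath(spark, targetDeltaTableURI)  # upsert into the existing table (details omitted)
    else:
        rawDF.write.format("delta").save(targetDeltaTableURI)  # create the table on first run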
The error message from the log is shown below. At the top of the log you can see that the DataFrame is created successfully, but the error occurs when checking whether the Delta table already exists.
|135123 |1670038 |13558561-12 |2018-03-19 |36.01 |2023-03-22 16:51:05.35525|
|668217 |1361372 |13563506-22 |2018-03-22 |27.19 |2023-03-22 16:51:05.35525|
+---------+-----------+------------+-------------------+---------+-------------------------+
only showing top 20 rows
Traceback (most recent call last):
File "/tmp/spark-9a694201-d321-475f-9855-7d4ada8da0e5/main.py", line 337, in <module>
main()
File "/tmp/spark-9a694201-d321-475f-9855-7d4ada8da0e5/main.py", line 135, in main
processRawData(spark, s3BucketName, objectName, schema, 'bmi')
File "/tmp/spark-9a694201-d321-475f-9855-7d4ada8da0e5/main.py", line 317, in processRawData
if (DeltaTable.isDeltaTable(spark, targetDeltaTableURI)):
File "/home/hadoop/environment/lib64/python3.7/site-packages/delta/tables.py", line 562, in isDeltaTable
return jvm.io.delta.tables.DeltaTable.isDeltaTable(jsparkSession, identifier)
TypeError: 'JavaPackage' object is not callable
Here is my submit command:
aws emr-serverless start-job-run --profile 000000_SD \
--name nw_raw_data1 \
--application-id $APPLICATION_ID \
--execution-role-arn $JOB_ROLE_ARN \
--job-driver '{
"sparkSubmit": {
"entryPoint": "s3://'${S3_BUCKET}'/scripts/main.py",
"sparkSubmitParameters": "--jars s3://'${S3_BUCKET}'/scripts/pyspark_nw.tar.gz --py-files s3://'${S3_BUCKET}'/scripts/variables.ini --conf spark.driver.cores=1 --conf spark.driver.memory=2g --conf spark.executor.cores=4 --conf spark.executor.memory=4g --conf spark.executor.instances=2 --conf spark.archives=s3://'${S3_BUCKET}'/scripts/pyspark_nw.tar.gz#environment --conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python --conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python --conf spark.emr-serverless.executorEnv.PYSPARK_PYTHON=./environment/bin/python"
}
}' \
--configuration-overrides '{
"monitoringConfiguration": {
"s3MonitoringConfiguration": {
"logUri": "s3://'${S3_BUCKET}'/logs/"
}
}
}'
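For completeness, the SparkSession in main.py is created roughly as follows (a simplified sketch; the two config keys are the standard Delta Lake settings, and if they are not actually taking effect that may be related to the error):

from pyspark.sql import SparkSession

# Simplified sketch of the SparkSession setup in main.py.
spark = (SparkSession.builder
         .appName("nw_raw_data1")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())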
Here are the pip packages used to build the virtual environment (the packaging steps are sketched after the list):
boto3==1.26.74
botocore==1.29.74
delta-spark
fhir.resources==6.5.0
pyspark==3.3.0
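The pyspark_nw.tar.gz archive was built and uploaded roughly like this (commands paraphrased; this assumes the venv-pack approach from the EMR Serverless documentation):

python3 -m venv pyspark_nw
source pyspark_nw/bin/activate
pip install boto3==1.26.74 botocore==1.29.74 delta-spark fhir.resources==6.5.0 pyspark==3.3.0 venv-pack
venv-pack -f -o pyspark_nw.tar.gz
aws s3 cp pyspark_nw.tar.gz s3://${S3_BUCKET}/scripts/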
Thanks for your help.