Glue job cannot download the Hudi connector: 403 Forbidden (IAM role has full access to EC2ContainerRegistry and Marketplace)

I am trying to use the Hudi connector by following this blog post: Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer

However, every time I start the Glue job, I get the following error log:

2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Preparing layer url and gz file path to store layer 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c.
2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Getting the layer file 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c and store it as gz.
Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 361, in <module>
    main()
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 351, in main
    res += download_jars_per_connection(conn, region, endpoint, proxy)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 304, in download_jars_per_connection
    download_and_unpack_docker_layer(ecr_url, layer["digest"], dir_prefix, http_header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 168, in download_and_unpack_docker_layer
    layer = send_get_request(layer_url, header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 80, in send_get_request
    response.raise_for_status()
  File "/home/spark/.local/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/6a636e-709825985650-a6bdf6d5-eba8-e643-536c-26147c8be5f0/84e9f346-bf80-4532-ac33-b00f5dbfa546?X-Amz-Security-Token=....Ks4HlEAQcC0PUIFipDGrNhcEAVTZQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230328T123933Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=%2F20230328%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=c28f35ab3b3c
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector.Please refer logs for details.
Exception in thread "main" 
java.lang.Exception: Glue ETL Marketplace - failed to download connector.
	at com.amazonaws.services.glue.PrepareLaunch.downloadConnectorJar(PrepareLaunch.scala:1043)
	at com.amazonaws.services.glue.PrepareLaunch.com$amazonaws$services$glue$PrepareLaunch$$prepareCmd(PrepareLaunch.scala:759)
	at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:42)
	at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)

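My guess at the root cause is one of the following:

  1. The Glue job cannot pull the connector image from Marketplace.
  2. The connector image cannot be stored in S3.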

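So I tried the following:

  1. Granted permissions to the Glue job's execution role. I attached the AWSMarketplaceFullAccess, AmazonEC2ContainerRegistryFullAccess, and AmazonS3FullAccess policies, which I believe should be more than sufficient.
  2. Made the S3 bucket publicly accessible. I turned off the bucket's Block public access setting.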

Even after these changes, I still get the same error. Can anyone give me some advice?
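
For anyone reading the log above, a minimal sketch of how the failing URL can be taken apart to see which bucket returned the 403 and how long the signature was valid. The function name `describe_presigned_url` and the object key `example-key` are made up for illustration; the host and the `X-Amz-Date` / `X-Amz-Expires` values are copied from the log.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def describe_presigned_url(url):
    """Return the bucket host and signing window of an S3 presigned URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # X-Amz-Date is the UTC signing time, X-Amz-Expires the lifetime in seconds.
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))

    return {
        "host": parts.hostname,
        # Virtual-hosted style: <bucket>.s3.<region>.amazonaws.com
        "bucket": parts.hostname.split(".s3.")[0],
        "signed_at": signed_at,
        "expires_at": signed_at + lifetime,
    }


# Hypothetical shortened URL: real host and timestamps, placeholder key.
sample = (
    "https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/"
    "example-key?X-Amz-Date=20230328T123933Z&X-Amz-Expires=3600"
)
info = describe_presigned_url(sample)
print(info["bucket"], info["expires_at"].isoformat())
```

The host matches the `prod-<region>-starport-layer-bucket` pattern that Amazon ECR documents for its image layer storage, so the 403 comes from an AWS-managed bucket rather than a bucket in the job's own account. That would explain why opening up your own bucket has no effect on this error.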

1 Answer

If you are using Glue 3.0 or later, the best-supported approach now is to simply add the job parameter --datalake-formats=hudi instead of relying on the Marketplace connector. See: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html
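
As a rough sketch of what that could look like when the job is defined with boto3: the parameter is passed as the key `--datalake-formats` with the value `hudi` in the job's default arguments. The helper names, the job name, role ARN, script location, and the choice of Glue version 4.0 below are all assumptions for illustration. The `--conf` value follows the Spark settings given on the linked documentation page, so check it against that page for your Glue version.

```python
def hudi_default_arguments(extra=None):
    """Build Glue job arguments that enable the built-in Hudi support."""
    args = {
        # Key/value form of --datalake-formats=hudi
        "--datalake-formats": "hudi",
        # Spark settings as given in the linked Glue Hudi documentation
        "--conf": (
            "spark.serializer=org.apache.spark.serializer.KryoSerializer "
            "--conf spark.sql.hive.convertMetastoreParquet=false"
        ),
    }
    args.update(extra or {})
    return args


def create_hudi_job(name, role_arn, script_location, glue_version="4.0"):
    """Create a Glue job with no Marketplace connection attached.

    All argument values are supplied by the caller; nothing here is
    specific to the job from the question.
    """
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(
        Name=name,
        Role=role_arn,
        GlueVersion=glue_version,
        Command={
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        DefaultArguments=hudi_default_arguments(),
    )


print(hudi_default_arguments()["--datalake-formats"])
```

Because no Marketplace connection is attached to the job in this setup, the connector download step that fails in the log above should not be needed at all.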
