Glue job cannot download the Hudi connector: 403 Forbidden (IAM role has full access to EC2ContainerRegistry and Marketplace)

I followed this blog post to try the Hudi connector: Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer.

But whenever I start the Glue job, I get this error log:

2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Preparing layer url and gz file path to store layer 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c.
2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Getting the layer file 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c and store it as gz.
Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 361, in <module>
    main()
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 351, in main
    res += download_jars_per_connection(conn, region, endpoint, proxy)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 304, in download_jars_per_connection
    download_and_unpack_docker_layer(ecr_url, layer["digest"], dir_prefix, http_header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 168, in download_and_unpack_docker_layer
    layer = send_get_request(layer_url, header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 80, in send_get_request
    response.raise_for_status()
  File "/home/spark/.local/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/6a636e-709825985650-a6bdf6d5-eba8-e643-536c-26147c8be5f0/84e9f346-bf80-4532-ac33-b00f5dbfa546?X-Amz-Security-Token=....Ks4HlEAQcC0PUIFipDGrNhcEAVTZQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230328T123933Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=%2F20230328%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=c28f35ab3b3c
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector.Please refer logs for details.
Exception in thread "main" 
java.lang.Exception: Glue ETL Marketplace - failed to download connector.
	at com.amazonaws.services.glue.PrepareLaunch.downloadConnectorJar(PrepareLaunch.scala:1043)
	at com.amazonaws.services.glue.PrepareLaunch.com$amazonaws$services$glue$PrepareLaunch$$prepareCmd(PrepareLaunch.scala:759)
	at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:42)
	at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)

My guess is that the root cause is one of the following:

  1. The Glue job cannot pull the connector image from AWS Marketplace.
  2. The connector image cannot be stored in the S3 bucket.

So I tried the following:

  1. Granted more permissions to the job's IAM role: AWSMarketplaceFullAccess, AmazonEC2ContainerRegistryFullAccess, and AmazonS3FullAccess. These should definitely be enough.
  2. Made the S3 bucket public by turning off Block public access on the related S3 bucket.

Even after doing both, I still get the same error. Does anyone have any suggestions?

donglai
asked a year ago, 376 views
1 Answer
Accepted Answer

If you are using Glue 3.0 or later, the best way to add Hudi support nowadays is to set the job parameter --datalake-formats to hudi, so the job no longer depends on the Marketplace connector:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html
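As a rough illustration of the suggestion above, here is a minimal sketch of how the job's default arguments could be assembled. The helper name with_native_hudi is hypothetical (it is not part of any AWS SDK), and the --conf value is the Spark configuration listed on the linked documentation page. The resulting dictionary is meant to be passed as DefaultArguments when creating or updating the job, for example through boto3. The Marketplace connection would also need to be removed from the job, which this sketch does not do.

```python
from typing import Dict, Optional

# Spark settings listed in the AWS Glue Hudi documentation for native Hudi support.
HUDI_SPARK_CONF = (
    "spark.serializer=org.apache.spark.serializer.KryoSerializer"
    " --conf spark.sql.hive.convertMetastoreParquet=false"
)


def with_native_hudi(default_arguments: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the job arguments with native Hudi support enabled.

    Hypothetical helper for illustration only; not part of boto3 or AWS Glue.
    """
    args = dict(default_arguments or {})
    # Ask Glue (3.0 or later) to load its bundled Hudi libraries.
    args["--datalake-formats"] = "hudi"
    # Keep an existing --conf value if the job already defines one;
    # in that case the Hudi settings have to be merged into it by hand.
    args.setdefault("--conf", HUDI_SPARK_CONF)
    return args


if __name__ == "__main__":
    # The existing argument here is only an example.
    for key, value in with_native_hudi({"--job-language": "python"}).items():
        print(f"{key} = {value}")
```

The same two keys can also be entered by hand under the job's parameters in the Glue console, which is probably the simplest route for a single job.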

AWS
EXPERT
answered a year ago
  • Thank you for your prompt reply! This problem has been successfully resolved!
