AWS Glue Connection with Secrets Manager Issue

2

I have a Glue Job for migrating data from a Postgres database in RDS to a S3 bucket in Parquet format and a Crawler that connects to Postgres to infer the table schema. Previously, the Glue Connection used by both was configured to authenticate to Postgres via Username and Password, however now I would like it to authenticate via credentials stored in Secrets Manager instead. After updating the Glue Connection to use Secrets Manager, the Glue Job is failing with the following error:

2022-12-14 14:58:07,092 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/parquet-job.py", line 25, in <module>
    database=glue_source_database, table_name=table, transformation_ctx="Datasource")
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 787, in from_catalog
    return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 186, in create_dynamic_frame_from_catalog
    makeOptions(self._sc, additional_options), catalog_id),
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o75.getCatalogSource.
: java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:349)
	at scala.None$.get(Option.scala:347)
	at com.amazonaws.services.glue.util.DataCatalogWrapper.$anonfun$getJDBCConf$1(DataCatalogWrapper.scala:218)
	at scala.util.Try$.apply(Try.scala:209)
	at com.amazonaws.services.glue.util.DataCatalogWrapper.getJDBCConf(DataCatalogWrapper.scala:209)
	at com.amazonaws.services.glue.GlueContext.getGlueNativeJDBCSource(GlueContext.scala:487)
	at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:320)
	at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:185)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

This is the code for the Glue Job script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TABLE_NAME', 'PARQUET_LOCATION', 'PARQUET_TABLE_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

glue_source_database = "postgres-db"

table = args['TABLE_NAME'].replace("-", "_")

output_s3_path = args['PARQUET_LOCATION'] + \
    "/" + args['PARQUET_TABLE_NAME']

Datasource = glueContext.create_dynamic_frame.from_catalog(
    database=glue_source_database, table_name=table, transformation_ctx="Datasource") # the script fails at this line

print("Items Count:  ", Datasource.count())
Datasource.printSchema()

Transform = ApplyMapping.apply(frame=Datasource, mappings=[
    ("id", "int", "id", "int"),
    ("name", "string", "name", "string")],
    transformation_ctx="Transform")

DF = Transform.toDF()
formatted_path = output_s3_path.replace("_", "-")
DF.write.mode('overwrite').format(
    'parquet').save(formatted_path)

job.commit()

I have confirmed that the IAM role for AWS Glue has permission to access my secret. The Crawler is running successfully using the same connection as the Glue Job. I also tried reverting the connection to use Username and Password, and the job succeeded. Is there anything that I am missing? Could this possibly be a bug with AWS Glue?

kyleh
gefragt vor einem Jahr1701 Aufrufe
1 Antwort
2

Hi,

I see that you are facing the “NoSuchElementException: None.get” error when using Secrets Manager credentials over the database connection. Unfortunately getCatalogSource and getCatalogSink currently do not support Secret Managers connection which is resulting in this exception. There is already a feature request for the same. However, we do not have an ETA as to when this feature would be implemented. However as a workaround you can either use boto3 to retrieve credentials from secrets manager or use the username/password to your connection options.

I hope the above information helps.

Thank you.

AWS
SUPPORT-TECHNIKER
beantwortet vor einem Jahr
  • That makes sense. Hopefully that feature will be implemented soon. Thank you!

  • I get the same error when using from_options. Is it a similar issue?

    connection_options = {
            "useConnectionProperties": "true",
            "dbtable": table,
            "connectionName": connectionName,
        }
    
    return glueContext.create_dynamic_frame.from_options(
            connection_type='sqlserver',
            connection_options=connection_options
        )
    
  • Is there an ETA for this fix now ? I ran into the same issue recently and was wondering if something would change soon.

    Thanks a lot for the help

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen