AWS Glue connection to a customer managed Apache Kafka data source throws Status Code: 400; Error Code: InvalidInputException


I want to add Confluent Cloud Apache Kafka as a data source in an AWS Glue ETL job so that it reads a data stream from a Kafka topic.

I created a cluster, a topic, an AWS SQS source connector, and an AWS S3 sink connector in the Confluent Cloud Kafka console. I was able to post messages from AWS SQS to the Confluent Kafka topic and export data from Confluent Kafka to an AWS S3 bucket, so the integration between Confluent Cloud Kafka and AWS SQS and S3 works.

Now I want to stream the data from the Confluent Kafka cluster to AWS Glue for ETL transformation and save the output to a target S3 bucket.

I created a data connection in AWS Glue. I chose "Kafka" as the connection type and selected the "Customer managed Apache Kafka" option. Under "Kafka bootstrap server URLs", I entered the bootstrap server shown under the cluster settings in the Confluent portal (host:9092). I unchecked "Require SSL" and set Authentication to None. I did not provide any values under Network Options. Because I am creating this only for testing, I did not configure authentication or SSL.
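A minimal sketch of the connection described above, expressed as the `ConnectionInput` dict that the Glue `CreateConnection` API (boto3 `create_connection`) accepts. `KAFKA_BOOTSTRAP_SERVERS` and `KAFKA_SSL_ENABLED` are Glue connection property keys and `PhysicalConnectionRequirements` holds the network options; the function name and its parameters are hypothetical. Network options are attached only when a subnet, security groups, and an Availability Zone are all supplied, which roughly corresponds to leaving Network Options empty in the console.

```python
def build_kafka_connection_input(name, bootstrap_servers, ssl_enabled=False,
                                 subnet_id=None, security_group_ids=None,
                                 availability_zone=None):
    """Build a ConnectionInput dict for Glue CreateConnection (hypothetical helper).

    Mirrors the console choices: type KAFKA, bootstrap servers, SSL on/off,
    and no client authentication properties.
    """
    connection_input = {
        "Name": name,
        "ConnectionType": "KAFKA",
        "ConnectionProperties": {
            "KAFKA_BOOTSTRAP_SERVERS": bootstrap_servers,
            # Glue stores connection property values as strings.
            "KAFKA_SSL_ENABLED": "true" if ssl_enabled else "false",
        },
    }
    # Network options: only added when all three values are provided.
    if subnet_id and security_group_ids and availability_zone:
        connection_input["PhysicalConnectionRequirements"] = {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        }
    return connection_input
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("glue").create_connection(ConnectionInput=build_kafka_connection_input("confluent-kafka", "host:9092"))`, where `confluent-kafka` is a hypothetical connection name and `host:9092` is the placeholder from the question.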

I then created a job from the data connection screen, selected the Kafka connection I had created, and set the topic name. However, when I run the job, I see the following error:

jobname: KafkaJob and JobRunID:XXXX failed to execute with exception Unable to resolve any valid service connection (Service AWSGlueJobExecutor, Status Code 400, Error Code InvalidInputException)
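For context, in a Glue streaming job script the connection and topic chosen above are typically passed to `glueContext.create_data_frame.from_options(connection_type="kafka", connection_options=...)`, with the Glue connection referenced by name through `connectionName`. The minimal sketch below only builds that options dict so it can run outside Glue; the helper name and the defaults (`"earliest"`, `"json"`) are assumptions, not what the generated job necessarily uses.

```python
def kafka_read_options(connection_name, topic_name,
                       starting_offsets="earliest", classification="json"):
    """Build Kafka connection_options for a Glue streaming read (hypothetical helper)."""
    return {
        "connectionName": connection_name,    # name of the Glue Kafka connection
        "topicName": topic_name,              # topic set when creating the job
        "startingOffsets": starting_offsets,  # "earliest" or "latest"
        "classification": classification,     # format of the message payload
        "inferSchema": "true",                # let Glue infer the record schema
    }
```

Inside a Glue job this dict would be the `connection_options` argument; the connection it names must be one Glue can resolve, which is where the error above is raised.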

Am I missing any steps to set up the Kafka data connection?

1 Answer

That most likely means something is wrong with the connection's "Network options" that prevents Glue from using it: for example, no valid VPC, subnet, or security group is configured, or the subnet does not have enough free IP addresses.
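A rough pre-flight check for the causes listed above, as a minimal sketch: it only verifies that a `ConnectionInput`-style dict carries a subnet, security groups, and an Availability Zone in `PhysicalConnectionRequirements`, and optionally compares a free-IP count against a threshold. It cannot confirm that the VPC, subnet, or security group actually exist or are reachable. The function name, the `required_ips` threshold, and feeding `free_ips` from the `AvailableIpAddressCount` field of EC2 `describe_subnets` are assumptions for illustration; how many IPs Glue really needs depends on the job's workers.

```python
def network_option_problems(connection_input, free_ips=None, required_ips=1):
    """Return reasons Glue might not use a connection's network options (hypothetical)."""
    problems = []
    reqs = connection_input.get("PhysicalConnectionRequirements") or {}
    if not reqs.get("SubnetId"):
        problems.append("no subnet")
    if not reqs.get("SecurityGroupIdList"):
        problems.append("no security group")
    if not reqs.get("AvailabilityZone"):
        problems.append("no availability zone")
    # Optional capacity check; free_ips would come from the subnet description.
    if free_ips is not None and free_ips < required_ips:
        problems.append("subnet has too few free IP addresses")
    return problems
```

A connection created with Network Options left empty, as in the question, would report all three missing settings.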

AWS
Answered 6 months ago
