Kinesis Data transformation - using Lambda vs Glue


I am planning to move filtered logs from a CloudWatch Logs log group through Kinesis Firehose to an S3 bucket as Parquet files. Because CloudWatch Logs always pushes gzipped data to Kinesis Firehose, I had to add a Lambda function to unzip the data.

Now I am unsure whether the conversion of this filtered JSON data to Parquet should be done by the Lambda (the one already invoked to unzip the data) or by Firehose using a Glue table. Will I incur additional cost if I add AWS Glue to convert the record format? Is it feasible to convert the data format in the Lambda itself? What are the pros and cons of each option?

I would appreciate some guidance on this.

Neisha
asked 7 months ago, 538 views
2 Answers

Hi Neisha,

To stream the filtered logs through Kinesis Firehose to an S3 bucket as Parquet files, I would recommend using a Glue table to convert your JSON input data to Parquet. Kinesis Firehose has a built-in integration with Glue tables for schema specification and record format conversion [1].
If your input data is in a format other than JSON, you can use your Lambda function to convert it to JSON first.
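As a rough illustration of that Lambda step, here is a minimal sketch of a Firehose transformation handler that unzips CloudWatch Logs payloads and emits one JSON object per log event. The output field names (`log_group`, `log_stream`, `timestamp`, `message`) are hypothetical and would need to match the columns of your Glue table. Emitting several newline-separated JSON objects in one record is also an assumption here, so verify it against your format conversion setup.

```python
import base64
import gzip
import json


def lambda_handler(event, context):
    """Unzip CloudWatch Logs records from Firehose and return them as JSON lines."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

            # CloudWatch Logs also sends CONTROL_MESSAGE records; they carry no log data.
            if payload.get("messageType") != "DATA_MESSAGE":
                output.append({"recordId": record["recordId"], "result": "Dropped"})
                continue

            lines = []
            for log_event in payload["logEvents"]:
                # Hypothetical flat schema; align these keys with the Glue table columns.
                row = {
                    "log_group": payload["logGroup"],
                    "log_stream": payload["logStream"],
                    "timestamp": log_event["timestamp"],
                    "message": log_event["message"],
                }
                lines.append(json.dumps(row))

            data = ("\n".join(lines) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (OSError, EOFError, ValueError, KeyError):
            # Not gzip, not JSON, or missing fields: hand the record back as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

The handler only decompresses and reshapes the data; the Parquet conversion itself is left to Firehose.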

Converting the format in the Lambda function itself means writing and maintaining your own conversion logic, and it will likely increase the function's execution time. So Glue is generally the go-to choice for this conversion, especially with Kinesis Firehose, which integrates with it directly.
See [2] and [3] for examples of how to set this up.
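For orientation, the record format conversion settings described in [1] and [2] can be sketched as a plain Python dictionary. The helper name `build_parquet_conversion_config` and the argument values in the usage line are hypothetical; the key names follow the `DataFormatConversionConfiguration` structure of the Firehose API, and the SNAPPY compression choice is only an example.

```python
def build_parquet_conversion_config(database, table, role_arn, region):
    """Build a JSON-to-Parquet conversion config that reads its schema from a Glue table."""
    return {
        "Enabled": True,
        # Parse incoming records as JSON.
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        # Write Parquet output; compression is an illustrative choice.
        "OutputFormatConfiguration": {
            "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
        },
        # Glue Data Catalog table that defines the output columns.
        "SchemaConfiguration": {
            "DatabaseName": database,
            "TableName": table,
            "RoleARN": role_arn,
            "Region": region,
            "VersionId": "LATEST",
        },
    }


# Placeholder values for illustration only.
config = build_parquet_conversion_config(
    "logs_db", "filtered_logs", "arn:aws:iam::123456789012:role/firehose-role", "us-east-1"
)
```

A dictionary like this would go under `DataFormatConversionConfiguration` inside the `ExtendedS3DestinationConfiguration` of the delivery stream, whether you create it with CloudFormation as in [2] or with an SDK.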

Regarding cost, with Glue you pay for Data Catalog storage and requests; see the pricing page [4]. Firehose also bills record format conversion by data volume, so check the Firehose pricing page as well.

References:
[1] https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html
[2] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-kinesisfirehose-deliverystream.html#aws-resource-kinesisfirehose-deliverystream--examples
[3] https://catalog.us-east-1.prod.workshops.aws/workshops/2300137e-f2ac-4eb9-a4ac-3d25026b235f/en-US/lab-3-kdf/kinesis
[4] https://aws.amazon.com/glue/pricing/

Thanks,
Atul

answered 7 months ago

Go with Glue, as IBAtulAnand mentioned. It means less overhead, and there is no conversion code to maintain in the Lambda.

Expert
answered 7 months ago
