Kinesis Data Firehose transformation: using Lambda vs. Glue


I am planning to move all the filtered logs from a CloudWatch log group through Kinesis Firehose to an S3 bucket as Parquet files. Because a CloudWatch log group always pushes gzip-compressed data to Kinesis Firehose, I had to add a Lambda function to decompress the data.
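For reference, a decompression Lambda of the kind described above usually follows the Firehose data-transformation contract: it receives base64-encoded records and returns each one with its `recordId`, a `result`, and re-encoded `data`. The sketch below assumes the standard CloudWatch Logs subscription payload (`messageType`, `logGroup`, `logStream`, `logEvents`). The output field names are a hypothetical flat schema chosen for illustration, not something from the question.

```python
import base64
import gzip
import json


def lambda_handler(event, context):
    """Firehose transformation sketch: decompress CloudWatch Logs payloads
    and emit newline-delimited JSON (one object per log event)."""
    output = []
    for record in event["records"]:
        # Each Firehose record holds one base64-encoded, gzip-compressed
        # CloudWatch Logs subscription payload.
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

        # CONTROL_MESSAGE payloads are connectivity checks from CloudWatch
        # Logs and carry no log data, so drop them.
        if payload.get("messageType") != "DATA_MESSAGE":
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        # Flatten each log event into a JSON object (hypothetical schema).
        lines = [
            json.dumps({
                "logGroup": payload["logGroup"],
                "logStream": payload["logStream"],
                "timestamp": ev["timestamp"],
                "message": ev["message"],
            })
            for ev in payload["logEvents"]
        ]
        data = ("\n".join(lines) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data).decode("utf-8"),
        })
    return {"records": output}
```

Whichever conversion route is chosen, this decompression step stays, because Firehose cannot parse the gzipped CloudWatch Logs payload on its own.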

Now I am unsure whether the conversion of this filtered JSON data to Parquet should be done by the Lambda function (the one invoked to decompress the data) or by using a Glue table. Will adding AWS Glue to convert the record format incur additional cost? Is it feasible to convert the data format in the Lambda itself? What are the pros and cons of each option?

I would appreciate some guidance on this.

2 Answers

Hi Neisha,

To stream the filtered logs through Kinesis Firehose to an S3 bucket as Parquet files, the preferred approach is to use a Glue table to convert your JSON input data into Parquet, because Kinesis Firehose has a well-defined integration with Glue tables for schema specification and record format conversion [1].
If your input data is in a format other than JSON, you can use your Lambda function to convert it into JSON first.
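As a rough illustration of the Glue-based route, the sketch below builds the arguments for boto3's `firehose.create_delivery_stream`. It keeps the existing decompression Lambda as a processor and enables `DataFormatConversionConfiguration` so that Firehose reads the schema from a Glue table and writes Parquet. The stream name, prefix, buffer interval, and all ARNs, database, and table names are placeholders; adjust them to your own resources.

```python
def build_delivery_stream_config(bucket_arn, role_arn, lambda_arn,
                                 glue_database, glue_table, region):
    """Return kwargs for firehose.create_delivery_stream with Glue-based
    JSON -> Parquet record format conversion enabled (a sketch)."""
    return {
        "DeliveryStreamName": "filtered-logs-to-parquet",  # hypothetical name
        # CloudWatch Logs subscription filters write directly to the stream.
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": "logs/",  # hypothetical prefix
            # Record format conversion requires a buffer size of at least 64 MiB.
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
            # Parquet applies its own compression, so S3 compression stays off.
            "CompressionFormat": "UNCOMPRESSED",
            # The existing Lambda still decompresses the CloudWatch Logs payload.
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [{
                    "Type": "Lambda",
                    "Parameters": [
                        {"ParameterName": "LambdaArn", "ParameterValue": lambda_arn},
                    ],
                }],
            },
            # Firehose reads the schema from the Glue table and writes Parquet.
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                "SchemaConfiguration": {
                    "RoleARN": role_arn,
                    "DatabaseName": glue_database,
                    "TableName": glue_table,
                    "Region": region,
                    "VersionId": "LATEST",
                },
                "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
                "OutputFormatConfiguration": {
                    "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}},
                },
            },
        },
    }

# Usage (requires boto3, AWS credentials, and the resources to exist):
# import boto3
# boto3.client("firehose").create_delivery_stream(**build_delivery_stream_config(...))
```

With this setup, the Glue table's columns must match the JSON fields emitted by the Lambda; no Parquet-writing code lives in the function itself.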

Converting the format in the Lambda function itself means developing and maintaining your own conversion logic, and it may increase the function's execution duration. Glue is therefore the usual choice for this conversion, especially with Kinesis Firehose, which integrates with it directly.
Examples of this setup are given in [2] and [3] for your reference.
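To give a sense of what "your own logic" involves, here is a minimal sketch of Lambda-side conversion using the third-party `pyarrow` library (for example, packaged as a Lambda layer). It assumes the function batches rows and writes the Parquet file to S3 itself. Firehose concatenates the records a transformation Lambda returns into its output objects, so a Parquet file per returned record would most likely not yield valid Parquet. The helper names, bucket, and key are hypothetical.

```python
import io


def records_to_columns(records):
    """Pivot a list of flat JSON dicts into column lists, the column-oriented
    layout Parquet writers expect. Missing keys become None."""
    keys = sorted({k for r in records for k in r})
    return {k: [r.get(k) for r in records] for k in keys}


def to_parquet_bytes(records):
    """Serialize a batch of records to an in-memory Parquet file.
    Requires pyarrow; column types are inferred from this batch only."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pydict(records_to_columns(records))
    buf = io.BytesIO()
    pq.write_table(table, buf, compression="snappy")
    return buf.getvalue()

# Hypothetical upload step (bucket and key are placeholders):
# boto3.client("s3").put_object(Bucket="my-log-bucket",
#                               Key="logs/batch-0001.parquet",
#                               Body=to_parquet_bytes(batch))
```

Even this small version shows the maintenance burden: schema inference can differ between batches, and batching, file sizing, and retries all become the function's responsibility instead of Firehose's.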

Regarding cost, you will pay for Data Catalog storage and requests. See the pricing page [4].

References:
[1] https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html
[2] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-kinesisfirehose-deliverystream.html#aws-resource-kinesisfirehose-deliverystream--examples
[3] https://catalog.us-east-1.prod.workshops.aws/workshops/2300137e-f2ac-4eb9-a4ac-3d25026b235f/en-US/lab-3-kdf/kinesis
[4] https://aws.amazon.com/glue/pricing/

Thanks,
Atul

answered 7 months ago

Go for Glue, as IBAtulAnand mentioned. There is less overhead, and no conversion code to maintain in the Lambda.

EXPERT
answered 7 months ago
