Best Approach to Move S3 Data to EC2 Postgres Database - Lambda or Glue?

I have a scenario where I need to move data from S3 to a Postgres database running on an EC2 instance. All of this is part of a CDK app, so I'm looking to add this as a step in the existing Step Functions state machine. I'd like your opinion on which service you would use for this scenario. I'm competent in Python and SQL as a data scientist/data engineer but fairly new to AWS. Here's the setup:

  1. I have 10 files in an S3 bucket, resulting from various data transformations.
  2. At the end of the process, these files are added to a database in the Glue Data Catalog (crawler).
  3. Now, I need to move 3 of these tables into a Postgres database running on an EC2 instance.

Initially, I leaned towards using AWS Glue for this task. However, I couldn't find clear guidance on how to move data from the Glue Data Catalog to an EC2-based Postgres database.

The files in question range from 50 to 600 MB in size, and they only need to be inserted into the EC2 database once a day. I've also considered using a Lambda function to read the files from S3 and insert them into the EC2 database. Using pandas with SQLAlchemy works, but I've occasionally run into connection and timeout issues.

I'd appreciate suggestions on the most appropriate design patterns and resources for this scenario. Should I stick with Glue, explore Lambda further, or consider other options? Any insights would be greatly appreciated.

asked a month ago, 247 views
1 Answer
Accepted Answer

AWS Lambda

Pros:
  - Flexibility: Lambda functions can be written in multiple languages (e.g., Python, Node.js), offering flexibility in how you implement your data processing logic.
  - Cost-effective for small jobs: Lambda is cost-effective for tasks that are lightweight and have short execution times (up to 15 minutes).
  - Event-driven: Easily triggered by S3 events, making it suitable for real-time or near-real-time data processing needs.

Cons:
  - Time and Memory Limits: Lambda functions have a maximum execution time of 15 minutes, which may not be suitable for large-scale data migrations or complex processing tasks. Memory allocation is also capped (at 10,240 MB per function), which can limit processing capabilities for large datasets that are loaded into memory all at once.
  - Management Overhead: Managing a large number of Lambda functions or complex orchestration between them can become challenging.
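To make the time and memory limits concrete for the files in the question, here is a minimal sketch of a load that streams an S3 object straight into Postgres with COPY instead of building a pandas DataFrame first. It rests on several assumptions that are not stated in the question: the files are CSV with a header row, the target table already exists with matching columns, `s3_client` behaves like `boto3.client("s3")`, and `conn` is a psycopg2 connection. The function name and the table-name check are illustrative only.

```python
import re

# Only allow simple identifiers, because the table name is placed into SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def copy_csv_from_s3(s3_client, conn, bucket, key, table):
    """Stream one CSV object from S3 into an existing Postgres table.

    Assumptions (not from the original question): s3_client behaves like
    boto3.client("s3"), conn is a psycopg2 connection, and the file is CSV
    with a header row.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")

    # get_object returns a file-like streaming body, so the whole file
    # does not have to be loaded into Lambda memory at once.
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]

    copy_sql = f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)"
    with conn.cursor() as cur:
        # copy_expert reads from the file-like object in chunks.
        cur.copy_expert(copy_sql, body)
    conn.commit()
    return copy_sql
```

A bulk COPY is usually much faster than row-by-row inserts, which may help with the timeout issues described in the question. The function still needs network access from the Lambda function to the EC2 instance (for example, a Lambda attached to the same VPC), which this sketch does not configure.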

AWS Glue

Pros:
  - Built for ETL: AWS Glue is a managed ETL service designed to easily prepare and transform data for analytics. It is more suitable for complex data processing workflows.
  - Scalable: Glue can handle large volumes of data by scaling resources automatically. It's designed for jobs that exceed the time and compute limitations of Lambda.
  - Integrated Data Catalog: Glue integrates with the AWS Glue Data Catalog, allowing for easier management of metadata and schema evolution over time.
  - Visual ETL Job Creation: Glue Studio provides a visual interface to design and run ETL jobs, making it easier for users who prefer not to write code.

Cons:
  - Cost: For small or infrequent jobs, Glue can be more expensive than Lambda due to its pricing model, which is based on Data Processing Units (DPUs) and job runtime.
  - Initial Setup Complexity: Setting up Glue jobs can be more complex than deploying Lambda functions, especially for simple data movement tasks.
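Since the question asks how to get from the Glue Data Catalog to Postgres on EC2, here is a rough sketch of the core of a Glue job that does that. It assumes a Glue JDBC connection pointing at the EC2 instance has already been created; the function name and all argument values are hypothetical. `glue_context` is expected to be an `awsglue.context.GlueContext`, passed in as a parameter so the snippet has no import-time dependency on the Glue runtime.

```python
def load_catalog_table_to_postgres(glue_context, catalog_db, table,
                                   connection_name, target_db):
    """Copy one Data Catalog table into Postgres over a Glue JDBC connection.

    Assumptions: connection_name is an existing Glue connection to the EC2
    Postgres host, and target_db is the Postgres database name.
    """
    # Read the table through the catalog entry the crawler created.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database=catalog_db,
        table_name=table,
    )

    # Write the frame over JDBC; the target table reuses the catalog table name.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection=connection_name,
        connection_options={"dbtable": table, "database": target_db},
    )
    return frame


# Inside a real Glue job script, the context would typically be created with:
#   from pyspark.context import SparkContext
#   from awsglue.context import GlueContext
#   glue_context = GlueContext(SparkContext.getOrCreate())
# and the function called once per table to move.
```

For the three tables in the question, one job could loop over the table names and call this function for each, and the job itself could be started from a Step Functions task.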

Decision Factors

  - Data Volume and Complexity: If you're dealing with large datasets or complex transformations, Glue is better suited for the task. For lighter, simpler data movement, Lambda is more cost-effective.
  - Processing Time: For tasks that can be completed within minutes, Lambda is sufficient. For long-running jobs, consider Glue.
  - Orchestration Needs: If your data processing requires sophisticated orchestration or is part of a larger ETL workflow, Glue's managed service and integration with other AWS analytics services may offer advantages.
  - Cost Sensitivity: For infrequent or small-scale jobs, Lambda may be more cost-effective, but always consider the total cost of operation, including development and maintenance efforts.
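The processing-time factor can be turned into a quick back-of-the-envelope check. The sketch below assumes one Lambda invocation per file and a load throughput that you have measured yourself; the function name, the throughput figures, and the 50% safety margin are illustrative assumptions, not AWS guidance. Only the 15-minute limit comes from Lambda itself.

```python
LAMBDA_TIMEOUT_SECONDS = 15 * 60  # Lambda's maximum execution time


def suggest_service(file_sizes_mb, throughput_mb_per_s, safety_factor=0.5):
    """Suggest "lambda" or "glue" from file sizes and a measured throughput.

    Assumes one Lambda invocation per file. safety_factor is the share of the
    15-minute limit we are willing to use (an arbitrary illustrative margin).
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")

    budget_seconds = LAMBDA_TIMEOUT_SECONDS * safety_factor
    # The largest file determines whether every invocation fits the budget.
    slowest_seconds = max(file_sizes_mb) / throughput_mb_per_s
    return "lambda" if slowest_seconds <= budget_seconds else "glue"


# Three files in the 50 to 600 MB range from the question,
# with two hypothetical throughput measurements.
print(suggest_service([50, 300, 600], throughput_mb_per_s=5))  # lambda
print(suggest_service([50, 300, 600], throughput_mb_per_s=1))  # glue
```

At an assumed 5 MB/s the 600 MB file takes about 120 seconds, well inside the 450-second budget; at 1 MB/s it takes 600 seconds and no longer fits, which is where Glue (or another long-running option) becomes the safer choice.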

EXPERT
answered a month ago
EXPERT
Artem
reviewed a month ago
EXPERT
reviewed a month ago
