Best Approach to Move S3 Data to EC2 Postgres Database - Lambda or Glue?


I have a scenario where I need to move data from S3 to a Postgres database running on an EC2 instance. All of this is part of a CDK app, so I'd like to add it as a step in the existing Step Functions state machine. I'm looking for your opinion on which service you would use given the scenario. I'm competent in Python and SQL as a data scientist/data engineer, but fairly new to AWS. Here's the setup:

  1. I have 10 files in an S3 bucket, resulting from various data transformations.
  2. At the end of the process, a crawler registers these files as tables in a Glue Data Catalog database.
  3. Now, I need to move 3 of these tables into a Postgres database running on an EC2 instance.

Initially, I leaned towards using AWS Glue for this task. However, I couldn't find clear guidance on how to move data from the Glue Data Catalog to an EC2-based Postgres database.

The files in question range from 50 to 600 MB in size, and they only need to be loaded into the EC2 database once a day. I've also considered using a Lambda function to read the files from S3 and insert them into the EC2 database. Doing this with pandas and SQLAlchemy works, but I've occasionally run into connection and timeout issues.

I'd appreciate suggestions on the most appropriate design patterns and resources for this scenario. Should I stick with Glue, explore Lambda further, or consider other options? Any insights would be greatly appreciated.

Asked 1 month ago, viewed 260 times
1 Answer
Accepted Answer

AWS Lambda

Pros:
  1. Flexibility: Lambda functions can be written in multiple languages (e.g., Python, Node.js), offering flexibility in how you implement your data processing logic.
  2. Cost-effective for small jobs: Lambda is cost-effective for tasks that are lightweight and have short execution times (up to 15 minutes).
  3. Event-driven: Lambda is easily triggered by S3 events, making it suitable for real-time or near-real-time data processing needs.

Cons:
  1. Time and memory limits: Lambda functions have a maximum execution time of 15 minutes, which may not be enough for large-scale data migrations or complex processing tasks. Memory is capped at 10,240 MB per function, which can limit in-memory processing of large datasets.
  2. Management overhead: Managing a large number of Lambda functions, or complex orchestration between them, can become challenging.
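One way to stay within those limits is to avoid loading a whole file into a pandas DataFrame and instead stream the S3 object straight into Postgres with COPY. The following is a minimal sketch, not a production loader. It assumes the files are CSV with a header row, that psycopg2 is packaged with the function, and that the Lambda runs in a VPC with network access to the EC2 instance. The environment variable names, the event keys, and the function names are hypothetical.

```python
import os
import re

# Only plain identifiers are accepted, because table and column names are
# interpolated into the SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_copy_sql(table, columns):
    """Return a COPY ... FROM STDIN statement for a CSV file with a header row."""
    for name in (table, *columns):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv, HEADER true)"


def copy_stream_to_table(conn, stream, table, columns):
    """Stream a readable file-like object into Postgres in one transaction."""
    try:
        with conn.cursor() as cur:
            cur.copy_expert(build_copy_sql(table, columns), stream)
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def handler(event, context):
    # Imported lazily so the helpers above can be tested without AWS libraries.
    import boto3
    import psycopg2

    # Hypothetical event shape:
    # {"bucket": "...", "key": "...", "table": "...", "columns": ["...", ...]}
    body = boto3.client("s3").get_object(Bucket=event["bucket"], Key=event["key"])["Body"]
    conn = psycopg2.connect(
        host=os.environ["PG_HOST"],          # assumed variable names
        dbname=os.environ["PG_DATABASE"],
        user=os.environ["PG_USER"],
        password=os.environ["PG_PASSWORD"],
        connect_timeout=10,  # fail fast instead of hanging until the Lambda timeout
    )
    try:
        copy_stream_to_table(conn, body, event["table"], event["columns"])
    finally:
        conn.close()
    return {"table": event["table"], "status": "loaded"}
```

Because the object is read in chunks by the COPY call, memory use should stay roughly flat regardless of file size, and a failed load rolls back rather than leaving a partially filled table.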

AWS Glue

Pros:
  1. Built for ETL: AWS Glue is a managed ETL service designed to prepare and transform data for analytics. It is more suitable for complex data processing workflows.
  2. Scalable: Glue can handle large volumes of data by scaling resources automatically. It's designed for jobs that exceed the time and compute limitations of Lambda.
  3. Integrated Data Catalog: Glue integrates with the AWS Glue Data Catalog, allowing for easier management of metadata and schema evolution over time.
  4. Visual ETL job creation: Glue Studio provides a visual interface to design and run ETL jobs, making it easier for users who prefer not to write code.
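For the three catalog tables in the question, a Glue job can read each table through the Data Catalog and write it to Postgres over JDBC. This is a rough sketch, assuming a Glue Spark job that has network access to the EC2 instance (for example through a Glue connection in the same VPC). The function names and the shape of the `pg` settings dict are hypothetical, and in practice the password would come from a secret store rather than a literal.

```python
def jdbc_options(host, port, database, table, user, password):
    """Build the connection options for a Postgres JDBC target."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def copy_catalog_tables(glue_database, table_names, pg):
    """Copy each Data Catalog table into a Postgres table of the same name.

    `pg` is a hypothetical dict with host, port, database, user and password.
    """
    # These modules only exist inside the Glue job runtime, so import lazily.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    for name in table_names:
        # Read the table using the schema the crawler stored in the catalog.
        frame = glue_context.create_dynamic_frame.from_catalog(
            database=glue_database, table_name=name
        )
        # Write the rows to Postgres over JDBC.
        glue_context.write_dynamic_frame.from_options(
            frame=frame,
            connection_type="postgresql",
            connection_options=jdbc_options(table=name, **pg),
        )
```

This sketch does not truncate or deduplicate the target tables, so a daily full reload would need that handled separately.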

Cons:
  1. Cost: For small or infrequent jobs, Glue can be more expensive than Lambda due to its pricing model, which is based on Data Processing Units (DPUs) and job runtime.
  2. Initial setup complexity: Setting up Glue jobs can be more complex than deploying Lambda functions, especially for simple data movement tasks.
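To make the pricing difference concrete, the arithmetic can be sketched as follows. The rates are assumptions for illustration only, not authoritative figures, so check the current pricing pages for your region. The function names are hypothetical, and the sketch ignores Lambda per-request charges and any Glue minimum billing duration.

```python
# Assumed illustrative rates; real prices vary by region and over time.
GLUE_USD_PER_DPU_HOUR = 0.44
LAMBDA_USD_PER_GB_SECOND = 0.0000166667


def glue_job_cost(dpus, runtime_seconds, rate=GLUE_USD_PER_DPU_HOUR):
    """Glue is billed per DPU-hour: DPUs x hours x rate."""
    return dpus * (runtime_seconds / 3600) * rate


def lambda_cost(memory_mb, runtime_seconds, rate=LAMBDA_USD_PER_GB_SECOND):
    """Lambda compute is billed per GB-second: GB x seconds x rate."""
    return (memory_mb / 1024) * runtime_seconds * rate


if __name__ == "__main__":
    # Hypothetical daily load: 10 minutes on 2 DPUs versus 10 minutes at 4 GB.
    print(f"Glue:   ${glue_job_cost(2, 600):.4f} per run")
    print(f"Lambda: ${lambda_cost(4096, 600):.4f} per run")
```

At one run per day either figure is small in absolute terms, so development and maintenance effort may matter more than the compute bill.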

Decision Factors
  1. Data volume and complexity: If you're dealing with large datasets or complex transformations, Glue is better suited for the task. For lighter, simpler data movement, Lambda is more cost-effective.
  2. Processing time: For tasks that can be completed within minutes, Lambda is sufficient. For long-running jobs, consider Glue.
  3. Orchestration needs: If your data processing requires sophisticated orchestration or is part of a larger ETL workflow, Glue's managed service and integration with other AWS analytics services may offer advantages.
  4. Cost sensitivity: For infrequent or small-scale jobs, Lambda may be more cost-effective, but always consider the total cost of operation, including development and maintenance efforts.
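Since the load is meant to run as a step in an existing Step Functions workflow, occasional connection or timeout failures can be handled with a retry policy on the task state rather than inside the function. The sketch below builds an Amazon States Language task definition as a plain Python dict. The function name, payload shape, and retry numbers are hypothetical, and the same settings can be expressed through CDK constructs instead.

```python
import json


def load_task_state(function_name, table, next_state=None):
    """Return a Task state that invokes a loader Lambda with retries."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {"table": table},  # hypothetical payload shape
        },
        "TimeoutSeconds": 900,  # matches the 15-minute Lambda maximum
        "Retry": [
            {
                # Retry on task timeouts and on any failure raised by the task.
                "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


if __name__ == "__main__":
    # "load-table-fn" is a placeholder function name.
    print(json.dumps(load_task_state("load-table-fn", "sales"), indent=2))
```

Retries are only safe if each load is idempotent or transactional, so that a retried attempt does not duplicate rows from a partially completed one.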

Answered 1 month ago by an Expert
Reviewed 1 month ago by Artem (Expert)
Reviewed 1 month ago by an Expert
