Choosing the Best Workflow for Indexing Jobs: AWS Batch vs. ECS


Description: We are evaluating the most suitable AWS service for automating our indexing job workflow. Our workflow involves processing messages received from an application, loading data from S3 into Redis, indexing data into Elasticsearch using multi-threaded Python code, and triggering an API upon completion. We are considering AWS Batch and AWS ECS as potential solutions and seek guidance on which service would be the best fit for our use case in terms of performance, scalability, and cost-effectiveness.

We need guidance on the best approach: triggering an ECS task from an event received from Kafka, with multiple steps in a state machine (we get a maximum of around 30 messages per day), or deploying AWS Batch jobs?

If ECS, can we pass the entire Kafka message event as JSON to the container at startup, for example as the container command? Is that the right approach?
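For illustration, the pattern we have in mind is roughly the following (the cluster, task definition, subnet, and container names are placeholders, and the event payload is made up):

```python
# Sketch only: pass the Kafka event to an ECS Fargate task as an override.
# "indexing-cluster", "indexing-task" and "worker" are placeholder names.
import json
import boto3

ecs = boto3.client("ecs")

kafka_event = {"type": "incremental", "s3_prefix": "indexing/2024/05/", "attempt": 1}

response = ecs.run_task(
    cluster="indexing-cluster",
    taskDefinition="indexing-task",
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnet
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": "worker",
                # The whole event is serialized and handed to the container
                # as its command argument; the entrypoint parses it as JSON.
                "command": ["python", "index.py", json.dumps(kafka_event)],
            }
        ]
    },
)
```

Is this reasonable given the size limit on RunTask overrides (I believe the override payload is capped at around 8 KiB), or should we pass only a pointer to the full event, for example an S3 key?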

Key Points to Address:

- Overview of the Workflow: Describe the steps involved in the indexing job workflow, including message processing, data loading, indexing, and API triggering.
- AWS Batch Overview: Explain how AWS Batch can be utilized for job scheduling, execution, and management. Discuss how AWS Batch handles dependencies between jobs and manages resource provisioning.
- AWS ECS Overview: Describe how AWS ECS can be used to run containerized applications, such as the Python code for indexing. Highlight ECS features like task definitions, service scaling, and integration with other AWS services.
- Evaluation Criteria:
  - Performance: Evaluate the performance of AWS Batch and ECS in handling our multi-step indexing workflow. Consider factors like job execution time, resource utilization, and scalability.
  - Scalability: Assess the scalability of both services in terms of handling increasing message loads and adapting to changing workload demands.
  - Cost-effectiveness: Compare the cost implications of using AWS Batch versus ECS for our workflow. Consider factors like pricing models, resource provisioning, and optimization opportunities.
- Resource Utilization and Efficiency: Discuss how each service manages resource utilization and ensures efficient processing of indexing tasks. Consider aspects like CPU and memory allocation, containerization overhead, and optimization strategies.
- Handling Dependencies and Workflow Orchestration: Evaluate the ability of AWS Batch and ECS to handle dependencies between workflow steps, such as data loading before indexing. Discuss how workflow orchestration can be achieved using native features or integrations with other AWS services like Step Functions.
- Integration with Other AWS Services: Explore how AWS Batch and ECS integrate with other AWS services required for our workflow, such as S3, Redis, and Elasticsearch. Assess the ease of integration, data transfer efficiency, and compatibility with existing infrastructure.
- Cost Comparison: Provide a detailed cost comparison between AWS Batch and ECS for our specific use case, considering factors like compute costs, storage costs, and data transfer fees. Include potential cost-saving strategies and optimization tips for each service.
- Best Practices and Recommendations: Based on the evaluation criteria, offer a recommendation on whether AWS Batch or ECS is the preferred choice for our indexing job workflow, along with best practices for optimizing performance, scalability, and cost-effectiveness with the selected service.

2 Answers

For the architectural choice, you can use the Well-Architected pillars as criteria: performance and scalability, cost-effectiveness, resource utilization and efficiency, and dependencies on other systems and their integrations. Although both ECS and Batch can work, Batch appears more reasonable given the intermittent nature of your workload, since it handles dependencies between jobs efficiently and dynamically scales resources as needed. You can consider using Step Functions as well. Additionally, consider Spot instances where appropriate to reduce cost, and CloudWatch for monitoring job progress and resource utilization. You can visit this blog as a case study: https://aws.amazon.com/blogs/compute/building-high-throughput-genomic-batch-workflows-on-aws-batch-layer-part-3-of-4/
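As a rough illustration (the queue, job definition, and variable names below are placeholders), a job can be submitted to Batch from whatever receives your hourly trigger, for example a small Lambda function:

```python
# Sketch: submit an indexing job to AWS Batch when a trigger event arrives.
# Queue and job definition names are placeholders.
import json
import boto3

batch = boto3.client("batch")

def handle_trigger(event):
    """Submit one indexing job, passing the trigger event to the container."""
    return batch.submit_job(
        jobName="indexing-incremental",
        jobQueue="indexing-queue",
        jobDefinition="indexing-job-def",
        containerOverrides={
            "environment": [
                # The container reads the serialized event from this variable.
                {"name": "TRIGGER_EVENT", "value": json.dumps(event)},
            ]
        },
    )
```

Attaching a Fargate or Spot-backed managed compute environment to that queue means capacity is provisioned only while jobs run, which fits an intermittent workload of roughly 30 messages per day.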

SriniV (AWS Expert), answered 13 days ago
  • Hi Srini,

    Thank you for bringing up this topic.

    We receive a single event per hour, which serves as a trigger to check for incremental updates. Occasionally, we may need to initiate a full rebuild job, which could potentially span more than 9 hours. Additionally, there might be scenarios where we need to run parallel jobs for partial rebuilds.

    My concern primarily revolves around understanding the differences in performance, technical perspective, and service behavior when comparing the execution of tasks for each step versus triggering a job.

    Could you please provide insights into any core differences in executing a task for each step versus triggering a job? Also, I'm interested in understanding any potential impacts on performance or service behavior, especially since we are spinning up Fargate tasks for both scenarios.

    Looking forward to your insights.


In your specific use case (a single event per hour triggering incremental updates, plus the potential need for full rebuild jobs spanning more than 9 hours), triggering jobs with Batch appears to be the more suitable approach. Batch can handle the dependencies between jobs, ensuring that the full rebuild job runs only when necessary. It can also manage the parallel jobs for partial rebuilds, scaling compute resources as needed.
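For example, both job-to-job dependencies and parallel fan-out can be expressed at submission time; the job names, queue, and array size below are only illustrative:

```python
# Sketch: a full rebuild that a follow-up job waits on, plus a parallel
# partial rebuild. Queue/definition names and the array size are illustrative.
import boto3

batch = boto3.client("batch")

# 1. Full rebuild job.
rebuild = batch.submit_job(
    jobName="full-rebuild",
    jobQueue="indexing-queue",
    jobDefinition="indexing-job-def",
)

# 2. A follow-up job (e.g. the API call) that starts only after the rebuild succeeds.
batch.submit_job(
    jobName="post-rebuild-api-call",
    jobQueue="indexing-queue",
    jobDefinition="notify-job-def",
    dependsOn=[{"jobId": rebuild["jobId"]}],
)

# 3. A partial rebuild fanned out as an array job; each child task receives its
#    index via the AWS_BATCH_JOB_ARRAY_INDEX environment variable.
batch.submit_job(
    jobName="partial-rebuild",
    jobQueue="indexing-queue",
    jobDefinition="indexing-job-def",
    arrayProperties={"size": 10},
)
```

Array jobs and the dependsOn parameter are covered under the "Array jobs" and "Job dependencies" topics in the AWS Batch User Guide.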

SriniV (AWS Expert), answered 13 days ago
  • Hi SriniV,

    1. Can we initiate the job externally, for example by triggering it from a Step Function that receives input in JSON format and passes it to the container, which then starts the job based on that input? If yes, could you point us to a reference? (A rough sketch of what we have in mind is included after this list.)
    2. Once the job is completed, since we're using Fargate, I believe it follows a pay-as-you-go model, right?
    3. "Batch can handle dependencies between jobs, ensuring that the full rebuild job runs only when necessary" - could you provide a scenario illustrating this? We have dependencies on data stored in S3 within the script running inside the container, but no direct job-to-job dependencies.
    4. "It can also manage parallel jobs for partial rebuilds, scaling compute resources as needed" - is there any documentation available on this feature?
    5. Also, what would be the major difference if we designed this using ECS?
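    For point 1, this is roughly the pattern we have in mind (it assumes the optimized Step Functions integration with Batch; the queue, job definition, and variable names are placeholders). Does it look like the right approach?

```python
# Sketch: a Step Functions task state that submits a Batch job and passes the
# state machine's JSON input through to the container as an environment variable.
# Names are placeholders; this assumes the optimized Batch integration.
import json

submit_state = {
    "SubmitIndexingJob": {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": "indexing-job",
            "JobQueue": "indexing-queue",
            "JobDefinition": "indexing-job-def",
            "ContainerOverrides": {
                "Environment": [
                    # States.JsonToString serializes the execution input so the
                    # container receives it as a single JSON string.
                    {"Name": "TRIGGER_EVENT", "Value.$": "States.JsonToString($)"}
                ]
            },
        },
        "End": True,
    }
}

print(json.dumps(submit_state, indent=2))
```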
