Inquiry Regarding Spark Application Performance Discrepancy Across AWS Accounts in the Same Region


Overview: The Spark application in question is deployed within AWS Account A, specifically in the us-west-2 region. This application reads data from and writes data to Amazon S3 buckets hosted in Account B, also located in the us-west-2 region.

Issue: When the Spark application reads from and writes to S3 buckets in its own AWS account (Account A), the job completes in approximately 1 hour and 10 minutes. When it accesses S3 buckets in a different AWS account (Account B) in the same region (us-west-2), the execution time increases to 4 hours and 40 minutes, roughly four times as long.

Expectation: My understanding, based on AWS documentation and best practices, was that data transfer within the same AWS region, regardless of AWS account, would not incur additional latency or cost.

Request for Clarification: I am seeking clarification on why such a significant performance discrepancy occurs when the Spark application accesses S3 buckets in a different AWS account within the same region. Specifically, I would like to understand if there are any inherent limitations or factors that contribute to increased latency or reduced performance in this scenario.

I appreciate your assistance in addressing this matter and providing insights into potential reasons for the observed performance difference. Any recommendations or suggestions for optimizing performance in this multi-account scenario would be highly valuable.

Vikrant
asked 3 months ago (220 views)
1 Answer

Hello,

Finding the bottleneck requires a deeper analysis. As a starting point, check the Spark UI for both runs and compare whether the input and output datasets are the same, and whether particular jobs account for most of the difference in execution time.
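One way to make the Spark UI comparison concrete is to export the stage list for each run and rank stages by how much slower they were in the cross-account run. The sketch below is only illustrative: it assumes the stage data was saved as a list of dicts with the `stageId`, `name`, `executorRunTime` and `inputBytes` fields that the Spark monitoring REST API reports for stages, and that both runs execute the same plan so stage IDs line up. The function name and the sample numbers are hypothetical.

```python
from typing import Any, Dict, List


def slowest_stage_deltas(
    stages_a: List[Dict[str, Any]],
    stages_b: List[Dict[str, Any]],
    top: int = 5,
) -> List[Dict[str, Any]]:
    """Pair stages by stageId and rank them by extra executor run time in run B."""
    by_id_a = {s["stageId"]: s for s in stages_a}
    deltas = []
    for stage in stages_b:
        base = by_id_a.get(stage["stageId"])
        if base is None:
            continue  # stage exists only in run B; the plans differ
        deltas.append({
            "stageId": stage["stageId"],
            "name": stage.get("name", ""),
            "extra_ms": stage["executorRunTime"] - base["executorRunTime"],
            "input_bytes_match": stage.get("inputBytes") == base.get("inputBytes"),
        })
    deltas.sort(key=lambda d: d["extra_ms"], reverse=True)
    return deltas[:top]


# Hypothetical sample data, not taken from the job in the question.
same_account = [
    {"stageId": 0, "name": "read", "executorRunTime": 1000, "inputBytes": 10},
    {"stageId": 1, "name": "write", "executorRunTime": 2000, "inputBytes": 0},
]
cross_account = [
    {"stageId": 0, "name": "read", "executorRunTime": 5000, "inputBytes": 10},
    {"stageId": 1, "name": "write", "executorRunTime": 2500, "inputBytes": 0},
]

for row in slowest_stage_deltas(same_account, cross_account):
    print(row)
```

If the input sizes match but one or two stages account for most of the extra time, that points at the S3 read or write path rather than at a difference in the data.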

You can also run both jobs against a subset of the same data and compare the execution times. In addition, please check that both runs use the same resources and memory, and that the Spark configurations provisioned for them are identical.
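To check that the configurations really are identical, it can help to dump the effective settings from each run and diff them. The following is a minimal sketch, assuming each configuration was captured as a plain dict (in PySpark, for example, with `dict(spark.sparkContext.getConf().getAll())`). The helper name and the sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def diff_spark_conf(
    conf_a: Dict[str, str],
    conf_b: Dict[str, str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every key whose value differs, as (value_in_a, value_in_b).

    A key missing from one side is reported with None for that side.
    """
    keys = sorted(set(conf_a) | set(conf_b))
    return {
        key: (conf_a.get(key), conf_b.get(key))
        for key in keys
        if conf_a.get(key) != conf_b.get(key)
    }


# Hypothetical configuration dumps, not taken from the job in the question.
run_same_account = {
    "spark.executor.memory": "8g",
    "spark.executor.instances": "20",
    "spark.hadoop.fs.s3a.connection.maximum": "200",
}
run_cross_account = {
    "spark.executor.memory": "8g",
    "spark.executor.instances": "10",
}

for key, (a, b) in diff_spark_conf(run_same_account, run_cross_account).items():
    print(f"{key}: same-account={a} cross-account={b}")
```

An empty result means the two dumps match, so the slowdown would have to come from somewhere other than the Spark settings.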

AWS
SUPPORT ENGINEER
answered 3 months ago
