Overview:
The Spark application is deployed in AWS Account A, in the us-west-2 region.
It reads data from and writes data to Amazon S3 buckets owned by Account B, also in us-west-2.
Issue:
When the Spark application reads from and writes to S3 buckets in its own AWS account (Account A), the job completes in approximately 1 hour and 10 minutes.
However, when the same application accesses S3 buckets in a different AWS account (Account B) in the same region (us-west-2), the job run time increases to approximately 4 hours and 40 minutes.
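To put the two run times side by side, here is the arithmetic as a small Python snippet. The variable names are mine, and the durations are the approximate figures quoted above, not exact measurements.

```python
from datetime import timedelta

# Approximate job durations reported above
same_account = timedelta(hours=1, minutes=10)    # buckets in Account A
cross_account = timedelta(hours=4, minutes=40)   # buckets in Account B

# Dividing one timedelta by another yields a plain float ratio
slowdown = cross_account / same_account
extra = cross_account - same_account

print(f"slowdown: {slowdown:.1f}x, extra time: {extra}")
```

In other words, the cross-account run takes about four times as long, which is about 3.5 additional hours per job.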
Expectation:
My understanding, based on AWS documentation and best practices, was that S3 access within the same AWS region would not incur additional latency or data transfer cost, regardless of which AWS account owns the bucket.
Request for Clarification:
I would like to understand why there is such a large performance difference when the Spark application accesses S3 buckets in a different AWS account within the same region.
Specifically, are there any inherent limitations or other factors that increase latency or reduce throughput in this scenario?
Any insight into possible causes, and any recommendations for optimizing performance in this multi-account setup, would be much appreciated.
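If it helps narrow things down, I can run a small latency probe against a bucket in each account and share the numbers. The snippet below is only a sketch of what I have in mind: `time_calls` is a hypothetical helper of my own, and the `time.sleep` stand-in is there only so the snippet runs without AWS credentials. In a real run, the callable would wrap an actual S3 request (for example a boto3 `head_object` call) against a bucket and key in Account A, and then in Account B.

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(operation: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Call `operation` `repeats` times and summarize wall-clock latency in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in operation (assumption): replace with a real S3 request per account.
    stats = time_calls(lambda: time.sleep(0.01), repeats=5)
    print(stats)
```

Comparing the median per-request latency for the two accounts should show whether the slowdown is visible at the level of individual S3 requests or only appears inside the Spark job.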