Inquiry Regarding Spark Application Performance Discrepancy Across AWS Accounts in the Same Region

Overview: The Spark application in question is deployed within AWS Account A, specifically in the us-west-2 region. This application reads data from and writes data to Amazon S3 buckets hosted in Account B, also located in the us-west-2 region.

Issue: When the Spark application reads from and writes to S3 buckets in its own AWS account (Account A), the job completes in approximately 1 hour and 10 minutes. However, when it accesses S3 buckets in a different AWS account (Account B) within the same region (us-west-2), the execution time increases to 4 hours and 40 minutes, roughly four times as long.

Expectation: My understanding, based on AWS documentation and best practices, was that data transfer within the same AWS region, regardless of AWS account, would not incur additional latency or cost.

Request for Clarification: I am seeking clarification on why such a significant performance discrepancy occurs when the Spark application accesses S3 buckets in a different AWS account within the same region. Specifically, I would like to understand if there are any inherent limitations or factors that contribute to increased latency or reduced performance in this scenario.

I appreciate your assistance in addressing this matter and providing insights into potential reasons for the observed performance difference. Any recommendations or suggestions for optimizing performance in this multi-account scenario would be highly valuable.

Vikrant
asked 4 months ago, 231 views
1 Answer

Hello,

Finding the bottleneck requires a deeper analysis. As a starting point, check the Spark UI for both runs: compare whether the input and output datasets are the same, and look for any jobs that account for most of the difference in execution time.

You can also run the job on a subset of the same data in both setups and compare the execution times. In addition, check that both runs use the same resources and memory, and that the Spark configurations are identical.
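
For the configuration check, a small script can make the comparison less error-prone than reading two Environment tabs side by side. The following is only a minimal sketch: `diff_spark_conf` is a hypothetical helper, and the two dictionaries hold made-up values for illustration, not settings from the jobs in the question. In a live PySpark session, a dictionary like these could be built with `dict(spark.sparkContext.getConf().getAll())`.

```python
def diff_spark_conf(conf_a, conf_b):
    """Return {key: (value_a, value_b)} for every setting that differs.

    A key that is missing on one side is reported with None for that side.
    """
    diffs = {}
    for key in sorted(set(conf_a) | set(conf_b)):
        a, b = conf_a.get(key), conf_b.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Illustrative values only (assumptions, not taken from the real jobs).
same_account_run = {
    "spark.executor.memory": "8g",
    "spark.executor.instances": "20",
    "spark.sql.shuffle.partitions": "200",
}
cross_account_run = {
    "spark.executor.memory": "8g",
    "spark.executor.instances": "10",
    "spark.hadoop.fs.s3a.connection.maximum": "50",
}

# Print one line per setting that is different or present on one side only.
for key, (a, b) in diff_spark_conf(same_account_run, cross_account_run).items():
    print(f"{key}: same-account={a!r} cross-account={b!r}")
```

If the diff comes back empty, the configuration is probably not the cause, and the comparison of jobs in the Spark UI is the next place to look.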

AWS
SUPPORT ENGINEER
answered 3 months ago
