Lake Formation table read performance


I'm evaluating Lake Formation and seeing abysmally slow table read times. For example, both wr.lakeformation.read_sql_table and wr.athena.read_sql_table need 50-60 sec to read a small 104x1000 table. For comparison, a direct S3 parquet read with wr.s3.read_parquet needs only 3-4 sec to read the same table.

Do those numbers make any sense? Can I optimise Lake Formation read performance...

The performance test code is at https://gist.github.com/staskh/f25b52f97f6775d96992f9785b0e2019

asked a year ago, 271 views
1 Answer
Accepted Answer

Optimizing performance usually requires analyzing the specific data and query case by case, and the bottleneck has to be identified before any recommendation can be made. Each scenario can have its own contributing factors, so a thorough assessment is needed to pick the most effective strategy. To better answer your question, we require details that are non-public information. Please open a support case with AWS using the following link.

Based on the information provided, it makes sense that the Athena query took longer than a local query for a small dataset. When querying through Athena, the query executes remotely, which involves additional steps and API calls. Athena is also designed for large-scale data and relies on distributed computing, which adds overhead compared with local processing on a single node. That overhead becomes much more noticeable with small datasets, where it can dominate the total read time.
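To see how much of the 50-60 sec is overhead rather than data transfer, it can help to time each read path several times under the same conditions. The following is only a minimal sketch: `time_reader` and `compare_readers` are hypothetical helper names, the stub readers stand in for the real calls, and the awswrangler calls in the comments use placeholder paths, table names and database names.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def time_reader(reader: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock time in seconds over `repeats` calls."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader()
        samples.append(time.perf_counter() - start)
    return median(samples)


def compare_readers(
    readers: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Time every named reader and return {name: median_seconds}."""
    return {name: time_reader(fn, repeats) for name, fn in readers.items()}


if __name__ == "__main__":
    # Stub readers so the sketch runs without AWS access. With awswrangler
    # installed and credentials configured, they could be replaced by, e.g.
    # (all names below are placeholders, not taken from the question):
    #   import awswrangler as wr
    #   lambda: wr.s3.read_parquet(path="s3://my-bucket/my-table/")
    #   lambda: wr.athena.read_sql_table(table="my_table", database="my_db")
    #   lambda: wr.lakeformation.read_sql_table(table="my_table", database="my_db")
    readers = {
        "direct_read_stub": lambda: time.sleep(0.01),
        "remote_query_stub": lambda: time.sleep(0.05),
    }
    for name, seconds in compare_readers(readers).items():
        print(f"{name}: {seconds:.3f} s")
```

Using the median of several runs reduces the effect of one-off delays such as a cold start. If the remote paths stay at roughly the same duration when the table size changes, that points to fixed per-query overhead rather than to the data volume. For Athena specifically, awswrangler documents `ctas_approach=False` as the lower-latency option for small result sets; whether it helps in this case is untested here.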

AWS
JJ_L
answered a year ago
