How to query a large Parquet file


Hi,

We have a large Parquet file stored in S3. It contains six months of log data or more. As a service, we need to run a query to fetch user data from the Parquet file, and it has to be done end-to-end using the AWS Data APIs.

I have explored Redshift, but the challenge is that loading data from S3 into Redshift needs additional effort, such as provisioning a cluster and copying the data from S3 into the Redshift database.

I have the following questions:

  1. Can Redshift provisioning be done using the AWS Data APIs?
  2. How do I run a query to fetch user data from the Redshift database? Is there a Data API available for this?
  3. Is there a better AWS service available for this than Redshift?

Regards, Ashok

Asked 1 year ago · 217 views

1 Answer

You should run a Glue crawler against the S3 location of your Parquet dataset. Then you will be able to query the data in place using Athena. You can use the Athena console or the Athena APIs to run your queries.
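As a rough sketch of what the API-only flow could look like, the Python snippet below starts an Athena query with boto3 and polls until it finishes. The database, table, user-ID column, and results bucket are hypothetical placeholders; you would substitute whatever names the Glue crawler actually created:

```python
import time
import boto3

# Hypothetical names -- replace with the database/table the Glue
# crawler created and an S3 bucket you own for query results.
ATHENA_DATABASE = "logs_db"
OUTPUT_LOCATION = "s3://my-athena-results-bucket/"

athena = boto3.client("athena")

# Start the query; Athena reads the Parquet files directly from S3.
resp = athena.start_query_execution(
    QueryString="SELECT * FROM user_logs WHERE user_id = '12345' LIMIT 100",
    QueryExecutionContext={"Database": ATHENA_DATABASE},
    ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
)
query_id = resp["QueryExecutionId"]

# Poll for completion (simple sketch; production code should back off
# and handle errors).
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```

This keeps everything serverless: no cluster to provision, and the only extra resource is the S3 location where Athena writes its result files.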

AWS
Don_D
Answered 1 year ago
AWS Expert
Chris_G
Reviewed 1 year ago
  • Thank you for the reply. What are the challenges in using Redshift Serverless?

  • Redshift Serverless is an option: you could use it to query the data in S3 by running the crawler and then creating an external schema. If you need better performance, you could COPY the data into the serverless endpoint. There are cost differences between Athena and Redshift Serverless that should be compared; see the sketch after this comment for how a Data API call would look.
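To answer question 2 concretely, a minimal sketch of querying Redshift Serverless through the Redshift Data API (boto3's `redshift-data` client) might look like the following. The workgroup, database, schema, and table names are assumptions for illustration only:

```python
import time
import boto3

client = boto3.client("redshift-data")

# Hypothetical Serverless workgroup and database; the external schema
# "spectrum_schema" is assumed to have been created over the Glue catalog.
resp = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="SELECT * FROM spectrum_schema.user_logs WHERE user_id = '12345' LIMIT 100",
)
statement_id = resp["Id"]

# The Data API is asynchronous: poll the statement until it completes.
while True:
    desc = client.describe_statement(Id=statement_id)
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if desc["Status"] == "FINISHED":
    result = client.get_statement_result(Id=statement_id)
    for record in result["Records"]:
        print(record)
```

Because the Data API runs over HTTPS with no persistent connection or driver, it fits the "end-to-end using AWS Data APIs" requirement from the original question.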
