How to query a large Parquet file


Hi,

We have a large Parquet file stored in S3. It contains six months or more of log data. As a service, we need to run queries that fetch user data from this Parquet file, and the whole flow has to be done end-to-end using AWS data APIs.

I have explored Redshift, but the challenge is that loading the data takes additional effort: provisioning a cluster and copying the data from S3 into a Redshift database.

I have the following questions.

  1. Can Redshift provisioning be done using AWS data APIs?
  2. How do we run a query to fetch user data from the Redshift database? Are any data APIs available for this?
  3. Is there a better AWS service for this than Redshift?

Regards, Ashok

Asked a year ago, 218 views
1 answer

You should run a Glue crawler against the S3 location of your Parquet dataset. The crawler registers the dataset as a table in the Glue Data Catalog, and you can then query it with Athena, either from the Athena console or through the Athena APIs.
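For the API route, a minimal sketch of running an Athena query from Python might look like the following. It assumes the crawler has already created the table, and that `athena` is a client obtained with `boto3.client("athena")`. The function name `run_athena_query` and the database, table, and bucket names in the usage comment are hypothetical placeholders, not anything from your setup.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` on Athena and return the rows as a list of dicts.

    `athena` is expected to be boto3.client("athena") (assumption);
    `output_location` is an S3 URI where Athena writes its result files.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena queries are asynchronous: poll until a terminal state is reached.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"
        ]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended as {status['State']}: {reason}")

    # Collect every result page by following NextToken.
    rows, token = [], None
    while True:
        kwargs = {"QueryExecutionId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells come back without a VarCharValue key.
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            break

    if not rows:
        return []
    # For a SELECT, the first row of the first page holds the column names.
    header, data = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in data]


# Hypothetical usage (all names are placeholders):
#   import boto3
#   athena = boto3.client("athena")
#   users = run_athena_query(
#       athena,
#       "SELECT * FROM logs WHERE user_id = 'u123' LIMIT 100",
#       database="logs_db",
#       output_location="s3://my-athena-results/queries/",
#   )
```

Athena returns every value as a string, so convert types on the client if you need numbers or timestamps. For large result sets, reading the result file Athena writes to `output_location` may be more practical than paging through the API.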

Don_D (AWS), answered a year ago
Chris_G (AWS, Expert), reviewed a year ago
  • Thank you for the reply. What are the challenges in using Redshift Serverless?

  • Redshift Serverless is an option. You could use it to query the data in S3 by running the crawler and then creating an external schema. If you need better performance, you could COPY the data into the serverless endpoint. Athena and Redshift Serverless are priced differently, so the costs should be compared.
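The Redshift Serverless route described above can also be driven through the Redshift Data API. The sketch below assumes `rsd` is a client from `boto3.client("redshift-data")` and that a serverless workgroup already exists. The function name, schema name, catalog database, and IAM role ARN are hypothetical placeholders. For brevity it reads only the first page of results.

```python
import time

# Placeholder SQL: maps a Glue Data Catalog database to an external schema.
# The schema name, catalog database, and role ARN are all assumptions.
CREATE_EXTERNAL_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_logs
FROM DATA CATALOG DATABASE 'logs_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/placeholder-spectrum-role'
"""


def run_redshift_statement(rsd, sql, workgroup, database, poll_seconds=1.0):
    """Run one SQL statement via the Redshift Data API and return rows as dicts.

    `rsd` is expected to be boto3.client("redshift-data") (assumption).
    Statements without a result set (DDL, COPY) return an empty list.
    """
    submitted = rsd.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql
    )
    statement_id = submitted["Id"]

    # The Data API is asynchronous: poll until the statement finishes.
    while True:
        desc = rsd.describe_statement(Id=statement_id)
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(poll_seconds)

    if desc["Status"] != "FINISHED":
        raise RuntimeError(
            f"Statement {statement_id} ended as {desc['Status']}: "
            f"{desc.get('Error', 'no error message')}"
        )
    if not desc.get("HasResultSet"):
        return []

    # First page only; follow NextToken in the response for larger results.
    result = rsd.get_statement_result(Id=statement_id)
    columns = [col["name"] for col in result["ColumnMetadata"]]

    def value(field):
        # Each field is a one-key dict such as {"stringValue": "x"} or {"isNull": True}.
        if field.get("isNull"):
            return None
        return next(iter(field.values()))

    return [dict(zip(columns, map(value, record))) for record in result["Records"]]


# Hypothetical usage (workgroup, database, and table names are placeholders):
#   import boto3
#   rsd = boto3.client("redshift-data")
#   run_redshift_statement(rsd, CREATE_EXTERNAL_SCHEMA_SQL, "my-workgroup", "dev")
#   rows = run_redshift_statement(
#       rsd, "SELECT * FROM spectrum_logs.logs WHERE user_id = 'u123' LIMIT 100",
#       "my-workgroup", "dev",
#   )
```

The same helper could submit a `COPY` statement if you decide to load the data into Redshift for better performance.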
