How to query large parquet file

0

Hi ,

We have a large parquet file which is stored in S3 storage. It contains the data for 6 months of logs or even more. As as service we need to run a query to fetch the user data from parquet file. It has to be done end-to-end using aws data apis.

I have explored the redshift but the challenge is data loading from s3 to redshift needs additional effort such as provisioning a cluster and copying data from S3 to redshift data base.

I have the following query.

  1. Redshift provisioning can be done using aws data APIs ?
  2. How to run the query to fetch user data from redshift database. Is any data apis available for this?
  3. Is there any better aws service is available other than redshift ?

Regards, Ashok

posta un anno fa230 visualizzazioni
1 Risposta
1

You should run a Glue crawler against the S3 location of your parquet dataset. Then you will be able to query the data using Athena. You can use the console for Athena or APIs to run your queries.

AWS
Don_D
con risposta un anno fa
profile pictureAWS
ESPERTO
Chris_G
verificato un anno fa
  • Thank you for reply. What are the challenge in using the redshift serverless.?

  • Redshift Serverless is an option, you could use it to query the data in s3 by running the crawler and then create an external schema. If you need better performance you could COPY it in the serverless endpoint. There are cost differences with Athena and Redshift Serverless that should be compared.

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande