How do I store the output from an Amazon Athena query in a data format other than CSV?

2 minute read
1

I want to store Amazon Athena query results in a format other than CSV, such as JSON or Parquet.

Short description

Athena supports only CSV output files when you run SELECT queries. To store output from Athena in formats other than CSV, choose one of the following options:

The UNLOAD statement writes the output from a SELECT query in one of several different data formats, but does not create a new table. A CTAS query also writes output in different data formats, but creates a new table with that output. Both UNLOAD and CTAS support formats such as Apache Parquet, ORC, Apache Avro, and JSON.

Note: Both UNLOAD and CTAS support partitioning, but only CTAS supports bucketing.

Resolution

Run an UNLOAD query

To use the UNLOAD statement, put your SELECT query in parentheses after UNLOAD and specify the file type in a WITH statement. For example, the following query saves the output from SELECT as a JSON file in the Amazon Simple Storage Service (Amazon S3) bucket amzn-s3-example-bucket:

UNLOAD (SELECT col1, col2 FROM example_table)
    TO 's3://amzn-s3-example-bucket/example_folder/'
  WITH (format = 'JSON')

For more information on how to modify the file output from UNLOAD, see Parameters.

Run a CTAS query

To store the output from a CTAS query in a format other than CSV, configure the format property in a WITH statement. For example, the following query creates a new table called ctas-parquet-example from the data returned by SELECT in Parquet format:

`CREATE TABLE ctas_parquet_example   WITH (format = '`_`PARQUET`_`')     AS SELECT col1, col2, FROM example_table;`

For more information on CTAS parameters, see CTAS table properties.

If you need to keep only the output data file but not the new table, then drop the CTAS table after the query completes.

Related information

Considerations and limitations for CTAS queries

Examples of CTAS queries

AWS OFFICIAL
AWS OFFICIALUpdated a month ago