Skip to content

How do I calculate Amazon Redshift Spectrum query charges?

6 minute read
0

I want to calculate and optimize costs when I use Amazon Redshift Spectrum to query Amazon Simple Storage Service (Amazon S3) data.

Short description

Prerequisites:

  • An Amazon Redshift cluster and S3 bucket that are in the same AWS Region.
  • An SQL client that's connected to your cluster to run SQL commands.

You're charged for the number of bytes that Redshift Spectrum scans from Amazon S3. You might incur additional charges depending on the Region. Redshift Spectrum rounds up byte numbers to the next megabyte, with a minimum of 10 MB per query. For more information, see Amazon Redshift pricing.

Note: Redshift Spectrum doesn't charge for Data Definition Language (DDL) statements, CREATE, ALTER, and DROP TABLE, that manage partitions and failed queries. Your Amazon Redshift Serverless compute capacity charges include all Amazon S3 external data queries.

Resolution

To calculate the estimated query cost and to get a summary of all S3 queries that you run in Redshift Spectrum, use the SVL_S3QUERY_SUMMARY table. The s3_scanned_bytes column returns the number of bytes that Redshift Spectrum scans from S3 and sends to the Redshift Spectrum layer.

Usage

Run the following query against SVL_S3QUERY_SUMMARY to determine the number of bytes that Amazon S3 transfers by queryID:

SELECT s3_scanned_bytes
FROM SVL_S3QUERY_SUMMARY
WHERE query=queryID;

Note: Replace queryID with your query's ID.

To determine the sum of all bytes scanned from S3, run the following query:

SELECT sum(s3_scanned_bytes)
FROM SVL_S3QUERY_SUMMARY;

You can also determine the sum of bytes for all Redshift Spectrum queries that you run in a specific time interval. The following example shows how to calculate the total bytes from queries that you started to run from the previous day:

SELECT sum(s3_scanned_bytes)
FROM SVL_S3QUERY_SUMMARY
WHERE starttime >= current_date-1;

If you run the following query against an S3 bucket in the US East (N. Virginia) Region, then Redshift Spectrum charges per terabyte. If the sum for s3_scanned_bytes returns 621,900,000,000 bytes when you query SVL_S3QUERY_SUMMARY, then you have 0.565614755032584 terabytes when you convert from bytes to terabytes.

Example query:

621900000000 bytes = 621900000000/1024 = 607324218.75 kilobytes
607324218.75 kilobytes = 607324218.75/1024 =  593090.057373046875 megabytes
593090.057373046875  megabytes =  593090.057373046875 /1024 = 579.189509153366089 gigabytes
579.189509153366089  gigabytes =  579.189509153366089/1024 = 0.565614755032584 terabytes

In the following example, your usage is approximately 0.5657 terabytes. To calculate the Redshift Spectrum usage cost, multiply the cost per terabyte:

$5 * 0.5657= $2.83

Run the following SQL query to calculate the Redshift Spectrum usage charges:

SELECT
   round(1.0*sum(s3_scanned_bytes/1024/1024/1024/1024),4) s3_scanned_tb,
   round(1.0*5*sum(s3_scanned_bytes/1024/1024/1024/1024),2) cost_in_usd 
FROM SVL_S3QUERY_SUMMARY;

The preceding example queries the charges in Redshift Spectrum against your S3 bucket for data that the query scanned from the previous day.

Note: All queries that scan up to 9.9 MB are rounded up and charged for 10 MB. There are no charges for failed or stopped queries.

Also, system log tables (STL) keep only 2–5 days of log history depending on log usage and available disk space. To keep a record of transferred bytes, it's a best practice to calculate the daily query charges and store the data in another table.

Example query:

CREATE VIEW spectrum_cost AS
SELECT starttime::date as date, xid, query, trim(usename) as user, 
  CASE WHEN s3_scanned_bytes < 10000000 then 10 ELSE s3_scanned_bytes/1024/1024 end as scanned_mb, 
  round(CASE WHEN s3_scanned_bytes < 10000000 then 10*(5.0/1024/1024) 
  ELSE (s3_scanned_bytes/1024/1024)*(5.0/1024/1024) end,5) as cost_$ 
FROM svl_s3query_summary s 
LEFT JOIN pg_user u ON userid=u.usesysid 
JOIN 
(select xid as x_xid,max(aborted) as x_aborted from svl_qlog group by xid) q 
ON s.xid=q.x_xid 
WHERE userid>1 AND x_aborted=0
AND s.starttime >= current_date-1;

Note: You can also use the CREATE TABLE query to calculate and store the data in another table. If you don't want to specify a time period, then remove current_date-1.

Calculate the total sum of data that a query scans from S3 to Redshift Spectrum from the day before. To calculate the total estimate of charges, run the following query:

SELECT current_date-1 as query_since, SUM(scanned_mb) as total_scanned_mb, SUM(cost_$) as total_cost_$
FROM spectrum_cost;
Result:
  query_since | total_scanned_mb | total_cost_$
--------------+------------------+---------------
 2020-05-15   |            5029  |      0.02515

Redshift Spectrum best practices

To reduce the query charges and improve Redshift Spectrum's performance, use the following best practices:

  • Use cost controls for Redshift Spectrum and concurrency scaling features to monitor and control your usage.
  • Use Optimized Data Formats to improve performance and lower costs. Use columnar data formats such as PARQUET and ORC to select only the columns you want to scan from Amazon S3.
  • If you infrequently access the data, then load the data in Amazon S3 and use Redshift Spectrum.
  • When you use multiple Amazon Redshift clusters to scale concurrency, stop the clusters as soon as the jobs are complete.

Cost controls and concurrency scaling for Redshift Spectrum

If you use the cost controls and concurrency scaling feature for Redshift Spectrum, then you can create daily, weekly, and monthly usage quotas. When you reach the usage quotas, Amazon Redshift automatically takes action.

To configure the cost control, complete the following steps:

  1. Open the Amazon Redshift console.
  2. Choose Configure usage limit.
  3. Update the following configuration settings:
    Time period
    Usage limit
    Action
    Note: The Action settings can help you manage your usage quotas.

To configure the concurrency scaling usage quota, complete the following steps:

  1. Open the Amazon Redshift console.
  2. Choose Concurrency scaling usage limit as your usage quota.
  3. Update the following configuration settings:
    Time period
    Usage limit
    Action
    Note: The Time period is in the UTC time zone. For the Action setting, you can also attach an Amazon Simple Notification Service (Amazon SNS) subscription to the Alert and Disable features. If you use the Amazon Redshift console to activate an alert, then an Amazon CloudWatch alarm is automatically created for the metrics.

Additional cost control requirements and limitations

To manage your Redshift Spectrum usage and cost, review the following requirements and limitations:

  • Usage quotas are available with supported versions 1.0.14677 or later.
  • You can add up to four quotas and actions for each category with a total quota of eight.
  • Only Regions where Redshift Spectrum and concurrency scaling are available support Redshift Spectrum.
  • Only one quota for each feature can use the Disable feature.
  • Usage quotas persist until you delete the usage quota definition or the cluster.
  • If you create a quota in the middle of a period, then the quota is measured from the middle point to the end of the period.
  • If you choose log options, then review the details in the STL_USAGE_CONTROL logs.

Related information

Setting usage limits, including setting RPU limits

SVL_QLOG

System tables and views reference

Best practices for Amazon Redshift Spectrum

AWS OFFICIALUpdated a year ago