Questions tagged with Amazon Redshift
Games digital distribution platform
I want to enter the digital game distribution business with a service similar to Steam. I am new to all of this and need help and suggestions on how I can develop a Steam-like platform where:

1. There will be two interfaces: one for gamers and one for game developers.
2. Game code will be stored in the cloud.
3. User authentication will be needed.
4. Game license management will be needed.

Thanks
Amazon Redshift concurrency scaling - how long does scaling take to complete, and can a threshold be set to trigger it?
Hi Team, I have an existing Redshift cluster where I want to enable concurrency scaling. I have a few queries related to this:

1. My cluster, with 2 on-demand ra3.4xlarge nodes, has been running since March 2021. The AWS docs mention that a running Redshift cluster accrues 1 hour of free concurrency scaling credit every 24 hours, and that credits never expire. Does this mean my cluster has already accrued roughly 18 months × 30 credit hours, since concurrency scaling was never enabled for this cluster?
2. When does the concurrency scaling feature kick in? Is it only when queries start getting queued? Can we define some kind of threshold, such as CPU % utilization or memory % utilization, that would automatically start the concurrency scaling process?
3. How much time does it take for the cluster to complete the scaling process and start serving queries?

Thanks!
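The credit arithmetic in question 1 can be sketched as follows. The accrual rate (1 free credit hour per 24 hours of cluster runtime, credits never expiring) is the one cited in the question itself, and the dates are approximate:

```python
from datetime import date

# One hour of free concurrency scaling credit accrues per 24 hours a
# cluster runs; per the AWS docs cited above, credits never expire.
CREDIT_HOURS_PER_DAY = 1

def accrued_credit_hours(start: date, today: date) -> int:
    """Approximate free concurrency scaling credit hours accrued so far."""
    days_running = (today - start).days
    return days_running * CREDIT_HOURS_PER_DAY

# Cluster running since March 2021, checked ~18 months later:
hours = accrued_credit_hours(date(2021, 3, 1), date(2022, 9, 1))
print(hours)  # 549, i.e. roughly the "18 months * 30" the question estimates
```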
Is Redshift mixing up my data columns when creating a model?
Hello, I'm running:

```
create model predict_xxxxx
from (select col1, col2, col3 from my_table)
target col3
function predict_xxx
iam_role 'arn:aws:iam::xxxxxxx:role/RedshiftML'
problem_type regression
objective 'mse'
settings (
  s3_bucket 'redshiftml-xxxxxxx',
  s3_garbage_collect off,
  max_runtime 1800
);
```

This generates input data files in CSV format in the S3 bucket I specified, but when I open those files, all the columns from my `select` statement are present, yet the column headers are mismatched with the data below them: I see `col1` data under the `col2` header, and so on. I know the data is mixed up because the data types and numeric ranges are different for each column. I double-checked my table, and the columns and data are matched correctly there.

Is Redshift/SageMaker then using that mismatched data to train the model? I have tried with only two columns and it still gets mixed up. I've also tried using a table instead of a select expression, and the problem persists. Any insight is appreciated.

Thanks,
- SV
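One way to quantify the mismatch described above is to check, per header, how many values in the exported CSV actually parse as the type that column should hold. This is a local diagnostic sketch; the expected types and the sample data are hypothetical placeholders:

```python
import csv
import io

# Hypothetical expected Python type for each column in the exported CSV.
EXPECTED_TYPES = {"col1": int, "col2": float, "col3": float}

def header_alignment_report(csv_text: str) -> dict:
    """Return, per header, the fraction of rows whose value parses as
    the type we expect for that column. Low fractions suggest the data
    under that header came from a different column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in EXPECTED_TYPES}
    total = 0
    for row in reader:
        total += 1
        for name, typ in EXPECTED_TYPES.items():
            try:
                typ(row[name])
                counts[name] += 1
            except (TypeError, ValueError):
                pass
    return {name: counts[name] / total for name in counts} if total else {}

# Example: the values under "col1" carry decimals, so int() fails for all
# of them, hinting that another column's data landed under that header.
sample = "col1,col2,col3\n1.5,2,3.0\n2.5,4,5.0\n"
print(header_alignment_report(sample))  # {'col1': 0.0, 'col2': 1.0, 'col3': 1.0}
```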
Migrating partitioned table from postgres to Redshift with pglogical
I've created a DMS task with CDC and full load, migrating data from PostgreSQL 14 to Redshift. According to the documentation, when using pglogical and creating a PostgreSQL publication for my partitioned table with the 'publish_via_partition_root' parameter, changes should be published to the parent table rather than to the child tables. However, the data is still migrated to the child tables in Redshift and not to the parent table. Am I missing something that needs to be configured, or is it just not possible with DMS?
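For reference, the publication setup described above can be sketched as a small helper that builds the DDL. The publication and table names are placeholders; `publish_via_partition_root` is a real publication parameter in PostgreSQL 13 and later:

```python
def build_publication_ddl(pub_name: str, table: str) -> str:
    """DDL for a publication that publishes changes to a partitioned
    table under the parent (root) table's name, as the question describes."""
    return (
        f"CREATE PUBLICATION {pub_name} FOR TABLE {table} "
        f"WITH (publish_via_partition_root = true);"
    )

print(build_publication_ddl("dms_pub", "my_partitioned_table"))
```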
How to see the Egress Cost
Hi, my company is migrating from on-prem to AWS. I'm doing a POC to see what my egress cost will be once all the accounts move from on-prem to AWS. I have created the existing data model in Amazon Redshift and am accessing the data through Azure Power BI; I want to compare the cost of the import model vs. DirectQuery. I have a couple of reports in the Power BI service pointing to AWS Redshift tables. I couldn't get any solid information from the AWS team about how much data is being transferred from AWS to the public internet. I tried Cost Explorer and the Billing sections and didn't find anything useful. I would really appreciate it if someone could show me where I can find either data transfer or egress information.

Thanks,
Miky
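One place per-usage-type transfer numbers usually surface is the Cost Explorer API (`get_cost_and_usage` in boto3). A sketch that builds such a request, grouping by usage type so internet egress line items (names like `DataTransfer-Out-Bytes`, which vary by service and region, are an assumption here) can be picked out of the response:

```python
def egress_cost_request(start: str, end: str) -> dict:
    """Build a Cost Explorer get_cost_and_usage request that groups cost
    and usage by usage type; egress line items can then be filtered out
    of the response by name (e.g. those containing 'DataTransfer-Out')."""
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost", "UsageQuantity"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }

params = egress_cost_request("2022-09-01", "2022-09-30")
# With boto3 installed and credentials configured, this would be:
#   ce = boto3.client("ce")
#   response = ce.get_cost_and_usage(**params)
# then inspect response groups whose usage type mentions "DataTransfer-Out".
print(params["GroupBy"])
```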
Need help to understand Redshift Serverless costs
I'm using Redshift Serverless to run some tests, but I don't understand how it's being billed. I'm still on the $300 free trial, but I've already used almost $50 of it, while according to my calculations the cost should be less than $2 so far.

![Enter image description here](/media/postImages/original/IM7KWpN_prSgqg3s9BsfEJZQ)

I understand that Redshift Serverless is billed for RPUs and storage. But when I check the usage with:

```
select date_trunc('day', start_time) usage_date,
       sum(compute_seconds) total_compute_seconds,
       sum(compute_seconds)/(60*60) total_compute_hours,
       total_compute_hours*0.375 total_compute_cost
from sys_serverless_usage
group by date_trunc('day', start_time);
```

the result shows that the cost should be less than $2 so far:

![Enter image description here](/media/postImages/original/IM66nz2iuHQfesqSiwDkiArA)

Storage doesn't seem to be the cost either, as it shows $0 so far, using:

```
SELECT date_trunc('day', start_time) usage_date,
       SUM((data_storage/(1024*1024*1024))*(datediff(s,start_time,end_time)/3600.0)) AS GB_hours,
       GB_hours / 720 AS GB_months,
       GB_months*0.024 AS storage_cost_day
FROM sys_serverless_usage
GROUP BY 1
ORDER BY 1;
```

I need help understanding where the money is going: is there some fixed cost, or how is it draining so fast? I also tried to find Redshift Serverless in the Billing section, but it doesn't seem to be there (maybe because it's still under the free trial, though some services show up there even when their cost is $0).

Thanks in advance!
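The arithmetic behind the two queries above can be cross-checked locally. The rates ($0.375 per compute hour and $0.024 per GB-month) are the ones assumed in the question, not authoritative prices:

```python
# Cross-check of the cost arithmetic used in the two queries above.
COMPUTE_PRICE_PER_HOUR = 0.375     # rate assumed in the question
STORAGE_PRICE_PER_GB_MONTH = 0.024 # rate assumed in the question
HOURS_PER_MONTH = 720

def compute_cost(compute_seconds: float) -> float:
    """Compute cost, as in the first query: seconds -> hours -> dollars."""
    return compute_seconds / 3600 * COMPUTE_PRICE_PER_HOUR

def storage_cost(gb_hours: float) -> float:
    """Storage cost, as in the second query: GB-hours -> GB-months -> dollars."""
    return gb_hours / HOURS_PER_MONTH * STORAGE_PRICE_PER_GB_MONTH

# e.g. 5 hours of compute plus 10 GB stored for a full month:
print(round(compute_cost(5 * 3600), 3))              # 1.875
print(round(storage_cost(10 * HOURS_PER_MONTH), 3))  # 0.24
```

If the queries report under $2 at these rates while the bill shows ~$50, the gap is likely outside what `sys_serverless_usage` captures, which is exactly what the question is asking about.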
Redshift Serverless timeout when connecting with Python
I tried to use redshift_connector to connect to my Redshift Serverless workgroup from Python, but I get a timeout error:

```
import redshift_connector

conn = redshift_connector.connect(
    host="default.XXXXXXXXX.us-east-1.redshift-serverless.amazonaws.com",
    database='dev',
    access_key_id="XXXXXXXXX",
    secret_access_key="XXXXXXX",
    port=5439,
    region="us-east-1"
)
```

Result:

```
redshift_connector.error.InterfaceError: ('communication error', TimeoutError(10060,
```

1. I am using my default workgroup, which uses my default VPC with open inbound rules for my IP and port.
2. I enabled public access to the workgroup.

Wasted 3 hours on this and finally used Google BigQuery...
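A timeout like this usually means TCP to port 5439 is blocked before the driver ever speaks the protocol. A quick stdlib check (the endpoint below is the placeholder from the question) can separate network reachability from driver or auth issues:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the
    timeout. False here points at security groups, public accessibility,
    or routing rather than at redshift_connector itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from the same machine that times out (placeholder endpoint):
# print(port_reachable(
#     "default.XXXXXXXXX.us-east-1.redshift-serverless.amazonaws.com", 5439))
```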
[Bug] AVG() function does not work in Redshift materialized view
If I create a materialized view using the AVG() function, querying data from that view raises an error, while the same function works just fine in a normal view. I'm attaching below the query to replicate this bug. ![SQL query to replicate the bug](/media/postImages/original/IMaVOyoYEuS6SygJFWv_5gmw)