Questions tagged with Amazon Redshift

I have a Redshift table that I would like to grant a user access to. If I run `GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO user_name;` it works. However, if I run `GRANT SELECT ON schema_name.table_name TO user_name;` I get the following error: `ERROR: Operation not supported on external tables`. Any ideas why? Thanks!
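The error suggests `schema_name` is an external (Spectrum) schema, where grants on individual external tables aren't supported. A minimal SQL sketch of the schema-level grant pattern usually used instead, reusing the placeholder names from the question:
```
GRANT USAGE ON SCHEMA schema_name TO user_name;                 -- external schemas are granted at the schema level
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO user_name;  -- the statement that already works above
```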
0
answers
0
votes
7
views
asked 17 hours ago
I have an array stored in an S3 bucket that looks like this:
```
[ { "bucket_name": "ababa", "bucket_creation_date": "130999", "additional_data": { "bucket_acl": [ { "Grantee": { "DisplayName": "abaabbb", "ID": "abaaaa", "Type": "CanonicalUser" }, "Permission": "FULL_CONTROL" } ], "bucket_policy": { "Version": "2012-10-17", "Id": "abaaa", "Statement": [ { "Sid": "iddd", "Effect": "Allow", "Principal": { "Service": "logging.s3.amazonaws.com" }, "Action": "s3:PutObject", "Resource": "aarnnn" }, { "Effect": "Deny", "Principal": "*", "Action": [ "s3:GetBucket*", "s3:List*", "s3:DeleteObject*" ], "Resource": [ "arn:aws:s3:::1111-aaa/*", "arn:aws:s3:::1111-bbb" ], "Condition": { "Bool": { "aws_SecureTransport": "false" } } } ] }, "public_access_block_configuration": { "BlockPublicAcls": true, "IgnorePublicAcls": true, "BlockPublicPolicy": true, "RestrictPublicBuckets": true }, "website_hosting": {}, "bucket_tags": [ { "Key": "keyyy", "Value": "valueee" } ] }, "processed_data": {} }, ....................... ]
```
NOTE: some of the fields may be string/array/struct depending on the data we get (e.g. `Action` can be an array or a string).

END GOAL: I want to query this data for multiple conditions and then create a field inside `processed_data` and set it to true/false based on the result, using AWS Glue. Example: for each object inside the array, I want to check:
```
1- if bucket_acl has Grantee.Type=CanonicalUser and Permission=FULL_CONTROL
AND
2- if bucket_policy has a statement that contains Effect=Allow and Principal=* and Action = ...... and Resource = ...... and Condition is empty
AND
3- website_hosting is empty
and then create a field inside processed_data and set it to true if the above conditions are satisfied, e.g. processed_data: { isPublic: True }
```
Approaches I tried:

1- Saving the data to an S3 bucket in Parquet format using aws-wrangler/aws-pandas for faster querying, and then reading it in AWS Glue with a Glue dynamic frame:
```
S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={},
    connection_type="s3",
    format="parquet",
    connection_options={"paths": ["s3://abaabbb/abaaaaa/"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)
S3bucket_node1.printSchema()
S3bucket_node1.show()
```
Output:
```
root
|-- bucket_name: string
|-- bucket_creation_date: string
|-- additional_data: string
|-- processed_data: string

{"bucket_name": "abaaaa", "bucket_creation_date": "139999", "additional_data": "{'bucket_acl': [{'Grantee': {'DisplayName': 'abaaaaaa', 'ID': 'abaaa', 'Type': 'CanonicalUser'}, 'Permission': 'FULL_CONTROL'}], 'bucket_policy': {}, 'public_access_block_configuration': {'BlockPublicAcls': True, 'IgnorePublicAcls': True, 'BlockPublicPolicy': True, 'RestrictPublicBuckets': True}, 'website_hosting': {}, 'bucket_tags': []}", "processed_data": "{}"}
```
Everything comes back as a string; it seems most of these libraries don't support nested data types.

2- Saving the data as-is (in JSON) using the PutObject API and then reading it in AWS Glue with a Glue dynamic frame:
```
piece1 = glueContext.create_dynamic_frame.from_options(
    format_options={"multiline": True},
    connection_type="s3",
    format="json",
    connection_options={"paths": ["s3://raghav-test-df/raghav3.json"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)
piece1.printSchema()
piece1.show()
piece1.count()
```
Output:
```
root

0
```
I get no schema and a count of 0.

3- Reading the data with a Spark DataFrame:
```
sparkDF = spark.read.option("inferSchema", "true").option("multiline", "true").json("s3://ababa/abaa.json")
sparkDF.printSchema()
sparkDF.count()
sparkDF.show()
```
Output:
```
root
|-- additional_data: struct (nullable = true)
| |-- bucket_acl: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- Grantee: struct (nullable = true)
| | | | |-- DisplayName: string (nullable = true)
| | | | |-- ID: string (nullable = true)
| | | | |-- Type: string (nullable = true)
| | | |-- Permission: string (nullable = true)
| |-- bucket_policy: struct (nullable = true)
| | |-- Id: string (nullable = true)
| | |-- Statement: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- Action: string (nullable = true)
| | | | |-- Condition: struct (nullable = true)
| | | | | |-- Bool: struct (nullable = true)
| | | | | | |-- aws:SecureTransport: string (nullable = true)
| | | | | |-- StringEquals: struct (nullable = true)
| | | | | | |-- AWS:SourceAccount: string (nullable = true)
| | | | | | |-- AWS:SourceArn: string (nullable = true)
| | | | | | |-- aws:PrincipalAccount: string (nullable = true)
| | | | | | |-- s3:x-amz-acl: string (nullable = true)
| | | | |-- Effect: string (nullable = true)
| | | | |-- Principal: string (nullable = true)
| | | | |-- Resource: string (nullable = true)
| | | | |-- Sid: string (nullable = true)
| | |-- Version: string (nullable = true)
| |-- bucket_tags: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- Key: string (nullable = true)
| | | |-- Value: string (nullable = true)
| |-- public_access_block_configuration: struct (nullable = true)
| | |-- BlockPublicAcls: boolean (nullable = true)
| | |-- BlockPublicPolicy: boolean (nullable = true)
| | |-- IgnorePublicAcls: boolean (nullable = true)
| | |-- RestrictPublicBuckets: boolean (nullable = true)
|-- bucket_creation_date: string (nullable = true)
|-- bucket_name: string (nullable = true)
```
This gives the schema and the correct count, but some of the fields have varying data types (e.g. `Action` can be a string or an array) and Spark defaults them to string; I think querying the data for multiple conditions using SQL will be too complex. Do I need to change the approach, or something else? I am stuck here. Can someone please help me achieve the end goal?
0
answers
0
votes
18
views
asked a day ago
Hi All, in one of our Redshift databases most columns are defined with a length of 500 bytes even though the real data they hold is 50 bytes or less. Our understanding was that, because Redshift compresses columns, the physical disk storage would automatically get rid of those unused ~450 bytes (500 bytes - 50 bytes) and only hold the compressed version of the ~50 bytes of actual data, so defining a larger length for a column wouldn't hurt us. However, after seeing the doc below, that assumption seems to be simply wrong. It says that during query execution Redshift parks intermediate results in temporary table format in each compute node's memory, and that data is stored uncompressed, so it consumes the defined data type size rather than the actual or compressed data length. https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-smallest-column-size.html I want to understand from the experts here: 1) Is this a real issue, and does any ready-made dictionary/system table/view exist that logs all the tables/columns we would need to fix to avoid this performance overhead? 2) We also ran some sample queries to test the above behaviour and verified the results from SVL_QUERY_REPORT and SVL_QUERY_SUMMARY; they show a 75% degradation in performance. I'm not sure if that's the correct way of checking; is there any other way to verify the performance (or DB resource usage like CPU, I/O, and memory consumption) for a query execution in Redshift? Please guide me here.
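As a starting point for question 1, a minimal SQL sketch that lists over-wide VARCHAR columns from the SVV_COLUMNS system view (the 256-byte cutoff is just an illustrative threshold, not a recommendation):
```
-- Candidate columns whose declared VARCHAR length may be much larger than the data
SELECT table_schema,
       table_name,
       column_name,
       character_maximum_length AS declared_length
FROM svv_columns
WHERE data_type = 'character varying'
  AND character_maximum_length > 256
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY character_maximum_length DESC;
```
Comparing declared_length against MAX(OCTET_LENGTH(column)) per table would then show which columns can actually be shrunk.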
2
answers
0
votes
24
views
asked 3 days ago
Until about two days ago, I was able to see queries that were currently executing in the console; now filtering by "Running" always comes back empty. No matter how much activity is occurring, queries don't show up until after they have finished. Querying tables like `stv_inflight` and `stv_query_metrics` does let me get the information I need, but it was nice when it was all available on the console. Is this a bug, or did something possibly change on my side?
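For reference, a small SQL sketch of the system-table workaround mentioned above, using STV_RECENTS to list statements that are still running:
```
-- Statements currently executing on the cluster
-- (STV_RECENTS also keeps recently completed statements with status = 'Done')
SELECT pid,
       user_name,
       duration,
       TRIM(query) AS query_text
FROM stv_recents
WHERE status = 'Running'
ORDER BY duration DESC;
```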
1
answers
0
votes
23
views
Jon
asked 4 days ago
Pretty basic newbie Redshift question here. We want to upload historical test data to a Redshift database, and the data as we have it is in multiple CSVs, formatted in a typical wide table layout with columns like this:

| Time | Var1 | Var2 | Var3 |
| ---- | ---- | ---- | ---- |
| 0    | 2    | 3    | 4    |
| 1    | 4    | 5    | 6    |
| 2    | 6    | 7    | 8    |

In the past, for other projects, we've had data imported to Redshift using COPY from S3, and the data was already formatted like below, which I've read is more efficient for querying in Redshift:

| Time | Var  | Value |
| ---- | ---- | ----- |
| 0    | Var1 | 2     |
| 0    | Var2 | 3     |
| 0    | Var3 | 4     |
| 1    | Var1 | 4     |
| 1    | Var2 | 5     |
| 1    | Var3 | 6     |
| 2    | Var1 | 6     |
| 2    | Var2 | 7     |
| 2    | Var3 | 8     |

For this new data, do we need to transform it to match the second (long) layout before uploading to Redshift? Are there situations where this reformatting is not necessary?
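If the reshape is needed, a minimal pandas sketch of the wide-to-long transform before COPY (file names are placeholders):
```
import pandas as pd

# Hypothetical wide CSV with columns Time, Var1, Var2, Var3
wide = pd.read_csv("historical_test_data.csv")

# Unpivot every Var* column into (Time, Var, Value) rows
long = wide.melt(id_vars=["Time"], var_name="Var", value_name="Value")

# Write out without a header so COPY can load it directly
long.to_csv("historical_test_data_long.csv", index=False, header=False)
```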
2
answers
0
votes
32
views
asked 8 days ago
Following this doc: https://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html, it states: "A database snapshot is created within a transaction on the first occurrence of most SELECT statements, DML commands such as COPY, DELETE, INSERT, UPDATE". We have a requirement for Redshift snapshot isolation to also create a snapshot on RENAME commands. Is this supported? If not, how should we approach concurrent transactions that include RENAME commands?
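If snapshotting on RENAME turns out not to be supported, one workaround sometimes used is to serialize the competing transactions explicitly with LOCK. A rough SQL sketch, assuming the rename is permitted inside an explicit transaction in your workload (table names are placeholders):
```
BEGIN;
-- Take an exclusive lock so concurrent transactions queue behind this one
-- instead of racing the rename
LOCK my_schema.my_table;
ALTER TABLE my_schema.my_table RENAME TO my_table_old;
COMMIT;
```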
1
answers
0
votes
20
views
AWS
asked 9 days ago
I found the SLA for Redshift (clustered) at https://aws.amazon.com/redshift/sla/ but I can't find anything related to Serverless. Is it the same as Redshift multi-node? Could you help me with this?
1
answers
0
votes
26
views
asked 11 days ago
According to the docs [HERE](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-security.html), I can't understand why I should associate multiple roles with a namespace. If anyone knows, please help.
```
You can associate multiple roles to a namespace using the console, as described previously in this section. You can also use the API command CreateNamespace, or the CLI command create-namespace. With the API or CLI command, you can assign IAM roles to the namespace by populating IAMRoles with one or more roles. Specifically, you add ARNs for specific roles to the collection.
```
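A common reason is simply that different workloads need different permissions, for example one role scoped to S3 COPY/UNLOAD and another for querying other services. A minimal boto3 sketch of the CreateNamespace call from the quoted passage; the namespace name and role ARNs are made-up placeholders:
```
import boto3

client = boto3.client("redshift-serverless")

client.create_namespace(
    namespaceName="analytics-ns",
    # Several roles can be attached, each scoped to a different need.
    iamRoles=[
        "arn:aws:iam::111122223333:role/redshift-s3-copy-role",
        "arn:aws:iam::111122223333:role/redshift-spectrum-role",
    ],
    defaultIamRoleArn="arn:aws:iam::111122223333:role/redshift-s3-copy-role",
)
```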
Accepted Answer
Amazon Redshift
2
answers
0
votes
21
views
hai
asked 11 days ago
I have a materialized view with autorefresh set to on. When I script it out, it doesn't show the `AUTO REFRESH YES` argument. select autorefresh,* from STV_MV_INFO; ![Enter image description here](/media/postImages/original/IMONWe0B69SXyyHUx4NZwOZQ) It shows this when I right-click on the materialized view and click "View definition": `CREATE MATERIALIZED VIEW globaldata.vwdeviceplatform AS select xyz..` when it should show this: `CREATE MATERIALIZED VIEW globaldata.vwdeviceplatform AUTO REFRESH YES AS select xyz...` Also, strangely, getting the view definition through here doesn't script out the autorefresh portion either: SELECT pg_catalog.pg_get_viewdef('globaldata.vwdeviceplatform'::regclass::oid, true); I'm signed in to AWS Query Editor v2 via Okta and I'm using a role, as opposed to a direct Redshift login/password. Is this just a limitation?
1
answers
0
votes
35
views
asked 16 days ago
We are trying to restore a snapshot of a Redshift cluster from one account into a new account. The accounts are unlinked. The snapshot process and copy work fine. When we try to restore the snapshot in the new account, it requires access to the Redshift KMS key from the source account. *The source cluster for this snapshot is encrypted. When you restore, the target database will be encrypted as well. You can't unencrypt the data as part of the restore operation. After you restore, you can change encryption settings.* Here it prompts for access to the source key. I am not sure how, or whether, it is possible to give the new/destination account or my user access to the key from the old source account. The source Redshift cluster uses the AWS-managed aws/redshift key from the source account. We want to bring over the whole snapshot, Redshift users and all, not just the data.
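For what it's worth, a hedged boto3 sketch of the usual cross-account sharing steps. It assumes the snapshot is encrypted with a customer-managed KMS key; the AWS-managed aws/redshift key's policy can't be opened to another account, so the cluster or snapshot would first have to be moved to a customer-managed key. All identifiers below are placeholders:
```
import boto3

kms = boto3.client("kms")
redshift = boto3.client("redshift")

key_id = "arn:aws:kms:us-east-1:111111111111:key/11111111-2222-3333-4444-555555555555"  # placeholder
destination_account = "222222222222"  # placeholder

# Let the destination account use the customer-managed key that encrypts the snapshot.
kms.create_grant(
    KeyId=key_id,
    GranteePrincipal=f"arn:aws:iam::{destination_account}:root",
    Operations=["Decrypt", "DescribeKey", "CreateGrant"],
)

# Share the manual snapshot with the destination account.
redshift.authorize_snapshot_access(
    SnapshotIdentifier="my-manual-snapshot",  # placeholder
    AccountWithRestoreAccess=destination_account,
)
```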
2
answers
0
votes
38
views
asked 17 days ago
One of our users occasionally gets an error on a view that has autorefresh on. The name of the materialized view is `vwdeviceplatform`. Is it because the view is being auto-refreshed at the time they are running the query? Is there any way to avoid that? We don't want to schedule manual refreshes. ` Invalid operation. Relation "mv_tbl_vwdeviceplatform_0" does not exist. ` The user is using DBeaver.
1
answers
0
votes
45
views
asked 18 days ago
I'm writing into Redshift and realized Glue 4.0 is probably optimizing the column sizes. Summary of the error:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o236.pyWriteDynamicFrame.
: java.sql.SQLException: Error (code 1204) while loading data into Redshift: "String length exceeds DDL length"
Table name: "PUBLIC"."table_name"
Column name: column_a
Column type: varchar(256)
```
In previous Glue versions, string columns were always created as varchar(65535), but now my tables are created with varchar(256), and writes into some columns fail with this error. Will this also happen with other data types? How can I solve this within Glue 4.0?
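One workaround often suggested for the Spark Redshift connector is to attach `maxlength` column metadata before writing, so the generated DDL widens the column. A rough sketch, assuming an existing Glue job with `glueContext` and a dynamic frame `dyf`, and that Glue 4.0's connector honors this metadata; the connection, bucket, and table names are placeholders:
```
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import col

# Widen the offending string column by attaching "maxlength" metadata,
# which the connector reads when it generates the CREATE TABLE DDL.
df = dyf.toDF()
df = df.withColumn(
    "column_a",
    col("column_a").alias("column_a", metadata={"maxlength": 65535}),
)

widened = DynamicFrame.fromDF(df, glueContext, "widened")

# Write back to Redshift through the existing catalog connection (placeholder names).
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=widened,
    catalog_connection="my-redshift-connection",
    connection_options={"dbtable": "public.table_name", "database": "dev"},
    redshift_tmp_dir="s3://my-temp-bucket/redshift-temp/",
    transformation_ctx="write_redshift",
)
```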
1
answers
0
votes
47
views
asked 20 days ago