Unanswered Questions tagged with Amazon Redshift
Weird DMS Error: "Replication Task Settings document error: Array element's unique identifier was not found, please provide Id for each and every element"
I'm trying to set up DMS between RDS MySQL and Redshift. If I try to edit the Table Mappings from the GUI, I always get a weird error (that has zero Google results, except one spam page): `Replication Task Settings document error: Array element's unique identifier was not found, please provide Id for each and every element`. No idea what that means; every rule has a unique name and ID. It's not the biggest issue in the world, since it's usually fine if I create a new task instead, but it does make it hard to iterate. Any idea what the issue here could be? Thanks!
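For reference, this is the general shape DMS expects for a table-mapping document: every element of the `rules` array carries its own `rule-id` and `rule-name`. A minimal sketch (the schema and rule names here are placeholders, not from the post):

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-mydb",
      "object-locator": {
        "schema-name": "mydb",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
```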
Redshift is translating 'dbo' schema name into 'public' schema name
We have two Redshift clusters. One cluster updated to version `1.0.44126` over the weekend. After that, we started experiencing failures in some data extract processes. The error was: `ERROR: schema "dbo" does not exist`

Sure enough, after reviewing the queries, we had inadvertently used `dbo` instead of `public` in the schema name. However, it worked for months prior to this latest Redshift version. I cannot find any information on version `1.0.44126`, and I cannot find any documented feature that would translate these schema names transparently. **Why is Redshift doing this?**

I can reproduce the behavior on cluster version `1.0.43931`:

```sql
create table public.test_table
(
    test integer
);

select * from dbo.test_table;

drop table dbo.test_table;
```

The above all succeeds on cluster version `1.0.43931` and fails on `1.0.44126`.
COPY command ignoring defaults if some values are present in specific fields
I am trying to load a Redshift table from a .csv file in an S3 bucket. When I created the table, three columns were set to have default values:

```sql
,"current_row_ind"  CHAR(1) DEFAULT 'Y'
,"EFFECTIVE_DATE"   DATE DEFAULT TO_DATE(CURRENT_DATE, 'YYYY-MM-DD')
,"EXPIRATION_DATE"  DATE DEFAULT TO_DATE('9999-12-31', 'YYYY-MM-DD')
```

Some of the rows in my file have values for these fields; some do not. When I use the COPY command to load the table, Redshift seems to ignore the defaults for the fields that do not have values filled in:

* If I don't specify those fields in the column list but leave them in my file, I get the error "Extra column(s) found".
* If I do specify those fields in the column list, the rows that do not have values are loaded with NULL rather than the defaults.
* If I leave those fields off the COPY column list and remove them from my file, they load with the defaults. Obviously this won't work, as some of the rows have legitimate values.

These are the parameters I'm using in my COPY command:

```
FORMAT AS CSV
DELIMITER ','
QUOTE '"'
ACCEPTINVCHARS '?'
TIMEFORMAT 'auto'
IGNOREHEADER 1
ACCEPTANYDATE
BLANKSASNULL
DATEFORMAT 'auto'
EMPTYASNULL
EXPLICIT_IDS
FILLRECORD
IGNOREBLANKLINES
TRIMBLANKS
REGION AS 'us-west-2'
```

My question is: how can I use the COPY command to load this table and have it pick up the values for these fields when they are present in the file, but use the defaults when they are not? I have researched this here on re:Post as well as other forums such as StackOverflow, but have not found anything that addresses this specific scenario. Any help would be appreciated.
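One pattern sometimes used for this kind of partial-default load (a sketch, not from the post; `target_table`, `stage_table`, the bucket, and the IAM role are hypothetical names): COPY into a staging table where the columns are nullable, then apply the defaults explicitly with NVL on insert:

```sql
-- Staging table shaped like the target, but with no defaults needed.
CREATE TEMP TABLE stage_table (LIKE target_table);

COPY stage_table FROM 's3://my-bucket/my-file.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-copy-role'
FORMAT AS CSV IGNOREHEADER 1 BLANKSASNULL EMPTYASNULL
REGION 'us-west-2';

-- Apply defaults only where the file left the column empty.
INSERT INTO target_table
SELECT ...,
       NVL(current_row_ind, 'Y'),
       NVL(effective_date, CURRENT_DATE),
       NVL(expiration_date, TO_DATE('9999-12-31', 'YYYY-MM-DD'))
FROM stage_table;
```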
HIVE_UNSUPPORTED_FORMAT: Output format org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat with SerDe org.openx.data.jsonserde.JsonSerDe is not supported. If a data manifest file was generated
HIVE_UNSUPPORTED_FORMAT: Output format org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat with SerDe org.openx.data.jsonserde.JsonSerDe is not supported. If a data manifest file was generated at 's3://athena-one-output-bucket/Unsaved/2022/11/24/0a5467bf-8b9a-4119-bc89-c891d1e26744-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account. This query ran against the "covid_dataset" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 0a5467bf-8b9a-4119-bc89-c891d1e26744
Column headers are not showing when reading data from Redshift into Jupyter on SageMaker
Hello, I am reading a specific table from a Redshift database using redshift_connector. When I view the dataframe, it does not show the column headers; it shows only numbers as the column headers. Can anyone help with what's wrong here? We need to view the table headers.
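In the DB-API, column names travel on `cursor.description` (the first field of each entry), so a dataframe built from `fetchall()` has to be given the names explicitly; `redshift_connector`'s `cursor.fetch_dataframe()` is also supposed to carry them over. A minimal sketch of the name-extraction step, using a hypothetical description tuple:

```python
def column_names(description):
    """First field of each cursor.description entry is the column name (DB-API)."""
    return [entry[0] for entry in description]

# Hypothetical description, shaped like a DB-API cursor would return it:
fake_description = [
    ("user_id", None, None, None, None, None, None),
    ("email",   None, None, None, None, None, None),
]
print(column_names(fake_description))  # → ['user_id', 'email']

# With a live connection this would be:
# df = pandas.DataFrame(cursor.fetchall(), columns=column_names(cursor.description))
```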
Get full query logs in redshift serverless
Hi friends, we need to access the queries executed in our Redshift Serverless instance. There is the `sys_query_history` view (https://docs.aws.amazon.com/redshift/latest/dg/SYS_QUERY_HISTORY.html), but its `query_text` column has a 4,000-character limit; anything longer is truncated. Is there another way to get the full query text in Redshift Serverless?
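If the `SYS_QUERY_TEXT` view is available on the workgroup, the full statement is stored there in chunks ordered by a `sequence` column, and can be reassembled with LISTAGG. A sketch, assuming that view's layout (check its exact columns in your version; the query id is a placeholder, and note LISTAGG itself has a 64K result limit):

```sql
SELECT query_id,
       LISTAGG(text) WITHIN GROUP (ORDER BY sequence) AS full_query_text
FROM sys_query_text
WHERE query_id = 12345  -- hypothetical query id
GROUP BY query_id;
```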
RedShift Serverless timeout when connecting with python
I tried to use redshift_connector to connect to my Redshift cluster using Python, but I get a timeout error:

```
import redshift_connector

conn = redshift_connector.connect(
    host="default.XXXXXXXXX.us-east-1.redshift-serverless.amazonaws.com",
    database='dev',
    access_key_id="XXXXXXXXX",
    secret_access_key="XXXXXXX",
    port=5439,
    region="us-east-1"
)
```

Result:

```
redshift_connector.error.InterfaceError: ('communication error', TimeoutError(10060,
```

1. I am using my default workgroup, which uses my default VPC with open inbound rules for the IP and ports.
2. I enabled public access to the workgroup.

Wasted 3 hours on this and finally used Google BigQuery...
Problem with Federated Query to RDS Assert code: 1000
We are connecting Redshift to RDS using Federated Queries. When we try to query very simple tables from Postgres, like Month (id int4 / name text) or Practices (text, text, text), we get errors like:

```
ERROR:
-----------------------------------------------
error:     Assert
code:      1000
context:   reltuples >= 0.0 - Number of rows cannot be negative
query:     0
location:  pgclient.cpp:288
process:   padbmaster [pid=14019]
-----------------------------------------------
[ErrorId: 1-630676b5-594e911339e1d0341291f074]
```

One potentially useful detail: we enabled `enable_case_sensitive_identifier = true` because the table names on RDS are PascalCase. The tables are small, so I don't know whether the query optimization engine is causing these errors. Any information would help us. Thanks in advance.
[Python UDF] Failed to import library after CREATE LIBRARY
I want to write a Python UDF that uses `scikit-learn`. Here's the command that I'm running:

```sql
CREATE OR REPLACE LIBRARY scikit_learn
LANGUAGE plpythonu
FROM '...'
```

I've uploaded a Python 2.7 package as a `.zip` file to http://file.io. Afterwards, when I try to run a function that uses `sklearn`, I get: `ImportError: No module named sklearn.covariance. Please look at svl_udf_log for more information`. Why does the import fail even though the package was downloaded?
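One thing worth checking (an assumption about the cause, not confirmed by the post): `CREATE LIBRARY` expects the importable package directory at the top level of the archive, so a zip whose root is `sklearn/__init__.py` can be imported as `sklearn`, while a zip wrapped in an extra folder cannot. Also note that Redshift UDF libraries must be pure Python, and `scikit-learn` ships compiled extensions. A quick sketch to inspect an archive's layout before uploading (the archive here is built on the fly for the demo):

```python
import zipfile

def top_level_entries(zip_path):
    """Return the distinct top-level names inside a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})

# Hypothetical archive with the package dir at the root, as CREATE LIBRARY expects:
with zipfile.ZipFile("demo_pkg.zip", "w") as zf:
    zf.writestr("sklearn/__init__.py", "")
    zf.writestr("sklearn/covariance/__init__.py", "")

print(top_level_entries("demo_pkg.zip"))  # → ['sklearn']
```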
Complex lag window function help needed
I'm racking my brain about how to do this and I don't think it's possible, but I'm not sure what to even search for. Below is a subset of the data (using example numbers). I am trying to use the lag function to populate the flag column:

```
Status range                  Flag
Connected -> Exiting          Maillink
Exiting -> (not Connected)    OnCorp  (all rows stay OnCorp until we see Connected)
Connected -> Reconnecting     Mailink (it should stay Mailink for all rows until we see "Exiting" status)
```

Is it possible in SQL? Using the lag function in a case statement to compare the previous state is causing an error.

[Sample Data]: https://i.stack.imgur.com/tdTOm.png

SQL code (but erroring out):

```
WITH base as (
    select o.computername, o.currentuser, o.datetime, message,
        CASE
            WHEN LEN(split_part(substring(split_part(message, 'STATE', 2), 13), ',', 1)) <> 0
                THEN split_part(substring(split_part(message, 'STATE', 2), 13), ',', 1)
            WHEN (message like 'WARNING%' OR message like 'OpenVPN%') THEN 'Start'
            WHEN message = 'SIGTERM[hard,] received, process exiting' THEN 'Exit1'
            WHEN message = 'Closing TUN/TAP interface' THEN 'Exit2'
            ELSE 'NO Status'
        END State1,
        CASE
            WHEN State1 = 'Start' THEN 1
            WHEN State1 = 'RESOLVE' THEN 2
            WHEN State1 = 'WAIT' THEN 3
            WHEN State1 = 'AUTH' THEN 4
            WHEN State1 = 'GET_CONFIG' THEN 5
            WHEN State1 = 'ASSIGN_IP' THEN 6
            WHEN State1 = 'ADD_ROUTES' THEN 7
            WHEN State1 = 'CONNECTED' THEN 8
            WHEN State1 = 'EXITING' THEN 9
        END orderofoperation
        --row_number() over (partition by o.computername, o.currentuser, DATE(o.datetime)
        --                   order by o.computername, o.currentuser, o.datetime) as rownumber
    from maillink_openvpn_logs_ext_schema.open_vpn_filtered o
    where o.message != 'message'
      and currentuser = 'wuellie'
      --and State1 in ('EXITING','OpenVPN','RESOLVE','WAIT','AUTH','ASSIGN_IP','GET_CONFIG','ADD_ROUTES')
    order by o.datetime, orderofoperation
)
select final.*,
    case
        when final.previous_record_state is null then 'oncorp'
        when final.currentstate = 'CONNECTED' then 'maillink'
        when final.previous_record_state is not null
             and final.previous_record_state not in ('CONNECTED')
            then lag(flag) over (partition by computername, currentuser, nextstateddatetime ::DATE
                                 order by nextstateddatetime)
        when previous_record_state in ('EXITING')
             and lag(flag) over (partition by computername, currentuser, nextstateddatetime ::DATE
                                 order by nextstateddatetime) = 'maillink'
            then 'oncorp'
        else lag(flag) over (partition by computername, currentuser, nextstateddatetime ::DATE
                             order by nextstateddatetime)
    end as flag
from (
    select b.computername, b.currentuser, b.State1 currentstate, b.datetime,
        lag(State1) over (partition by b.computername, b.currentuser, (b.datetime) ::DATE
                          order by b.computername, b.currentuser, (b.datetime) ::TIMESTAMP, orderofoperation) as previous_record_state
        --lag(datetime) over (partition by b.computername, b.currentuser, b.datetime ::DATE
        --                    order by b.computername, b.currentuser, (b.datetime) ::TIMESTAMP, orderofoperation) as nextdatetime
    from base b
    order by b.datetime, orderofoperation
) final
```

Expected output:

```
Computername  username  State        datetime             onCorp/mailink
xyx           ads       start        2022-07-06T06:45:51  ONCORP
xyx           ads       RESOLVE      2022-07-06T07:06:45  ONCORP
xyx           ads       WAIT         2022-07-06T07:06:45  ONCORP
xyx           ads       AUTH         2022-07-06T07:07:00  ONCORP
xyx           ads       GET_CONFIG   2022-07-06T07:07:00  ONCORP
xyx           ads       ADD_ROUTES   2022-07-06T07:07:01  ONCORP
xyx           ads       CONNECTED    2022-07-06T07:07:01  MAILINK
xyx           ads       EXITING      2022-07-06T07:07:01  MAILINK
xyx           ads       RESOLVE      2022-07-06T07:07:46  ONCORP
xyx           ads       WAIT         2022-07-06T07:07:46  ONCORP
xyx           ads       AUTH         2022-07-06T07:07:50  ONCORP
xyx           ads       GET_CONFIG   2022-07-06T07:07:51  ONCORP
xyx           ads       ADD_ROUTES   2022-07-06T07:07:51  ONCORP
xyx           ads       CONNECTED    2022-07-06T07:07:52  MAILINK
xyx           ads       RECONNECTED  2022-07-06T07:08:01  MAILINK
xyx           ads       WAIT         2022-07-06T07:08:02  MAILINK
xyx           ads       AUTH         2022-07-06T07:08:09  MAILINK
xyx           ads       RECONNECTED  2022-07-06T07:08:10  MAILINK
xyx           ads       CONNECTED    2022-07-06T07:08:15  MAILINK
xyx           ads       RECONNECTED  2022-07-06T07:08:20  MAILINK
xyx           ads       CONNECTED    2022-07-06T07:09:01  MAILINK
xyx           ads       EXITING      2022-07-06T07:10:50  MAILINK
xyx           ads       START        2022-07-06T07:11:50  ONCORP
```
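Not part of the original post: the carry-forward rule above is essentially a small state machine, which can be easier to pin down in procedural code before translating it to SQL (e.g. via a gaps-and-islands pattern rather than `lag` inside a `CASE`). A sketch in Python using statuses from the sample:

```python
def flag_rows(statuses):
    """Carry the flag forward: CONNECTED flips the flag to MAILINK on that row;
    rows after an EXITING revert to ONCORP until the next CONNECTED."""
    flags, state = [], "ONCORP"
    for status in statuses:
        if status.upper() == "CONNECTED":
            state = "MAILINK"
        flags.append(state)
        if status.upper() == "EXITING":
            state = "ONCORP"  # takes effect from the NEXT row onward
    return flags

sample = ["start", "RESOLVE", "WAIT", "AUTH", "GET_CONFIG", "ADD_ROUTES",
          "CONNECTED", "EXITING", "RESOLVE", "CONNECTED", "RECONNECTED"]
print(flag_rows(sample))
# → ['ONCORP', 'ONCORP', 'ONCORP', 'ONCORP', 'ONCORP', 'ONCORP',
#    'MAILINK', 'MAILINK', 'ONCORP', 'MAILINK', 'MAILINK']
```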
Spectrum on Redshift Serverless with Enhanced VPC routing
For Amazon Redshift provisioned clusters, traffic between Spectrum and Amazon S3 does not use enhanced VPC routing even if enhanced VPC routing is enabled on the cluster. Does Redshift Serverless use enhanced VPC routing if it is enabled on the Workgroup?  https://docs.aws.amazon.com/redshift/latest/mgmt/spectrum-enhanced-vpc.html
Incremental Copy of RDS Aurora MySql Table to RedShift
Hi! To copy (and incrementally copy) data from Aurora MySQL 5.7 to Redshift, can anyone share implementation steps? I couldn't do it with Data Pipeline, Glue, or DMS due to certificate issues at the DB. One suggested approach:

1. Scheduled jobs that do S3 exports of the initial/changed data. This is the least expensive option, as you can decide which tables to export data from and how often, e.g. every 10 minutes. AWS Batch can be used with Lambda to accomplish this.

Any references on how to achieve this? Any other better-optimized approaches? Thanks!
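The export step in option 1 can be done natively by Aurora MySQL with `SELECT ... INTO OUTFILE S3` (this requires the cluster's `aurora_select_into_s3_role` IAM role to be configured), followed by a Redshift COPY. A sketch with hypothetical bucket, table, and role names; the incremental filter assumes an `updated_at` column exists, and the exact OUTFILE S3 options depend on the Aurora version:

```sql
-- On Aurora MySQL: export rows changed since the last run to S3.
SELECT * FROM mydb.orders
WHERE updated_at > '2022-11-01 00:00:00'
INTO OUTFILE S3 's3://my-export-bucket/orders/2022-11-01'
FORMAT CSV
OVERWRITE ON;

-- On Redshift: load the exported files into a staging table, then merge.
COPY staging.orders
FROM 's3://my-export-bucket/orders/2022-11-01'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-load-role'
FORMAT AS CSV;
```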