Import tables with large data columns from a Redshift federated schema into local tables


Hi,

We have Aurora Postgres databases federated into Redshift and need to copy tables into local Redshift schemas. Some of the Aurora tables have large data columns, among them integer arrays. When we try to import one of those tables with a query like this:

CREATE TABLE local_schema.accounts AS (
    SELECT
        tasks
    FROM federated_schema.accounts
)

We get this error:

[XX000] ERROR: Value of VARCHAR type is too long.
Detail:
-----------------------------------------------
error:  Value of VARCHAR type is too long.
code:      25101
context:   Received VARCHAR len=128599 max=65535. This could be due to multibyte characters exceeding the column size or the value in remote database is larger than the max allowed length (65535) of varchar column in Redshift.
query:     865164
location:  federation_fetchers.hpp:146
process:   query12_495_865164 [pid=12824]
-----------------------------------------------

The error does not change even if we cast the column, e.g. with CAST(tasks AS SUPER).
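For reference, the cast attempt was essentially the following (same table and column as above):

CREATE TABLE local_schema.accounts AS (
    SELECT
        CAST(tasks AS SUPER) AS tasks
    FROM federated_schema.accounts
)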

Based on https://docs.amazonaws.cn/en_us/redshift/latest/dg/federated-data-types.html, such columns are converted to VARCHAR(64K).

How can we handle columns whose data is longer than 64K characters?

This post suggests creating a view on the Postgres side, but we do not have access to the Postgres database: https://repost.aws/questions/QUM_FcRt0eR3uqhTJ6ZDDUJQ/redshift-federated-query-rds-postgres-long-varchar-value-error-code-25101

Is there a way to handle this issue from the Redshift side?

We'd much appreciate the help!

asked 2 years ago · 1.1K views
1 Answer

Hello,

I would like to inform you that with federated queries, PostgreSQL data types that cannot be mapped to Amazon Redshift data types are converted to VARCHAR(64K) in Redshift [1].

Unfortunately, as of now the VARCHAR data type has a hard length limit of 65,535 bytes [2], so to resolve this error the column values would have to be kept within that limit.

A possible workaround is to split the column data across multiple columns on the PostgreSQL side, possibly using a view, so that the data in each column does not exceed the 65,535-byte limit.
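As a rough sketch, such a view could look like the following on the PostgreSQL side (table and column names are taken from the question; the id key column and the 60,000-character chunk width are assumptions, chosen to stay under the 65,535-byte limit):

-- Split the large column into fixed-size text chunks
CREATE VIEW public.accounts_split AS
SELECT
    id,                                                 -- assumed key column
    substring(tasks::text FROM 1     FOR 60000) AS tasks_part_1,
    substring(tasks::text FROM 60001 FOR 60000) AS tasks_part_2
FROM public.accounts;

Each part then stays within the VARCHAR(64K) limit when queried through the federated schema.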

With regard to using the SUPER type to overcome this limit: when pulling data into a SUPER column, the JSON data is parsed with the JSON_PARSE(json_string) function [3]. Although SUPER objects and arrays can be up to 1 MB, the incoming value is a string scalar and is constrained by the same 64K VARCHAR size limit, so this is not a feasible way to get around the limitation with federated queries.
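For illustration, that pattern would look roughly like this (same table and column as in the question), and it fails in the same way because the federated value is fetched as VARCHAR(64K) before JSON_PARSE runs:

CREATE TABLE local_schema.accounts AS (
    SELECT
        JSON_PARSE(tasks) AS tasks   -- parse the fetched string into a SUPER value
    FROM federated_schema.accounts
)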

Therefore, the only other workaround at the moment is to create a physical table in Redshift, export the data from RDS/Aurora to S3, and load it into Redshift as SUPER. However, if the values exceed the 64K character limit for VARCHAR (which is also imposed on string scalars inside SUPER), the limitation would still apply.
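As a sketch of that route, assuming the Aurora data has already been exported to S3 as one JSON document per row (the bucket path and IAM role below are placeholders):

-- Target table with a single SUPER column
CREATE TABLE local_schema.accounts_super (
    tasks SUPER
);

-- Load each JSON document as a whole into the SUPER column
COPY local_schema.accounts_super
FROM 's3://example-bucket/accounts/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT JSON 'noshred';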

Considering the above, there is currently no way around this limitation other than splitting the data into smaller columns that fit within the maximum VARCHAR size, though I understand this may not be ideal.

[1] Data type differences between Amazon Redshift and supported PostgreSQL and MySQL databases - https://docs.aws.amazon.com/redshift/latest/dg/federated-data-types.html

[2] Character types - Storage and ranges - https://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html#r_Character_types-storage-and-ranges

[3] JSON_PARSE function - https://docs.aws.amazon.com/redshift/latest/dg/JSON_PARSE.html

AWS
SUPPORT ENGINEER
answered 2 years ago
