Redshift SUPER data type not large enough to store a JSON column from Postgres


We are encountering an issue where we're using the SUPER data type. The JSON column in the Parquet file we receive has a maximum length of about 192K characters. How should we handle this data? Are there alternative data types we can use to accommodate such large values?
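A quick way to confirm how large the values actually are is to measure them in the Parquet file itself. This is a minimal sketch, not from the original post, assuming pyarrow is available and that the JSON column is called `payload` (a hypothetical name; substitute your own):

```python
# Sketch: measure the longest string in the Parquet column and compare it
# against Redshift's 65,535-byte VARCHAR limit, which is the practical cap
# when COPYing a Parquet string column into SUPER.
import pyarrow.parquet as pq
import pyarrow.compute as pc

table = pq.read_table("sample.parquet", columns=["payload"])  # hypothetical file/column

# Longest JSON string in the column, in characters.
max_len = pc.max(pc.utf8_length(table["payload"])).as_py()
print(f"Longest value: {max_len} characters")
print("Exceeds 65,535-byte VARCHAR limit:", max_len > 65535)
```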

msve
Asked a month ago · 235 views
2 Answers

Redshift COPY, as well as Glue/Athena, cannot process an embedded JSON string inside a Parquet column, no matter what data type you assign that column in the Parquet schema. If the JSON string is over 65K characters, you will not be able to get it into a Redshift SUPER column, neither through a vanilla COPY nor through Spectrum. If you can't change the way the files are being written to S3, use a Lambda function to reprocess the Parquet into JSON in a different S3 folder, then ingest it from there (a rough sketch follows below).
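A minimal Lambda sketch of that reprocessing step, assuming an S3 "ObjectCreated" trigger on the Parquet landing prefix; the bucket/prefix names and column layout are placeholders, not from the original answer:

```python
# Sketch: rewrite an incoming Parquet object as newline-delimited JSON under a
# different S3 prefix, so it can be loaded into a SUPER column with
# COPY ... FORMAT JSON 'auto' instead of a Parquet COPY.
import json
import urllib.parse

import boto3
import pyarrow.parquet as pq

s3 = boto3.client("s3")
OUTPUT_PREFIX = "reprocessed-json/"   # hypothetical destination prefix


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Download the Parquet file to Lambda's scratch space and read it.
        local_path = "/tmp/input.parquet"
        s3.download_file(bucket, key, local_path)
        table = pq.read_table(local_path)

        # One JSON document per row (newline-delimited JSON).
        lines = [json.dumps(row, default=str) for row in table.to_pylist()]
        body = "\n".join(lines).encode("utf-8")

        out_key = OUTPUT_PREFIX + key.rsplit("/", 1)[-1].replace(".parquet", ".json")
        s3.put_object(Bucket=bucket, Key=out_key, Body=body)
```

Note that this loads the whole file into Lambda memory, so very large Parquet files may need batching or more memory allocated to the function.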

Answered 2 days ago

Is the Parquet file the one you are ingesting? One option would be to keep the file as Parquet and read it via Redshift Spectrum: https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html. You could then query it joined with all the other data in Redshift without having to make alterations to the file itself (see the sketch below).
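A rough sketch of that route, issuing the external-table DDL through the Redshift Data API. The schema, table, column, cluster, and location names are placeholders, the external schema must already exist per the linked docs, and note the caveat from the other answer: string values over 65,535 bytes still won't fit, even through Spectrum.

```python
# Sketch: register the S3 Parquet location as a Spectrum external table,
# then query it alongside local Redshift tables.
import boto3

client = boto3.client("redshift-data")

ddl = """
CREATE EXTERNAL TABLE spectrum_schema.events (
    id BIGINT,
    payload VARCHAR(65535)   -- the embedded JSON column
)
STORED AS PARQUET
LOCATION 's3://my-bucket/parquet-landing/';
"""

response = client.execute_statement(
    ClusterIdentifier="my-cluster",   # or WorkgroupName=... for Redshift Serverless
    Database="dev",
    DbUser="awsuser",
    Sql=ddl,
)
print(response["Id"])
```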

AWS
evaleah
Answered a month ago
EXPERT
Reviewed a month ago
