Redshift COPY, as well as Glue/Athena, cannot process a JSON string embedded in a Parquet column, no matter what data type you give that column in the Parquet schema. If the JSON string is over 65k characters, you will not be able to get it into a Redshift SUPER column, either through a plain COPY or through Spectrum. If you can't change the way the files are written to S3, use a Lambda to reprocess the Parquet into JSON in a different S3 folder, then ingest it from there.
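As a rough illustration of the reprocessing step described above, here is a minimal sketch of the row-level conversion such a Lambda might perform: it parses the embedded JSON string into a nested object and writes one JSON document per line. The function names, the `json_column` parameter, and the use of `pyarrow` to read the Parquet file are all assumptions for illustration, not something the answer specifies; downloading from and uploading to S3 is left out.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def rows_to_json_lines(rows: Iterable[Dict[str, Any]], json_column: str) -> Iterator[str]:
    """Yield one JSON document per row, with the embedded JSON string parsed."""
    for row in rows:
        out = dict(row)
        raw = out.get(json_column)
        if isinstance(raw, str):
            # Replace the JSON string with the parsed object so it is written
            # as a nested structure rather than as an escaped string.
            out[json_column] = json.loads(raw)
        # default=str is a fallback for values json cannot encode (dates, decimals).
        yield json.dumps(out, default=str)


def parquet_to_json_lines(parquet_path: str, json_path: str, json_column: str) -> None:
    """Hypothetical helper: convert a local Parquet file to newline-delimited JSON."""
    # Assumption: pyarrow is packaged with the Lambda (layer or container image).
    import pyarrow.parquet as pq

    rows = pq.read_table(parquet_path).to_pylist()
    with open(json_path, "w", encoding="utf-8") as fh:
        for line in rows_to_json_lines(rows, json_column):
            fh.write(line + "\n")
```

In a Lambda, `parquet_to_json_lines` would typically run on a file downloaded to `/tmp`, with the result uploaded to the second S3 folder. Whether the resulting documents fit the current COPY and SUPER size limits is worth checking against the Redshift documentation.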
Is the Parquet file the one you are ingesting? One option would be to keep the file as Parquet and read it via Redshift Spectrum: https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html You could then query it joined with all the other data in Redshift without having to make alterations to the file itself.