- 新しい順
- 投票が多い順
- コメントが多い順
OK, thank you for bringing this to our attention. We've identified the source of this issue and will investigate a fix. When it is fixed a note will to appear in our regular maintenance announcements at the top of the forum.
This is related to the automatic DISTSTYLE feature we introduced recently. For DISTSTYLE AUTO tables that are currently ALL, we load new data into a temp table and then check if the new rows added require the table to be converted from ALL to EVEN.
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
DISTSTYLE { AUTO | EVEN | KEY | ALL }
The default is AUTO. …
AUTO: Amazon Redshift assigns an optimal distribution style based on the table data. For example, if AUTO distribution style is specified, Amazon Redshift initially assigns ALL distribution to a small table, then changes the table to EVEN distribution when the table grows larger. The change in distribution occurs in the background, in a few seconds. …
There are 2 ways work around this:
- Define an explicit DISTSTYLE of EVEN, KEY, or ALL on the table being loaded. I recommend this option primarily for tables that can use a KEY diststyle. If you are not using KEY then the AUTO diststyle will be the most efficient.
- Retrieve the query ID from the stl_load_errors, retrieve the SQL from stl_querytext for the query, and then look up the table by name from pg_class, svv_table_info, or stv_tbl_perm.
Just to respond quickly, here answering only this one specific question;
Why would my COPY to a table go through what seems like a temporary table for some table, and not for others?
It could concievaby depend on the amount of data. The incoming data is sorted with respect to itself before being appended to the table. If the data is small, it could be done in memory - if the data is large, perhaps a temporary table is needed.
This however is a complete guess and with zero knowledge of how sorting is handled internally, so it's 99.9% likely to be wrong.
Thanks for your answer.
That could have happened for some big loads I made with a couple millions of lines, but it also happens with the extremely small load of 3 lines I gave as an example in the link, so I don't understand. :/
Do you mean that weird unexpected ID would come from the intermediate analysis Redshift does to determine compression?
If I add encoding to all the columns of my output table it's not going to happen?
I'm going to try with encoding and see if I can reproduce the issue.
Edited by: NicolasPA on Apr 16, 2019 3:05 AM
So I have tried:
- keeping table columns without encoding in the
CREATE TABLE
and usingCOPY ... COMPUPDATE OFF
- adding columns encoding in the
CREATE TABLE
and usingCOPY ...
without compupdate off. - adding columns encoding in the
CREATE TABLE
and usingCOPY ... COMPUPDATE OFF
And none of these ideas changed anything, the table id found stl_load_errors
still doesn't corresponding to the one expected for the table I loaded.
Have I understood your comment correctly?
Thanks for your help.
Edited by: NicolasPA on Apr 16, 2019 6:59 AM
Please send me a PM with the name and region of your cluster and the query you ran to identify the load errors.
When loading data (COPY, INSERT, or CTAS) into new tables Redshift loads a sample of the data and performs a compression analysis. It then applies the selected compression encodings to the table and completes the data load.
If desired, you can avoid this step in INSERTs by specifying compression encodings manually for all columns (remember to specify RAW for the first sort key column). You can avoid this in COPY by specifying the COMPUPDATE OFF parameter, although I strongly recommend against this unless you have carefully specified compression for all columns in advance.
Woo!
Specifying a DISTSTYLE in my little reproduction does solve the problem and I'll apply the same solution to my production tables because they seem to fit with the EVEN style (hundreds of millions of rows, no joins).
I have also tested your method with the querying of the stl_querytext
table, and it works too in my reproduction case, but I don't find it very clean nor effective to parse a string to find the table name inside, so I'll stick with DISTSTYLE EVEN in my current cases.
Thanks a lot Joe!
Quoted you in an answer on my Stackoverflow post: https://stackoverflow.com/a/55762845/9599601
Edited by: NicolasPA on Apr 19, 2019 3:48 PM
関連するコンテンツ
- 質問済み 6年前
- AWS公式更新しました 2年前