Is Redshift mixing up my data columns when creating a model?

Hello,

I'm running:

create model predict_xxxxx
from (select col1, col2, col3 from my_table)
target col3
function predict_xxx
iam_role 'arn:aws:iam::xxxxxxx:role/RedshiftML'
problem_type regression
objective 'mse'
settings (
    s3_bucket 'redshiftml-xxxxxxx',
    s3_garbage_collect off,
    max_runtime 1800
);

This generates input data files in CSV format in the S3 bucket I specified. When I open those files, all the columns from my select statement are present, but the column headers are mismatched with the data below them: I see col1 data under the col2 header, and so on. I know the data is mixed up because the data types and numeric ranges are different for each column. I double-checked my table, and there the columns and data are matched correctly. Is Redshift/SageMaker then using that mismatched data to train the model? I have tried with only two columns and it still gets mixed up. I've also tried using a table instead of a select expression, and the problem persists.
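The range comparison described above could be sketched roughly as follows, for a copy of one of the CSV files downloaded from the bucket. This is only an illustration under assumptions: the function names, the sample data, and the expected ranges are hypothetical, and it assumes the file has a single header row followed by numeric columns. The expected ranges would come from min/max queries against the source table.

```python
import csv
import io


def column_ranges(csv_text):
    """Return {header_name: (min, max)} for every column whose values all parse as numbers."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = [[] for _ in header]
    for row in reader:
        for i, value in enumerate(row[:len(header)]):
            columns[i].append(value)
    ranges = {}
    for name, values in zip(header, columns):
        try:
            numbers = [float(v) for v in values if v != ""]
        except ValueError:
            continue  # non-numeric column, skip it
        if numbers:
            ranges[name] = (min(numbers), max(numbers))
    return ranges


def find_mismatches(csv_text, expected):
    """Return the column names whose observed values fall outside the expected (min, max)."""
    observed = column_ranges(csv_text)
    bad = []
    for name, (low, high) in expected.items():
        if name not in observed:
            bad.append(name)  # column missing or not numeric
            continue
        obs_low, obs_high = observed[name]
        if obs_low < low or obs_high > high:
            bad.append(name)
    return bad


# Hypothetical data: expected ranges as they might be measured in the source table.
expected = {"col1": (0, 1), "col2": (50, 500), "col3": (0, 10)}
aligned = "col1,col2,col3\n0.5,100,7\n0.7,250,9\n"
shifted = "col1,col2,col3\n100,7,0.5\n250,9,0.7\n"

print(find_mismatches(aligned, expected))
print(find_mismatches(shifted, expected))
```

A column name appearing in the result only means its values fall outside the range seen in the table; it does not by itself show which stage of the export reordered anything.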

Any insight is appreciated.

Thanks,

  • SV
scv
Asked 2 years ago · 253 views
1 Answer

Hi there. Is it possible that the table contains a null header and SageMaker is reading that header as col1, which would shift the data by one column? If that is not the case, it is possible that you already had a col0 and the data moves over by one column when you append col1, col2, and so on.

Hopefully this gives you more to think about and points you in the right direction.

Regards, NN

Answered 2 years ago
  • Hi NN,

    The data is coming from a Redshift table/select expression so I can't see how a "null header" is possible. And each run generates a fresh set of files in S3, so I'm not sure what it means for columns to be appended.

    Do you have examples of null headers or columns moving around? That would be very interesting to look into.

    Regards, SV
