SageMaker batch transform fails with HTTP status 415

Question

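Hi, I need to run XGBoost inference on 15 million samples (3.9 GB when stored as CSV). Since batch transform does not seem to work on a batch that large (the maximum payload is 100 MB), I split my input file into 646 files of about 6 MB each, stored in S3. I am running the code below: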

```
transformer = XGB.transformer(
    instance_count=2, instance_type='ml.c5.9xlarge',
    output_path='s3://xxxxxxxxxxxxx/sagemaker/recsys/xgbtransform/',
    max_payload=100)

transformer.transform(
    data='s3://xxxxxxxxxxxxx/sagemaker/recsys/testchunks/',
    split_type='Line')

```

But the job fails. SageMaker reports "ClientError: Too many objects failed. See logs for more information", and the CloudWatch logs show:

```
Bad HTTP status returned from invoke: 415
'NoneType' object has no attribute 'lower'
```

Did I forget something in my batch transform settings?

Answer

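An HTTP 415 (Unsupported Media Type) response indicates that the algorithm thinks it was passed data in the wrong format. Perhaps there is a problem with your splitting process?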

I would suggest trying two things:

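1. Run the algorithm on the original data with the parameters "SplitType": "Line" and "BatchStrategy": "MultiRecord", and see whether you get better results.
2. Check the CloudWatch logs to see where the algorithm is failing. They are in the log group "/aws/sagemaker/TransformJobs", in the log streams whose names start with the job name.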
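The two suggestions above could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix. The helper names `build_transform_settings`, `run_batch_transform` and `fetch_transform_log_messages` are hypothetical. In the SageMaker Python SDK, "BatchStrategy" corresponds to the `strategy` argument of `transformer()` and "SplitType" to the `split_type` argument of `transform()`. The `content_type='text/csv'` setting is an extra assumption that is not part of the answer above: the `'NoneType' object has no attribute 'lower'` message would be consistent with no content type being sent, but that is a guess.

```python
def build_transform_settings(input_s3_uri, output_s3_uri):
    """Return (transformer_kwargs, transform_kwargs) for a CSV batch transform job."""
    transformer_kwargs = {
        "instance_count": 2,
        "instance_type": "ml.c5.9xlarge",
        "output_path": output_s3_uri,
        "strategy": "MultiRecord",  # "BatchStrategy" in the answer above
        "max_payload": 100,         # in MB, same value as in the question
    }
    transform_kwargs = {
        "data": input_s3_uri,
        "split_type": "Line",       # "SplitType" in the answer above
        # Assumption (not from the answer): declare the input format explicitly.
        "content_type": "text/csv",
    }
    return transformer_kwargs, transform_kwargs


def run_batch_transform(model, input_s3_uri, output_s3_uri):
    """Create a transformer from `model` (e.g. the XGB estimator) and start the job."""
    transformer_kwargs, transform_kwargs = build_transform_settings(
        input_s3_uri, output_s3_uri
    )
    transformer = model.transformer(**transformer_kwargs)
    transformer.transform(**transform_kwargs)
    return transformer


def fetch_transform_log_messages(logs_client, job_name, limit=50):
    """Return log messages from the streams whose names start with `job_name`.

    `logs_client` is expected to be a CloudWatch Logs client,
    e.g. boto3.client("logs").
    """
    response = logs_client.filter_log_events(
        logGroupName="/aws/sagemaker/TransformJobs",
        logStreamNamePrefix=job_name,
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```

With the objects from the question, this would be called as `run_batch_transform(XGB, 's3://xxxxxxxxxxxxx/sagemaker/recsys/testchunks/', 's3://xxxxxxxxxxxxx/sagemaker/recsys/xgbtransform/')`, followed by `fetch_transform_log_messages(boto3.client('logs'), job_name)` with the name of the failed job.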
