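Why did my Amazon SageMaker pipeline execution fail?
I want to troubleshoot why my Amazon SageMaker pipeline execution failed.
Resolution
To troubleshoot a failed SageMaker pipeline execution, complete the following steps:
**Note:** If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, make sure that you're using the most recent version of the AWS CLI.
1. Run the AWS CLI command list-pipeline-executions.
**Note:** If the AWS CLI isn't configured on your local machine, use AWS CloudShell.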
$ aws sagemaker list-pipeline-executions --pipeline-name test-pipeline-p-wzx9cplzrvdk
The command returns a list of executions for the pipeline, similar to the following:
{
  "PipelineExecutionSummaries": [
    {
      "PipelineExecutionArn": "arn:aws:sagemaker:eu-west-1:1111222233334444:pipeline/test-pipeline-p-wzx9cplzrvdk/execution/lvejn1jl827b",
      "StartTime": "2022-09-27T12:56:44.646000+00:00",
      "PipelineExecutionStatus": "Failed",
      "PipelineExecutionDisplayName": "execution-1664283404791",
      "PipelineExecutionFailureReason": "Step failure: One or multiple steps failed."
    },
    {
      "PipelineExecutionArn": "arn:aws:sagemaker:eu-west-1:1111222233334444:pipeline/test-pipeline-p-wzx9cplzrvdk/execution/acvref9y1f47",
      "StartTime": "2022-09-27T12:13:28.762000+00:00",
      "PipelineExecutionStatus": "Succeeded",
      "PipelineExecutionDisplayName": "execution-1664280808943"
    }
  ]
}
2. Run the list-pipeline-execution-steps command to view the steps that failed:
$ aws sagemaker list-pipeline-execution-steps --pipeline-execution-arn arn:aws:sagemaker:eu-west-1:1111222233334444:pipeline/test-pipeline-p-wzx9cplzrvdk/execution/lvejn1jl827b
The output is similar to the following:
{
  "PipelineExecutionSteps": [
    {
      "StepName": "TrainAbaloneModel",
      "StartTime": "2022-09-27T13:00:49.235000+00:00",
      "EndTime": "2022-09-27T13:01:50.056000+00:00",
      "StepStatus": "Failed",
      "AttemptCount": 0,
      "FailureReason": "ClientError: ClientError: Please ensure the security group provided is valid",
      "Metadata": {
        "TrainingJob": {
          "Arn": "arn:aws:sagemaker:eu-west-1:1111222233334444:training-job/pipelines-lvejn1jl827b-trainabalonemodel-u9l9wjassg"
        }
      }
    },
    {
      "StepName": "PreprocessAbaloneData",
      "StartTime": "2022-09-27T12:56:45.595000+00:00",
      "EndTime": "2022-09-27T13:00:48.638000+00:00",
      "StepStatus": "Succeeded",
      "AttemptCount": 0,
      "Metadata": {
        "ProcessingJob": {
          "Arn": "arn:aws:sagemaker:eu-west-1:1111222233334444:processing-job/pipelines-lvejn1jl827b-preprocessabalonedat-6axq0kthyg"
        }
      }
    }
  ]
}
In this example, the training job step failed because the VpcConfig object for the job specifies a security group that doesn't exist.
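When a pipeline has many executions or steps, it can help to filter the two JSON responses above programmatically. The following is a minimal Python sketch, not an official AWS utility. The function names failed_executions and failed_steps are hypothetical, and the code assumes that the responses have the same shape as the sample outputs in this article, including a Metadata object with one entry per job type that contains an Arn key.

```python
def failed_executions(response):
    """Return (execution ARN, failure reason) pairs for failed executions.

    `response` is the parsed JSON from list-pipeline-executions.
    """
    return [
        (s["PipelineExecutionArn"], s.get("PipelineExecutionFailureReason", ""))
        for s in response.get("PipelineExecutionSummaries", [])
        if s.get("PipelineExecutionStatus") == "Failed"
    ]


def failed_steps(response):
    """Return name, failure reason, and job ARN for each failed step.

    `response` is the parsed JSON from list-pipeline-execution-steps.
    Assumption: Metadata looks like {"TrainingJob": {"Arn": "..."}}, as in
    the sample output; steps without such an entry get job_arn=None.
    """
    results = []
    for step in response.get("PipelineExecutionSteps", []):
        if step.get("StepStatus") != "Failed":
            continue
        arns = [
            entry["Arn"]
            for entry in step.get("Metadata", {}).values()
            if isinstance(entry, dict) and "Arn" in entry
        ]
        results.append({
            "step": step["StepName"],
            "reason": step.get("FailureReason", ""),
            "job_arn": arns[0] if arns else None,
        })
    return results


# Sample data trimmed from the outputs shown in this article.
executions = {
    "PipelineExecutionSummaries": [
        {
            "PipelineExecutionArn": "arn:aws:sagemaker:eu-west-1:1111222233334444:pipeline/test-pipeline-p-wzx9cplzrvdk/execution/lvejn1jl827b",
            "PipelineExecutionStatus": "Failed",
            "PipelineExecutionFailureReason": "Step failure: One or multiple steps failed.",
        },
        {
            "PipelineExecutionArn": "arn:aws:sagemaker:eu-west-1:1111222233334444:pipeline/test-pipeline-p-wzx9cplzrvdk/execution/acvref9y1f47",
            "PipelineExecutionStatus": "Succeeded",
        },
    ]
}
steps = {
    "PipelineExecutionSteps": [
        {
            "StepName": "TrainAbaloneModel",
            "StepStatus": "Failed",
            "FailureReason": "ClientError: ClientError: Please ensure the security group provided is valid",
            "Metadata": {
                "TrainingJob": {
                    "Arn": "arn:aws:sagemaker:eu-west-1:1111222233334444:training-job/pipelines-lvejn1jl827b-trainabalonemodel-u9l9wjassg"
                }
            },
        },
        {
            "StepName": "PreprocessAbaloneData",
            "StepStatus": "Succeeded",
            "Metadata": {
                "ProcessingJob": {
                    "Arn": "arn:aws:sagemaker:eu-west-1:1111222233334444:processing-job/pipelines-lvejn1jl827b-preprocessabalonedat-6axq0kthyg"
                }
            },
        },
    ]
}

for arn, reason in failed_executions(executions):
    print(arn, "->", reason)
for item in failed_steps(steps):
    print(item["step"], "->", item["reason"])
```

To use the functions with live data, one option is to save the AWS CLI output to a file and load it with json.load before passing it in.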
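If the failure reason for the failed step is unclear, check Amazon CloudWatch Logs for the failed SageMaker job or endpoint to troubleshoot further. You can find training job logs in the CloudWatch log group /aws/sagemaker/TrainingJobs. The log stream name is similar to the following: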
example-training-job-name/algo-example-instance-number-in-cluster-example-epoch-timestamp
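Because the log stream name starts with the training job name, the job ARN returned for the failed step is enough to work out where to look. The following Python sketch is one way to derive the log group and log stream prefix. The function name training_job_log_location is hypothetical, and the code assumes the ARN format shown in the sample output in this article.

```python
TRAINING_LOG_GROUP = "/aws/sagemaker/TrainingJobs"


def training_job_log_location(training_job_arn):
    """Return (log group, log stream prefix) for a training job ARN.

    Assumption: the ARN has the form
    arn:aws:sagemaker:<region>:<account>:training-job/<job-name>
    and log streams are named <job-name>/algo-..., as described above.
    """
    parts = training_job_arn.split(":", 5)
    if len(parts) != 6:
        raise ValueError(f"not a valid ARN: {training_job_arn!r}")
    resource_type, _, job_name = parts[5].partition("/")
    if resource_type != "training-job" or not job_name:
        raise ValueError(f"not a training job ARN: {training_job_arn!r}")
    return TRAINING_LOG_GROUP, job_name + "/"


group, prefix = training_job_log_location(
    "arn:aws:sagemaker:eu-west-1:1111222233334444:training-job/"
    "pipelines-lvejn1jl827b-trainabalonemodel-u9l9wjassg"
)
print(group)
print(prefix)
```

The resulting values can then be passed to the aws logs describe-log-streams command through its --log-group-name and --log-stream-name-prefix options to list the matching log streams.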