1 Answer
It’s my understanding that you’ll need to merge them into a single pytorch_model.bin file.
I don’t recall my original source, but I had this saved in my notes, so make any necessary changes for your project.
- Create a new directory and copy all the bin files of your model into it.
- Install the `torch` library if you haven't already done so: `pip install torch`
- Use the following Python code to merge the bin files into one:

```python
import os

import torch

bin_files_path = "path/to/your/bin/files/directory"
output_path = "path/to/output/merged/bin/file/pytorch_model.bin"

# Create an empty state dictionary
state_dict = {}
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the weights from each bin file and merge them into the state dictionary
for bin_file in sorted(os.listdir(bin_files_path)):
    if bin_file.endswith(".bin"):
        model_state_dict = torch.load(os.path.join(bin_files_path, bin_file), map_location=device)
        state_dict.update(model_state_dict)

# Save the merged state dictionary as a single bin file
torch.save(state_dict, output_path)
```
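As a quick sanity check, the merge loop above can be exercised end to end with two tiny stand-in shards (the tensor names and shard filenames below are illustrative, not from a real model):

```python
import os
import tempfile

import torch

with tempfile.TemporaryDirectory() as d:
    # Create two tiny "shards", each holding a disjoint part of the state dict.
    torch.save({"layer1.weight": torch.zeros(2, 2)},
               os.path.join(d, "pytorch_model-00001-of-00002.bin"))
    torch.save({"layer2.weight": torch.ones(3)},
               os.path.join(d, "pytorch_model-00002-of-00002.bin"))

    # Same merge logic as above: load each shard and fold it into one dict.
    merged = {}
    for f in sorted(os.listdir(d)):
        if f.endswith(".bin"):
            merged.update(torch.load(os.path.join(d, f), map_location="cpu"))

    # Save and reload the merged file to confirm all keys survived the round trip.
    out = os.path.join(d, "pytorch_model.bin")
    torch.save(merged, out)
    reloaded = torch.load(out, map_location="cpu")
    print(sorted(reloaded.keys()))
```

Because `update` overwrites duplicate keys, this only works cleanly when the shards hold disjoint parameter sets, which is the normal case for sharded checkpoints.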
After merging the bin files into one, you should be able to deploy your model to SageMaker using the merged pytorch_model.bin file.
Answered 10 months ago
I got past this error. SageMaker will actually do the conversion for me, but I need to give it more time.
Set container_startup_health_check_timeout to a bigger number and it will get past this error.
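For reference, if you deploy through the SageMaker Python SDK, that timeout can be raised at deploy time. This is only a sketch: the model data, role, entry point, versions, and instance type below are all placeholders you would replace with your own values.

```python
# Sketch only, assuming the SageMaker Python SDK; all names/values are placeholders.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://your-bucket/model.tar.gz",  # placeholder S3 location
    role="your-sagemaker-execution-role",        # placeholder IAM role
    entry_point="inference.py",                  # placeholder inference script
    framework_version="2.0",
    py_version="py310",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",               # placeholder instance type
    # Give the container more time to load/convert large weights at startup.
    container_startup_health_check_timeout=600,
)
```

The default startup health-check window is often too short for large models that convert or shard weights on first load, which is why raising it resolves this class of error.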
But then I encountered the next error: I upgraded to a bigger instance type and played with the PYTORCH_CUDA_ALLOC_CONF parameter, but the error persisted.
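For anyone trying the same thing: PYTORCH_CUDA_ALLOC_CONF is an environment variable read by PyTorch's CUDA caching allocator, so it has to be set before the first CUDA allocation. The value below is just a commonly tried starting point, not a recommendation:

```python
import os

# Must be set before PyTorch makes its first CUDA allocation.
# "max_split_size_mb" caps the size of splittable cached blocks,
# which can reduce fragmentation-related out-of-memory errors.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

On SageMaker, the same variable is typically passed through the model's environment configuration (the container's environment variables) rather than set inside the inference code.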