1 Answer
Hi Ayman,
Try increasing the amount of training data, or set the max_seq_len hyperparameter to a small value (for example, 128), and see whether the error persists.
The computation works like this: all text is processed, combined, and then split into samples, each with a length equal to the max input length. The samples are then grouped into batches according to the batch size. If you are using a machine with 8 GPUs, you need at least 8 non-empty batches. That means you need one of three things: enough data to produce 8 batches, a smaller batch size, or a smaller max input length.
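To make the arithmetic concrete, here is a rough sketch of the batch count check described above. The function names (`count_batches`, `has_enough_batches`) and the token counts are hypothetical and only for illustration. It also assumes that a trailing partial sample is dropped and that a partial final batch still counts as non-empty; the actual training script may handle these edge cases differently.

```python
def count_batches(total_tokens: int, max_seq_len: int, batch_size: int) -> int:
    """Estimate how many non-empty batches the packed data yields.

    Assumption: leftover tokens that do not fill a whole sample are dropped.
    Assumption: a partially filled last batch still counts as non-empty.
    """
    num_samples = total_tokens // max_seq_len
    # Ceiling division so a partial final batch is counted.
    return -(-num_samples // batch_size)


def has_enough_batches(total_tokens: int, max_seq_len: int,
                       batch_size: int, num_gpus: int) -> bool:
    """True if there is at least one non-empty batch per GPU."""
    return count_batches(total_tokens, max_seq_len, batch_size) >= num_gpus


if __name__ == "__main__":
    # Hypothetical small dataset of 10,000 tokens on an 8-GPU machine.
    for max_seq_len in (2048, 128):
        batches = count_batches(10_000, max_seq_len, batch_size=4)
        ok = has_enough_batches(10_000, max_seq_len, batch_size=4, num_gpus=8)
        print(f"max_seq_len={max_seq_len}: {batches} batches, enough={ok}")
```

With these made-up numbers, a max input length of 2048 gives only 4 samples and therefore a single batch, while a length of 128 gives 78 samples and 20 batches, which is enough for 8 GPUs.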
Answered 5 months ago