1 Answer
Hi Ayman,
Try increasing the amount of training data, or set the max_seq_len hyper-parameter to a smaller value (for example, 128), and see whether the error persists.
Here is how the computation works: all of the text is processed, concatenated, and then split into samples, each of length equal to the max input length. The samples are then grouped into batches according to the batch size. If you are using an 8-GPU machine, you need at least 8 non-empty batches. So you need one of three things: enough data to fill 8 batches, a smaller batch size, or a smaller max input length.
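To make the arithmetic concrete, here is a rough sketch of the check described above. It is not the training script's actual code: the function names, the token counts, and the choice to drop a trailing partial sample are all assumptions made for illustration.

```python
import math


def count_batches(total_tokens: int, max_input_length: int, batch_size: int) -> int:
    """Estimate how many non-empty batches a dataset yields.

    Assumption: the concatenated text is cut into samples of exactly
    max_input_length tokens and a trailing partial sample is dropped.
    The real pipeline may keep or pad it instead.
    """
    num_samples = total_tokens // max_input_length
    # The last batch may be only partly filled, but it is still non-empty.
    return math.ceil(num_samples / batch_size)


def has_enough_batches(total_tokens: int, max_input_length: int,
                       batch_size: int, num_gpus: int) -> bool:
    """Return True if there is at least one non-empty batch per GPU."""
    return count_batches(total_tokens, max_input_length, batch_size) >= num_gpus


# Hypothetical numbers: a small dataset of 10,000 tokens, batch size 4, 8 GPUs.
print(has_enough_batches(10_000, 2048, 4, 8))  # False: 4 samples, 1 batch
print(has_enough_batches(10_000, 128, 4, 8))   # True: 78 samples, 20 batches
```

With these made-up numbers, lowering the max input length from 2048 to 128 raises the batch count from 1 to 20, which is why a smaller max_seq_len can make the error go away without adding data.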
Answered 5 months ago