What resources are required to train a GPT model with 10 billion parameters on 6 petabytes of data, assuming no hyperparameter tuning is performed? Specifically, how many GPUs would be needed, and which GPU types would you recommend?
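To frame the question, here is a rough back-of-the-envelope sketch (not a definitive answer) using the common C ≈ 6·N·D approximation for dense-transformer training FLOPs. The bytes-per-token ratio, GPU peak throughput (A100 BF16 peak as a reference point), utilization (MFU), and target training time below are all assumptions to be swapped for your own numbers:

```python
# Back-of-the-envelope training-compute estimate.
# Assumptions (not measurements): C ~= 6 * N * D, an assumed bytes-per-token
# ratio to convert raw data volume into tokens, and an assumed per-GPU
# throughput and utilization.

N_PARAMS = 10e9                 # 10 billion parameters (from the question)
DATA_BYTES = 6e15               # 6 petabytes of raw data (from the question)
BYTES_PER_TOKEN = 4             # assumption: ~4 bytes of text per token
PEAK_FLOPS_PER_GPU = 312e12     # assumption: A100 BF16 peak, ~312 TFLOP/s
MFU = 0.4                       # assumption: 40% model FLOPs utilization
TARGET_DAYS = 30                # assumption: desired wall-clock training time

tokens = DATA_BYTES / BYTES_PER_TOKEN
total_flops = 6 * N_PARAMS * tokens              # C ~= 6 * N * D
effective_flops_per_gpu = PEAK_FLOPS_PER_GPU * MFU
gpu_seconds = total_flops / effective_flops_per_gpu
gpus_needed = gpu_seconds / (TARGET_DAYS * 24 * 3600)

print(f"tokens:                 {tokens:.3e}")
print(f"training FLOPs:         {total_flops:.3e}")
print(f"GPU-seconds:            {gpu_seconds:.3e}")
print(f"GPUs for {TARGET_DAYS} days: {gpus_needed:,.0f}")
```

Under these assumptions, 6 PB of text corresponds to on the order of 10^15 tokens, which puts the total compute around 10^25 to 10^26 FLOPs, so the GPU count is dominated by how long you are willing to train rather than by the 10B parameter count itself.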