When training with ZeRO-2 with offloading, GPU memory utilization is at 95% but GPU (compute) utilization is only about 20%. Without offloading, I get an OOM error. I am training on 8 H100s.
How can I increase my GPU utilization?
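The exact DeepSpeed config for this run isn't shown. The sketch below assumes a ZeRO-2 setup that offloads optimizer state to CPU, written with standard DeepSpeed config keys. The batch sizes and the file name `ds_config_zero2_offload.json` are hypothetical placeholders, not values from the actual run.

```python
import json

# Hypothetical ZeRO-2 config with optimizer state offloaded to CPU.
# Batch sizes are placeholders, not the values used in the actual run.
zero2_offload_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Optimizer states live in pinned CPU memory; the optimizer step runs on CPU.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Overlap gradient reduction with the backward pass.
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Write the config so it can be passed to the DeepSpeed launcher.
with open("ds_config_zero2_offload.json", "w") as f:
    json.dump(zero2_offload_config, f, indent=2)
```

With optimizer offload, the optimizer step runs on the CPU, and the GPUs wait for it on every step. This is one plausible reason compute utilization stays low even though memory is nearly full.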
I would also like to know the best configuration if I have to train on 5 nodes (each with 8 H100s). Can I use DeepSpeed ZeRO-3, or something like DeepSpeed ZeRO++?
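For the 5-node case (40 H100s in total), a ZeRO++ setup is ZeRO stage 3 plus a few extra options. The sketch below uses the option names from the DeepSpeed ZeRO++ tutorial as I understand them: `zero_quantized_weights`, `zero_hpz_partition_size` and `zero_quantized_gradients`. Verify them against the current DeepSpeed docs. The micro-batch size and accumulation steps are hypothetical placeholders. Setting the hpZ partition size to the GPUs per node is an assumption meant to keep the secondary weight copy inside a node.

```python
# Hypothetical cluster layout from the question: 5 nodes x 8 H100s.
GPUS_PER_NODE = 8
NUM_NODES = 5
WORLD_SIZE = GPUS_PER_NODE * NUM_NODES

# Placeholder batch settings; DeepSpeed requires
# train_batch_size == micro_batch * grad_accum * world_size.
MICRO_BATCH = 4
GRAD_ACCUM = 1

zeropp_config = {
    "train_batch_size": MICRO_BATCH * GRAD_ACCUM * WORLD_SIZE,
    "train_micro_batch_size_per_gpu": MICRO_BATCH,
    "gradient_accumulation_steps": GRAD_ACCUM,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # ZeRO++ builds on ZeRO-3
        "overlap_comm": True,
        "contiguous_gradients": True,
        # ZeRO++ options (check names against the DeepSpeed docs):
        "zero_quantized_weights": True,             # qwZ: quantized weight all-gather
        "zero_hpz_partition_size": GPUS_PER_NODE,   # hpZ: secondary weight partition within a node
        "zero_quantized_gradients": True,           # qgZ: quantized gradient reduce-scatter
    },
}
```

The ZeRO++ options mainly target cross-node traffic. The gain over plain ZeRO-3 should therefore depend on how slow the inter-node interconnect is compared with NVLink inside a node.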