DeepSpeed and Hugging Face Accelerate already include a lot of these features, and they’re indeed already used in production. This class would be a great intro to those features, given that most people barely have access to one GPU, let alone hundreds.

Moreover, the class also covers topics related to using and hacking on foundation models if all you have is the model rather than a cluster.

The energy used to train foundation models is indeed extreme, but the gross expenditure is more a function of corporate spending and competition than of deep learning technology.