Week 9: HPC, MPI, and Multinode / MultiGPU (MNMG) Training

History of HPC. HPC vs HTC/Big Data. Typical HPC problems. Architecture of supercomputers. Interconnect topologies: fat tree, torus. FLOPS, Top500. Amdahl’s law. Programming for HPC systems. MPI. HPC schedulers. InfiniBand vs TCP/IP. Google TPUs and TPU pods. NVIDIA DGX systems and SuperPODs. Magnum IO. Distributed deep learning model training. Uber Horovod. Distributed training in TensorFlow and PyTorch. Distributed training on AWS and Azure.
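As a quick illustration of Amdahl’s law from the topic list above: if a fraction p of a program's runtime can be parallelized across n workers, the ideal speedup is 1 / ((1 - p) + p / n), which is bounded by 1 / (1 - p) no matter how many workers are added. The sketch below is only a minimal illustration and not part of the course materials; the function name `amdahl_speedup` and the example value p = 0.95 are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0..1).
    workers: number of processors/nodes/GPUs working on the parallel part.
    (Both names are illustrative, not taken from the course.)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    p = 0.95  # assumed example: 95% of the work parallelizes
    for n in (1, 8, 64, 512, 4096):
        print(f"workers={n:5d}  speedup={amdahl_speedup(p, n):6.2f}")
    # Upper bound as the worker count grows without limit: 1 / (1 - p)
    print(f"limit for p={p}: {1.0 / (1.0 - p):.2f}")
```

The serial fraction dominates quickly: with the assumed p = 0.95, the speedup can never exceed 20 regardless of cluster size, which is one reason communication and other serial overheads matter so much in multinode training.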

Reading: