Skip to content

Scaling applications


This module is all about scaling the applications that we are building. We are here going to use a very narrow definition of scaling namely that we want our applications to run faster, however, one should note that in general scaling is a much broader term. There are many different ways to scale your applications and we are going to look at three of these related to different tasks in machine learning algorithms:

  • Scaling data loading
  • Scaling training
  • Scaling inference

We are going to approach the term scaling from two different angles that both should result in your application running faster. The first approach is levering multiple devices, such as using multiple CPU cores or parallelizing training across multiple GPUs. The second approach is more analytical, were we are actually going to look at how we can design smaller/faster model architectures that runs faster.

It should be noted that this module is specific to working with Pytorch applications. In particular we are going to see how we can both improve base Pytorch code and how to utilize the Pytorch Lightning which we introduced in module M14 on boilerplate to improve the scaling of our applications. If your application is written using another framework we can guarantee that the same techniques in these modules transfers to that framework, but may require you do seek out how to specifically to it.

If you manage to complete all modules in this session, feel free to checkout the extra module on scalable hyperparameter optimization.

Learning objectives

The learning objectives of this session are:

  • Understand how data loading during training can be parallelized and have experimented with it
  • Understand the different paradigms for distributed training and can run multi-gpu experiments using the framework pytorch-lightning
  • Knowledge of different ways, including quantization, pruning, architecture tuning etc. to improve inference speed