Deep Learning Software

Core Module

Deep learning has, since its revolution back in 2012, transformed our lives. From Google Translate to driverless cars to personal assistants to protein engineering, it is changing nearly every sector of our economy. However, it did not take long before people realized that deep learning is not a simple beast to tame and that it comes with its own kinds of problems, especially if you want to use it in a production setting. In particular, the concept of technical debt has been used to describe the significant system-level maintenance costs of running machine learning in production. MLOps should very much be seen as the response to this technical debt: we should develop methods, processes, and tools (with inspiration from classical DevOps) to counter the problems we run into when working with deep learning models.

It is important to note that all the concepts and tools that have been developed for MLOps can also be used with more classical machine learning models (think K-nearest neighbor, random forest, etc.). However, deep learning comes with its own set of problems, which mostly have to do with the sheer size of the data and models we are working with. For these reasons, we focus on deep learning models in this course.

Software Landscape for Deep Learning

Regarding software for Deep Learning, the landscape is currently dominated by three software frameworks (listed in order of when they were published):

Tensorflow, Pytorch, and JAX.

We won't go into a longer discussion of which framework is best, as it is pointless. Pytorch and Tensorflow have been around the longest and therefore have bigger communities and feature sets at this point in time. They are very similar in the sense that both have features aimed at research as well as production. JAX is the new kid on the block, which in many ways improves on Pytorch and Tensorflow but is still not as mature as the other frameworks. As the frameworks build on different programming principles (object-oriented vs. functional programming), comparing them is essentially meaningless.

In this course, we have chosen to work with Pytorch because we find it a bit more intuitive and because it is the framework we use in our day-to-day research. Additionally, as of right now, it is the dominant framework for published models, research papers, and competition winners.

The intention behind this set of exercises is to bring everyone's Pytorch skills up to date. If you already are a Pytorch-Jedi, feel free to skip the first set of exercises, but I recommend that you still complete it. The exercises are, in large part, taken directly from the deep learning course at Udacity. Note that these exercises are given as notebooks, and this is the last time we are going to use notebooks actively in the course. After this set of exercises, we are going to focus on writing code in Python scripts.

The notebooks contain a lot of explanatory text. The exercises that you are supposed to fill out are inlined in the text in small "exercise" blocks.


If you need a refresher on any deep learning topic in general throughout the course, we recommend finding the relevant chapter in the deep learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (which can also be found in the literature folder). It is not necessary to be good at deep learning to pass this course as the focus is on all the software needed to get deep learning models into production. However, it's important to have a basic understanding of the concepts.

❔ Exercises

Exercise files

  1. Start a Jupyter Notebook session in your terminal (assuming you are standing at the root of the course material), for example by running jupyter notebook. Alternatively, you should be able to open the notebooks directly in your code editor. VS Code users can read more about working with Jupyter Notebooks in the VS Code documentation.

  2. Complete the Tensors in Pytorch notebook. It focuses on the basic manipulation of Pytorch tensors. You can skip this notebook if you are already comfortable doing this.

  3. Complete the Neural Networks in Pytorch notebook. It focuses on building a very simple neural network using the Pytorch nn.Module interface.

  4. Complete the Training Neural Networks notebook. It focuses on how to write a simple training loop for training a neural network.

  5. Complete the Fashion MNIST notebook, which summarizes concepts learned in notebooks 2 and 3 on building a neural network for classifying the Fashion MNIST dataset.

  6. Complete the Inference and Validation notebook. This notebook adds important concepts on how to do inference and validation on our neural network.

  7. Complete the Saving_and_Loading_Models notebook. This notebook addresses how to save and load model weights. This is important if you want to share a model with someone else.

🧠 Knowledge check

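  1. If tensor a has shape [N, d] and tensor b has shape [M, d], how can we calculate the pairwise distance between rows in a and b without using a for loop?


    We can take advantage of broadcasting to do this. The result below is the squared Euclidean distance; take the square root (or use torch.cdist(a, b)) to get the Euclidean distance itself:

    import torch

    N, M, d = 5, 7, 3  # example sizes
    a = torch.randn(N, d)
    b = torch.randn(M, d)
    # [N, 1, d] - [1, M, d] broadcasts to [N, M, d]; summing over d gives [N, M]
    dist = torch.sum((a.unsqueeze(1) - b.unsqueeze(0))**2, dim=2)  # shape [N, M]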
  2. What should be the size of S for an input image of size 1x28x28, and how many parameters does the neural network then have?

    from torch import nn
    neural_net = nn.Sequential(
        nn.Conv2d(1, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3), nn.ReLU(), nn.Flatten(), nn.Linear(S, 10)
    )

    Since both convolutions have a kernel size of 3, stride 1 (the default), and no padding, we lose 2 pixels in each spatial dimension, because the kernel cannot be centered on the edge pixels. Therefore, the output of the first convolution is 32x26x26 and the output of the second convolution is 64x24x24. The size of S must therefore be 64 * 24 * 24 = 36864. The number of parameters in a convolutional layer is kernel_size * kernel_size * in_channels * out_channels + out_channels (the last term is the bias), and the number of parameters in a linear layer is in_features * out_features + out_features (the last term is the bias). Therefore, the total number of parameters in the network is 3*3*1*32 + 32 + 3*3*32*64 + 64 + 36864*10 + 10 = 387,466, which can be verified by running:

    from math import prod
    sum(prod(p.shape) for p in neural_net.parameters())
  3. A working training loop in Pytorch should have these three function calls: optimizer.zero_grad(), loss.backward(), optimizer.step(). Explain what would happen in the training loop (or implement it) if you forgot each of the function calls.


    optimizer.zero_grad() is in charge of zeroing the gradients. If this is not done, the gradients accumulate across steps, so each update uses the sum of all previous gradients instead of the current one, which typically makes training unstable or causes it to diverge. loss.backward() is in charge of calculating the gradients. If this is not done, no gradients are calculated and the optimizer has nothing to update the weights with. optimizer.step() is in charge of updating the weights. If this is not done, the weights are never updated and the model will not learn anything.
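The shape and parameter arithmetic from question 2 can also be checked by hand without Pytorch. The following is a minimal sketch in plain Python, assuming square kernels and a dilation of 1; the helper names conv2d_output_size, conv2d_num_params, and linear_num_params are hypothetical and are not part of Pytorch or the course material.

```python
def conv2d_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv2d_num_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Kernel weights plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def linear_num_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


# Follow a 1x28x28 image through the two 3x3 convolutions from question 2.
side = conv2d_output_size(28, kernel_size=3)    # after Conv2d(1, 32, 3)
side = conv2d_output_size(side, kernel_size=3)  # after Conv2d(32, 64, 3)
S = 64 * side * side                            # input features of the linear layer

total_params = (
    conv2d_num_params(1, 32, 3)
    + conv2d_num_params(32, 64, 3)
    + linear_num_params(S, 10)
)
print(side, S, total_params)
```

The ReLU and Flatten layers have no parameters, so only the two convolutions and the linear layer contribute to the total.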

Final exercise

As the final exercise, we will develop a simple baseline model that we will continue to develop during the course. For this exercise, we provide the data in the data/corruptmnist folder. Do NOT use the data in the corruptmnist_v2 folder as that is intended for another exercise. As the name suggests, this is a (subsampled) corrupted version of the regular MNIST. Your overall task is the following:

Implement an MNIST neural network that achieves at least 85% accuracy on the test set.

Before any training can start, you should identify the corruption that we have applied to the MNIST dataset to create the corrupted version. This can help you identify what kind of neural network to use to get good performance, but any network should really be able to achieve this.

One key point of this course is trying to stay organized. Spending time now organizing your code will save time in the future as you start to add more and more features. As subgoals, please complete the following steps:

  1. Implement your model in its own script.

  2. Implement your data setup in its own script. The data was saved using torch.save, so to load it you should use torch.load.

    Saving the model

    When saving the model, you should use torch.save(model.state_dict(), ""), and when loading the model, you should use model.load_state_dict(torch.load("")). If you instead do torch.save(model, ""), this can lead to problems when loading the model later on, as it saves not only the model weights but also the model definition. This breaks if you change the model definition later on (which you most likely are going to do).

  3. Implement training and evaluation of your model in a single script. The script should accept subcommands indicating whether the model should be trained or evaluated. It will look something like this:

    python <script>.py train --lr 1e-4
    python <script>.py evaluate

    which can be implemented in various ways.

    VS Code and command line arguments

    If you try to execute the above code in VS Code using the debugger (F5) or the build/run functionality in the upper right corner, you will get an error message saying that you need to select a command to run, i.e. the script needs either the train or the evaluate command. This can be fixed by adding a launch.json file to a .vscode folder in the root of the project. The launch.json file should look something like this:

    {
        "version": "0.2.0",
        "configurations": [
            {
                "name": "Python: Current File",
                "type": "python",
                "request": "launch",
                "program": "${file}",
                "args": ["train", "--lr", "1e-4"],
                "console": "integratedTerminal",
                "justMyCode": true
            }
        ]
    }

    This will inform VS Code that when we execute the current file, we want to run it with the train command and additionally pass the --lr argument with the value 1e-4. You can read more about creating a launch.json file in the VS Code documentation. If you want to have multiple configurations, you can add them to the configurations list as additional dictionaries.
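As one of the various ways to implement the train and evaluate subcommands from step 3, here is a minimal sketch using argparse from the Python standard library. It is not the implementation provided in the final_exercise folder. The names build_parser, train, evaluate, and main, as well as the default learning rate, are hypothetical, and the function bodies are placeholders for your own training and evaluation code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with one subcommand per action."""
    parser = argparse.ArgumentParser(description="Train or evaluate a model.")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train_parser = subparsers.add_parser("train", help="Train the model.")
    # The default learning rate is an arbitrary illustrative value.
    train_parser.add_argument("--lr", type=float, default=1e-3, help="Learning rate.")

    subparsers.add_parser("evaluate", help="Evaluate a trained model.")
    return parser


def train(lr: float) -> str:
    # Placeholder: the real training loop would go here.
    return f"training with lr={lr}"


def evaluate() -> str:
    # Placeholder: the real evaluation code would go here.
    return "evaluating"


def main(argv=None) -> str:
    """Parse the arguments (sys.argv[1:] when argv is None) and dispatch."""
    args = build_parser().parse_args(argv)
    if args.command == "train":
        return train(args.lr)
    return evaluate()


# Explicit argument lists are used here so the sketch runs without a command line.
print(main(["train", "--lr", "1e-4"]))
print(main(["evaluate"]))
```

In an actual script, the two demonstration calls at the bottom would be replaced by a single call to main() under an if __name__ == "__main__": guard, so that the subcommand is read from the command line.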

To start you off, a very basic version of each script is provided in the final_exercise folder. We have already implemented some logic, especially to make sure you can easily run the different subcommands for step 3. If you are interested in how this is done, you can check out this optional module on defining command line interfaces (CLI). We also provide a requirements.txt with suggestions for the packages needed to complete the exercise.

As documentation that your model is working when running the train command, the script needs to produce a single plot with the training curve (training step vs training loss). When the evaluate command is run, it should write the test set accuracy to the terminal.

It is part of the exercise not to implement your solution in notebooks, as code development in real life happens in scripts. As the model is simple to run (for now), you should be able to complete the exercise on your laptop, even if you are only training on CPU. That said, you are allowed to upload your scripts to your own Google Drive and then call them from a Google Colab notebook, as shown in the image below, where all code is placed in the script and the Colab notebook is only used to execute it.


Be sure to have completed the final exercise before the next session, as we will be building on top of the model you have created.