Deep Learning Software

Core Module

Since its breakthrough in 2012, deep learning has revolutionized various aspects of our lives, from Google Translate and driverless cars to personal assistants and protein engineering. It is transforming nearly every sector of the economy. However, deploying deep learning models in production presents unique challenges. The concept of technical debt highlights the significant maintenance costs associated with running machine learning systems in production. MLOps, inspired by classical DevOps, aims to address these challenges by developing methods, processes, and tools to streamline the development and deployment of deep learning models.

While MLOps concepts and tools can be applied to classical machine learning models (e.g., K-nearest neighbor, Random Forest), deep learning introduces specific challenges related to the size of data and models. Therefore, this course focuses on deep learning models.

Software Landscape for Deep Learning

Regarding software for deep learning, the landscape is currently dominated by three frameworks (listed in order of when they were published): TensorFlow, PyTorch, and JAX.

A detailed comparison of these frameworks is unnecessary. PyTorch and TensorFlow, being the oldest, have larger communities and comprehensive feature sets for both research and production. JAX, a newer framework, incorporates improvements over PyTorch and TensorFlow but is still evolving. Given their different programming paradigms (object-oriented vs. functional), a direct comparison is not particularly meaningful.

This course uses PyTorch due to its intuitive nature and its prevalence in our research. Currently, PyTorch is the dominant framework for published models, research papers, and competition winners.

The intention behind this set of exercises is to get everyone's PyTorch skills up to date. If you're already a PyTorch-Jedi, feel free to skip the first set of exercises, but I still recommend that you go through them. The exercises are in large part taken directly from the deep learning course at Udacity. Note that these exercises are given as notebooks, which is the only time we are going to use them actively in the course. Instead, after this set of exercises, we are going to focus on writing code in Python scripts.

The notebooks contain a lot of explanatory text. The exercises that you are supposed to complete are inlined in the text in small "exercise" blocks:

For a refresher on deep learning topics, consult the relevant chapters in the deep learning book by Goodfellow, Bengio, and Courville (also available in the literature folder). While expertise in deep learning is not required to pass this course, a basic understanding of the concepts is beneficial. The course focuses on the software aspects of deploying deep learning models in production.

❔ Exercises

Exercise files

Start a Jupyter Notebook session in your terminal (assuming you are at the root of the course material). Alternatively, you should be able to open the notebooks directly in your code editor. VS Code users can read more about how to work with Jupyter Notebooks in VS Code here
Complete the Tensors in PyTorch notebook. It focuses on basic manipulation of PyTorch tensors. You can skip this notebook if you are comfortable doing this.
Complete the Neural Networks in PyTorch notebook. It focuses on building a very simple neural network using the PyTorch nn.Module interface.
Complete the Training Neural Networks notebook. It focuses on how to write a simple training loop for training a neural network.
Complete the Fashion MNIST notebook, which summarizes concepts learned in notebooks 2 and 3 on building a neural network for classifying the Fashion MNIST dataset.
Complete the Inference and Validation notebook. This notebook adds important concepts on how to do inference and validation on our neural network.
Complete the Saving_and_Loading_Models notebook. This notebook addresses how to save and load model weights. This is important if you want to share a model with someone else.

🧠 Knowledge check

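If tensor a has shape [N, d] and tensor b has shape [M, d], how can we calculate the pairwise distances between the rows of a and b without using a for loop?
Solution

We can take advantage of broadcasting to do this. The snippet computes the squared Euclidean distance; take the square root of the result if the plain Euclidean distance is needed: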
```
a = torch.randn(N, d)
b = torch.randn(M, d)
dist = torch.sum((a.unsqueeze(1) - b.unsqueeze(0))**2, dim=2)  # shape [N, M]
```
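As a sanity check, the broadcasting result can be compared against `torch.cdist`, PyTorch's built-in pairwise distance function. The sketch below assumes small, arbitrarily chosen sizes for `N`, `M`, and `d`. Since `torch.cdist` returns the Euclidean distance itself, it is squared before the comparison.

```python
import torch

# Illustrative sizes, chosen arbitrarily for this check
N, M, d = 4, 5, 3
a = torch.randn(N, d)
b = torch.randn(M, d)

# [N, 1, d] - [1, M, d] broadcasts to [N, M, d]; summing over d gives [N, M]
sq_dist = torch.sum((a.unsqueeze(1) - b.unsqueeze(0)) ** 2, dim=2)

# torch.cdist computes the (non-squared) Euclidean distance for p=2
reference = torch.cdist(a, b, p=2) ** 2

print(sq_dist.shape)
print(torch.allclose(sq_dist, reference, atol=1e-4))
```

For large N and M the broadcasting version materializes an intermediate tensor of shape [N, M, d], so a dedicated function such as `torch.cdist` may be preferable when memory is tight.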
What should be the size of S for an input image of size 1x28x28, and how many parameters does the neural network then have?
```
from torch import nn
neural_net = nn.Sequential(
    nn.Conv2d(1, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3), nn.ReLU(), nn.Flatten(), nn.Linear(S, 10)
)
```
Solution

Both convolutions have a kernel size of 3, a stride of 1 (the default), and no padding, so each one removes 2 pixels from each spatial dimension because the kernel cannot be centered on the edge pixels. The output of the first convolution is therefore 32x26x26, and the output of the second is 64x24x24. The size of S must therefore be 64 * 24 * 24 = 36864. The number of parameters in a convolutional layer is kernel_size * kernel_size * in_channels * out_channels + out_channels (the last term is the bias), and the number of parameters in a linear layer is in_features * out_features + out_features (the last term is again the bias). The total number of parameters in the network is therefore 3*3*1*32 + 32 + 3*3*32*64 + 64 + 36864*10 + 10 = 387,466, which can be verified by running:
```
from math import prod
sum(prod(p.shape) for p in neural_net.parameters())
```
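The same arithmetic can be reproduced without PyTorch, which makes each term of the calculation explicit. This is a minimal sketch; the helper names `conv2d_out`, `conv2d_params`, and `linear_params` are hypothetical and introduced only for illustration.

```python
# Hypothetical helpers that mirror the formulas in the solution text


def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


h = conv2d_out(28, kernel=3)  # spatial size after the first convolution
h = conv2d_out(h, kernel=3)   # spatial size after the second convolution
S = 64 * h * h                # flattened size fed to nn.Linear

total = conv2d_params(1, 32, 3) + conv2d_params(32, 64, 3) + linear_params(S, 10)
print(S, total)
```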
A working training loop in PyTorch should have these three function calls: optimizer.zero_grad(), loss.backward(), optimizer.step(). Explain what would happen in the training loop (or implement it) if you forgot each of the function calls.

Solution

optimizer.zero_grad() resets the gradients. If it is omitted, gradients accumulate across steps, so every update uses the sum of all gradients computed so far rather than the gradient of the current batch, which usually makes training unstable or divergent. loss.backward() computes the gradients. If it is omitted, no gradients are computed and the optimizer has nothing to update the weights with. optimizer.step() updates the weights. If it is omitted, the weights never change and the model does not learn anything.
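The accumulation behavior can be observed directly on a single tensor. The sketch below is an illustration under simplified assumptions: it uses a toy parameter `w` and input `x` instead of a real model, and it resets the gradient by hand where a training loop would call optimizer.zero_grad().

```python
import torch

# Toy parameter and input, chosen only for illustration
w = torch.ones(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

(w * x).sum().backward()
first = w.grad.clone()  # gradient after one backward pass: equal to x

(w * x).sum().backward()  # second backward pass without resetting
accumulated = w.grad.clone()  # backward() adds into .grad, so this is 2 * x

w.grad = None  # manual reset, standing in for optimizer.zero_grad()
(w * x).sum().backward()
fresh = w.grad.clone()  # a single gradient again

print(first, accumulated, fresh)
```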

Final exercise

As the final exercise, we will develop a simple baseline model that we will continue to develop during the course. For this exercise, we provide a corrupted subset of the MNIST dataset which can be downloaded from this Google Drive folder or using these two commands:

pip install gdown
gdown --folder 'https://drive.google.com/drive/folders/1ddWeCcsfmelqxF8sOGBihY9IU98S9JRP?usp=sharing'

The data should be placed in a subfolder called data/corruptmnist in the root of the project. Your overall task is the following:

Implement an MNIST neural network that achieves at least 85% accuracy on the test set.

Before any training can start, you should identify the corruption that we have applied to the MNIST dataset to create the corrupted version. This can help you identify what kind of neural network to use to get good performance, but any network should be able to reach the 85% target.

One key point of this course is trying to stay organized. Spending time now organizing your code will save time in the future as you start to add more and more features. As subgoals, please complete the following exercises:

Implement your model in a script called model.py.

Starting point for model.py

model.py
from torch import nn


class MyAwesomeModel(nn.Module):
    """My awesome model."""

    def __init__(self) -> None:
        super().__init__()
        self.fc1 = nn.Linear(784, 128)

Solution

The provided solution implements a convolutional neural network with 3 convolutional layers and a single fully connected layer. Because the MNIST dataset consists of images, we want an architecture that can take advantage of the spatial information in the images.

model.py
import torch
from torch import nn


class MyAwesomeModel(nn.Module):
    """My awesome model."""

    def __init__(self) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.conv3 = nn.Conv2d(64, 128, 3, 1)
        self.dropout = nn.Dropout(0.5)
        self.fc1 = nn.Linear(128, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass."""
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2, 2)
        x = torch.relu(self.conv3(x))
        x = torch.max_pool2d(x, 2, 2)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        return self.fc1(x)


if __name__ == "__main__":
    model = MyAwesomeModel()
    print(f"Model architecture: {model}")
    print(f"Number of parameters: {sum(p.numel() for p in model.parameters())}")

    dummy_input = torch.randn(1, 1, 28, 28)
    output = model(dummy_input)
    print(f"Output shape: {output.shape}")
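To see why the final linear layer in the solution takes 128 input features, it helps to trace the spatial size of a 28x28 input through the three convolution and max-pooling stages. The sketch below does this in plain Python; the helpers `conv_out` and `pool_out` are hypothetical names introduced for illustration, and they assume unpadded 3x3 convolutions with stride 1 and 2x2 pooling with stride 2, as in the solution.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1) -> int:
    """Output size of an unpadded convolution."""
    return (size - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Output size of max pooling (rounds down, PyTorch's default behavior)."""
    return (size - kernel) // stride + 1


size = 28
trace = [size]
for _ in range(3):  # three conv + max-pool stages
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flattened = 128 * size * size  # conv3 output channels times remaining spatial size
print(trace, flattened)
```

If you change the kernel sizes, padding, or number of stages in your own model, the input size of the linear layer has to change accordingly.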

Implement your data setup in a script called data.py. The data was saved using torch.save, so to load it you should use torch.load.

Saving the model

When saving the model, you should use torch.save(model.state_dict(), "model.pt"), and when loading the model, you should use model.load_state_dict(torch.load("model.pt")). If you instead do torch.save(model, "model.pt"), the whole model object is pickled, which ties the saved file to the model class as it was defined at save time. Loading the file can then fail if you change the model definition later (which you are most likely going to do).
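A minimal round trip of the recommended pattern might look as follows. This is a sketch under stated assumptions: a small `nn.Linear` stands in for the real model, and an in-memory `io.BytesIO` buffer replaces the "model.pt" file so that nothing is written to disk.

```python
import io

import torch
from torch import nn

# Small stand-in model; any nn.Module is handled the same way
model = nn.Linear(4, 2)

# Save only the weights; the buffer plays the role of "model.pt"
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Loading requires constructing the model from its current definition first
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

same = all(torch.equal(p, q) for p, q in zip(model.parameters(), restored.parameters()))
print(same)
```

Because only tensors are stored, the file stays loadable as long as the parameter names and shapes of the new model definition match the saved ones.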

Starting point for data.py

data.py
import torch


def corrupt_mnist():
    """Return train and test data for corrupt MNIST."""
    # exchange with the corrupted mnist dataset
    train = torch.randn(50000, 784)
    test = torch.randn(10000, 784)
    return train, test

Solution

Data is stored in .pt files, which can be loaded using torch.load. We iterate over the files, load them, and concatenate them into a single tensor. Note in particular the use of the .unsqueeze method. Convolutional neural networks (which we propose as a solution) need the data to be in the shape [N, C, H, W], where N is the number of samples, C is the number of channels, H is the height of the image, and W is the width of the image. The dataset is stored in the shape [N, H, W], so we need to add a channel dimension.

The .pt files are nothing more than .pickle files in disguise. The torch.save and torch.load functions are essentially wrappers around Python's pickle module, which produces serialized files. However, it is convention to use .pt to indicate that the file contains PyTorch tensors.

The solution additionally includes functionality for plotting the images together with their labels for inspection. Remember: all good machine learning starts with a good understanding of the data.

data.py
from __future__ import annotations

import matplotlib.pyplot as plt  # only needed for plotting
import torch
from mpl_toolkits.axes_grid1 import ImageGrid  # only needed for plotting

DATA_PATH = "data/corruptmnist"


def corrupt_mnist() -> tuple[torch.utils.data.Dataset, torch.utils.data.Dataset]:
    """Return train and test datasets for corrupt MNIST."""
    train_images, train_target = [], []
    for i in range(6):
        train_images.append(torch.load(f"{DATA_PATH}/train_images_{i}.pt"))
        train_target.append(torch.load(f"{DATA_PATH}/train_target_{i}.pt"))
    train_images = torch.cat(train_images)
    train_target = torch.cat(train_target)

    test_images: torch.Tensor = torch.load(f"{DATA_PATH}/test_images.pt")
    test_target: torch.Tensor = torch.load(f"{DATA_PATH}/test_target.pt")

    train_images = train_images.unsqueeze(1).float()
    test_images = test_images.unsqueeze(1).float()
    train_target = train_target.long()
    test_target = test_target.long()

    train_set = torch.utils.data.TensorDataset(train_images, train_target)
    test_set = torch.utils.data.TensorDataset(test_images, test_target)

    return train_set, test_set


def show_image_and_target(images: torch.Tensor, target: torch.Tensor) -> None:
    """Plot images and their labels in a grid."""
    row_col = int(len(images) ** 0.5)
    fig = plt.figure(figsize=(10.0, 10.0))
    grid = ImageGrid(fig, 111, nrows_ncols=(row_col, row_col), axes_pad=0.3)
    for ax, im, label in zip(grid, images, target):
        ax.imshow(im.squeeze(), cmap="gray")
        ax.set_title(f"Label: {label.item()}")
        ax.axis("off")
    plt.show()


if __name__ == "__main__":
    train_set, test_set = corrupt_mnist()
    print(f"Size of training set: {len(train_set)}")
    print(f"Size of test set: {len(test_set)}")
    print(f"Shape of a training point {(train_set[0][0].shape, train_set[0][1].shape)}")
    print(f"Shape of a test point {(test_set[0][0].shape, test_set[0][1].shape)}")
    show_image_and_target(train_set.tensors[0][:25], train_set.tensors[1][:25])
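The channel-dimension step described in the solution can be checked in isolation. The sketch below assumes a random tensor as a stand-in for the loaded images and an arbitrarily configured `Conv2d` layer; neither is part of the course code.

```python
import torch

# Stand-in for the loaded images: 8 grayscale images of 28x28 (random values)
images = torch.randn(8, 28, 28)  # [N, H, W]

batched = images.unsqueeze(1)  # insert a channel axis at position 1 -> [N, C, H, W]
print(batched.shape)

# A convolution with one input channel accepts the 4D tensor
conv = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
out = conv(batched)
print(out.shape)
```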

Implement training and evaluation of your model in the main.py script. The main.py script should be able to take additional subcommands indicating if the model is being trained or evaluated. It will look something like this:

python main.py train --lr 1e-4
python main.py evaluate model.pth

which can be implemented in various ways. We provide you with a starting script that uses the typer library to define a command line interface (CLI), which you can learn more about in this module later in the course.

VS Code and command line arguments

If you try to execute the above code in VS Code using the debugger (F5) or the build run functionality in the upper right corner, you will get an error message saying that you need to select a command to run, i.e., main.py needs either the train or the evaluate command. This can be fixed by adding a launch.json file to a .vscode folder in the root of the project. The launch.json file should look something like this:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Train",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "args": [
                "train",
                "--lr",
                "1e-4"
            ],
            "console": "integratedTerminal",
            "justMyCode": true
        }
    ]
}

This will inform VS Code that when we execute the current file (in this case main.py), we want to run it with the train command and additionally pass the --lr argument with the value 1e-4. You can read more about creating a launch.json file here. If you want to have multiple configurations you can add them to the configurations list as additional dictionaries.

Starting point for main.py

main.py
import torch
import typer
from data import corrupt_mnist
from model import MyAwesomeModel

app = typer.Typer()


@app.command()
def train(lr: float = 1e-3) -> None:
    """Train a model on MNIST."""
    print("Training day and night")
    print(lr)

    # TODO: Implement training loop here
    model = MyAwesomeModel()
    train_set, _ = corrupt_mnist()


@app.command()
def evaluate(model_checkpoint: str) -> None:
    """Evaluate a trained model."""
    print("Evaluating like my life depends on it")
    print(model_checkpoint)

    # TODO: Implement evaluation logic here
    model = MyAwesomeModel()
    model.load_state_dict(torch.load(model_checkpoint))
    _, test_set = corrupt_mnist()


if __name__ == "__main__":
    app()

Solution

The solution implements a simple training loop and evaluation loop. Furthermore, we have added additional hyperparameters that can be passed to the training loop. Highlighted in the solution are the different lines where we ensure that our model and data are moved to the GPU (or Apple MPS accelerator if you have a newer Mac) if available.

main.py
import matplotlib.pyplot as plt
import torch
import typer
from data import corrupt_mnist
from model import MyAwesomeModel

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")

app = typer.Typer()


@app.command()
def train(lr: float = 1e-3, batch_size: int = 32, epochs: int = 10) -> None:
    """Train a model on MNIST."""
    print("Training day and night")
    print(f"{lr=}, {batch_size=}, {epochs=}")

    model = MyAwesomeModel().to(DEVICE)
    train_set, _ = corrupt_mnist()

    train_dataloader = torch.utils.data.DataLoader(train_set, batch_size=batch_size)

    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    statistics = {"train_loss": [], "train_accuracy": []}
    for epoch in range(epochs):
        model.train()
        for i, (img, target) in enumerate(train_dataloader):
            img, target = img.to(DEVICE), target.to(DEVICE)
            optimizer.zero_grad()
            y_pred = model(img)
            loss = loss_fn(y_pred, target)
            loss.backward()
            optimizer.step()
            statistics["train_loss"].append(loss.item())

            accuracy = (y_pred.argmax(dim=1) == target).float().mean().item()
            statistics["train_accuracy"].append(accuracy)

            if i % 100 == 0:
                print(f"Epoch {epoch}, iter {i}, loss: {loss.item()}")

    print("Training complete")
    torch.save(model.state_dict(), "model.pth")
    fig, axs = plt.subplots(1, 2, figsize=(15, 5))
    axs[0].plot(statistics["train_loss"])
    axs[0].set_title("Train loss")
    axs[1].plot(statistics["train_accuracy"])
    axs[1].set_title("Train accuracy")
    fig.savefig("training_statistics.png")


@app.command()
def evaluate(model_checkpoint: str) -> None:
    """Evaluate a trained model."""
    print("Evaluating like my life depended on it")
    print(model_checkpoint)

    model = MyAwesomeModel().to(DEVICE)
    # map_location lets a checkpoint trained on one device load on another
    model.load_state_dict(torch.load(model_checkpoint, map_location=DEVICE))

    _, test_set = corrupt_mnist()
    test_dataloader = torch.utils.data.DataLoader(test_set, batch_size=32)

    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for img, target in test_dataloader:
            img, target = img.to(DEVICE), target.to(DEVICE)
            y_pred = model(img)
            correct += (y_pred.argmax(dim=1) == target).float().sum().item()
            total += target.size(0)
    print(f"Test accuracy: {correct / total}")


if __name__ == "__main__":
    app()

As documentation that your model is working when running the train command, the script needs to produce a single plot with the training curve (training step vs. training loss). When the evaluate command is run, it should write the test set accuracy to the terminal.

It is part of the exercise not to implement this in notebooks, as code development in real life happens in scripts. As the model is simple to run (for now), you should be able to complete the exercise on your laptop, even if you are only training on CPU. That said, you are allowed to upload your scripts to your own Google Drive and then call them from a Google Colab notebook, which is shown in the image below where all code is placed in the fashion_trainer.py script and the Colab notebook is just used to execute it.

Be sure to have completed the final exercise before the next session, as we will be building on top of the model you have created.