Today we will dive in torch.nn module, one of them most important Pytorch’s module to create neural networks and to customise existing code.


Table of contents:

Torch.nn introduces new concept and tries to answer the following questions:

  • Notions
  • How to create a model ?
  • How to deploy a model ?
  • How to inspect a model ?


Notions

Subclass: nn.Parameter

Sum up: nn.Parameter is a subclass of torch.Tensor used when we need to optimize tensors during the gradient descent. It automatically add tensors to the parameters() iterator allowing us to simply define an optimizer.

Example: To illustrate the notion, let us implement a linear layer. To recall, a linear layer modifies the number of channels of an input tensor x applying a single hidden layer. It is defined with a weight matrix A and bias matrix b by the following equation: \(y=x.A^T + b\) Where y has shape $(,H_{out})$ and x has shape $(,H_{in})$.

Thus, to implement the layer, we need to define A and b, i.e to initialize a weight and a bias matrix and to specify that they need to be updated during optimization. A common practice to define an optimizer is to call model.parameters() returning an iterator with all model registered parameters:

# SGD optimizer
# >> use model.parameters()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

This behaviour constraints A and b to be in the model.parameters() iterator. To do it, we must define them using torch.nn.Parameter (!) instead of torch.Tensor. Hence, a simple definition of a linear layer can be:

class SimpleLinearLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight=nn.Parameter(torch.zeros(out_features, in_features))
        self.bias=nn.Parameter(torch.ones(out_features))
        return
    
    def forward(self,x):
        return x @ self.weight.T + self.bias

If you look at some PyTorch implementation, it is more frequent to define parameters as empty tensors and then initialize them in a reset_params functions. To keep it more clear, I have hard coded the initialization.

Then, if we want to be sure everything is well initialized, we can inspect parameters using the parameters() iterator:

layer=CustomLayer(in_features, out_features)
for p in layer.parameters():
    print(p)


How to create a model ?

Layers

Defining a model requires to create layers and to add an order between them. To begin with, we need to subclass the Module class. It is the base class for all neural networks modules. Then, we need to define layers in the __init__ method. There are plenty of predifined layers in torch grouped in several categories: convolutions, pooling, transformer, etc. When looking at the source code, many layers are implemented using their counterpart function present in torch.nn.functional. The module and the sub-module works hand in hand.

On a more high-level, we can also grouped layers into two different categories: lazy modules and explicit modules. When defining a model, in Tensorflow, we only have to specify the output shaped and input shapes are inferred automatically. In Torch, it was not the case before lazy modules were implemented.

Personal think:

  • Lazy module: From my point of view, even if it is a nice TensorFlow feature, it is not time consuming to manually calculate input shapes and it ensures we understand how our model works at every steps.

Containers

Once layers are created, we need to order them. The best way to do it depends on the architecture we have. We can use:

  • nn.Sequential: If layers are sequentially executed.
  • nn.ModuleList or nn.ModuleDict: If we want to register layers but define the call logic later.
  • nn.ParameterList or nn.ParameterDict: If instead of layers we work directly with their parameters.

ModuleDict and ParameterDict are really close to their list counterpart but have a better representation. Sequential does not have a dictionary-like version, however if we want to customize its representation, there is a trick using OrderedDict:

# Defining layers
lin_in=nn.Linear(in_features, inter_features)
lin_out=nn.Linear(inter_features, out_features)

# Order them
layers=nn.Sequential(OrderedDict([('lin_in', lin_in),
                                  ('lin_out', lin_out)]))

As already mentioned, ParameterDict is suited when we work directly with parameters. For instance, we can re-write the previous simple linear layer with the following syntax:

class SimpleLinearLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.params=nn.ParameterDict({'weight':nn.Parameter(torch.zeros(out_features, in_features)),
                                     'bias':nn.Parameter(torch.ones(out_features))})
        return
    
    def forward(self,x):
        return x @ self.params['weight'].T + self.params['bias']

Questions:

  • ModuleList vs list of modules: One can wonder why we should use ModuleList instead of a list of layers. One reason is when we define a list of layers, calling model.parameters() does not look inside each item parameters. That is a huge problem because the optimizer is usually defined calling model.parameters() ! It can lead to bad optimization.


How to deploy a model ?

Once the model is created, we need to deploy it on one or some machine. If we have a single GPU, we simply have to export the model to the machine using model.to(). This method modifies the module in-place. If we have multiple GPUs, we can not export the model on a single device, that is why DataParallel layers have been created.


How to inspect a model ?

Hooks

Once the model is created we might want to inspect it to ensure it behaves as expected or to modify its behaviour. That is a typical hook usecase. Hooks are functions that automatically execute after or before a particular event, for instance a forward or a backward call. They can be used for additional prints, catching intermediate results, cliping gradients, applying dropout, etc.

Example: To illustrate hooks, we will print intermediate input shapes before every forward. For this purpose, we can assign a hook function for every submodule. This solution requires changing many code and it is not really flexible. A better option is to recursively apply hooks to each layer of our model using a function (apply) or an iterator (with: named_modules or named_children).

First we create a model containing nested nn.Module dependencies.

# Imports:
import torch
from torch import nn

from collections import OrderedDict

# Classes
class SimpleMLP(nn.Module):
    def __init__(self, in_f, inter_f, out_f):
        super().__init__()
        self.layers=nn.Sequential(
            OrderedDict([
            ('lin_in', nn.Linear(in_f, inter_f)),
            ('lin_out', nn.Linear(inter_f, out_f))])
            )
        
    def forward(self, x):
        return self.layers(x)

Then, we select the type of hook we want and we define the function we should apply:

class VerboseModel(nn.Module):
    def __init__(self, model:nn.Module, mode='named_modules') -> None:
        super().__init__()
        self.model=model 
        
        # Hook before every forward(): register_forward_pre_hook
        match mode:
            case 'named_modules':
                for name, module in model.named_modules():
                    module.__name__=name
                    module.register_forward_pre_hook(inspect_shape_in)
            case 'named_children':
                for name, module in model.named_children():
                    module.__name__=name
                    module.register_forward_pre_hook(inspect_shape_in)

    def forward(self, x):
        return self.model(x)

# Hook
def inspect_shape_in(module, input):
    if hasattr(module, '__name__'):
        name=module.__name__
    else:
        name=''
    print("Model {:15} Input shape {:10}".format(name, 
                                                 str(tuple(input[0].shape)) ))

Then, we apply hook:

  • recursively to every submodule (as returned by .children()) as well as self.
  • to every modules (as returned by .named_modules()).
  • to every immediate children (as returned by .named_children())
if __name__ == '__main__':
    in_f, inter_f, out_f=3,10,1
    x=torch.zeros((2,in_f))
    
    for mode in ['apply', 'named_modules', 'named_children']:
        print('Mode: {:<15}'.format(mode))
        model=SimpleMLP(in_f, inter_f, out_f)
        
        match mode:
            case 'apply':
                # Recursively implicit
                model.apply(lambda m: m.register_forward_pre_hook(inspect_shape_in) )
                model(x)
                
            case 'named_modules'|'named_children':
                # Recursively explicit
                verbose_model=VerboseModel(model, mode)
                verbose_model(x)
        print('\n')

Among the three options, the one using apply() is the less flexible but requires less code. Compared to the two other options, it is harder to know what layer we are looking at since we can not directly get their name.

The difference between named_modules() and named_children() is that the first is more exaustive than the other and allows to have a deep look into every submodules. In our previous example, named_children() looks only into model.layers while named_modules() looks into model, model.layers, model.layers.lin_in, model.layers.lin_out