4.3. Neural Network Programming
To implement an AI model, a machine learning framework typically exposes a neural-network-centric programming interface. Regardless of their structures, neural networks are composed of three elements: (1) nodes, the computational units that carry out the processing of a neural network; (2) node weights, the variables updated by gradients during training; and (3) node connections, which specify how data (for example, activations and gradients) are transmitted within a neural network.
4.3.1. Neural Network Layers
In order to simplify the construction of a neural network, many machine learning frameworks utilize a layer-oriented approach. This method organizes nodes, their weights, and their connections into cohesive neural network layers.
To illustrate this, consider fully connected layers, one common type of neural network layer. Their distinguishing characteristic is that every node in one layer is linked to every node in the succeeding layer, which realizes a full linear transformation of the feature space. As a result, data can be mapped from a high-dimensional space to a lower-dimensional one, and vice versa.
As shown in Figure ch03/fc_layer_1, the fully connected process transforms \(n\) data points from the input into an \(m\)-sized feature space, followed by a further transformation into a \(p\)-sized feature space. It is important to highlight that the number of parameters in a fully connected layer can be substantial, scaling with the product of the layer sizes: \(n \times m\) for the first transformation and \(m \times p\) for the second.
Fully connected layer illustration
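As a quick sanity check on these parameter counts, the two transformations can be sketched in PyTorch with hypothetical sizes (n = 16, m = 8, p = 4); note that nn.Linear also adds one bias term per output feature on top of the \(n \times m\) weights:
import torch.nn as nn

n, m, p = 16, 8, 4        # hypothetical sizes for illustration
first = nn.Linear(n, m)   # weight matrix of shape (m, n): n*m = 128 weights (+ m biases)
second = nn.Linear(m, p)  # weight matrix of shape (p, m): m*p = 32 weights (+ p biases)

print(first.weight.shape)   # torch.Size([8, 16])
print(second.weight.shape)  # torch.Size([4, 8])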
Several types of neural network layers are widely used across applications, including fully connected, convolutional, pooling, recurrent, attention, batch normalization, and dropout layers. For problems involving temporal dependencies in sequential data, recurrent layers are commonly employed. However, recurrent layers suffer from vanishing or exploding gradients as the sequence length increases during training. The Long Short-Term Memory (LSTM) model was developed to address this problem, enabling long-term dependencies in sequential data to be captured. Code ch02/code2.3.1 shows some examples of neural network layers in PyTorch:
ch02/code2.3.1
import torch.nn as nn

fc_layer = nn.Linear(16, 5)  # A fully connected layer with 16 input features and 5 output features
relu_layer = nn.ReLU()  # A ReLU activation layer
conv_layer = nn.Conv2d(3, 16, 3, padding=1)  # A convolutional layer with 3 input channels, 16 output channels, and a 3x3 kernel
dropout_layer = nn.Dropout(0.2)  # A dropout layer with a 20% dropout rate
batch_norm_layer = nn.BatchNorm2d(16)  # A batch normalization layer with 16 channels
layers = nn.Sequential(conv_layer, batch_norm_layer, relu_layer, fc_layer, dropout_layer)  # A sequential container that chains layers (in practice, feature maps are flattened before a fully connected layer)
In natural language processing tasks, the Sequence-to-Sequence (Seq2Seq) architecture applies recurrent layers in an encoder-decoder framework. The decoder component of Seq2Seq often integrates an attention mechanism, allowing the model to concentrate on the relevant segments of the input sequence. This line of work led to the Transformer model, which is the central building block of the Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) models. Both BERT and GPT have propelled significant progress in diverse language-related tasks.
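As a rough sketch of the recurrent and attention layers mentioned above, PyTorch exposes them as ready-made modules; the sequence length, batch size, and feature sizes below are hypothetical:
import torch
import torch.nn as nn

seq_len, batch, feat = 10, 4, 32          # hypothetical sizes for illustration
x = torch.randn(seq_len, batch, feat)

lstm = nn.LSTM(input_size=feat, hidden_size=64)   # an LSTM layer for sequential data
lstm_out, (h_n, c_n) = lstm(x)                    # lstm_out: (10, 4, 64)

attn = nn.MultiheadAttention(embed_dim=64, num_heads=8)
attn_out, attn_weights = attn(lstm_out, lstm_out, lstm_out)  # self-attention over the sequence

encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=8)
enc_out = encoder_layer(lstm_out)                 # one Transformer encoder block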
4.3.2. Neural Network Implementation
As the number of network layers increases, manually managing the training variables becomes progressively more complex. Fortunately, most machine learning frameworks provide user-friendly APIs that encapsulate neural network layers in a base class, which all other layers then inherit. Notable examples include mindspore.nn.Cell in MindSpore and torch.nn.Module in PyTorch. Code ch02/code2.3.2 gives an MLP implementation in PyTorch.
ch02/code2.3.2
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, dropout_rate=0.5):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.bn1 = nn.BatchNorm1d(hidden_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
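For instance, instantiating this MLP and running a forward pass might look as follows; the input and layer sizes are hypothetical:
import torch

model = MLP(input_size=784, hidden_size=256, num_classes=10)  # hypothetical sizes
x = torch.randn(32, 784)   # a batch of 32 flattened 28x28 inputs
logits = model(x)          # calling the model goes through nn.Module.__call__, which runs forward()
print(logits.shape)        # torch.Size([32, 10])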
Figure ch03/model_build illustrates the overall process of constructing a neural network. The base class plays a pivotal role in initializing training parameters, managing their status, and defining the computation process. In turn, the neural network model implements functions to manage the network layers and their associated parameters. MindSpore's Cell and PyTorch's Module both serve these functions: they act not only as model abstractions but also as base classes for all networks.
Existing model abstraction strategies fall into two categories. The first provides two separate abstractions: a Layer (which handles parameter construction and forward computation for an individual neural network layer) and a Model (which manages the connection and combination of neural network layers and administers their parameters). The second category merges Layer and Model into a single abstraction that represents both an individual neural network layer and a model composed of multiple layers. Cell and Module fall into this second category.
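The practical consequence of the second category can be observed directly in PyTorch: a single layer and a model built from several layers are both instances of the same Module abstraction.
import torch.nn as nn

layer = nn.Linear(16, 8)                                              # an individual layer is a Module
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))   # so is a model composed of layers

print(isinstance(layer, nn.Module))   # True
print(isinstance(model, nn.Module))   # True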
Comprehensive neural network construction process
Figure ch03/cell_abstract illustrates a general approach to designing the neural network layer abstraction. The constructor uses the OrderedDict class from the Python collections module to store the initialized neural network layers and their corresponding parameters. This yields an ordered output, which suits stacked deep learning models better than an unordered Dict. The management of neural network layers and parameters is handled in the __setattr__ method: upon detecting that an attribute is a neural network layer or a layer parameter, __setattr__ records the attribute in the appropriate collection.
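A minimal sketch of this bookkeeping is given below; the class name MyBaseLayer is hypothetical, and real implementations such as Module and Cell perform many additional checks:
from collections import OrderedDict

import torch.nn as nn

class MyBaseLayer:
    # A simplified stand-in for a neural network base class (illustration only)
    def __init__(self):
        # Ordered storage keeps layers and parameters in the order they were assigned
        object.__setattr__(self, "_parameters", OrderedDict())
        object.__setattr__(self, "_layers", OrderedDict())

    def __setattr__(self, name, value):
        # Record the attribute in the matching registry, then store it as a normal attribute
        if isinstance(value, nn.Parameter):
            self._parameters[name] = value
        elif isinstance(value, (MyBaseLayer, nn.Module)):
            self._layers[name] = value
        object.__setattr__(self, name, value)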
The computation process is central to a neural network model. It is defined through the __call__ method in the implementation of neural network layers. To collect the training parameters, the base class traverses all network layers; the retrieved parameters are then handed to the optimizer through a dedicated interface that returns them. This text, however, only touches on a few significant methods.
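For example, in PyTorch this interface is Module.parameters(), and its output is passed directly to the optimizer; the sizes and learning rate below are illustrative:
import torch.optim as optim

model = MLP(input_size=784, hidden_size=256, num_classes=10)   # hypothetical sizes
optimizer = optim.SGD(model.parameters(), lr=0.01)             # the traversed parameters are handed to the optimizer
print(sum(p.numel() for p in model.parameters()))              # total number of training parameters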
As for custom methods, it is often necessary to implement facilities for inserting and deleting parameters, adding and removing neural network layers, and retrieving information about the neural network model.
Abstraction technique of neural network base classes
To preserve simplicity, we have provided only a condensed overview of the base class implementation for neural network interface layers. In practice, users typically cannot override the __call__ method responsible for computation directly. Instead, a separate computation method is defined, which __call__ invokes, and users override that method to customize the computation.
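PyTorch follows exactly this pattern: users implement forward() rather than __call__, and invoking the module goes through Module.__call__, which dispatches to forward(). A minimal illustration with a hypothetical Square module:
import torch
import torch.nn as nn

class Square(nn.Module):
    def forward(self, x):
        # Users override forward(); __call__ is provided by the base class
        return x * x

sq = Square()
x = torch.tensor([1.0, 2.0, 3.0])
print(sq(x))   # calling the module invokes Module.__call__, which runs forward()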