This exercise introduces the tensor data structure of MindSpore. By performing a series of operations on tensors, you can understand the basic syntax of MindSpore.
MindSpore 1.2 or later is recommended. The exercise can be performed on a PC or by logging in to HUAWEI CLOUD and purchasing the ModelArts service.
Tensor is a basic data structure in MindSpore network computing. For details about data types in tensors, see dtype.
Tensors of different dimensions represent different data. For example, a 0-dimensional tensor represents a scalar, a 1-dimensional tensor represents a vector, a 2-dimensional tensor represents a matrix, and a 3-dimensional tensor may represent the three channels of RGB images.
MindSpore tensors support different data types, including int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64 and bool, which correspond to the data types of NumPy.
In the computation process of MindSpore, the int data type in Python is converted into the defined int64 type, and the float data type is converted into the defined float32 type.
During tensor construction, the tensor, float, int, Boolean, tuple, list, and NumPy.array types can be input. The tuple and list can store only data of the float, int, and Boolean types.
The data type can be specified during tensor initialization. If it is not specified, initial int, float, and bool values generate 0-dimensional tensors with the mindspore.int32, mindspore.float32, and mindspore.bool_ data types, respectively. For a tuple or list, the generated 1-dimensional tensor takes the data type that corresponds to the data stored in the tuple or list. If multiple types of data are contained, the MindSpore data type corresponding to the type with the highest priority is selected (Boolean < int < float). If the initial value is a Tensor, the generated tensor has the same data type as that Tensor. If the initial value is a NumPy.array, the generated tensor data type corresponds to that of the NumPy.array.
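The priority rule for mixed tuples and lists can be illustrated with a small sketch in plain Python. The helper name dominant_type is hypothetical and is not a MindSpore API. It only mirrors the Boolean < int < float ordering described above and does not model how MindSpore maps the result to a concrete dtype.

```python
# Hypothetical helper (not part of MindSpore): pick the Python type that
# wins under the priority Boolean < int < float described in the text.
PRIORITY = {bool: 0, int: 1, float: 2}


def dominant_type(values):
    """Return the highest-priority Python type found in a tuple or list."""
    return max((type(v) for v in values), key=lambda t: PRIORITY[t])


print(dominant_type((True, 2)))       # <class 'int'>
print(dominant_type([True, 2, 3.0]))  # <class 'float'>
```

A single float in the container is therefore enough to make the whole tensor a floating-point tensor.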
Code:
# Import MindSpore.
import mindspore
# The cell outputs multiple lines at the same time.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import numpy as np
from mindspore import Tensor
from mindspore import dtype
# Use an array to create a tensor.
x = Tensor(np.array([[1, 2], [3, 4]]), dtype.int32)
x
Output:
Tensor(shape=[2, 2], dtype=Int32, value=
[[1, 2],
[3, 4]])
Code:
# Use numbers to create tensors.
y = Tensor(1.0, dtype.int32)
z = Tensor(2, dtype.int32)
print(y)
print(z)
Output:
Tensor(shape=[], dtype=Int32, value= 1)
Tensor(shape=[], dtype=Int32, value= 2)
Code:
# Use Boolean to create a tensor.
m = Tensor(True, dtype.bool_)
m
Output:
Tensor(shape=[], dtype=Bool, value= True)
Code:
# Use a tuple to create a tensor.
n = Tensor((1, 2, 3), dtype.int16)
n
Output:
Tensor(shape=[3], dtype=Int16, value= [1, 2, 3])
Code:
# Use a list to create a tensor.
p = Tensor([4.0, 5.0, 6.0], dtype.float64)
p
Output:
Tensor(shape=[3], dtype=Float64, value= [4.00000000e+000, 5.00000000e+000, 6.00000000e+000])
Code:
from mindspore import ops
oneslike = ops.OnesLike()
x = Tensor(np.array([[0, 1], [2, 1]]).astype(np.int32))
output = oneslike(x)
output
Output:
Tensor(shape=[2, 2], dtype=Int32, value=
[[1, 1],
[1, 1]])
Code:
from mindspore.ops import operations as ops
shape = (2, 2)
ones = ops.Ones()
output = ones(shape,dtype.float32)
print(output)
zeros = ops.Zeros()
output = zeros(shape, dtype.float32)
print(output)
Output:
[[1. 1.]
[1. 1.]]
[[0. 0.]
[0. 0.]]
Tensor attributes include shape and data type (dtype).
Code:
x = Tensor(np.array([[1, 2], [3, 4]]), dtype.int32)
x.shape # Shape
x.dtype # Data type
x.ndim # Dimension
x.size # Size
Output:
(2, 2)
mindspore.int32
2
4
asnumpy(): converts a tensor to a NumPy array.
Code:
y = Tensor(np.array([[True, True], [False, False]]), dtype.bool_)
# Convert the tensor to a NumPy array.
y_array = y.asnumpy()
y
y_array
Output:
Tensor(shape=[2, 2], dtype=Bool, value=
[[ True, True],
[False, False]])
array([[ True, True],
[False, False]])
There are many operations between tensors, including arithmetic, linear algebra, matrix processing (transposing, indexing, and slicing), and sampling. The following describes several operations. The usage of tensor computation is similar to that of NumPy.
Code:
tensor = Tensor(np.array([[0, 1], [2, 3]]).astype(np.float32))
print("First row: {}".format(tensor[0]))
print("First column: {}".format(tensor[:, 0]))
print("Last column: {}".format(tensor[..., -1]))
Output:
First row: [0. 1.]
First column: [0. 2.]
Last column: [1. 3.]
Code:
data1 = Tensor(np.array([[0, 1], [2, 3]]).astype(np.float32))
data2 = Tensor(np.array([[4, 5], [6, 7]]).astype(np.float32))
op = ops.Stack()
output = op([data1, data2])
print(output)
Output:
[[[0. 1.]
[2. 3.]]
[[4. 5.]
[6. 7.]]]
Code:
zeros = ops.Zeros()
output = zeros((2,2), dtype.float32)
print("output: {}".format(type(output)))
n_output = output.asnumpy()
print("n_output: {}".format(type(n_output)))
Output:
output: <class 'mindspore.common.tensor.Tensor'>
n_output: <class 'numpy.ndarray'>
mindspore.dataset provides APIs to load and process datasets, such as MNIST, CIFAR-10, CIFAR-100, VOC, ImageNet, and CelebA.
You are advised to download the MNIST dataset from http://yann.lecun.com/exdb/mnist/ and save the training and test files to the MNIST folder.
Code:
import os
import mindspore.dataset as ds
import matplotlib.pyplot as plt
dataset_dir = "./MNIST/train" # Path of the dataset
# Read three images from the MNIST dataset.
mnist_dataset = ds.MnistDataset(dataset_dir=dataset_dir, num_samples=3)
# View the images and set the image sizes.
plt.figure(figsize=(8,8))
i = 1
# Plot three subplots. squeeze() drops the single channel axis so that imshow receives a 2-D image.
for dic in mnist_dataset.create_dict_iterator():
    plt.subplot(3,3,i)
    plt.imshow(dic['image'].asnumpy().squeeze())
    plt.axis('off')
    i +=1
plt.show()
Output:
For datasets that cannot be directly loaded by MindSpore, you can build a custom dataset class and use the GeneratorDataset API to customize data loading.
Code:
import numpy as np
np.random.seed(58)
class DatasetGenerator:
    # When a dataset object is instantiated, the __init__ function is called. You can perform operations such as data initialization.
    def __init__(self):
        self.data = np.random.sample((5, 2))
        self.label = np.random.sample((5, 1))
    # Define the __getitem__ function of the dataset class to support random access and obtain and return data in the dataset based on the specified index value.
    def __getitem__(self, index):
        return self.data[index], self.label[index]
    # Define the __len__ function of the dataset class and return the number of samples in the dataset.
    def __len__(self):
        return len(self.data)
# After the dataset class is defined, the GeneratorDataset API can be used to load and access dataset samples in the user-defined mode.
dataset_generator = DatasetGenerator()
dataset = ds.GeneratorDataset(dataset_generator, ["data", "label"], shuffle=False)
# Use the create_dict_iterator method to obtain data.
for data in dataset.create_dict_iterator():
    print('{}'.format(data["data"]), '{}'.format(data["label"]))
Output:
[0.36510558 0.45120592] [0.78888122]
[0.49606035 0.07562207] [0.38068183]
[0.57176158 0.28963401] [0.16271622]
[0.30880446 0.37487617] [0.54738768]
[0.81585667 0.96883469] [0.77994068]
The dataset APIs provided by MindSpore support data processing methods, such as shuffle and batch. You only need to call the corresponding function API to quickly process data.
In the following example, the datasets are shuffled, and then two samples form a batch.
Code:
ds.config.set_seed(58)
# Shuffle the data sequence. buffer_size indicates the size of the shuffled buffer in the dataset.
dataset = dataset.shuffle(buffer_size=10)
# Divide the dataset into batches. batch_size indicates the number of data records contained in each batch. Set this parameter to 2.
dataset = dataset.batch(batch_size=2)
for data in dataset.create_dict_iterator():
    print("data: {}".format(data["data"]))
    print("label: {}".format(data["label"]))
Output:
data: [[0.36510558 0.45120592]
[0.57176158 0.28963401]]
label: [[0.78888122]
[0.16271622]]
data: [[0.30880446 0.37487617]
[0.49606035 0.07562207]]
label: [[0.54738768]
[0.38068183]]
data: [[0.81585667 0.96883469]]
label: [[0.77994068]]
Code:
import matplotlib.pyplot as plt
from mindspore.dataset.vision import Inter
import mindspore.dataset.vision.c_transforms as c_vision
DATA_DIR = './MNIST/train'
# Obtain six samples.
mnist_dataset = ds.MnistDataset(DATA_DIR, num_samples=6, shuffle=False)
# View the original image data.
mnist_it = mnist_dataset.create_dict_iterator()
data = next(mnist_it)
plt.imshow(data['image'].asnumpy().squeeze(), cmap=plt.cm.gray)
plt.title(data['label'].asnumpy(), fontsize=20)
plt.show()
Output:
Code:
resize_op = c_vision.Resize(size=(40,40), interpolation=Inter.LINEAR)
crop_op = c_vision.RandomCrop(28)
transforms_list = [resize_op, crop_op]
mnist_dataset = mnist_dataset.map(operations=transforms_list, input_columns=["image"])
mnist_dataset = mnist_dataset.create_dict_iterator()
data = next(mnist_dataset)
plt.imshow(data['image'].asnumpy().squeeze(), cmap=plt.cm.gray)
plt.title(data['label'].asnumpy(), fontsize=20)
plt.show()
Output:
Effect after data augmentation
MindSpore encapsulates APIs for building network layers in the nn module. Different types of neural network layers are built by calling these APIs.
Build a fully-connected layer with mindspore.nn.Dense. Its main parameters are as follows:
in_channels: input channel
out_channels: output channel
weight_init: weight initialization. Default: 'normal'.
Code:
import mindspore as ms
import mindspore.nn as nn
from mindspore import Tensor
import numpy as np
# Construct the input tensor.
input_a = Tensor(np.array([[1, 1, 1], [2, 2, 2]]), ms.float32)
print(input_a)
# Construct a fully-connected network. Set both in_channels and out_channels to 3.
net = nn.Dense(in_channels=3, out_channels=3, weight_init=1)
output = net(input_a)
print(output)
Output:
[[1. 1. 1.]
[2. 2. 2.]]
[[3. 3. 3.]
[6. 6. 6.]]
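The result can be checked by hand. The following is a minimal sketch in plain Python, assuming that the layer computes output = input x weight^T + bias, that weight_init=1 sets every weight to 1, and that the bias keeps its default value of zeros. The variable names are illustrative only.

```python
# Plain-Python check of the Dense output above (not MindSpore code).
# Assumption: output = input @ weight.T + bias, weights all 1, bias all 0.
inputs = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
weight = [[1.0] * 3 for _ in range(3)]  # shape (out_channels, in_channels)
bias = [0.0] * 3

dense_out = [
    [sum(x * w for x, w in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
    for row in inputs
]
print(dense_out)  # [[3.0, 3.0, 3.0], [6.0, 6.0, 6.0]]
```

Each output element is the sum of the three inputs in that row, which gives 3 for the first row and 6 for the second.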
Code:
conv2d = nn.Conv2d(1, 6, 5, has_bias=False, weight_init='normal', pad_mode='valid')
input_x = Tensor(np.ones([1, 1, 32, 32]), ms.float32)
print(conv2d(input_x).shape)
Output:
(1, 6, 28, 28)
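The spatial size of the output follows from the kernel size. The sketch below assumes pad_mode='valid' (no padding) and the default stride of 1. The function name conv_output_size is hypothetical and only illustrates the arithmetic.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length of one spatial dimension for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1


# A 32 x 32 input with a 5 x 5 kernel gives a 28 x 28 feature map.
print(conv_output_size(32, 5))  # 28
```

The channel dimension of the output (6 here) is simply the out_channels argument of the layer.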
Build a ReLU layer.
Code:
relu = nn.ReLU()
input_x = Tensor(np.array([-1, 2, -3, 2, -1]), ms.float16)
output = relu(input_x)
print(output)
Output:
[0. 2. 0. 2. 0.]
Code:
max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
input_x = Tensor(np.ones([1, 6, 28, 28]), ms.float32)
print(max_pool2d(input_x).shape)
Output:
(1, 6, 14, 14)
Code:
flatten = nn.Flatten()
input_x = Tensor(np.ones([1, 16, 5, 5]), ms.float32)
output = flatten(input_x)
print(output.shape)
Output:
(1, 400)
The Cell class of MindSpore is the base class for building all networks and the basic unit of a network. To define a neural network, inherit the Cell class and override the __init__ and construct methods.
Code:
class LeNet5(nn.Cell):
    """
    LeNet network structure
    """
    def __init__(self, num_class=10, num_channel=1):
        super(LeNet5, self).__init__()
        # Define the required operation.
        self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
        self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
        self.fc1 = nn.Dense(16 * 4 * 4, 120)
        self.fc2 = nn.Dense(120, 84)
        self.fc3 = nn.Dense(84, num_class)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()
    def construct(self, x):
        # Use the defined operation to build a forward network.
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x
# Instantiate the model and use the parameters_and_names method to view the model parameters.
modelle = LeNet5()
for m in modelle.parameters_and_names():
    print(m)
Output:
('conv1.weight', Parameter (name=conv1.weight, shape=(6, 1, 5, 5), dtype=Float32, requires_grad=True))
('conv2.weight', Parameter (name=conv2.weight, shape=(16, 6, 5, 5), dtype=Float32, requires_grad=True))
('fc1.weight', Parameter (name=fc1.weight, shape=(120, 256), dtype=Float32, requires_grad=True))
('fc1.bias', Parameter (name=fc1.bias, shape=(120,), dtype=Float32, requires_grad=True))
('fc2.weight', Parameter (name=fc2.weight, shape=(84, 120), dtype=Float32, requires_grad=True))
('fc2.bias', Parameter (name=fc2.bias, shape=(84,), dtype=Float32, requires_grad=True))
('fc3.weight', Parameter (name=fc3.weight, shape=(10, 84), dtype=Float32, requires_grad=True))
('fc3.bias', Parameter (name=fc3.bias, shape=(10,), dtype=Float32, requires_grad=True))
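The input width of fc1 depends on the image size fed to the network. The following sketch traces the feature-map size through the convolution and pooling layers for a 28 x 28 input, assuming no padding in any layer. It reproduces the 16 * 4 * 4 used for nn.Dense in the code above. A 32 x 32 input would instead end with 5 x 5 feature maps, that is, 16 * 5 * 5 = 400 features.

```python
# Trace the feature-map size through LeNet5 for a 28 x 28 input.
# Each entry is (layer name, kernel size, stride); no padding is assumed.
size = 28
layers = [("conv1", 5, 1), ("max_pool2d", 2, 2), ("conv2", 5, 1), ("max_pool2d", 2, 2)]
for name, kernel, stride in layers:
    size = (size - kernel) // stride + 1
    print(name, "->", size)

# conv2 has 16 output channels, so Flatten yields 16 * size * size features.
flattened_features = 16 * size * size
print(flattened_features)  # 256
```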
A loss function measures the difference between the predicted and actual values of a model. Here, the absolute error loss function L1Loss is used. mindspore.nn.loss also provides many other loss functions, such as SoftmaxCrossEntropyWithLogits, MSELoss, and SmoothL1Loss.
The output value and target value are provided to compute the loss value. The method is as follows:
Code:
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
import mindspore.dataset as ds
import mindspore as ms
loss = nn.L1Loss()
output_data = Tensor(np.array([[1, 2, 3], [2, 3, 4]]).astype(np.float32))
target_data = Tensor(np.array([[0, 2, 5], [3, 1, 1]]).astype(np.float32))
print(loss(output_data, target_data))
Output:
1.5
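The value 1.5 can be reproduced by hand. The sketch below assumes the default behavior of averaging the absolute errors over all elements. It uses plain Python lists rather than tensors.

```python
# Mean absolute error computed by hand for the same values as above.
output_data = [[1, 2, 3], [2, 3, 4]]
target_data = [[0, 2, 5], [3, 1, 1]]

abs_errors = [
    abs(o - t)
    for o_row, t_row in zip(output_data, target_data)
    for o, t in zip(o_row, t_row)
]
l1_loss = sum(abs_errors) / len(abs_errors)
print(abs_errors)  # [1, 0, 2, 1, 2, 3]
print(l1_loss)     # 1.5
```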
Common deep learning optimization algorithms include SGD, Adam, FTRL, LazyAdam, Momentum, RMSProp, LARS, ProximalAdagrad, and LAMB.
mindspore.nn.Momentum
Code:
optim = nn.Momentum(params=modelle.trainable_params(), learning_rate=0.1, momentum=0.9, weight_decay=0.0)
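To illustrate what the learning_rate and momentum arguments control, the following sketch applies the standard momentum update to a single scalar parameter. It assumes the plain (non-Nesterov) formulation and ignores weight decay. The function momentum_step is hypothetical and is meant to show the idea, not to reproduce MindSpore's implementation.

```python
def momentum_step(param, grad, velocity, learning_rate=0.1, momentum=0.9):
    """One momentum update: accumulate the gradient, then move the parameter."""
    velocity = momentum * velocity + grad
    param = param - learning_rate * velocity
    return param, velocity


# Two steps with a constant gradient of 0.5, starting from param = 1.0.
param, velocity = 1.0, 0.0
for _ in range(2):
    param, velocity = momentum_step(param, grad=0.5, velocity=velocity)
    print(param, velocity)
```

Because the velocity accumulates past gradients, the second step moves the parameter further than the first even though the gradient is unchanged.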
mindspore.Model(network, loss_fn, optimizer, metrics)
Code:
from mindspore import Model
# Define a neural network.
net = LeNet5()
# Define the loss function.
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
# Define the optimizer.
optim = nn.Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9)
# Build a model.
model = Model(network = net, loss_fn=loss, optimizer=optim, metrics={'accuracy'})
Train the model.
Code:
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.vision.c_transforms as CV
from mindspore.train.callback import LossMonitor
DATA_DIR = './MNIST/train'
mnist_dataset = ds.MnistDataset(DATA_DIR)
resize_op = CV.Resize((28,28))
rescale_op = CV.Rescale(1/255,0)
hwc2chw_op = CV.HWC2CHW()
mnist_dataset = mnist_dataset .map(input_columns="image", operations=[rescale_op,resize_op, hwc2chw_op])
mnist_dataset = mnist_dataset .map(input_columns="label", operations=C.TypeCast(ms.int32))
mnist_dataset = mnist_dataset.batch(32)
loss_cb = LossMonitor(per_print_times=1000)
# train_dataset is the training set, and epoch is the number of passes over the training set.
model.train(epoch=1, train_dataset=mnist_dataset,callbacks=[loss_cb])
Code:
# valid_dataset is the validation set.
DATA_DIR = './MNIST/test'
dataset = ds.MnistDataset(DATA_DIR)
resize_op = CV.Resize((28,28))
rescale_op = CV.Rescale(1/255,0)
hwc2chw_op = CV.HWC2CHW()
dataset = dataset .map(input_columns="image", operations=[rescale_op,resize_op, hwc2chw_op])
dataset = dataset .map(input_columns="label", operations=C.TypeCast(ms.int32))
dataset = dataset.batch(32)
model.eval(valid_dataset=dataset)
Backpropagation is the most commonly used algorithm for training neural networks. In this algorithm, parameters (model weights) are adjusted based on the gradient of the loss function with respect to each parameter.
The first-order derivative method of MindSpore is mindspore.ops.GradOperation (get_all=False, get_by_list=False, sens_param=False). When get_all is set to False, the first input derivative is computed. When get_all is set to True, all input derivatives are computed. When get_by_list is set to False, weight derivatives are not computed. When get_by_list is set to True, weight derivatives are computed. sens_param scales the output value of the network to change the final gradient.
The following uses the MatMul operator derivative for in-depth analysis.
To compute the input derivative, you need to define a network requiring a derivative. The following uses a network f(x,y)=z∗x∗y formed by the MatMul operator as an example.
Code:
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor
from mindspore import ParameterTuple, Parameter
from mindspore import dtype as mstype
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()
        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')
    def construct(self, x, y):
        x = x * self.z
        out = self.matmul(x, y)
        return out
class GradNetWrtX(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation()
    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y)
x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)
y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)
output = GradNetWrtX(Net())(x, y)
print(output)
Output:
[[4.5099998 2.7 3.6000001]
[4.5099998 2.7 3.6000001]]
To compute weight derivatives, you need to set get_by_list in ops.GradOperation to True. If the derivative of a certain weight is not required, set requires_grad to False for the corresponding Parameter when defining the network requiring derivatives.
Code:
class GradNetWrtX(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        self.params = ParameterTuple(net.trainable_params())
        self.grad_op = ops.GradOperation(get_by_list=True)
    def construct(self, x, y):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x, y)
output = GradNetWrtX(Net())(x, y)
print(output)
Output:
(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),)
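Both gradient results can be verified by hand. The sketch below assumes that, with sens_param=False, the gradient is taken of the sum of all output elements (an output sensitivity of all ones). Under that assumption, the derivative with respect to x[i][k] is z times the sum of row k of y, and the derivative with respect to z is the sum of x[i][k] times that row sum. The code uses plain Python lists and is not MindSpore code.

```python
# Hand computation of the gradients of sum((x * z) @ y).
x = [[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]]
y = [[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]]
z = 1.0

# d(sum of outputs) / d x[i][k] = z * sum_j y[k][j], identical for every row i.
row_sums = [sum(row) for row in y]
grad_x = [[z * s for s in row_sums] for _ in x]

# d(sum of outputs) / d z = sum over i, k of x[i][k] * sum_j y[k][j].
grad_z = sum(x_ik * s for row in x for x_ik, s in zip(row, row_sums))

print([round(v, 4) for v in grad_x[0]])  # [4.51, 2.7, 3.6]
print(round(grad_z, 3))                  # 21.536
```

These values agree with the two outputs above up to float32 rounding.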
This exercise implements MNIST handwritten digit recognition, which is a typical case in the deep learning field. The whole process is as follows:
Before you start, check whether MindSpore has been correctly installed. You are advised to install MindSpore 1.1.1 or later on your computer by referring to the MindSpore official website https://www.mindspore.cn/install/.
In addition, you should have basic Python coding skills and basic knowledge of probability and matrices.
The MNIST dataset used in this example consists of 28 x 28 pixel grayscale images in 10 classes. It has a training set of 60,000 examples and a test set of 10,000 examples.
Download the MNIST dataset at http://yann.lecun.com/exdb/mnist/. Four dataset download links are provided. The two files whose names start with train- are the training data files, and the two files whose names start with t10k- are the test data files.
Download and decompress the files, and store them in the workspace directories ./MNIST/train and ./MNIST/test.
The directory structure is as follows:
└─MNIST
    ├─ test
    │      t10k-images.idx3-ubyte
    │      t10k-labels.idx1-ubyte
    │
    └─ train
           train-images.idx3-ubyte
           train-labels.idx1-ubyte
Currently, the os library is required. Other required libraries will not be described here. For details about the MindSpore modules, see the MindSpore API page. You can use context.set_context to configure the information required for running, such as the running mode, backend information, and hardware information.
Code:
# Import related dependent libraries.
import os
from matplotlib import pyplot as plt
import numpy as np
import mindspore as ms
import mindspore.context as context
import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.vision.c_transforms as CV
from mindspore.nn.metrics import Accuracy
from mindspore import nn
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
context.set_context(mode=context.GRAPH_MODE, device_target='CPU')
The graph mode is used in this exercise. You can configure hardware information as required. For example, if the code runs on the Ascend AI processor, set device_target to Ascend. This rule also applies to the code running on the CPU and GPU. For details about parameters, see the API description of context.set_context.
Use the data reading function of MindSpore to read the MNIST dataset and view the data volume and sample information of the training set and test set.
Code:
DATA_DIR_TRAIN = "MNIST/train" # Training set information
DATA_DIR_TEST = "MNIST/test" # Test set information
# Read data.
ds_train = ds.MnistDataset(DATA_DIR_TRAIN)
ds_test = ds.MnistDataset(DATA_DIR_TEST )
# Display the dataset features.
print('Data volume of the training dataset:',ds_train.get_dataset_size())
print('Data volume of the test dataset:',ds_test.get_dataset_size())
image=next(ds_train.create_dict_iterator())
print('Image length/width/channels:',image['image'].shape)
print('Image label style:',image['label']) # Total 10 label classes which are represented by numbers from 0 to 9.
Datasets are crucial for training. A good dataset can effectively improve training accuracy and efficiency. Generally, before loading a dataset, you need to perform some operations on the dataset.
Code:
def create_dataset(training=True, batch_size=128, resize=(28, 28),
                   rescale=1/255, shift=0, buffer_size=64):
    ds = ms.dataset.MnistDataset(DATA_DIR_TRAIN if training else DATA_DIR_TEST)
    # Define the resizing, normalization, and channel conversion of the map operation.
    resize_op = CV.Resize(resize)
    rescale_op = CV.Rescale(rescale,shift)
    hwc2chw_op = CV.HWC2CHW()
    # Perform the map operation on the dataset.
    ds = ds.map(input_columns="image", operations=[rescale_op,resize_op, hwc2chw_op])
    ds = ds.map(input_columns="label", operations=C.TypeCast(ms.int32))
    # Set the shuffle parameter and batch size.
    ds = ds.shuffle(buffer_size=buffer_size)
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds
In the preceding code, batch_size indicates the number of data records in each batch (128 by default; the training code in this exercise passes 32). The map operations rescale the pixel values, resize the images, and convert the image layout from HWC to CHW, and then cast the labels to int32. Finally, the dataset is shuffled and batched with drop_remainder set to True, so data that cannot form a complete batch is discarded.
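The effect of drop_remainder on the number of batches can be worked out with integer division. The sketch below uses the MNIST set sizes stated earlier (60,000 training and 10,000 test examples) and a batch size of 32. The helper num_batches is hypothetical and only illustrates the arithmetic.

```python
def num_batches(num_samples, batch_size, drop_remainder):
    """Number of batches produced, with or without the final partial batch."""
    full, rest = divmod(num_samples, batch_size)
    return full if drop_remainder or rest == 0 else full + 1


print(num_batches(60000, 32, drop_remainder=True))   # 1875
print(num_batches(10000, 32, drop_remainder=True))   # 312
print(num_batches(10000, 32, drop_remainder=False))  # 313
```

With drop_remainder set to True, the test set therefore contributes 312 * 32 = 9,984 samples, and the last 16 samples are discarded.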
MindSpore supports multiple data processing and augmentation operations, which are usually used together. For details, see Data Processing and Data Augmentation.
Read the first 10 samples and visualize them to check that the images and labels were loaded correctly.
Code:
# Display the first 10 images and the labels, and check whether the images are correctly labeled.
# A separate name is used so that the mindspore.dataset alias ds is not overwritten.
ds_vis = create_dataset(training=False)
data = next(ds_vis.create_dict_iterator())
images = data['image'].asnumpy()
labels = data['label'].asnumpy()
plt.figure(figsize=(15,5))
for i in range(1,11):
    # Subplot positions start at 1, while sample indices start at 0.
    plt.subplot(2, 5, i)
    plt.imshow(np.squeeze(images[i-1]))
    plt.title('Number: %s' % labels[i-1])
    plt.xticks([])
plt.show()
Output:
Sample visualization
We define a simple fully-connected network to implement image recognition. The network has only three layers:
To use MindSpore for neural network definition, inherit mindspore.nn.cell.Cell. Cell is the base class of all neural networks (such as Conv2d).
Define each layer of the neural network in the __init__ method in advance, and then define the construct method to complete the forward construction of the neural network. The network layers are defined as follows:
Code:
# Create a deep neural network (DNN) model. The model consists of three fully-connected layers. The output layer returns logits for 10 classes (numbers from 0 to 9); softmax is applied inside the loss function.
class ForwardNN(nn.Cell):
    def __init__(self):
        super(ForwardNN, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Dense(784, 512, activation='relu')
        self.fc2 = nn.Dense(512, 128, activation='relu')
        self.fc3 = nn.Dense(128, 10, activation=None)
    def construct(self, input_x):
        output = self.flatten(input_x)
        output = self.fc1(output)
        output = self.fc2(output)
        output = self.fc3(output)
        return output
A loss function is also called an objective function and is used to measure the difference between a predicted value and an actual value. Deep learning reduces the loss value by continuous iteration. Defining a good loss function can effectively improve the model performance.
An optimizer is used to minimize the loss function, improving the model during training.
After the loss function is defined, the weight-related gradient of the loss function can be obtained. The gradient is used to indicate the weight optimization direction for the optimizer, improving model performance. Loss functions supported by MindSpore include SoftmaxCrossEntropyWithLogits, L1Loss, and MSELoss.
SoftmaxCrossEntropyWithLogits is used in this example.
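As an illustration of what this loss computes, the sketch below evaluates softmax cross-entropy for integer class labels in plain Python, averaging over the batch. It assumes the behavior that corresponds to sparse=True and reduction='mean'. The function name sparse_softmax_cross_entropy and the sample logits are hypothetical and are not taken from the exercise.

```python
import math


def sparse_softmax_cross_entropy(logits_batch, labels):
    """Mean of -log(softmax(logits)[label]) over a batch of samples."""
    losses = []
    for logits, label in zip(logits_batch, labels):
        # Subtract the maximum logit for numerical stability.
        shifted = [v - max(logits) for v in logits]
        log_sum_exp = math.log(sum(math.exp(v) for v in shifted))
        losses.append(log_sum_exp - shifted[label])
    return sum(losses) / len(losses)


# Two samples with three classes each; labels are class indices.
loss_value = sparse_softmax_cross_entropy([[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]], [0, 2])
print(loss_value)
```

When all logits are equal, the loss for that sample is log(number of classes), which is the value an untrained classifier is expected to start near.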
MindSpore provides the callback mechanism to execute custom logic during training. The following uses ModelCheckpoint provided by the framework as an example. ModelCheckpoint can save the network model and parameters for subsequent fine-tuning.
Code:
# Create a network, a loss function, validation metric, and optimizer, and set related hyperparameters.
lr = 0.001
num_epoch = 10
momentum = 0.9
net = ForwardNN()
loss = nn.loss.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
metrics={"Accuracy": Accuracy()}
opt = nn.Adam(net.trainable_params(), lr)
Start training.
The training process refers to a process in which training dataset is transferred to a network for training and optimizing network parameters. In the MindSpore framework, the .train method is used to complete this process.
Code:
# Build a model.
model = Model(net, loss, opt, metrics)
config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)
ckpoint_cb = ModelCheckpoint(prefix="checkpoint_net",directory = "./ckpt" ,config=config_ck)
# Generate a dataset.
ds_eval = create_dataset(False, batch_size=32)
ds_train = create_dataset(batch_size=32)
# Train the model.
loss_cb = LossMonitor(per_print_times=1875)
time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())
print("============== Starting Training ==============")
model.train(num_epoch, ds_train,callbacks=[ckpoint_cb,loss_cb,time_cb ],dataset_sink_mode=False)
Loss values are displayed during training, as shown below. Although the loss values may fluctuate, they generally decrease over time while the accuracy increases. The loss values differ from run to run because training involves randomness. The following is an example of loss values output during training:
============== Starting Training ==============
epoch: 1 step: 1875, loss is 0.06333521
epoch time: 18669.680 ms, per step time: 9.957 ms
epoch: 2 step: 1875, loss is 0.07061358
epoch time: 21463.662 ms, per step time: 11.447 ms
epoch: 3 step: 1875, loss is 0.043515638
epoch time: 25836.919 ms, per step time: 13.780 ms
epoch: 4 step: 1875, loss is 0.03468642
epoch time: 25553.150 ms, per step time: 13.628 ms
epoch: 5 step: 1875, loss is 0.03934026
epoch time: 27364.246 ms, per step time: 14.594 ms
epoch: 6 step: 1875, loss is 0.0023852987
epoch time: 31432.281 ms, per step time: 16.764 ms
epoch: 7 step: 1875, loss is 0.010915326
epoch time: 33697.183 ms, per step time: 17.972 ms
epoch: 8 step: 1875, loss is 0.011417691
epoch time: 29594.438 ms, per step time: 15.784 ms
epoch: 9 step: 1875, loss is 0.00044568744
epoch time: 28676.948 ms, per step time: 15.294 ms
epoch: 10 step: 1875, loss is 0.071476705
epoch time: 34999.863 ms, per step time: 18.667 ms
In this step, the original test set is used to validate the model.
Code:
# Use the test set to validate the model and print the overall accuracy.
metrics=model.eval(ds_eval)
print(metrics)
Output:
{'Accuracy': 0.9740584935897436}
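The reported value can be read as a fraction of correctly classified samples. The sketch below assumes that Accuracy is computed as correct / total and that the evaluation set was batched with a batch size of 32 and drop_remainder=True, as in the code above, so only complete batches are evaluated.

```python
# Relate the reported accuracy to a count of correctly classified samples.
batch_size = 32
num_eval_samples = (10000 // batch_size) * batch_size  # complete batches only
reported_accuracy = 0.9740584935897436

correct = round(reported_accuracy * num_eval_samples)
print(num_eval_samples, correct)   # 9984 9725
print(correct / num_eval_samples)
```

Under these assumptions, the result corresponds to 9,725 correct predictions out of 9,984 evaluated images.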