CNN implementation with numpy
Layers for CNN
- To implement a CNN, two new layers must be defined: a convolution layer and a pooling layer. Together they form the feature-extraction part of the network, after which the CNN can be trained and used for inference.
Convolution Layer
- A basic CNN operates on 3D images (channels, height, width), but like a DNN it should support batch operations, so the actual input is a 4D array (batch size, channels, height, width). Each filter is also 3D (channels, height, width), and a layer has multiple filters, so the weights form a 4D array as well (filters, channels, height, width).
- However, it is not easy to write convolution code directly for 4D images and 4D filters.
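For reference, the direct version can be sketched with explicit loops. This is only an illustrative sketch, not part of the implementation in this post: the name `conv_naive` is hypothetical, and it assumes images shaped (N, C, H, W) and filters shaped (FN, C, FH, FW). The four nested loops are what make it slow in pure Python.

```python
import numpy as np


def conv_naive(x, w, b, stride=1, pad=0):
    """Direct, loop-based convolution (hypothetical reference helper).

    x : (N, C, H, W) batch of images
    w : (FN, C, FH, FW) filters
    b : (FN,) biases
    """
    N, C, H, W = x.shape
    FN, _, FH, FW = w.shape
    out_h = (H + 2 * pad - FH) // stride + 1
    out_w = (W + 2 * pad - FW) // stride + 1

    # Zero-pad only the two spatial axes.
    xp = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    out = np.zeros((N, FN, out_h, out_w))

    for n in range(N):                  # every image in the batch
        for f in range(FN):             # every filter
            for i in range(out_h):      # every output row
                for j in range(out_w):  # every output column
                    y0, x0 = i * stride, j * stride
                    window = xp[n, :, y0:y0 + FH, x0:x0 + FW]
                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]
    return out


x = np.ones((2, 3, 7, 7))
w = np.ones((4, 3, 5, 5))
b = np.zeros(4)
out = conv_naive(x, w, b)
print(out.shape)  # (2, 4, 3, 3)
```

With all-ones inputs and filters, every output value is 3 * 5 * 5 = 75, which makes the sketch easy to check by hand.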
im2col Transform
- Fortunately, there is a simple method for convolving a batch of images with multiple filters. It is usually called im2col in popular machine learning frameworks.
Image 1. Forward operations in Convolution layer for batch image
- With im2col and a matrix reshape, the convolution is replaced by a matrix product (np.dot). The result is then reshaped back into a 4D feature map (batch size, filters, height, width).
import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    """
    Transform 4 dimensional images to 2 dimensional array.

    Parameters
    ----------
    input_data : 4 dimensional input images
        (The number of images, The number of channels,
        Height, Width)
    filter_h : height of filter
    filter_w : width of filter
    stride : the interval of stride
    pad : the interval of padding

    Returns
    -------
    col : 2 dimensional array
    """
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1

    # Zero-pad only the height and width axes
    img = np.pad(input_data,
                 [(0, 0), (0, 0), (pad, pad), (pad, pad)],
                 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    # For each position (y, x) inside the filter, copy the pixels that
    # this position touches across all output locations
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = \
                img[:, :, y:y_max:stride, x:x_max:stride]

    # One row per output location, one column per filter element
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)
    return col
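A quick shape check helps to see what im2col produces. The snippet below is a small sketch that repeats the im2col logic from above, in compact form, so that it runs on its own. The input sizes are arbitrary values chosen for illustration.

```python
import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Same logic as the im2col above, repeated so this snippet is self-contained.
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    img = np.pad(input_data,
                 [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)


# One 3-channel 7x7 image, 5x5 filter: 3x3 output positions, 3*5*5 values each.
x1 = np.random.rand(1, 3, 7, 7)
col1 = im2col(x1, 5, 5, stride=1, pad=0)
print(col1.shape)  # (9, 75)

# A batch of 10 images simply yields 10 times as many rows.
x2 = np.random.rand(10, 3, 7, 7)
col2 = im2col(x2, 5, 5, stride=1, pad=0)
print(col2.shape)  # (90, 75)

# The first row is the top-left 5x5 window of the first image, flattened.
first_window = x1[0, :, 0:5, 0:5].reshape(-1)
print(np.allclose(col1[0], first_window))  # True
```

Each row holds exactly the values one filter application needs, so multiplying by a (75, FN) weight matrix applies all FN filters to all positions at once.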
- Furthermore, col2im is required for back propagation. The following is an implementation of col2im.
def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    """Inverse of im2col (values from overlapping windows are summed).

    Parameters
    ----------
    col : 2 dimensional array
    input_shape : the shape of original input images
    filter_h : height of filter
    filter_w : width of filter
    stride : the interval of stride
    pad : the interval of padding

    Returns
    -------
    img : images
    """
    N, C, H, W = input_shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1

    # Undo the reshape and transpose done at the end of im2col
    col = col.reshape(N, out_h, out_w, C,
                      filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)
    img = np.zeros((N, C, H + 2 * pad + stride - 1, W + 2 * pad + stride - 1))

    # Add every column entry back to the pixel it was copied from
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]

    # Cut off the padding
    return img[:, :, pad:H + pad, pad:W + pad]
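col2im undoes im2col exactly only when the windows do not overlap. When they overlap, the contributions to a pixel are summed, which is what back propagation needs: col2im acts as the transpose of im2col. The sketch below checks both properties numerically. It repeats compact copies of the two functions above so that it runs alone, and the test sizes are arbitrary.

```python
import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Compact copy of the im2col above.
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    img = np.pad(input_data,
                 [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)


def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    # Compact copy of the col2im above.
    N, C, H, W = input_shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    col = col.reshape(N, out_h, out_w, C,
                      filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)
    img = np.zeros((N, C, H + 2 * pad + stride - 1, W + 2 * pad + stride - 1))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]
    return img[:, :, pad:H + pad, pad:W + pad]


rng = np.random.default_rng(0)

# Transpose property: <im2col(x), c> equals <x, col2im(c)> for any c.
x = rng.standard_normal((2, 3, 6, 6))
col = im2col(x, 3, 3, stride=1, pad=1)
c = rng.standard_normal(col.shape)
lhs = np.sum(col * c)
rhs = np.sum(x * col2im(c, x.shape, 3, 3, stride=1, pad=1))
print(np.isclose(lhs, rhs))  # True

# Non-overlapping windows (stride == filter size): an exact round trip.
x2 = rng.standard_normal((1, 1, 4, 4))
back = col2im(im2col(x2, 2, 2, stride=2), x2.shape, 2, 2, stride=2)
print(np.allclose(back, x2))  # True
```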
Forward and Backward
- With im2col and col2im, the forward and backward passes of the convolution layer can be implemented as follows.
class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W            # filters, shape (FN, C, FH, FW)
        self.b = b            # biases, shape (FN,)
        self.stride = stride
        self.pad = pad

        # Values cached in forward() for use in backward()
        self.x = None
        self.col = None
        self.col_W = None

        # Gradients of weights and biases
        self.dW = None
        self.db = None

    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = 1 + (H + 2 * self.pad - FH) // self.stride
        out_w = 1 + (W + 2 * self.pad - FW) // self.stride

        col = im2col(x, FH, FW, self.stride, self.pad)
        # Flatten each filter into one column: (C * FH * FW, FN)
        col_W = self.W.reshape(FN, -1).T

        out = np.dot(col, col_W) + self.b
        # (N * out_h * out_w, FN) -> (N, FN, out_h, out_w)
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

        self.x = x
        self.col = col
        self.col_W = col_W
        return out

    def backward(self, dout):
        FN, C, FH, FW = self.W.shape
        # (N, FN, out_h, out_w) -> (N * out_h * out_w, FN)
        dout = dout.transpose(0, 2, 3, 1).reshape(-1, FN)

        self.db = np.sum(dout, axis=0)
        self.dW = np.dot(self.col.T, dout)
        self.dW = self.dW.transpose(1, 0).reshape(FN, C, FH, FW)

        dcol = np.dot(dout, self.col_W.T)
        dx = col2im(dcol, self.x.shape, FH, FW, self.stride, self.pad)
        return dx
- Also, as shown in Image 1, some reshapes are required.
- The last operation of the forward pass is a transpose. It rearranges the output from (N, Height, Width, Channel) to (N, Channel, Height, Width).
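The reshape and transpose at the end of the forward pass can be traced on a small array. The sketch below uses a stand-in for the result of the matrix product (the sizes are arbitrary) and shows where each value ends up, and that the transpose in `backward` undoes it.

```python
import numpy as np

N, FN, out_h, out_w = 2, 4, 3, 3

# Stand-in for np.dot(col, col_W): one row per output position,
# one column per filter.
flat = np.arange(N * out_h * out_w * FN).reshape(N * out_h * out_w, FN)

# Forward: split the rows into (N, out_h, out_w), then move filters to axis 1.
out = flat.reshape(N, out_h, out_w, FN).transpose(0, 3, 1, 2)
print(out.shape)  # (2, 4, 3, 3)

# The value for image n, filter f, position (i, j) comes from
# row n * out_h * out_w + i * out_w + j and column f of the flat matrix.
n, f, i, j = 1, 2, 0, 1
print(out[n, f, i, j] == flat[n * out_h * out_w + i * out_w + j, f])  # True

# Backward: the inverse transpose and reshape restore the flat layout.
back = out.transpose(0, 2, 3, 1).reshape(-1, FN)
print(np.array_equal(back, flat))  # True
```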
Pooling Layer
- A pooling layer selects one representative value, such as the maximum, from each target area.
- The pooling layer also uses im2col for forward propagation and col2im for back propagation. However, it does not require filters, because it selects the value with a simple criterion.
Image 2. Forward operations in Pooling layer with max pooling
class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

        # Values cached in forward() for use in backward()
        self.x = None
        self.arg_max = None

    def forward(self, x):
        N, C, H, W = x.shape
        # Include the padding so the sizes match what im2col produces
        out_h = 1 + (H + 2 * self.pad - self.pool_h) // self.stride
        out_w = 1 + (W + 2 * self.pad - self.pool_w) // self.stride

        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        # One row per pooling window and channel
        col = col.reshape(-1, self.pool_h * self.pool_w)

        # Remember which element was the maximum for back propagation
        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max
        return out

    def backward(self, dout):
        dout = dout.transpose(0, 2, 3, 1)

        pool_size = self.pool_h * self.pool_w
        # Route each gradient to the position of the maximum; all other
        # positions in the window get zero
        dmax = np.zeros((dout.size, pool_size))
        dmax[np.arange(self.arg_max.size), self.arg_max.flatten()] = \
            dout.flatten()
        dmax = dmax.reshape(dout.shape + (pool_size,))

        dcol = dmax.reshape(dmax.shape[0] * dmax.shape[1] * dmax.shape[2], -1)
        dx = col2im(dcol, self.x.shape,
                    self.pool_h, self.pool_w,
                    self.stride, self.pad)
        return dx
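To see what max pooling computes, a small worked example is useful. The sketch below does not use the im2col path of the Pooling class. It covers only the special case where the stride equals the window size and there is no padding, in which a plain reshape gives the same result. The name `max_pool_blocks` is hypothetical.

```python
import numpy as np


def max_pool_blocks(x, pool):
    """Max pooling for non-overlapping pool x pool windows (hypothetical helper).

    x : (N, C, H, W), with H and W divisible by pool
    """
    N, C, H, W = x.shape
    assert H % pool == 0 and W % pool == 0
    # Split height and width into (block index, offset inside block).
    blocks = x.reshape(N, C, H // pool, pool, W // pool, pool)
    # Take the maximum over the two "offset inside block" axes.
    return blocks.max(axis=(3, 5))


x = np.array([[1, 2, 1, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 4, 0, 1]], dtype=float).reshape(1, 1, 4, 4)

out = max_pool_blocks(x, 2)
print(out[0, 0])
# [[2. 3.]
#  [4. 2.]]
```

Each 2 x 2 block of the input collapses to its largest value, so a 4 x 4 image becomes 2 x 2 while the number of channels stays the same.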
Linking of CNN
- A simple CNN that links these layers is as follows.
from collections import OrderedDict

# Relu, Affine and SoftmaxWithLoss are not defined in this post and
# must be available in scope.


class SimpleConvNet:
    """
    conv - relu - pool - affine - relu - affine - softmax

    Parameters
    ----------
    input_dim: shape of one input image (channels, height, width)
    conv_param: parameters for convolution layers
    hidden_size: number of units in the hidden layer
    output_size: number of units in the output layer
    weight_init_std: standard deviation of the initial weights
    """
    def __init__(self, input_dim=(1, 28, 28),
                 conv_param={'filter_num': 30,
                             'filter_size': 5,
                             'pad': 0,
                             'stride': 1},
                 hidden_size=100, output_size=10, weight_init_std=0.01):
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]
        conv_output_size = \
            (input_size - filter_size + 2 * filter_pad) // filter_stride + 1
        # The 2 x 2 pooling with stride 2 halves height and width
        pool_output_size = \
            filter_num * (conv_output_size // 2) * (conv_output_size // 2)

        # Initialize weights and biases
        self.params = {}
        self.params['W1'] = weight_init_std * \
            np.random.randn(filter_num, input_dim[0],
                            filter_size, filter_size)
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * \
            np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * \
            np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)

        # Create layers
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'],
                                           self.params['b1'],
                                           conv_param['stride'],
                                           conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])
        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    def loss(self, x, t):
        """
        Parameters
        ----------
        x : Input data
        t : answer labels
        """
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    def accuracy(self, x, t, batch_size=100):
        # Convert one-hot labels to class indices
        if t.ndim != 1:
            t = np.argmax(t, axis=1)

        acc = 0.0
        for i in range(x.shape[0] // batch_size):
            tx = x[i * batch_size:(i + 1) * batch_size]
            tt = t[i * batch_size:(i + 1) * batch_size]
            y = self.predict(tx)
            y = np.argmax(y, axis=1)
            acc += np.sum(y == tt)
        return acc / x.shape[0]

    def gradient(self, x, t):
        """Gradient with back propagation

        Parameters
        ----------
        x : Input data
        t : Answer labels

        Returns
        -------
        Dictionary holding the gradients of each layer
        """
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.last_layer.backward(dout)
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Create return dictionary
        grads = {}
        grads['W1'], grads['b1'] = \
            self.layers['Conv1'].dW, self.layers['Conv1'].db
        grads['W2'], grads['b2'] = \
            self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W3'], grads['b3'] = \
            self.layers['Affine2'].dW, self.layers['Affine2'].db
        return grads
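The layer sizes in `__init__` follow from the default parameters. The sketch below works through that arithmetic. The helper name `conv_output_size` is hypothetical, and the pooling step assumes the 2 x 2 window with stride 2 used by `Pool1`.

```python
def conv_output_size(input_size, filter_size, pad=0, stride=1):
    # Output width (or height) of a convolution along one axis.
    return (input_size + 2 * pad - filter_size) // stride + 1


filter_num, hidden_size, output_size = 30, 100, 10

conv_out = conv_output_size(28, 5, pad=0, stride=1)  # (28 - 5) / 1 + 1
pool_out = conv_out // 2                             # 2x2 pooling, stride 2
flat_size = filter_num * pool_out * pool_out         # input size of Affine1

print(conv_out, pool_out, flat_size)  # 24 12 4320

# Resulting parameter shapes for the default network.
shapes = {
    'W1': (filter_num, 1, 5, 5),
    'W2': (flat_size, hidden_size),
    'W3': (hidden_size, output_size),
}
print(shapes['W2'])  # (4320, 100)
```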
Train MNIST Data Set
class Trainer:
    """
    Train network
    """
    def __init__(self, network, x_train, t_train, x_test, t_test,
                 epochs=20, mini_batch_size=100,
                 optimizer='SGD', optimizer_param={'lr': 0.01},
                 evaluate_sample_num_per_epoch=None, verbose=True):
        self.network = network
        self.verbose = verbose
        self.x_train = x_train
        self.t_train = t_train
        self.x_test = x_test
        self.t_test = t_test
        self.epochs = epochs
        self.batch_size = mini_batch_size
        self.evaluate_sample_num_per_epoch = evaluate_sample_num_per_epoch

        # Optimizer (Adam and SGD are not defined in this post and
        # must be available in scope)
        optimizer_class_dict = {'adam': Adam, 'sgd': SGD}
        self.optimizer = \
            optimizer_class_dict[optimizer.lower()](**optimizer_param)

        self.train_size = x_train.shape[0]
        self.iter_per_epoch = max(self.train_size / mini_batch_size, 1)
        self.max_iter = int(epochs * self.iter_per_epoch)
        self.current_iter = 0
        self.current_epoch = 0

        self.train_loss_list = []
        self.train_acc_list = []
        self.test_acc_list = []

    def train_step(self):
        # Sample a mini batch (np.random.choice samples with replacement)
        batch_mask = np.random.choice(self.train_size, self.batch_size)
        x_batch = self.x_train[batch_mask]
        t_batch = self.t_train[batch_mask]

        grads = self.network.gradient(x_batch, t_batch)
        self.optimizer.update(self.network.params, grads)

        loss = self.network.loss(x_batch, t_batch)
        self.train_loss_list.append(loss)
        if self.verbose:
            print("train loss:" + str(loss))

        # Evaluate accuracy once per epoch
        if self.current_iter % self.iter_per_epoch == 0:
            self.current_epoch += 1

            x_train_sample, t_train_sample = self.x_train, self.t_train
            x_test_sample, t_test_sample = self.x_test, self.t_test
            if self.evaluate_sample_num_per_epoch is not None:
                t = self.evaluate_sample_num_per_epoch
                x_train_sample, t_train_sample = \
                    self.x_train[:t], self.t_train[:t]
                x_test_sample, t_test_sample = \
                    self.x_test[:t], self.t_test[:t]

            train_acc = self.network.accuracy(x_train_sample, t_train_sample)
            test_acc = self.network.accuracy(x_test_sample, t_test_sample)
            self.train_acc_list.append(train_acc)
            self.test_acc_list.append(test_acc)

            print("=== epoch:" + str(self.current_epoch) +
                  ", train acc:" + str(train_acc) +
                  ", test acc:" + str(test_acc) + " ===")
        self.current_iter += 1

    def train(self):
        for i in range(self.max_iter):
            self.train_step()

        test_acc = self.network.accuracy(self.x_test, self.t_test)

        if self.verbose:
            print("=============== Final Test Accuracy ===============")
            print("test acc:" + str(test_acc))
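The Trainer only requires that an optimizer is constructed from `optimizer_param` and offers `update(params, grads)`. The optimizer classes themselves are not shown in this post, so the following is a minimal sketch of plain stochastic gradient descent written against that assumed interface, not the author's actual class.

```python
import numpy as np


class SGD:
    """Plain stochastic gradient descent (sketch of the assumed interface)."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            # In-place update: the arrays are modified, not replaced,
            # so any layer holding a reference to them sees the new values.
            params[key] -= self.lr * grads[key]


params = {'W': np.array([1.0, 2.0])}
layer_view = params['W']            # e.g. the reference a layer keeps
grads = {'W': np.array([0.5, -0.5])}

SGD(lr=0.1).update(params, grads)
print(params['W'])                  # [0.95 2.05]
print(layer_view is params['W'])    # True
```

The in-place update matters here because `SimpleConvNet` passes the arrays in `self.params` directly to its layers. Rebinding `params[key]` to a new array would leave the layers with stale weights.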
import datetime

import matplotlib.pyplot as plt
import numpy as np


def main():
    # Get MNIST data set
    # (Mnist is a data-loading helper that is not defined in this post)
    mnist = Mnist()
    # For convolution, keep each image 2 dimensional (28 x 28)
    # instead of flattening it.
    (x_train, t_train), (x_test, t_test) = mnist.load(flatten=False)

    # The number of epochs
    max_epochs = 20

    # Convolutional Neural Network
    network = SimpleConvNet(
        # MNIST data: 1 channel, 28 x 28
        input_dim=(1, 28, 28),
        # 30 filters in the convolution layer,
        # Filter: 5 x 5 weight matrix
        # no padding, 1 stride
        conv_param={'filter_num': 30,
                    'filter_size': 5,
                    'pad': 0,
                    'stride': 1},
        # Number of units in the hidden layer
        hidden_size=100,
        # Number of units in the output layer
        output_size=10,
        # Standard deviation of the initial weights
        weight_init_std=0.01)

    # Trainer
    trainer = Trainer(network, x_train, t_train, x_test, t_test,
                      epochs=max_epochs, mini_batch_size=100,
                      optimizer='adam', optimizer_param={'lr': 0.001},
                      evaluate_sample_num_per_epoch=1000, verbose=False)

    # Train
    start = datetime.datetime.now()
    trainer.train()
    end = datetime.datetime.now()

    # Print total execution time
    elapsed = end - start
    print("Elapsed time: {0}".format(elapsed))

    # Save params
    # (save_params is not shown in the SimpleConvNet listing above;
    # the network class must provide it)
    network.save_params("params.pkl")
    print("Saved Network Parameters")

    # Draw graph
    markers = {'train': 'o', 'test': 's'}
    x = np.arange(max_epochs)
    plt.plot(x, trainer.train_acc_list,
             marker=markers["train"], label="train", markevery=2)
    plt.plot(x, trainer.test_acc_list,
             marker=markers["test"], label="test", markevery=2)
    plt.xlabel("epochs")
    plt.ylabel("accuracy")
    plt.ylim(-0.1, 1.1)
    plt.grid()
    plt.title("Accuracies")
    plt.legend(loc="lower right")
    plt.show()


if __name__ == "__main__":
    main()
=== epoch:1, train acc:0.18, test acc:0.192 ===
=== epoch:2, train acc:0.959, test acc:0.958 ===
=== epoch:3, train acc:0.973, test acc:0.968 ===
=== epoch:4, train acc:0.983, test acc:0.981 ===
=== epoch:5, train acc:0.989, test acc:0.985 ===
=== epoch:6, train acc:0.989, test acc:0.982 ===
=== epoch:7, train acc:0.988, test acc:0.982 ===
=== epoch:8, train acc:0.991, test acc:0.985 ===
=== epoch:9, train acc:0.989, test acc:0.982 ===
=== epoch:10, train acc:0.994, test acc:0.987 ===
=== epoch:11, train acc:0.994, test acc:0.991 ===
=== epoch:12, train acc:0.994, test acc:0.983 ===
=== epoch:13, train acc:0.993, test acc:0.985 ===
=== epoch:14, train acc:0.997, test acc:0.986 ===
=== epoch:15, train acc:0.995, test acc:0.989 ===
=== epoch:16, train acc:0.998, test acc:0.986 ===
=== epoch:17, train acc:0.997, test acc:0.986 ===
=== epoch:18, train acc:0.998, test acc:0.989 ===
=== epoch:19, train acc:0.998, test acc:0.985 ===
=== epoch:20, train acc:0.998, test acc:0.989 ===
Elapsed time: 0:58:24.092331
Saved Network Parameters
Image 3. MNIST Accuracy from CNN
Watch filters of Inner Layer
- One benefit of a neural network is that the weights of every layer can be inspected. For a CNN, this makes it possible to see what each filter responds to.
- Because weights are usually initialized by a random function, they show no pattern at first. After training, however, the weights carry information. In addition, as a CNN gets deeper, the features detected by the later layers become more recognizable to humans. This is one reason why a CNN should have more convolution layers.
![layers](https://lh3.googleusercontent.com/-CrOR88WtvKI/WYyCACsqj4I/AAAAAAAAR3o/mVrKlw5FwIM8YA0rgpSwgHcLke8PR6RugCLcBGAs/s0/layers.png)
Image [CAPTION](src: vision03.csail.mit.edu)
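To look at the first-layer filters of the network trained above, the 30 filters of shape 5 x 5 can be arranged into a single 2D array and shown as one image. The following is a rough sketch under stated assumptions: the helper name `tile_filters` and its layout parameters are hypothetical, and random weights stand in for the trained `network.params['W1']`.

```python
import numpy as np


def tile_filters(filters, n_cols=8, margin=1):
    """Arrange (FN, C, FH, FW) filters into one 2D grid (first channel only)."""
    FN, C, FH, FW = filters.shape
    n_rows = int(np.ceil(FN / n_cols))
    grid = np.zeros((n_rows * (FH + margin) - margin,
                     n_cols * (FW + margin) - margin))
    for idx in range(FN):
        r, c = divmod(idx, n_cols)
        top = r * (FH + margin)
        left = c * (FW + margin)
        grid[top:top + FH, left:left + FW] = filters[idx, 0]
    return grid


# Stand-in for trained weights; with a trained network, pass
# network.params['W1'] instead.
W1 = 0.01 * np.random.randn(30, 1, 5, 5)
grid = tile_filters(W1)
print(grid.shape)  # (23, 47)
```

The resulting array can be displayed with `plt.imshow(grid, cmap='gray')`. Comparing the grid before and after training shows the random initial weights turning into structured patterns.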