CNN Implementation with NumPy


Layers for CNN

To implement a CNN, two new layers must be defined: the convolution layer and the pooling layer. Together they form the feature-extraction part of the network, and with them the CNN can be trained and used for inference.

Convolution Layer

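A basic CNN works on 3D images (channels, height, width), but, like a DNN, it should also process a whole batch at once, which adds a fourth dimension: (N, C, H, W). Each filter is itself 3D, because it spans all input channels (C, FH, FW), and a layer has several filters, so the filters form a 4D array (FN, C, FH, FW) as well.
However, writing the convolution of a 4D image batch with a 4D filter bank directly is cumbersome, and the resulting nested Python loops are slow.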
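To make the difficulty concrete, here is a deliberately naive, loop-based convolution over a 4D batch and a 4D filter bank. It is not part of the original post; the function name `conv_naive` is hypothetical, and it only serves as a slow reference. Like most deep learning code (including the layers below), it computes cross-correlation, i.e. the filter is not flipped.

```python
import numpy as np


def conv_naive(x, W, b, stride=1, pad=0):
    """Direct convolution with explicit loops (illustration only).

    x : (N, C, H, W) batch of images
    W : (FN, C, FH, FW) filter bank
    b : (FN,) biases
    Returns an array of shape (N, FN, out_h, out_w).
    """
    N, C, H, W_in = x.shape
    FN, _, FH, FW = W.shape
    out_h = (H + 2 * pad - FH) // stride + 1
    out_w = (W_in + 2 * pad - FW) // stride + 1
    xp = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    out = np.zeros((N, FN, out_h, out_w))
    # Four nested loops: image, filter, output row, output column.
    for n in range(N):
        for f in range(FN):
            for i in range(out_h):
                for j in range(out_w):
                    window = xp[n, :, i * stride:i * stride + FH,
                                j * stride:j * stride + FW]
                    # Each filter covers all C channels of the window.
                    out[n, f, i, j] = np.sum(window * W[f]) + b[f]
    return out


x = np.random.randn(2, 3, 7, 7)   # 2 images, 3 channels, 7x7
W = np.random.randn(4, 3, 3, 3)   # 4 filters of shape 3x3x3
b = np.zeros(4)
print(conv_naive(x, W, b).shape)  # (2, 4, 5, 5)
```

Four Python-level loops per call is exactly what im2col avoids.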

im2col Transform

Fortunately, there is a simple way to convolve a batch of images with multiple filters. The technique is usually called im2col ("image to columns") in popular machine learning frameworks.

Image 1. Forward operations of the convolution layer for a batch of images

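im2col unrolls every receptive field (the input window covered by one filter position) into a row of a 2D matrix, and each filter is reshaped into a column. With these reshapes, the convolution becomes a single matrix (dot) product. Afterwards, the result is reshaped back into a batch of 3D feature maps.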
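For reference, the output size computed in the code below and the shapes in the matrix product can be written as follows. The symbols OH, OW, P and S are shorthand introduced here for out_h, out_w, pad and stride; the last line plugs in the MNIST settings used later in this post (28 x 28 input, 5 x 5 filters, no padding, stride 1, 30 filters, mini-batch N = 100).

```latex
OH = \left\lfloor \frac{H + 2P - FH}{S} \right\rfloor + 1, \qquad
OW = \left\lfloor \frac{W + 2P - FW}{S} \right\rfloor + 1

\underbrace{\mathrm{col}}_{(N \cdot OH \cdot OW) \times (C \cdot FH \cdot FW)}
\;
\underbrace{W_{\mathrm{col}}}_{(C \cdot FH \cdot FW) \times FN}
+ b
=
\underbrace{\mathrm{out}}_{(N \cdot OH \cdot OW) \times FN}

\text{MNIST: } OH = \frac{28 + 0 - 5}{1} + 1 = 24, \quad
\mathrm{col} \in \mathbb{R}^{57600 \times 25}, \;
W_{\mathrm{col}} \in \mathbb{R}^{25 \times 30}
```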

import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    """
    Transform 4-dimensional images into a 2-dimensional array.

    Parameters
    ----------
    input_data : 4-dimensional input images
                 (number of images, number of channels, height, width)
    filter_h : height of the filter
    filter_w : width of the filter
    stride : stride of the filter
    pad : width of the zero padding

    Returns
    -------
    col : 2-dimensional array of shape
          (N * out_h * out_w, C * filter_h * filter_w)
    """
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1

    # Zero-pad only the two spatial axes.
    img = np.pad(input_data,
                 [(0, 0), (0, 0), (pad, pad), (pad, pad)],
                 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    # For each offset (y, x) inside the filter, gather the pixel at that
    # offset for every output position at once with a strided slice.
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = \
                img[:, :, y:y_max:stride, x:x_max:stride]

    # One row per (image, output row, output column); each row holds the
    # C * filter_h * filter_w values of its receptive field.
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)
    return col
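A quick shape check helps to see what im2col produces. The sizes below (one 3-channel 7 x 7 image, a 5 x 5 filter, 4 filters) are chosen only for illustration, and im2col is repeated so the snippet runs on its own. The last lines confirm that one row of `col` times one reshaped filter equals the direct sum over that window.

```python
import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Same logic as im2col above, repeated so this block runs on its own.
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        for x in range(filter_w):
            col[:, :, y, x] = img[:, :, y:y + stride * out_h:stride,
                                  x:x + stride * out_w:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)


# One 3-channel 7x7 image and a 5x5 filter: 3x3 output positions,
# each flattened to 3 * 5 * 5 = 75 values.
x1 = np.random.rand(1, 3, 7, 7)
col1 = im2col(x1, 5, 5, stride=1, pad=0)
print(col1.shape)  # (9, 75)

# A batch of 10 images gives 10 times as many rows.
x2 = np.random.rand(10, 3, 7, 7)
col2 = im2col(x2, 5, 5, stride=1, pad=0)
print(col2.shape)  # (90, 75)

# Row 0 is the top-left 3x5x5 window, flattened channel-first.
assert np.array_equal(col1[0], x1[0, :, 0:5, 0:5].ravel())

# Convolution as one matrix product: (9, 75) @ (75, FN).
FN = 4
W = np.random.rand(FN, 3, 5, 5)
out = col1 @ W.reshape(FN, -1).T                      # (9, 4)
out = out.reshape(1, 3, 3, FN).transpose(0, 3, 1, 2)  # (N, FN, out_h, out_w)

# Compare one output value with the direct sum over its window.
i, j, f = 1, 2, 3
direct = np.sum(x1[0, :, i:i + 5, j:j + 5] * W[f])
assert np.isclose(out[0, f, i, j], direct)
```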

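col2im, the reverse operation, is required for backpropagation. It scatters the columns back into image shape and sums the values wherever receptive fields overlap, which is exactly what the gradient of im2col requires. The implementation of col2im is: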

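def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    """Reverse of im2col: scatter columns back into image shape.

    Where receptive fields overlap, the contributions are summed, so this
    is the gradient of im2col rather than an exact inverse.

    Parameters
    ----------
    col : 2-dimensional array from im2col (or a gradient of that shape)
    input_shape : shape (N, C, H, W) of the original input images
    filter_h : height of the filter
    filter_w : width of the filter
    stride : stride of the filter
    pad : width of the zero padding

    Returns
    -------
    img : array of shape input_shape
    """
    N, C, H, W = input_shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    col = col.reshape(N, out_h, out_w, C,
                      filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)

    # The buffer is slightly larger than the padded image; everything
    # outside the original H x W region is cropped off at the end.
    img = np.zeros((N, C, H + 2 * pad + stride - 1, W + 2 * pad + stride - 1))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            # += (not =) accumulates values from overlapping windows.
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]

    # Remove the padding to recover the original spatial size.
    return img[:, :, pad:H + pad, pad:W + pad]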
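Why summing is the right behavior can be checked with a dot-product (adjoint) test: if col2im is the transpose of im2col, then for any x and any y of matching shapes, sum(im2col(x) * y) equals sum(x * col2im(y)). The sketch below runs that test with stride 2 and padding 1 (values chosen for illustration), then feeds all-ones columns to col2im to show that it counts how many windows cover each pixel. Both functions are repeated in compact form so the block runs on its own.

```python
import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Same logic as im2col above, repeated so this block runs on its own.
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        for x in range(filter_w):
            col[:, :, y, x] = img[:, :, y:y + stride * out_h:stride,
                                  x:x + stride * out_w:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)


def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    # Same logic as col2im above.
    N, C, H, W = input_shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    col = col.reshape(N, out_h, out_w, C, filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)
    img = np.zeros((N, C, H + 2 * pad + stride - 1, W + 2 * pad + stride - 1))
    for y in range(filter_h):
        for x in range(filter_w):
            img[:, :, y:y + stride * out_h:stride,
                x:x + stride * out_w:stride] += col[:, :, y, x]
    return img[:, :, pad:H + pad, pad:W + pad]


# Adjoint test: <im2col(x), y> must equal <x, col2im(y)>.
rng = np.random.default_rng(0)
shape = (2, 3, 7, 7)
fh, fw, stride, pad = 3, 3, 2, 1
x = rng.standard_normal(shape)
col = im2col(x, fh, fw, stride, pad)
y = rng.standard_normal(col.shape)
lhs = np.sum(col * y)
rhs = np.sum(x * col2im(y, shape, fh, fw, stride, pad))
print(np.isclose(lhs, rhs))  # True

# Not an exact inverse: all-ones columns count how many 3x3 windows
# (stride 1, no padding) cover each pixel of a 4x4 image.
ones_col = np.ones((4, 9))  # 2x2 output positions, 1 * 3 * 3 values each
counts = col2im(ones_col, (1, 1, 4, 4), 3, 3)
print(counts[0, 0])
```

The corner pixels of `counts` are covered by one window and the four central pixels by all four, which is why backpropagation through overlapping windows must add gradients rather than overwrite them.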

Forward and Backward

With im2col and col2im, the forward and backward passes of the convolution layer are:

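class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W          # filters, shape (FN, C, FH, FW)
        self.b = b          # biases, shape (FN,)
        self.stride = stride
        self.pad = pad

        # Values cached in forward() for use in backward()
        self.x = None
        self.col = None
        self.col_W = None

        # Gradients of the weights and biases
        self.dW = None
        self.db = None

    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = (H + 2 * self.pad - FH) // self.stride + 1
        out_w = (W + 2 * self.pad - FW) // self.stride + 1

        # (N * out_h * out_w, C * FH * FW): one receptive field per row
        col = im2col(x, FH, FW, self.stride, self.pad)
        # (C * FH * FW, FN): one flattened filter per column
        col_W = self.W.reshape(FN, -1).T

        # Convolution as a single matrix product; the bias is broadcast
        out = np.dot(col, col_W) + self.b
        # (N * out_h * out_w, FN) -> (N, out_h, out_w, FN) -> (N, FN, out_h, out_w)
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

        self.x = x
        self.col = col
        self.col_W = col_W

        return out

    def backward(self, dout):
        FN, C, FH, FW = self.W.shape
        # Undo the final transpose and reshape of forward()
        dout = dout.transpose(0, 2, 3, 1).reshape(-1, FN)

        # Gradients of out = col @ col_W + b
        self.db = np.sum(dout, axis=0)
        self.dW = np.dot(self.col.T, dout)
        self.dW = self.dW.transpose(1, 0).reshape(FN, C, FH, FW)

        # Gradient w.r.t. the columns, scattered back into image shape
        dcol = np.dot(dout, self.col_W.T)
        dx = col2im(dcol, self.x.shape, FH, FW, self.stride, self.pad)

        return dx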
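The backward formulas can be verified with a numerical gradient check. The sketch below rewrites the forward pass and the dW/db part of backward as plain functions (`conv_forward` and `conv_param_grads` are hypothetical helper names, not part of the post) and compares them with central differences on the scalar loss L = sum(out * dout), whose gradient with respect to out is exactly dout. Sizes, stride and padding are arbitrary test values.

```python
import numpy as np


def im2col(x, fh, fw, stride=1, pad=0):
    # Same logic as im2col above, repeated so the block is self-contained.
    N, C, H, W = x.shape
    oh = (H + 2 * pad - fh) // stride + 1
    ow = (W + 2 * pad - fw) // stride + 1
    img = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, fh, fw, oh, ow))
    for y in range(fh):
        for xx in range(fw):
            col[:, :, y, xx] = img[:, :, y:y + stride * oh:stride,
                                   xx:xx + stride * ow:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * oh * ow, -1)


def conv_forward(x, W, b, stride=1, pad=0):
    # Convolution.forward written as a plain function.
    FN, C, FH, FW = W.shape
    N, _, H, W_in = x.shape
    oh = (H + 2 * pad - FH) // stride + 1
    ow = (W_in + 2 * pad - FW) // stride + 1
    out = im2col(x, FH, FW, stride, pad) @ W.reshape(FN, -1).T + b
    return out.reshape(N, oh, ow, FN).transpose(0, 3, 1, 2)


def conv_param_grads(x, W, dout, stride=1, pad=0):
    # dW and db computed the same way as in Convolution.backward.
    FN, C, FH, FW = W.shape
    col = im2col(x, FH, FW, stride, pad)
    d = dout.transpose(0, 2, 3, 1).reshape(-1, FN)
    dW = (col.T @ d).transpose(1, 0).reshape(FN, C, FH, FW)
    return dW, d.sum(axis=0)


rng = np.random.default_rng(1)
x = rng.standard_normal((2, 3, 6, 6))
W = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
stride, pad = 1, 1
dout = rng.standard_normal(conv_forward(x, W, b, stride, pad).shape)


def loss(W_, b_):
    # Scalar loss L = sum(out * dout), so dL/d(out) is exactly dout.
    return np.sum(conv_forward(x, W_, b_, stride, pad) * dout)


dW, db = conv_param_grads(x, W, dout, stride, pad)

# Central differences, one parameter at a time.
h = 1e-5
num_dW = np.zeros_like(W)
for idx in np.ndindex(W.shape):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += h
    W_minus[idx] -= h
    num_dW[idx] = (loss(W_plus, b) - loss(W_minus, b)) / (2 * h)
num_db = np.array([(loss(W, b + h * e) - loss(W, b - h * e)) / (2 * h)
                   for e in np.eye(len(b))])

print(np.max(np.abs(dW - num_dW)), np.max(np.abs(db - num_db)))
```

Because this loss is linear in W and b, the central differences are exact up to floating-point rounding, so the printed differences should be tiny.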

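As Image 1 shows, a few reshapes are also required. The matrix product yields an array of shape (N * out_h * out_w, FN), which is reshaped to (N, out_h, out_w, FN).
The last operation is a transpose, which rearranges (N, Height, Width, Channel) into (N, Channel, Height, Width), the layout the next layer expects.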
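A small sketch of this reshape and transpose, with made-up sizes (2 images, a 3 x 4 output map, 5 filters) and `np.arange` values standing in for the product `np.dot(col, col_W)`, shows where each filter response ends up and why a direct reshape to (N, FN, out_h, out_w) would scramble the data.

```python
import numpy as np

# Hypothetical sizes: N=2 images, a 3x4 output map, FN=5 filters.
N, out_h, out_w, FN = 2, 3, 4, 5

# Stand-in for np.dot(col, col_W): one row per (image, row, column)
# position, one column per filter; values are distinct for checking.
flat = np.arange(N * out_h * out_w * FN).reshape(N * out_h * out_w, FN)

nhwc = flat.reshape(N, out_h, out_w, FN)   # (N, Height, Width, Channel)
nchw = nhwc.transpose(0, 3, 1, 2)          # (N, Channel, Height, Width)
print(nchw.shape)  # (2, 5, 3, 4)

# Filter f's response at (i, j) of image n is row n*out_h*out_w + i*out_w + j,
# column f of the flat matrix.
n, f, i, j = 1, 3, 2, 1
assert nchw[n, f, i, j] == flat[n * out_h * out_w + i * out_w + j, f]

# Reshaping straight to (N, FN, out_h, out_w) gives the right shape
# but the wrong arrangement of values.
wrong = flat.reshape(N, FN, out_h, out_w)
assert not np.array_equal(wrong, nchw)
```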

Pooling Layer

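A pooling layer reduces each target window to a single representative value, for example the maximum (max pooling).
Like the convolution layer, it uses im2col for forward propagation and col2im for backpropagation. However, it needs no filters, because the value is selected by a fixed rule (here, taking the maximum) rather than by learned weights. Pooling is also applied to each channel separately, which is why the im2col output is reshaped so that every row holds a single window of a single channel.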
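As a worked example of what max pooling computes, the sketch below pools a single 4 x 4 channel with 2 x 2 windows and stride 2. It uses a reshape trick instead of im2col; that shortcut only works when the stride equals the window size and the height and width are divisible by it, which is why the Pooling class below uses the general im2col route. Average pooling is shown alongside for comparison.

```python
import numpy as np

# One image, one channel, 4x4 values.
x = np.array([[[[1, 2, 0, 1],
                [3, 4, 1, 0],
                [0, 1, 5, 6],
                [2, 0, 7, 8]]]], dtype=float)

N, C, H, W = x.shape
ph = pw = 2  # 2x2 windows, stride 2 (non-overlapping)

# Split each spatial axis into (number of windows, position in window).
windows = x.reshape(N, C, H // ph, ph, W // pw, pw)
max_pool = windows.max(axis=(3, 5))   # window maxima: 4, 1, 2, 8
avg_pool = windows.mean(axis=(3, 5))  # window means: 2.5, 0.5, 0.75, 6.5
print(max_pool[0, 0])
```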

Image 2. Forward operations of the pooling layer (max pooling)

class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

        # Cached in forward() for use in backward()
        self.x = None
        self.arg_max = None

    def forward(self, x):
        N, C, H, W = x.shape
        # Include the padding so out_h/out_w agree with im2col.
        out_h = (H + 2 * self.pad - self.pool_h) // self.stride + 1
        out_w = (W + 2 * self.pad - self.pool_w) // self.stride + 1

        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        # One row per (image, output position, channel): pooling is per channel
        col = col.reshape(-1, self.pool_h * self.pool_w)

        # Max pooling; remember where each maximum came from for backward()
        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max

        return out

    def backward(self, dout):
        dout = dout.transpose(0, 2, 3, 1)  # (N, out_h, out_w, C)

        # Route each upstream gradient to the position of its window's
        # maximum; every other position in the window receives zero.
        pool_size = self.pool_h * self.pool_w
        dmax = np.zeros((dout.size, pool_size))
        dmax[np.arange(self.arg_max.size), self.arg_max.flatten()] = \
            dout.flatten()
        dmax = dmax.reshape(dout.shape + (pool_size,))

        # Back to the im2col layout, then scatter into image shape
        dcol = dmax.reshape(dmax.shape[0] * dmax.shape[1] * dmax.shape[2], -1)
        dx = col2im(dcol, self.x.shape,
                    self.pool_h, self.pool_w,
                    self.stride, self.pad)

        return dx
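The key step of the backward pass is the fancy-indexing assignment into `dmax`. The toy values below (two windows of four values each, laid out as after the reshape in `forward`) are made up to show that only the position of each window's maximum receives the upstream gradient. When a window contains ties, `np.argmax` returns the first occurrence, so only that element gets the gradient.

```python
import numpy as np

# Two pooling windows (rows) with pool_h * pool_w = 4 values each.
col = np.array([[1., 3., 2., 0.],
                [5., 4., 9., 7.]])
arg_max = np.argmax(col, axis=1)  # position of each window's maximum
dout = np.array([0.5, -2.0])      # upstream gradient, one per window

# Each window's gradient goes only to its maximum; the rest stay zero.
dmax = np.zeros_like(col)
dmax[np.arange(arg_max.size), arg_max] = dout
print(dmax)  # 0.5 at [0, 1], -2.0 at [1, 2], zeros elsewhere
```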

Linking of CNN

A simple CNN built from these layers is:

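from collections import OrderedDict

import numpy as np

# Relu, Affine and SoftmaxWithLoss are not defined in this post and are
# assumed to be available (e.g. from the earlier DNN implementation).
# Affine must flatten its 4D input to (N, -1), since Pool1 outputs 4D data.


class SimpleConvNet:
    """
    conv - relu - pool - affine - relu - affine - softmax

    Parameters
    ----------
    input_dim : shape of one input image (channels, height, width)
    conv_param : parameters of the convolution layer
                 (filter_num, filter_size, pad, stride)
    hidden_size : number of neurons in the hidden affine layer
    output_size : number of output classes
    weight_init_std : standard deviation of the initial weights
    """

    def __init__(self, input_dim=(1, 28, 28),
                 conv_param={'filter_num': 30,
                             'filter_size': 5,
                             'pad': 0,
                             'stride': 1},
                 hidden_size=100, output_size=10, weight_init_std=0.01):
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]
        # Spatial size after the convolution (28 -> 24 for a 5x5 filter)
        conv_output_size = \
            (input_size - filter_size + 2 * filter_pad) // filter_stride + 1
        # Flattened size after 2x2, stride-2 pooling (30 * 12 * 12 = 4320);
        # integer division matches Pooling even for odd conv_output_size
        pool_output_size = filter_num * (conv_output_size // 2) ** 2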

        # Initialize weights and biases
        self.params = {}
        self.params['W1'] = weight_init_std * \
                            np.random.randn(filter_num, input_dim[0],
                                            filter_size, filter_size)
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * \
                            np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * \
                            np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)

        # Create layers
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'],
                                           self.params['b1'],
                                           conv_param['stride'],
                                           conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])

        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)

        return x

    def loss(self, x, t):
        """
        Parameters
        ----------
        x : Input data
        t : answer labels
        """
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    def accuracy(self, x, t, batch_size=100):
        # Accept one-hot labels as well as class indices.
        if t.ndim != 1:
            t = np.argmax(t, axis=1)

        acc = 0.0

        # Evaluate in mini-batches to bound memory use; stepping by
        # batch_size also covers a final partial batch.
        for i in range(0, x.shape[0], batch_size):
            tx = x[i:i + batch_size]
            tt = t[i:i + batch_size]
            y = self.predict(tx)
            y = np.argmax(y, axis=1)
            acc += np.sum(y == tt)

        return acc / x.shape[0]

    def gradient(self, x, t):
        """Gradient with back propagation

        Parameters
        ----------
        x : Input data
        t : Answer labels

        Returns
        -------
        Dictionary of gradients with the same keys as self.params
        """
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Create return dictionary
        grads = {}
        grads['W1'], grads['b1'] = \
            self.layers['Conv1'].dW, self.layers['Conv1'].db
        grads['W2'], grads['b2'] = \
            self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W3'], grads['b3'] = \
            self.layers['Affine2'].dW, self.layers['Affine2'].db

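        return grads

    def save_params(self, file_name="params.pkl"):
        """Save the weights and biases to a pickle file."""
        import pickle
        with open(file_name, 'wb') as f:
            pickle.dump(self.params, f)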
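To see how data flows through SimpleConvNet, the helper below (a hypothetical function, not part of the post) computes each layer's output shape without the batch axis, plus the parameter counts, assuming 2 x 2 max pooling with stride 2 as in the network above. With the MNIST defaults it reproduces the 4320-dimensional input of Affine1.

```python
def simple_convnet_shapes(input_dim=(1, 28, 28), filter_num=30,
                          filter_size=5, pad=0, stride=1, pool=2,
                          hidden_size=100, output_size=10):
    """Per-layer output shapes (batch axis omitted) and parameter counts
    for conv - relu - pool - affine - relu - affine."""
    C, H, W = input_dim
    conv_h = (H + 2 * pad - filter_size) // stride + 1
    conv_w = (W + 2 * pad - filter_size) // stride + 1
    # Same formula Pooling uses, with pool x pool windows and stride pool.
    pool_h = (conv_h - pool) // pool + 1
    pool_w = (conv_w - pool) // pool + 1
    flat = filter_num * pool_h * pool_w  # input size of Affine1

    shapes = {
        'Conv1': (filter_num, conv_h, conv_w),
        'Relu1': (filter_num, conv_h, conv_w),
        'Pool1': (filter_num, pool_h, pool_w),
        'Affine1': (hidden_size,),
        'Relu2': (hidden_size,),
        'Affine2': (output_size,),
    }
    params = {
        'W1': filter_num * C * filter_size * filter_size, 'b1': filter_num,
        'W2': flat * hidden_size, 'b2': hidden_size,
        'W3': hidden_size * output_size, 'b3': output_size,
    }
    return shapes, params


shapes, params = simple_convnet_shapes()
for name, shape in shapes.items():
    print(name, shape)
print("total parameters:", sum(params.values()))  # total parameters: 433890
```

Almost all of the parameters (432,000 of them) sit in W2, the first affine layer, while the 30 convolution filters need only 750 weights.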

Train MNIST Data Set

class Trainer:
    """
    Train network
    """

    def __init__(self, network, x_train, t_train, x_test, t_test,
                 epochs=20, mini_batch_size=100,
                 optimizer='SGD', optimizer_param={'lr': 0.01},
                 evaluate_sample_num_per_epoch=None, verbose=True):
        self.network = network
        self.verbose = verbose
        self.x_train = x_train
        self.t_train = t_train
        self.x_test = x_test
        self.t_test = t_test
        self.epochs = epochs
        self.batch_size = mini_batch_size
        self.evaluate_sample_num_per_epoch = evaluate_sample_num_per_epoch

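        # Optimizer. SGD and Adam are not defined in this post; each is
        # assumed to provide update(params, grads), modifying params in place.
        optimizer_class_dict = {'adam': Adam, 'sgd': SGD}
        self.optimizer = \
            optimizer_class_dict[optimizer.lower()](**optimizer_param)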

        self.train_size = x_train.shape[0]
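        # Integer, so that current_iter % iter_per_epoch == 0 marks epochs
        # even when train_size is not a multiple of mini_batch_size.
        self.iter_per_epoch = max(self.train_size // mini_batch_size, 1)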
        self.max_iter = int(epochs * self.iter_per_epoch)
        self.current_iter = 0
        self.current_epoch = 0

        self.train_loss_list = []
        self.train_acc_list = []
        self.test_acc_list = []

    def train_step(self):
        batch_mask = np.random.choice(self.train_size, self.batch_size)
        x_batch = self.x_train[batch_mask]
        t_batch = self.t_train[batch_mask]

        grads = self.network.gradient(x_batch, t_batch)
        self.optimizer.update(self.network.params, grads)

        loss = self.network.loss(x_batch, t_batch)
        self.train_loss_list.append(loss)
        if self.verbose: print("train loss:" + str(loss))

        if self.current_iter % self.iter_per_epoch == 0:
            self.current_epoch += 1

            x_train_sample, t_train_sample = self.x_train, self.t_train
            x_test_sample, t_test_sample = self.x_test, self.t_test
            if self.evaluate_sample_num_per_epoch is not None:
                t = self.evaluate_sample_num_per_epoch
                x_train_sample, t_train_sample = \
                    self.x_train[:t], self.t_train[:t]
                x_test_sample, t_test_sample = \
                    self.x_test[:t], self.t_test[:t]

            train_acc = self.network.accuracy(x_train_sample, t_train_sample)
            test_acc = self.network.accuracy(x_test_sample, t_test_sample)
            self.train_acc_list.append(train_acc)
            self.test_acc_list.append(test_acc)

            print("=== epoch:" + str(self.current_epoch) + \
                  ", train acc:" + str(train_acc) + \
                  ", test acc:" + str(test_acc) + " ===")
        self.current_iter += 1

    def train(self):
        for i in range(self.max_iter):
            self.train_step()

        test_acc = self.network.accuracy(self.x_test, self.t_test)

        if self.verbose:
            print("=============== Final Test Accuracy ===============")
            print("test acc:" + str(test_acc))

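import datetime

import matplotlib.pyplot as plt
import numpy as np


def main():
    # Get the MNIST data set (the Mnist loader class is not defined in this post)
    mnist = Mnist()
    # Keep each image as a (1, 28, 28) array instead of flattening it into
    # 784 values: the convolution layer needs the 2D spatial layout.
    (x_train, t_train), (x_test, t_test) = mnist.load(flatten=False)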

    # The number of epochs
    max_epochs = 20

    # Convolutional Neural Network
    network = SimpleConvNet(
        # MNIST data: 1 channel, 28 x 28
        input_dim=(1, 28, 28),
        # 30 filters in the convolution layer,
        # each filter a 5 x 5 weight matrix,
        # no padding, stride 1
        conv_param={'filter_num': 30,
                    'filter_size': 5,
                    'pad': 0,
                    'stride': 1},
        # Number of neurons in the hidden affine layer
        hidden_size=100,
        # Number of output classes (digits 0-9)
        output_size=10,
        # Standard deviation of the initial weights
        weight_init_std=0.01)

    # Trainer
    trainer = Trainer(network, x_train, t_train, x_test, t_test,
                      epochs=max_epochs, mini_batch_size=100,
                      optimizer='adam', optimizer_param={'lr': 0.001},
                      evaluate_sample_num_per_epoch=1000, verbose=False)

    # Train
    start = datetime.datetime.now()
    trainer.train()
    end = datetime.datetime.now()

    # Print total execution time
    elapsed = end - start
    print("Elapsed time: {0}".format(elapsed))

    # Save params
    network.save_params("params.pkl")
    print("Saved Network Parameters")

    # Draw graph
    markers = {'train': 'o', 'test': 's'}
    x = np.arange(max_epochs)
    plt.plot(x, trainer.train_acc_list,
             marker=markers["train"], label="train", markevery=2)
    plt.plot(x, trainer.test_acc_list,
             marker=markers["test"], label="test", markevery=2)
    plt.xlabel("epochs")
    plt.ylabel("accuracy")
    plt.ylim(-0.1, 1.1)
    plt.grid()
    plt.title("Accuracies")
    plt.legend(loc="lower right")
    plt.show()

if __name__ == "__main__":
    main()
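The Trainer above selects `Adam` by name but the class itself is not shown in this post. The following is a minimal sketch of the standard Adam update rule, written only to match the interface the Trainer relies on (`update(params, grads)` modifying the arrays in place, so the layers that hold references to them see the new values). The class and its default hyperparameters are assumptions, not the post's own implementation.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer matching the update(params, grads) interface
    used by Trainer (a sketch, not the post's original class)."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        self.m = {}  # running mean of gradients
        self.v = {}  # running mean of squared gradients

    def update(self, params, grads):
        self.t += 1
        for key in params:
            if key not in self.m:
                self.m[key] = np.zeros_like(params[key])
                self.v[key] = np.zeros_like(params[key])
            g = grads[key]
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * g
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * g ** 2
            # Bias correction for the zero-initialized moving averages.
            m_hat = self.m[key] / (1 - self.beta1 ** self.t)
            v_hat = self.v[key] / (1 - self.beta2 ** self.t)
            # In-place update: layers hold references to these arrays.
            params[key] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Usage: minimize f(w) = sum((w - 3)^2), whose gradient is 2 * (w - 3).
params = {'w': np.zeros(3)}
w_ref = params['w']
opt = Adam(lr=0.1)
for _ in range(1000):
    grads = {'w': 2 * (params['w'] - 3.0)}
    opt.update(params, grads)
print(params['w'])  # close to 3 in every component
```

The in-place `-=` matters here: SimpleConvNet passes `self.params['W1']` and friends to its layers by reference, so rebinding `params[key]` to a new array would silently stop training the layers.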

Full source code

=== epoch:1, train acc:0.18, test acc:0.192 ===
=== epoch:2, train acc:0.959, test acc:0.958 ===
=== epoch:3, train acc:0.973, test acc:0.968 ===
=== epoch:4, train acc:0.983, test acc:0.981 ===
=== epoch:5, train acc:0.989, test acc:0.985 ===
=== epoch:6, train acc:0.989, test acc:0.982 ===
=== epoch:7, train acc:0.988, test acc:0.982 ===
=== epoch:8, train acc:0.991, test acc:0.985 ===
=== epoch:9, train acc:0.989, test acc:0.982 ===
=== epoch:10, train acc:0.994, test acc:0.987 ===
=== epoch:11, train acc:0.994, test acc:0.991 ===
=== epoch:12, train acc:0.994, test acc:0.983 ===
=== epoch:13, train acc:0.993, test acc:0.985 ===
=== epoch:14, train acc:0.997, test acc:0.986 ===
=== epoch:15, train acc:0.995, test acc:0.989 ===
=== epoch:16, train acc:0.998, test acc:0.986 ===
=== epoch:17, train acc:0.997, test acc:0.986 ===
=== epoch:18, train acc:0.998, test acc:0.989 ===
=== epoch:19, train acc:0.998, test acc:0.985 ===
=== epoch:20, train acc:0.998, test acc:0.989 ===
Elapsed time: 0:58:24.092331
Saved Network Parameters

Image 3. MNIST accuracy of the CNN

Watch filters of Inner Layer

One benefit of a neural network is that the weights of every layer can be inspected. For a CNN, this makes it possible to see what each filter responds to.
Because the weights are initialized with a random function, they show no pattern before training. After training, however, they encode recognizable structure: first-layer filters typically respond to simple features such as edges and blobs. The deeper the layer, the more abstract and human-recognizable the detected features become. This is one reason a CNN benefits from stacking more convolution layers.
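A minimal way to look at the first-layer filters is to tile them into one grayscale image. The sketch below assumes the `params.pkl` file written by `network.save_params("params.pkl")` is a pickled dict whose `'W1'` entry has shape (30, 1, 5, 5); `filter_grid` and `show_filters` are hypothetical helper names. Each filter is min-max scaled on its own so that its pattern is visible regardless of the weight scale.

```python
import numpy as np


def filter_grid(W, ncols=6, margin=1):
    """Tile single-channel filters (FN, 1, FH, FW) into one 2D image in [0, 1]."""
    FN, C, FH, FW = W.shape
    if C != 1:
        raise ValueError("expected single-channel filters such as MNIST W1")
    nrows = -(-FN // ncols)  # ceiling division
    # White (1.0) background, so the margins separate the filters.
    grid = np.ones((nrows * (FH + margin) - margin,
                    ncols * (FW + margin) - margin))
    for k in range(FN):
        f = W[k, 0]
        f = (f - f.min()) / (f.max() - f.min() + 1e-12)
        r, c = divmod(k, ncols)
        top, left = r * (FH + margin), c * (FW + margin)
        grid[top:top + FH, left:left + FW] = f
    return grid


def show_filters(path="params.pkl"):
    # Assumes a pickled dict containing 'W1', as saved by save_params().
    import pickle
    import matplotlib.pyplot as plt
    with open(path, "rb") as fp:
        params = pickle.load(fp)
    plt.imshow(filter_grid(params['W1']), cmap='gray', interpolation='nearest')
    plt.axis('off')
    plt.title("Conv1 filters")
    plt.show()


# Randomly initialized filters, as before training: no visible pattern.
W_random = 0.01 * np.random.randn(30, 1, 5, 5)
grid = filter_grid(W_random)
print(grid.shape)  # (29, 35)
```

Calling `show_filters()` after training and comparing the result with `filter_grid(W_random)` shows the difference between random and learned filters.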

Image 4 (source: vision03.csail.mit.edu)

Universe In Computer


28. CNN Implementation
