28. CNN Implementation

CNN implementation with numpy

Layers for CNN

  • To implement a CNN, two new layer types must be defined: a convolution layer and a pooling layer. Together they form the feature extraction stage, after which the CNN can be trained and used for inference.

Convolution Layer

  • A basic CNN operates on 3D images (channels, height, width), but like a DNN, it should support batch operations. There are also multiple filters, each spanning all input channels. Therefore, the convolution layer actually works on 4D input (batch, channels, height, width) and 4D filters (filter count, channels, filter height, filter width).
  • However, it is not easy to write convolution code directly for 4D images and 4D filters.
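To see why the direct approach is cumbersome, here is a minimal sketch of convolution written with explicit loops. The function name `conv_naive` and the array layouts, (N, C, H, W) for images and (FN, C, FH, FW) for filters, are assumptions chosen to match the code later in this post. It is an illustration only, not the implementation used below.

```python
import numpy as np


def conv_naive(x, w, b, stride=1, pad=0):
    """Direct convolution with explicit loops (illustrative and slow)."""
    N, C, H, W = x.shape
    FN, _, FH, FW = w.shape
    out_h = (H + 2 * pad - FH) // stride + 1
    out_w = (W + 2 * pad - FW) // stride + 1

    # Zero-pad only the two spatial axes
    xp = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    out = np.zeros((N, FN, out_h, out_w))

    for n in range(N):                      # every image in the batch
        for f in range(FN):                 # every filter
            for i in range(out_h):          # every output row
                for j in range(out_w):      # every output column
                    top, left = i * stride, j * stride
                    patch = xp[n, :, top:top + FH, left:left + FW]
                    out[n, f, i, j] = np.sum(patch * w[f]) + b[f]
    return out


x = np.ones((2, 3, 4, 4))
w = np.ones((5, 3, 3, 3))
b = np.zeros(5)
y = conv_naive(x, w, b)
print(y.shape)  # (2, 5, 2, 2)
```

Four nested Python loops make this far too slow for training, which is the motivation for replacing them with one large matrix product.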

im2col Transform

  • Fortunately, there is a simple method to convolve a batch of images with multiple filters. This method is usually called im2col in popular machine learning frameworks.
Image 1. Forward operations in Convolution layer for batch image
  • With im2col and a matrix reshape, convolution is replaced with a matrix product (np.dot). The result of the product is then reshaped back into a 4D feature map (batch, filters, height, width).
import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    """
    Transform 4 dimensional images to 2 dimensional array.

    Parameters
    ----------
    input_data : 4 dimensional input images
                 (The number of images, The number of channels,
                  Height, Width)
    filter_h : height of filter
    filter_w : width of filter
    stride : the interval of stride
    pad : the amount of zero padding on each side

    Returns
    -------
    col : 2 dimensional array
    """
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1

    img = np.pad(input_data,
                 [(0, 0), (0, 0), (pad, pad), (pad, pad)],
                 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = \
                img[:, :, y:y_max:stride, x:x_max:stride]

    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)
    return col
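The following sketch checks on a tiny input that a patch matrix followed by one matrix product gives the same value as applying a filter by hand. `extract_patches` is a hypothetical loop-based helper written only for this example (stride 1, no padding); it is intended to produce the same row layout as im2col above, one flattened patch per output position.

```python
import numpy as np


def extract_patches(x, fh, fw):
    """Hypothetical helper: one flattened patch per row (stride 1, no pad)."""
    N, C, H, W = x.shape
    out_h, out_w = H - fh + 1, W - fw + 1
    rows = []
    for n in range(N):
        for i in range(out_h):
            for j in range(out_w):
                rows.append(x[n, :, i:i + fh, j:j + fw].reshape(-1))
    return np.array(rows), out_h, out_w


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5, 5))       # (N, C, H, W)
w = rng.standard_normal((4, 3, 3, 3))       # (FN, C, FH, FW)

col, out_h, out_w = extract_patches(x, 3, 3)
col_w = w.reshape(4, -1).T                  # (C*FH*FW, FN)
out = (col @ col_w).reshape(2, out_h, out_w, 4).transpose(0, 3, 1, 2)

# Direct computation of a single output element for comparison
direct = np.sum(x[1, :, 2:5, 0:3] * w[2])
print(col.shape, out.shape)                 # (18, 27) (2, 4, 3, 3)
print(np.isclose(out[1, 2, 2, 0], direct))  # True
```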
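  • Furthermore, col2im is required for back propagation. It maps the 2D column array back to the image shape, summing the contributions of overlapping patches. This is an implementation of col2im.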
def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    """Inverse of im2col.

    Parameters
    ----------
    col : 2 dimensional array
    input_shape : the shape of original input images
    filter_h : height of filter
    filter_w : width of filter
    stride : the interval of stride
    pad : the interval of padding

    Returns
    -------
    img : images
    """
    N, C, H, W = input_shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    col = col.reshape(N, out_h, out_w, C,
                      filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)

    img = np.zeros((N, C, H + 2 * pad + stride - 1, W + 2 * pad + stride - 1))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]

    return img[:, :, pad:H + pad, pad:W + pad]
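Because col2im accumulates with `+=`, a pixel covered by several patches receives the sum of their values. The sketch below counts how many patches cover each pixel, using the same slicing loop as col2im. `overlap_counts` is a hypothetical helper for illustration only and assumes no padding.

```python
import numpy as np


def overlap_counts(H, W, filter_h, filter_w, stride=1):
    """Count how many filter windows cover each pixel (no padding)."""
    out_h = (H - filter_h) // stride + 1
    out_w = (W - filter_w) // stride + 1
    counts = np.zeros((H + stride - 1, W + stride - 1), dtype=int)
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            # Same strided slice that col2im adds gradients into
            counts[y:y_max:stride, x:x_max:stride] += 1
    return counts[:H, :W]


print(overlap_counts(3, 3, 2, 2))
# [[1 2 1]
#  [2 4 2]
#  [1 2 1]]
```

When the windows do not overlap (for example a 2 x 2 filter with stride 2 on a 4 x 4 image), every count is 1 and col2im exactly undoes im2col. With overlap it is a sum, which is what back propagation needs.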

Forward and Backward

  • With im2col and col2im, the forward and backward passes of the convolution layer are implemented as follows.
class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W
        self.b = b
        self.stride = stride
        self.pad = pad

        self.x = None
        self.col = None
        self.col_W = None

        self.dW = None
        self.db = None

    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = 1 + int((H + 2*self.pad - FH) / self.stride)
        out_w = 1 + int((W + 2*self.pad - FW) / self.stride)

        col = im2col(x, FH, FW, self.stride, self.pad)
        col_W = self.W.reshape(FN, -1).T

        out = np.dot(col, col_W) + self.b
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

        self.x = x
        self.col = col
        self.col_W = col_W

        return out

    def backward(self, dout):
        FN, C, FH, FW = self.W.shape
        dout = dout.transpose(0,2,3,1).reshape(-1, FN)

        self.db = np.sum(dout, axis=0)
        self.dW = np.dot(self.col.T, dout)
        self.dW = self.dW.transpose(1, 0).reshape(FN, C, FH, FW)

        dcol = np.dot(dout, self.col_W.T)
        dx = col2im(dcol, self.x.shape, FH, FW, self.stride, self.pad)

        return dx
  • Also, as shown in Image 1, some reshapes are required.
  • The last operation in forward is a transpose, which rearranges (N, Height, Width, Channel) to (N, Channel, Height, Width).
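A small sketch of that final reshape and transpose, using made-up sizes. The matrix product yields one row per output position and one column per filter; the values here are just a running index so that the mapping can be checked.

```python
import numpy as np

N, FN, out_h, out_w = 2, 3, 4, 5

# Stand-in for np.dot(col, col_W): shape (N * out_h * out_w, FN)
out = np.arange(N * out_h * out_w * FN).reshape(N * out_h * out_w, FN)

# Split the rows back into (N, out_h, out_w), then move filters to axis 1
fmap = out.reshape(N, out_h, out_w, FN).transpose(0, 3, 1, 2)
print(fmap.shape)  # (2, 3, 4, 5)

# Element (n, f, i, j) of the feature map comes from
# row n*out_h*out_w + i*out_w + j and column f of the matrix product
n, f, i, j = 1, 2, 3, 4
row = n * out_h * out_w + i * out_w + j
print(fmap[n, f, i, j] == out[row, f])  # True
```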

Pooling Layer

  • A pooling layer selects a representative value (for example, the maximum) from each target area.
  • The pooling layer also uses im2col for forward propagation and col2im for back propagation. However, it does not require filters, because it selects the value with a simple criterion.
Image 2. Forward operations in Pooling layer with max pooling
class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

        self.x = None
        self.arg_max = None

    def forward(self, x):
        N, C, H, W = x.shape
        # Include padding so the size matches what im2col produces
        out_h = (H + 2 * self.pad - self.pool_h) // self.stride + 1
        out_w = (W + 2 * self.pad - self.pool_w) // self.stride + 1

        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h*self.pool_w)

        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max

        return out

    def backward(self, dout):
        dout = dout.transpose(0, 2, 3, 1)

        pool_size = self.pool_h * self.pool_w
        dmax = np.zeros((dout.size, pool_size))
        dmax[np.arange(self.arg_max.size), self.arg_max.flatten()] = \
            dout.flatten()
        dmax = dmax.reshape(dout.shape + (pool_size,))

        dcol = dmax.reshape(dmax.shape[0] * dmax.shape[1] * dmax.shape[2], -1)
        dx = col2im(dcol, self.x.shape,
                    self.pool_h, self.pool_w,
                    self.stride, self.pad)

        return dx
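For the common case of non-overlapping 2 x 2 windows with stride 2, max pooling can also be sketched with a plain reshape, which makes it easy to see that the gradient flows only to the maximum of each window. This is a different technique from the im2col-based class above; the function names are hypothetical and the sketch assumes even height and width.

```python
import numpy as np


def maxpool2x2_forward(x):
    """2x2 max pooling with stride 2 via reshape (H and W must be even)."""
    N, C, H, W = x.shape
    windows = x.reshape(N, C, H // 2, 2, W // 2, 2)
    return windows.max(axis=(3, 5))


def maxpool2x2_backward(x, out, dout):
    """Route each upstream gradient to the position of the window maximum."""
    up_out = out.repeat(2, axis=2).repeat(2, axis=3)
    up_dout = dout.repeat(2, axis=2).repeat(2, axis=3)
    mask = (x == up_out)    # True where the input equals its window maximum
    return up_dout * mask


x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]], dtype=float).reshape(1, 1, 4, 4)
out = maxpool2x2_forward(x)
print(out[0, 0])
# [[ 4.  8.]
#  [12. 16.]]

dx = maxpool2x2_backward(x, out, np.ones_like(out))
print(dx[0, 0])
# [[0. 0. 0. 0.]
#  [0. 1. 0. 1.]
#  [0. 0. 0. 0.]
#  [0. 1. 0. 1.]]
```

One difference worth noting: if several values in a window tie for the maximum, this mask passes the gradient to all of them, whereas the argmax used in the Pooling class picks a single position.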

Linking of CNN

  • A simple CNN can be assembled from these layers as follows.
from collections import OrderedDict


class SimpleConvNet:
    """
    conv - relu - pool - affine - relu - affine - softmax

    Parameters
    ----------
    input_dim: shape of one input image (channels, height, width)
    conv_param: parameters for the convolution layer
                (filter_num, filter_size, pad, stride)
    hidden_size: number of neurons in the hidden layer
    output_size: number of neurons in the output layer
    weight_init_std: standard deviation of the initial weights
    """

    def __init__(self, input_dim=(1, 28, 28),
                 conv_param={'filter_num': 30,
                             'filter_size': 5,
                             'pad': 0,
                             'stride': 1},
                 hidden_size=100, output_size=10, weight_init_std=0.01):
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]
        conv_output_size = \
            (input_size - filter_size + 2 * filter_pad) / filter_stride + 1
        pool_output_size = \
            int(filter_num * (conv_output_size / 2) * (conv_output_size / 2))

        # Initialize weights and biases
        self.params = {}
        self.params['W1'] = weight_init_std * \
                            np.random.randn(filter_num, input_dim[0],
                                            filter_size, filter_size)
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * \
                            np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * \
                            np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)

        # Create layers
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'],
                                           self.params['b1'],
                                           conv_param['stride'],
                                           conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])

        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)

        return x

    def loss(self, x, t):
        """
        Parameters
        ----------
        x : Input data
        t : answer labels
        """
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    def accuracy(self, x, t, batch_size=100):
        if t.ndim != 1: t = np.argmax(t, axis=1)

        acc = 0.0

        for i in range(int(x.shape[0] / batch_size)):
            tx = x[i * batch_size:(i + 1) * batch_size]
            tt = t[i * batch_size:(i + 1) * batch_size]
            y = self.predict(tx)
            y = np.argmax(y, axis=1)
            acc += np.sum(y == tt)

        return acc / x.shape[0]

    def gradient(self, x, t):
        """Gradient with back propagation

        Parameters
        ----------
        x : Input data
        t : Answer labels

        Returns
        -------
        Dictionary having gradients for each layers
        """
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Create return dictionary
        grads = {}
        grads['W1'], grads['b1'] = \
            self.layers['Conv1'].dW, self.layers['Conv1'].db
        grads['W2'], grads['b2'] = \
            self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W3'], grads['b3'] = \
            self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads
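The size of W2 depends on how the image shrinks through the convolution and pooling layers. The arithmetic for the default MNIST settings can be sketched as below; `conv_output_size` is a hypothetical helper that applies the same formula used in im2col.

```python
def conv_output_size(input_size, filter_size, stride=1, pad=0):
    """Output height/width of a convolution or pooling window."""
    return (input_size + 2 * pad - filter_size) // stride + 1


conv_out = conv_output_size(28, 5, stride=1, pad=0)   # 28 -> 24
pool_out = conv_output_size(conv_out, 2, stride=2)    # 24 -> 12
flat = 30 * pool_out * pool_out                       # 30 filters * 12 * 12

print(conv_out, pool_out, flat)  # 24 12 4320
```

So the first affine layer receives 4320 inputs per image. The expression `conv_output_size / 2` in `__init__` gives the same number only when the convolution output size is even, as it is here.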

Train MNIST Data Set

class Trainer:
    """
    Train network
    """

    def __init__(self, network, x_train, t_train, x_test, t_test,
                 epochs=20, mini_batch_size=100,
                 optimizer='SGD', optimizer_param={'lr': 0.01},
                 evaluate_sample_num_per_epoch=None, verbose=True):
        self.network = network
        self.verbose = verbose
        self.x_train = x_train
        self.t_train = t_train
        self.x_test = x_test
        self.t_test = t_test
        self.epochs = epochs
        self.batch_size = mini_batch_size
        self.evaluate_sample_num_per_epoch = evaluate_sample_num_per_epoch

        # optimizer
        optimizer_class_dict = {'adam': Adam, 'sgd': SGD}
        self.optimizer = \
            optimizer_class_dict[optimizer.lower()](**optimizer_param)

        self.train_size = x_train.shape[0]
        self.iter_per_epoch = max(self.train_size / mini_batch_size, 1)
        self.max_iter = int(epochs * self.iter_per_epoch)
        self.current_iter = 0
        self.current_epoch = 0

        self.train_loss_list = []
        self.train_acc_list = []
        self.test_acc_list = []

    def train_step(self):
        batch_mask = np.random.choice(self.train_size, self.batch_size)
        x_batch = self.x_train[batch_mask]
        t_batch = self.t_train[batch_mask]

        grads = self.network.gradient(x_batch, t_batch)
        self.optimizer.update(self.network.params, grads)

        loss = self.network.loss(x_batch, t_batch)
        self.train_loss_list.append(loss)
        if self.verbose: print("train loss:" + str(loss))

        if self.current_iter % self.iter_per_epoch == 0:
            self.current_epoch += 1

            x_train_sample, t_train_sample = self.x_train, self.t_train
            x_test_sample, t_test_sample = self.x_test, self.t_test
            if self.evaluate_sample_num_per_epoch is not None:
                t = self.evaluate_sample_num_per_epoch
                x_train_sample, t_train_sample = \
                    self.x_train[:t], self.t_train[:t]
                x_test_sample, t_test_sample = \
                    self.x_test[:t], self.t_test[:t]

            train_acc = self.network.accuracy(x_train_sample, t_train_sample)
            test_acc = self.network.accuracy(x_test_sample, t_test_sample)
            self.train_acc_list.append(train_acc)
            self.test_acc_list.append(test_acc)

            print("=== epoch:" + str(self.current_epoch) + \
                  ", train acc:" + str(train_acc) + \
                  ", test acc:" + str(test_acc) + " ===")
        self.current_iter += 1

    def train(self):
        for i in range(self.max_iter):
            self.train_step()

        test_acc = self.network.accuracy(self.x_test, self.t_test)

        if self.verbose:
            print("=============== Final Test Accuracy ===============")
            print("test acc:" + str(test_acc))

import datetime

import matplotlib.pyplot as plt


def main():
    # Get MNIST data set
    mnist = Mnist()
    # For convolution, keep each image unflattened: 1 channel, 28 x 28.
    (x_train, t_train), (x_test, t_test) = mnist.load(flatten=False)

    # The number of epochs
    max_epochs = 20

    # Convolutional Neural Network
    network = SimpleConvNet(
        # MNIST data: 1 channel, 28 x 28
        input_dim=(1, 28, 28),
        # 30 filters for the convolution layer,
        # Filter: 5 x 5 weight matrix
        #  no padding, stride 1
        conv_param={'filter_num': 30,
                    'filter_size': 5,
                    'pad': 0,
                    'stride': 1},
        # Number of neurons in the hidden layer
        hidden_size=100,
        # Number of neurons in the output layer
        output_size=10,
        # Standard deviation of the initial weights
        weight_init_std=0.01)

    # Trainer
    trainer = Trainer(network, x_train, t_train, x_test, t_test,
                      epochs=max_epochs, mini_batch_size=100,
                      optimizer='adam', optimizer_param={'lr': 0.001},
                      evaluate_sample_num_per_epoch=1000, verbose=False)

    # Train
    start = datetime.datetime.now()
    trainer.train()
    end = datetime.datetime.now()

    # Print total execution time
    elapsed = end - start
    print("Elapsed time: {0}".format(elapsed))

    # Save params
    network.save_params("params.pkl")
    print("Saved Network Parameters")

    # Draw graph
    markers = {'train': 'o', 'test': 's'}
    x = np.arange(max_epochs)
    plt.plot(x, trainer.train_acc_list,
             marker=markers["train"], label="train", markevery=2)
    plt.plot(x, trainer.test_acc_list,
             marker=markers["test"], label="test", markevery=2)
    plt.xlabel("epochs")
    plt.ylabel("accuracy")
    plt.ylim(-0.1, 1.1)
    plt.grid()
    plt.title("Accuracies")
    plt.legend(loc="lower right")
    plt.show()

if __name__ == "__main__":
    main()
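The Trainer looks up optimizer classes named SGD and Adam, constructs one with `**optimizer_param`, and calls `update(params, grads)` on it. Neither class is shown in this post, so the following is only a minimal sketch of an SGD class that would satisfy that interface, under the assumption that parameters and gradients are dictionaries of numpy arrays with matching keys.

```python
import numpy as np


class SGD:
    """Plain stochastic gradient descent (illustrative sketch)."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            # In-place update: a layer holding a reference to the same
            # array sees the new weights without being rebuilt
            params[key] -= self.lr * grads[key]


params = {'W': np.array([1.0, 2.0])}
alias = params['W']                  # e.g. the reference held by a layer
SGD(lr=0.1).update(params, {'W': np.array([10.0, -10.0])})
print(params['W'])                   # [0. 3.]
print(alias is params['W'])          # True
```

The in-place `-=` matters for this design: SimpleConvNet passes the arrays in `self.params` to its layers once in `__init__`, so an update that rebound `params[key]` to a new array would leave the layers using stale weights.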

Full source code

=== epoch:1, train acc:0.18, test acc:0.192 ===
=== epoch:2, train acc:0.959, test acc:0.958 ===
=== epoch:3, train acc:0.973, test acc:0.968 ===
=== epoch:4, train acc:0.983, test acc:0.981 ===
=== epoch:5, train acc:0.989, test acc:0.985 ===
=== epoch:6, train acc:0.989, test acc:0.982 ===
=== epoch:7, train acc:0.988, test acc:0.982 ===
=== epoch:8, train acc:0.991, test acc:0.985 ===
=== epoch:9, train acc:0.989, test acc:0.982 ===
=== epoch:10, train acc:0.994, test acc:0.987 ===
=== epoch:11, train acc:0.994, test acc:0.991 ===
=== epoch:12, train acc:0.994, test acc:0.983 ===
=== epoch:13, train acc:0.993, test acc:0.985 ===
=== epoch:14, train acc:0.997, test acc:0.986 ===
=== epoch:15, train acc:0.995, test acc:0.989 ===
=== epoch:16, train acc:0.998, test acc:0.986 ===
=== epoch:17, train acc:0.997, test acc:0.986 ===
=== epoch:18, train acc:0.998, test acc:0.989 ===
=== epoch:19, train acc:0.998, test acc:0.985 ===
=== epoch:20, train acc:0.998, test acc:0.989 ===
Elapsed time: 0:58:24.092331
Saved Network Parameters
Image 3. MNIST Accuracy from CNN

Visualize Filters of Inner Layers

  • One benefit of a neural network is that the weights of every layer can be inspected. As a result, for a CNN, it is possible to see what each filter responds to.
  • Because weights are usually initialized by a random function, they show no pattern at first. After training, however, the weights carry some information. Moreover, the deeper the layer, the more human-recognizable the detected information becomes. This is one reason why a CNN benefits from having more convolution layers.
Image (source: vision03.csail.mit.edu)
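To inspect first-layer filters such as `params['W1']` of the network above, they can be arranged into a single grid image. The sketch below is one possible way to do it; `tile_filters` is a hypothetical helper, and random weights stand in for trained ones.

```python
import numpy as np


def tile_filters(W, ncols=6, margin=1):
    """Arrange filters of shape (FN, C, FH, FW) into one 2D grid image.

    Only the first channel of each filter is shown; gaps are NaN.
    """
    FN, C, FH, FW = W.shape
    nrows = int(np.ceil(FN / ncols))
    grid = np.full((nrows * (FH + margin) - margin,
                    ncols * (FW + margin) - margin), np.nan)
    for k in range(FN):
        r, c = divmod(k, ncols)
        top, left = r * (FH + margin), c * (FW + margin)
        grid[top:top + FH, left:left + FW] = W[k, 0]
    return grid


W1 = 0.01 * np.random.randn(30, 1, 5, 5)   # same shape as params['W1']
grid = tile_filters(W1)
print(grid.shape)  # (29, 35)
# To display: plt.imshow(grid, cmap='gray') with matplotlib.pyplot as plt
```

Calling this once on the initial weights and once after training gives a direct before and after comparison of the filters.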
