Basic information of CNN
Convolutional Neural Network (CNN)
- A class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. - Wiki
- CNNs can also be applied to speech recognition and other fields.
Architecture
- Compared to a traditional DNN, a CNN additionally has a feature extraction section consisting of convolution layers and pooling layers.
- The feature extraction section consists of convolution layers, activation functions, and pooling layers. (The pooling layer is optional.)
- The classification section is organized with affine layers and activation functions, like a traditional DNN. This part is also called a fully connected network.
- In the feature extraction section, object information is extracted from the input image, and this information is classified in the classification section. At the end of the classification section, the machine finally guesses what the input is.
Convolution Layer
- As in the earlier MNIST training with a traditional DNN, geometric information is lost when the image is flattened into a 1 x 784 array. A CNN, however, can be trained while preserving this geometric information. In addition, channel information is handled as the 3rd dimension.
- An input/output of a convolution layer is called a feature map.
- As data moves through the CNN, features become clearer. At the beginning of the CNN, the detected features are small and local, but the features at the end of the feature extraction section are human-distinguishable.
Convolution
- The process of adding each element of the image to its local neighbors, weighted by the kernel (= filter). - Wiki
$$ \begin{bmatrix} 3 & 3 & 2 & 1 & 0 \\ 0 & 0 & 1 & 3 & 1 \\ 3 & 1 & 2 & 2 & 3 \\ 2 & 0 & 0 & 2 & 2 \\ 2 & 0 & 0 & 0 & 1 \end{bmatrix} \circledast \begin{bmatrix} 0 & 1 & 2 \\ 2 & 2 & 0 \\ 0 & 1 & 2\end{bmatrix} $$ $$ = \begin{bmatrix} 12 & 12 & 17 \\ 10 & 17 & 19 \\ 9 & 6 & 14 \end{bmatrix}$$
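The example above can be reproduced with a short NumPy sketch. Note that, as in most deep learning frameworks, the operation implemented here is technically cross-correlation (the kernel is not flipped); the function name `convolve2d` below is just an illustrative choice, not a library API.

```python
import numpy as np

def convolve2d(x, w):
    """Valid-mode convolution as used in CNN layers (no kernel flip)."""
    ih, iw = x.shape
    fh, fw = w.shape
    oh, ow = ih - fh + 1, iw - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the local neighborhood elementwise by the kernel and sum
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * w)
    return out

x = np.array([[3, 3, 2, 1, 0],
              [0, 0, 1, 3, 1],
              [3, 1, 2, 2, 3],
              [2, 0, 0, 2, 2],
              [2, 0, 0, 0, 1]])
w = np.array([[0, 1, 2],
              [2, 2, 0],
              [0, 1, 2]])
print(convolve2d(x, w))  # matches the 3x3 result above
```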
Weights and Biases
- A CNN also has weights and biases, like a traditional DNN. Only the multiplication of X and W is replaced by the convolution of X and W.
- In a CNN, the kernel (also called the filter) plays the role of the weights.
$$ X \circledast W + b $$ $$ = \begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & 1 & 2 & 3 \\ 3 & 0 & 1 & 2 \\ 2 & 3 & 0 & 1 \end{bmatrix} \circledast \begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 0 & 2 \end{bmatrix} + 3 $$ $$ = \begin{bmatrix} 15 & 16 \\ 6 & 15 \end{bmatrix} + 3 $$ $$ = \begin{bmatrix} 18 & 19 \\ 9 & 18 \end{bmatrix} $$
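The X ⊛ W + b computation above can be verified the same way. This is a minimal sketch (assuming NumPy); the bias is a single scalar added to every element of the convolved output, mirroring the worked example.

```python
import numpy as np

def convolve2d_bias(x, w, b):
    """Valid-mode convolution of X and W, plus a scalar bias b."""
    ih, iw = x.shape
    fh, fw = w.shape
    oh, ow = ih - fh + 1, iw - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * w)
    return out + b  # bias is broadcast over the whole feature map

X = np.array([[1, 2, 3, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1]])
W = np.array([[2, 0, 1],
              [0, 1, 2],
              [1, 0, 2]])
print(convolve2d_bias(X, W, 3))  # [[18 19] [ 9 18]]
```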
Padding
- Padding wraps the input feature map with a specific value to keep it from shrinking.
- In the previous example, the convolution of a 4 x 4 input feature map and a 3 x 3 filter returns a 2 x 2 output feature map. If the input feature map passed through multiple convolution layers without padding, its information would eventually shrink down to a single scalar value.
- There are many kinds of padding, but zero padding is usually used.
- With padding of (F - 1)/2 and a stride of 1, the output feature map has the same size as the input feature map.
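A quick sketch of the size-preserving effect of zero padding, assuming NumPy: padding a 4 x 4 input by P = 1 before a 3 x 3 filter gives a 4 x 4 output again.

```python
import numpy as np

def convolve2d(x, w):
    """Valid-mode convolution (no kernel flip)."""
    ih, iw = x.shape
    fh, fw = w.shape
    oh, ow = ih - fh + 1, iw - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * w)
    return out

x = np.arange(16.0).reshape(4, 4)  # 4x4 input feature map
w = np.ones((3, 3))                # 3x3 filter -> P = (3 - 1) // 2 = 1
xp = np.pad(x, 1)                  # zero padding (np.pad defaults to zeros)
out = convolve2d(xp, w)
print(out.shape)                   # (4, 4): same size as the input
```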
Stride
- The stride is the interval at which the filter moves.
- If the stride is 1 x 1, the filter moves by 1 along both the x and y axes. The stride in Image 3 is 1 x 1.
- If the stride is 2 x 2, the filter moves by 2 along both axes, so the output feature map shrinks to roughly half the input size in each dimension.
- The size of the output feature map changes according to the stride.
$$ OW = \frac{IW + 2P - FW}{SW} + 1 $$ $$ OH = \frac{IH + 2P - FH}{SH} + 1 $$
* IW: Input Width, IH: Input Height
* OW: Output Width, OH: Output Height
* FW: Filter Width, FH: Filter Height
* SW: Stride Width, SH: Stride Height
* P: Padding
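The output-size formula can be sketched as a small helper function (the name `out_size` is just an illustrative choice). It checks the two configurations discussed so far: the 4 x 4 example without padding, and a padded case with stride 2.

```python
def out_size(i, f, p, s):
    """Output size: O = (I + 2P - F) / S + 1."""
    assert (i + 2 * p - f) % s == 0, "filter must fit the input evenly"
    return (i + 2 * p - f) // s + 1

# 4x4 input, 3x3 filter, no padding, stride 1 -> 2x2 output
print(out_size(4, 3, 0, 1))  # 2
# 5x5 input, 3x3 filter, padding 1, stride 2 -> 3x3 output
print(out_size(5, 3, 1, 2))  # 3
```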
3 Dimensional Convolution
- A CNN considers not only width and height but also channels, which usually carry color information.
- Each channel requires its own filter. In the example below, the R, G, and B feature maps have their own filters RF, GF, and BF.
- It is convenient to think of all channels' feature maps together as a single block for further extension.
- In a CNN, multiple filters can be applied to the input feature map. The number of filters becomes the number of channels of the output feature map.
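The channel bookkeeping above can be sketched in NumPy (the function name `conv3d` and the shape convention channels-first are illustrative assumptions, not a library API). Each filter spans all input channels and is summed into one output channel, so N filters produce N output channels.

```python
import numpy as np

def conv3d(x, filters):
    """x: (C, H, W) input block; filters: (N, C, FH, FW).
    Returns an (N, OH, OW) block: one output channel per filter."""
    n, c, fh, fw = filters.shape
    cx, h, w = x.shape
    assert c == cx, "each filter needs one slice per input channel"
    oh, ow = h - fh + 1, w - fw + 1
    out = np.zeros((n, oh, ow))
    for k in range(n):
        for i in range(oh):
            for j in range(ow):
                # Multiply across ALL channels, then sum to a single scalar
                out[k, i, j] = np.sum(x[:, i:i + fh, j:j + fw] * filters[k])
    return out

x = np.random.rand(3, 8, 8)            # RGB input: 3 channels
filters = np.random.rand(5, 3, 3, 3)   # 5 filters, each with 3 channel slices
print(conv3d(x, filters).shape)        # (5, 6, 6): 5 output channels
```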
Pooling Layer
- Pooling is an operation that shrinks the input feature map, so it is also called subsampling.
- Max pooling selects the maximum value in the target area, and average pooling takes the average. Each target area is reduced to a single scalar, so the output feature map is smaller than the input feature map.
- Usually, max pooling is used.
- 3 properties of the pooling layer
  - No weights to train
  - The number of channels is not changed
  - Robust to small variations in the input feature map
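Max pooling can be sketched in a few lines of NumPy. This version assumes the pool window and stride are equal (the common 2 x 2 case) and that the input divides evenly; there are no weights anywhere in the operation.

```python
import numpy as np

def max_pool(x, size=2):
    """2x2 max pooling with stride equal to the pool size."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # drop any ragged edge
    # Group into (row blocks, rows in block, col blocks, cols in block),
    # then take the max over each block
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.array([[1, 2, 5, 0],
              [3, 4, 1, 2],
              [0, 1, 2, 3],
              [7, 0, 1, 4]])
print(max_pool(x))  # [[4 5] [7 4]]
```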
Fully Connected Layer
- A fully connected (FC) layer is the traditional affine layer of a DNN; the final FC layer is typically followed by softmax regression.
- After features are extracted by the feature extraction section, the FC layers infer what the object is, using softmax classification at the end.
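The softmax step at the end of the classification section can be sketched as follows (the 3-class scores here are hypothetical logits, not from any real network). Subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Turn raw FC-layer scores (logits) into class probabilities."""
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical logits for 3 classes
probs = softmax(scores)
print(probs.argmax())  # index of the predicted class
```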