Back Propagation in Deep Neural Networks
Affine
- A simple single-factor affine transformation is
$$ Y = W \cdot X + b $$
- The affine transformation can also be represented as a computational graph.
- Here is the affine layer as a Python class.
import numpy as np

class Affine():
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        # Cache the input; it is needed for the backward pass.
        self.x = x
        return np.dot(x, self.W) + self.b

    def backward(self, d):
        # Gradients of the weights and the bias.
        self.dW = np.dot(self.x.T, d)
        self.db = np.sum(d, axis=0)
        # Gradient with respect to the input, passed to the previous layer.
        dx = np.dot(d, self.W.T)
        return dx
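- For reference, with the batched forward pass \( Y = X \cdot W + b \) used in the code, the backward pass computes \( dX = d \cdot W^T \), \( dW = X^T \cdot d \), and \( db = \sum_{batch} d \).
- Usage example (a minimal sketch; the shapes and values below are illustrative assumptions, not from the original):
# Minimal usage sketch for the Affine layer above.
np.random.seed(0)
x = np.random.randn(4, 3)                # batch of 4 inputs with 3 features
W = np.random.randn(3, 2)                # weights: 3 inputs -> 2 outputs
b = np.zeros(2)

affine = Affine(W, b)
y = affine.forward(x)                    # shape (4, 2)
d = np.ones_like(y)                      # pretend upstream gradient
dx = affine.backward(d)

print(y.shape, dx.shape)                 # (4, 2) (4, 3)
print(affine.dW.shape, affine.db.shape)  # (3, 2) (2,)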
Sigmoid
- Sigmoid is a popular activation function.
- Equation
$$ Y = \frac{1}{1 + \exp(-X)} $$
- Graph
- Python class
import numpy as np

class Sigmoid():
    def __init__(self):
        self.value = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        # Cache the output; the local gradient is expressed with the output.
        self.value = out
        return out

    def backward(self, d):
        # dY/dX = Y * (1 - Y)
        dx = d * (1 - self.value) * self.value
        return dx
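- Because \( \frac{\partial{Y}}{\partial{X}} = Y(1 - Y) \), the backward pass can be checked against a numerical derivative. A minimal sketch (the input values are illustrative assumptions):
# Sanity check: compare the analytic gradient with a numerical one.
sigmoid = Sigmoid()
x = np.array([[-1.0, 0.0, 2.0]])
y = sigmoid.forward(x)
dx = sigmoid.backward(np.ones_like(y))   # analytic: y * (1 - y)

eps = 1e-5
numerical = (sigmoid.forward(x + eps) - sigmoid.forward(x - eps)) / (2 * eps)
print(np.max(np.abs(dx - numerical)))    # should be close to 0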
Softmax-with-loss
- Softmax-with-loss combines softmax with the cross entropy error cost function.
- During training, the cost must be calculated to update the weights and biases, so softmax-with-loss is the appropriate layer for training. For inference, softmax is not necessary, because the label with the highest score is chosen either way.
- For example, assume a neural network that classifies inputs into 3 labels.
- The graph is complicated, so the forward and backward graphs are shown separately; softmax and cross entropy error are also drawn as separate sub-graphs.
- Forward graph: Input -> Softmax -> Cross Entropy Error
- Forward graph: Softmax -> Cross Entropy Error -> Output
- L1, L2, and L3 are the labels.
- Backward graph: Output -> Cross Entropy Error -> Softmax
- The output of softmax-with-loss, Y, is the cost. Therefore, the differential value of cost node is \( \frac{\partial{COST}}{\partial{Y}} = \frac{\partial{Y}}{\partial{Y}} = 1 \).
- Backward graph: Cross Entropy Error -> Softmax -> Input
- If a node fans its output out to multiple nodes in the forward pass, it receives multiple gradients in the backward pass. In that case, the incoming gradients are summed. See the RECIP node.
- Python code
import numpy as np

class SoftmaxWithLoss():
    def __init__(self):
        self.loss = None
        self.Y = None
        self.labels = None

    def forward(self, X, labels):
        self.labels = labels
        self.Y = self.softmax(X)
        self.loss = self.cross_entropy_error(self.Y, self.labels)
        return self.loss

    def backward(self, d=1):
        batch_size = self.labels.shape[0]
        # Combined gradient of softmax and cross entropy error.
        dx = (self.Y - self.labels) / batch_size
        return dx

    def softmax(self, X):
        ret = None
        if X.ndim == 2:
            X = X.T
            # Subtract the per-sample max to avoid overflow.
            X = X - np.max(X, axis=0)
            Y = np.exp(X) / np.sum(np.exp(X), axis=0)
            ret = Y.T
        else:
            # To avoid overflow
            X = X - np.max(X)
            ret = np.exp(X) / np.sum(np.exp(X))
        return ret

    def cross_entropy_error(self, Y, labels):
        # Translate one-hot encoded labels to answer index.
        labels = labels.argmax(axis=1)
        batch_size = Y.shape[0]
        log_val = np.log(Y[np.arange(batch_size), labels])
        return -np.sum(log_val) / batch_size
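- Note that backward() returns (Y - labels) / batch_size, the well-known combined gradient of softmax followed by cross entropy error, so the two sub-graphs do not need to be chained explicitly in code.
- Usage example (a minimal sketch; the scores and one-hot labels below are illustrative assumptions):
# Minimal usage sketch with made-up scores and one-hot labels.
layer = SoftmaxWithLoss()
X = np.array([[2.0, 1.0, 0.1],
              [0.2, 3.0, 0.3]])          # raw scores for 2 samples, 3 labels
labels = np.array([[1, 0, 0],
                   [0, 1, 0]])           # one-hot answers
loss = layer.forward(X, labels)
dX = layer.backward()                    # (Y - labels) / batch_size
print(loss, dX.shape)                    # scalar cost, (2, 3)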
Rectified Linear Unit (ReLU)
- ReLU is the most widely used activation function. - Wiki
- The derivative of ReLU is either 1 or 0, so it is cheap to compute.
- It will be explained in detail later; here, only its back propagation is covered.
- Equation
$$ Y = \begin{cases} X & : X > 0 \\ 0 & : X \le 0 \end{cases} $$
$$ \frac{\partial{Y}}{\partial{X}} = \begin{cases} 1 & : X > 0 \\ 0 & : X \le 0 \end{cases} $$
- Graph if X is larger than 0.
- Graph if X is less than or equal to 0.
- Python code
import numpy as np

class RELU():
    def __init__(self):
        self.mask = None

    def forward(self, X):
        # Remember which elements were zeroed out.
        self.mask = (X <= 0)
        out = X.copy()
        out[self.mask] = 0
        return out

    def backward(self, d):
        # Gradient is 0 where the input was <= 0, and passes through otherwise.
        d[self.mask] = 0
        dx = d
        return dx
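- Putting the layers together, below is a sketch of one training step through a tiny 2-layer network, assuming the Affine, RELU, and SoftmaxWithLoss classes above are defined in the same script. The layer sizes, data, and learning rate are illustrative assumptions.
# A tiny 2-layer network: Affine -> RELU -> Affine -> SoftmaxWithLoss.
np.random.seed(0)
X = np.random.randn(4, 5)                    # 4 samples, 5 features
labels = np.eye(3)[[0, 2, 1, 0]]             # one-hot labels for 3 classes

affine1 = Affine(0.01 * np.random.randn(5, 10), np.zeros(10))
relu = RELU()
affine2 = Affine(0.01 * np.random.randn(10, 3), np.zeros(3))
loss_layer = SoftmaxWithLoss()

# Forward pass: compute the cost.
scores = affine2.forward(relu.forward(affine1.forward(X)))
loss = loss_layer.forward(scores, labels)

# Backward pass: propagate gradients in reverse order.
d = loss_layer.backward()
d = affine2.backward(d)
d = relu.backward(d)
d = affine1.backward(d)

# Simple gradient descent update with the accumulated gradients.
lr = 0.1
affine1.W -= lr * affine1.dW
affine1.b -= lr * affine1.db
affine2.W -= lr * affine2.dW
affine2.b -= lr * affine2.db
print(loss)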