21. Back Propagation in Deep Neural Network

Affine

  • A simple single-variable affine transformation is

$$ Y = W \cdot X + b $$

  • An affine transformation can also be represented as a computational graph.
  • Below is the affine layer implemented as a Python class. Note that x holds a batch of row vectors, so the forward pass computes np.dot(x, self.W) + self.b.
import numpy as np

class Affine():
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        # Cache the input for the backward pass.
        self.x = x
        return np.dot(x, self.W) + self.b

    def backward(self, d):
        # Gradients with respect to the weights, the bias, and the input.
        self.dW = np.dot(self.x.T, d)
        self.db = np.sum(d, axis=0)
        dx = np.dot(d, self.W.T)
        return dx
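
The backward method implements \( \frac{\partial L}{\partial W} = X^T d \), \( \frac{\partial L}{\partial b} = \sum d \), and \( \frac{\partial L}{\partial X} = d \cdot W^T \). As a quick sanity check, here is a minimal usage sketch; the shapes and random data are chosen only for illustration.

import numpy as np

# Hypothetical sizes: a batch of 4 samples, 3 inputs, 2 outputs.
W = np.random.randn(3, 2)
b = np.zeros(2)
layer = Affine(W, b)

x = np.random.randn(4, 3)
y = layer.forward(x)            # shape (4, 2)

d = np.ones_like(y)             # pretend upstream gradient
dx = layer.backward(d)          # shape (4, 3), same as x
print(y.shape, dx.shape, layer.dW.shape, layer.db.shape)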

Sigmoid

  • Sigmoid is a popular activation function.
  • Equation

$$ Y = \frac{1}{1 + \exp(-X)} $$
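
  • Its derivative can be written in terms of the output itself, which is exactly what the backward method below uses:

$$ \frac{\partial{Y}}{\partial{X}} = Y (1 - Y) $$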

  • Graph
  • Python class
import numpy as np

class Sigmoid():
    def __init__(self):
        self.value = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        # Cache the output; the derivative only needs it.
        self.value = out
        return out

    def backward(self, d):
        dx = d * (1 - self.value) * self.value
        return dx
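
A quick numeric check of the layer (the input values are arbitrary): at x = 0 the output is 0.5 and the local derivative is 0.5 * (1 - 0.5) = 0.25.

import numpy as np

sig = Sigmoid()
x = np.array([-2.0, 0.0, 2.0])
y = sig.forward(x)            # [0.119..., 0.5, 0.880...]
dx = sig.backward(np.ones_like(x))
print(y)
print(dx)                     # [0.104..., 0.25, 0.104...]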

Softmax-with-loss

  • Softmax-with-loss is a combination of the softmax function and the cost function (cross-entropy error).
  • During training, the cost must be calculated in order to update the weights and biases, so softmax-with-loss is the appropriate output layer for training. For inference, softmax is unnecessary because the label with the highest score is chosen anyway, and softmax does not change that ordering.
  • For example, assume a neural network that classifies inputs into 3 labels.
  • The full graph is complicated, so the forward and backward graphs are drawn separately, and softmax and cross-entropy error are split into their own sub-graphs. The resulting gradient is also worked out after the code below.
  • Forward graph: Input -> Softmax -> Cross Entropy Error
  • Forward graph: Softmax -> Cross Entropy Error -> Output
    • L1, L2, L3 are labels
  • Backward graph: Output -> Cross Entropy Error -> Softmax
    • The output of softmax-with-loss, Y, is the cost itself. Therefore, the gradient at the cost node is \( \frac{\partial{COST}}{\partial{Y}} = \frac{\partial{Y}}{\partial{Y}} = 1 \).
  • Backward graph: Cross Entropy Error -> Softmax -> Input
    • If a node fans its output out to several nodes in the forward pass, it receives multiple incoming gradients in the backward pass. In that case, the incoming gradients are added together. See the RECIP node.
  • Python code
import numpy as np

class SoftmaxWithLoss():
    def __init__(self):
        self.loss = None
        self.Y = None
        self.labels = None

    def forward(self, X, labels):
        self.labels = labels
        self.Y = self.softmax(X)
        self.loss = self.cross_entropy_error(self.Y, self.labels)

        return self.loss

    def backward(self, d=1):
        # The gradient of cross-entropy error over softmax simplifies to (Y - labels).
        batch_size = self.labels.shape[0]
        dx = (self.Y - self.labels) / batch_size

        return dx

    def softmax(self, X):
        ret = None
        if X.ndim == 2:
            # Batch input: normalize each sample separately.
            X = X.T
            X = X - np.max(X, axis=0)   # To avoid overflow
            Y = np.exp(X) / np.sum(np.exp(X), axis=0)
            ret = Y.T
        else:
            # To avoid overflow
            X = X - np.max(X)
            ret = np.exp(X) / np.sum(np.exp(X))

        return ret

    def cross_entropy_error(self, Y, labels):
        # Translate one-hot encoded labels to answer index.
        labels = labels.argmax(axis=1)

        batch_size = Y.shape[0]
        log_val = np.log(Y[np.arange(batch_size), labels])
        return -np.sum(log_val) / batch_size
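
The backward method relies on a standard result: when cross-entropy error sits directly on top of softmax, the combined gradient with respect to the softmax input collapses to a simple difference. With one-hot labels \( t_k \):

$$ Y_k = \frac{\exp(X_k)}{\sum_j \exp(X_j)}, \qquad L = -\sum_k t_k \log Y_k, \qquad \frac{\partial{L}}{\partial{X_k}} = Y_k - t_k $$

Dividing by the batch size in backward averages this gradient over the batch.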

Rectified Linear Unit (ReLU)

  • ReLU is the most widely used activation function. - Wiki
  • The derivative of ReLU is either 1 or 0, so its backward pass is computationally cheap.
  • ReLU will be explained in more detail later; here, only its back propagation is described.
  • Equation

$$ Y = \begin{cases} X & : X > 0 \\ 0 & : X \le 0 \end{cases} $$

$$ \frac{\partial{Y}}{\partial{X}} = \begin{cases} 1 & : X > 0 \\ 0 & : X \le 0 \end{cases} $$

  • Graph if X is larger than 0.
  • Graph if X is less than or equal to 0.
  • Python code
import numpy as np

class RELU():
    def __init__(self):
        self.mask = None

    def forward(self, X):
        # Remember which elements were clipped to 0.
        self.mask = (X <= 0)
        out = X.copy()
        out[self.mask] = 0

        return out

    def backward(self, d):
        # The gradient passes through unchanged where X > 0 and is 0 elsewhere.
        d[self.mask] = 0
        dx = d

        return dx
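
A minimal usage sketch (the input values are arbitrary) showing the mask at work:

import numpy as np

relu = RELU()
X = np.array([[1.0, -0.5],
              [-2.0, 3.0]])
out = relu.forward(X)                 # [[1., 0.], [0., 3.]]
dx = relu.backward(np.ones_like(X))   # [[1., 0.], [0., 1.]]
print(out)
print(dx)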
