20. Back Propagation

Back propagation for deep neural networks


Problems of a Multi-Layer Neural Network

  • A multi-layer neural network is a powerful tool for machine learning, but training it, which means updating the weights and biases of each layer, is difficult.
  • To update the weights and biases, it is necessary to know how much each weight and bias affects the cost. This is found with partial derivatives, which drive the gradient descent algorithm (sketched below).
  • However, computing these derivatives is expensive. If a multi-layer neural network has too many weights and biases, training it can become impossible within a reasonable time.
  • In the previous post, the MNIST data set was trained with a single-layer neural network, and that example took almost 12 hours.
  • A multi-layer neural network has even more weights and biases for its additional layers, so it would take much longer.
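  • For reference, gradient descent then updates each weight and bias by subtracting its partial derivative of the cost, scaled by a learning rate \( \eta \):

$$ w \leftarrow w - \eta \cdot \frac{\partial{COST}}{\partial{w}} $$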

Back Propagation

  • A method to calculate the gradient of the loss function with respect to the weights in an artificial neural network. - Wiki
  • Back propagation computes these partial derivatives quickly by applying the chain rule. - Wiki
  • To understand back propagation, only basic knowledge of partial derivatives is needed; that background is not covered here, but the chain rule itself is written out below.
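  • For reference, the chain rule in the form used throughout this post: if \( Y \) depends on an intermediate value \( S \), which in turn depends on \( X \), then

$$ \frac{\partial{Y}}{\partial{X}} = \frac{\partial{Y}}{\partial{S}} \cdot \frac{\partial{S}}{\partial{X}} $$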

Back Propagation with Graph

  • Back propagation is based on the chain rule, so it is usually explained with complicated mathematical equations.
  • However, it can also be explained with a computation graph and simple partial derivatives, and for a neural network the same idea can be applied to the network graph directly.

Simple Example of Back Propagation

  • Let's use a simple equation for the logit, where the intermediate nodes are \( S1 = X1 \cdot X2 \) and \( S2 = C \):

$$ Y = S1 + S2 = X1 \cdot X2 + C $$

  • Back propagation calculates how sensitive the final node Y is to every node: Y, S1, S2, X1, X2, and C.
  • As the name implies, back propagation starts at the last node and moves backward through the graph.
  • The first target is Y. The derivative of Y with respect to itself is \( \frac{\partial{Y}}{\partial{Y}} = 1 \).
  • The second node is ADD. The inputs of ADD are S1 and S2, and its output is Y.

$$ \frac {\partial{Y}} {\partial{S1}} = \frac {\partial{S1 + S2}} {\partial{S1}} = 1 $$

$$ \frac {\partial{Y}} {\partial{S2}} = \frac {\partial{S1 + S2}} {\partial{S2}} = 1 $$

  • The next node is MUL. The inputs of MUL are X1 and X2, and its output is S1.

$$ \frac{\partial{S1}}{\partial{X1}} = \frac{\partial{(X1 \cdot X2)}}{\partial{X1}} = X2 $$

$$ \frac{\partial{S1}}{\partial{X2}} = \frac{\partial{(X1 \cdot X2)}}{\partial{X2}} = X1 $$
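  • As a quick numeric check with the input values used in the example code below (X1 = 2, X2 = 5):

$$ \frac{\partial{S1}}{\partial{X1}} = X2 = 5, \qquad \frac{\partial{S1}}{\partial{X2}} = X1 = 2 $$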

  • Now the whole computation graph can be drawn, covering both directions.
  • The black lines are the forward propagation, and the red lines are the backward propagation, that is, the back propagation.
  • The value carried on each red line is how sensitive the output is to that input.
  • In training, Y then feeds into a cost function COST, so with the chain rule the partial derivatives of COST with respect to every node are:

$$ \frac{\partial{COST}}{\partial{S1}} = \frac{\partial{COST}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{S1}} = \frac{\partial{COST}}{\partial{Y}} $$

$$ \frac{\partial{COST}}{\partial{S2}} = \frac{\partial{COST}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{S2}} = \frac{\partial{COST}}{\partial{Y}} $$

$$ \frac{\partial{COST}}{\partial{X1}} = \frac{\partial{COST}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{S1}} \cdot \frac{\partial{S1}}{\partial{X1}} = \frac{\partial{COST}}{\partial{Y}} \cdot X2 $$

$$ \frac{\partial{COST}}{\partial{X2}} = \frac{\partial{COST}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{S1}} \cdot \frac{\partial{S1}}{\partial{X2}} = \frac{\partial{COST}}{\partial{Y}} \cdot X1 $$
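  • These hand-derived values can be sanity-checked numerically. The following is a minimal sketch (not part of the original example) that compares them against finite differences, assuming COST is simply Y, i.e. dCOST/dY is 1, the same assumption used in the code further below.

# Minimal sketch: check the hand-derived gradients with finite differences.
# Assumes COST = Y, i.e. dCOST/dY = 1, matching the example below.
def cost(x1, x2, c):
	return x1 * x2 + c  # Y = X1 * X2 + C

x1, x2, c = 2.0, 5.0, 3.0
eps = 1e-6

# Central differences: (f(x + eps) - f(x - eps)) / (2 * eps)
dx1 = (cost(x1 + eps, x2, c) - cost(x1 - eps, x2, c)) / (2 * eps)
dx2 = (cost(x1, x2 + eps, c) - cost(x1, x2 - eps, c)) / (2 * eps)
dc = (cost(x1, x2, c + eps) - cost(x1, x2, c - eps)) / (2 * eps)

print(dx1)  # ~5.0, which is X2
print(dx2)  # ~2.0, which is X1
print(dc)   # ~1.0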

  • From this work, it is possible to design Python ADD and MUL nodes, each with a forward and a backward path.
  • ADD

class ADD():
	def __init__(self):
		pass

	def forward(self, x1, x2):
		# Forward path: simply add the two inputs.
		return x1 + x2

	def backward(self, d):
		# Backward path: an ADD node passes the incoming gradient
		# through unchanged, since dY/dS1 = dY/dS2 = 1.
		dx1 = d * 1
		dx2 = d * 1
		return dx1, dx2
  • MUL

class MUL():
	def __init__(self):
		self.x1 = None
		self.x2 = None

	def forward(self, x1, x2):
		# Forward path: multiply, remembering the inputs for the backward path.
		self.x1 = x1
		self.x2 = x2
		return x1 * x2

	def backward(self, d):
		# Backward path: the gradient for each input is the incoming gradient
		# times the other input (dS1/dX1 = X2, dS1/dX2 = X1).
		dx1 = d * self.x2
		dx2 = d * self.x1
		return dx1, dx2
  • With these nodes, the code for the example is

# Inputs
x1 = 2
x2 = 5
C = 3

# Nodes
mul = MUL()
add = ADD()

# Forward
s1 = mul.forward(x1, x2)
s2 = C
y = add.forward(s1, s2)

# Backward
# Let's assume dCOST/dY is 1
ds1, ds2 = add.backward(1)
dc = ds2
dx1, dx2 = mul.backward(ds1)

print("Y: {0}".format(y))
print("dCOST/dX1: {0}".format(dx1))
print("dCOST/dX2: {0}".format(dx2))
print("dCOST/dC: {0}".format(dc))
  • The output is

Y: 13
dCOST/dX1: 5
dCOST/dX2: 2
dCOST/dC: 1
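  • These gradients are exactly what gradient descent needs to update the inputs. As a rough sketch continuing the code above (the learning rate 0.1 is an arbitrary value chosen only for illustration):

# Sketch of one gradient descent step using the back-propagated gradients.
# The learning rate 0.1 is an arbitrary illustrative value, not from the post.
learning_rate = 0.1

x1 = x1 - learning_rate * dx1   # 2 - 0.1 * 5 = 1.5
x2 = x2 - learning_rate * dx2   # 5 - 0.1 * 2 = 4.8

print("Updated X1: {0}".format(x1))
print("Updated X2: {0}".format(x2))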
