25. DNN Optimization

How to optimize DNN


Initial Weight Value

  • It is very important to select good initial values for the weights and biases.
  • Until now, in several example codes, the initial values of weights and biases have been chosen randomly; therefore, the results vary at every training run.
  • How to set the initial values of weights and biases is still an active research topic, but there is one strict anti-pattern for initialization.

0 for Weight

  • 0 should not be chosen for weights and biases.
  • If the weights are initialized to 0, the result of forward propagation is 0, and during back propagation those zeros prevent the weights and biases from updating, as the sketch below illustrates.
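A minimal sketch of the problem (the shapes and the tiny two-layer network here are my own illustration, not from the post): with all-zero weights, both the forward output and the gradients are zero, so no parameter ever updates.

import numpy as np

# Minimal sketch: a 2-layer ReLU network with all-zero weights.
X = np.random.randn(4, 3)                 # 4 samples, 3 features
W1 = np.zeros((3, 5))                     # hidden layer weights, all zero
W2 = np.zeros((5, 2))                     # output layer weights, all zero

h = np.maximum(0, X @ W1)                 # hidden activations -> all zero
out = h @ W2                              # forward result -> all zero

dout = np.ones_like(out)                  # pretend upstream gradient
dW2 = h.T @ dout                          # zero, because h is zero
dW1 = X.T @ ((dout @ W2.T) * (h > 0))     # zero, because W2 is zero

print(dW1.max(), dW2.max())               # both 0.0 -> nothing is updated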

RBM Initialization

  • Many approaches to selecting good initial values have been proposed so far.
  • One popular approach is pre-training with a Restricted Boltzmann Machine in a Deep Belief Network. However, it requires high computational power just to select the initial values. - Wiki

Xavier/He initialization

  • Another popular approach is Xavier initialization, which scales the initial weights according to the numbers of inputs (fan_in) and outputs (fan_out).
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
  • It provides fast training speed and high accuracy, but there was still room for improvement.
  • The version of Xavier initialization optimized for ReLU is He initialization.
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
  • The MNIST performance of a fixed standard deviation (std), Xavier, and He initialization is compared in the graph below.
Image 1. Performance comparison among Xavier, std and He
  • For std and Xavier, sigmoid is used as the activation function, while He uses ReLU.
  • He initialization shows a smaller cost during training and learns faster than Xavier (a code sketch of the three schemes follows this list).
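As a sketch based on the two formulas above, a small helper (the function name init_weights and the 0.01 standard deviation for the "std" case are my assumptions, not from the post) can switch between the three schemes compared in the graph.

import numpy as np

def init_weights(fan_in, fan_out, method="he"):
    """Sketch of the three initializations compared above."""
    if method == "std":                                    # fixed standard deviation (e.g. 0.01)
        return np.random.randn(fan_in, fan_out) * 0.01
    elif method == "xavier":                               # scale by sqrt(fan_in)
        return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
    elif method == "he":                                   # scale by sqrt(fan_in / 2), for ReLU
        return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
    raise ValueError(method)

W1 = init_weights(784, 100, "he")                          # e.g. first layer for MNIST inputs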

Drop Out

  • Drop out is one of the most powerful techniques to avoid overfitting.
  • It randomly sets some neurons to zero in the forward pass.

  • In the Drop out graph, grey nodes are dropped out, and grey lines are effectively disconnected. Therefore, only the black nodes and lines are active.
  • The Drop out technique should be applied only during training, not inference. For inference, the DNN should use all nodes and links.
import numpy as np

class Dropout:
    def __init__(self, ratio=0.5):
        self._ratio = ratio      # probability of dropping a neuron
        self._mask = None        # boolean mask of the neurons kept in the last forward pass

    def forward(self, X, isTrain=True):
        if isTrain:
            # Keep each neuron with probability (1 - ratio); dropped neurons become 0.
            self._mask = np.random.rand(*X.shape) > self._ratio
            return X * self._mask
        else:
            # Inference: use all nodes and links.
            return X

    def backward(self, d):
        # Gradients flow only through the neurons kept during the forward pass.
        return d * self._mask
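A short usage sketch of the class above (the input shape and ratio are illustrative only):

drop = Dropout(ratio=0.5)
X = np.random.randn(2, 4)

train_out = drop.forward(X, isTrain=True)    # roughly half of the activations are zeroed
test_out  = drop.forward(X, isTrain=False)   # inference: all activations pass through
grad_in   = drop.backward(np.ones_like(X))   # gradient flows only through kept units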
Image 2. MNIST training without Drop out (300 epochs)
Image 3. MNIST training with Drop out (300 epochs)
Image 4. MNIST training with Drop out (600 epochs)
  • As a result, the gap between train and test accuracy is reduced, but more training time is required to reach high accuracy.

Ensemble

  • Ensemble means running inference with a combination of multiple, independently trained DNN machines.
  • Its concept is similar to Drop out: training multiple machines resembles randomly dropping out nodes, and inference with a combination of multiple machines resembles using all nodes in the Drop out technique (see the sketch below).
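A minimal sketch of the idea (the function name ensemble_predict and the assumption that each model exposes a predict method returning class probabilities are mine, not from the post): ensemble inference can simply average the outputs of the independently trained networks and pick the strongest class.

import numpy as np

def ensemble_predict(models, X):
    """Average the class probabilities of independently trained models."""
    probs = [m.predict(X) for m in models]   # each returns (N, num_classes) probabilities
    avg = np.mean(probs, axis=0)             # combine the machines
    return np.argmax(avg, axis=1)            # final class per sample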
