15. Learning Rate & Data Processing & Overfitting

An explanation of learning rate, data preprocessing, and overfitting in machine learning

Learning Rate

  • The learning rate is a value that controls how much the weights and biases change at each update.

$$ W_{n} = W_{n-1} - \alpha \cdot \sum \frac{\partial Cost(W)}{\partial W} $$ $$ b_{n} = b_{n-1} - \alpha \cdot \sum \frac{\partial Cost(b)}{\partial b} $$

  • If the learning rate is large, the weights and biases move a long way along the cost function at each step.
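  • As a minimal sketch, one update step of the rule above can be written in plain Python as below, assuming a one-weight linear hypothesis h(x) = w * x and a mean-squared-error cost (the function name and data are illustrative).

# One gradient-descent update for a single weight w,
# assuming hypothesis h(x) = w * x and an MSE cost
def update_weight(w, xs, ys, learning_rate):
    m = len(xs)
    # dCost/dw = 2/m * sum((w * x - y) * x); the constant 2 is often
    # absorbed into the learning rate
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / m
    return w - learning_rate * grad

w = 0.18
for step in range(100):
    w = update_weight(w, [0.0, 0.5, 1.0], [0.0, 0.5, 1.0], learning_rate=0.5)
# w approaches 1.0, the slope of the line y = x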

Overshooting - Too Large Learning Rate

  • If the learning rate is too large, the weight can overshoot the minimum, as in the example below.
import numpy as np
import matplotlib.pyplot as plt

def calc_cost(w):
    # hypothesis = W * x
    hypo = [w * _x for _x in x]

    _mse = list(map(lambda _hypo, _answer : (_hypo - _answer) ** 2, hypo, y))
    sumMse = sum(_mse)

    # MSE: 1 / m * sum((w * x - y)^2), where m is the number of samples
    return sumMse / len(x)

# Generate input and answer
# y = x
x = [i * 0.01 for i in range(-100, 100)]
y = [i for i in x]

# Weight values used to draw the cost curve
# A wide range is used to show the effect of a large learning rate
W = [i * 0.001 for i in range(-5000, 5001)]

# Costs
costs = []
# Draw cost function
for w in W:
    costs.append(calc_cost(w))
plt.plot(W, costs, "r")

# Start from a particular point with a deliberately large learning rate
w = 0.18
learning_rate = 10

# Cost and Gradient descent
for i in range(3):
    # Calculate cost
    _cost = calc_cost(w)

    # Draw the current weight and its cost
    plt.plot(w, _cost, "o",
             label="Trial: {0} W: {1:3.2f}, Cost(W): {2:3.2f}".format(i, w, _cost))
    
    # Gradient of the cost with respect to w for each sample: (w * x - y) * x
    # (the constant factor 2 from the derivative is absorbed into the learning rate)
    gradients = list(map(
        lambda _input, _answer : ((w * _input) - _answer) * _input, x, y))

    sumGrad = sum(gradients)

    # Gradient-descent update; divide by the number of samples
    w = w - learning_rate / len(x) * sumGrad

plt.title("Effect of Large Learning Rate")
plt.xlabel("W")
plt.ylabel("Cost(W)")
plt.xlim(-5, 5)
plt.grid()
plt.legend(numpoints=1,loc='upper right')
plt.show()
Image 1. Effect of large learning rate
  • A large learning rate drives the weight in the wrong direction and makes the cost diverge.
  • In this example, the weight climbed up the cost curve, and the cost grew at every step.
  • If the learning rate is too small, training barely makes progress, as in the example below.
import numpy as np
import matplotlib.pyplot as plt

def calc_cost(w):
    # hypothesis = W * x
    hypo = [w * _x for _x in x]

    _mse = list(map(lambda _hypo, _answer : (_hypo - _answer) ** 2, hypo, y))
    sumMse = sum(_mse)

    # MSE: 1 / m * sum((w * x - y)^2), where m is the number of samples
    return sumMse / len(x)

# Generate input and answer
# y = x
x = [i * 0.01 for i in range(-100, 100)]
y = [i for i in x]

# Weight values used to draw the cost curve
# This time the range is narrow to show the effect of a small learning rate
W = [i * 0.001 for i in range(0, 2001)]

# Costs
costs = []
# Draw cost function
for w in W:
    costs.append(calc_cost(w))
plt.plot(W, costs, "r")

# Start from a particular point with a deliberately small learning rate
w = 0.18
learning_rate = 0.0000001

# Cost and Gradient descent
for i in range(10000):
    # Calculate cost
    _cost = calc_cost(w)

    # Draw the current weight and its cost
    plt.plot(w, _cost, "o")
    
    # Gradient of the cost with respect to w for each sample: (w * x - y) * x
    # (the constant factor 2 from the derivative is absorbed into the learning rate)
    gradients = list(map(
        lambda _input, _answer : ((w * _input) - _answer) * _input, x, y))

    sumGrad = sum(gradients)

    # Gradient-descent update; divide by the number of samples
    w = w - learning_rate / len(x) * sumGrad

plt.title("Effect of Small Learning Rate")
plt.xlabel("W")
plt.ylabel("Cost(W)")
plt.xlim(0.179, 0.181)
plt.grid()
plt.show()
Image 2. Effect of small learning rate
  • The weight moves too slowly.
  • In this example, the weight is updated 10,000 times, but it is still near the starting point.
  • Therefore, it is very important to set the learning rate carefully.
  • For now, the learning rate is chosen by intuition; how to choose a better value will be introduced later.

Data Preprocessing

  • Input data can come from anywhere, so its range varies: one feature may lie between -1 and 1 while another lies between -100 and 100.
  • If each training instance has two features, A and B, where A ranges from -1 to 1 and B ranges from -1000 to 1000, it is difficult to optimize the weights. Because the ranges differ so much, the cost changes at very different rates along A and along B.
from matplotlib import patches
import matplotlib.pyplot as plt

# Set size of plot
plt.figure(figsize=(7, 7))

ax = plt.gca()

xcenter, ycenter = 0, 0
width, height = 2, 25

# Draw nested ellipses to mimic the contours of the cost function
# for data whose two features have very different ranges
for i in range(20):
    width = width - width * 0.1
    height = height - height * 0.1

    e1 = patches.Ellipse((xcenter, ycenter), width, height,
                         fill=False, zorder=2)
    ax.add_patch(e1)

plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xlabel("A")
plt.ylabel("B")
plt.title("Ellipse for Different Ranges' Data")
plt.show()
Image 3. Ellipse for different ranges' data
  • In this example, changing A moves the point toward the center much more effectively than changing B, which is unfair to B. Gradient descent follows the steepest path toward a minimum, so it is dominated by A, and the local minimum it finds may fit A well while not really being a global minimum.
  • To overcome this problem, preprocessing is useful. - CS231n
    • zero-centered data
    • normalized data
Image 4. Data preprocessing
  • Standardization is one of the most popular normalization methods.

$$ X^{'} = \frac{X - \mu}{\sigma} $$

  • \( \mu \) is the mean of X, and \( \sigma \) is the standard deviation of X.
  • This can be written in Python as below, assuming X is a NumPy array and X_std is an array of the same shape.
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
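  • As a fuller sketch, the same standardization can be applied to every column at once; the data below is purely illustrative.

import numpy as np

# Hypothetical data: column 0 ranges roughly -1 ~ 1, column 1 roughly -1000 ~ 1000
X = np.array([[ 0.5,  800.0],
              [-0.3, -200.0],
              [ 0.9,  150.0],
              [-0.7, -950.0]])

# Zero-center each column, then scale by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # close to 0 for every column
print(X_std.std(axis=0))   # close to 1 for every column

  • After standardization, both features contribute on a comparable scale, so gradient descent is no longer dominated by the feature with the larger range.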

Overfitting

  • A model describes random error or noise instead of the underlying relationship. - Wiki
  • The main task of machine learning is to find a general model that fits a set of training data. However, some of the training data may be noise, which disturbs training and can distort the model toward those noisy points. In that case, we say the model is overfitting.
Image 5. Overfitting
  • The black line in the image is the general decision boundary. However, when noisy points strongly affect the training, the model can end up like the green line.

  • To overcome overfitting,

    • Train with more training data
    • Reduce the number of features
  • There are also techniques designed to reduce overfitting, such as the regularization described below.

Regularization

  • Also called weight decay.
  • The idea is to keep the weights from growing too large.
  • To do that, add the squared weights to the cost function, scaled by a regularization strength \(\lambda\).

$$ cost(W) = \frac{1}{N} \sum_{i} Diff(H(X_i), Y_i) + \lambda \cdot \sum W^2 $$ $$ W = W - \alpha {\partial \over\partial W} cost(W, b) $$

  • Overfitting usually appears when the weights are too large.
  • Adding the squared weights to the cost function makes the weights shrink faster, which helps to avoid overfitting (see the sketch after this list).
  • The regularization strength plays a role similar to the learning rate: if it is large, the penalty on the weights is high, so the weights shrink quickly.
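  • Below is a minimal sketch of this regularized cost and update in NumPy, assuming a linear hypothesis H(X) = X·W and a mean-squared-error Diff; the function names and the tiny data set are illustrative, not from the original post.

import numpy as np

def cost_with_l2(W, X, Y, lam):
    # MSE part: 1/N * sum((H(X) - Y)^2)
    diff = X @ W - Y
    mse = np.mean(diff ** 2)
    # Regularization part: lambda * sum(W^2)
    return mse + lam * np.sum(W ** 2)

def gradient_step(W, X, Y, lam, alpha):
    N = len(X)
    grad_mse = 2.0 / N * X.T @ (X @ W - Y)  # gradient of the MSE term
    grad_l2 = 2.0 * lam * W                 # gradient of lambda * sum(W^2)
    # A larger lambda pushes the weights toward zero more strongly
    return W - alpha * (grad_mse + grad_l2)

# Tiny illustrative data set: Y = x1 + x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])
Y = np.array([3.0, 3.0, 7.0])
W = np.array([5.0, -5.0])   # deliberately large starting weights

for _ in range(200):
    W = gradient_step(W, X, Y, lam=0.1, alpha=0.01)
print(W, cost_with_l2(W, X, Y, lam=0.1))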
