05. Gradient Descent for Linear Regression

Gradient Descent Algorithm

  • A first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point. - Wiki
Image 1. Gradient descent
  • The gradient descent algorithm repeatedly steps down the cost function in proportion to the slope at the current point until it reaches a local minimum.

  • The new W and b are obtained by subtracting, from the old values, the learning rate times the partial derivative of the cost function with respect to W and b at the current point.

$$ W = W - \alpha {\partial \over\partial W} cost(W, b) $$

$$ b = b - \alpha {\partial \over\partial b} cost(W, b) $$

  • \( \alpha \) is the learning rate. It controls how large a step is taken along the cost function; if it is large, W and b change a lot at each update.
  • The learning rate is a tunable value, usually less than 1. It is also common to shrink it gradually as training runs. A minimal sketch of these update rules follows this list.
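
Below is a minimal NumPy sketch of these two update rules, fitting a hypothesis H(x) = W * x + b to a few made-up points. The data, the initial values, and the learning rate are chosen only for illustration.

import numpy as np

# Made-up sample data, roughly y = 2x + 1 plus a little noise
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Initial parameters and learning rate (illustrative values)
W, b = 0.0, 0.0
alpha = 0.05
m = len(x)

for step in range(1000):
    # Hypothesis H(x) = W * x + b
    pred = W * x + b
    # Partial derivatives of cost(W, b) = 1/m * sum((W * x_i + b - y_i)^2)
    dW = (2.0 / m) * np.sum((pred - y) * x)
    db = (2.0 / m) * np.sum(pred - y)
    # W = W - alpha * d cost / d W,  b = b - alpha * d cost / d b
    W = W - alpha * dW
    b = b - alpha * db

print(W, b)  # approaches the least-squares fit, roughly 1.94 and 1.15 here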

Gradient Descent Algorithm for Linear Regression

  • Gradient descent update rule for linear regression (simplified hypothesis \( H(x) = W x \)):

$$ W_{new} = W - \alpha {\frac{1}{m} \sum_{i=1}^m ((W * x_{i})-y_{i}) * x_i} $$

  • Derivation for simplified linear regression with hypothesis \( H(x) = W x \). The cost is written with \( \frac{1}{2m} \) instead of \( \frac{1}{m} \); the extra \( \frac{1}{2} \) does not move the minimum and simply cancels the factor 2 produced by differentiation (equivalently, it can be absorbed into the learning rate).

$$ cost(W) = \frac{1}{2m} \sum_{i=1}^m ((W * x_{i})-y_{i})^2 $$

$$ W_{new} = W - \alpha {\partial \over\partial W} cost(W) $$

$$ W_{new} = W - \alpha {\partial \over\partial W} {\frac{1}{2m} \sum_{i=1}^m ((W * x_{i})-y_{i})^2} $$

$$ W_{new} = W - \alpha {\frac{1}{2m} \sum_{i=1}^m 2((W * x_{i})-y_{i}) * x_i} $$

$$ W_{new} = W - \alpha {\frac{1}{m} \sum_{i=1}^m ((W * x_{i})-y_{i}) * x_i} $$
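
As a quick sanity check of the final formula, the short sketch below compares the analytic gradient \( \frac{1}{m} \sum_{i=1}^m ((W x_{i})-y_{i}) x_i \) against a finite-difference approximation of the cost. The data here are randomly generated stand-ins used only for this check.

import numpy as np

# Randomly generated stand-in data, only for checking the formula
x = np.random.normal(0.0, 0.55, 300)
y = 0.1 * x + np.random.normal(0.0, 0.03, 300)
m = len(x)

def cost(W):
    # cost(W) = 1/(2m) * sum((W * x_i - y_i)^2)
    return np.sum((W * x - y) ** 2) / (2 * m)

W = 0.18
# Analytic gradient from the derivation above
analytic = np.sum((W * x - y) * x) / m
# Central finite-difference approximation of d cost / d W
eps = 1e-6
numeric = (cost(W + eps) - cost(W - eps)) / (2 * eps)
print(analytic, numeric)  # the two values should agree closely

The full example below then draws the cost curve over a range of W values and runs ten gradient descent updates starting from W = 0.18, plotting each step.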

import numpy as np
import matplotlib.pyplot as plt

def calc_cost(w):
    # hypothesis = W * x
    hypo = [w * _x for _x in x]

    _mse = list(map(lambda _hypo, _answer : \
                            (_hypo - _answer) ** 2, hypo, y))
    sumMse = sum(_mse)

    # 1 / m * sum((W * x_i - y_i)^2), averaged over the m data points
    return sumMse / len(x)

# Set size of plot
width = 8
height = 7
plt.figure(figsize=(width, height))

# number of points
num_points = 300

# For the sample data
# Answer values
x_ans = [i * 0.01 for i in range(-200, 200)]
W_ans = 0.1

# Generate normal random values for input and output
x = [np.random.normal(0.0, 0.55) for i in range(num_points)]
y = [W_ans * _x + np.random.normal(0.0, 0.03) for _x in x]

# For the Mean Square Error
# Weight from 0 to 0.2
W = [i * 0.001 for i in range(0, 201)]
# Cost
cost = []

# Draw cost function
for w in W:
    cost.append(calc_cost(w))

plt.plot(W, cost, "r")

# Start from particular points
w = 0.18
learning_rate = 0.1

# Cost and Gradient descent
for i in range(10):
    # Calculate cost
    _cost = calc_cost(w)

    # Draw calculated value
    plt.plot(w, _cost, "o", \
                label="W: {0:5.3f}, Cost(W): {1:5.5f}".format(w, _cost))

    # Calculate gradient descent
    gradients = list(map(\
        lambda _input, _answer : ((w * _input) - _answer) * _input, x, y))

    sumGrad = sum(gradients)

    # Update w: w = w - alpha * 1/m * sum((w * x_i - y_i) * x_i)
    w = w - learning_rate / len(x) * sumGrad

plt.title("Gradient Descent Algorithm")
plt.xlim(0.15, 0.19)
plt.ylim(0.0015, 0.0035)
plt.xlabel("W")
plt.ylabel("Cost(W)")
plt.grid()
plt.legend(numpoints=1,loc='lower right')
plt.show()
Image 2. Output

Convex Function

  • Because gradient descent only follows the partial derivatives (the local slope), it has a weakness.
  • It converges to a local minimum, but a function can have several local minima, not just one, as in the image below. What we actually want is the global minimum, the lowest point of all.
Image 3. Non convex function - SRC
  • In that case, the starting point largely determines which minimum is reached, so it matters a great deal.

  • However, if the cost function is convex, this problem disappears.

  • A convex function has only one local minimum, as in the image below, and that minimum is the global minimum.

Image 4. Convex function - SRC
  • Therefore, when you solve a problem with machine learning, try to choose a cost function that is convex. If that is not possible, you need extra techniques, such as searching for a better starting point or running gradient descent from multiple starting points, as sketched below.
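
The sketch below illustrates the multiple-starting-points idea on a hypothetical non-convex one-dimensional function; the function, the starting points, and the learning rate are all made up for demonstration. The same gradient descent loop is run from several starting values and the result with the lowest cost is kept.

# A hypothetical non-convex cost with two local minima (demonstration only)
def cost(w):
    return w ** 4 - 3 * w ** 2 + w

def grad(w):
    # Derivative of the cost above
    return 4 * w ** 3 - 6 * w + 1

def gradient_descent(w, learning_rate=0.01, steps=500):
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Run the same algorithm from several starting points and keep the best result
starts = [-2.0, -0.5, 0.5, 2.0]
results = [gradient_descent(w0) for w0 in starts]
for w0, w_final in zip(starts, results):
    print("start {0:+.1f} -> w {1:+.3f}, cost {2:+.3f}".format(w0, w_final, cost(w_final)))

best_w = min(results, key=cost)
print("best w: {0:+.3f}".format(best_w))

With these starting points, the positive starts settle in the shallower minimum near w = 1.1, while the negative starts reach the deeper minimum near w = -1.3; trying several starting points is what recovers the better solution.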
