Gradient descent for linear regression
Gradient Descent Algorithm
- A first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point. - Wiki
- Gradient Descent Algorithm repeatedly steps down the slope of the cost function at the current point until it reaches a local minimum.
- The new W and b are obtained by subtracting from the old values the product of the learning rate and the partial derivatives of the cost function at the current point (W, b):
$$ W = W - \alpha {\partial \over\partial W} cost(W, b) $$
$$ b = b - \alpha {\partial \over\partial b} cost(W, b) $$
- \( \alpha \) is the learning rate. It controls how large a step is taken along the cost function; if it is large, W and b change a lot at each update.
- The learning rate is a tunable value, usually less than 1. It is also common to make it smaller as the training runs progress (a small sketch of the update rule follows below).
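The two update equations can be tried directly. Below is a minimal sketch (not from the original post) that applies them to the mean squared error cost of a tiny hand-made dataset; the partial derivatives are estimated numerically with finite differences, so nothing beyond the two equations above is assumed.

def cost(W, b, x, y):
    # cost(W, b) = 1 / m * sum((W * x_i + b - y_i)^2)
    return sum((W * _x + b - _y) ** 2 for _x, _y in zip(x, y)) / len(x)

# Tiny hand-made dataset generated by W = 2, b = 0 (illustrative values)
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
W, b = 0.0, 0.0
alpha = 0.1   # learning rate
eps = 1e-6    # finite-difference step

for _ in range(500):
    # numeric partial derivatives of the cost at the current (W, b)
    dW = (cost(W + eps, b, x, y) - cost(W - eps, b, x, y)) / (2 * eps)
    db = (cost(W, b + eps, x, y) - cost(W, b - eps, x, y)) / (2 * eps)
    # W = W - alpha * d/dW cost(W, b),  b = b - alpha * d/db cost(W, b)
    W = W - alpha * dW
    b = b - alpha * db

print(W, b)   # converges to approximately W = 2, b = 0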
Gradient Descent Algorithm for Linear Regression
- Gradient descent update rule for simplified linear regression (hypothesis \( W * x \), no bias term)
$$ W_{new} = W - \alpha {\frac{1}{m} \sum_{i=1}^m ((W * x_{i})-y_{i}) * x_i} $$
- Derivation of the update rule for simplified linear regression. Scaling the cost by a constant does not move its minimum (the scaling is absorbed into the learning rate \( \alpha \)), so \( \frac{1}{m} \) can be replaced with \( \frac{1}{2m} \) to cancel the factor of 2 produced by the derivative.
$$ cost(W) = \frac{1}{m} \sum_{i=1}^m ((W * x_{i})-y_{i})^2 $$
$$ W_{new} = W - \alpha {\partial \over\partial W} cost(W) $$
$$ W_{new} = W - \alpha {\partial \over\partial W} {\frac{1}{m} \sum_{i=1}^m ((W * x_{i})-y_{i})^2} $$
$$ W_{new} = W - \alpha {\partial \over\partial W} {\frac{1}{2m} \sum_{i=1}^m ((W * x_{i})-y_{i})^2} $$
$$ W_{new} = W - \alpha {\frac{1}{2m} \sum_{i=1}^m 2((W * x_{i})-y_{i}) * x_i} $$
$$ W_{new} = W - \alpha {\frac{1}{m} \sum_{i=1}^m ((W * x_{i})-y_{i}) * x_i} $$
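The following Python script implements this update rule for the simplified model (no bias term): it draws the cost curve over a range of W values and then marks ten gradient descent steps starting from W = 0.18.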
import numpy as np
import matplotlib.pyplot as plt
def calc_cost(w):
    # hypothesis = W * x
    hypo = [w * _x for _x in x]
    # squared error for each sample
    _mse = list(map(lambda _hypo, _answer: (_hypo - _answer) ** 2, hypo, y))
    sumMse = sum(_mse)
    # cost(W) = 1 / m * sum((W * x - y)^2), where m is the number of samples
    return 1 / len(x) * sumMse
# Set size of plot
width = 8
height = 7
plt.figure(figsize=(width, height))
# number of points
num_points = 300
# For the sample data
# Answer values
x_ans = [i * 0.01 for i in range(-200, 200)]
W_ans = 0.1
# Generate normal random values for input and output
x = [np.random.normal(0.0, 0.55) for i in range(num_points)]
y = [W_ans * _x + np.random.normal(0.0, 0.03) for _x in x]
# For the Mean Square Error
# Weight from 0 to 0.2
W = [i * 0.001 for i in range(0, 201)]
# Cost
cost = []
# Draw cost function
for w in W:
    cost.append(calc_cost(w))
plt.plot(W, cost, "r")
# Start from particular points
w = 0.18
learning_rate = 0.1
# Cost and Gradient descent
for i in range(10):
    # Calculate cost
    _cost = calc_cost(w)
    # Draw calculated value
    plt.plot(w, _cost, "o",
             label="W: {0:5.3f}, Cost(W): {1:5.5f}".format(w, _cost))
    # Calculate gradient: (1 / m) * sum((W * x - y) * x)
    gradients = list(map(
        lambda _input, _answer: ((w * _input) - _answer) * _input, x, y))
    sumGrad = sum(gradients)
    # Descend and update w
    w = w - learning_rate / len(x) * sumGrad
plt.title("Gradient Descent Algorithm")
plt.xlim(0.15, 0.19)
plt.ylim(0.0025, 0.0045)
plt.xlabel("W")
plt.ylabel("Cost(W)")
plt.grid()
plt.legend(numpoints=1,loc='lower right')
plt.show()
Convex Function
- Gradient Descent Algorithm follows the partial derivative (the local slope of the cost), so it has a weakness.
- It converges to a point that is a local minimum. However, a cost function can have several local minima, not just one, as in the image below. What we actually want is the global minimum, the lowest point of all.
- Because of this, the starting value matters a lot in such cases (a small sketch of this is given after this list).
- However, if the cost function is a convex function, this problem does not arise.
- A convex function, as in the image below, has only one local minimum, and that minimum is the global minimum.
- Therefore, when you solve a problem with machine learning, you should try to use a cost function that is convex (a quick convexity check for the squared-error cost follows below). If that is not possible, you need techniques for choosing a better starting point, or you have to run the training from multiple starting points.
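To make the starting point issue concrete, here is a small sketch (not from the original post) that runs the same update rule on a hand-picked non-convex cost \( cost(w) = w^4 - 3w^2 + w \). Depending on where it starts, gradient descent settles in different minima.

def grad(w):
    # derivative of the non-convex cost(w) = w^4 - 3w^2 + w
    return 4 * w ** 3 - 6 * w + 1

for w_start in (-2.0, 2.0):
    w = w_start
    for _ in range(200):
        w = w - 0.01 * grad(w)
    print("start: {0:+.1f} -> settles near w = {1:+.3f}".format(w_start, w))

# Starting from -2.0 it reaches the global minimum near w = -1.30;
# starting from +2.0 it gets stuck in the higher local minimum near w = +1.13.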
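For the simplified linear regression cost used above, convexity in W can be verified directly: the second derivative is a nonnegative constant, so there is a single minimum and it is the global one.
$$ {\partial^2 \over \partial W^2} cost(W) = {\partial \over\partial W} \left( \frac{2}{m} \sum_{i=1}^m ((W * x_{i})-y_{i}) * x_i \right) = \frac{2}{m} \sum_{i=1}^m x_i^2 \geq 0 $$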