# Multi-variables for Linear Regression

## TOC
- 3 Equations for Linear Regression
- Multi-variable for Linear Regression
- Hypothesis & Cost Function for Multi-variable
- Matrix
- Determine the Size of Weight and Bias
## 3 Equations for Linear Regression
- Hypothesis
$$ H(x) = Wx + b $$
- Cost function
$$ cost(W,b) = \frac{1}{m} \sum_{i=1}^m (H(x_i) - y_i)^2 $$
- Gradient descent
$$ W_{new} = W - \alpha \frac{1}{m} \sum_{i=1}^m \big( H(x_i) - y_i \big) \, x_i $$
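These three equations map directly onto code. Below is a minimal single-variable sketch with NumPy; the data values, learning rate, and iteration count are made-up numbers for illustration only.

```python
import numpy as np

# Toy single-variable data (made-up values for illustration): y is roughly 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

W, b = 0.0, 0.0      # initial weight and bias
alpha = 0.01         # learning rate
m = len(x)

for step in range(1000):
    H = W * x + b                    # hypothesis: H(x) = Wx + b
    cost = np.mean((H - y) ** 2)     # cost(W, b) = 1/m * sum (H(x_i) - y_i)^2
    # gradient descent update; the factor of 2 from the derivative is
    # absorbed into alpha, matching the update equation above
    W -= alpha * np.mean((H - y) * x)
    b -= alpha * np.mean(H - y)

print(W, b, cost)    # W and b should move toward roughly 2 and 1
```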
## Multi-variable for Linear Regression

- If one data set consists of multiple inputs, how do we handle all of them?
- If there are multiple data sets, how should we handle them?
- Example:
  - We will predict the last test score from the previous 3 test scores.
  - There are 6 students; we know all four scores for 5 of them and only the 3 previous scores for the target student.
Student | test1 | test2 | test3 | final |
---|---|---|---|---|
A | 70 | 80 | 75 | 75 |
B | 90 | 90 | 90 | 95 |
C | 50 | 70 | 80 | 70 |
D | 80 | 90 | 85 | 95 |
E | 90 | 95 | 90 | 95 |
F | 80 | 80 | 80 | ? |
## Hypothesis & Cost Function for Multi-variable
- Hypothesis
$$ H(x_1, x_2, x_3, ..., x_n) = w_1x_1 + w_2x_2 + w_3x_3 + ... + w_nx_n + b $$
- Cost function
$$ cost(W, b) = \frac{1}{m} \sum_{i=1}^m (H(x_{i1}, x_{i2}, x_{i3}, ..., x_{in}) - y_i)^2 $$
- The gradient descent update is hard to write as one simple equation, because the cost must be partially differentiated with respect to each of the n weight parameters (see the per-weight derivative below).
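For reference, the partial derivative of this cost with respect to a single weight $w_j$ is

$$ \frac{\partial}{\partial w_j} cost(W, b) = \frac{2}{m} \sum_{i=1}^m \big( H(x_{i1}, x_{i2}, ..., x_{in}) - y_i \big) \, x_{ij} $$

so each weight is updated as $w_j = w_j - \alpha \frac{\partial}{\partial w_j} cost(W, b)$; the constant 2 is usually absorbed into the learning rate $\alpha$, as in the single-variable update above.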
## Matrix

- A rectangular array of numbers, symbols, or expressions, arranged in rows and columns.[1]
- A matrix is a convenient mathematical expression for multiple variables.
- For multiple variables, a matrix expands along its columns.
- Matrix expression example for a single data set with multiple variables:
$$ w_1x_1 + w_2x_2 + w_3x_3 + ... + w_nx_n $$
$$ \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} x_1w_1 + x_2w_2 + x_3w_3 \end{bmatrix} $$
- For multiple data sets (instances), a matrix expands along its rows.
- Matrix expression example for multiple instances with multiple variables:
$$ \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} x_{11}w_1 + x_{12}w_2 + x_{13}w_3 \\ x_{21}w_1 + x_{22}w_2 + x_{23}w_3 \\ x_{31}w_1 + x_{32}w_2 + x_{33}w_3 \\ x_{41}w_1 + x_{42}w_2 + x_{43}w_3 \end{bmatrix} $$
- These expanded expressions are complex, so we use the simplified matrix expression instead.
$$ H(X) = XW + B $$
- By convention, a capital letter denotes a matrix.
- Previously we wrote Wx, but here it is XW, following the matrix multiplication rule: X comes first so that its columns line up with the rows of W (see the NumPy sketch below).
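As a minimal NumPy sketch (the numeric values here are made up for illustration), the matrix form computes the hypothesis for every instance in a single product:

```python
import numpy as np

# 4 instances x 3 variables (made-up values for illustration)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])

W = np.array([[0.1],    # 3 x 1 weight column vector
              [0.2],
              [0.3]])

b = 0.5                 # scalar bias, broadcast over all instances

# H(X) = XW + B: (4 x 3) . (3 x 1) -> (4 x 1), then add the bias
H = np.dot(X, W) + b
print(H.shape)          # (4, 1)
print(H)
```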
## Determine the Size of Weight and Bias
- For deep learning, the inputs and the answers are given, but we need to design the weight and bias.
- The sizes of the weight and bias are determined by the shapes of the input and the answer.
- In the previous example, the input is a 4 x 3 matrix and the answer is a 4 x 1 matrix.
- In this case, the weight is 3 x 1. Its number of rows must equal the number of columns of the input, and its number of columns must equal the number of columns of the answer. This is the rule of matrix multiplication.[2] If the shapes do not match, the matrix product cannot be computed.
- For the bias, its number of rows equals the number of rows of the input and its number of columns is 1, because the bias is a constant value added to every instance, so it only needs one entry per instance, as the shape check below illustrates.
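As a quick sanity check (a sketch using the shapes from the example above; the zero and one values are placeholders), NumPy makes the shape rule easy to verify:

```python
import numpy as np

X = np.ones((4, 3))        # input: 4 instances x 3 variables
Y = np.ones((4, 1))        # answer: 4 x 1
W = np.zeros((3, 1))       # weight: rows = X's columns, columns = Y's columns
B = np.zeros((4, 1))       # bias: one constant per instance (a scalar also works via broadcasting)

H = np.dot(X, W) + B       # (4 x 3) . (3 x 1) + (4 x 1) -> (4 x 1)
assert H.shape == Y.shape  # hypothesis and answer shapes match

# With mismatched shapes, e.g. np.dot(np.ones((4, 3)), np.ones((4, 1))),
# NumPy raises a ValueError because the inner dimensions do not agree.
```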
```python
import numpy as np
import matplotlib.pyplot as plt

nr_student = 5

def calc_cost(W):
    # hypothesis = X * W
    hypo = np.dot(X, W)
    # squared error for each instance
    _mse = list(map(lambda _hypo, _answer: (_hypo[0] - _answer[0]) ** 2, hypo, Y))
    sumMse = sum(_mse)
    # 1 / m * sum(X * W - Y)^2
    return 1 / nr_student * sumMse

def calc_gradient(W):
    # Numerical derivative (central difference)
    # h is the small delta added to W
    h = 1e-4
    # Defines a matrix of the same size as W
    grad = np.zeros_like(W)
    # Partial derivative for each weight
    for i in range(W.size):
        tmp_val = W[i].copy()  # copy, not a view, so the original value is preserved
        # Calculate forward value
        W[i] = tmp_val + h
        fxh1 = calc_cost(W)
        # Calculate backward value
        W[i] = tmp_val - h
        fxh2 = calc_cost(W)
        # Central difference
        grad[i] = (fxh1 - fxh2) / (2 * h)
        W[i] = tmp_val
    return grad

# Input
X = np.array([[70, 80, 75],
              [90, 90, 90],
              [50, 70, 80],
              [80, 90, 85],
              [90, 95, 90]])

# Answer
Y = np.array([[75],
              [95],
              [70],
              [95],
              [95]])

# Weight
# W = np.full((3, 1), np.random.normal(0, 10))
# In this test, we have only 5 instances.
# Therefore, if W starts too far away, the result will be weird.
# So, I just picked values close to the pre-trained W.
W = np.array([[0.4],
              [0.15],
              [0.5]])

# Learning rate
learning_rate = 0.0001

costs = []
steps = []
nb_train = 10

# Training
for i in range(nb_train):
    # Calculate cost
    _cost = calc_cost(W)
    # Calculate gradient
    gradients = calc_gradient(W)
    # Value for descent
    DV = learning_rate / nr_student * gradients
    # Update W
    W = W - DV
    steps.append(i)
    costs.append(_cost)

# Test
x = np.array([80, 80, 80])
y = np.dot(x, W)
print("Answer: {0}".format(y))

plt.plot(steps, costs, label="Costs")
plt.xlabel("trial")
plt.ylabel("Cost(W)")
plt.grid()
plt.show()
```
```
Answer: [ 84.86409948]
```