# Multi-variables for Linear Regression

## TOC
- 3 Equations for Linear Regression
- Multi-variable for Linear Regression
- Hypothesis & Cost Function for Multi-variable
- Matrix
- Determine the Size of Weight and Bias
## 3 Equations for Linear Regression
- Hypothesis
$$ H(x) = Wx + b $$
- Cost function
$$ cost(W,b) = \frac{1}{m} \sum_{i=1}^m (H(x_i) - y_i)^2 $$
- Gradient descent
$$ W_{new} = W - \alpha \frac{1}{m} \sum_{i=1}^m \big( H(x_i) - y_i \big) \, x_i $$
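These three equations map directly onto code. Below is a minimal single-variable sketch with NumPy; the data values, learning rate, and iteration count are made-up numbers for illustration only.

```python
import numpy as np

# Toy single-variable data (made-up values for illustration): y is roughly 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

W, b = 0.0, 0.0      # initial weight and bias
alpha = 0.01         # learning rate
m = len(x)

for step in range(1000):
    H = W * x + b                    # hypothesis: H(x) = Wx + b
    cost = np.mean((H - y) ** 2)     # cost(W, b) = 1/m * sum (H(x_i) - y_i)^2
    # gradient descent update; the factor of 2 from the derivative is
    # absorbed into alpha, matching the update equation above
    W -= alpha * np.mean((H - y) * x)
    b -= alpha * np.mean(H - y)

print(W, b, cost)    # W and b should move toward roughly 2 and 1
```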
## Multi-variable for Linear Regression

- If one data set consists of multiple inputs, how do we handle all of them?
- If there are multiple data sets, how should we handle them?
- Example:
  - We will predict the last test score from the previous 3 test scores.
  - There are 6 students; we know all four scores for 5 of them and only the 3 previous scores for the target student.
Student | test1 | test2 | test3 | final |
---|---|---|---|---|
A | 70 | 80 | 75 | 75 |
B | 90 | 90 | 90 | 95 |
C | 50 | 70 | 80 | 70 |
D | 80 | 90 | 85 | 95 |
E | 90 | 95 | 90 | 95 |
F | 80 | 80 | 80 | ? |
## Hypothesis & Cost Function for Multi-variable
- Hypothesis
$$ H(x_1, x_2, x_3, ..., x_n) = w_1x_1 + w_2x_2 + w_3x_3 + ... + w_nx_n + b $$
- Cost function
$$ cost(W, b) = \frac{1}{m} \sum_{i=1}^m (H(x_{i1}, x_{i2}, x_{i3}, ..., x_{in}) - y_i)^2 $$
- The gradient descent update is hard to write as one simple equation, because the cost must be partially differentiated with respect to each of the n weight parameters (see the per-weight derivative below).
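For reference, the partial derivative of this cost with respect to a single weight $w_j$ is

$$ \frac{\partial}{\partial w_j} cost(W, b) = \frac{2}{m} \sum_{i=1}^m \big( H(x_{i1}, x_{i2}, ..., x_{in}) - y_i \big) \, x_{ij} $$

so each weight is updated as $w_j = w_j - \alpha \frac{\partial}{\partial w_j} cost(W, b)$; the constant 2 is usually absorbed into the learning rate $\alpha$, as in the single-variable update above.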
## Matrix

- A rectangular array of numbers, symbols, or expressions, arranged in rows and columns.[1]
- A matrix is a convenient mathematical expression for multiple variables.
- For multiple variables, a matrix expands along its columns.
- Matrix expression example for a single data set with multiple variables:
$$ w_1x_1 + w_2x_2 + w_3x_3 + ... + w_nx_n $$
$$ \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} x_1w_1 + x_2w_2 + x_3w_3 \end{bmatrix} $$
- For multiple data sets (instances), a matrix expands along its rows.
- Matrix expression example for multiple instances with multiple variables:
$$ \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} x_{11}w_1 + x_{12}w_2 + x_{13}w_3 \\ x_{21}w_1 + x_{22}w_2 + x_{23}w_3 \\ x_{31}w_1 + x_{32}w_2 + x_{33}w_3 \\ x_{41}w_1 + x_{42}w_2 + x_{43}w_3 \end{bmatrix} $$
- These expanded expressions are complex, so we use the simplified matrix expression instead.
$$ H(X) = XW + B $$
- By convention, a capital letter denotes a matrix.
- Previously we wrote Wx, but here it is XW, following the matrix multiplication rule: X comes first so that its columns line up with the rows of W (see the NumPy sketch below).
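As a minimal NumPy sketch (the numeric values here are made up for illustration), the matrix form computes the hypothesis for every instance in a single product:

```python
import numpy as np

# 4 instances x 3 variables (made-up values for illustration)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])

W = np.array([[0.1],    # 3 x 1 weight column vector
              [0.2],
              [0.3]])

b = 0.5                 # scalar bias, broadcast over all instances

# H(X) = XW + B: (4 x 3) . (3 x 1) -> (4 x 1), then add the bias
H = np.dot(X, W) + b
print(H.shape)          # (4, 1)
print(H)
```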
## Determine the Size of Weight and Bias
- For deep learning, the inputs and the answers are given, but we need to design the weight and bias.
- The sizes of the weight and bias are determined by the shapes of the input and the answer.
- In the previous example, the input is a 4 x 3 matrix and the answer is a 4 x 1 matrix.
- In this case, the weight is 3 x 1. Its number of rows must equal the number of columns of the input, and its number of columns must equal the number of columns of the answer. This is the rule of matrix multiplication.[2] If the shapes do not match, the matrix product cannot be computed.
- For the bias, its number of rows equals the number of rows of the input and its number of columns is 1, because the bias is a constant value added to every instance, so it only needs one entry per instance, as the shape check below illustrates.
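As a quick sanity check (a sketch using the shapes from the example above; the zero and one values are placeholders), NumPy makes the shape rule easy to verify:

```python
import numpy as np

X = np.ones((4, 3))        # input: 4 instances x 3 variables
Y = np.ones((4, 1))        # answer: 4 x 1
W = np.zeros((3, 1))       # weight: rows = X's columns, columns = Y's columns
B = np.zeros((4, 1))       # bias: one constant per instance (a scalar also works via broadcasting)

H = np.dot(X, W) + B       # (4 x 3) . (3 x 1) + (4 x 1) -> (4 x 1)
assert H.shape == Y.shape  # hypothesis and answer shapes match

# With mismatched shapes, e.g. np.dot(np.ones((4, 3)), np.ones((4, 1))),
# NumPy raises a ValueError because the inner dimensions do not agree.
```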
```python
import numpy as np
import matplotlib.pyplot as plt

nr_student = 5

def calc_cost(W):
    # hypothesis = X * W
    hypo = np.dot(X, W)
    # squared error for each instance
    _mse = list(map(lambda _hypo, _answer: (_hypo[0] - _answer[0]) ** 2, hypo, Y))
    sumMse = sum(_mse)
    # 1 / m * sum(X * W - Y)^2
    return 1 / nr_student * sumMse

def calc_gradient(W):
    # Numerical derivative (central difference)
    # h is the small delta added to W
    h = 1e-4
    # Defines a matrix of the same size as W
    grad = np.zeros_like(W)
    # Partial derivative for each weight
    for i in range(W.size):
        tmp_val = W[i].copy()  # copy, not a view, so the original value is preserved
        # Calculate forward value
        W[i] = tmp_val + h
        fxh1 = calc_cost(W)
        # Calculate backward value
        W[i] = tmp_val - h
        fxh2 = calc_cost(W)
        # Central difference
        grad[i] = (fxh1 - fxh2) / (2 * h)
        W[i] = tmp_val
    return grad

# Input
X = np.array([[70, 80, 75],
              [90, 90, 90],
              [50, 70, 80],
              [80, 90, 85],
              [90, 95, 90]])

# Answer
Y = np.array([[75],
              [95],
              [70],
              [95],
              [95]])

# Weight
# W = np.full((3, 1), np.random.normal(0, 10))
# In this test, we have only 5 instances.
# Therefore, if W starts too far away, the result will be weird.
# So, I just picked values close to the pre-trained W.
W = np.array([[0.4],
              [0.15],
              [0.5]])

# Learning rate
learning_rate = 0.0001

costs = []
steps = []
nb_train = 10

# Training
for i in range(nb_train):
    # Calculate cost
    _cost = calc_cost(W)
    # Calculate gradient
    gradients = calc_gradient(W)
    # Value for descent
    DV = learning_rate / nr_student * gradients
    # Update W
    W = W - DV
    steps.append(i)
    costs.append(_cost)

# Test
x = np.array([80, 80, 80])
y = np.dot(x, W)
print("Answer: {0}".format(y))

plt.plot(steps, costs, label="Costs")
plt.xlabel("trial")
plt.ylabel("Cost(W)")
plt.grid()
plt.show()
```
```
Answer: [ 84.86409948]
```