Concept, activation function and output function of softmax regression
Multinomial Classification
- Softmax Classification = Softmax Regression
- Determines which category the input belongs to.
import numpy as np
import matplotlib.pyplot as plt

nrDots = 10

def createDots(xMean, yMean, form):
    # Draw a cluster of dots around (xMean, yMean) with the given marker style.
    x = [np.random.normal(xMean, 0.5, 1) for i in range(nrDots + 1)]
    y = [np.random.normal(yMean, 0.5, 1) for i in range(nrDots + 1)]
    plt.plot(x, y, form)

# Dot clusters: (x mean, y mean, marker style)
dotConfigs = [(3, 5, "bo"),
              (1, 1, "ro"),
              (5, 2, "go")]
for conf in dotConfigs:
    createDots(conf[0], conf[1], conf[2])

# Border lines: (slope, intercept, color)
lineConfigs = [(1/6, 3, "b"),
               (-7, 15, "r"),
               (6, -20, "g")]
x = [i for i in range(-1, 7)]
for conf in lineConfigs:
    y = [i * conf[0] + conf[1] for i in x]
    plt.plot(x, y, conf[2])

plt.xlim(-1, 6)
plt.ylim(-1, 6)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Multinomial Classification")
plt.show()
Image 1. Multinomial classification
- In this example, the dots are classified as blue, red, and green, and the lines show the borders between the classes.
- Multinomial classification finds, for each color, the line that divides the True area from the False area. In other words, multinomial classification is a combination of binary classifications.
- However, multinomial classification uses a different activation function and cost function to keep the model simple.
- The affine function is the basis of a neural network, so it is used here as well.
$$ Active(W \cdot X) = Y_p $$ $$ Affine = W \cdot X = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{11}x_1 + w_{12}x_2 + w_{13}x_3 \\ w_{21}x_1 + w_{22}x_2 + w_{23}x_3 \\ w_{31}x_1 + w_{32}x_2 + w_{33}x_3 \end{bmatrix} $$ $$ Active(Affine) = \begin{bmatrix} y_{p1} \\ y_{p2} \\ y_{p3} \end{bmatrix} = Y_p$$
- Therefore, \( Active(Affine) \) is the hypothesis of multinomial classification.
- Until now, \( Y \) has been called the answer; from here on, it is called the label. Likewise, \( Y_p \) is the predicted label.
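To make the hypothesis concrete, here is a minimal sketch that computes \( Active(W \cdot X) \) for one input with three features. The weight and input values are made up for illustration, and softmax (introduced in the next section) is used as the activation function.
import numpy as np

# Made-up weights and input, only for illustration.
W = np.array([[0.2, 0.8, -0.5],
              [0.5, -0.9, 0.3],
              [-0.3, 0.1, 0.7]])
X = np.array([[1.0], [2.0], [3.0]])

affine = np.dot(W, X)            # Affine = W . X, a (3, 1) vector

def softmax(a):                  # Activation; defined properly in the next section
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

Y_p = softmax(affine)            # Predicted label Y_p
print(affine.ravel())            # the three affine values
print(Y_p.ravel())               # three probabilities that add up to 1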
Softmax Function
- A generalization of the logistic function that "squashes" a K-dimensional vector z of arbitrary real values to a K-dimensional vector \( \sigma(z) \) of real values in the range (0, 1) that add up to 1. - Wiki
- Simply speaking, the softmax function returns the probability of each label.
$$ S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}} $$
import numpy as np

# Softmax function
def softmax(a):
    max_a = np.max(a)  # To prevent overflow:
    # the exponential could take a very large value,
    # so subtracting max_a downscales the original values.
    exp_a = np.exp(a - max_a)
    sum_exp_a = np.sum(exp_a)
    return exp_a / sum_exp_a

x = np.random.rand(5, 1)
y = softmax(x)

print(" {0:7} {1:7}".format("X", "S(X)"))
for i in range(len(x)):
    print(" {0:7.6f} {1:7.6f}".format(x[i][0], y[i][0]))
print("Sum {0:7.6f} {1:7.6f}".format(np.sum(x), np.sum(y)))
X S(X)
0.717855 0.207212
0.620096 0.187914
0.505516 0.167570
0.541942 0.173787
0.958234 0.263518
Sum 3.343643 1.000000
- In this example, each input x is transformed into its probability by the softmax function.
- Because of this property, the softmax function is used as the activation function of multinomial classification.
- As we learned from binary classification, keeping the output between 0 and 1 makes the neural network robust against unusually large or odd inputs.
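The comment about preventing overflow can be checked directly: softmax does not change when a constant is subtracted from every input, so shifting by the maximum gives the same result while keeping np.exp from overflowing. A minimal check with made-up large scores:
import numpy as np

def softmax(a):
    exp_a = np.exp(a - np.max(a))   # shift by the maximum for numerical stability
    return exp_a / np.sum(exp_a)

a = np.array([1010.0, 1000.0, 990.0])   # made-up large scores

naive = np.exp(a) / np.sum(np.exp(a))   # np.exp(1010) overflows to inf
print(naive)                            # [nan nan nan] plus an overflow warning

print(softmax(a))                       # ~[1.0, 4.5e-05, 2.1e-09], sums to 1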
Output Function
- From the softmax function, we know which label has the highest probability. However, we have not set True or False yet.
- To do this, multinomial classification uses the max function.
- Therefore, the label with the highest probability is True, and the others are False; a short boolean sketch of this follows the example below.
- Also, argmax() is widely used to find the index of the True label.
- The index starts from 0.
import numpy as np

# Softmax function
def softmax(a):
    max_a = np.max(a)  # To prevent overflow:
    # the exponential could take a very large value,
    # so subtracting max_a downscales the original values.
    exp_a = np.exp(a - max_a)
    sum_exp_a = np.sum(exp_a)
    return exp_a / sum_exp_a

x = np.random.rand(5, 1)
y = softmax(x)
max_i = np.argmax(y)  # Index of the highest probability

print(" {0:7} {1:7}".format("X", "S(X)"))
for i in range(len(x)):
    print(" {0:7.6f} {1:7.6f}".format(x[i][0], y[i][0]))
print("Sum {0:7.6f} {1:7.6f}".format(np.sum(x), np.sum(y)))
print("Biggest index: {0}, value: {1:7.6f}".format(max_i, x[max_i][0]))
X S(X)
0.471060 0.177648
0.557162 0.193621
0.729981 0.230148
0.370831 0.160705
0.763011 0.237877
Sum 2.892045 1.000000
Biggest index: 4, value: 0.763011
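The True/False decision described above can also be written as a boolean vector by comparing each probability with the maximum. This is just one way to express it, reusing the S(X) values printed in the run above:
import numpy as np

y = np.array([[0.177648],
              [0.193621],
              [0.230148],
              [0.160705],
              [0.237877]])        # S(X) column from the run above

decision = (y == np.max(y))       # True only where the probability is highest
print(decision.ravel())           # [False False False False  True]
print(np.argmax(y))               # 4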
One-Hot Encoding
- A group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0). - Wiki
- In such a group of bits, only one bit is 1 and all the others are 0.
- This is the usual output format of multinomial classification.
- In multinomial classification, a 1 means the neural network infers that the input belongs to that label; a sketch of building the answer label in the same format follows the example below.
import numpy as np

# Softmax function
def softmax(a):
    max_a = np.max(a)  # To prevent overflow:
    # the exponential could take a very large value,
    # so subtracting max_a downscales the original values.
    exp_a = np.exp(a - max_a)
    sum_exp_a = np.sum(exp_a)
    return exp_a / sum_exp_a

x = np.random.rand(5, 1)
y = softmax(x)
max_i = np.argmax(y)

# One-hot prediction: 1 at the index with the highest probability, 0 elsewhere
oh = np.zeros((5, 1))
oh[max_i] = 1

print(" {0:7} {1:7} {2:10}".format("X", "S(X)", "Prediction"))
for i in range(len(x)):
    print(" {0:7.6f} {1:7.6f} {2}".format(x[i][0], y[i][0], oh[i][0]))
print("Sum {0:7.6f} {1:7.6f} {2}".format(np.sum(x), np.sum(y), np.sum(oh)))
X S(X) Prediction
0.332139 0.181766 0.0
0.299237 0.175883 0.0
0.011281 0.131876 0.0
0.531769 0.221928 0.0
0.794273 0.288546 1.0
Sum 1.968699 1.000000 1.0
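For comparison with the prediction, the answer label is usually stored in the same one-hot format. A common way to build it is to take a row of the identity matrix; the answer index below is made up for illustration:
import numpy as np

num_labels = 5
answer_index = 2                           # made-up true label index

Y = np.eye(num_labels)[answer_index].reshape(num_labels, 1)   # one-hot answer label
print(Y.ravel())                           # [0. 0. 1. 0. 0.]

Y_p = np.array([[0.0], [0.0], [0.0], [0.0], [1.0]])   # prediction from the run above
print(np.argmax(Y_p) == np.argmax(Y))      # False (predicted index 4, answer index 2)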