13. Cost Function for Multinomial Classification

Cost function and gradient descent for softmax regression

Cost Function for Multinomial Classification

  • Like binary classification, multinomial classification uses the Cross Entropy Error (CEE) function as its cost function.
  • This is because multinomial classification is an extension of binary classification, and its activation function is the softmax function, which is built from exponential functions.

Cost Function of Binary Classification

$$ c(\dot{y}, y) = -\sum_{i}{y_i \cdot \log{\dot{y}_i}} $$

  • \( y \) denotes the labels, and \( \dot{y} \) denotes the predicted categories, i.e., the output of the hypothesis.
  • The equation above can also express binary classification. There are only two options, so their probabilities sum to 1, which is exactly what the softmax function guarantees. In addition, \( y \) is either 0 or 1. As a result,

$$ c(\dot{y}, y) = -(y \cdot \log{\dot{y}}) - ((1 - y) \cdot \log{(1 - \dot{y})}) $$
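
  • To see this behavior concretely, here is a quick worked check (not from the original post; the natural logarithm is assumed): when \( y = 1 \) the cost reduces to \( -\log{\dot{y}} \), and when \( y = 0 \) it reduces to \( -\log{(1 - \dot{y})} \).

$$ y = 1, \; \dot{y} = 0.9 \;\Rightarrow\; c = -\log{0.9} \approx 0.105 $$

$$ y = 1, \; \dot{y} = 0.1 \;\Rightarrow\; c = -\log{0.1} \approx 2.303 $$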

  • For multinomial classification as well, each element of \( y \) is 0 or 1. If an element is 1, the input belongs to that label; if it is 0, it does not (one-hot encoding).
  • As shown above, CEE gives a high value when the prediction is wrong and a low value when the prediction is correct.

$$ y_i = w \cdot x_i + b $$

$$ H(y_i) = \frac{e^{y_i}}{\sum_{j} e^{y_j}} $$

$$ c(H(y_i), y_i) = -\sum_{i}{y_i \cdot \log{H(y_i)}} $$

$$ cost(w) = \frac{1}{m}{\sum_{i=1}^{m} c(H(y_i), y_i)} $$
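
  • The following is a minimal NumPy sketch of these equations (not part of the original post; the toy values of x, w, and b are illustrative). It builds the logits, applies the softmax hypothesis, and averages the cross entropy error over all samples.

import numpy as np

# Toy data: 4 samples with 1 feature, 3 labels (one-hot encoded)
x = np.array([0.5, 2.0, 4.5, 8.0])
y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# One weight and one bias per label (illustrative values)
w = np.array([0.3, 0.5, 0.7])
b = np.array([0.1, 0.0, -0.1])

# Logits: y_i = w * x_i + b, one column per label
logits = np.outer(x, w) + b

# Softmax hypothesis: H(y_i) = exp(y_i) / sum_j exp(y_j)
exp_logits = np.exp(logits)
H = exp_logits / exp_logits.sum(axis=1, keepdims=True)

# Cross Entropy Error per sample, then the averaged cost
cee = -np.sum(y * np.log(H), axis=1)
print("cost =", cee.mean())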

  • However, unlike binary classification, multinomial classification needs as many weights as there are labels (see the sketch after this list).
    • The two labels of binary classification are mutually exclusive, so a single weight is enough: the two probabilities are simply H(x) and (1 - H(x)).
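
A minimal sketch of that difference in shapes (illustrative only, not from the original post):

import numpy as np

n_features = 4
n_labels = 3

# Binary classification: one weight vector is enough,
# because the second probability is just 1 - H(x).
w_binary = np.zeros(n_features)

# Multinomial classification: one weight vector per label,
# usually stacked into an (n_features, n_labels) matrix.
W_multi = np.zeros((n_features, n_labels))

x = np.random.uniform(size=n_features)
logits = x @ W_multi
print(logits.shape)  # (3,) -- one logit per label

The post's own example below first visualizes the three label areas and then plots the cross entropy cost.
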
import numpy as np
import matplotlib.pyplot as plt

# Input, 0 ~ 9
X = [i * 0.01 for i in range(1, 900)]
# Y is just used to show the area of labels.
# Therefore, the value of Y is meaningless.
Y = [np.random.uniform() for i in X]

# Label
# A: 0 ~ 3, Red
# B: 3 ~ 6, Green
# C: 6 ~ 9, Blue
plt.plot(X[:300], Y[:300], "r.", label="A")
plt.plot(X[300:600], Y[300:600], "g.", label="B")
plt.plot(X[600:], Y[600:], "b.", label="C")
plt.ylim(-0.2, 1.2)
plt.title("Area of A, B, C")
plt.grid()
plt.legend(numpoints=1)
plt.show()

# Basis
X = [i * 0.01 for i in range(1, 900)]
# Hypothesis.
# This is between 0 and 1 because it comes from the softmax function.
H = [i * 0.01 for i in range(1, 100)]

# Labels
# Because there are three labels, each entry of Y has three elements.
# One-hot encoding is used to represent the labels.
Y = []
for i in X:
    if i < 3.0:
        Y.append([1, 0, 0])
    elif 3.0 <= i < 6.0:
        Y.append([0, 1, 0])
    else:
        Y.append([0, 0, 1])

# Single Cross Entropy Error
#  For multinomial classification, h plays the role of the predicted probability.
#  To keep the problem simple, only the probability of label B is our concern,
#  so h is the probability of B.
#  For A and C, their probabilities are assumed to be (1-h)/2 each.
#  Note: the lambdas read h from the enclosing scope; h is set in the loop below.
cost_a = lambda _y : - _y[0] * np.log((1-h)/2)
cost_b = lambda _y : - _y[1] * np.log(h)
cost_c = lambda _y : - _y[2] * np.log((1-h)/2)
cost = lambda _y : \
        - _y[0] * np.log((1-h)/2) \
        - _y[1] * np.log(h) \
        - _y[2] * np.log((1-h)/2)

# Lists for outputs
costs_a = []
costs_b = []
costs_c = []
costs = []

# h is the probability of label B.
for h in H:
    # Average cost for A over all samples
    cees_a = list(map(cost_a, Y))
    costs_a.append(sum(cees_a) / len(Y))

    # Average cost for B over all samples
    cees_b = list(map(cost_b, Y))
    costs_b.append(sum(cees_b) / len(Y))

    # Average cost for C over all samples
    cees_c = list(map(cost_c, Y))
    costs_c.append(sum(cees_c) / len(Y))

    # Total average cost over all labels
    cees = list(map(cost, Y))
    costs.append(sum(cees) / len(Y))
          
# Graphs
plt.plot(H, costs_a, "rv", label="A") 
plt.plot(H, costs_b, "g.", label="B")
plt.plot(H, costs_c, "b^", label="C")
plt.plot(H, costs, "k*", label="cost(h)")

plt.title("Cross Entropy Error")
plt.xlabel("H")
plt.ylabel("Cost(H)")
plt.xlim(-0.1, 1.1)
plt.grid()
plt.legend(loc="upper center", numpoints=1)
plt.show()
Image 1. Input and labels

Image 2. Cost function for multinomial classification
  • As the Python result shows, the shape of cost(h) is similar to that of a quadratic curve, so it has a global minimum (see the quick check below).
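
  • A quick check of this claim (not in the original post), assuming each label covers roughly one third of the inputs as in the code above: the averaged cost simplifies to

$$ cost(h) = \frac{1}{3}\left( -2\log{\frac{1-h}{2}} - \log{h} \right) $$

$$ \frac{d\,cost(h)}{dh} = \frac{1}{3}\left( \frac{2}{1-h} - \frac{1}{h} \right) = 0 \;\Rightarrow\; h = \frac{1}{3} $$

  • The second derivative is positive for \( 0 < h < 1 \), so the cost is convex and has a single global minimum, reached when h equals the actual frequency of label B.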

Gradient Descent for Multinomial Classification

  • Because the cost function of multinomial classification has a global minimum, gradient descent can be applied.
  • However, since multinomial classification has many variables to optimize, its partial derivatives are too complicated to write out here. Therefore, only Python code for gradient descent, using a numerical gradient, is shown.
    • The mathematical approach will be explained later.
import numpy as np
import matplotlib.pyplot as plt

# Hypothesis
hypos = {}
hypos["A"] = lambda _w, _x: \
        (np.exp(_w[0] * _x)) / \
        (np.exp(_w[0] * _x) + np.exp(_w[1] * _x) + np.exp(_w[2] * _x))
hypos["B"] = lambda _w, _x: \
        (np.exp(_w[1] * _x)) / \
        (np.exp(_w[0] * _x) + np.exp(_w[1] * _x) + np.exp(_w[2] * _x))
hypos["C"] = lambda _w, _x: \
        (np.exp(_w[2] * _x)) / \
        (np.exp(_w[0] * _x) + np.exp(_w[1] * _x) + np.exp(_w[2] * _x))
        
# Cost
# Label "B" is our concern, so the base probability is for "B".
# This example assumes "A" and "C" each take half of the remaining probability.
cost = lambda _hypo, _y : \
        - _y[0] * np.log((1 - _hypo) / 2) \
        - _y[1] * np.log((_hypo)) \
        - _y[2] * np.log((1 - _hypo) / 2)

# Cost and sum of cost
def calc_cost(w, X, Y, label):
    W = [w for i in range(len(X))]
    _hypo = list(map(hypos[label], W, X))
    _sum = sum(list(map(cost, _hypo, Y))) / len(W) 
        
    return _sum 

# Gradient
def calc_gradient(w, X, Y, label, idx):
    # Central difference approximation of the derivative
    # h is the small delta applied to w[idx]
    h = 1e-2
    
    # For partial derivatives
    tmp_val = w[idx]
        
    # Calculate forward values
    w[idx] = tmp_val + h
    fxh1 = calc_cost(w, X, Y, label)
    
    # Calculate backward values
    w[idx] = tmp_val - h
    fxh2 = calc_cost(w, X, Y, label)
    
    # Calculate the diff
    grad = (fxh1 - fxh2) / (2*h)
    
    w[idx] = tmp_val
    
    return grad

# Training
def train(w, X, label):
    Y = None
    idx = None
    
    if label == "A":
        Y = [A for i in range(len(X))]
        idx = 0
    elif label == "B":
        Y = [B for i in range(len(X))]
        idx = 1
    elif label == "C":
        Y = [C for i in range(len(X))]
        idx = 2
    else:
        print("Error for label")
        
    _cost = calc_cost(w, X, Y, label)
    _grad = calc_gradient(w, X, Y, label, idx)
    
    return _cost, _grad

# Input
X = [i * 0.01 for i in range(1, 900)]
# Label
A = [1.0, 0.0, 0.0]
B = [0.0, 1.0, 0.0]
C = [0.0, 0.0, 1.0]

# Weight
w = [0.5, 0.5, 0.5]
# Learning rate
lr = 0.01

costs = {}
costs["A"] = []
costs["B"] = []
costs["C"] = []
trials = [i for i in range(10)]

print("step   W[0]   W[1]   W[2]")
for t in trials:
    # For A
    cost_a, grad_a = train(w, X, "A")
    costs["A"].append(cost_a)
    
    # For B
    cost_b, grad_b = train(w, X, "B")
    costs["B"].append(cost_b)
    
    # For C
    cost_c, grad_c = train(w, X, "C")
    costs["C"].append(cost_c)
    
    # Update weight
    w[0] = w[0] - lr * grad_a
    w[1] = w[1] - lr * grad_b
    w[2] = w[2] - lr * grad_c
    
    print("   {0}   {1:3.2f}   {2:3.2f}   {3:3.2f}".format(t, w[0], w[1], w[2]))
    
    plt.plot(t, cost_a, "r^")
    plt.plot(t, cost_b, "g.")
    plt.plot(t, cost_c, "bv")

plt.plot(trials, costs["A"], "r", label="A, up triangle")
plt.plot(trials, costs["B"], "g", label="B, dot")
plt.plot(trials, costs["C"], "b", label="C, down triangle")
plt.xlabel("trials")
plt.ylabel("cost")
plt.grid()
plt.legend(numpoints=1, loc="best")
plt.show()
trials   W[0]   W[1]   W[2]
     0   0.48   0.53   0.48
     1   0.47   0.56   0.47
     2   0.46   0.58   0.46
     3   0.45   0.60   0.45
     4   0.44   0.62   0.44
     5   0.43   0.64   0.43
     6   0.42   0.66   0.42
     7   0.41   0.67   0.41
     8   0.41   0.69   0.41
     9   0.40   0.70   0.40
Image 3. Gradient descent for multinomial classification
