10. Cost Function for Binary Classification

Cost function and gradient descent for binary classification



Mean Square Error for Binary Classification

  • MSE (Mean Square Error) is the representative cost function, and it is used for linear regression.
  • Let's apply MSE to binary classification.
import numpy as np
import matplotlib.pyplot as plt

# Simplified hypothesis - sigmoid of w * x
# (it reads the weight w from the loop below)
hypo = lambda _x : 1 / (1 + np.exp(-w * _x))

# Input
X = [i * 0.01 for i in range(-200, 200)]

# Answer - if x is 0 or less, y is 0.
#  If x is larger than 0, y is 1.
Y = [i > 0 for i in range(-200, 200)]

# Mean square error for a single sample
cost = lambda _output, _answer : (_output - _answer) ** 2
costs = []

# Weights to evaluate
W = [i * 0.1 for i in range(-1000, 1001)]

for w in W:
    _hypo = list(map(hypo, X))
    diffSqrts = list(map(cost, _hypo, Y))
    sumDiffSqrt = sum(diffSqrts)

    # Average over the number of samples
    costs.append(sumDiffSqrt / len(X))

# Draw cost function
plt.plot(W, costs)
plt.title("Mean Square Error")
plt.xlabel("W")
plt.ylabel("Cost(W)")
plt.show()
Image 1. MSE for binary classification
  • Unlike our expectation, the MSE cost for binary classification is almost the reverse of a sigmoid function.
  • For this graph, it is meaningless to look for the global minimum: the curve is flat almost everywhere, so the gradient gives no useful direction (the sketch after this list illustrates it).
  • Therefore, MSE cannot be used as the cost function for binary classification.
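  • The sketch below (the array names and the chosen weights are mine) measures the numerical slope of this MSE cost at a few weights; on the flat regions the slope is nearly zero, so gradient descent barely moves the weight.
import numpy as np

# Same data set as above, as numpy arrays
X = np.arange(-200, 200) * 0.01
Y = (X > 0).astype(float)

# MSE cost of the sigmoid hypothesis for a given weight w
def mse_cost(w):
    H = 1 / (1 + np.exp(-w * X))
    return np.mean((H - Y) ** 2)

# Numerical slope of the cost at several weights
eps = 1e-3
for w in (-50.0, -1.0, 1.0, 50.0):
    slope = (mse_cost(w + eps) - mse_cost(w - eps)) / (2 * eps)
    print("w = {0:6.1f}, slope = {1:.6f}".format(w, slope))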

Cross Entropy Error(CEE) for Binary Classification

  • For binary classification, CEE is used as a cost function.
  • The simplified CEE for binary classification is defined as follows.

$$ H(x) = \frac{1}{1 + e^{-wx}} $$ $$ c(H(x), y) = \begin{cases} -\log(H(x)) & : y = 1 \\ -\log(1-H(x)) & : y = 0 \end{cases} $$ $$ c(H(x), y) = -y \cdot \log(H(x)) - (1-y) \cdot \log(1-H(x)) $$ $$ cost(w) = \frac{1}{m} \sum_{i=1}^{m} c(H(x_i), y_i) $$
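  • As a quick check of the last formula, here is a minimal vectorized sketch of cost(w) over the data set used in the other snippets of this post (the helper names hypo, cee, and cost are mine).
import numpy as np

# Data set used in the other snippets: x in [-2, 2), y = 1 where x > 0
X = np.arange(-200, 200) * 0.01
Y = (X > 0).astype(float)

hypo = lambda w, x: 1 / (1 + np.exp(-w * x))                   # sigmoid hypothesis H(x)
cee = lambda h, y: -y * np.log(h) - (1 - y) * np.log(1 - h)    # per-sample c(H(x), y)

cost = lambda w: np.mean(cee(hypo(w, X), Y))                   # cost(w) = (1/m) * sum

print(cost(0.6))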

import numpy as np
import matplotlib.pyplot as plt

# Result of simplified hypothesis - sigmoid outputs between 0 and 1
H = [i * 0.001 for i in range(1, 1000)]

# Answer - if x is 0 or less, y is 0.
#  If x is larger than 0, y is 1.
Y = [i > 0 for i in range(-100, 100)]

# Cross Entropy Error - h is the hypothesis value from the loop below
cost0 = lambda _y : -(1 - _y) * np.log(1 - h)
cost1 = lambda _y : -_y * np.log(h)
cost = lambda _y : -(1 - _y) * np.log(1 - h) - _y * np.log(h)

# Lists for outputs
costs0 = []
costs1 = []
costs = []

for h in H:
    # For y = 0
    errors0 = list(map(cost0, Y))
    costs0.append(sum(errors0) / len(Y))

    # For y = 1
    errors1 = list(map(cost1, Y))
    costs1.append(sum(errors1) / len(Y))

    # For both
    errors = list(map(cost, Y))
    costs.append(sum(errors) / len(Y))

# Graphs
plt.plot(H, costs0, label="y = 0")
plt.plot(H, costs1, label="y = 1")
plt.plot(H, costs, label="y = 0 or 1")

plt.title("Cross Entropy Error")
plt.xlabel("H")
plt.ylabel("Cost(H)")
plt.xlim(-0.1, 1.1)
plt.grid()
plt.legend(loc="upper center")
plt.show()
Image 2. CEE for binary classification
  • To keep it simple, the x axis is the output of the hypothesis.
  • The output of the hypothesis comes from the sigmoid function, so it lies between 0 and 1.
  • When y = 1, the error gets smaller as h gets bigger; if h is 1, the error is 0.
  • The error for y = 0 shows the reversed shape: it gets bigger as h gets bigger.
  • Therefore, if the hypothesis is close to the answer y, the error is small. If not, the error approaches infinity, as the quick check after this list shows.
  • This is a good property for a cost function.
  • Furthermore, the sum of the y = 0 and y = 1 curves looks like a quadratic curve with a single minimum.
  • As a result, the gradient descent algorithm is also effective for finding the global minimum in binary classification.
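  • A quick numeric check of that behaviour (the chosen values of h are arbitrary):
import numpy as np

# CEE for a single sample: small when h matches y, huge when it does not
cee = lambda h, y: -y * np.log(h) - (1 - y) * np.log(1 - h)

print(cee(0.99, 1))   # ~0.01 : h is close to the answer y = 1
print(cee(0.01, 1))   # ~4.61 : h is far from the answer y = 1
print(cee(0.01, 0))   # ~0.01 : h is close to the answer y = 0
print(cee(0.99, 0))   # ~4.61 : h is far from the answer y = 0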

Gradient Descent for Binary Classification

  • The update rule is the same as for linear regression.

$$ W = W - \alpha {\partial \over\partial W} cost(W) $$

import numpy as np
import matplotlib.pyplot as plt

# Hypothesis
hypo = lambda _w, _x: 1 / (1 + np.exp(-_w * _x))
# Cost
cost = lambda _hypo, _y : -(1-_y) * np.log(1 - _hypo) -_y * np.log(_hypo)
# Gradient
def cost_gradient(w, X, Y):
    # Numerical derivative by the central difference method
    # h is a small delta of w
    h = 1e-2

    # Keep the original weight value
    tmp_val = w

    # Calculate forward values
    W = [tmp_val + h for i in range(len(X))]
    _hypo = list(map(hypo, W, X))
    fxh1 = sum(list(map(cost, _hypo, Y))) / len(W)

    # Calculate backward values
    W = [tmp_val - h for i in range(len(X))]
    _hypo = list(map(hypo, W, X))
    fxh2 = sum(list(map(cost, _hypo, Y))) / len(W)

    # Calculate the diff
    grad = (fxh1 - fxh2) / (2*h)

    return grad

# Input
X = [i * 0.01 for i in range(-200, 200)]
# Answer
Y = [i > 0 for i in range(-200, 200)]
# Weight
w = 0.6
# Learning rate
lr = 0.1

costs = []
trials = [i for i in range(10)]

for t in trials:
    W = [w for i in range(len(X))]
    _hypo = list(map(hypo, W, X))
    _cost = sum(list(map(cost, _hypo, Y))) / len(_hypo)
    costs.append(_cost)

    # Plot the cost of the current weight before it is updated
    plt.plot(t, _cost, "o", label="w = {0:4.3f}".format(w))

    grad = cost_gradient(w, X, Y)
    w = w - lr * grad

plt.plot(trials, costs)
plt.xlabel("trials")
plt.ylabel("cost")
plt.grid()
plt.legend(numpoints=1, loc="upper right", ncol=2)
plt.show()
Image 3. Gradient descent for binary classification
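  • The numerical gradient above works, but for the sigmoid hypothesis with CEE the derivative also has a closed form: the mean of (H(x) - y) * x. Below is a minimal sketch of the same training loop with that analytic gradient (the array names are mine).
import numpy as np

# Same data set as above, as numpy arrays
X = np.arange(-200, 200) * 0.01
Y = (X > 0).astype(float)

hypo = lambda w, x: 1 / (1 + np.exp(-w * x))

# Analytic gradient of the CEE cost: d cost / d w = mean((H(x) - y) * x)
def cost_gradient(w, X, Y):
    return np.mean((hypo(w, X) - Y) * X)

w = 0.6
lr = 0.1
for t in range(10):
    w = w - lr * cost_gradient(w, X, Y)

print(w)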
