Training Data Set

After training a model on a set of data, we want to know how well the model works.
To measure the accuracy of the model, we can test it with data.
However, if all of the data was already used for training, the model may simply have memorized the answers, so the measured accuracy can be close to 100%.
That number does not reflect how the model performs on new data.
Therefore, the test should use data that was not used for training; this data set is called the testing data.
When we are given a data set, we should split it into training data and testing data.
Sometimes the training data is split once more. The new set is the validation data, which is used as a mock test during training. - InTech
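
The split described above can be sketched in plain Python. This is only a minimal illustration, not the method used later in this post: the function name `split_data`, the 20% ratios, and the fixed seed are all assumptions chosen for the example.

```python
import random

def split_data(samples, test_ratio=0.2, val_ratio=0.2, seed=0):
    """Shuffle samples and split them into training, validation and testing lists.

    The ratios and the seed are illustrative assumptions, not values from the post.
    """
    shuffled = list(samples)
    # Shuffle a copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    n_val = int(len(shuffled) * val_ratio)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

samples = list(range(10))
train, validation, test = split_data(samples)
print(len(train), len(validation), len(test))  # 6 2 2
```

Every sample lands in exactly one of the three lists, so the testing data is never seen during training.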

Image 1. Sets

Online Machine Learning

A method in which data becomes available in a sequential order and is used to update our best predictor for future data at each step. - Wiki
Sometimes the given data set is too large to train on at once, or an already trained model needs to be updated. Online machine learning helps in these situations.
Training consists of several steps, and the model produced by each training step is available for inference.
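
As a rough sketch of this idea, the example below updates a one-variable linear model one sample at a time. The model `y_hat = w * x + b`, the helper names `online_update` and `data_stream`, the synthetic data, and the learning rate are all assumptions made for illustration.

```python
def online_update(w, b, x, y, learning_rate=0.1):
    """Apply one gradient descent step for a single (x, y) sample.

    The model is y_hat = w * x + b with squared error loss.
    """
    error = (w * x + b) - y
    w -= learning_rate * error * x
    b -= learning_rate * error
    return w, b

def data_stream():
    """Hypothetical stream: yields samples of y = 2x + 1, one at a time."""
    for step in range(2000):
        x = (step % 10) / 10.0
        yield x, 2.0 * x + 1.0

w, b = 0.0, 0.0
for x, y in data_stream():
    # The model (w, b) can be used for inference after every single update
    w, b = online_update(w, b, x, y)

print(round(w, 2), round(b, 2))
```

The whole data set is never held in memory at once; each sample is used for one update and then discarded.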

Accuracy

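The percentage of predictions that are correct. Note that the TensorFlow examples in this post print accuracy as a fraction between 0 and 1 (for example 1.0), without the factor of 100.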

$$ \text{Accuracy} = \frac{\text{number of correct answers}}{\text{number of test instances}} \times 100 $$
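
The formula can be written as a small Python function. The name `accuracy_percent` and the sample predictions are made up for this example.

```python
def accuracy_percent(predictions, labels):
    """Return the percentage of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one test instance is required")
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels) * 100

# Three of the four hypothetical predictions match their labels
print(accuracy_percent([2, 2, 1, 0], [2, 2, 2, 0]))  # 75.0
```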

Training Data Set & Testing Data Set

From the given data, some will be used as training data, and the rest will be used as testing data.

import tensorflow as tf
import matplotlib.pyplot as plt

# Training data
x_data = [[1,2,1], [1,3,2], [1,3,4], [1,5,5], 
          [1,7,5], [1,2,5], [1,6,6], [1,7,7]]
y_data = [[0,0,1], [0,0,1], [0,0,1], [0,1,0], 
          [0,1,0], [0,1,0], [1,0,0], [1,0,0]]

# Testing data
x_test = [[2,1,1], [3,1,2], [3,3,4]]
y_test = [[0,0,1], [0,0,1], [0,0,1]]

# Placeholder for inputs and labels
X = tf.placeholder("float", [None, 3])
Y = tf.placeholder("float", [None, 3])

# Weight
W = tf.Variable(tf.random_normal([3, 3]))
# Bias
b = tf.Variable(tf.random_normal([3]))

# Hypothesis
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
# Cost function
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
# Optimizer
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=0.1).minimize(cost)

# Prediction
prediction = tf.argmax(hypothesis, 1)
is_correct = tf.equal(prediction, tf.argmax(Y, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Launch graph
with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())
    
    steps = list(range(201))
    costs = []
    for i in steps:
        # Run one gradient descent step and record the cost
        cost_val, _ = sess.run([cost, optimizer],
            feed_dict={X: x_data, Y: y_data})
        costs.append(cost_val)
    
    # Predict
    predictionResult = sess.run(prediction,
                            feed_dict={X: x_test})
    print("Prediction")
    for i, predicted in enumerate(predictionResult):
        print("x_test[{0}]: {1}".format(i, predicted))
    print()
    # Calculate the accuracy
    print("Accuracy: ", sess.run(accuracy,
                    feed_dict={X: x_test, Y: y_test}))
    
    # Plot
    plt.plot(steps, costs)
    plt.xlabel("trials")
    plt.ylabel("cost")
    plt.title("Costs")
    plt.show()

Prediction
x_test[0]: 2
x_test[1]: 2
x_test[2]: 2

Accuracy:  1.0

Image 2. Training and Testing sets
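
To make the hypothesis, cost, and prediction lines in the listing above more concrete, here is a plain-Python sketch of what softmax, cross-entropy, and argmax compute for a single example. The helper names and the input scores are hypothetical, and subtracting the maximum score before exponentiating is an extra numerical-stability step chosen for this sketch.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, probabilities):
    """Cost for one example: -sum(y * log(p)) over the classes."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probabilities))

def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])

scores = [1.0, 2.0, 3.0]  # hypothetical output of X * W + b for one example
probs = softmax(scores)
print(argmax(probs))  # 2
print(cross_entropy([0, 0, 1], probs) < cross_entropy([1, 0, 0], probs))  # True
```

The cost is small when the true class receives a high probability and large otherwise, which is why minimizing it pushes the prediction toward the correct label.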

MNIST Database

A large database of handwritten digits that is commonly used for training various image processing systems. - Wiki
Each image shows one of the digits 0 ~ 9.
Each image is 28 x 28 x 1 (a single channel), in greyscale. - TensorFlow

Image 3. One example of MNIST data
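
Before a simple model can use such an image, the 28 x 28 grid is usually flattened into a vector of 784 values, and the digit label is one-hot encoded. The sketch below shows both steps in plain Python; the helper names `flatten` and `one_hot` and the blank placeholder image are assumptions for illustration.

```python
def flatten(image):
    """Flatten a 2-D grid of pixels (list of rows) into one long list."""
    return [pixel for row in image for pixel in row]

def one_hot(label, nb_labels=10):
    """Encode a digit label as a vector with a single 1.0 at index `label`."""
    if not 0 <= label < nb_labels:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(nb_labels)]

# Placeholder blank image standing in for a real MNIST digit
image = [[0.0] * 28 for _ in range(28)]
vector = flatten(image)
print(len(vector))  # 784
print(one_hot(3).index(1.0))  # 3
```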

MNIST Training with TensorFlow

import tensorflow as tf
# TensorFlow 1.x already includes the MNIST data set
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt
import random

# Get input data as one_hot encoding format
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Labels: 0 ~ 9
nb_labels = 10

# MNIST data image of shape 28 x 28 = 784
X = tf.placeholder(tf.float32, [None, 784])
# 0 ~ 9 digit recognition = 10 labels
Y = tf.placeholder(tf.float32, [None, nb_labels])
# Weight
W = tf.Variable(tf.random_normal([784, nb_labels]))
# Bias
b = tf.Variable(tf.random_normal([nb_labels]))

# Hypothesis - softmax
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
# Cost
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
# Optimizer
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=0.1).minimize(cost)

# Test model
is_correct = tf.equal(tf.argmax(hypothesis, 1),
                      tf.argmax(Y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Epoch - how many times the whole training set is passed through
training_epochs = 15
# Batch size - how many examples are used in one training step
batch_size = 100

with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())
    
    # training cycle
    for epoch in range(training_epochs):
        avg_cost = 0
        # Iterations per epoch = (number of examples) / (batch size)
        max_iteration = mnist.train.num_examples // batch_size
        
        for itr in range(max_iteration):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            c, _ = sess.run([cost, optimizer],
                        feed_dict={X: batch_xs, Y: batch_ys})
            # Accumulate the mean cost over all batches of this epoch
            avg_cost += c / max_iteration
            
        print("Epoch: {0:4d}, Cost: {1:0.9f}".format(
                        epoch + 1, avg_cost))
        
    print("Learning finished")
    
    # Test the model using the test set
    # accuracy.eval(session=sess, ...) is equivalent to sess.run(accuracy, ...)
    print("Accuracy: ", accuracy.eval(session=sess,
                        feed_dict={X: mnist.test.images,
                                   Y: mnist.test.labels}))
    
    # Pick one random test example and predict it
    r = random.randint(0, mnist.test.num_examples - 1)
    # mnist.test.labels are one-hot encoded
    print("Label: {0}".format(
            sess.run(tf.argmax(mnist.test.labels[r:r+1], 1))))
    print("Prediction: {0}".format(
            sess.run(tf.argmax(hypothesis, 1),
            feed_dict={X: mnist.test.images[r:r+1]})))
    plt.imshow(mnist.test.images[r:r+1].reshape(28, 28),
               cmap="Greys", interpolation="nearest")
    plt.show()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Epoch:    1, Cost: 2.591440395
Epoch:    2, Cost: 1.119776924
Epoch:    3, Cost: 0.898812341
Epoch:    4, Cost: 0.786791039
Epoch:    5, Cost: 0.714991944
Epoch:    6, Cost: 0.664041839
Epoch:    7, Cost: 0.625205653
Epoch:    8, Cost: 0.595033390
Epoch:    9, Cost: 0.569254169
Epoch:   10, Cost: 0.548133162
Epoch:   11, Cost: 0.529921867
Epoch:   12, Cost: 0.514021380
Epoch:   13, Cost: 0.499853914
Epoch:   14, Cost: 0.487327398
Epoch:   15, Cost: 0.476091112
Learning finished
Accuracy:  0.8894
Label: [0]
Prediction: [0]

Image 4. Prediction for MNIST
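
The relation between epochs, batch size, and iterations in the listing above can be checked with a little arithmetic. The sketch below assumes that `mnist.train.num_examples` is 55000, which is the default training split of this loader as far as I know; the helper names are made up for the example.

```python
def iterations_per_epoch(num_examples, batch_size):
    """Number of full batches needed to pass over the data once."""
    return num_examples // batch_size

def average_cost(batch_costs):
    """Mean cost of one epoch, accumulated the same way as avg_cost above."""
    max_iteration = len(batch_costs)
    avg = 0.0
    for c in batch_costs:
        avg += c / max_iteration
    return avg

num_examples = 55000   # assumed size of the MNIST training split
batch_size = 100
training_epochs = 15

per_epoch = iterations_per_epoch(num_examples, batch_size)
print(per_epoch)                    # 550
print(per_epoch * training_epochs)  # 8250
print(average_cost([1.0, 2.0, 3.0]))
```

Under that assumption, each epoch runs 550 training steps, so 15 epochs correspond to 8250 weight updates in total.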

Universe In Computer

17. Data Set & MNIST with Tensorflow
