Multi-variable linear regression with TensorFlow
TOC
- Example
- Multi-variables without Matrix
- Multi-variables with Matrix
- Loading Data from File
- Queue Runner
Example
- Train on 4 test scores of 5 students: 3 input scores and 1 final score (the hypothesis is given below the table)
- Predict the final test score of other students
X1 | X2 | X3 | Y |
---|---|---|---|
73 | 80 | 75 | 152 |
93 | 88 | 93 | 185 |
89 | 91 | 90 | 180 |
96 | 98 | 100 | 196 |
73 | 66 | 70 | 142 |
Table 1. Scores of students
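The hypothesis used throughout this post is a weighted sum of the three input scores plus a bias; training estimates the weights and the bias:

H(x1, x2, x3) = x1*w1 + x2*w2 + x3*w3 + b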
Multi-variables without Matrix
import tensorflow as tf
import matplotlib.pyplot as plt
# Input data
x1_data = [73., 93., 89., 96., 73.]
x2_data = [80., 88., 91., 98., 66.]
x3_data = [75., 93., 90., 100., 70.]
# Answer data
y_data = [152., 185., 180., 196., 142.]
# Placeholders for tensors that will always be fed.
# Input tensor
x1 = tf.placeholder(tf.float32)
x2 = tf.placeholder(tf.float32)
x3 = tf.placeholder(tf.float32)
# Answer tensor
y = tf.placeholder(tf.float32)
# Weight
w1 = tf.Variable(tf.random_normal([1]), name="weight1")
w2 = tf.Variable(tf.random_normal([1]), name="weight2")
w3 = tf.Variable(tf.random_normal([1]), name="weight3")
# Bias
b = tf.Variable(tf.random_normal([1]), name="bias")
# Hypothesis
hypothesis = x1 * w1 + x2 * w2 + x3 * w3 + b
# cost function - MSE
cost = tf.reduce_mean(tf.square(hypothesis - y))
# Gradient descent (the learning rate is tiny because the input scores are large and unscaled)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)
# Launch the graph in a session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
stamps = { "costs" : [], "hypos" : [] }
# Train for 51 steps.
for i in range(51):
    # Training
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train], \
        feed_dict={x1: x1_data, x2: x2_data, x3: x3_data, y: y_data})
    stamps["costs"].append(cost_val)
    stamps["hypos"].append(hy_val)
trials = [i for i in range(51)]
for k, v in stamps.items():
    plt.plot(trials, v)
    plt.title(k)
    plt.xlabel("trials")
    plt.ylabel(k)
    plt.grid()
    plt.show()
Image 1. Costs
Image 2. Hypothesis
Multi-variables with Matrix
- The input data, answer data, and hypothesis should be transformed into matrix form.
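In matrix form the same hypothesis collapses into a single multiplication, H = XW + b, where X is the 5 x 3 input matrix, W is a 3 x 1 weight matrix, and the bias b is broadcast across the rows. This is exactly what tf.matmul computes in the code below.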
import tensorflow as tf
import matplotlib.pyplot as plt
# Input data - 5 x 3 matrix
x_data = [[73., 80., 75.],
          [93., 88., 93.],
          [89., 91., 90.],
          [96., 98., 100.],
          [73., 66., 70.]]
# Answer data - 5 x 1 matrix
y_data = [[152.], [185.], [180.], [196.], [142.]]
# Placeholders for tensors that will always be fed.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])
# [None, 3] means the number of rows is not limited, but there must be 3 columns
# (an inference example follows the cost plot below).
# Weight and bias
W = tf.Variable(tf.random_normal([3, 1]), name="weight")
b = tf.Variable(tf.random_normal([1]), name="bias")
# Hypothesis in matrix
hypothesis = tf.matmul(X, W) + b
# cost function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)
# Launch the graph in a session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
stamps = { "costs" : [] }
# Train for 51 steps.
for i in range(51):
    # Training
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train], \
        feed_dict={X: x_data, Y: y_data})
    stamps["costs"].append(cost_val)
# Because the hypothesis values are now two-dimensional, I skip the hypothesis plot.
trials = [i for i in range(51)]
for k, v in stamps.items():
    plt.plot(trials, v)
    plt.title(k)
    plt.xlabel("trials")
    plt.ylabel(k)
    plt.grid()
    plt.show()
Image 3. Costs
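Because the row dimension of X is None, the trained model accepts any number of rows at inference time. For example, a small sketch following the same pattern used in the next section (the scores here are made up):

# Predict the final score of a single new student (1 x 3 input).
print(sess.run(hypothesis, feed_dict={X: [[100., 70., 101.]]}))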
Loading Data from File
- Before doing this, I copied test data from HERE.
# Loading data from file
import numpy as np
import tensorflow as tf
# Load data from file
# Relative path is possible.
xy = np.loadtxt("./data-01-test-score.csv", delimiter=",", dtype=np.float32)
# The last column is the answer; the other columns are inputs.
# Split the loaded array into input and answer matrices
# (a note on the [-1] indexing follows the training output below).
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]
# Make sure the shape and data are OK
print("X Shape: {0}".format(x_data.shape))
print("Y Shape: {0}".format(y_data.shape))
# Placeholders for tensors that will always be fed.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])
# Weight and bias
W = tf.Variable(tf.random_normal([3,1]), name="weight")
b = tf.Variable(tf.random_normal([1]), name="bias")
# Hypothesis
hypothesis = tf.matmul(X, W) + b
# Simplified cost function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)
# Launch the graph in a session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print("")
for step in range(2001):
    # Training
    cost_val, hy_val, _ = \
        sess.run([cost, hypothesis, train], feed_dict={X: x_data, Y: y_data})
    if step % 100 == 0:
        print("Trial: {0}, Cost: {1}".format(step, cost_val))
print("")
# Test the hypothesis with the trained weight and bias
print("Your score will be", \
    sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))
print("Other scores will be", \
    sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))
X Shape: (25, 3)
Y Shape: (25, 1)
Trial: 0, Cost: 11846.107421875
Trial: 100, Cost: 106.76359558105469
Trial: 200, Cost: 98.2507095336914
Trial: 300, Cost: 90.45933532714844
Trial: 400, Cost: 83.32809448242188
Trial: 500, Cost: 76.80110931396484
Trial: 600, Cost: 70.82699584960938
Trial: 700, Cost: 65.35875701904297
Trial: 800, Cost: 60.3536262512207
Trial: 900, Cost: 55.772193908691406
Trial: 1000, Cost: 51.57862854003906
Trial: 1100, Cost: 47.73990249633789
Trial: 1200, Cost: 44.22602844238281
Trial: 1300, Cost: 41.0093994140625
Trial: 1400, Cost: 38.06480026245117
Trial: 1500, Cost: 35.369171142578125
Trial: 1600, Cost: 32.90144729614258
Trial: 1700, Cost: 30.642223358154297
Trial: 1800, Cost: 28.57388687133789
Trial: 1900, Cost: 26.68033218383789
Trial: 2000, Cost: 24.946613311767578
Your score will be [[ 165.50801086]]
Other scores will be [[ 162.13847351]
[ 189.32736206]]
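One detail worth noting in the slicing above: xy[:, [-1]] keeps the answer column two-dimensional so it matches the [None, 1] placeholder, while a plain xy[:, -1] would drop the column axis. A minimal NumPy sketch (the array values are made up for illustration):

import numpy as np

xy = np.array([[73., 80., 75., 152.],
               [93., 88., 93., 185.]], dtype=np.float32)

print(xy[:, 0:-1].shape)  # (2, 3) - every column except the last
print(xy[:, [-1]].shape)  # (2, 1) - last column, kept as a matrix
print(xy[:, -1].shape)    # (2,)  - the column axis is dropped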
Queue Runner
- For very large data sets or many files, TensorFlow provides QueueRunner.
- Data will be loaded on-demand by TensorFlow.
- Steps
  - Register multiple data files on the Queue runner
  - Read data with a Reader
  - Decode the data
  - Batch the data for training
  - Start the Queue runner
  - Train and infer
  - Close the Queue runner
Image 4. Queue runner
# Queue runners
import tensorflow as tf
# 1. Register multiple data files
filename_queue = tf.train.string_input_producer( \
    ['data-01-test-score.csv'], shuffle=False, name='filename_queue')
# 2. Read data with reader
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
# 3. Decode data
# Default values, in case of empty columns.
# They also specify the type of the decoded result.
# decode_csv() is used because the file is in CSV format.
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)
# 4. Batch data
# Collect batches of the decoded CSV columns and
# assign them to the input and answer batches.
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)
# Placeholders for tensors that will always be fed.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])
# Weight and bias
W = tf.Variable(tf.random_normal([3, 1]), name="weight")
b = tf.Variable(tf.random_normal([1]), name="bias")
# Hypothesis
hypothesis = tf.matmul(X, W) + b
# Simplified cost function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)
# Launch the graph in a session
sess = tf.Session()
# Initializes global variables in the graph
sess.run(tf.global_variables_initializer())
# 5. Start Queue runner
# Start populating the filename queue
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
for step in range(2001):
    # 6. Train
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train], \
        feed_dict={X: x_batch, Y: y_batch})
    if step % 100 == 0:
        print("Trial: {0}, Cost: {1}".format(step, cost_val))
# 7. Stop Queue runner
coord.request_stop()
coord.join(threads)
Trial: 0, Cost: 90474.3984375
Trial: 100, Cost: 29.027339935302734
Trial: 200, Cost: 26.71291160583496
Trial: 300, Cost: 24.60808563232422
Trial: 400, Cost: 22.694204330444336
Trial: 500, Cost: 20.95441436767578
Trial: 600, Cost: 19.373111724853516
Trial: 700, Cost: 17.93614387512207
Trial: 800, Cost: 16.63065528869629
Trial: 900, Cost: 15.44487190246582
Trial: 1000, Cost: 14.368145942687988
Trial: 1100, Cost: 13.3906831741333
Trial: 1200, Cost: 12.503578186035156
Trial: 1300, Cost: 11.6986665725708
Trial: 1400, Cost: 10.968642234802246
Trial: 1500, Cost: 10.306745529174805
Trial: 1600, Cost: 9.706822395324707
Trial: 1700, Cost: 9.163233757019043
Trial: 1800, Cost: 8.670954704284668
Trial: 1900, Cost: 8.225290298461914
Trial: 2000, Cost: 7.822013854980469
- For multiple files, add the file names to the filename list.
filename_queue = tf.train.string_input_producer( \
    ['data-01.csv', 'data-02.csv', ... ],
    shuffle=False, name='filename_queue')
- If you want to shuffle the batches, you can use shuffle_batch (a wiring sketch follows the snippet below).
# min_after_dequeue defines how big a buffer we will randomly sample
# from -- bigger means better shuffling,
# but slower start up and more memory used.
# capacity must be larger than min_after_dequeue, and the amount larger
# determines the maximum we will prefetch.
# Recommendation:
# min_after_dequeue + (num_threads + a small safety margin) * batch_size
min_after_dequeue = 1000
capacity = min_after_dequeue + 3 * batch_size
example_batch, label_batch = tf.train.shuffle_batch( \
    [example, label], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=min_after_dequeue)
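Wired into the pipeline above, shuffle_batch simply replaces the tf.train.batch call. A minimal sketch, assuming batch_size = 10 and the xy tensors returned by tf.decode_csv earlier:

batch_size = 10
min_after_dequeue = 1000
capacity = min_after_dequeue + 3 * batch_size

# Shuffled replacement for the tf.train.batch call above.
train_x_batch, train_y_batch = tf.train.shuffle_batch( \
    [xy[0:-1], xy[-1:]], batch_size=batch_size,
    capacity=capacity, min_after_dequeue=min_after_dequeue)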