Softmax Classifier using TensorFlow on MNIST dataset with sample code
Install TensorFlow
!pip install tensorflow
Loading the MNIST dataset
Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We'll call the images "x" and the labels "y". Both the training set and the test set contain images and their corresponding labels; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("MNIST", one_hot=True)
Checking the dimensions of the MNIST train and test sets
print("number of data points :", mnist.train.images.shape[0], "number of pixels in each image :", mnist.train.images.shape[1])
Number of train data points : 55000 number of pixels in each image : 784
mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.
print("number of data points :", mnist.test.labels.shape[0], "length of the one hot encoded label vector :", mnist.test.labels.shape[1])
Number of test data points: 10000 length of the one hot encoded label vector : 10
Importing the libraries
If you want to assign probabilities to an object being one of several different things, softmax (multiclass logistic regression) is the natural choice, because softmax gives us a list of values between 0 and 1 that add up to 1. Even later on, when we train more sophisticated models, the final step will be a layer of softmax.
import tensorflow as tf
import numpy as np
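To make this concrete, here is a minimal NumPy sketch of softmax (an illustration only, not part of the model below): it exponentiates the scores and normalizes them so they sum to 1.

scores = np.array([2.0, 1.0, 0.1])
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs)        # approximately [0.659, 0.242, 0.099] -- valid probabilities
print(probs.sum())  # 1.0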
Defining Placeholders, Variables, predicted y and loss function
x = tf.placeholder(tf.float32, [None, 784])
x isn't a specific value. It's a placeholder. A placeholder can be imagined as a memory unit that we use to load various mini-batches of input data while training. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784] (here None means that a dimension can be of any length).
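As a toy illustration (the placeholder name demo is hypothetical and separate from the model), the same placeholder can be fed mini-batches of any size:

demo = tf.placeholder(tf.float32, [None, 784])
with tf.Session() as s:
    print(s.run(tf.shape(demo), feed_dict={demo: np.zeros((5, 784))}))    # [  5 784]
    print(s.run(tf.shape(demo), feed_dict={demo: np.zeros((100, 784))}))  # [100 784]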
W = tf.Variable(tf.zeros([784, 10]))  # weights: one column of 784 weights per class
b = tf.Variable(tf.zeros([10]))       # biases: one per class
We also need the weights and biases for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle them: Variable.
y = tf.nn.softmax(tf.matmul(x, W) + b)      # predicted y
y_ = tf.placeholder(tf.float32, [None, 10]) # actual y (one-hot labels)
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from our equation, where we had Wx, as a small trick to deal with x being a 2-D tensor with multiple inputs. We then add b, and finally apply tf.nn.softmax.
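A quick shape check (plain NumPy, with hypothetical arrays xb and Wb) shows why the order is matmul(x, W): a [batch, 784] input times a [784, 10] weight matrix yields 10 class scores per image.

xb = np.zeros((100, 784), dtype=np.float32)  # a batch of 100 flattened images
Wb = np.zeros((784, 10), dtype=np.float32)   # same shape as W
print(np.matmul(xb, Wb).shape)               # (100, 10) -- one score per class, per image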
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))  # tutorial for tf.reduce_sum: https://www.dotnetperls.com/reduce-sum-tensorflow
Defining the loss function (multi-class log-loss / cross-entropy): first, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ by the corresponding element of tf.log(y). Then tf.reduce_sum adds the elements along the second dimension (one sum per example), due to the reduction_indices=[1] parameter. Finally, tf.reduce_mean computes the mean over all the examples in the batch. A reduction is an operation that removes one or more dimensions from a tensor by performing certain operations across those dimensions.
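As a hand-worked example (plain NumPy, made-up numbers): for one image whose true label is class 2, the one-hot vector zeroes out every term except the log-probability the model assigned to class 2.

y_true = np.array([0., 0., 1.])          # one-hot label: class 2
y_pred = np.array([0.1, 0.2, 0.7])       # softmax output of the model
print(-np.sum(y_true * np.log(y_pred)))  # 0.3567 = -log(0.7)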
Defining the optimizer
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
#https://www.tensorflow.org/versions/r1.2/api_guides/python/train#Optimizers
In this case, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm with a learning rate of 0.05. What TensorFlow actually does here, behind the scenes, is add new operations to your computation graph which implement backpropagation and gradient descent. It then gives you back a single operation which, when run, performs one step of gradient descent training, slightly tweaking your variables to reduce the loss.
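Roughly, minimize is a shorthand for computing the gradients and then applying them; a sketch of the equivalent two-step form in the same TF 1.x API (the name train_step_alt is hypothetical):

opt = tf.train.GradientDescentOptimizer(0.05)
grads_and_vars = opt.compute_gradients(cross_entropy)  # list of (gradient, variable) pairs
train_step_alt = opt.apply_gradients(grads_and_vars)   # equivalent to opt.minimize(cross_entropy)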
Launching the model
#Now launch the model in an InteractiveSession
sess = tf.InteractiveSession()
We first have to create an operation to initialize the variables we created:
tf.global_variables_initializer().run()
# We run train_step feeding in the batches data to replace the placeholders
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
On each step of the loop, we get a "mini-batch" of one hundred random data points from our training set. Using small batches of random data is called stochastic training; in this case, stochastic gradient descent. Ideally, we would use all our data for every step of training, because that would give us a better sense of what we should be doing, but that is expensive. So, instead, we use a different subset every time. Doing this is cheap and has much of the same benefit.
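For example, each call to next_batch returns a fresh random mini-batch (the names bx and by are hypothetical; shapes shown for a batch of 100):

bx, by = mnist.train.next_batch(100)
print(bx.shape, by.shape)  # (100, 784) (100, 10)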
# https://stackoverflow.com/a/41863099
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
tf.argmax(input, axis=None, name=None, dimension=None) returns the index with the largest value across the given axis of a tensor.
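For instance, using the session created above (a small sketch with made-up vectors):

print(sess.run(tf.argmax([[0.1, 0.8, 0.1]], 1)))  # [1] -- index of the highest probability
print(sess.run(tf.argmax([[0., 0., 1.]], 1)))     # [2] -- position of the 1 in a one-hot label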
Plotting the error
# https://gist.github.com/greydanus/f6eee59eaf1d90fcb3b534a25362cea4
# https://stackoverflow.com/a/14434334
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
import time
def plt_dynamic(x, y, y_1, ax, colors=['b']):
    # plot the running train and test loss curves and redraw the figure in place
    ax.plot(x, y, 'b', label="Train Loss")
    ax.plot(x, y_1, 'r', label="Test Loss")
    if len(x) == 1:
        plt.legend()
    fig.canvas.draw()  # 'fig' is the global figure created in the cell below
Now summarizing everything in a single cell
training_epochs = 15
batch_size = 1000
display_step = 1
# softmax_cross_entropy_with_logits applies softmax internally, so it must be
# given the raw scores (logits), not the already-softmaxed y
logits = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
fig, ax = plt.subplots(1, 1)
ax.set_xlabel('epoch')
ax.set_ylabel('Softmax Cross Entropy loss')
xs, ytrs, ytes = [], [], []
for epoch in range(training_epochs):
    train_avg_cost = 0.
    test_avg_cost = 0.
    total_batch = int(mnist.train.num_examples / batch_size)
    # Loop over all batches
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        _, c = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y_: batch_ys})
        train_avg_cost += c / total_batch
        c = sess.run(cross_entropy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        test_avg_cost += c / total_batch
    xs.append(epoch)
    ytrs.append(train_avg_cost)
    ytes.append(test_avg_cost)
    plt_dynamic(xs, ytrs, ytes, ax)  # update the plot after every epoch
plt_dynamic(xs, ytrs, ytes, ax)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:", accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
The above plot shows the log-loss versus the number of epochs, where the blue line shows the train error and the red line shows the test error.
The accuracy obtained using TensorFlow on the MNIST dataset is about 90%.
===== Detailed code can be found at the GitHub link below =====
References:
- Applied AI (special thanks)
- https://www.tensorflow.org/
- Google Images
======================================