Classifying fashion apparels- Getting started with Computer Vision

Create a model to classify images of fashion apparels.

Posted by Navendu Pottekkat on May 21, 2020

In this guide, you will train a neural network model to classify images of clothing such as shirts, coats and sneakers.

Whew! That sounds like a lot for a beginner tutorial. I mean, we are just getting started, right?

Not to worry! Don’t get overwhelmed; it’s okay if you don’t understand all the details. You will pick them up as you go deeper into the article, trust me :) .

If you are totally new to machine learning, I would suggest you check out my beginner tutorial.

With that said, let’s get started!

The Data

We will be using the Fashion-MNIST dataset. It comprises 70,000 square (28x28 pixel) grayscale images of 10 types of clothing: 60,000 for training and 10,000 for testing.

Each apparel item is assigned a particular label:

  • 0- T-shirt/top
  • 1- Trouser
  • 2- Pullover
  • 3- Dress
  • 4- Coat
  • 5- Sandal
  • 6- Shirt
  • 7- Sneaker
  • 8- Bag
  • 9- Ankle boot
The Fashion-MNIST dataset with 10 different classes of fashion apparels.
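Since the dataset ships its labels only as integers, it is handy to keep a small lookup of human-readable names. A minimal sketch (the `class_names` list is our own helper, matching the table above; it is not part of the dataset itself):

```python
# The dataset stores each label as an integer from 0 to 9.
# This list maps a label to its human-readable class name,
# following the table above. (It is our own helper, not
# shipped with the dataset.)
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

print(class_names[9])  # prints "Ankle boot"
```

You can use this list later to label plots with class names instead of bare integers.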

Let’s get to the code

We will use TensorFlow and TensorFlow Keras for building our model.

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.

Keras is TensorFlow's high-level API for building and training deep learning models.

You can read more about them in their documentation. Getting a basic idea of the tools is enough for now, as you will learn more about TensorFlow and Keras as you go along.

As you will see in the code, they are very intuitive and user-friendly, and you will be able to build machine learning models on the fly.

Less talking, more code!!!

Importing the libraries

We will use Google Colab as our environment to write code.

Open up a new Colab Notebook. The code shown in each section can be run in a new cell. If you are not entirely sure what Colab is, or if you have never worked with Jupyter Notebooks before, I would suggest you check out the introductory notebook to get started.

We will also use NumPy and Matplotlib as helper libraries.

            
# Import the necessary libraries

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import random

Importing the data

The Fashion-MNIST data is readily available in Keras datasets.


# Import the data from keras.dataset

(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
          

This will load the fashion_mnist data into 4 NumPy arrays:

  • The train_images and train_labels arrays are the training set—the data the model uses to learn.
  • The model is tested against the test set, the test_images, and test_labels arrays.

Exploring the data

The following code shows that there are 60,000 training images and 10,000 test images of 28x28 pixels. We will train the model on the training images and test its performance by making predictions on the test images. The images have been labelled correspondingly in train_labels and test_labels.


# Check out the shape of the dataset

print("Train Images Shape:",train_images.shape)
print("Train Labels Shape:",train_labels.shape)
print("Test Images Shape:",test_images.shape)
print("Test Labels Shape:",test_labels.shape)
          

Train Images Shape: (60000, 28, 28)
Train Labels Shape: (60000,)
Test Images Shape: (10000, 28, 28)
Test Labels Shape: (10000,)
          

Now let’s take a look at the data we just loaded.


# Check out the loaded images!

fig, ax = plt.subplots(2, 2)

ax[0][0].imshow(train_images[1])
ax[0][1].imshow(train_images[1000])
ax[1][0].imshow(train_images[22000])
ax[1][1].imshow(train_images[44000])
plt.show()
          

Since the pixel values lie between 0 and 255, we scale them to values between 0 and 1, i.e. we simply divide the pixel values by 255.0.


# We normalise the loaded images
# i.e we convert the pixel values which lie between 0-255 into values between 0-1

train_images = train_images/255.0
test_images = test_images/255.0
              

Creating the model

Building a neural network requires configuring the layers of the model and then compiling the model.

Layers are the basic building blocks of a neural network. They extract features or representations from the data that is fed into them. After training, these features would help us solve the problem at hand - classifying fashion apparels.

Here we will chain together some simple layers to create our model.


# We are building a sequential model

model = tf.keras.models.Sequential()

# Add the input layer

model.add(tf.keras.layers.Flatten(input_shape=(28,28)))

# Add a hidden layer

model.add(tf.keras.layers.Dense(units=128, activation="relu"))

# Add an output layer

model.add(tf.keras.layers.Dense(units=10))
          

The first layer of the network, tf.keras.layers.Flatten, transforms each image, which is a 2D array (of 28x28 pixels), into a 1D array (of size 28*28 = 784). It basically takes the input image and lines up each row of pixels back to back. This layer only transforms the data; it has no parameters to learn.
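To get a feel for what Flatten does, here is the same transformation reproduced with plain NumPy (a sketch of the idea, not how Keras implements it internally):

```python
import numpy as np

# A fake 28x28 "image" standing in for one Fashion-MNIST sample
image = np.arange(28 * 28).reshape(28, 28)

# Flatten lines the 28 rows up back to back into one vector of 784 values
flat = image.reshape(-1)

print(image.shape)  # (28, 28)
print(flat.shape)   # (784,)
```

The first element of the second row of `image` ends up at position 28 of `flat`, which is exactly the "rows lined up back to back" behaviour described above.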

Once the input images have been transformed by the Flatten layer, the network then has two tf.keras.layers.Dense layers.

These are, well, densely connected (or fully connected) layers.

Densely connected layers

The first Dense layer has 128 neurons and the second Dense layer, which is the last layer of our network, has 10 neurons. This last layer is our output layer, which provides the output of the model. Each of its 10 nodes outputs a score indicating how strongly the model believes the current image belongs to the corresponding class. (Remember, there are 10 classes of apparel in our data.)
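A small aside: since our last Dense layer has no activation, its 10 outputs are raw scores (logits) rather than probabilities. Applying a softmax turns them into probabilities that sum to 1. A minimal NumPy sketch with made-up logits:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalise
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

# Ten made-up output scores, one per class
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 0.5, -0.3, 1.5, 0.2, -2.0])
probs = softmax(logits)

print(probs.sum())       # 1.0 (up to floating point)
print(np.argmax(probs))  # 0 -> the class with the largest logit
```

Note that softmax does not change which class has the largest score, which is why taking the argmax of the raw logits, as we do later, gives the same prediction.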

Compile the model

We are almost ready to train our model! Before that we have to configure a few more settings.

Loss function: This measures how far off the model's predictions are during training. You want to minimize this function to "steer" the model in the right direction, i.e. the model tries to reduce the loss with each training step.

Optimizer: Optimizers update the weight parameters to minimize the loss function.

Metrics: A metric is a function that is used to judge the performance of your model. Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. The following model uses accuracy, the fraction of the images that are correctly classified.
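To make "accuracy" concrete, here is the same calculation done by hand on a handful of made-up labels:

```python
import numpy as np

# Made-up true labels and model predictions for 5 images
true_labels = np.array([0, 3, 9, 1, 4])
predicted   = np.array([0, 3, 8, 1, 4])

# Accuracy is the fraction of predictions that match the true labels
accuracy = np.mean(true_labels == predicted)
print(accuracy)  # 0.8 -> 4 out of 5 correct
```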


model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), optimizer="adam", metrics=["accuracy"])
          

You don't need to know all the details about the sparse_categorical_crossentropy loss or the adam optimizer for now. You can check out the docs if you want to learn more. For now, a grasp of what loss functions and optimizers are is enough.
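If you are curious anyway: for a single example, sparse categorical cross-entropy boils down to the negative log of the probability the model assigns to the true class. A tiny NumPy sketch with made-up probabilities (the "sparse" part means the label is a plain integer rather than a one-hot vector):

```python
import numpy as np

# Probabilities a hypothetical model assigns to the 10 classes
probs = np.array([0.05, 0.05, 0.6, 0.05, 0.05,
                  0.05, 0.05, 0.04, 0.04, 0.02])
true_label = 2  # an integer label, not a one-hot vector

# Cross-entropy: penalise the model by -log(probability of the true class)
loss = -np.log(probs[true_label])
print(round(loss, 4))  # 0.5108
```

The more probability the model puts on the correct class, the closer the loss gets to zero.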

Train the model

For training our model, we simply feed the model our training data and labels contained in train_images and train_labels respectively.

We call the model.fit method to “fit” the model to the training data.


model.fit(train_images, train_labels, epochs=10)
  

Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3768 - accuracy: 0.8636
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3394 - accuracy: 0.8762
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3145 - accuracy: 0.8851
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2965 - accuracy: 0.8902
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2818 - accuracy: 0.8957
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2698 - accuracy: 0.9002
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2582 - accuracy: 0.9043
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2495 - accuracy: 0.9074
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2409 - accuracy: 0.9095
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2324 - accuracy: 0.9137
          

We can see the loss and accuracy metrics displayed as the model is being trained. As the model trains, the loss decreases and accuracy increases. Kudos! Your model is learning!

Evaluate the model

The model reaches about 91% (0.91) accuracy on the training data. Your values may be slightly different (not to worry, as training involves some randomness).

But that is not enough! We still haven't tested the model. We will now test our model on our test data, which the model has never seen before! Let’s see how it performs.


test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('\nTest accuracy:', test_acc)
          

313/313 - 1s - loss: 0.3366 - accuracy: 0.8838

Test accuracy: 0.8838000297546387
          

It turns out that the accuracy on the test dataset is a little lower than on the training dataset. This gap could mean that our model is overfitting the training data. We will not worry about that now; in future articles, we will discuss what causes overfitting and how to prevent it.

Making Predictions

Finally! We can now use our model to make predictions on images. Here we have a function to plot 100 random test images and their predicted labels. If a prediction differs from the label provided in the test_labels array, we will highlight it in red.


# A helper function that returns 'red'/'black' depending on whether its
# two input parameters match.

def get_label_color(val1, val2):
  if val1 == val2:
    return 'black'
  else:
    return 'red'

# Predict the labels of the apparel images in our test dataset.

predictions = model.predict(test_images)

# The model outputs 10 scores per image, one for each of the 10 classes.
# The index of the largest score is the class the model considers most
# likely for that image.

predicted_labels = np.argmax(predictions, axis=1)

# Then plot 100 random test images and their predicted labels.
# If a prediction differs from the label provided in the test
# dataset, we highlight it in red.

plt.figure(figsize=(18, 18))
for i in range(100):
  ax = plt.subplot(10, 10, i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  image_index = random.randint(0, len(predicted_labels) - 1)
  plt.imshow(test_images[image_index], cmap=plt.cm.gray)
  ax.xaxis.label.set_color(get_label_color(predicted_labels[image_index],
                                           test_labels[image_index]))
  plt.xlabel('Predicted: %d' % predicted_labels[image_index])
plt.show()
          

Wow! You have done it! You have successfully created a model that can look at images of fashion apparel and classify them with good accuracy! When you think about it, all it took was a few lines of code.

We see a few errors, but for our first model, things are looking pretty good!

The completed Colab Notebook is available here.

With this new knowledge of TensorFlow, Keras and machine learning in general, you will be able to create your own models for a wide variety of datasets. Moreover, the tools and techniques you learned here are the foundations of complex models used in practice.

In the coming tutorial, we will take a look at Convolutional Neural Networks, a type of neural network widely used for computer vision applications. We will see that CNNs can improve our model's accuracy even further.

Happy Coding!