Deep learning on edge devices - Introduction to TensorFlow Lite

Create a mobile app that uses ML to classify handwritten digits.

Posted by Navendu Pottekkat on May 24, 2020

As the adoption of machine learning models has grown over the past couple of years, so has the need to deploy them on mobile and embedded devices. Instead of sending data back and forth to a server, we need a viable, low-latency solution for performing inference on-device.

Back in 2017, Google introduced TensorFlow Lite, a set of tools to run TensorFlow models on mobile, embedded and IoT devices. It is designed to be lightweight, cross-platform and fast.

With the recent release of TensorFlow Lite Model Maker, which makes deploying machine learning models to end devices as easy as writing a few lines of code, and with over 4 billion edge devices worldwide across many different platforms, adding this tool to your belt grants you a ticket to the future of machine learning.

“Well, the future is strong in this one!”

TensorFlow Lite consists of two main components:

TensorFlow Lite converter - converts TensorFlow models into an efficient form for use by the interpreter, and can introduce optimizations to improve binary size and performance.

TensorFlow Lite interpreter - runs specially optimized models on many different hardware types, including mobile phones, embedded Linux devices, and microcontrollers.

Components of TensorFlow Lite

Building our TensorFlow Lite model

Now that we have an idea of how TensorFlow Lite works, we will build and deploy a model in an Android app. We will look at the different optimization techniques that the TensorFlow Lite converter provides as we code along.

Having a general idea of what TensorFlow Lite is will be enough to proceed, as we will take a closer look at things as we build and deploy our model.

We will be building a handwritten digit classifier app that takes handwritten input and uses an ML model to infer which digit was written.

We will start by building our digit-classification TensorFlow model. Next we will convert this trained model to TensorFlow Lite. The completed model is available in this Colab.

Although this is a simple app, you will learn all the basic concepts for using TensorFlow Lite, and you will be able to use that knowledge to build your own models.

Okay then, let’s take a look at our data!

The data

As you might have guessed, we will be using the MNIST dataset. You might have used this dataset before if you are familiar with computer vision.

The MNIST dataset contains 60,000 training images and 10,000 testing images of handwritten digits. We will use the dataset to train our digit classification model.

Each image in the MNIST dataset is a 28x28 grayscale image containing a digit from 0 to 9, and a label identifying which digit is in the image.
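If you want to take a quick look at the data yourself, MNIST ships with Keras; the sketch below is one way to load and normalize it (the Colab may do this slightly differently).

import tensorflow as tf
from tensorflow import keras

# Load MNIST: 60,000 training images and 10,000 test images of 28x28 pixels.
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()

# Normalize pixel values from [0, 255] to [0, 1].
train_images = train_images / 255.0
test_images = test_images / 255.0

print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)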

The model

We use the Keras API to build a TensorFlow model.

Here we will use a simple convolutional neural network (CNN). If you are not familiar with CNNs or Keras, I suggest checking out this article to get started with TensorFlow, Keras and CNNs. You can also follow along with this tutorial in this Colab.

We will then train our model on the MNIST “train” dataset. After the model is trained, we will be able to use it to classify the handwritten digits.
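For reference, here is a minimal sketch of such a model in Keras, matching the summary below; it reuses the arrays from the loading sketch above, and the number of epochs is an arbitrary choice (the complete, tested code is in the Colab).

# A simple CNN for digit classification; the layers match the summary below.
model = keras.Sequential([
    keras.layers.Reshape(target_shape=(28, 28, 1), input_shape=(28, 28)),
    keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
    keras.layers.Conv2D(64, kernel_size=3, activation='relu'),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train on the MNIST "train" split, then evaluate on the "test" split.
model.fit(train_images, train_labels, epochs=5)
model.evaluate(test_images, test_labels)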

            
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
reshape (Reshape)            (None, 28, 28, 1)         0
_________________________________________________________________
conv2d (Conv2D)              (None, 26, 26, 32)        320
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 64)        0
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0
_________________________________________________________________
dense (Dense)                (None, 10)                92170
=================================================================
Total params: 110,986
Trainable params: 110,986
Non-trainable params: 0
_________________________________________________________________

Summary of the model
            
313/313 [==============================] - 1s 2ms/step - loss: 0.0340 - accuracy: 0.9897
Test accuracy: 0.9897000193595886

Test results of the model

Converting the model to TensorFlow Lite

Now that we have trained our digit classifier model, we can convert it to the TensorFlow Lite format and deploy it to our Android app (we will do that at the end).


# Convert the Keras model to TF Lite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_float_model = converter.convert()

# Show model size in KBs.
float_model_size = len(tflite_float_model) / 1024
print('Float model size = %dKBs.' % float_model_size)


Float model size = 435KBs.

That is it! All it took was two lines of code. But, as you might have guessed, we need to make our model as small and as fast as possible before shipping it in our Android app.

We will use a common technique called quantization to shrink our model. We will approximate the 32-bit floating-point weights in our model with 8-bit numbers, which should reduce the model to about a quarter of its original size.

At inference time, the weights are converted from 8-bit precision back to floating point and computed using floating-point kernels. This conversion is done once and cached to reduce latency.


# Here we use 8-bit numbers to approximate our 32-bit weights,
# which shrinks the model size by a factor of about 4.

# Re-convert the model to TF Lite using quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()

# Show model size in KBs.
quantized_model_size = len(tflite_quantized_model) / 1024
print('Quantized model size = %dKBs,' % quantized_model_size)
print('which is about %d%% of the float model size.'
      % (quantized_model_size * 100 / float_model_size))

Quantized model size = 111KBs,
which is about 25% of the float model size.

As you can see, our quantized model is only about 25% of the size of our float model. You can check out the other quantization methods here.
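One of those other methods is full integer quantization, which also quantizes activations and therefore needs a small representative dataset so the converter can calibrate their value ranges. The sketch below is only an illustration of that option, not something used in this tutorial; it reuses model and train_images from earlier.

import numpy as np

# Full integer quantization: provide a representative dataset for calibration.
def representative_dataset():
  for image in train_images[:100]:
    # Each yielded sample must match the model's input shape and dtype.
    yield [np.expand_dims(image, axis=0).astype(np.float32)]

int_converter = tf.lite.TFLiteConverter.from_keras_model(model)
int_converter.optimizations = [tf.lite.Optimize.DEFAULT]
int_converter.representative_dataset = representative_dataset
tflite_int_model = int_converter.convert()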

Since the model is quantized, we might see a small drop in accuracy. When converting models to TF Lite, these trade-offs between accuracy, size and latency must be kept in mind.

Evaluating the TF Lite model

Let’s calculate the accuracy of our quantized model and the float model and check if there is any accuracy drop.

Please check the Colab for the complete code. In the section below, we will only look in detail at how to perform inference with the TF Lite model.

To perform inference using a TensorFlow Lite model, first we have to load the TF Lite model into memory.


# Load the converted model (e.g. tflite_quantized_model) into an interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
          

Before using the interpreter, we have to allocate memory for the tensors and get handles to the input and output tensors.


interpreter.allocate_tensors() # memory allocation for input and output tensors
input_tensor_index = interpreter.get_input_details()[0]["index"]
output = interpreter.tensor(interpreter.get_output_details()[0]["index"])
                    

Before passing the input image (test_image) to the model, we have to make sure we convert it to float32 to match the model's input data format. We also add a batch dimension to the image.

After preprocessing the input we set the input tensor values.

            
test_image = np.expand_dims(test_image, axis=0).astype(np.float32)

interpreter.set_tensor(input_tensor_index, test_image)
          
          

Next, we invoke the interpreter, i.e. run inference on the model.


interpreter.invoke()
                    

We then have to read the output tensor values and convert them back to a usable format.

In our example, we remove the batch dimension and find the digit with the highest probability. (The digit with the highest probability would be our result)


digit = np.argmax(output()[0])

prediction_digits.append(digit)
                    

We then check this with the ground truth labels to check the accuracy.
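Putting the steps above together, an evaluation helper could look roughly like the sketch below; the function name and loop structure are illustrative, and the exact code is in the Colab.

import numpy as np
import tensorflow as tf

def evaluate_tflite_model(tflite_model, test_images, test_labels):
  # Load the model into an interpreter and allocate tensors.
  interpreter = tf.lite.Interpreter(model_content=tflite_model)
  interpreter.allocate_tensors()
  input_tensor_index = interpreter.get_input_details()[0]["index"]
  output = interpreter.tensor(interpreter.get_output_details()[0]["index"])

  prediction_digits = []
  for test_image in test_images:
    # Add a batch dimension and convert to float32 to match the model input.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_tensor_index, test_image)
    interpreter.invoke()
    # The digit with the highest probability is the prediction.
    prediction_digits.append(np.argmax(output()[0]))

  # Compare predictions against the ground truth labels.
  return np.mean(np.array(prediction_digits) == test_labels)

# Example usage:
# print('Float model accuracy =', evaluate_tflite_model(tflite_float_model, test_images, test_labels))
# print('Quantized model accuracy =', evaluate_tflite_model(tflite_quantized_model, test_images, test_labels))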

After evaluating the float model and the quantized model, we find that the accuracy of both models is almost the same.


Float model accuracy = 0.9897

Quantized model accuracy = 0.9897

Accuracy drop = 0.0000
                    

There is no significant accuracy drop that would prevent us from deploying the quantized model in our Android app.

We looked at how to perform inference on our model in the steps above. We can now download the model, add it to our app, and use it to classify handwritten digits.

We have to follow the same steps as we did above. The only difference is that we use the Java APIs for performing inference.

In general, the steps above are the typical steps you would take to perform inference, regardless of the platform you are on. You can check out this doc for more info.

Deploying the model to the Android app

We first download our TensorFlow Lite model.


# Save the quantized model to a file.

f = open('mnist.tflite', "wb")
f.write(tflite_quantized_model)
f.close()

# Download the digit classification model

from google.colab import files
files.download('mnist.tflite')
print('`mnist.tflite` has been downloaded')
                    

The TensorFlow team has built a skeleton app that we can use to plug in our model and perform inference.

The app takes handwriting input from the user and uses our .tflite model to identify the digit.

You can download the app from here.

The downloaded file will contain both the finished app and the starting app.

In the steps below we will take a look at how we can use our downloaded model in our app. To follow along, use the start folder.

Copy the mnist.tflite model that we downloaded earlier to the assets folder of our app.


start/app/src/main/assets/
                    

Open Android Studio and click Import project.

Choose the ~/start folder.

Update build.gradle

Go to the build.gradle of the app module and find the dependencies block.


dependencies {
...
// TODO: Add TF Lite
...
}
                    

Add TensorFlow Lite to the app's dependencies.


implementation 'org.tensorflow:tensorflow-lite:2.0.0'
                    

We need to prevent Android from compressing TensorFlow Lite model files when generating the app binary.

Find this code block.


android {
...
// TODO: Add an option to avoid compressing TF Lite model files
...
}
                    

And add the following lines of code.


aaptOptions {
noCompress "tflite"
}
                    

Click Sync Now to apply the changes.

Initialize TensorFlow Lite interpreter

Open DigitClassifier.kt. This is where we add TensorFlow Lite code.

First, add a field to the DigitClassifier class.


class DigitClassifier(private val context: Context) {

private var interpreter: Interpreter? = null

// ...

}
                    

Android Studio now raises an error: Unresolved reference: Interpreter. Follow its suggestion and import org.tensorflow.lite.Interpreter to fix the error.

Next, find this code block.


private fun initializeInterpreter() {
// TODO: Load the TF Lite model from file and initialize an interpreter.
// ...
}
                    

Then add these lines to initialize a TensorFlow Lite interpreter instance using the mnist.tflite model from the assets folder.


// Load the TF Lite model from the asset folder.
val assetManager = context.assets
val model = loadModelFile(assetManager, "mnist.tflite")

// Wrap the loaded model in a TF Lite interpreter (used in the next snippet).
val interpreter = Interpreter(model)
                    

Add these lines right below to read the model input shape from the model.


// Read the input shape from the model file.
val inputShape = interpreter.getInputTensor(0).shape()
inputImageWidth = inputShape[1]
inputImageHeight = inputShape[2]
modelInputSize = FLOAT_TYPE_SIZE * inputImageWidth * inputImageHeight * PIXEL_SIZE

// Finish interpreter initialization.
this.interpreter = interpreter
                    
  • modelInputSize indicates how many bytes of memory we should allocate to store the input for our TensorFlow Lite model.

  • FLOAT_TYPE_SIZE indicates how many bytes our input data type will require. We use float32, so it is 4 bytes.

  • PIXEL_SIZE indicates how many color channels there are in each pixel. Our input image is a monochrome image, so we only have 1 color channel.

After we have finished using the TensorFlow Lite interpreter, we should close it to free up resources. In this sample, we synchronize the interpreter's lifecycle with the MainActivity lifecycle and close the interpreter when the activity is about to be destroyed. Find this comment block in the DigitClassifier#close() method.


// TODO: close the TF Lite interpreter here
                    

Then add this line.


interpreter?.close()
                    

Run inference with our model

Our TensorFlow Lite interpreter is set up, so let's write code to recognize the digit in the input image. We will need to do the following:

  • Pre-process the input: convert a Bitmap instance to a ByteBuffer instance containing the pixel values of all pixels in the input image. We use ByteBuffer because it is faster than a Kotlin native float multidimensional array.

  • Run inference.

  • Post-process the output: convert the probability array to a human-readable string.

Find this code block in DigitClassifier.kt.


private fun classify(bitmap: Bitmap): String {
// ...
// TODO: Add code to run inference with TF Lite.
// ...
}
                    

Add code to convert the input Bitmap instance to a ByteBuffer instance to feed to the model.


// Preprocessing: resize the input image to match the model input shape.
val resizedImage = Bitmap.createScaledBitmap(
bitmap,
inputImageWidth,
inputImageHeight,
true
)

val byteBuffer = convertBitmapToByteBuffer(resizedImage)
                    

Then run inference with the preprocessed input.


// Define an array to store the model output.

val output = Array(1) { FloatArray(OUTPUT_CLASSES_COUNT) }

// Run inference with the input data.

interpreter?.run(byteBuffer, output)
                    

Then identify the digit with the highest probability from the model output, and return a human-readable string that contains the prediction result and confidence. Replace the return statement in the starting code block with the following.


// Post-processing: find the digit that has the highest probability
// and return it as a human-readable string.
val result = output[0]

val maxIndex = result.indices.maxBy { result[it] } ?: -1

val resultString = "Prediction Result: %d\nConfidence: %2f"
.format(maxIndex, result[maxIndex])

return resultString
                    

Run and test the app

You can deploy the app to an Android Emulator or a physical Android device.

Click the Run button in the toolbar.

Draw digits on the screen and check whether the app recognizes them.

Well, the model recognizes the digits fairly accurately. As you probably noticed, the basic process for running inference with TensorFlow Lite here is the same as what we did in Python.

With this knowledge, you will be able to create, optimize and deploy models to edge devices according to your requirements.

The best place to learn more about TF Lite is from the official docs.

Happy Coding!