Google Releases Quantization Aware Training API To Train Smaller, Faster AI Models

By Navin Bondade

Google has released the Quantization Aware Training (QAT) API, which lets developers train and deploy models that gain the performance benefits of quantization while retaining close to their original accuracy. Quantization is the process of mapping input values from a large set to output values in a smaller set, for example from 32-bit floating-point numbers to 8-bit integers.
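As a rough illustration of that mapping, the sketch below quantizes floats into signed 8-bit integers using a scale and a zero point (affine quantization, one common scheme). The function names are hypothetical and this is not the QAT API's code. It assumes the value range of the tensor is already known and uses Python's built-in round, which rounds halves to even.

```python
def choose_qparams(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point that map [x_min, x_max] onto signed 8-bit codes [qmin, qmax]."""
    # The range must contain 0.0 so that zero maps to an integer code.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:
        return 1.0, 0  # degenerate all-zero range
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the nearest integer code, clamping to the representable range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q, scale, zero_point):
    """Map an integer code back to the float it stands for."""
    return (q - zero_point) * scale

scale, zero_point = choose_qparams(-1.0, 3.0)  # assumed observed range of a tensor
codes = [quantize(v, scale, zero_point) for v in (-1.0, 0.0, 1.0, 3.0)]
print(scale, zero_point, codes)
```

Here 256 integer codes cover the range [-1, 3], so adjacent codes stand for floats about 0.0157 apart; every value in between has to snap to one of them.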

The goal of the QAT API is to support the development of smaller, faster, and more efficient machine learning models that are well suited to running on off-the-shelf machines, such as those in small and medium-sized business environments where computing resources are at a premium.

Going from higher to lower precision is often noisy. Quantization squeezes a range of floating-point values into a fixed number of buckets (256 for 8-bit integers), which causes information loss similar to the rounding error incurred when fractional values are represented as integers.
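To see that rounding loss concretely, the sketch below quantizes values onto a uniform grid of 2**num_bits levels over a known range and maps them back (the helper names are hypothetical). Inside the range the error never exceeds half a step, values outside the range are clipped, and each bit removed roughly doubles the worst-case error.

```python
def round_trip(x, x_min, x_max, num_bits):
    """Quantize x into one of 2**num_bits buckets over [x_min, x_max], then map it back."""
    scale = (x_max - x_min) / (2 ** num_bits - 1)
    clamped = min(max(x, x_min), x_max)  # values outside the range are clipped
    return x_min + round((clamped - x_min) / scale) * scale

def worst_case_error(x_min, x_max, num_bits, samples=10_001):
    """Largest round-trip error over an evenly spaced sweep of the range."""
    worst = 0.0
    for i in range(samples):
        x = x_min + (x_max - x_min) * i / (samples - 1)
        worst = max(worst, abs(round_trip(x, x_min, x_max, num_bits) - x))
    return worst

for bits in (8, 4, 2):
    half_step = (2.0 / (2 ** bits - 1)) / 2
    err = worst_case_error(-1.0, 1.0, bits)
    print(f"{bits}-bit: {2 ** bits} buckets, worst error {err:.4f}, half a step {half_step:.4f}")
```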

The problem compounds when these lossy numbers are used in a chain of computations: the errors accumulate, and each intermediate result must be rescaled before it can be used in the next computation.
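The toy sketch below shows both effects under stated assumptions: symmetric int8 codes with a fixed scale of 1/127, and the same 2x2 rotation matrix applied as eight consecutive "layers". The product of two int8 codes carries the scale s_x * s_w, so the integer accumulator has to be rescaled to the output scale s_y and rounded again before the next layer. The error against a float reference tends to grow from layer to layer. All names are illustrative, not part of any TensorFlow API.

```python
SCALE = 1 / 127  # assumed symmetric int8 scale for values in [-1, 1]

def quantize_tensor(values, scale):
    """Symmetric int8 quantization: zero point 0, codes clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def quantized_layer(q_x, s_x, weights, s_w, s_y):
    """Dense layer on int8 codes. The integer accumulator has scale s_x * s_w
    and is rescaled (and rounded again) to the output scale s_y."""
    q_w = [quantize_tensor(row, s_w) for row in weights]
    out = []
    for row in q_w:
        acc = sum(w * x for w, x in zip(row, q_x))
        out.append(max(-127, min(127, round(acc * (s_x * s_w) / s_y))))
    return out

def drift(x, weights, num_layers):
    """Apply the same layer repeatedly in float and in int8; record the max abs error per layer."""
    float_x, q_x, errors = list(x), quantize_tensor(x, SCALE), []
    for _ in range(num_layers):
        float_x = [sum(w * v for w, v in zip(row, float_x)) for row in weights]
        q_x = quantized_layer(q_x, SCALE, weights, SCALE, SCALE)
        # Dequantize only to measure the drift against the float reference.
        errors.append(max(abs(f - q * SCALE) for f, q in zip(float_x, q_x)))
    return errors

rotation = [[0.6, -0.8], [0.8, 0.6]]  # toy "layer": a rotation keeps values inside [-1, 1]
errors = drift([0.3, 0.7], rotation, num_layers=8)
print([round(e, 4) for e in errors])
```

In this run the per-layer error is not strictly increasing, but after the eighth layer it is more than three times what it was after the first.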

The QAT API addresses this by simulating low-precision computation during training. Quantization error is introduced as noise throughout training, and the training algorithm tries to minimize it, so the model learns parameters that are more robust to quantization.
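A minimal sketch of the idea, assuming a toy linear model with three weights, a single training example, an exaggerated quantization step of 0.5, and the straight-through estimator (a common way to backpropagate through rounding that treats the rounding step as the identity in the backward pass). The "pretrained" float weights fit the example exactly, but rounding them after training sets every weight to 0. Fine-tuning with quantization in the forward pass instead moves the weights until the quantized model fits. This is not TensorFlow's implementation; every name here is hypothetical.

```python
STEP = 0.5  # hypothetical coarse quantization step, exaggerated so the effect is visible

def fake_quant(w):
    """Snap a float weight to the nearest multiple of STEP (quantize, then dequantize)."""
    return STEP * round(w / STEP)

def predict(weights, x, quantized):
    ws = [fake_quant(w) for w in weights] if quantized else weights
    return sum(w * xi for w, xi in zip(ws, x))

def quantized_loss(weights, x, y):
    return (predict(weights, x, quantized=True) - y) ** 2

def qat_finetune(weights, x, y, lr=0.03, steps=20):
    w = list(weights)
    for _ in range(steps):
        # The forward pass sees quantized weights, so quantization error enters the loss.
        residual = predict(w, x, quantized=True) - y
        # Straight-through estimator: treat d fake_quant(w) / dw as 1.
        w = [wi - lr * 2 * residual * xi for wi, xi in zip(w, x)]
    return w

x, y = [1.0, 1.0, 1.0], 0.5
pretrained = [0.21, 0.12, 0.17]  # float solution: the weights sum to 0.5

ptq_loss = quantized_loss(pretrained, x, y)  # post-training quantization: just round
qat_weights = qat_finetune(pretrained, x, y)
qat_loss = quantized_loss(qat_weights, x, y)
print(f"PTQ loss: {ptq_loss:.4f}, QAT loss: {qat_loss:.4f}")
```

Here the post-training loss is 0.25, while the QAT loss drops to 0: after two updates the first weight crosses a rounding boundary, the quantized prediction matches the target, and the gradient vanishes.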

To do this, the training graph uses operations that convert floating-point values into low-precision values and then back into floating point. This ensures that quantization loss is introduced into the computation and that subsequent computations emulate low precision.
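The sketch below imitates such a pair of conversions with one fake_quant helper, a hypothetical and simplified analogue of TensorFlow's tf.quantization.fake_quant_with_min_max_args family of ops, and places it on the weights and on the output of a small dense + ReLU layer. The ranges [-1, 1] for weights and [0, 6] for activations are assumptions. Every value the layer emits lies on one of 256 levels, so the next layer already sees 8-bit precision even though the arithmetic itself stays in floating point.

```python
def fake_quant(x, x_min, x_max, num_bits=8):
    """Float -> nearest of 2**num_bits levels in [x_min, x_max] -> float.
    Simplified: TensorFlow's ops also adjust the range so that 0.0 is exactly representable."""
    scale = (x_max - x_min) / (2 ** num_bits - 1)
    clamped = min(max(x, x_min), x_max)
    return x_min + round((clamped - x_min) / scale) * scale

def fq_dense_relu(inputs, weights, bias, w_range=(-1.0, 1.0), act_range=(0.0, 6.0)):
    """Dense + ReLU layer with fake-quant nodes on its weights and on its output."""
    q_w = [[fake_quant(w, *w_range) for w in row] for row in weights]
    outputs = []
    for row, b in zip(q_w, bias):
        z = sum(w * v for w, v in zip(row, inputs)) + b
        # Quantizing the activation makes downstream layers consume 8-bit-representable values.
        outputs.append(fake_quant(max(z, 0.0), *act_range))
    return outputs

layer_out = fq_dense_relu([0.5, -1.2, 2.0],
                          [[0.3, -0.7, 0.05], [-0.45, 0.9, 0.6]],
                          [0.1, -0.2])
print(layer_out)
```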

In Google's tests, an image classification model (MobilenetV1 224) with a non-quantized accuracy of 71.03% achieved 71.06% accuracy after quantization on the ImageNet dataset.

Another classification model, Nasnet-Mobile, tested on the same dataset lost only about one percentage point of accuracy (74% to 73%) after quantization.

Besides emulating reduced-precision computation, the QAT API records the statistics needed to quantize a trained model, or parts of it.
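One common way to collect such statistics, sketched here under assumptions, is to track an exponential moving average of the minimum and maximum values seen in each training batch, then turn the final range into a scale and zero point. The class name, the decay of 0.9, and the signed 8-bit target are illustrative choices, not the QAT API's documented behavior. Smoothing keeps a single outlier batch (the 4.0 below) from stretching the range, which would otherwise spread the 256 buckets more thinly over values that rarely occur.

```python
class MovingAverageRange:
    """Hypothetical helper: tracks an exponential moving average of batch min/max."""

    def __init__(self, decay=0.9):
        self.decay = decay  # assumed smoothing factor
        self.min = None
        self.max = None

    def observe(self, batch):
        b_min, b_max = min(batch), max(batch)
        if self.min is None:  # the first batch initializes the range
            self.min, self.max = b_min, b_max
        else:
            d = self.decay
            self.min = d * self.min + (1 - d) * b_min
            self.max = d * self.max + (1 - d) * b_max

    def qparams(self, qmin=-128, qmax=127):
        """Scale and zero point for signed 8-bit codes; the range is widened to include 0."""
        lo, hi = min(self.min, 0.0), max(self.max, 0.0)
        if hi == lo:
            return 1.0, 0  # degenerate all-zero range
        scale = (hi - lo) / (qmax - qmin)
        zero_point = round(qmin - lo / scale)
        return scale, zero_point

observer = MovingAverageRange()
for batch in ([0.0, 1.0, 2.0], [0.5, 4.0], [0.1, 2.0]):  # e.g. ReLU activations per training step
    observer.observe(batch)
scale, zero_point = observer.qparams()
print(observer.min, observer.max, scale, zero_point)
```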

Developers can use the QAT API to convert a model trained with it into, for example, a quantized, integer-only TensorFlow Lite model, or to experiment with various quantization strategies while simulating how quantization affects accuracy on different hardware backends.
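"Integer-only" means inference runs without floating-point arithmetic, so even the rescaling step between layers must use integers. A common technique, sketched below with hypothetical names, stores the real multiplier M = s_x * s_w / s_y as an int32 fixed-point value m0 plus a right shift, and quantizes biases to int32 at the accumulator scale s_x * s_w. The scales 0.02, 0.01, and 0.05 are made up for the example, and production kernels such as those in TensorFlow Lite treat rounding and saturation more carefully than this sketch does.

```python
def quantize_multiplier(m):
    """Represent a real multiplier 0 < m < 1 as (m0, shift), with m ~= m0 * 2**-(31 + shift)."""
    shift = 0
    while m < 0.5:  # normalize m into [0.5, 1)
        m *= 2
        shift += 1
    m0 = round(m * (1 << 31))  # m0 is now in [2**30, 2**31]
    if m0 == 1 << 31:  # rounding reached 2**31: renormalize to stay within int32
        m0 //= 2
        shift -= 1
    return m0, shift

def requantize(acc, m0, shift, zero_point=0):
    """Integer-only rescale of an int32 accumulator to int8 (round half up, then clamp)."""
    total_shift = 31 + shift
    rounded = (acc * m0 + (1 << (total_shift - 1))) >> total_shift
    return max(-128, min(127, rounded + zero_point))

def int_dense(q_x, q_w, q_bias, m0, shift, out_zero_point=0):
    """Dense layer using only integer arithmetic; biases share the accumulator scale."""
    out = []
    for row, b in zip(q_w, q_bias):
        acc = b + sum(w * x for w, x in zip(row, q_x))  # int32 accumulator
        out.append(requantize(acc, m0, shift, out_zero_point))
    return out

s_x, s_w, s_y = 0.02, 0.01, 0.05  # assumed scales for input, weights, and output
m0, shift = quantize_multiplier(s_x * s_w / s_y)

q_x = [50, -25, 100]                  # x = [1.0, -0.5, 2.0] at scale s_x
q_w = [[30, -60, 10], [-20, 45, 80]]  # w = [[0.3, -0.6, 0.1], [-0.2, 0.45, 0.8]] at scale s_w
q_bias = [250, -600]                  # b = [0.05, -0.12] at scale s_x * s_w
print(int_dense(q_x, q_w, q_bias, m0, shift))
```

The integer result, [17, 21], matches the floating-point reference: the layer outputs 0.85 and 1.055, which at scale 0.05 are 17 and 21.1, rounded to 21.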

Google says that, by default, the QAT API, which is part of the TensorFlow Model Optimization Toolkit, is configured to work with the quantized execution support in TensorFlow Lite, Google's toolset for adapting models built with its TensorFlow ML framework to mobile and IoT devices.

“We are very excited to see how the QAT API further enables TensorFlow users to push the boundaries of efficient execution in their TensorFlow Lite-powered products as well as how it opens the door to researching new quantization algorithms and further developing new hardware platforms with different levels of precision,” Google said.