Researchers at Google have open-sourced a new framework that can scale up AI model training across thousands of machines, making reinforcement learning scale better and improving computational efficiency.
The framework is named SEED RL. SEED stands for scalable, efficient, deep reinforcement learning and describes a modern RL agent that scales well, is flexible and uses available resources efficiently.
The researchers said SEED RL could facilitate training at millions of frames per second on a machine while reducing costs by up to 80%.
That kind of reduction could help to level the playing field a bit for startups that previously haven’t been able to compete with major players such as Google, Amazon and International Business Machines (IBM).
Indeed, training sophisticated machine learning models in the cloud is surprisingly expensive.
Lasse Espeholt and team cite the possibility of training agents on millions of frames per second and lowering the cost of experiments as the approach’s key benefits, potentially opening RL up to a wider audience.
Synced reported that the University of Washington racked up $25,000 in costs to train its Grover model, which is used to detect and generate fake news. Meanwhile, OpenAI paid $256 per hour to train its GPT-2 language model, while Google itself spent around $6,912 to train its BERT model for natural language processing tasks.
SEED RL is built on top of the TensorFlow 2.0 framework and features an architecture that takes advantage of graphics processing units (GPUs) and tensor processing units (TPUs) by centralizing model inference.
To avoid data transfer bottlenecks, inference is performed centrally by a learner component, which also trains the model using input from distributed inference.
It uses the policy gradient-based V-trace algorithm to predict action distributions from which actions are sampled, and the Q-learning method R2D2 to select an action based on the predictions.
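For readers unfamiliar with V-trace, the following is a minimal sketch of how its value targets are computed, following the definition in the IMPALA paper rather than SEED RL's actual TensorFlow code. The function name `vtrace_targets` and its parameter names are illustrative assumptions, and the sketch uses plain Python lists for a single trajectory instead of batched tensors.

```python
import math


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for one trajectory (illustrative sketch).

    log_rhos: log(pi(a_t|x_t) / mu(a_t|x_t)), target vs. behaviour policy.
    rewards, values: per-step rewards and value estimates V(x_t).
    bootstrap_value: value estimate for the state after the last step.
    """
    values = list(values)
    ratios = [math.exp(lr) for lr in log_rhos]
    rhos = [min(rho_bar, r) for r in ratios]  # clipped importance weights
    cs = [min(c_bar, r) for r in ratios]      # clipped trace coefficients
    next_values = values[1:] + [bootstrap_value]

    # Importance-weighted temporal-difference errors.
    deltas = [rho * (r + gamma * nv - v)
              for rho, r, nv, v in zip(rhos, rewards, next_values, values)]

    # Backward recursion: v_t - V(x_t) = delta_t + gamma * c_t * (v_{t+1} - V(x_{t+1})).
    acc = 0.0
    corrections = [0.0] * len(values)
    for t in reversed(range(len(values))):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc
    return [v + c for v, c in zip(values, corrections)]
```

When the behaviour and target policies agree (all log ratios are zero), the targets reduce to ordinary n-step returns, which is a useful sanity check for any implementation.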
The target model’s variables and state information are kept local, while observations are sent to the learner at every step of the process. SEED RL also uses a network library based on the open-source universal RPC framework gRPC to minimize latency.
Architectures following a similar approach include the distributed agent IMPALA, which supposedly has a number of drawbacks compared with SEED RL. For example, it keeps sending parameters and intermediate model states between actors and learners, which can quickly turn into a bottleneck.
It also sticks to CPUs when applying model knowledge to a problem, which isn’t the most performant option when working with complex models. According to Espeholt et al., it doesn’t utilize machine resources optimally.
The researchers at Google said the learner component of SEED RL can be scaled across thousands of cores, while the number of actors, which iterate between taking steps in the environment and running inference on the model to predict the next action, can scale to thousands of machines.
SEED RL solves all this by using a learner to perform neural network inference centrally on GPUs and TPUs, the number of which can be changed depending on need.
The system also includes a batching layer to collect data from multiple actors for added efficiency. Since the model parameters and the state are kept local, data transfer is less of an issue, while observations are sent through a low-latency network based on gRPC to keep things running smoothly.
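The idea of centralized, batched inference described above can be illustrated with a small sketch. This is not SEED RL's code: the names `CentralInferenceServer`, `run_actor` and `run_demo` are hypothetical, in-process queues stand in for the gRPC network layer, and a toy function stands in for a neural network running on a GPU or TPU.

```python
import queue
import threading


class CentralInferenceServer:
    """Hypothetical learner-side batching layer (in-process stand-in for gRPC)."""

    def __init__(self, policy_fn, batch_size):
        self.policy_fn = policy_fn      # batched "model": list of obs -> list of actions
        self.batch_size = batch_size
        self.requests = queue.Queue()

    def infer(self, observation):
        """Called by an actor; blocks until the learner returns an action."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((observation, reply))
        return reply.get()

    def serve(self, num_batches):
        """Collect one request per actor, run one batched call, send replies."""
        for _ in range(num_batches):
            batch = [self.requests.get() for _ in range(self.batch_size)]
            observations = [obs for obs, _ in batch]
            actions = self.policy_fn(observations)
            for (_, reply), action in zip(batch, actions):
                reply.put(action)


def run_actor(server, start, steps, log):
    """Actor loop: holds no model, only sends observations to the learner."""
    obs = start
    for _ in range(steps):
        action = server.infer(obs)
        log.append((obs, action))
        obs += 1                        # toy environment transition


def run_demo(num_actors=4, steps=3):
    calls = []

    def toy_policy(observations):       # placeholder for accelerator inference
        calls.append(len(observations))
        return [2 * obs for obs in observations]

    server = CentralInferenceServer(toy_policy, batch_size=num_actors)
    logs = [[] for _ in range(num_actors)]
    actors = [threading.Thread(target=run_actor, args=(server, 10 * i, steps, logs[i]))
              for i in range(num_actors)]
    for actor in actors:
        actor.start()
    server.serve(num_batches=steps)
    for actor in actors:
        actor.join()
    return len(calls), logs
```

In this toy setup, four actors taking three steps each produce twelve observations, but the policy function is invoked only three times, once per batch. That is the kind of saving a batching layer is meant to provide when the model call is expensive.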
The results show they managed to solve a Google Research Football task while training the model at 2.4 million frames per second using 64 Cloud Tensor Processing Unit chips.
That’s around 80 times faster than previous frameworks, Google said.
“This results in a significant speed-up in wall-clock time and, because accelerators are orders of magnitude cheaper per operation than CPUs, the cost of experiments is reduced drastically,” Lasse Espeholt, a research engineer at Google Research in Amsterdam, wrote in the company’s AI blog Monday.
“We believe SEED RL, and the results presented, demonstrate that reinforcement learning has once again caught up with the rest of the deep learning field in terms of taking advantage of accelerators.”
Constellation Research Inc. analyst Holger Mueller called SEED RL an amazing technology and said it is emerging as one of the most promising AI techniques for advancing next-generation applications.
“When you tweak the software to work well with hardware, you usually see major advances and that is what Google is showing here – the combination of its SEED RL library with its TPU architecture,” Mueller said.
“Not surprisingly it provides substantial performance gains over conventional solutions. This makes reinforcement learning available to the masses, although users would be locked into the Google Cloud Platform. But AI is served best in the cloud, and GCP is a very good choice for AI apps.”
Google said the code for SEED RL has been open-sourced and made available on GitHub, together with examples that show how to run it on Google Cloud with graphics processing units.