In a recently published paper, researchers from Google Brain and the University of California, Berkeley present an AI technique that lets an agent, such as a robot, decide on its next action while it is still executing the previous one. The goal is to make such systems less failure-prone.
The researchers said that while AI algorithms have achieved success in domains such as video games and robotic grasping, most use a blocking observe-think-act paradigm: the agent assumes that its environment will remain static while it “thinks,” so its actions will be executed on the same states from which they were computed.
This assumption holds in simulation but not in the real world, where the environment state keeps evolving while the agent processes observations and plans its next actions. The researchers' solution is a framework for handling such concurrent environments in a reinforcement learning setting.
It builds on standard reinforcement learning formulations, which drive an agent toward goals via rewards: at each step, the agent receives a state from a set of possible states and selects an action from a set of possible actions according to a policy, i.e., a mapping from states to actions (or to probability distributions over actions).
The environment then returns the next state, sampled from a transition distribution, along with a reward, and the agent learns to maximize the expected return, i.e., the (typically discounted) sum of the rewards it collects from the current state onward.
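The return described above can be made concrete with a short sketch. It assumes the common discounted formulation G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ..., where the discount factor gamma is an illustrative choice rather than a value taken from the paper; setting gamma to 1.0 recovers the plain sum of rewards.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Compute the return G_0 for one episode.

    rewards[k] is the reward received after the agent's (k+1)-th action.
    gamma is a hypothetical discount factor chosen for illustration.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))  # → 0.25
    print(discounted_return([1.0, 1.0, 1.0], gamma=1.0))  # → 3.0
```

In the first call, a reward that arrives two steps after the first one is discounted twice (0.5 * 0.5), which is why the agent is pushed toward reaching its goal sooner.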
In addition to the previous action, two more features, the action selection time and the vector-to-go (VTG), help encapsulate this concurrent knowledge. (The researchers define VTG as the portion of the previous action that remains to be executed at the instant the state of the environment is measured.)
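To illustrate how these features might enter a policy's input, here is a minimal sketch. The function names, the flat-list state layout, and the assumption that an action is a displacement executed at constant speed (so its VTG shrinks linearly over time) are all hypothetical choices made for illustration, not the researchers' implementation.

```python
from typing import List, Sequence


def vector_to_go(prev_action: Sequence[float], elapsed: float, duration: float) -> List[float]:
    """Portion of prev_action still left to execute when the state is measured.

    Hypothetical assumption: the action is a displacement executed at constant
    speed over `duration` seconds, so the remainder shrinks linearly with time.
    """
    frac_done = min(max(elapsed / duration, 0.0), 1.0)  # clamp to [0, 1]
    return [(1.0 - frac_done) * a for a in prev_action]


def concurrent_observation(
    obs: Sequence[float],
    prev_action: Sequence[float],
    action_selection_time: float,
    elapsed: float,
    duration: float,
) -> List[float]:
    """Concatenate the raw observation with the three concurrent-knowledge features:
    the previous action, the action selection time, and the VTG."""
    vtg = vector_to_go(prev_action, elapsed, duration)
    return list(obs) + list(prev_action) + [action_selection_time] + vtg


if __name__ == "__main__":
    # A previous action that is a quarter done when the state is measured.
    print(concurrent_observation([1.0, 2.0], [0.5, 0.0, -1.0], 0.125, elapsed=0.25, duration=1.0))
    # → [1.0, 2.0, 0.5, 0.0, -1.0, 0.125, 0.375, 0.0, -0.75]
```

The last three entries are the VTG: three quarters of the previous action are still pending, so a policy that sees them can plan its next command relative to where the robot is actually heading rather than where it was when the action was issued.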
In a concurrent action environment, the state is captured while the previous action is still being executed. Once the state is captured, the policy selects an action and executes it regardless of whether the previous action has finished, even if that means interrupting it.
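The difference between blocking and concurrent execution can be seen in a toy timeline. The sketch below assumes fixed, hypothetical think_time and exec_time values and models only when actions start and stop; it says nothing about task success. In the concurrent timeline, interrupted actions cover only part of their intended motion, which is one reason a learned policy benefits from features such as VTG.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActionRecord:
    start: float       # time the action begins executing
    end: float         # time it stops, either completed or interrupted
    interrupted: bool  # True if a newer action cut it short


def blocking_timeline(n_steps: int, think_time: float, exec_time: float) -> List[ActionRecord]:
    """Observe-think-act: the robot idles while the policy computes each action."""
    t = 0.0
    records: List[ActionRecord] = []
    for _ in range(n_steps):
        t += think_time  # the policy computes; the world is assumed frozen meanwhile
        records.append(ActionRecord(t, t + exec_time, False))
        t += exec_time   # the next state is captured only after the action finishes
    return records


def concurrent_timeline(n_steps: int, think_time: float, exec_time: float) -> List[ActionRecord]:
    """Think while moving: the next state is captured as soon as an action starts,
    and the next action is issued think_time later, interrupting the current one if needed."""
    records: List[ActionRecord] = []
    start = think_time  # the very first action must still be computed before anything moves
    for step in range(n_steps):
        natural_end = start + exec_time
        next_issue = start + think_time
        is_last = step == n_steps - 1
        interrupted = (not is_last) and next_issue < natural_end
        end = next_issue if interrupted else natural_end
        records.append(ActionRecord(start, end, interrupted))
        start = next_issue  # the next action starts as soon as it is selected
    return records


if __name__ == "__main__":
    b = blocking_timeline(3, think_time=0.5, exec_time=1.0)
    c = concurrent_timeline(3, think_time=0.5, exec_time=1.0)
    print(b[-1].end, c[-1].end)        # → 4.5 2.5
    print([r.interrupted for r in c])  # → [True, True, False]
```

With these toy numbers, the blocking loop spends a third of its time idle while the policy thinks, whereas the concurrent loop overlaps thinking with motion and cuts each action short when a newer one is ready.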
The researchers conducted experiments on a real-world robot arm, which they tasked with grasping and moving various objects from a bin. They say their framework achieved grasp success comparable to a baseline blocking model but that it was 49% faster than the blocking model in terms of policy duration.
Moreover, the concurrent model was able to execute “smoother” and swifter trajectories than the baseline.
“Concurrent methods may allow robotic control in dynamic environments where it is not possible for the robot to stop the environment before computing the action,” wrote the coauthors. “In these scenarios, robots must truly think and act at the same time.”
The work follows a Google-led study describing an AI system that learned from the motions of animals to give robots greater agility. The coauthors of that study said their approach could bolster the development of robots that can complete tasks in the real world, such as transporting materials between multilevel warehouses and fulfillment centers.