Peter Chen, CEO of the robot software company Covariant, sat in front of a chatbot interface similar to the one used to communicate with ChatGPT. “Show me the handbag in front of you,” he typed. In response, a video feed appeared showing a robotic arm hovering over a bin containing various items – a pair of socks, a tube of potato chips, and an apple among them.
The chatbot can discuss the items it sees and can also manipulate them. When Wired suggested that Chen have it grab a piece of fruit, the arm reached down, gently grasped the apple, and moved it to another bin nearby.
This hands-on chatbot is a step toward giving robots the kind of versatile, flexible capabilities demonstrated by programs like ChatGPT. The hope is that artificial intelligence will finally solve the long-standing difficulty of programming robots, allowing them to do far more than a narrow set of tasks.
“At this point, it’s not controversial to say that foundation models are the future of robotics,” Chen said, using the term for large-scale, general-purpose machine learning models developed for a particular domain. The hands-on chatbot he showed me was powered by a model developed by Covariant called RFM-1 (Robot Foundation Model). Like the models behind ChatGPT, Google Gemini, and other chatbots, it is trained on large amounts of text, but it has also been trained on video plus hardware control and motion data from tens of millions of examples of robot movements in the physical world.
Including the additional data produces a model that is fluent not only in language but also in movement, and that can connect the two. Not only can RFM-1 chat and control a robot arm, it can also generate videos showing robots performing different chores. When prompted, RFM-1 will show how a robot should grab an object from a cluttered bin. “It can take in all of these different modalities that matter to robotics, and it can also output any of them,” Chen said. “It’s kind of exciting.”
The model has also shown that it can learn to control similar hardware not seen in its training data. With further training, this could even mean that the same general model could operate a humanoid robot, said Pieter Abbeel, Covariant co-founder and chief scientist and a pioneer in robot learning. In 2010 he led a project that trained a robot to fold towels (albeit slowly), and he also worked at OpenAI before it stopped doing robotics research.
Covariant, founded in 2017, currently sells software that uses machine learning to let robotic arms pick items from warehouse bins, but the robots are typically limited to the tasks they have been trained on. Abbeel says models like RFM-1 could let robots pivot to new tasks much more fluidly. He compared Covariant’s strategy to how Tesla uses data from the cars it has sold to train its self-driving algorithms. “It’s the same kind of thing we’re doing here,” he said.
Abbeel and his Covariant colleagues are not the only roboticists hoping that the power of the large language models behind ChatGPT and similar programs might revolutionize robotics. Programs such as RFM-1 have shown promising early results. But how much data might be needed to train models that give robots far more general capabilities, and how to gather that data, remain open questions.