Meet the new generation of humanoids: smarter, faster, and ready for the real world
How advancements in AI are giving robots the ability to perform complex tasks autonomously.
"Pick up the box the color of Darth Vader's lightsaber and place it on top of the tallest pile," Digit, Agility's bird-legged green humanoid, is instructed as it stands in a room with piles of boxes of various heights and colors. Digit stands motionless as the system processes the human voice command. Finally, it picks up the red box and carefully places it on the highest pile.
Picking up a box and placing it in a neat pile is not an impressive action in itself for a robot; however, understanding an enigmatic human command, correctly deciphering it, and then executing it are significant innovations. Digit owes part of its progress to the generative artificial intelligence revolution, which has also reached the field of robotics, upending expectations. "I've been asked what's the biggest thing in 2024 besides language modeling — it's robotics. Period," wrote Nvidia's senior AI scientist Jim Fan in December. "We're about three years away from a ChatGPT moment for physical AI agents," he explained.
Ever since Fan made this statement, it seems everyone is talking about the "ChatGPT moment of robotics," referring to the anticipated technological breakthrough that could push the field forward and finally fill our homes with intelligent humanoid robots to help with household chores, such as washing the floor, setting the table, or doing the laundry (but not folding it). "What has been happening in recent months is dramatic," explains Amir Bousani, CEO of R-Go Robotics, which recently entered into a partnership with Nvidia to equip the robot it is developing with spatial perception capabilities. "The physical world is more difficult than the internet," notes Dr. Oren Etzioni, founding CEO of the Allen Institute for Artificial Intelligence, "but the field of robots with general behavior capabilities is advancing much faster today."
The huge interest in humanoid robots, or humanoids, which Fan refers to, is evident in the constant announcements in the field. In February, the startup Figure raised $675 million from Jeff Bezos, Nvidia, and OpenAI for the development of humanoids. In March, Nvidia's CEO stood on stage at the company's developer conference alongside nine humanoids from different companies and announced that building models for robots is "one of the most exciting problems to solve in artificial intelligence." In April, Elon Musk promised that he would launch Optimus, the humanoid robot he is developing, next year and predicted that by 2040 there will be a billion humanoids among us. Shortly after, Mentee Robotics, the humanoid developer that Amnon Shashua founded two years ago, revealed its activities to the public. In the background, Boston Dynamics released a video of its new humanoid, Atlas, this time powered by electricity, and Agility announced that it expects to begin selling Digit for warehouse work. At the end of May, it became clear that OpenAI had decided that its previous investment was insufficient and would re-establish the robotics department that it closed in July 2021. In total, research company Insights estimates that $2.3 billion has flowed into startups that build humanoid robots since 2020.
The great importance of visual appearance
The term "robot" covers a variety of automated devices, from robotic arms used in manufacturing to drones, autonomous cars, and vacuum cleaners. Most of them incorporate artificial intelligence and are programmed to perform specific tasks in controlled environments or under human supervision. But the goal is always to build the most autonomous device that can adapt to its environment, learn new things independently, and make quick and reasoned decisions for diverse requests. No robot embodies this ambition more than humanoids, which have garnered huge hype today for two main reasons — one practical and the other imaginative. If we want robots to do everything we don't want to do, whether at home, in warehouses, or in factories, the robot must be adapted to the house, not the house to the robot; this is where legs and the human structure, in general, are best suited to the physical environment we have built for ourselves. On the other hand, humanoids are simply fascinating and thought-provoking — the kind of things that Isaac Asimov and Philip K. Dick imagined for us decades ago and that cinema illustrated through characters like C-3PO in Star Wars or Data from Star Trek.
"The visual appearance of a robot promises what it can do and how smart it is. It has to live up to this promise or more, or the robot will not be accepted," MIT roboticist Rodney Brooks, co-founder of iRobot, recently said. Brooks called this principle "the first rule of robotics," a paraphrase of the rules outlined by Asimov in a 1941 story. This principle is well understood by companies in the field, which present us with an amazing future in well-edited videos: whether it's Atlas rising on two legs or Optimus watering plants, these demonstrations spark the imagination of the public and the interest of investors.
In practice, these demos are just that: demos. "When you see robotics going out into the field, you have to remember that someone has to code every corner and every edge case in the robot, everything that happens," notes Bousani. Musk himself reminded us of this in January after enthusiastically posting a video on X titled "Optimus folding laundry." The post and video might have mistakenly led one to believe that the robot was finally managing to autonomously perform boring housework! But sharp-eyed viewers noticed that a human hand sometimes appeared in the right corner, controlling the robot from afar, forcing Musk to add a clarification: "Important note: Optimus cannot yet do this autonomously, but it will definitely be able to do it completely autonomously and in an arbitrary environment." Musk is not alone, of course; Boston Dynamics released the first video of its humanoid in development more than seven years ago, and to this day, it has not marketed a single humanoid to the public.
The gap between appearance and expectations has persisted over the years because developing humanoid robots is a very complex task. But now the field seems to be on the brink of a leap forward, and there is a general feeling that significant progress is at hand. What sparked this hope are language models like ChatGPT, which for the first time turned artificial intelligence into a technology that end consumers use directly. Riding this wave of hype, entrepreneurs now promise to bring the same capabilities from the virtual world to the physical world and develop "robots for general tasks." Imagine ChatGPT in the physical world: just as a bot can perform a wide variety of tasks, from writing a poem to summarizing an article or transcribing a conversation, without any prior information from the person who started the virtual interaction, so a humanoid robot could respond to any request, even one that contains an enigmatic element or is made in an unfamiliar environment, such as putting the laundry in the big closet, bringing you a can of cold soda, emptying the dishwasher, or setting the table for pizza night.
Figure 01
Manufacturer: Figure AI
Country: USA
Year of establishment: 2022
Height: 167 cm
Weight: 60 kg
Speed: 1.2 meters per second
Latest model launch: March 2024
Estimated price: $30,000-150,000
Digit
Manufacturer: Agility Robotics
Country: USA
Year of establishment: 2015
Height: 175 cm
Weight: 63 kg
Speed: 1.5 meters per second
Latest model launch: February 2024
Rented to Amazon at $10-12 an hour
MenteeBot
Manufacturer: Mentee Robotics
Country: Israel
Year of establishment: 2022
Height: 175 cm
Weight: 70 kg
Speed: 1.5 meters per second
Public unveiling: April 2024
Atlas
Manufacturer: Boston Dynamics
Country: USA
Year of establishment: 1992
Height: 150 cm
Weight: 89 kg
Speed: 2.5 meters per second
Latest model launch: March 2024
Optimus - Gen 2
Manufacturer: Tesla
Country: USA
Year of establishment: 2003
Height: 173 cm
Weight: 60 kg
Speed: 0.6 meters per second
Public unveiling: December 2023
Estimated price: $30,000
Unitree H1
Manufacturer: Unitree Robotics
Country: China
Year of establishment: 2016
Height: 180 cm
Weight: 47 kg
Speed: 3.3 meters per second
Latest model launch: March 2024
Estimated price: $90,000
The new training methods
Over the years, developments in the fields of electric and autonomous vehicles have helped move robotics forward. These advancements have increased battery range, improved computer vision, and contributed significantly to understanding how to enable robots to perform tasks such as climbing stairs, distinguishing between objects, or regaining their balance when they slip. But behind all the movements that we see humanoid robots perform lies an open secret: each movement is planned in detail, based on a long list of specific actions. What you see is the product of laboratory experiments repeated until the robots can perform the planned choreography perfectly.
This approach requires a lot of time and struggles to accommodate all the edge cases a robot can encounter. For robotic systems that require enormous freedom of action, it is simply too complicated a way to build a humanoid that can work in uncontrolled environments. This is why robots that operate independently in the field but have very limited physical capabilities, such as drones, vacuum cleaners, and robotic waiters, are at the forefront of development. Humanoid robots are only now able to take their first steps in very controlled environments like warehouses or laboratories.
But in recent years, new techniques have entered the field of robotics, promising to bring about significant change, starting with "reinforcement learning." This is an autonomous learning method in which the robot tries to complete the task assigned to it and receives a "reward" if it succeeds and a "loss" if it fails. At the same time, developers use other training methods such as "imitation learning," in which they dress humans in special suits and record the entire set of physical actions the person performs in order to pass it on to the robot as training. Over time, huge new databases are built, in which each individual movement is digitized and used to train future robots.
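To make the reward-and-loss idea concrete, here is a minimal sketch of reinforcement learning in Python. It is not how any of the companies mentioned train their robots: the five-position "corridor" world, the reward values, and the use of tabular Q-learning (one common textbook algorithm) are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy world: five positions in a row. The robot starts in the
# middle; position 4 is the goal (task done), position 0 is a ledge (task failed).
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right
GOAL, LEDGE, START = 4, 0, 2


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True    # "reward" for completing the task
    if nxt == LEDGE:
        return nxt, -1.0, True   # "loss" for failing
    return nxt, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate action values purely from trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Mostly take the best-known action, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Nudge the estimate toward the reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal position."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_STATES - 1)}


q_table = train()
policy = greedy_policy(q_table)
```

Nobody tells the program which direction is correct. After enough attempts, the learned policy steps toward the goal from every interior position simply because that is where the rewards were found.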
However, these learning methods, which accumulate experience in huge databases, take a long time and have limited use. This is where the leap in artificial intelligence comes into play, enabling machines to perform tasks on their own even in situations they have never encountered. The secret lies in language models like ChatGPT, which are nothing more than engines for generating words in response to an initial prompt. They don't understand the meaning of words and can't take the knowledge they accumulate and adapt it to the physical world. But the breakthrough that allowed their development is applicable in the world of robotics as well: the ability to take a huge number of examples (in this case, examples of human speech), analyze them, and, with the help of learning algorithms, capture the context and give coherent answers to endless types of questions.
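The phrase "engines for generating words" can be illustrated with a deliberately tiny sketch. Real language models use large neural networks trained on vast amounts of text; the word-pair counting below, the four example sentences, and the function names are all assumptions made for illustration, not a description of how ChatGPT works internally.

```python
from collections import Counter, defaultdict

# Tiny stand-in for the huge number of examples a real model learns from.
EXAMPLES = [
    "pick up the red box",
    "pick up the red box now",
    "pick up the blue box now",
    "place the red box on the pile",
]


def learn_next_words(sentences):
    """Count, for every word, which words follow it in the examples."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_words=6):
    """Extend the prompt one word at a time with the most frequent follower."""
    words = prompt.split()
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:
            break  # no example ever continued past this word
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


model = learn_next_words(EXAMPLES)
completion = generate(model, "pick up")
```

The program has no idea what a box is. It only reproduces the word sequences it has seen most often, which is the sense in which such engines generate text without understanding it.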
Now, artificial intelligence companies are trying to take these abilities and use them for training robots in the physical world. For this to work, the AI will have to reach a level of understanding that allows it to be trained on the physical world, just as it learned human speech. To this end, it will have to scan a huge number of real-world examples, capture the meaning behind them, and understand the differences between objects and tasks. This will allow it to train itself and pass on the knowledge to other robots. The technological leap is not supposed to enable any robot to perform any task, but rather to enable robots to move and adapt to new environments faster, as well as train each other to perform tasks, much like ChatGPT's ability to interact with humans and understand the context of their conversations. If these capabilities can be harnessed for the development of humanoid robots, it will allow for a significant leap forward. The technological breakthrough is already being demonstrated in a robot learning platform developed by Nvidia and used by many robotics companies today, which enables robots to train each other in virtual worlds and acquire more skills.
However, there is still a long way to go before the average robot can meet human expectations and realize the potential that humanoids promise. "Some startups claim they are building humanoid robots with general behavior capabilities," notes Dr. Etzioni, "but the truth is that we are still far from that." To reach the desired breakthrough, a whole set of capabilities and innovations will be required that are not yet available today. For example, the main challenges in the field remain improving energy and battery efficiency to support long-lasting operations, developing flexible materials for more human-like movement, and building better sensors to mimic human senses like touch and hearing.
The road to true humanoid robots that can perform complex tasks autonomously is still under construction, but the recent progress in the field has brought this vision closer to reality. The synergy of advancements in artificial intelligence, robotics, and human-robot interaction is paving the way for a future where robots could become an integral part of daily life, assisting in both mundane and sophisticated tasks.
First published: 12:42, 25.08.24