Paper Review: Real-World Humanoid Locomotion with Reinforcement Learning

One limitation of modern AI is that it rarely learns on its own or goes beyond what humans have explicitly taught it. This is especially noticeable in robotics.

How do modern robots work?

I’ll say right away that I’m not a robotics expert, but I have a general picture of the field. Most modern robots are trained for specific skills: for example, a neural network is trained so that the robot can pick up objects. In other cases, a person explicitly programs a fixed sequence of actions. Clearly, it is impossible to foresee every scenario a robot will face in the real world, so ideally the robot should decide for itself what to do and how.

What did the scientists suggest?

Researchers from Berkeley have published a paper presenting a neural network that lets a humanoid robot, 1.6 meters tall and weighing 45 kg, perform various actions without human intervention. They achieved this by training in simulation: many scenarios are generated at random, and the model learns to predict the best action for the robot in each one. The authors state that more than 10 billion environments were created per day to train the neural network.
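To make the idea of randomly generated scenarios more concrete, here is a minimal sketch of how simulation parameters might be sampled for each training episode. The `EnvParams` fields, the value ranges, and the function names are my own assumptions for illustration; they are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class EnvParams:
    """Hypothetical physics settings for one simulated episode."""
    friction: float      # ground friction coefficient
    payload_kg: float    # extra mass attached to the torso
    slope_deg: float     # terrain incline in degrees
    push_force_n: float  # magnitude of a random external push


def sample_env(rng: random.Random) -> EnvParams:
    # All ranges below are illustrative assumptions, not values from the paper.
    return EnvParams(
        friction=rng.uniform(0.3, 1.2),
        payload_kg=rng.uniform(0.0, 5.0),
        slope_deg=rng.uniform(-8.0, 8.0),
        push_force_n=rng.uniform(0.0, 100.0),
    )


def sample_batch(n: int, seed: int = 0) -> List[EnvParams]:
    """Draw n independent environments from a seeded generator."""
    rng = random.Random(seed)
    return [sample_env(rng) for _ in range(n)]


if __name__ == "__main__":
    for env in sample_batch(3):
        print(env)
```

Because every episode sees slightly different physics, a policy trained this way cannot rely on any single setting being exact, which is what makes it more likely to survive contact with the real world.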

An example of a generated training environment

Interestingly, the authors chose a lightweight transformer as the architecture. It takes as input the state of the environment and the robot's previous actions, and it predicts the next action the robot should take. This closely resembles language models, which predict the next word from the preceding ones.
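The analogy with language models can be sketched as a control loop: keep a sliding window of recent (observation, previous action) pairs, feed it to the policy, and append the result, just like next-token prediction. In the sketch below, `rollout`, `toy_policy`, and `toy_env_step` are hypothetical stand-ins I made up; in the paper the policy is a transformer and the environment is a physics simulator or the real robot.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Obs = List[float]
Action = List[float]
History = List[Tuple[Obs, Action]]


def rollout(
    policy: Callable[[History], Action],
    env_step: Callable[[Obs, Action], Obs],
    first_obs: Obs,
    action_dim: int,
    steps: int,
    context_len: int = 16,
) -> List[Action]:
    """Run an autoregressive control loop over a bounded history window."""
    history: Deque[Tuple[Obs, Action]] = deque(maxlen=context_len)
    obs, prev_action = first_obs, [0.0] * action_dim
    actions: List[Action] = []
    for _ in range(steps):
        # The context holds the current observation and the previous action.
        history.append((obs, prev_action))
        action = policy(list(history))
        obs = env_step(obs, action)
        prev_action = action
        actions.append(action)
    return actions


def toy_policy(history: History) -> Action:
    # Stand-in for the transformer: push the latest observation toward zero.
    latest_obs, _ = history[-1]
    return [-0.5 * x for x in latest_obs]


def toy_env_step(obs: Obs, action: Action) -> Obs:
    # Stand-in for the simulator: the action is simply added to the state.
    return [o + a for o, a in zip(obs, action)]


if __name__ == "__main__":
    print(rollout(toy_policy, toy_env_step, [1.0], action_dim=1, steps=3))
```

The `deque` with `maxlen` plays the role of a fixed context window: old pairs fall off the front as new ones arrive, so the policy always sees a bounded amount of history.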

Interesting moments

This work clearly shows how people actually learn in most cases: they interact with the environment and guess what to do next.
The transformer has only 1.6 million parameters, which is very small. Modern neural networks that run on phones are much larger.
Only 4 NVIDIA A100 GPUs were used to generate the scenarios and train the neural network, which is very modest by current standards.
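To put the 1.6 million parameters in perspective, here is a quick back-of-the-envelope estimate of how much memory the weights alone would take. The helper `model_size_mb` is a hypothetical name, and the numeric precision used for storage is my assumption, since the post does not say how the weights are stored.

```python
def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


if __name__ == "__main__":
    # Assumes 32-bit floats (4 bytes per parameter).
    print(model_size_mb(1_600_000))
    # Assumes 16-bit floats (2 bytes per parameter).
    print(model_size_mb(1_600_000, bytes_per_param=2))
```

At 4 bytes per parameter this comes to roughly 6.4 MB, which is less than many photos taken on a phone.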

Demo

In the video above, we can see several demonstrations of the proposed method. For example, the robot figured out on its own that it needs to swing its arms while walking and that these movements must be synchronized with the legs. I repeat: no one taught it this. It is also interesting that the robot copes with unexpected disturbances, such as being hit with a stick or by a yoga ball.
