It was only a matter of time before Google’s Gemini AI ended up in actual humanoid robots. Sci-fi fans will not like this.

The company has revealed a pair of Gemini 2.0-based AI models that it says will “lay the foundation for a new generation of helpful robots” that can “perform a wider range of real-world tasks than ever before.”

The first one is called Gemini Robotics, and it doesn’t need to have been trained on a situation to understand it and act on it. It “leverages Gemini’s world understanding to generalise to novel situations and solve a wide variety of tasks out of the box, including tasks it has never seen before in training.”

Google says this vision-language-action model is “intuitively interactive” and more dextrous than previous models, and that it “represents a substantial step in performance on all three axes, getting us closer to truly general purpose robots.”

A video shows Gemini Robotics responding to commands to “move pen to go with other pencils” or “pick up the basketball and slam dunk it.”

Google also shows how the model can respond to rapidly changing environments. When asked to put bananas in a clear container, the robot is able to carry out the task even when a human interferes by moving the container all around the table.

Google’s blog shows Gemini Robotics playing tic-tac-toe, spelling out words from letters on the table, playing cards and packing a lunch.

The second model is called Gemini Robotics-ER (the ER stands for embodied reasoning). The company says it offers “advanced spatial understanding, enabling roboticists to run their own programs using Gemini’s embodied reasoning (ER) abilities.”

The company adds: “Gemini Robotics-ER excels at embodied reasoning capabilities including detecting objects and pointing at object parts, finding corresponding points and detecting objects in 3D.”