Last year, SIMA (Scalable Instructable Multiworld Agent) was introduced as a generalist AI capable of following basic instructions across various virtual environments. This marked a significant initial step in enabling AI to translate language into meaningful actions within complex 3D worlds.
SIMA 2 represents the next step in developing general and helpful AI agents. By integrating advanced Gemini models, SIMA has evolved from an instruction-follower into an interactive gaming companion: SIMA 2 can follow human-language instructions, reason about its goals, converse with users, and improve its own performance over time.
This development signifies a substantial move towards Artificial General Intelligence (AGI), with considerable implications for the future of robotics and AI embodiment.
The Power of Reasoning
The initial SIMA version acquired over 600 language-following skills, such as “turn left,” “climb the ladder,” and “open the map,” across various commercial video games. It interacted with these environments similarly to a human player, observing the screen and using a virtual keyboard and mouse for navigation, without direct access to game mechanics.
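To make that interface concrete, here is a minimal Python sketch under simplifying assumptions: the agent receives only raw screen pixels plus the instruction text, and returns virtual keyboard and mouse events, with no access to game state. The class names, the key bindings, and the keyword lookup in `toy_policy` are all hypothetical placeholders; SIMA's actual policy is a learned model, not a lookup table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Observation:
    width: int
    height: int
    rgb: bytes        # width * height * 3 raw screen pixels; nothing else from the game
    instruction: str  # natural-language instruction, e.g. "turn left"


@dataclass(frozen=True)
class KeyboardMouseAction:
    keys_down: Tuple[str, ...] = ()  # virtual key presses, e.g. ("w",)
    mouse_dx: int = 0                # relative mouse movement in pixels
    mouse_dy: int = 0
    click: bool = False


# Hypothetical key bindings for illustration only; real games differ.
KEYWORD_TO_ACTION = {
    "turn left": KeyboardMouseAction(mouse_dx=-50),
    "climb the ladder": KeyboardMouseAction(keys_down=("w",)),
    "open the map": KeyboardMouseAction(keys_down=("m",)),
}


def toy_policy(obs: Observation) -> KeyboardMouseAction:
    """Placeholder policy: a real agent maps pixels + language to actions with a learned model."""
    if len(obs.rgb) != obs.width * obs.height * 3:
        raise ValueError("frame size does not match width * height * 3")
    # Unknown instructions fall back to a no-op action.
    return KEYWORD_TO_ACTION.get(obs.instruction.strip().lower(), KeyboardMouseAction())


if __name__ == "__main__":
    blank_frame = Observation(4, 3, bytes(4 * 3 * 3), "Turn left")
    print(toy_policy(blank_frame))  # KeyboardMouseAction(keys_down=(), mouse_dx=-50, mouse_dy=0, click=False)
```

The design point is the narrow boundary: everything the agent knows arrives through `Observation`, and everything it does leaves through `KeyboardMouseAction`, which is what lets the same agent be dropped into different games.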
SIMA 2 goes beyond instruction-following. With a Gemini model at its core, SIMA 2 can think and reason about the instructions it receives instead of simply reacting to them.


MineDojo: SIMA 1 (left) attempts to follow the instruction while SIMA 2 (right) successfully completes the task in a game it has never seen before.


ASKA: SIMA 1 (left) attempts to follow the instruction “Find a campfire” while SIMA 2 (right) successfully completes the task in a game it has never seen before.
SIMA 2’s architecture incorporates Gemini’s strong reasoning capabilities, enabling it to understand a user’s high-level goal, reason about how to achieve it, and skillfully carry out goal-oriented actions in the game.
SIMA 2 was trained using a combination of human demonstration videos with language labels and Gemini-generated labels. Consequently, SIMA 2 can now articulate its intentions to the user and outline the steps it is undertaking to achieve its goals.


Moving beyond simple instruction following: SIMA 2 can answer the user’s questions and reason about both its own behavior and its environment.
Testing has revealed that interacting with the agent feels less like issuing commands and more like collaborating with a companion capable of reasoning about the task.
Through collaborations with existing and new game partners, SIMA 2 has been trained and evaluated across a broader range of games.
This is the power of Gemini brought to embodied AI: a world-class reasoning engine that can now perceive, understand, and take action in complex, interactive 3D environments.


SIMA 2 interprets abstract concepts and logical commands by reasoning about its environment and the user’s intent.
A Leap in Generalization Performance
Integrating Gemini has improved both generalization and reliability. SIMA 2 understands and carries out complex, nuanced instructions better than SIMA 1, and the difference is most visible in unfamiliar games and situations, such as the Viking survival game ASKA or MineDojo, a research implementation of Minecraft.
SIMA 2 can understand and accomplish long and complex tasks


SIMA 2 is successful at carrying out long and complex instructions.


SIMA 2 tackles a completely new game with no prior training, demonstrating impressive progress.
SIMA 2 understands multimodal prompts


The user draws a sketch on the screen.
SIMA 2 can understand different languages and even emojis


See how it correctly interprets emojis to execute tasks.


See how it follows commands in different languages to execute tasks.
Furthermore, SIMA 2’s ability to transfer learned concepts—such as applying its understanding of “mining” in one game to “harvesting” in another—is crucial for achieving the broad generalization observed in human cognition. This capability brings SIMA 2’s performance considerably closer to that of a human player across diverse tasks.


SIMA 2 can generalize actions across multiple games, including games it wasn’t trained on (like MineDojo and ASKA).
Task completion success rates for SIMA 1, SIMA 2, and humans on evaluation tasks across all training game environments show that SIMA 2 substantially narrows the gap to human performance. Note that SIMA 1’s performance is reported on a new, expanded, and more challenging evaluation set that covers a broader range of environments and more complex instructions.
Task completion success rates for SIMA 1 and SIMA 2 on held-out games never seen during training: ASKA and MineDojo (a Minecraft research implementation).
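As a minimal sketch of the metric behind these comparisons, assuming each evaluation task is scored simply as completed or not (the actual scoring protocol is not described here), a per-environment success rate is the fraction of completed tasks, and one straightforward reading of the "gap to human performance" is the human rate minus the agent rate. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of evaluation tasks marked as completed."""
    if not outcomes:
        raise ValueError("no evaluation tasks")
    return sum(outcomes) / len(outcomes)


def gap_to_human(agent: Dict[str, List[bool]], human: Dict[str, List[bool]]) -> Dict[str, float]:
    """Per-environment human rate minus agent rate (an assumed definition of the gap)."""
    # Both dicts must score the same task list per environment for the gap to be meaningful.
    return {env: success_rate(human[env]) - success_rate(agent[env]) for env in agent}
```

A rate is only comparable on a fixed task set, which is why it matters that SIMA 1 was re-scored on the new, harder evaluation set: both agents' rates, and therefore their gaps to human performance, are computed over the same tasks. No actual evaluation data is reproduced in this sketch.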
The Ultimate Test: Playing in Newly-Imagined Worlds
To assess the full extent of SIMA 2’s generalization capabilities, it was combined with another innovative research project, Genie 3, which generates new, real-time 3D simulated worlds from a single image or text prompt.
When SIMA 2 was tasked with playing in these newly generated worlds, it successfully oriented itself, understood user instructions, and performed meaningful actions towards objectives, even though it had never encountered such environments previously. This showcased an exceptional level of adaptability.


SIMA 2 playing in new worlds generated by Genie 3.
Towards Scalable, Multitask Self-Improvement
One notable new capability of SIMA 2 is self-improvement. During training, SIMA 2 agents have been observed taking on increasingly complex and novel tasks, learning through trial and error and Gemini-based feedback.
For instance, after initial learning from human demonstrations, SIMA 2 can learn in new games solely through self-directed play, developing skills in unfamiliar worlds without extra human-generated data. Subsequent training can then utilize SIMA 2’s own experience data to train a more advanced version of the agent. The agent’s self-improvement capability was also applied in newly created Genie environments, marking a significant step towards training general agents across varied, generated worlds.
The SIMA 2 self-improvement cycle begins with Gemini providing an initial task and an estimated reward for SIMA 2’s behavior. This information is then added to a bank of self-generated experience, which the agent uses for further training in subsequent generations. This process allows the agent to improve on previously failed tasks entirely independently of human-generated demonstrations and intervention.
This virtuous cycle of iterative improvement paves the way for a future where agents can learn and grow with minimal human intervention, becoming open-ended learners in embodied AI.
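The self-improvement cycle described above can be sketched as a loop. This is a toy Python model under heavy assumptions, not SIMA 2's training code: `propose_task` and `estimate_reward` are hypothetical stand-ins for the roles Gemini plays, the "agent" is just a per-task success probability, and "training" is a simple reward-weighted update computed from the whole experience bank. What it does show is the structure: propose a task, let the agent attempt it, score the attempt, store it, and train the next generation on the accumulated self-generated experience, with no human demonstrations inside the loop.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Episode:
    task: str
    reward: float  # estimated reward assigned after the attempt


class ToyAgent:
    """Hypothetical stand-in for a policy: one success probability per task."""

    def __init__(self, skill: Dict[str, float], rng: random.Random):
        self.skill = skill
        self.rng = rng

    def attempt(self, task: str) -> bool:
        # True if this toy rollout completed the task.
        return self.rng.random() < self.skill[task]

    def mean_skill(self) -> float:
        return sum(self.skill.values()) / len(self.skill)


def propose_task(tasks: List[str], rng: random.Random) -> str:
    # Hypothetical stand-in for Gemini setting a task.
    return rng.choice(tasks)


def estimate_reward(succeeded: bool) -> float:
    # Hypothetical stand-in for Gemini scoring the behavior; a real estimate can be noisy.
    return 1.0 if succeeded else 0.0


def train_next_generation(tasks: List[str], bank: List[Episode], rng: random.Random,
                          base: float = 0.1, lr: float = 0.05) -> ToyAgent:
    # Toy "training": start from a base skill and add a reward-weighted bonus
    # for every episode of that task in the experience bank (capped at 1.0).
    skill = {t: base for t in tasks}
    for ep in bank:
        skill[ep.task] = min(1.0, skill[ep.task] + lr * ep.reward)
    return ToyAgent(skill, rng)


def self_improve(tasks: List[str], generations: int = 4, episodes_per_gen: int = 20,
                 seed: int = 0) -> Tuple[List[float], List[Episode]]:
    rng = random.Random(seed)
    # Generation 0; in SIMA 2 the starting agent is first trained on human demonstrations.
    agent = train_next_generation(tasks, [], rng)
    bank: List[Episode] = []  # self-generated experience, shared across generations
    history = [agent.mean_skill()]
    for _ in range(generations):
        for _ in range(episodes_per_gen):
            task = propose_task(tasks, rng)
            succeeded = agent.attempt(task)
            bank.append(Episode(task, estimate_reward(succeeded)))
        agent = train_next_generation(tasks, bank, rng)  # no human data in this loop
        history.append(agent.mean_skill())
    return history, bank


if __name__ == "__main__":
    history, bank = self_improve(["chop tree", "find campfire", "build shelter"])
    print([round(h, 3) for h in history])
```

Because each generation is trained on a bank that only grows, and failed attempts carry zero estimated reward in this toy, the mean skill in `history` never decreases. A real system has no such guarantee, since reward estimates can be wrong and can reinforce mistakes.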


ASKA: Examples on the left show tasks where the initial SIMA 2 agent failed, while the right demonstrates SIMA 2’s self-improvement over generations of training, achieved without human feedback or gameplay data.


Genie 3 environment: the agent improves over one generation of training in a Genie 3 environment it has never seen before.
Looking to the Future: The Journey to General Embodied Intelligence
SIMA 2’s ability to operate across diverse gaming environments is a crucial proving ground for general intelligence, allowing agents to master skills, practice complex reasoning, and learn continuously through self-directed play.
While SIMA 2 marks a significant advance toward generalist, interactive, embodied intelligence, it remains a research project with limitations that point to important areas for future development. The agents still struggle with very long-horizon, complex tasks that demand extensive multi-step reasoning and goal verification. SIMA 2 also has a relatively short memory of its interactions, because it must use a limited context window to keep interactions low-latency. Finally, executing precise, low-level actions through keyboard and mouse interfaces and achieving robust visual understanding of intricate 3D scenes remain open challenges for the field.
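To illustrate why a limited context window translates into a short memory, here is a minimal Python sketch of a fixed-size interaction buffer. It is not a description of how SIMA 2 manages its context; `InteractionMemory` and `max_turns` are hypothetical. Once the window is full, each new turn evicts the oldest one, so the model never sees it again.

```python
from collections import deque
from typing import Deque, Tuple


class InteractionMemory:
    """Toy fixed-size memory: keeps only the most recent `max_turns` exchanges."""

    def __init__(self, max_turns: int = 3):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        # A deque with maxlen silently drops the oldest item when a new one is appended.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def context(self) -> str:
        # What a model would be given: only the retained turns, oldest first.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


if __name__ == "__main__":
    mem = InteractionMemory(max_turns=3)
    mem.add("user", "Find a campfire")
    mem.add("agent", "Heading toward the smoke")
    mem.add("user", "Now chop some wood")
    mem.add("agent", "Looking for a tree")
    print(mem.context())  # the first instruction has already been evicted
```

A larger `max_turns` keeps more history but gives the model more to process on every step, which is the latency trade-off this limitation describes.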
This research provides a fundamental validation for a new path in action-oriented AI. SIMA 2 confirms that an AI trained for broad competency, leveraging diverse multi-world data and the powerful reasoning of Gemini, can successfully unify the capabilities of many specialized systems into one coherent, generalist agent.
SIMA 2 also offers a strong path toward application in robotics. The skills it learned – from navigation and tool use to collaborative task execution – are some of the fundamental building blocks for the physical embodiment of intelligence needed for future AI assistants in the physical world.
Responsible Development
SIMA 2 is an interactive, human-centered agent that is engaging to work with, especially in the way it explains its own reasoning. As with all advanced and foundational technologies, SIMA 2 is being developed responsibly from the outset. This commitment applies especially to its technical innovations, above all its ability to self-improve.
SIMA 2 was developed in collaboration with a Responsible Development & Innovation Team. While potential applications are being explored, SIMA 2 is being released as a limited research preview, giving early access to a select group of academics and game developers. The aim is to gather essential feedback and interdisciplinary insight while investigating this new domain and improving the understanding of its risks and appropriate mitigations. Further collaboration with the community is expected to help ensure this technology is developed responsibly.