Google’s Gemini AI Completes Pokémon Blue, But It’s Not a True Benchmark
Google’s Gemini 2.5 Pro has achieved an unusual milestone: completing Pokémon Blue, a beloved Game Boy title first released in 1996. The feat was shared by Google CEO Sundar Pichai on X, where he posted, “What a finish! Gemini 2.5 Pro just completed Pokémon Blue!”
The accomplishment was part of a livestream project called Gemini Plays Pokémon, run not by Google but by Joel Z, a 30-year-old software engineer who says he is unaffiliated with the company. Still, the project has received enthusiastic public support from Google executives. Logan Kilpatrick, product lead for Google AI Studio, had earlier noted Gemini’s progress, writing that it had already earned its fifth gym badge, well ahead of other models using different setups.
The idea of AI models playing Pokémon isn’t new. Earlier this year, Anthropic showcased its Claude model making strides in Pokémon Red, claiming the game was a strong test of its long-context reasoning capabilities. Joel Z cited the “Claude Plays Pokémon” stream on Twitch as an inspiration for his own project.
While Gemini seems to have pulled ahead in the classic game challenge, Joel Z emphasized that this shouldn’t be viewed as a competitive benchmark. “You can’t really make direct comparisons — Gemini and Claude have different tools and receive different information,” he explained on his Twitch page.
That’s because each model runs inside an agent harness, a software layer that feeds it game screenshots overlaid with helpful data so it can interpret what’s happening and decide on its next move. These harnesses often include additional modules that execute the AI’s chosen actions, such as pressing buttons or navigating menus.
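To make the idea concrete, here is a minimal Python sketch of the loop such a harness might run: capture a frame, attach overlay data, ask the model for a button, and press it. Everything in it is an assumption for illustration. `ToyEmulator`, `Observation`, `run_harness` and `scripted_policy` are hypothetical names, the "emulator" is a toy grid rather than a real Game Boy emulator, and the scripted policy stands in for a call to a model. It does not reflect how the Gemini or Claude harnesses are actually built.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Buttons the harness is willing to forward to the game (Game Boy layout).
VALID_BUTTONS = {"A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT"}


@dataclass
class Observation:
    """What the model sees each turn: a frame plus harness-added data."""
    screenshot: bytes
    overlay: Dict[str, object]


@dataclass
class ToyEmulator:
    """Hypothetical stand-in for an emulator: a player moving on a grid."""
    position: Tuple[int, int] = (0, 0)
    pressed: List[str] = field(default_factory=list)

    def capture(self) -> bytes:
        # A real harness would return pixel data here.
        return f"frame@{self.position}".encode()

    def press(self, button: str) -> None:
        self.pressed.append(button)
        moves = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
        dx, dy = moves.get(button, (0, 0))
        x, y = self.position
        self.position = (x + dx, y + dy)


def observe(emulator: ToyEmulator) -> Observation:
    """Build the screenshot-plus-overlay bundle handed to the model."""
    return Observation(
        screenshot=emulator.capture(),
        overlay={"position": emulator.position},
    )


def run_harness(
    emulator: ToyEmulator,
    choose_button: Callable[[Observation], str],
    goal: Tuple[int, int],
    max_steps: int = 50,
) -> int:
    """Loop observe -> decide -> act; return the number of steps taken."""
    for step in range(max_steps):
        if emulator.position == goal:
            return step
        button = choose_button(observe(emulator))
        if button not in VALID_BUTTONS:
            continue  # drop malformed actions instead of pressing them
        emulator.press(button)
    return max_steps


def scripted_policy(obs: Observation) -> str:
    """Placeholder for the model: walk right twice, then down once."""
    x, y = obs.overlay["position"]
    if x < 2:
        return "RIGHT"
    if y < 1:
        return "DOWN"
    return "A"


if __name__ == "__main__":
    emu = ToyEmulator()
    steps = run_harness(emu, scripted_policy, goal=(2, 1))
    print(steps, emu.pressed)
```

The sketch also shows why cross-model comparisons are hard: change what goes into `overlay`, or which actions the harness accepts, and the same model could perform very differently.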
Joel Z acknowledged making “developer interventions” to support Gemini’s performance, but insisted these didn’t amount to cheating. He clarified that the interventions enhance Gemini’s reasoning but don’t involve giving it step-by-step instructions or game walkthroughs. “The only thing that comes even close,” he said, “was alerting Gemini to a quirk in the game involving the Lift Key — something fixed in later versions.”
He also noted that the project is still a work in progress, with the gameplay framework actively evolving.
So while Gemini’s completion of Pokémon Blue is a fun and flashy moment for Google’s AI, it’s best seen as an experimental showcase — not a definitive measure of model superiority.