Monday, November 3, 2025
iTDAY
Super Mario Emerges as an Unconventional Benchmark for AI Performance

by Hana.haghani
2025-03-04
in Ai, Games, Technews
Reading Time: 2 mins read

While many consider Pokémon a challenging benchmark for AI, a group of researchers contends that Super Mario Bros. may be even tougher. Hao AI Lab, a research organization at the University of California San Diego, recently put AI models into live games of Super Mario Bros. Anthropic’s Claude 3.7 performed best, followed by Claude 3.5, while Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggled.

To be clear, the game was not quite the original 1985 release of Super Mario Bros. It ran in an emulator integrated with a framework called GamingAgent, which gave the AIs control over Mario.

GamingAgent, which Hao developed in-house, fed the AI basic instructions, such as “move/jump left to dodge” when an obstacle or enemy is near, along with in-game screenshots. The AI then generated control inputs in the form of Python code to maneuver Mario. According to the Hao team, this setup forced each model to “learn” to plan complex maneuvers and develop gameplay strategies.

Interestingly, the lab found that reasoning models such as OpenAI’s o1, which work through problems step by step, performed worse than non-reasoning models, even though reasoning models are generally stronger on most other benchmarks. The researchers noted that one significant hurdle for reasoning models in real-time games like Super Mario Bros. is decision speed: these models often take seconds to settle on an action. In a game where timing is critical, a split second can mean the difference between a cleared jump and a fatal fall.

For decades, games have served as a benchmark for AI capabilities. However, some experts question whether AI performance in games says much about broader technological progress. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically unlimited amount of data for AI training.
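The timing hurdle described for reasoning models comes down to simple arithmetic: while a model is still deciding, the game keeps advancing. The sketch below is a rough illustration of that point, not part of GamingAgent or the Hao lab's code. The frame rate, the agent names, the latencies, and the 45-frame hazard window are all assumed values chosen for illustration.

```python
from dataclasses import dataclass

FPS = 60  # assumption: the emulator renders roughly 60 frames per second


@dataclass
class Agent:
    name: str         # hypothetical label, not a real model name
    latency_s: float  # assumed time from screenshot to chosen action


def frames_elapsed(latency_s: float, fps: int = FPS) -> int:
    """Frames the game advances while the model is still deciding."""
    return round(latency_s * fps)


def reacts_in_time(agent: Agent, frames_until_hazard: int, fps: int = FPS) -> bool:
    """True if the action arrives before the hazard reaches Mario."""
    return frames_elapsed(agent.latency_s, fps) < frames_until_hazard


if __name__ == "__main__":
    # Illustrative latencies only; these are not measured results.
    agents = [Agent("fast_model", 0.3), Agent("slow_reasoner", 3.0)]
    for agent in agents:
        print(
            agent.name,
            frames_elapsed(agent.latency_s),
            reacts_in_time(agent, frames_until_hazard=45),
        )
```

Under these assumptions, a model that needs several seconds per decision lets hundreds of frames pass before its input arrives, which is consistent with the researchers' explanation of why slower, step-by-step models fared worse in a real-time game.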

The recent wave of attention-grabbing gaming benchmarks points to what Andrej Karpathy, a research scientist and founding member at OpenAI, has called an “evaluation crisis.” In a post on X, he wrote, “I don’t really know what [AI] metrics to look at right now. TLDR: my reaction is I don’t really know how good these models are right now.” At least we can enjoy watching AI play Mario.

© 2025 itDay - All rights reserved for the website of the latest technologies in the World.
