iTDAY
Super Mario Emerges as an Unconventional Benchmark for AI Performance

by Hana.haghani
2025-03-04
in Ai, Games, Technews

While many consider Pokémon a tough benchmark for AI, a group of researchers argues that Super Mario Bros. may be even harder. The Hao AI Lab, a research group at the University of California San Diego, recently tested AI models in live gameplay of Super Mario Bros. Anthropic’s Claude 3.7 performed best, followed by Claude 3.5, while Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggled.

The version of Super Mario Bros. used in the experiment was not quite the original 1985 release: the game ran in an emulator integrated with a framework called GamingAgent, which gave the AIs control over Mario.

Developed in-house by the Hao AI Lab, GamingAgent fed each AI basic instructions, such as “move/jump left to dodge” when an obstacle or enemy is near, along with in-game screenshots. The AI then generated control inputs in the form of Python code to maneuver Mario. The Hao team says this setup forced each model to “learn” to plan complex maneuvers and develop gameplay strategies.

Interestingly, the lab found that reasoning models such as OpenAI’s o1, which work through problems step by step, performed worse than non-reasoning models, even though reasoning models generally score higher on most benchmarks. According to the researchers, one major hurdle for reasoning models in real-time games is decision speed: these models often take seconds to settle on an action. In Super Mario Bros., where timing is critical, a split second can mean the difference between a cleared jump and a fatal fall.

Games have served as AI benchmarks for decades. However, some experts question whether a model’s gaming skill says much about broader technological progress. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically unlimited supply of training data.
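The timing problem the researchers describe can be put in rough numbers. The sketch below is only an illustration, not part of GamingAgent: it assumes the game keeps running at about 60 frames per second while the model decides, and the latencies (300 ms and 3,000 ms) and the 45-frame distance to the hazard are hypothetical values chosen for the example, not measurements from the study.

```python
FPS = 60  # assumption: the emulated game advances at roughly 60 frames per second


def frames_elapsed(latency_ms: int, fps: int = FPS) -> int:
    """Whole game frames that pass while the model is still deciding."""
    return latency_ms * fps // 1000


def jump_arrives_in_time(latency_ms: int, frames_until_hazard: int, fps: int = FPS) -> bool:
    """True if the model's action is ready before Mario reaches the hazard."""
    return frames_elapsed(latency_ms, fps) < frames_until_hazard


if __name__ == "__main__":
    frames_until_hazard = 45  # hypothetical: hazard is 45 frames (0.75 s) away
    # Hypothetical decision latencies in milliseconds, for illustration only.
    for label, latency_ms in [("fast model", 300), ("slow reasoning model", 3000)]:
        lost = frames_elapsed(latency_ms)
        ok = jump_arrives_in_time(latency_ms, frames_until_hazard)
        outcome = "jumps in time" if ok else "reacts too late"
        print(f"{label}: {lost} frames pass while deciding, {outcome}")
```

Under these assumptions, a model that needs three seconds per decision lets 180 frames go by before it acts, while one that answers in 300 ms loses only 18, which is why a model that scores better elsewhere can still fall into the first pit.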

The recent attention-grabbing gaming benchmarks have led Andrej Karpathy, a research scientist and founding member at OpenAI, to describe what he calls an “evaluation crisis.” In a post on X, he shared, “I don’t really know what [AI] metrics to look at right now. TLDR: my reaction is I don’t really know how good these models are right now.” At least we can enjoy watching AI play Mario.

© 2025 itDay - All rights reserved for the website of the latest technologies in the World.
