iTDAY

Super Mario Emerges as an Unconventional Benchmark for AI Performance

by Hana.haghani
2025-03-04
in Ai, Games, Technews

While many consider Pokémon a challenging benchmark for AI, a group of researchers contends that Super Mario Bros. may actually be even more difficult. The Hao AI Lab, based at the University of California San Diego, recently tested AI systems in live gameplay of Super Mario Bros. Their findings showed that Anthropic’s Claude 3.7 outperformed others, followed closely by Claude 3.5, while Google’s Gemini 1.5 Pro and OpenAI’s GPT-4 struggled.

It’s worth noting that the version of Super Mario Bros. used in the experiment wasn’t the original 1985 release; instead, it was run through an emulator integrated with a framework called GamingAgent, which allowed the AIs to control Mario.

Developed in-house by the Hao AI Lab, GamingAgent fed each model in-game screenshots along with basic instructions, such as “move/jump left to dodge” when an obstacle or enemy is near. The model then generated control inputs in the form of Python code to maneuver Mario. The Hao team asserts that this setup required each model to “learn” how to execute complex movements and devise gameplay strategies.

Interestingly, the lab found that reasoning models such as OpenAI’s o1, which work through problems step by step, performed worse than non-reasoning models, even though reasoning models typically score higher on other benchmarks. The researchers noted that a major hurdle for reasoning models in real-time games like Super Mario Bros. is decision-making speed: these models often take seconds to settle on an action. In a game where timing is critical, even a split second can mean the difference between a successful jump and a fatal fall.

For decades, games have served as a benchmark for AI capabilities. However, some experts question whether AI performance in games says much about broader technological progress. Unlike the real world, games tend to be abstract and relatively simple, and they provide a virtually limitless supply of training data.
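To make the timing problem for slower models concrete, here is a minimal Python sketch. It is not taken from GamingAgent; the function names, the 60 frames-per-second rate, and the distances and speeds are all illustrative assumptions. It converts a model's decision latency into the number of game frames that pass before its action can be applied, and checks whether that is soon enough to avoid an approaching obstacle.

```python
# Illustrative sketch only: none of these names or numbers come from GamingAgent.
FPS = 60  # assumed frame rate for the emulated game


def frames_elapsed(latency_s: float, fps: int = FPS) -> int:
    """Number of game frames that pass while the model is still deciding."""
    return round(latency_s * fps)


def reacts_in_time(
    latency_s: float,
    obstacle_distance_px: float,
    approach_speed_px_per_frame: float,
    fps: int = FPS,
) -> bool:
    """True if the decision arrives before the obstacle reaches Mario."""
    frames_until_contact = obstacle_distance_px / approach_speed_px_per_frame
    return frames_elapsed(latency_s, fps) < frames_until_contact


if __name__ == "__main__":
    # Hypothetical scenario: an enemy 80 px away closing at 2 px per frame.
    for label, latency in [("fast model", 0.5), ("slow model", 3.0)]:
        ok = reacts_in_time(latency, obstacle_distance_px=80,
                            approach_speed_px_per_frame=2)
        print(f"{label}: {frames_elapsed(latency)} frames late, "
              f"reacts in time: {ok}")
```

Under these assumed numbers, a half-second decision costs 30 frames and still beats the 40-frame window, while a three-second decision costs 180 frames and arrives far too late. The real margins depend on the emulator and the level, which this sketch does not model.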

Some of the more attention-grabbing recent gaming benchmarks have led Andrej Karpathy, a research scientist and founding member of OpenAI, to describe what he calls an “evaluation crisis.” In a post on X, he shared, “I don’t really know what [AI] metrics to look at right now. TLDR: my reaction is I don’t really know how good these models are right now.”

At least we can enjoy watching AI play Mario.


ITDAY is a technology-focused platform covering the latest tech trends, news, and innovations worldwide. It likely provides articles, reviews, and insights on advancements in the tech industry.

© 2025 itDay - All rights reserved for the website of the latest technologies in the World.
