iTDAY

OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied

by Hana.haghani
2025-04-21
in Ai, Technews

Concerns about OpenAI’s transparency and model-evaluation practices have arisen over a gap between first- and third-party benchmark results for its o3 AI model. When OpenAI unveiled o3 in December, it claimed the model could correctly answer just over 25% of the questions in FrontierMath, a challenging set of math problems. That put it far ahead of the competition: the next-best model managed only about 2%.

During a livestream, Mark Chen, OpenAI’s chief research officer, said: “Currently, all other products are below 2% on FrontierMath. Internally, we see o3 achieving over 25% under intensive testing conditions.” That figure, however, appears to be an upper bound reached by a version of o3 with more compute behind it than the model OpenAI released publicly last week.

On Friday, Epoch AI, the organization behind FrontierMath, published the results of its independent benchmark tests of o3. It found that the model scored around 10%, well below OpenAI’s highest reported score.

The discrepancy does not necessarily mean that OpenAI was dishonest. The results the company published in December include a lower-bound score in line with what Epoch observed. Epoch also noted that its testing setup likely differs from OpenAI’s and that it used a more recent version of FrontierMath for its evaluation. “The variance between our results and OpenAI’s may stem from their use of a stronger internal framework, more test-time computing resources, or variations in the subsets of FrontierMath used for evaluation,” Epoch wrote.

A post on X from the ARC Prize Foundation, which tested a pre-release version of o3, supports Epoch’s findings: it said the publicly available o3 model “is a different model…designed for chat/product applications.” ARC Prize added that “all released o3 compute tiers are smaller than the version we benchmarked.” Because larger compute tiers generally achieve higher benchmark scores, the released model would be expected to score lower.

Wenda Zhou, a member of OpenAI’s technical staff, said during a livestream that the production version of o3 is “more optimized for practical applications” and faster than the version demonstrated in December, which may lead to benchmark “disparities.” “[W]e’ve made adjustments to enhance cost-efficiency and functionality,” he said. “We still believe this is a superior model…answers will come faster, addressing a genuine issue with these models.”

That the public release of o3 falls short of OpenAI’s initial testing claims is somewhat beside the point, since the company’s o3-mini-high and o4-mini models outperform o3 on FrontierMath. OpenAI also plans to launch a more powerful variant, o3-pro, in the near future.

The episode is a reminder not to take AI benchmark claims at face value, particularly when they come from companies with a vested interest in the results. Benchmarking controversies are becoming increasingly common in the AI industry as vendors compete for attention with each new model.

In January, Epoch was criticized for waiting until after OpenAI announced o3 to disclose that it had received funding from OpenAI. Many academics who contributed to FrontierMath did not know about OpenAI’s involvement until it was made public. More recently, Elon Musk’s xAI was accused of publishing misleading benchmark data for its latest AI model, Grok 3. And just this month, Meta acknowledged promoting benchmark scores for a version of a model that differed from the one it made available to developers.

© 2025 itDay - All rights reserved for the website of the latest technologies in the World.
