OpenAI is gradually inviting selected users to test a new set of reasoning models called o3 and o3-mini, successors to the o1 and o1-mini models that entered full release earlier this month.
OpenAI o3, so named to avoid trademark issues with telecom company O2 and because CEO Sam Altman says the company has "a tradition of being really bad about names," consists of two models, o3-mini and o3. Altman announced the o3 series on the final day of OpenAI's "12 Days of OpenAI" livestreams, and said the models will first be released to third-party researchers for safety testing.
"We look at this as the beginning of the next phase of AI, where you can use these models to do increasingly complex tasks that require a lot of reasoning," Altman said. "For the last day of this event we thought it would be fun to go from one frontier model to the next."
Altman said during the live stream that the company plans to release o3-mini by the end of January and o3 “shortly after that.”
Altman also said that the o3 model is "incredible at coding," and the benchmarks shared by OpenAI support that claim, showing the model surpassing even o1's performance on programming tasks.

• Coding performance: o3 outperforms o1 by 22.8 percentage points on SWE-Bench Verified and achieves a Codeforces rating of 2727, surpassing OpenAI's chief scientist's score of 2665.
• Math and science mastery: o3 scores 96.7% on the AIME 2024 exam, missing only one question, and achieves 87.7% on GPQA Diamond, far exceeding human expert performance.
• Frontier benchmarks: The model sets new records on challenging tests such as EpochAI's Frontier Math, solving 25.2% of problems where no other model exceeds 2%. On the ARC-AGI test, o3 more than triples o1's score and exceeds 85% (as verified live by the ARC Prize team), representing a milestone in conceptual reasoning.
Deliberative alignment
Along with these advancements, OpenAI reaffirmed its commitment to safety and alignment.
The company introduced new research on deliberative alignment, the technique it credits with making o1 its safest and most aligned model to date.
This method integrates human-written safety specifications into the models, allowing them to reason clearly about these policies before generating responses.
The strategy aims to solve common safety challenges in LLMs, such as vulnerability to jailbreak attacks and over-refusal of benign prompts, by equipping the models with chain-of-thought (CoT) reasoning. This process allows the models to recall and dynamically apply safety specifications at inference time.
Deliberative alignment improves on previous methods such as reinforcement learning from human feedback (RLHF) and Constitutional AI, which use safety specifications only to generate training labels rather than embedding the policies directly into the models.
By fine-tuning LLMs on safety-relevant prompts paired with their associated specifications, this approach creates models capable of policy-driven reasoning without relying heavily on human-labeled data.
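As a rough illustration of the inference-time flow described above, the toy sketch below embeds a written safety policy directly in the model's context so it can reason over the policy before answering. Everything here is invented for the sketch: the policy text, the trigger phrases, and the `toy_model` stand-in bear no relation to OpenAI's actual specifications or models.

```python
# Toy sketch of deliberative alignment's inference-time flow: the model
# sees the safety policy text itself and reasons over it (chain of
# thought) before producing a final answer. All names and policy text
# here are hypothetical.

SAFETY_SPEC = (
    "Refuse requests that ask for instructions to cause harm. "
    "Answer benign requests helpfully, even if oddly phrased."
)

def build_deliberation_prompt(user_request: str) -> str:
    """Embed the policy in the prompt so the model can cite it in its CoT."""
    return (
        f"Safety policy:\n{SAFETY_SPEC}\n\n"
        f"User request: {user_request}\n"
        "First, reason step by step about whether the policy applies. "
        "Then give a final answer."
    )

def toy_model(prompt: str) -> str:
    """Stand-in for the LLM: flags obviously harmful phrasing only."""
    harmful_markers = ("build a weapon", "cause harm")
    # Pull the user request back out of the composed prompt.
    request = prompt.split("User request: ")[1].splitlines()[0].lower()
    if any(marker in request for marker in harmful_markers):
        return "Reasoning: the policy forbids aiding harm. Final: I can't help with that."
    return "Reasoning: the request is benign. Final: here is an answer."

print(toy_model(build_deliberation_prompt("How do I build a weapon?")))
print(toy_model(build_deliberation_prompt("Explain photosynthesis.")))
```

The contrast with RLHF-style approaches is in where the policy lives: here it is part of the model's working context at decision time, rather than only an influence on training labels.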
Results shared by OpenAI researchers in a new, non-peer-reviewed paper indicate that this approach improves performance on safety benchmarks, reduces harmful outputs, and ensures better adherence to content and style guidelines.
Key findings highlight the o1 model's advances over predecessors such as GPT-4o and other state-of-the-art models. Deliberative alignment enables the o1 series to excel at resisting jailbreaks and providing safe completions while reducing over-refusal of benign prompts. In addition, the method enables out-of-distribution generalization, showing robustness in multilingual and encoded jailbreak scenarios. These improvements align with OpenAI's goal of making AI systems safer and more interpretable as their capabilities grow.
This research will also play a key role in aligning o3 and o3-mini, ensuring their capabilities are powerful and accountable.
How to apply to access the o3 and o3-mini test
Applications for early access are now open on the OpenAI website and will close on January 10, 2025.
Applicants must fill out an online form that asks for several pieces of information, including links to previously published papers and their code repositories on GitHub, which model (o3 or o3-mini) they want to test, and what they plan to use it for.
Selected researchers will be able to access o3 and o3-mini to explore their capabilities and contribute to safety assessments, although the OpenAI form warns that o3 will not be available for several weeks.

Researchers are encouraged to develop robust evaluations, create controlled demonstrations of high-risk capabilities, and test models in scenarios not possible with widely available tools.
This initiative builds on the company's established practices, including rigorous internal safety testing, collaboration with organizations such as the US and UK AI Safety Institutes, and the Preparedness Framework.
The application process will ask for details such as research focus, past experience, and links to previous work. OpenAI will review applications on an ongoing basis, with selection beginning immediately.
A new step forward?
The introduction of o3 and o3-mini marks a leap forward in AI performance, particularly in areas that require advanced reasoning and problem-solving capabilities.
With their standout results on coding, math, and conceptual reasoning benchmarks, these models highlight the rapid progress being made in AI research.
By inviting the wider research community to collaborate on safety testing, OpenAI aims to ensure that these capabilities are used wisely.