A new artificial intelligence (AI) model has achieved human-level results on a test designed to measure "general intelligence".
On December 20, OpenAI's o3 system scored 85% on the ARC-AGI benchmark, well above the previous best AI score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is a stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least taken a significant step toward it.
While skepticism remains, many AI researchers and developers feel that something has just changed. To many, the prospect of AGI now seems more real, more urgent and closer than anticipated.
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is. In technical terms, it is a test of an AI system's "sample efficiency" in adapting to something new: how many examples of a novel situation does the system need to see before it figures out how it works?
An AI system like ChatGPT (GPT-4) is not very sample-efficient. It was "trained" on millions of examples of human text, building probabilistic "rules" about which combinations of words are most likely.
The result is quite good at common tasks, but bad at uncommon ones, because the system has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with greater sample efficiency, they will only be used for highly repetitive jobs and ones where the occasional failure can be tolerated.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a fundamental component of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using small square-grid problems like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.
[Figure: an example grid task from the ARC-AGI benchmark. ARC Prize]
Each question provides three examples to learn from. The AI system then needs to figure out the rule that "generalizes" from the three examples to the fourth.
These are a lot like the IQ tests you may remember from your school days.
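To make the setup concrete, here is a toy sketch in Python. This is my own illustration, not a task from the actual ARC-AGI dataset: the solver sees three input/output grid pairs, finds a rule that fits all of them, and is then judged on a fourth, unseen grid.

```python
# Toy illustration of an ARC-style task (not a real ARC-AGI puzzle):
# infer a transformation from three worked grid examples, then apply
# it to a new grid the solver has never seen.

def transpose(grid):
    """One candidate rule: flip the grid across its diagonal."""
    return [list(row) for row in zip(*grid)]

# The three input/output examples the solver gets to learn from.
examples = [
    ([[1, 0], [0, 0]], [[1, 0], [0, 0]]),
    ([[0, 2], [0, 0]], [[0, 0], [2, 0]]),
    ([[0, 0], [3, 0]], [[0, 3], [0, 0]]),
]

# A rule "fits" if it maps every example input to its output.
assert all(transpose(x) == y for x, y in examples)

# The real test: does the rule generalize to a fourth, unseen grid?
print(transpose([[4, 5], [0, 0]]))  # -> [[4, 0], [5, 0]]
```

The point of the benchmark is the second step: many rules can fit three examples, but only a rule that captures the underlying pattern will also produce the correct fourth grid.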
Weak rules and adaptation
We don't know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable: from just a few examples, it finds rules that generalize.
To figure out a pattern, we shouldn't make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the "weakest" rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain-English expression of the rule might be something like: "Any shape with a protruding line moves to the end of that line and 'covers up' any other shapes it overlaps with."
Searching chains of thought?
While we don't know how OpenAI achieved this result, it seems unlikely that they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models in that it can spend more time "thinking" about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different "chains of thought" describing the steps to solve the task. It would then choose the "best" according to some loosely defined rule, or "heuristic".
This would be "not dissimilar" to how Google's AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
There could be thousands of different, seemingly equally valid programs generated. That heuristic could be "choose the weakest" or "choose the simplest".
However, if it is like AlphaGo, then they simply had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
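A small sketch can show what such a tie-breaking heuristic might look like. This is my own illustration, not OpenAI's actual method: several candidate "programs" all fit the worked examples equally well, and a "choose the simplest" heuristic picks among them, with description length standing in (crudely) for the complexity of the rule.

```python
# Illustration (an assumption, not OpenAI's method): several candidate
# rules fit the same worked examples, and a heuristic breaks the tie.

examples = [([1, 2], [2, 4]), ([3], [6])]

# Candidate rules, each paired with a plain-text description whose
# length stands in (crudely) for the rule's complexity.
candidates = [
    ("double each number",
     lambda xs: [x * 2 for x in xs]),
    ("double each number unless it is over 100, then keep it",
     lambda xs: [x * 2 if x <= 100 else x for x in xs]),
    ("double each number and sort the result",
     lambda xs: sorted(x * 2 for x in xs)),
]

# All three candidates fit the worked examples equally well...
fitting = [(desc, f) for desc, f in candidates
           if all(f(x) == y for x, y in examples)]
assert len(fitting) == 3

# ...so the heuristic picks the shortest description: the "weakest"
# rule, the one making the fewest extra assumptions.
best_desc, best_rule = min(fitting, key=lambda pair: len(pair[0]))
print(best_desc)             # -> double each number
print(best_rule([7, 150]))   # -> [14, 300]
```

On an unseen input like `[7, 150]`, the three candidates diverge; only the weakest rule avoids baking in assumptions (a cutoff at 100, a sorting step) that the examples never justified.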
What we don't know yet
The question, then, is whether this is really closer to AGI. If that is how o3 works, the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more generalizable than before. Instead, we may just be seeing a more generalizable "chain of thought", found through an extra step of training a heuristic specialized for this test. As always, the proof will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing by a handful of researchers, laboratories and AI safety institutions.
Truly understanding o3's potential will require extensive work, including evaluations, an understanding of the distribution of its capabilities, how often it fails and how often it succeeds.
When o3 is finally released, we'll have a much better idea of whether it is roughly as adaptable as an average human.
If so, it could have a huge, revolutionary economic impact and usher in a new era of self-improving, accelerated intelligence. We will need new benchmarks for AGI itself, and serious consideration of how it ought to be governed.
If not, this will still be an impressive result. However, everyday life will remain much the same.
Michael Timothy Bennett, PhD student, Faculty of Computer Science, Australian National University, and Elijah Perrier, researcher, Stanford Center for Responsible Quantum Technology, Stanford University
This article is republished from The Conversation under a Creative Commons license. Read the original article.