Google, OpenAI, DeepSeek, et al. are nowhere near achieving AGI (Artificial General Intelligence), according to a new benchmark.
The Arc Prize Foundation, a nonprofit that measures AGI progress, has a new benchmark that is stumping the leading AI models. The test, called ARC-AGI-2, is the second edition of the ARC-AGI benchmark, which tests models on general intelligence by challenging them to solve visual puzzles using pattern recognition, context clues, and reasoning.
According to the ARC-AGI leaderboard, OpenAI's most advanced model, o3-low, scored 4 percent. Google's Gemini 2.0 Flash and DeepSeek R1 both scored 1.3 percent. Anthropic's most advanced model, Claude 3.7 with an 8K token limit (which refers to the number of tokens used to process an answer), scored 0.9 percent.
The question of how and when AGI will be achieved remains as heated as ever, with various factions bickering about the timeline or whether it's even possible. Anthropic CEO Dario Amodei said it could take as little as two to three years, and OpenAI CEO Sam Altman said "it's achievable with current hardware." But experts like Gary Marcus and Yann LeCun say the technology isn't there yet. And it doesn't take an expert to see how fueling AGI hype benefits AI companies seeking major investments.
The ARC-AGI benchmark is designed to challenge AI models beyond specialized intelligence by avoiding the memorization trap: spewing out PhD-level responses without understanding what they mean. Instead, it focuses on puzzles that are relatively easy for humans to solve because of our innate ability to take in new information and make inferences, thus revealing gaps that can't be closed by simply feeding AI models more data.
"Intelligence requires the ability to generalize from limited experience and apply knowledge in new, unexpected situations. AI systems are already superhuman in many specific domains (e.g., playing Go and image recognition)," read the announcement. "However, these are narrow, specialized capabilities. The 'human-ai gap' reveals what's missing for general intelligence - highly efficiently acquiring new skills."
To get a sense of AI models' current limitations, you can take the ARC-AGI test for yourself. You might be surprised by its simplicity. There's some critical thinking involved, but the ARC-AGI test wouldn't be out of place next to The New York Times crossword puzzle, Wordle, or any of the other popular brain teasers. It's challenging but not impossible, and the answer is there in the puzzle's logic, which the human brain has evolved to interpret.
OpenAI's o3-low model scored 75.7 percent on the first edition of ARC-AGI. By comparison, its 4 percent score on the second edition shows both how difficult the new test is and how much work remains before AI reaches human-level intelligence.