Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024


When a company releases a new AI video generator, it's not long before someone uses it to create a video of actor Will Smith eating spaghetti.

It's become both a meme and a benchmark: seeing whether a new video generator can realistically render Smith downing a bowl of noodles. Smith himself mocked the trend in an Instagram post in February.

Will Smith and pasta is just one of several weird "unofficial" benchmarks that took the AI community by storm in 2024. A 16-year-old developer created an app that gives AI control of Minecraft and tests its ability to design buildings. Elsewhere, a British programmer created a platform where AIs play games like Pictionary and Connect 4 against each other.

It's not as if there aren't more academic tests of AI performance. So why did the weird ones blow up?

LLM Pictionary. Image credits: Paul Calcraft

For one, many industry-standard AI benchmarks don't tell the average person very much. Companies often cite their AI's ability to answer Math Olympiad exam questions or to work out plausible solutions to Ph.D.-level problems. But most people, yours truly included, use chatbots for things like responding to emails and basic research.

Crowdsourced performance measures are not necessarily better or more informative.

Take, for example, Chatbot Arena, a public benchmark that many AI enthusiasts and developers follow closely. Chatbot Arena lets anyone on the web rate how well an AI performs on specific tasks, such as creating a web app or generating an image. But raters tend to be unrepresentative (most come from AI and tech circles) and cast their votes based on personal, hard-to-pin-down preferences.

The Chatbot Arena interface. Image credits: LMSYS

Wharton management professor Ethan Mollick recently pointed out, in a post on X, another problem with many AI industry benchmarks: they don't compare a system's performance to that of an average human.

Mollick wrote that it is a real shame there aren't 30 different benchmarks from different organizations in medicine, law, advice quality, and so on, given that people are using these systems for such things anyway.

Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are certainly not empirical, or even all that generalizable. Just because an AI nails the Will Smith test doesn't mean it will generate, say, a good burger.

Mcbench. Note the typo: there is no such model as Claude 3.6 Sonnet. Image credits: Adonis Singh

One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream effects of AI instead of its capabilities in narrow domains. That's sensible. But I have a feeling that the weird benchmarks aren't going away anytime soon. Not only are they entertaining (who doesn't love watching AI build Minecraft castles?), but they're also easy to understand. And as my colleague Max Zeff wrote recently, the industry continues to struggle with distilling a technology as complex as AI into easy-to-digest marketing.

The only question in my mind is: which weird new benchmarks will catch on in 2025?




