AI Sucks at Reading Clocks


These days, artificial intelligence can generate photorealistic images, write novels, do your homework, and even predict protein structures. New research, however, reveals that it often fails at a very basic task: telling time.

Researchers at the University of Edinburgh tested the ability of seven well-known multimodal large language models, the kind of AI that can interpret and generate several types of media, to answer time-related questions based on images of different clocks and calendars. Their study, forthcoming in April and currently hosted on the preprint server arXiv, shows that the models have trouble with these basic tasks.

“The ability to interpret and reason about time from visual inputs is critical for many real-world applications, ranging from event scheduling to autonomous systems,” the researchers wrote in the study. “Despite advances in multimodal large language models (MLLMs), most work has focused on object detection, image captioning, or scene understanding, leaving temporal inference underexplored.”

The team tested OpenAI's GPT-4o and GPT-o1; Google DeepMind's Gemini 2.0; Anthropic's Claude 3.5 Sonnet; Meta's Llama 3.2-11B-Vision-Instruct; Alibaba's Qwen2-VL7B-Instruct; and ModelBest's MiniCPM-V-2.6. They fed the models images of different analog clocks (timepieces with Roman numerals, different dial colors, and even some missing the seconds hand) as well as 10 years of calendar images.

For the clock images, the researchers asked the models: What time is shown on the clock in the given image? For the calendar images, they asked simple questions such as "What day of the week is New Year's Day?" and harder ones such as "What is the 153rd day of the year?"
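For a sense of how little arithmetic the calendar questions involve once the year is known, here is a minimal Python sketch using the standard `datetime` module. It is an illustration only, not the researchers' evaluation code; the function names and the example years (2024 and 2025) are assumptions chosen for the demonstration, since the article does not say which years the calendar images covered.

```python
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def new_years_weekday(year: int) -> str:
    """Return the weekday name of January 1 in the given year."""
    return WEEKDAYS[date(year, 1, 1).weekday()]


def nth_day_of_year(year: int, n: int) -> date:
    """Return the calendar date of the n-th day of the year (1-indexed)."""
    return date(year, 1, 1) + timedelta(days=n - 1)


# Example years are hypothetical, picked only to show a leap-year difference.
print(new_years_weekday(2025))      # Wednesday
print(nth_day_of_year(2025, 153))   # 2025-06-02
print(nth_day_of_year(2024, 153))   # 2024-06-01 (leap year shifts the answer)
```

The leap-year case shows why the "153rd day" question is harder than it looks: the answer depends on the year, so a model has to read the calendar image correctly before any counting starts.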

“Analogue clock reading and calendar comprehension involve intricate cognitive steps: they demand fine-grained visual recognition (e.g., clock-hand position, day-cell layout) and non-trivial numerical reasoning (e.g., calculating day offsets),” the researchers explained.
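The numerical half of clock reading can be sketched in a few lines, assuming the visual half has already been solved and the hand angles are known. The Python function below is a hypothetical illustration, not anything from the study: it takes the hour-hand and minute-hand angles in degrees, measured clockwise from the 12 o'clock position, and converts them to a time.

```python
from typing import Tuple


def time_from_hand_angles(hour_angle: float, minute_angle: float) -> Tuple[int, int]:
    """Convert hand angles (degrees clockwise from 12) to (hour, minute).

    Assumes an ordinary 12-hour dial; the angles themselves would have to
    come from some separate visual step that is not modeled here.
    """
    # The minute hand sweeps 360 degrees in 60 minutes: 6 degrees per minute.
    minute = round(minute_angle / 6) % 60
    # The hour hand moves 30 degrees per hour plus 0.5 degrees per minute,
    # so remove the minute contribution before rounding to a whole hour.
    hour = round((hour_angle - minute * 0.5) / 30) % 12
    return (12 if hour == 0 else hour, minute)


print(time_from_hand_angles(90, 0))     # (3, 0)
print(time_from_hand_angles(305, 60))   # (10, 10)
print(time_from_hand_angles(15, 180))   # (12, 30)
```

The arithmetic is trivial once the angles are available, which is consistent with the researchers' point that the difficulty lies in combining fine-grained visual recognition with the numerical step.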

Overall, the AI systems did not perform well. They read the time on analog clocks correctly less than 25% of the time. They struggled with clocks bearing Roman numerals and stylized hands as much as they did with clocks missing the seconds hand.

Google's Gemini-2.0 scored highest on the team's clock task, while GPT-o1 was accurate on the calendar task 80% of the time, a far better result than its competitors. But even then, the most successful MLLM on the calendar task still made mistakes about 20% of the time.

“Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people,” Rohit Saxena, a co-author of the study and a PhD student at the University of Edinburgh's School of Informatics, said in a university statement. “These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies.”

So while AI might be able to finish your homework, don't count on it to stick to any deadlines.


