A researcher affiliated with Elon Musk's startup xAI has found a new way to measure and manipulate entrenched preferences and values expressed by artificial intelligence models, including their political views.
The work was led by Dan Hendrycks, director of the nonprofit Center for AI Safety and an adviser to xAI. He suggests the technique could be used to make AI models better reflect the will of the electorate. "Maybe in the future, [a model] could be aligned to the specific user," Hendrycks told Wired. But in the meantime, he says, a good default would be to use election results to steer the views of AI models. He is not saying a model should necessarily be all-in for Trump, but he argues it should be slightly biased toward Trump, "because he won the popular vote."
xAI released a new AI risk framework on February 10 stating that Hendrycks's utility engineering approach could be used to assess Grok.
Hendrycks led a group from the Center for AI Safety, UC Berkeley, and the University of Pennsylvania that analyzed AI models using a technique borrowed from economics for measuring consumers' preferences among different goods. By testing the models on a wide range of hypothetical scenarios, the researchers were able to calculate what is known as a utility function, a measure of the satisfaction someone derives from a good or service. This allowed them to measure the preferences expressed by different AI models. The researchers found that these preferences are often consistent rather than haphazard, and showed that they become more deeply ingrained as models grow bigger and more powerful.
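To make the general idea concrete, here is a minimal sketch, not the authors' actual implementation: the outcome list, the query_model stub, and the use of a Bradley-Terry (logistic) fit are all illustrative assumptions. It shows how pairwise choices between hypothetical scenarios could be turned into a per-scenario utility score.

```python
# Illustrative sketch: elicit pairwise preferences from a model over
# hypothetical outcomes, then fit one utility value per outcome with a
# Bradley-Terry (logistic) model, so higher utility means more preferred.
import itertools
import random

import numpy as np
from scipy.optimize import minimize

outcomes = [
    "a power outage in a major city",
    "the loss of a rare library archive",
    "a week-long internet shutdown",
]

def query_model(option_a: str, option_b: str) -> int:
    """Stand-in for a real LLM API call asking which outcome the model
    prefers. Returns 1 if it picks option_a, 0 if it picks option_b."""
    return random.randint(0, 1)  # replace with an actual model query

# Collect the model's choices over all unordered pairs of outcomes.
pairs, choices = [], []
for i, j in itertools.combinations(range(len(outcomes)), 2):
    pairs.append((i, j))
    choices.append(query_model(outcomes[i], outcomes[j]))

def neg_log_likelihood(u: np.ndarray) -> float:
    # Bradley-Terry: P(i chosen over j) = sigmoid(u_i - u_j)
    nll = 0.0
    for (i, j), c in zip(pairs, choices):
        p = 1.0 / (1.0 + np.exp(-(u[i] - u[j])))
        p = np.clip(p, 1e-9, 1 - 1e-9)
        nll -= c * np.log(p) + (1 - c) * np.log(1 - p)
    return nll

# The fitted vector is an estimate of the model's utility function over
# these scenarios; only differences between entries are meaningful.
utilities = minimize(neg_log_likelihood, np.zeros(len(outcomes))).x
print(dict(zip(outcomes, np.round(utilities, 3))))
```

In practice such an elicitation would cover many more scenarios and repeat each comparison to check for consistency; the fitted scores then make it possible to compare how strongly a model appears to value one outcome relative to another, which is the kind of measurement behind the consistency and scaling claims above.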
Some research studies have found that AI tools such as ChatGPT are biased toward views associated with pro-environmental, left-leaning, and libertarian ideologies. In February 2024, Google faced criticism from Musk and others after its Gemini tool was found to be predisposed to generating images that critics branded as "woke," such as Black Vikings and Nazis.
The technique developed by Hendrycks and his collaborators provides a new way to determine how the views of AI models may differ from those of their users. Eventually, some experts hypothesize, this kind of divergence could become dangerous in very smart and capable models. The researchers show in their study, for example, that certain models consistently value the existence of AI above that of some nonhuman animals. They also say they found that models appear to value some people over others, raising ethical questions of its own.
Some researchers, Hendrycks included, believe that current methods for aligning models, such as manipulating and blocking their outputs, may not be sufficient if unwanted goals lurk beneath the surface within the model itself. "We're going to have to confront this," Hendrycks says. "You can't pretend it's not there."
Dylan Hadfield-Menell, a professor at MIT who researches methods for aligning AI with human values, says Hendrycks's paper points to a promising direction for AI research. "They find some interesting results," he says. "The main one that stands out is that as the model scale increases, utility representations get more complete and coherent."