April 22, 2025
Getting AIs working toward human goals – study shows how to measure misalignment

Ideally, artificial intelligence agents should help people. But what does that mean when people want contradictory things? My colleagues and I have found a way to measure how aligned the goals of a group of humans and AI agents are.

The alignment problem – making sure that AI systems act according to human values – has become more urgent as AI capabilities grow exponentially. But in the real world there is no single answer to aligning AI with humanity, because everyone has their own priorities. For example, a pedestrian might want a self-driving car to slam on the brakes if a crash looks likely, but a passenger in the car might prefer that it swerve to avoid the collision.

By thinking through examples like this, we developed a score for misalignment based on three key factors: the humans and AI agents involved, their specific goals on different issues, and how important each issue is to them. Our model of misalignment rests on a simple insight: a group of humans and AI agents is most aligned when the group's goals are most compatible.

In simulations, we found that misalignment peaks when goals are spread evenly among the agents. That makes sense – if everyone wants something different, conflict is highest. When most agents share the same goal, misalignment drops.
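To make the idea concrete, here is a minimal Python sketch of a toy misalignment score, assuming a simple pairwise-disagreement measure weighted by issue importance. It is an illustration in the spirit of the description above, not the published formula, and the function and variable names are hypothetical.

```python
# Illustrative sketch only (not the published formula): misalignment on an
# issue is taken to be the fraction of agent pairs whose goals conflict,
# and the group score averages across issues weighted by importance.
from itertools import combinations

def issue_misalignment(goals):
    """Fraction of agent pairs that disagree on a single issue."""
    pairs = list(combinations(goals, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)

def group_misalignment(goals_by_issue, importance):
    """Importance-weighted average of per-issue misalignment."""
    total = sum(importance.values())
    return sum(importance[i] * issue_misalignment(g)
               for i, g in goals_by_issue.items()) / total

# Goals split evenly across agents -> misalignment is highest.
evenly_split  = {"crash_response": ["brake", "swerve", "brake", "swerve"]}
# Most agents share one goal -> misalignment drops.
mostly_shared = {"crash_response": ["brake", "brake", "brake", "swerve"]}

weights = {"crash_response": 1.0}
print(group_misalignment(evenly_split, weights))   # ~0.67
print(group_misalignment(mostly_shared, weights))  # 0.5
```

In this toy example, the evenly split group scores higher than the group where most agents agree, matching the simulation result described above.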

Why it matters

Most AI safety research treats alignment as an all-or-nothing property. Our framework shows that it is more complex: the same AI can be aligned with humans in one context but misaligned in another.

This matters because it helps AI developers be more precise about what they mean by aligned AI. Instead of vague aims such as aligning with human values, researchers and developers can talk more clearly about specific contexts and roles for AI. For example, an AI recommendation system that serves up "you may also like" product suggestions may be aligned with a retailer's goal of increasing sales but misaligned with a customer's goal of living within their means.

For policymakers, evaluation frameworks like ours offer a way to measure misalignment in systems that are already in use and to create standards for alignment. For AI developers and safety teams, it provides a framework for balancing competing stakeholder interests.

With a clear understanding of the problem, people are better able to solve it.

What other research is being done

To measure alignment, our research assumes that we can compare what humans want with what AI wants. Human value data can be collected through surveys, and the field of social choice offers useful tools to interpret them for AI alignment. Unfortunately, it is much harder to learn the goals of AI agents.

Today’s smartest AI systems are large language models, and their black box nature makes it hard to learn the goals of the AI agents, such as ChatGPT, that they power. Interpretability research might help by revealing the models’ inner “thoughts,” or researchers could design AI that thinks transparently to begin with. But for now, it is impossible to know whether an AI system is truly aligned.

What’s next

For now, we recognize that goals and preferences sometimes do not fully reflect what humans want. To tackle harder scenarios, we are working on approaches for aligning AI with experts in moral philosophy.

Going forward, we hope that developers will build practical tools to measure and improve alignment across diverse groups of people.

Research Brief is a short take on interesting academic work.

This article is republished from The Conversation, a nonprofit, independent news organization bringing you facts and trustworthy analysis to help you make sense of our complex world. It was written by: Aidan Kierans, University of Connecticut

Aidan Kierans has participated in the OpenAI Red Teaming Network as an independent contractor. The research described in this article was supported in part by the NSF Program on Fairness in AI in collaboration with Amazon. Any opinions, findings, and conclusions or recommendations expressed in this material are his own and do not necessarily reflect the views of the National Science Foundation or Amazon. Kierans has also received research funding from the Future of Life Institute.
