April 21, 2025
What is reinforcement learning? An AI researcher explains an important method for teaching machines – and how it relates to training your dog

Understanding intelligence and creating intelligent machines are great scientific challenges of our time. The ability to learn from experience is a cornerstone of intelligence for machines and living things alike.

In a remarkably prescient report from 1948, Alan Turing – the father of modern computer science – proposed building machines that display intelligent behavior. He also discussed the “education” of such machines “using rewards and punishments”.

Turing's ideas ultimately led to the development of reinforcement learning, a branch of artificial intelligence. Reinforcement learning designs intelligent agents by training them to maximize rewards as they interact with their environment.

As a machine learning researcher, I find it fitting that reinforcement learning pioneers Andrew Barto and Richard Sutton were awarded the 2024 ACM Turing Award.

What is reinforcement learning?

Animal trainers know that animal behavior can be influenced by rewarding desirable behaviors. A dog trainer gives the dog a treat when it performs a trick correctly. This reinforces the behavior, and the dog is more likely to perform the trick correctly the next time. Reinforcement learning borrowed this insight from animal psychology.

Reinforcement learning, however, is about training computational agents, not animals. The agent can be a software agent, like a chess-playing program. The agent can also be an embodied entity, like a robot learning to do household chores. Similarly, the environment of an agent can be virtual, like the chessboard or the designed world in a video game. But it can also be a house where a robot is working.

Just like animals, an agent can perceive aspects of its environment and take actions. A chess-playing agent can access the chessboard configuration and make moves. A robot can sense its surroundings with cameras and microphones. It can use its motors to move about in the physical world.

Agents also have goals that their human designers program into them. A chess-playing agent's goal is to win the game. A robot's goal might be to help its human owner with household chores.

The reinforcement learning problem in AI is how to design agents that achieve their goals by perceiving and acting in their environments. Reinforcement learning makes a bold claim: all goals can be achieved by designing a numerical signal, called the reward, and having the agent maximize the total sum of rewards it receives.

Researchers do not know whether this claim is actually true, because of the wide variety of possible goals. Therefore, it is often referred to as the reward hypothesis.
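To make the idea of an agent maximizing its total reward concrete, here is a minimal sketch in Python. It is only an illustration, not a method described in this article: the toy environment (two actions that pay a reward of 1 with made-up probabilities), the function name `run_agent`, and the simple "epsilon-greedy" rule for choosing actions are all assumptions chosen for brevity.

```python
import random

def run_agent(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Toy agent-environment loop (illustrative assumption, not from the article).

    reward_probs[a] is the hypothetical chance that action a pays a reward of 1.
    Returns the total reward collected and the agent's estimate for each action.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # running average reward seen for each action
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        # Act: usually pick the action that currently looks best, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment responds with a numerical reward.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Learn: nudge the estimate for this action toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return total_reward, estimates

total, estimates = run_agent([0.2, 0.8])
print(total, estimates)
```

In this sketch the agent is never told which action is better. It discovers that from the rewards alone, much as the dog learns the trick from the treats.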

Sometimes it is easy to pick a reward signal that corresponds to a goal. For a chess-playing agent, the reward can be +1 for a win, 0 for a draw and -1 for a loss. It is less clear how to design a reward signal for a helpful household robot assistant. Nevertheless, the list of applications in which reinforcement learning researchers have been able to design good reward signals is growing.
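The chess reward signal can be written down in a few lines. The sketch below is illustrative only: the function names and the outcome labels "win", "draw" and "loss" are assumptions, and a real chess program would determine the outcome from the final board position.

```python
def chess_reward(outcome):
    """Map a finished game's outcome to the reward described in the text."""
    rewards = {"win": 1, "draw": 0, "loss": -1}
    return rewards[outcome]

def total_reward(outcomes):
    """Sum the rewards over a series of games; this is what the agent tries to maximize."""
    return sum(chess_reward(outcome) for outcome in outcomes)

games = ["win", "loss", "draw", "win"]  # hypothetical results of four games
print(total_reward(games))  # 1
```

No comparably short function exists for "be helpful around the house", which is why reward design for a household robot is the harder problem.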

A big success of reinforcement learning came in the board game Go. Researchers thought that Go was much harder for machines to master than chess. The company DeepMind, now Google DeepMind, used reinforcement learning to create AlphaGo. AlphaGo defeated top Go player Lee Sedol in a five-game match in 2016.

A more recent example is the use of reinforcement learning to make chatbots such as ChatGPT more helpful. Reinforcement learning is also being used to improve the reasoning capabilities of chatbots.

Origins of reinforcement learning

None of these successes could have been foreseen in the 1980s. That is when Barto and his then-Ph.D. student Sutton proposed reinforcement learning as a general problem-solving framework. They drew inspiration not only from animal psychology but also from control theory, the use of feedback to influence a system's behavior, and from optimization, a branch of mathematics that studies how to select the best choice from a range of available options. They gave the research community mathematical foundations that have stood the test of time. They also created algorithms that have now become standard tools in the field.

It is a rare advantage for a field when pioneers take the time to write a textbook. Shining examples such as “The Nature of the Chemical Bond” by Linus Pauling and “The Art of Computer Programming” by Donald E. Knuth are memorable because they are so few. Sutton and Barto's “Reinforcement Learning: An Introduction” was first published in 1998. A second edition was published in 2018. Their book has influenced a generation of researchers and has been cited more than 75,000 times.

Reinforcement learning has also had an unexpected influence on neuroscience. The neurotransmitter dopamine plays a key role in reward-driven behaviors in humans and animals. Researchers have used specific algorithms developed in reinforcement learning to explain experimental findings in the dopamine system of humans and animals.

Barto and Sutton's foundational work, vision and advocacy have helped reinforcement learning grow. Their work has inspired a large body of research, influenced real-world applications and attracted enormous investments from technology companies. I am sure that reinforcement learning researchers will continue to see further by standing on their shoulders.

This article is republished from The Conversation, a nonprofit, independent news organization that brings you facts and trustworthy analysis to help you make sense of our complex world. It was written by: Ambuj Tewari, University of Michigan

Ambuj Tewari receives funding from NSF and NIH.