A team of computer scientists, including two researchers from Poland, has developed a method that allows reinforcement learning (RL) models to use neural networks more than a thousand layers deep, a scale previously considered out of reach.
The work was recognized at the leading artificial intelligence conference NeurIPS, where it received one of the event’s top paper awards.
The research involved doctoral candidate Michał Bortkiewicz and Professor Tomasz Trzciński of the Warsaw University of Technology and was led by Professor Benjamin Eysenbach of Princeton University. The study was selected as one of five award-winning papers at the Neural Information Processing Systems (NeurIPS) conference, widely regarded as the most prestigious scientific meeting in artificial intelligence.
This year’s conference received more than 20,000 paper submissions, of which about 5,000 were accepted.
Reinforcement learning is one of the main approaches in machine learning, alongside supervised, unsupervised, and self-supervised learning. RL systems learn by interacting with an environment and receiving rewards or penalties. The approach has been used in systems such as AlphaGo, which defeated a human champion in the game of Go, as well as in complex video games and applications including drug discovery, protein design, and decision support in economics.
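To illustrate the reward-driven loop described above, the sketch below shows a toy reinforcement learning agent learning to walk down a short corridor. The environment, parameters, and variable names are invented for illustration only; this is not code from the award-winning paper.

```python
# Minimal illustrative RL loop: an agent acts, receives a reward, and updates
# its value estimates (tabular Q-learning on a toy corridor environment).
import random

N_STATES = 5          # positions 0..4; the goal is the last position
ACTIONS = [-1, +1]    # step left or step right
q_values = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Move along the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate
for episode in range(500):
    state, done = 0, False
    while not done:
        # Explore occasionally; otherwise pick the action with the best estimate.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_values[(state, a)])
        next_state, reward, done = step(state, action)
        # Nudge the estimate toward the reward plus the discounted future value.
        best_next = max(q_values[(next_state, a)] for a in ACTIONS)
        q_values[(state, action)] += alpha * (reward + gamma * best_next - q_values[(state, action)])
        state = next_state

# Print the learned best action for each position.
print({s: max(ACTIONS, key=lambda a: q_values[(s, a)]) for s in range(N_STATES)})
```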
Until now, reinforcement learning models relied on very shallow neural networks, typically containing only a few layers. Attempts to significantly increase their depth consistently failed, as training became unstable and ineffective. The international research team has now shown that RL models can be scaled to as many as 1,024 layers, opening the way to much more complex internal representations and learning capabilities.
A LEAP IN AI DEVELOPMENT. OVER THE MAZE WALL
In experimental tasks, the researchers demonstrated that deeper RL models trained with their new approach achieved dramatically better results than earlier systems. Using a method known as contrastive reinforcement learning (CRL), the models showed more than a 50-fold increase in successful task completion in simulated navigation environments.
In one example, a model equipped with 256 layers learned a strategy that allowed it to jump over maze walls instead of navigating around them, reaching its goal more efficiently. According to the authors, this behavior illustrates the ability of deeper models to discover solutions that were inaccessible to shallower architectures.
The researchers note that even relatively simple RL models have already outperformed humans in complex games and contributed to scientific discovery. Scaling these systems to much greater depth raises the prospect of more powerful applications. The CRL algorithm has been made publicly available.
DOES AN ONION HAVE LAYERS? DOES AN RL MODEL HAVE LAYERS?
Professor Tomasz Trzciński told the Polish Press Agency (PAP) that layers are a fundamental part of neural network architecture, enabling successive stages of information processing. Increasing the number of layers allows a model to represent more complex relationships before producing an action or decision.
“The more layers we have, the more complex the operations that can occur between input and final result,” Trzciński says. “By increasing the number of layers, and therefore the depth of the network, the model is able to learn more complex concepts and build a richer representation of the world before taking action.
“In the case of the maze task, the model has more degrees of freedom than just stepping left or right – it can jump, bend, and stretch. These are additional capabilities that allow it to find new, creative solutions,” Trzciński adds.
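In code, "adding layers" can be as simple as turning the network's depth into a parameter. The sketch below is a generic PyTorch illustration, not the architecture from the paper: the residual connections and layer normalization shown here are standard techniques for keeping very deep networks trainable, and the exact design used by the researchers may differ.

```python
# Generic sketch: a network whose depth is a single configurable parameter.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.norm = nn.LayerNorm(width)
        self.linear = nn.Linear(width, width)

    def forward(self, x):
        # Each block refines the previous representation instead of replacing it,
        # which helps gradients flow through very deep stacks.
        return x + torch.relu(self.linear(self.norm(x)))

def make_network(obs_dim: int, action_dim: int, depth: int, width: int = 256) -> nn.Module:
    """Build a network with a freely configurable number of blocks."""
    layers = [nn.Linear(obs_dim, width)]
    layers += [ResidualBlock(width) for _ in range(depth)]
    layers.append(nn.Linear(width, action_dim))
    return nn.Sequential(*layers)

# A shallow network of the kind traditionally used in RL, and a far deeper one.
shallow = make_network(obs_dim=32, action_dim=8, depth=4)
deep = make_network(obs_dim=32, action_dim=8, depth=1024)
```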
Scaling neural networks has been successfully applied in other areas of machine learning, particularly in large language models such as GPT. In reinforcement learning, however, models historically remained limited to shallow networks, typically only two to five layers deep.
“When attempts were made to add additional layers, the algorithm would get lost and the model would stop training. The consensus was that RL models were simply like that: they had to have shallow networks and nothing could be done about it,” Trzciński says.
As part of his doctoral research, Bortkiewicz showed that this limitation could be overcome by incorporating techniques from self-supervised learning (SSL), which are commonly used to pretrain large language models. SSL relies on auxiliary tasks that encourage models to learn internal structure in data before being optimized for a final objective.
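One common self-supervised ingredient is a contrastive objective. The sketch below shows a generic InfoNCE-style contrastive loss that pulls the embedding of a state toward the embedding of the future state actually reached from it, and pushes it away from futures reached from other states. The encoders and batch contents are placeholders, and the objective actually used in the paper may be formulated differently.

```python
# Generic InfoNCE-style contrastive loss over state and future-state embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(state_emb: torch.Tensor, future_emb: torch.Tensor) -> torch.Tensor:
    """Row i of the similarity matrix should score highest at column i,
    i.e. each state should match the future it actually led to."""
    logits = state_emb @ future_emb.t()                 # pairwise similarities
    targets = torch.arange(state_emb.shape[0])          # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random vectors standing in for encoder outputs.
batch, dim = 64, 128
states = torch.randn(batch, dim)
futures = torch.randn(batch, dim)
print(contrastive_loss(states, futures).item())
```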
The researchers conclude that reinforcement learning and self-supervised learning can be combined within a single framework, rather than treated as competing approaches. “The relatively small change we discovered results in such enormous, groundbreaking achievements,” Trzciński points out.
Despite the increase in network depth, the researchers report that the models do not consume more energy per output, as the improved representations allow them to reach solutions more efficiently.
“Our research shows that it is good to question established paths and think ‘outside the box’. Even in Poland, where funding for science and basic research is insufficient and not comparable to other developed countries, it is possible to ask pertinent questions and challenge the status quo to change the world and discover things no one has thought of before,” Trzciński concludes.
Looking ahead, he said the approach could support the development of new medical treatments and more adaptive artificial intelligence systems.
“I would also like to see how these methods enable the development of AI models that can self-improve, for example, to creatively generate new ideas and lead to the next stages of scientific development,” says Professor Tomasz Trzciński. (PAP)
Ludwika Tomala (PAP)