Researchers have made significant advances in using deep reinforcement learning (DRL), a type of AI technology, to safeguard computer networks.

In rigorous simulation tests, deep reinforcement learning prevented adversaries from achieving their goals in up to 95% of sophisticated cyberattacks. These results point to a role for autonomous AI in proactive cyber protection.

Researchers from the Department of Energy’s Pacific Northwest National Laboratory (PNNL) published a study summarizing their findings.
The first step was to create a simulation platform to evaluate multistage attack scenarios with different adversaries. Creating such a dynamic attack-defence simulation environment was an accomplishment in itself. Using this environment, researchers can test different AI-based defence strategies under controlled, closely monitored conditions to evaluate their effectiveness.

These tools are essential for assessing how well deep reinforcement learning algorithms perform. While other forms of artificial intelligence are commonly used to detect intrusions or filter spam, deep reinforcement learning is developing into a potent decision-support tool for cyber security professionals: a defence agent with the capacity to learn, adapt to rapidly changing situations, and make judgments on its own. It strengthens defenders’ ability to coordinate sequential decision-making strategies in their everyday encounters with attackers.

Deep reinforcement learning is a powerful tool that enhances cyber security by enabling early detection of changes in the cyber landscape and the ability to prevent a cyberattack before it occurs. This technique helps to create more intelligent defences that can adapt to new threats and minimize the risk of successful attacks.


Samrat Chatterjee, a data scientist who presented the team’s work, stated that “an effective AI agent for cyber security needs to detect, perceive, act, and adapt, based on the information it can receive and on the effects of actions that it enacts.” Deep reinforcement learning has a lot of potential in this area because of the multitude of system states and action options it can handle.

Deep reinforcement learning (DRL), which combines reinforcement learning and deep learning, is particularly effective in making a sequence of decisions in a complex environment. It works by providing positive rewards in the form of numerical values for good decisions that lead to desirable outcomes. At the same time, negative costs are used to discourage bad decisions that result in unfavourable outcomes.
It’s comparable to how people pick up new skills. For example, when a child completes their chores, they may be rewarded with a desired playdate; when they don’t, they may face a negative consequence, such as having a digital device taken away.

The same idea underlies reinforcement learning, according to Chatterjee. “The agent has a range of options for acts. Each action generates feedback, whether favourable or unfavourable, which is stored in its memory. Exploring new possibilities and making use of the past are interrelated. The objective is to develop an agent that can make wise decisions.”
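The reward-and-feedback loop Chatterjee describes can be sketched with tabular Q-learning, the simplest form of reinforcement learning. This is illustrative only: the PNNL work uses *deep* RL, where a neural network replaces the lookup table, and the states, actions, and reward values below are invented for the example.

```python
import random

random.seed(0)

# Tabular Q-learning sketch of the reward/cost loop described above.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate
q_table = {}                            # the agent's "memory" of past feedback

def q(state, action):
    return q_table.get((state, action), 0.0)

def choose_action(state, actions):
    """Explore a random option with probability EPSILON; otherwise
    exploit the best action learned so far."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q(state, a))

def update(state, action, reward, next_state, actions):
    """Nudge the stored value toward the observed reward plus the
    discounted value of the best follow-on action."""
    target = reward + GAMMA * max(q(next_state, a) for a in actions)
    q_table[(state, action)] = q(state, action) + ALPHA * (target - q(state, action))

# Toy loop: blocking an intrusion earns a positive reward (+1),
# ignoring it incurs a negative cost (-1).
ACTIONS = ["block", "ignore"]
for _ in range(500):
    action = choose_action("intrusion_detected", ACTIONS)
    reward = 1.0 if action == "block" else -1.0
    update("intrusion_detected", action, reward, "idle", ACTIONS)
```

After a few hundred iterations the stored value for “block” exceeds that for “ignore,” so exploitation favours the rewarded action, which is exactly the explore-then-exploit balance described in the quote.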


OpenAI Gym and MITRE ATT&CK

The group built a customized, controlled simulation environment using an open-source software framework called OpenAI Gym to assess the benefits and drawbacks of four deep reinforcement learning techniques.

They also incorporated seven tactics and fifteen techniques from the MITRE ATT&CK framework, employed by three distinct adversaries. Defenders had 23 mitigation techniques at their disposal to slow down or stop an attack.

During the various phases of the attack, the adversary employed several strategies, including reconnaissance, execution, persistence, defence evasion, command and control, collection, and exfiltration, which is when data is transferred out of the system. The attack was considered successful if the enemy could complete the exfiltration step.
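A simulation like this can be sketched as a toy environment with the `reset()`/`step()` interface popularized by OpenAI Gym. The stage list follows the ATT&CK tactics named above, but the transition logic and the 60% mitigation success rate are invented for illustration; this is not PNNL’s simulator.

```python
import random

# Toy multistage-attack environment with a Gym-style reset()/step() loop.
STAGES = ["reconnaissance", "execution", "persistence", "defence evasion",
          "command and control", "collection", "exfiltration"]

class ToyAttackEnv:
    def reset(self):
        self.stage = 0                       # adversary starts at reconnaissance
        return self.stage

    def step(self, mitigate):
        """Advance one turn; `mitigate` applies a defensive mitigation."""
        if mitigate and random.random() < 0.6:   # assumed 60% chance to block
            return self.stage, 1.0, True         # attack halted: positive reward
        self.stage += 1                          # adversary advances a stage
        if self.stage == len(STAGES) - 1:
            return self.stage, -10.0, True       # exfiltration reached: attack succeeds
        return self.stage, -0.1, False           # small cost while the attack continues

# One episode: always mitigate and see where the attack ends.
env = ToyAttackEnv()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(mitigate=True)
```

The terminal conditions mirror the article’s success criterion: the episode ends either when a mitigation halts the adversary or when the exfiltration stage is reached.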

According to Chatterjee, “Our algorithms function in a competitive environment—a struggle with an adversary determined to penetrate the system. It’s a multistage attack, where the adversary can travel down a number of different attack paths, changing them as they go from reconnaissance to exploitation. Our task is to demonstrate how deep reinforcement learning-based defences can successfully prevent such an attack.”

DQN (Deep Q-Network)

Researchers developed four deep reinforcement learning algorithms to train defensive agents: DQN and three variants of the actor-critic approach. After the agents were trained on data from simulated cyberattacks, they were tested against attacks they hadn’t encountered during training. DQN performed best of the four.

  • Less sophisticated attacks: DQN stopped 79 percent of attacks halfway through the attack phases and 93 percent by the final stage (based on varied levels of opponent skill and perseverance).
  • Attacks with moderate sophistication: DQN prevented 82 percent of them in the middle and 95 percent before the end.
  • Most complex attacks: DQN significantly outperformed the other three algorithms, stopping 57 percent of attacks at the halfway point and 84 percent by the final stage.
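The stop rates above can be computed by running many attack episodes and recording the stage at which each one is halted, then reporting the share stopped by the midpoint and by the final stage. The sketch below shows that bookkeeping; the placeholder defender with a flat 60% per-stage stop chance stands in for the trained DQN, and the numbers it produces are not the study’s results.

```python
import random

random.seed(0)

# Sketch of the evaluation metric: fraction of attacks halted by the
# midpoint of the stages vs. by the final (exfiltration) stage.
N_STAGES = 7
HALFWAY = N_STAGES // 2

def simulated_episode():
    for stage in range(N_STAGES):
        if random.random() < 0.6:        # hypothetical per-stage stop chance
            return stage                 # attack halted at this stage
    return N_STAGES                      # adversary completed exfiltration

outcomes = [simulated_episode() for _ in range(10_000)]
stopped_by_half = sum(o <= HALFWAY for o in outcomes) / len(outcomes)
stopped_by_end = sum(o < N_STAGES for o in outcomes) / len(outcomes)
print(f"halted by midpoint: {stopped_by_half:.0%}, by final stage: {stopped_by_end:.0%}")
```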

According to Chatterjee, the project aims to develop an autonomous defence agent that can anticipate an adversary’s next move, prepare for it, and react most effectively to protect the system. The ultimate goal is to create an intelligent defence system that can adapt to new threats and minimize the risk of successful attacks.

Despite the advancements, no one is prepared to fully depend on an AI system for cyber defence. Instead, a DRL-based cybersecurity system would need to collaborate with people, according to coauthor and former PNNL employee Arnab Bhattacharya.

“AI can be good at fighting against a specific tactic but isn’t as strong at comprehending all the options an adversary may take,” Bhattacharya said. “Human cyber analysts cannot be replaced by AI any time soon. Human direction and feedback are crucial.”

In addition to Chatterjee and Bhattacharya, authors of the AAAI workshop paper include Mahantesh Halappanavar of PNNL and former PNNL scientist Ashutosh Dutta. The DOE Office of Science funded the project.


Rhyno delivers a range of activities that combine to fully protect your infrastructure and data from cybercriminals, anywhere and everywhere, 24/7/365.


About Rhyno Cybersecurity Services

Rhyno Cybersecurity is a Canadian-based company focusing on 24/7 Managed Detection and Response, Penetration Testing, Enterprise Cloud, and Cybersecurity Solutions for small and midsize businesses.

Our products and services are robust, innovative, and cost-effective. Underpinned by our 24x7x365 Security Operations Centre (SOC), our experts ensure you have access to cybersecurity expertise when you need it the most.
