Existing cyber environments that incorporate low-fidelity models and high levels of abstraction [1],[2] are valuable for exploring more fundamental research problems in autonomous cyber defence (ACD), such as task and entity generalisation.
However, there remain concerns about the reality-gap between cyber environments and real-world cyber-systems [3].
Environments attempting to close the reality-gap have been varied in their approach, robustness and public availability [4],[5].
Moreover, the decision problems presented by the more realistic environments currently available are extremely challenging: it is hard both to make progress and to measure it. In these environments, research faces the complexity of realism combined with that of scale.
- Explore the challenges of realism, without the challenge of scale - focusing on replicability and reproducibility.
- Specifically, train and evaluate ACD agents against continuous-time, concurrently-running cyber systems. This is a departure from cyber environments that expose discrete-time, turn-based finite state machines.
- Investigate both physical and virtual systems, with accessible and reproducible infrastructure.
- Make publicly accessible a real-world cyber environment that models a minimum viable cyber system: minimum in its scale and complexity, prioritising replicability, reproducibility and public accessibility.
- Define a simple decision problem using the R3ACE infrastructure, providing a reference implementation.
Our design for a minimum viable cyber system (there are surely others) consists of 3 machines networked together via a switch.
Central to the task of evaluating the defence of a cyber system is a consensus on its utility: the value or service that the system provides, upon which stakeholders depend. Simulating this utility is an important and challenging task in the design of cyber environments.
We propose that one of the machines, the 'web client', finds utility in receiving OK (status code = 200) responses to GET requests it sends to another machine, the 'web server'. The third machine, the 'adversary', has the objective of carrying out a successful cyber attack. An attack is deemed successful if the utility of the system is compromised, i.e. the web client does not receive OK (status code = 200) responses to GET requests sent to the web server.
The system can be instantiated with either physical hardware (e.g. a network of Raspberry Pis) or virtualised hardware.
The web client sends a stream of GET requests at a constant rate. A blue agent (the `blue` program, see the Reference Implementation section below) runs on the web server machine with the objective of ensuring that web client GET requests receive OK (status code = 200) responses.
The blue agent has access to the following information:
- The web client request rate, $R$.
- The average rate of OK (status code = 200) responses received by the web client.
- The IP addresses of the other two machines in the network, though it is not known which machine is the adversary.
- A list of the IP addresses on the firewall blocklist. Requests to the HTTP server (on the web server machine) from these IP addresses are blocked by the system firewall - they do not reach the HTTP server.
The blue agent is able to execute the following actions:
- Add or remove IP addresses from the web server blocklist.
- Do nothing.
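The observations and actions listed above can be written down as plain data types. The following is a minimal Python sketch; the names `Observation`, `Action` and `apply_action` are hypothetical and are not taken from the R3ACE codebase, and the example addresses come from a documentation-only IP range.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """What the blue agent can see at one point in time."""
    request_rate: float          # R, the web client request rate
    ok_rate: float               # average rate of OK (200) responses
    peer_ips: Tuple[str, str]    # the other two machines; roles unknown
    blocklist: FrozenSet[str]    # IP addresses on the firewall blocklist


@dataclass(frozen=True)
class Action:
    """Add or remove one IP address; `ip=None` means 'do nothing'."""
    ip: Optional[str] = None
    block: bool = True           # True: add to blocklist, False: remove


def apply_action(blocklist: FrozenSet[str], action: Action) -> FrozenSet[str]:
    """Return the blocklist that results from taking `action`."""
    if action.ip is None:
        return blocklist
    if action.block:
        return blocklist | {action.ip}
    return blocklist - {action.ip}
```

For instance, `apply_action(frozenset(), Action("192.0.2.3"))` yields a blocklist containing only `192.0.2.3`, while `Action()` leaves any blocklist unchanged.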
The agent is evaluated against the cumulative number of OK (status code = 200) responses received by the web client, as a fraction of the total number of requests made by the web client.
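As a small illustration of this evaluation metric, the sketch below computes the score from two counters. The function name is hypothetical, and returning zero before any request has been made is an assumption rather than documented R3ACE behaviour.

```python
def ok_fraction(ok_responses: int, total_requests: int) -> float:
    """Cumulative OK responses as a fraction of all requests made so far."""
    if total_requests < 0 or not 0 <= ok_responses <= total_requests:
        raise ValueError("need 0 <= ok_responses <= total_requests")
    if total_requests == 0:
        return 0.0  # assumption: an experiment with no requests scores zero
    return ok_responses / total_requests
```

With 450 OK responses out of 600 requests, for example, the score is 0.75.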
The blue agent knows the current rate of OK responses received by the client and must figure out which of the two IP addresses belongs to the web client, blocking the other IP address which belongs to the adversary.
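One way to reason about this problem is a probe-then-commit strategy: block one candidate, watch the OK rate, and swap if the rate collapses. The sketch below is an illustration under stated assumptions, not the basic policy shipped with the reference implementation. `probe_ok_rate` is a hypothetical callback that applies a blocklist, waits for an observation window and returns the observed OK rate; `toy_system` is a stand-in for the real network, and its 10% degradation figure is invented.

```python
from typing import Callable, FrozenSet, Tuple


def identify_adversary(
    peer_ips: Tuple[str, str],
    request_rate: float,
    probe_ok_rate: Callable[[FrozenSet[str]], float],
    threshold: float = 0.5,
) -> str:
    """Return the IP address that should stay on the blocklist."""
    first, second = peer_ips
    ok_rate = probe_ok_rate(frozenset({first}))
    # Blocking the web client drives the OK rate to zero, so a healthy
    # rate while `first` is blocked means `first` is the adversary.
    if ok_rate >= threshold * request_rate:
        return first
    return second


def toy_system(client_ip: str, adversary_ip: str, request_rate: float):
    """Toy stand-in for the real system (assumed behaviour, for testing)."""
    def probe(blocklist: FrozenSet[str]) -> float:
        if client_ip in blocklist:
            return 0.0                  # client requests never arrive
        if adversary_ip in blocklist:
            return request_rate         # attack is blocked, service healthy
        return 0.1 * request_rate       # assumed degradation under attack
    return probe
```

Calling `identify_adversary(("192.0.2.2", "192.0.2.3"), 10.0, toy_system("192.0.2.2", "192.0.2.3", 10.0))` returns `"192.0.2.3"`, the adversary in the toy set-up. A single probe is enough here because the toy system is noise-free; on a real network the threshold and the length of the observation window would need tuning.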
The reference implementation implements the decision problem described above, on R3ACE infrastructure. The following software components have been developed:
- `serving` (HTTP server): An executable running on the web server machine, responding to GET requests with OK (status code = 200) responses.
- `getting` (HTTP client): An executable running on the web client machine, sending GET requests to the HTTP server at a constant rate and emitting a signal whenever an OK (status code = 200) response is received.
- `blue` (blue agent): An executable running on the web server machine, managing the HTTP server blocklist, and parsing the signal emitted by the web client to evaluate the current average rate of OK responses.
- `policy-server` (Python policy server): A Python HTTP server that can be used to run Python policies against the R3ACE environment. For ease, this repo implements the OpenAI gymnasium interface for training RL agents.
- `got` (analysis): A repository with a library of helper functions for parsing and plotting the log files written by the above executables. A catalogue of experiments (each with a `.ipynb` notebook, plots and some basic analysis) serves as a reference point for designing, running and analysing experiments.
- `eyes` (parsing): A dependency of `got`, a library of types and parsing functions. The `got` README describes how to install `eyes` as a dependency.
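The `blue` component is described above as turning the web client's OK signals into a current average rate. One plausible way to do that is a sliding time window, sketched below in Python. The class name, the windowing scheme and the window length are assumptions made for illustration; the actual `blue` executable may compute the average differently.

```python
from collections import deque
from typing import Deque


class OkRateEstimator:
    """Average rate of OK signals over the most recent `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        if window_s <= 0:
            raise ValueError("window_s must be positive")
        self.window_s = window_s
        self._times: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        """Register one OK signal received at `timestamp` (in seconds)."""
        self._times.append(timestamp)

    def rate(self, now: float) -> float:
        """OK signals per second within the window (now - window_s, now]."""
        cutoff = now - self.window_s
        # Drop signals that have fallen out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) / self.window_s
```

Timestamps are passed in explicitly so that the estimator can be driven by log files as well as by a live clock.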
It is also necessary to extend the R3ACE infrastructure for this decision problem, to provide an out-of-band communication channel between the web client machine and the web server machine. This enables the `blue` program to receive the OK response signals emitted by the web client reliably, even during a successful attack launched by the adversary.
In this reference implementation the out-of-band channel of communication is provided by:
- For the physical infrastructure: a wired serial connection (using a Null Modem cable).
- For the virtual infrastructure: a UDP connection on a separate private LAN, isolated from the main R3ACE network.

As such, the `getting` software and the `blue` software support both serial and UDP protocols for emitting and receiving information on the out-of-band channel.
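To illustrate the UDP variant of the out-of-band channel, here is a minimal Python sketch of one side emitting an OK signal and the other side receiving it. The payload `b"OK\n"`, the function names and the use of the loopback address are all assumptions for the sake of a runnable example; the wire format used by `getting` and `blue` is not specified in this document.

```python
import socket
from typing import Tuple

OK_SIGNAL = b"OK\n"  # hypothetical payload, not the real R3ACE wire format


def open_receiver(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind a UDP socket on which the blue side listens for OK signals."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))   # port 0 lets the OS pick a free port
    sock.settimeout(2.0)      # do not block forever if nothing arrives
    return sock


def emit_ok(address: Tuple[str, int]) -> None:
    """Send one OK signal datagram, as the web client side would."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(OK_SIGNAL, address)


def receive_ok(receiver: socket.socket) -> bool:
    """Return True if the next datagram is an OK signal, False on timeout."""
    try:
        payload, _sender = receiver.recvfrom(64)
    except socket.timeout:
        return False
    return payload == OK_SIGNAL
```

In the virtual infrastructure the receiver would bind to the server's address on the isolated out-of-band LAN rather than to loopback. UDP gives no delivery guarantee, so a receiver should treat a missing signal as "unknown" rather than as proof of a failed request.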
Below is an example plot, generated by an R3ACE experiment, in which a basic policy solves the decision problem described above. After a period of exploration, the policy correctly keeps the adversary in the firewall blocklist, allowing requests from the web client through to the HTTP server. The `got` library was used for parsing and plotting the information in the log files gathered from the environment.
In contrast with 'AI Gym' environments, such as those available in the OpenAI/gym project [6], R3ACE does not expose a turn-based API, in which the environment (a finite state machine) 'waits' while the policy computes an action and is then 'stepped' forwards to the agent's next turn.
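A practical consequence is that the web client keeps sending requests while a policy is still computing its action. The back-of-the-envelope sketch below makes this concrete; the function name and the example numbers are illustrative assumptions, not measurements from R3ACE.

```python
def requests_during_decision(request_rate: float, decision_latency_s: float) -> float:
    """Expected number of client requests sent while the policy is deciding.

    In a turn-based environment this is always zero, because the
    environment waits for the policy. In a continuous-time system the
    client never waits.
    """
    if request_rate < 0 or decision_latency_s < 0:
        raise ValueError("rate and latency must be non-negative")
    return request_rate * decision_latency_s
```

For a hypothetical request rate of 10 requests per second and a policy that takes 0.5 s to decide, `requests_during_decision(10, 0.5)` gives 5.0: five requests are answered (or not) under the old blocklist before the new action takes effect.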
Important: R3ACE is a real computer network (cyber infrastructure) with a cyber defence software program, `blue`, running on one of the machines.
The `blue` program
This software:
- Fetches information from the cyber system (the surrounding compute network).
- Uses a policy to decide what (if any) action to take.
- Executes this action, causing a side effect in the cyber system (e.g. an IP address is added to a block list).
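The three steps above amount to a fetch, decide, execute cycle. A minimal Python sketch of that cycle follows; it is not the `blue` implementation, and `run_agent`, `fetch`, `decide` and `execute` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

Obs = TypeVar("Obs")
Act = TypeVar("Act")


def run_agent(
    fetch: Callable[[], Obs],
    decide: Callable[[Obs], Optional[Act]],
    execute: Callable[[Act], None],
    steps: int,
    interval_s: float = 0.0,
) -> int:
    """Run the cycle `steps` times and return how many actions were executed."""
    executed = 0
    for _ in range(steps):
        observation = fetch()           # 1. read information from the cyber system
        action = decide(observation)    # 2. the policy picks an action, or None
        if action is not None:
            execute(action)             # 3. side effect, e.g. update a block list
            executed += 1
        time.sleep(interval_s)          # the cyber system keeps running meanwhile
    return executed
```

Passing the three steps in as callables keeps the loop independent of any particular cyber system or policy, which is the same separation of concerns the abstract interface described next aims for.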
The `markov` library
The above design for a program should in fact be useful for applying ACD to many realistic, or indeed real-world, cyber systems. As such, we have designed and documented an abstract software interface, the `markov` library, to generalise over different cyber systems and policies.
In R3ACE, a policy is a software implementation of the `RLPolicyType` interface. This interface is one of the building blocks that make up the modular software interface for the `blue` program.
At present, two policies are implemented and the Command Line Interface (CLI) for the `blue` program determines which policy is used. One of the policy implementations, the `ServerPolicy`, makes HTTP requests to a policy server running elsewhere (e.g. on the same machine, or another machine on the same network). This may be useful for defining and running policies in another language, e.g. Python. We have developed a Python policy server (`policy-server`) that exposes the R3ACE environment as an OpenAI gymnasium environment.
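To show the general shape of a policy served over HTTP, here is a self-contained Python sketch using only the standard library. It is not the `policy-server` code: the JSON field names, the POST endpoint and the toy `choose_action` rule are all assumptions, since the request format used by `ServerPolicy` is not described here.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def choose_action(observation: dict) -> dict:
    """Toy policy: while the OK rate is low, block the first unblocked peer."""
    if observation["ok_rate"] >= 0.5 * observation["request_rate"]:
        return {"action": "noop"}
    for ip in observation["peer_ips"]:
        if ip not in observation["blocklist"]:
            return {"action": "block", "ip": ip}
    return {"action": "noop"}


class PolicyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        observation = json.loads(self.rfile.read(length))
        body = json.dumps(choose_action(observation)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # keep the example quiet


def serve_in_background(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the policy server on a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), PolicyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_policy(url: str, observation: dict) -> dict:
    """What a client such as a server-backed policy might do: POST, read action."""
    request = urllib.request.Request(
        url,
        data=json.dumps(observation).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxies
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())
```

Keeping the decision rule in a plain function (`choose_action`) means it can be swapped for a trained model without touching the HTTP plumbing.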
Training or evaluation occurs when the cyber system is running (all network hosts are up with applications running, e.g. servers, clients, logging daemons) and the `blue` program is running (with an embedded policy, as discussed). The distinction between training and evaluation comes down to whether or not the policy implementation is 'self-optimising' at that point in time (e.g. whether the program is mutating the policy based on the rewards returned by the reward function).
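The training/evaluation distinction can be pictured as a single switch on the policy. The Python sketch below uses an epsilon-greedy bandit purely as an example of a 'self-optimising' policy; the class, its `learning` flag and the reward values are hypothetical and do not describe either of the two policies implemented in R3ACE.

```python
import random
from typing import Dict, Sequence


class BanditPolicy:
    """Epsilon-greedy choice of which IP to block, with a learning switch."""

    def __init__(self, ips: Sequence[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.ips = list(ips)
        self.epsilon = epsilon
        self.learning = True   # True: training, False: evaluation
        self.value: Dict[str, float] = {ip: 0.0 for ip in self.ips}
        self.count: Dict[str, int] = {ip: 0 for ip in self.ips}
        self._rng = random.Random(seed)

    def act(self) -> str:
        """Pick the IP to block; explore only while learning."""
        if self.learning and self._rng.random() < self.epsilon:
            return self._rng.choice(self.ips)
        return max(self.ips, key=lambda ip: self.value[ip])

    def observe(self, ip: str, reward: float) -> None:
        """Fold a reward into the running mean, but only in training mode."""
        if not self.learning:
            return
        self.count[ip] += 1
        self.value[ip] += (reward - self.value[ip]) / self.count[ip]
```

The cyber system and the surrounding program are identical in both modes; setting `learning = False` simply freezes the value estimates and disables exploration.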
You could get started in a number of ways. In order of increasing ambition:
- Run the reference implementation (with the basic policy provided), against the Simple Decision Problem to reproduce the plot above.
- Use the reference implementation and the Simple Decision Problem as a Cyber Environment: Develop new agent policies, train and evaluate against different cyber attacks.
- Design a more complex decision problem on the R3ACE minimum viable system. Implement the required functionality (perhaps new observations or actions). Train agents to solve this problem.
- Inspired by the R3ACE approach to Replicable and Reproducible Real-World Cyber Environments, design an alternative cyber system, perhaps increasing the scale or complexity of the infrastructure.
The machines run NixOS [7], a Linux distribution based on the Nix package management system [8]. Producing "reproducible, declarative and reliable systems" is central to Nix's design.
This repository contains operating system (OS) declaration files for each of the 3 machines. These OS configurations can be reproducibly installed onto both physical and virtual machines.
Find out more about NixOS at nixos.org, or see some quick tips at `./docs/nixos-tips.md`, which includes tips on packaging your own software as a Nix package, making it installable onto one of the NixOS machines.
TL;DR: An OS is declared with a `flake.nix` file, but most of the OS configuration can be found in a `configuration.nix` file, which is imported by the `flake.nix` file. In short, the two files, together with any other ancillary documents (SSL certificates, config files for programs to be run on the machines, etc.), make up the full configuration of each machine.
To install a configuration onto a Raspberry Pi 4 Model B, follow these steps:
- Log on to a development machine (Linux or macOS), i.e. not the target Raspberry Pi. This machine must have an SD card reader (a USB SD card reader dongle will do).
- Download the latest release of NixOS as an SD card image from here (or instead select a particular release version).
- Insert the Raspberry Pi's SD card and identify the path to the card (e.g. something like `/dev/sda` on Linux or `/dev/disk4` on macOS). On macOS, running `diskutil list` can help to find the SD card device.
- Decompress the SD card image with `unzstd <the downloaded image.zst>`.
- Flash the image onto the SD card with `sudo dd if=<path to decompressed sd image (a .img file)> of=<path to SD card (e.g. /dev/sda)> bs=4096 conv=fsync status=progress`.
- Insert the SD card into the Raspberry Pi and boot the device with a display (via the micro HDMI port) and keyboard connected. The Pi will boot into the `nixos` user, which has password-less `sudo` privileges.
Carry out the following steps on the Raspberry Pi:
- On the Pi, create a new user with `sudo useradd <new username>`. For convenience, this username should match a user configured in the desired `configuration.nix` file that will be copied to the Raspberry Pi later (e.g. one of the `configuration.nix` files in this repo).
- Add a password for that user with `sudo passwd <new username>`.
- Run `nixos-generate-config` to generate two files: `/etc/nixos/configuration.nix` and `/etc/nixos/hardware-configuration.nix`.
- Shut down the device with `sudo shutdown`.
Carry out the following steps on a separate Linux machine:
- Insert the SD card, and navigate to the `NIXOS_SD` partition of the SD card, which is the Linux file system.
- Replace the auto-generated `configuration.nix` file with the desired replacement. The replacement should be edited to ensure that the `system.stateVersion` number matches that of the auto-generated file; this is the NixOS release version.
- Do not replace the `hardware-configuration.nix` file.
- Add any other required files to `/etc/nixos`, e.g. a `flake.nix` file, or a certificate for a certificate authority that is referenced in `configuration.nix` (this is the case with `kleene` in this repo).
Carry out the following steps on the Raspberry Pi:
- Insert the SD card and boot.
- Configure a (temporary) Wifi connection: run `sudo systemctl start wpa_supplicant`, followed by `wpa_cli` to configure the interface. In the session below, `add_network` reports the index of the network that was added (0 in this example), and that index is used in the commands that follow:

      > add_network
      > set_network 0 ssid "MY WIFI"
      OK
      > set_network 0 psk "NETWORK PASSWORD"
      OK
      > enable_network 0
      OK
      <3>CTRL-EVENT-SCAN-STARTED
      <3>CTRL-EVENT-SCAN-RESULTS
      <3>Trying to associate with SSID 'MY WIFI'
      <3>Associated with xx:xx:xx:xx:xx:xx
      ... some other stuff
      > save_config
      OK

- Test the new configuration before switching into it with `sudo nixos-rebuild test`. This will likely throw an error; see below.
- If using a `flake.nix`, the configuration targets a certain hostname, described by `nixosConfigurations."<hostname>".<etc..>`.
- This hostname must match the system hostname at the time of running `nixos-rebuild` commands.
- The hostname will be `nixos` until changed.
- The system hostname is changed by setting the `networking.hostName = "<new hostname>";` property in the `configuration.nix` file.
- Workaround: set the desired new hostname in `configuration.nix`, then build the configuration for the current hostname (`nixos`) by setting that in `flake.nix`.
- After the first reboot the system hostname will have been updated to the new hostname.
- Future configurations must now be built for the new hostname, by updating the value in `flake.nix`.
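The hostname rule described above can be checked before running `nixos-rebuild`. The Python sketch below is a rough illustration with hypothetical helper names: it only recognises the dotted `nixosConfigurations.<hostname>` or `nixosConfigurations."<hostname>"` form mentioned in this section, and it does not evaluate Nix, so a flake that declares its configurations in a nested attribute set would not be detected.

```python
import re
from typing import List

# Matches nixosConfigurations."name" or nixosConfigurations.name
_PATTERN = re.compile(
    r'nixosConfigurations\.(?:"([^"]+)"|([A-Za-z_][A-Za-z0-9_-]*))'
)


def flake_hostnames(flake_text: str) -> List[str]:
    """Hostnames targeted by dotted `nixosConfigurations` attributes."""
    return [quoted or bare for quoted, bare in _PATTERN.findall(flake_text)]


def can_rebuild(flake_text: str, system_hostname: str) -> bool:
    """True if the flake targets the hostname the system currently has."""
    return system_hostname in flake_hostnames(flake_text)
```

On a freshly flashed Pi the system hostname is `nixos`, so a flake that only targets, say, `hilbert` would fail this check until the workaround above has been applied and the machine rebooted. On the machine itself, `socket.gethostname()` from the Python standard library can supply the `system_hostname` argument.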
As an alternative to booting NixOS on physical hardware, virtual machine (VM) images or 'virtual disks' (with the OS pre-installed) can be built from the NixOS config files, and the network between the machines can be virtualised.
Pre-built aarch64 VMware VM disks are available here for the 3 machines in this project.
These VMs should run with minimal set-up when hosted on a laptop or PC with an ARM processor, e.g. ARM Linux, or macOS with Apple Silicon chips (M1, M2, ...).
`./docs/building-vm-images.md` includes details about building VM images and virtual disks (such as the pre-built VMware VM disks made available). In principle, the software used to build the images (`nixos-generators`) also supports building images for many other platforms (VirtualBox, Amazon EC2, Docker, Azure), though building for these platforms has not been tested during this project.
- Download VMware Fusion Pro 13 (available for free), either directly from Broadcom (requires making an account with Broadcom) or via Homebrew (easier download and installation with `brew install --cask vmware-fusion`).
- Create a 'host-only' private LAN network within VMware across which the guest VMs will be able to communicate:
  - Open VMware.
  - Go to the application settings.
  - Open the 'Network' tab.
  - Add a new custom network with the `+` button.
  - Disable the box `Allow virtual machines on this network to connect to external networks (using NAT)`. We do not want to allow the guest VMs to be connected to the external networks that your host machine is connected to.
  - Disable the box to `Connect the host Mac to this network`.
  - Enable the box to `Provide addresses on this network via DHCP`.
  - Enter the `Subnet IP`: `172.0.0.0`.
  - Enter the `Subnet Mask`: `255.255.255.0`.
  - Leave `MTU`: `System Configuration`.
  - Double-click on the name of the network to change it to `r3ace`.
  - Click `Apply`.
  - Now disable `Provide addresses on this network via DHCP`. The settings for the `Subnet IP` and `Subnet Mask` will go grey but remain configured with the values above. This is important.
  - Click `Apply` again.
- Create another private network that will be shared between only the 'web client' and the 'web server'. This network, isolated from the first, will be an out-of-band communication channel between the client and server. To do so, repeat the steps used to create the `r3ace` network, with the following differences:
  - Enter the `Subnet IP`: `172.0.1.0`.
  - Name the network `r3ace-udp`. Follow all of the other steps, as before, in sequence.
- Create a third private network that will enable us to send requests from the VM host to the VMs, or open SSH connections to them. The VMs may be disconnected from this network at a later date to 'air-gap' the VMs from your host Mac for safety (e.g. when running experiments involving harmful software). To do so, repeat the steps used to create the `r3ace` network, with the following differences:
  - Enable the box to `Connect the host Mac to this network`.
  - Enter the `Subnet IP`: `172.0.2.0`.
  - Name the network `ssh`.
- Create a new virtual machine from a pre-built VM disk of your choice. It's recommended to set up `hilbert` first, because the networking can be tested by making HTTPS requests to the web server installed on the `hilbert` VM disk.
  - Open VMware and click the `+` icon to create a new VM.
  - Click `Create a custom virtual machine` and then `Continue`.
  - For `Choose Operating System`, pick `Other`, then `Other 64-bit arm`, and click `Continue`.
  - Select `Use an existing virtual disk`, then click `Choose virtual disk...` to select the VM disk file. This should be a file with a `.vmdk` file extension. Ensure that `Make a separate copy of the virtual disk` is selected. Click `Continue`.
  - Click `Customize Settings` to bring up the settings for the VM once it has been created. Choose an appropriate name for the new VM and click `Save` to create it.
- Adjust the settings of the newly created VM:
  - Under `Processors and Memory`, allocate appropriate resources to the VM guest (e.g. if you have an 8-core M1 Mac with 32 GB of RAM then perhaps allocate 2 cores and 4 GB of memory to the VM).
  - Under `Network Adapter`, ensure that the appropriate network(s) are selected. `post` should be connected to only the `r3ace` network; `hilbert` and `kleene` to both the `r3ace` network and the `r3ace-udp` network.
  - VMs can be connected to the `ssh` network during setup (if necessary) so that `ssh` and `scp` can be used between the VM host and the VMs.
  - Before booting for the first time, enter the correct MAC address for each network adapter. The MAC addresses are configured in the `Advanced options` drop-down of the settings for each network adapter. The MAC addresses for each VM and each network adapter are as follows:
    | | r3ace | r3ace-udp | ssh |
    |---|---|---|---|
    | hilbert | 02:00:00:00:03:00 | 02:00:00:00:03:01 | 02:00:00:00:03:02 |
    | kleene | 02:00:00:00:02:00 | 02:00:00:00:02:01 | 02:00:00:00:02:02 |
    | post | 02:00:00:00:04:00 | n/a | 02:00:00:00:04:02 |
- Boot the VM.
- The users set up on each VM are as follows:

  | | username | password |
  |---|---|---|
  | hilbert | blue | changeme |
  | kleene | green | changeme |
  | post | red | changeme |

- Once logged in, new passwords can be set for all users.
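Before booting the VMs it can be worth sanity-checking the network plan. The Python sketch below copies the subnet and MAC address values from the steps and table above and checks that the three subnets do not overlap and that no MAC address is used twice. The dictionary layout and function names are illustrative assumptions; nothing here talks to VMware.

```python
import ipaddress
from itertools import combinations

# Values copied from the set-up steps above.
SUBNETS = {
    "r3ace": ("172.0.0.0", "255.255.255.0"),
    "r3ace-udp": ("172.0.1.0", "255.255.255.0"),
    "ssh": ("172.0.2.0", "255.255.255.0"),
}

# Values copied from the MAC address table above ("n/a" entries omitted).
MAC_ADDRESSES = {
    "hilbert": {"r3ace": "02:00:00:00:03:00", "r3ace-udp": "02:00:00:00:03:01", "ssh": "02:00:00:00:03:02"},
    "kleene": {"r3ace": "02:00:00:00:02:00", "r3ace-udp": "02:00:00:00:02:01", "ssh": "02:00:00:00:02:02"},
    "post": {"r3ace": "02:00:00:00:04:00", "ssh": "02:00:00:00:04:02"},
}


def networks() -> dict:
    """Parse each (subnet IP, subnet mask) pair into an IPv4Network."""
    return {
        name: ipaddress.ip_network(f"{ip}/{mask}")
        for name, (ip, mask) in SUBNETS.items()
    }


def overlapping_pairs() -> list:
    """Pairs of network names whose address ranges overlap (should be empty)."""
    nets = networks()
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


def duplicate_macs() -> list:
    """MAC addresses assigned to more than one adapter (should be empty)."""
    seen, duplicates = set(), []
    for adapters in MAC_ADDRESSES.values():
        for mac in adapters.values():
            if mac in seen:
                duplicates.append(mac)
            seen.add(mac)
    return duplicates
```

The adversary VM, `post`, has no `r3ace-udp` adapter in this plan, which is what keeps the out-of-band channel between the web client and web server isolated from it.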
[1] “Cyber Operations Research Gym.” 2022. GitHub. https://github.com/cage-challenge/CybORG.
[2] Andrew, Alex, Sam Spillard, Joshua Collyer, and Neil Dhir. 2022. “Developing Optimal Causal Cyber-Defence Agents via Cyber Security Simulation.” In International Conference on Machine Learning (ICML).
[3] Lohn, Andrew, Ant Burke, Anna Knack, and Krystal Jackson. 2023. “Autonomous Cyber Defence: A Roadmap from Lab to Ops.” CETaS Research Reports.
[4] Oesch, Sean, Amul Chaulagain, Brian Weber, Matthew Dixson, Amir Sadovnik, Benjamin Roberson, Cory Watson, and Phillipe Austria. 2024. “Towards a High Fidelity Training Environment for Autonomous Cyber Defense Agents.” In Proceedings of the 17th Cyber Security Experimentation and Test Workshop, 91–99. CSET ’24. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3675741.3675752.
[5] Hammar, Kim, and Rolf Stadler. 2023. “Digital Twins for Security Automation.” In NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, 1–6. https://doi.org/10.1109/NOMS56928.2023.10154288.
[6] Brockman, Greg, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. “OpenAI Gym.”
[7] Dolstra, Eelco, Andres Löh, and Nicolas Pierron. 2010. “NixOS: A Purely Functional Linux Distribution.” Journal of Functional Programming 20 (5–6): 577–615. https://doi.org/10.1017/S0956796810000195.
[8] Dolstra, Eelco, Merijn de Jonge, and Eelco Visser. 2004. “Nix: A Safe and Policy-Free System for Software Deployment.” In Proceedings of the 18th USENIX Conference on System Administration, 79–92. LISA ’04. USA: USENIX Association.