# Cage-submission
Because the red agents' behaviours do not change mid-episode and are predictable, we decided that fingerprinting the agent we are facing and then assigning it to a trained model made the most sense. If multiple red agents could exist in the environment in parallel, if the red agents could change behaviour mid-episode, or if noise were added (Green Agent), we would have applied hierarchical RL or used an RNN (which we expect to do in the second version of the challenge).

In addition, because the action space is small (the blue agent cannot perform multiple actions at once, e.g. restore several hosts in one step), we felt that reinforcement learning was appropriate; in reality, however, the action spaces for the defender (and attacker) would be too large for our approach.

As a result, we trained two models using DDQN: one for B_line and one for Meander. We also experimented with regular Q-learning against B_line after reducing the action and observation spaces. This was successful, but it is not included in this submission because it does not add any value; it was, however, interesting to analyse the largest and smallest Q-values to confirm our suspicions.

Finally, it should be noted that we have not considered the Misinform action because it was not in the initial release. This made sense, as the Green Agent does not feature in the evaluation.
# Agents
We built three agents:
1. A Sleep blue agent
2. A DDQN blue agent
3. A Main blue agent, which fingerprints the red agent and assigns the appropriate blue agent

The agents can be found in the Agents folder.
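As a rough illustration of the fingerprint-and-delegate idea, the Main agent follows a pattern like the sketch below; the class, method names, and fingerprinting rule shown here are simplified assumptions rather than a copy of our code:

```python
class MainAgent:
    """Sketch: identify the red agent from early observations, then hand
    every subsequent decision to the matching pre-trained blue agent."""

    def __init__(self, bline_agent, meander_agent, window=4):
        self.bline_agent = bline_agent      # DDQN agent trained against B_line
        self.meander_agent = meander_agent  # DDQN agent trained against Meander
        self.window = window                # steps observed before committing
        self.chosen = None
        self.activity = []

    def get_action(self, observation, action_space):
        if self.chosen is None:
            # Red behaviour is fixed for the whole episode, so a handful of
            # early observations is enough to tell B_line and Meander apart.
            # Assumes the flattened bit-vector observation from the wrapper.
            self.activity.append(sum(observation))
            if len(self.activity) < self.window:
                return 0                    # e.g. a safe default action while fingerprinting
            # Placeholder rule: Meander's broad scanning lights up more of the
            # observation vector early on than B_line's single-path attack.
            broad = sum(self.activity) > self.window
            self.chosen = self.meander_agent if broad else self.bline_agent
        return self.chosen.get_action(observation, action_space)

    def end_episode(self):
        self.chosen = None
        self.activity = []
```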
# DDQN
The DDQN implementation was taken from https://github.com/philtabor/Deep-Q-Learning-Paper-To-Code/tree/master/DDQN and was not modified except for the model architecture (we opted for an MLP instead of a CNN). The architecture is as follows for both models:
```python
self.fc1 = nn.Linear(input_dims[0], 64)
self.fc2 = nn.Linear(64, 64)
self.fc3 = nn.Linear(64, n_actions)
```
Where fc1 and fc2 have ReLU activations.
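Concretely, a network of this shape can be written as a small self-contained module along the following lines (a sketch consistent with the layers above, not the exact file used for training):

```python
import torch.nn as nn
import torch.nn.functional as F

class DeepQNetwork(nn.Module):
    """MLP Q-network: flattened observation in, one Q-value per action out."""

    def __init__(self, input_dims, n_actions):
        super().__init__()
        self.fc1 = nn.Linear(input_dims[0], 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, n_actions)

    def forward(self, state):
        x = F.relu(self.fc1(state))   # fc1 + ReLU
        x = F.relu(self.fc2(x))       # fc2 + ReLU
        return self.fc3(x)            # linear output: one Q-value per action
```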
We trained two models: one for B_line and one for Meander. These are stored in the Models folder.
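At evaluation time, the checkpoints can be restored in the usual PyTorch way; the file names and sizes below are illustrative rather than the exact contents of the Models folder:

```python
import torch

OBS_DIM, N_ACTIONS = 52, 41   # illustrative sizes; in practice taken from the CybORG wrapper

def load_blue_net(path):
    net = DeepQNetwork(input_dims=[OBS_DIM], n_actions=N_ACTIONS)
    net.load_state_dict(torch.load(path, map_location="cpu"))
    net.eval()                 # inference only during evaluation
    return net

bline_net = load_blue_net("Models/bline.pth")      # hypothetical file name
meander_net = load_blue_net("Models/meander.pth")  # hypothetical file name
```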
The train.py and utils.py files are included in the root directory for completeness, but they are not called during evaluation.
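Roughly speaking, train.py follows the standard DDQN loop; the sketch below shows the general shape of such a loop (the agent interface, environment wrapper, and hyperparameters are assumptions rather than a copy of our script):

```python
import numpy as np

def train(env, agent, n_episodes=5000, max_steps=100):
    """Generic DDQN training loop: act epsilon-greedily, store the transition,
    then learn from replayed minibatches after every environment step."""
    scores = []
    for episode in range(n_episodes):
        obs = env.reset()
        score = 0.0
        for _ in range(max_steps):
            action = agent.choose_action(obs)             # epsilon-greedy action
            next_obs, reward, done, _ = env.step(action)
            agent.store_transition(obs, action, reward, next_obs, done)
            agent.learn()                                 # minibatch update; target net synced periodically
            obs = next_obs
            score += reward
            if done:
                break
        scores.append(score)
        if (episode + 1) % 100 == 0:
            print(f"episode {episode + 1}: mean score {np.mean(scores[-100:]):.1f}")
    return scores
```

Two such runs, one against B_line and one against Meander, produce the two checkpoints in the Models folder.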
# Evaluation
The Evaluation folder contains the evaluation.py file and an .md file discussing our approach's strengths and weaknesses.
# Dependencies
- PyTorch
- Challenge version: 1.2
# Thank you
We would like to thank the organisers of the challenge, and we look forward to version 2.