This repository contains a custom-built simulator for decentralized federated learning systems, developed as part of our master's degree thesis between February 2024 and October 2024. The simulator is implemented in Python and is designed to replicate the behavior of a blockchain-assisted federated learning system in a fully decentralized environment. It supports the simulation of various configurations, allowing experimentation with different consensus mechanisms, validation techniques, and aggregation methods.
The simulator is intended to support research and analysis in decentralized federated learning: studying vulnerabilities, testing defensive mechanisms, and evaluating system performance under diverse configurations.
- Flexible Configuration: Use a JSON configuration file to customize datasets, node behaviors, consensus algorithms (e.g., PoW, PoS, committee-based), and validation mechanisms.
- Malicious Node Behavior: Simulate common malicious trainer behaviors such as label flipping, additive noise, and targeted data poisoning.
- Dataset Management: Partition datasets into IID or N-IID subsets, allowing for diverse training scenarios.
- Consensus Algorithms: Explore the impact of Proof-of-Work, Proof-of-Stake, and committee-based consensus algorithms on federated learning.
- Validation and Aggregation: Test multiple validation and aggregation mechanisms to evaluate their effectiveness in improving system robustness.
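To make the malicious behaviors above concrete, here is a minimal standalone sketch of two of them, label flipping and additive noise (function names and signatures are ours for illustration, not the simulator's actual implementation):

```python
import numpy as np

def flip_labels(labels: np.ndarray, source: int, target: int) -> np.ndarray:
    """Targeted label flipping: relabel every `source`-class sample as `target`."""
    flipped = labels.copy()
    flipped[flipped == source] = target
    return flipped

def add_noise(weights: list, sigma: float = 0.1) -> list:
    """Additive-noise attack: perturb each model weight tensor with Gaussian noise."""
    rng = np.random.default_rng(0)
    return [w + rng.normal(0.0, sigma, size=w.shape) for w in weights]

labels = np.array([0, 1, 7, 7, 3])
print(flip_labels(labels, source=7, target=1))  # [0 1 1 1 3]
```

In a simulation, a malicious trainer would apply such transformations to its local data or to its model update before sharing it with the network.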
|-- datasets/ # Examples of publicly available datasets pre-processed for use with the simulator
|-- docs/ # Docs
|-- models/ # Examples of neural network architectures and initial weights
|-- examples/ # Examples of simulations (e.g., JSON configurations, output files, and analyses performed by means of logger_to_graph.py)
|-- src/ # Source code
|   |-- shared/ # Baseline modules; incomplete as they are not specialized for any specific consensus algorithm
|   |-- pos/ # Extensions of the shared modules, specialized for the Proof-of-Stake consensus algorithm
|   |-- pow/ # Extensions of the shared modules, specialized for the Proof-of-Work consensus algorithm
|   |-- committee/ # Extensions of the shared modules, specialized for the 'Committee-based' consensus algorithm
|   |-- __init__.py
|   |-- main.py
|-- dataset_creator.ipynb # Notebook for manipulating datasets to prepare them for use with the simulator
|-- datasets_models_attacks_visualizer.ipynb # Notebook that shows the core ideas behind the simulator. It shows the manipulations needed to use certain datasets, the creation of neural networks and the core behavior of some malicious attacks
|-- model_creator.ipynb # Notebook for creating neural network architectures and initial weights required for simulations conducted for our thesis
|-- label_flipping_score.py # Script to evaluate the effectiveness of label-flipping attacks on the global model trained during a simulation
|-- targeted_poisoning_score.py # Script to evaluate the effectiveness of targeted data poisoning (e.g., backdoor attacks) on the global model trained during a simulation
|-- logger_to_graph.py # Script to generate visual insights from simulation log files
|-- LICENSE # License file for the repository
|-- README.md # Documentation for the repository
git clone https://github.com/federicocaroli/FedBlockParadox.git
cd FedBlockParadox
Install the required Python packages, including specialized NVIDIA libraries:
python -m pip install --extra-index-url https://pypi.nvidia.com \
numpy==1.25.2 scipy==1.11.4 matplotlib==3.9.1 tabulate==0.9.0 \
psutil==5.9.5 datasets==2.19.2 flwr_datasets==0.2.0 flwr==1.9.0 \
pympler==1.1 tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1 \
tensorflow[and-cuda]==2.15.0 setproctitle==1.3.3
Ensure you have a valid JSON configuration file for the simulation. Example configuration files are available in the `examples/` directory.
Execute the simulator using the `main` script and specify your configuration file:
python -m src.main "config_path" > ./tmp.txt 2>&1
- Replace `config_path` with the path to your specific configuration file.
- Note: `tmp.txt` will contain general logs generated by various Python modules. The actual log file path is specified in the JSON configuration file.
- Review the log file to analyze the simulation's progress and outcomes.
- Use visualization scripts like `logger_to_graph.py` to gain insights.
- If the simulations involve malicious nodes performing label flipping or targeted data poisoning attacks, evaluate their impacts using `label_flipping_score.py` or `targeted_poisoning_score.py`.
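Conceptually, an attack-effectiveness score compares the model's predictions on the attacked class against the attacker's target. The sketch below illustrates the idea behind such a metric (a hypothetical standalone example; it is not the code of `label_flipping_score.py` or `targeted_poisoning_score.py`):

```python
import numpy as np

def attack_success_rate(preds: np.ndarray, true: np.ndarray,
                        source: int, target: int) -> float:
    """Fraction of source-class samples the model now classifies as target."""
    mask = true == source
    if not mask.any():
        return 0.0
    return float(np.mean(preds[mask] == target))

true_labels = np.array([7, 7, 7, 2, 5])
predictions = np.array([1, 1, 7, 2, 5])
print(attack_success_rate(predictions, true_labels, source=7, target=1))
```

A rate close to 1.0 means the attack redirected most source-class samples to the target label; a rate close to 0.0 means the global model remained robust.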
You can use Docker to simplify dependency management and ensure consistency across environments. Follow the steps below to run the simulator using Docker.
Clone the repository (if you haven’t already) and navigate to its directory:
git clone https://github.com/federicocaroli/FedBlockParadox.git
cd FedBlockParadox
Build the Docker image from the provided `Dockerfile`:
docker build -t fedblockparadox_image .
The Docker Compose file provided in this repository is configured to use an external volume (`output`) for storing simulation results and log files. To use this feature:
- Ensure the volume exists:
docker volume create output
Ensure you have a valid JSON configuration file for the simulation. You can modify the example located in the `container_based_examples/` directory.
Run the Docker container with the required configuration file. For instance:
docker run -it -d -v ./container_based_examples/config.json:/usr/src/app/config.json fedblockparadox_image
- Flags Explanation:
  - `-it`: Runs the container interactively with a terminal.
  - `-d`: Runs the container in detached mode, leaving it running in the background.
  - `-v ./container_based_examples/config.json:/usr/src/app/config.json`: Mounts the configuration file from the host system to the container at `/usr/src/app/config.json`.
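A Compose file consistent with the steps above might look like the following sketch (the service name and the in-container output path are assumptions, not the repository's actual `docker-compose.yml`):

```yaml
services:
  simulator:
    image: fedblockparadox_image
    volumes:
      - ./container_based_examples/config.json:/usr/src/app/config.json
      - output:/usr/src/app/output   # hypothetical results/log path inside the container

volumes:
  output:
    external: true   # must be created beforehand with `docker volume create output`
```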
After the simulation completes:
- Logs and results are saved in the configured output volume.
- Use scripts like `logger_to_graph.py` to visualize the results.
- Update dependencies and paths as needed.
- Modify scripts to reflect your specific experimental setup.
- Customize datasets and models to fit your use case.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Feel free to fork this repository, submit issues, or create pull requests.
Happy researching and exploring new possibilities! 😊