This is the backend of the GPU server monitoring system. It consists of the following components:
- API server
- DB script
- Data collection script
The API server is a Flask server that provides the following endpoints:
/merged_data
/history?n=<node_name>
/node?n=<node_name>
$ python server.py
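The endpoints above could be queried from Python roughly as follows. This is a minimal sketch, not part of the project: it assumes the server listens on Flask's default address (`http://127.0.0.1:5000`) and that each endpoint returns JSON. The helper names `endpoint_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Flask's default host and port; adjust to your deployment.
BASE_URL = "http://127.0.0.1:5000"


def endpoint_url(path, node=None, base_url=BASE_URL):
    """Build the URL for an endpoint; `node` is sent as the `n` query parameter."""
    url = f"{base_url}{path}"
    if node is not None:
        url += "?" + urlencode({"n": node})
    return url


def fetch_json(path, node=None, base_url=BASE_URL):
    """GET an endpoint and decode the body, assuming the response is JSON."""
    with urlopen(endpoint_url(path, node, base_url), timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URLs; call fetch_json(...) once the server is running.
    print(endpoint_url("/merged_data"))
    print(endpoint_url("/history", node="host1"))
    print(endpoint_url("/node", node="host1"))
```

With the server running, `fetch_json("/node", node="host1")` would request `/node?n=host1`.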
The DB script merges the data collected by the data collection script and stores the result in JSON files.
$ python db.py
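The merge step might look something like the sketch below. The actual file layout used by db.py is not described here, so this assumes one `<node_name>.json` file per server in a single directory and a single merged output file. The function name `merge_node_files` and both paths are hypothetical.

```python
import json
from pathlib import Path


def merge_node_files(data_dir, output_path):
    """Merge every <node>.json in data_dir into one object keyed by node name."""
    output_path = Path(output_path)
    merged = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        # Skip the merged file itself if it lives in the same directory.
        if path.resolve() == output_path.resolve():
            continue
        with path.open() as f:
            merged[path.stem] = json.load(f)
    with output_path.open("w") as f:
        json.dump(merged, f, indent=2)
    return merged
```

Keying the merged object by node name keeps lookups for a single node cheap, which would suit an endpoint that takes a node name as its parameter.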
To configure the servers displayed on the dashboard, edit config.yaml.
Example:
server:
  host1:
    gpu: RTX A6000
    status: operational
  host2-down:
    gpu: Tesla A100
    status: down
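As an illustration of how this configuration could be consumed, the sketch below filters hosts by their `status` field. It assumes the host entries are nested under `server`, and that the file has already been parsed into a nested dict (which is what a YAML parser such as PyYAML's `yaml.safe_load` would return). The function name `hosts_by_status` is hypothetical.

```python
def hosts_by_status(config, status):
    """Return the names of hosts under `server` whose status matches."""
    servers = config.get("server", {})
    return [name for name, info in servers.items() if info.get("status") == status]


# The example configuration, written as the dict a YAML parser would produce.
config = {
    "server": {
        "host1": {"gpu": "RTX A6000", "status": "operational"},
        "host2-down": {"gpu": "Tesla A100", "status": "down"},
    }
}

print(hosts_by_status(config, "operational"))  # ['host1']
print(hosts_by_status(config, "down"))         # ['host2-down']
```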
The data collection script, get_server_info.py, collects data from the servers and stores it in JSON files.
You need to run this script on each server.
$ python get_server_info.py -d
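How get_server_info.py gathers its data is not described here. One common approach, shown below purely as an assumption, is to query `nvidia-smi` in CSV mode and parse the result. The chosen query fields, the output keys, and the function names `parse_gpu_csv` and `collect_snapshot` are all hypothetical.

```python
import csv
import socket
import subprocess
import time

# Assumption: these four fields are enough for a dashboard.
QUERY = "name,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Raises ValueError if a numeric field is not an integer (some GPUs
    report "[N/A]" for fields they do not support).
    """
    gpus = []
    for row in csv.reader(text.strip().splitlines(), skipinitialspace=True):
        name, util, mem_used, mem_total = (field.strip() for field in row)
        gpus.append({
            "name": name,
            "utilization_pct": int(util),
            "memory_used_mib": int(mem_used),
            "memory_total_mib": int(mem_total),
        })
    return gpus


def collect_snapshot():
    """Run nvidia-smi on this machine and return one JSON-serializable record."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "gpus": parse_gpu_csv(result.stdout),
    }
```

Keeping the parsing separate from the `subprocess` call means it can be exercised on machines without a GPU.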