Add tabular stats content for DLI (#3219)
Add tabular stats content.

### Description

Add the tabular stats content, which was missed when the image stats
content was copied.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.
nvkevlu authored Feb 12, 2025
1 parent 98489fb commit 32295a8
Showing 11 changed files with 755 additions and 9 deletions.
@@ -25,7 +25,6 @@ def define_parser():
     parser.add_argument("-o", "--stats_output_path", type=str, nargs="?", default="statistics/stats.json")
     parser.add_argument("-j", "--job_dir", type=str, nargs="?", default="/tmp/nvflare/jobs/image_stats")
     parser.add_argument("-w", "--work_dir", type=str, nargs="?", default="/tmp/nvflare/workspace/image_stats")
-    parser.add_argument("-co", "--export_config", action="store_true", help="config only mode, export config")
 
     return parser.parse_args()
 
@@ -38,7 +37,6 @@ def main():
     output_path = args.stats_output_path
     job_dir = args.job_dir
     work_dir = args.work_dir
-    export_config = args.export_config
 
     statistic_configs = {"count": {}, "histogram": {"*": {"bins": 20, "range": [0, 256]}}}
     # define local stats generator
@@ -54,10 +52,9 @@ def main():
     sites = [f"site-{i + 1}" for i in range(n_clients)]
     job.setup_clients(sites)
 
-    if export_config:
-        job.export_job(job_dir)
-    else:
-        job.simulator_run(work_dir, gpu="0")
+    job.export_job(job_dir)
+
+    job.simulator_run(work_dir, gpu="0")
 
 
 if __name__ == "__main__":
@@ -130,7 +130,7 @@
 "id": "7e972070",
 "metadata": {},
 "source": [
-"The file [image_stats_job.py](code/image_stats_job.py) uses the StatsJob to generate a job configuration in a Pythonic way. With the default arguments, the job will be exported to `/tmp/nvflare/jobs/image_stats` and then the job will be run with the FL simulator with the `simulator_run()` command with a work_dir of `/tmp/nvflare/workspace/image_stats`."
+"The file [image_stats_job.py](code/image_stats_job.py) uses `StatsJob` to generate a job configuration in a Pythonic way. With the default arguments, the job will be exported to `/tmp/nvflare/jobs/image_stats` and then the job will be run with the FL simulator with the `simulator_run()` command with a work_dir of `/tmp/nvflare/workspace/image_stats`."
 ]
 },
 {
@@ -0,0 +1,293 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7c4be7b0",
"metadata": {},
"source": [
"# NVFlare Federated Statistics Visualization"
]
},
{
"cell_type": "markdown",
"id": "987e6028",
"metadata": {},
"source": [
"#### Dependencies\n",
"\n",
"To run the examples, you will need to install the following dependencies:\n",
"* numpy\n",
"* pandas\n",
"* wget\n",
"* matplotlib\n",
"* jupyter\n",
"* notebook\n",
"\n",
"These are captured in [requirements.txt](../../requirements.txt)."
]
},
{
"cell_type": "markdown",
"id": "665dc17e",
"metadata": {},
"source": [
"## Tabular Data Statistics Visualization\n",
"In this example, we demonstrate how to visualize the statistics computed from tabular data. The visualization requires the json, pandas, and matplotlib modules as well as the NVFlare visualization utilities."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c44a0217",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import json\n",
"import pandas as pd\n",
"\n",
"from nvflare.app_opt.statistics.visualization.statistics_visualization import Visualization"
]
},
{
"cell_type": "markdown",
"id": "30c79d1a",
"metadata": {},
"source": [
"First, copy the resulting JSON file to the demo directory. In this example, the file is called `adults_stats.json`. Then load the JSON file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "44f6bed2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"with open('adults_stats.json', 'r') as f:\n",
" data = json.load(f)"
]
},
{
"cell_type": "markdown",
"id": "c5cdbcc0",
"metadata": {},
"source": [
"Initialize the Visualization utilities:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93c62d5e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"vis = Visualization()"
]
},
{
"cell_type": "markdown",
"id": "1b0f21fd",
"metadata": {},
"source": [
"### Overall Statistics"
]
},
{
"cell_type": "markdown",
"id": "b49588c2",
"metadata": {},
"source": [
"`vis.show_stats()` shows the statistics for each feature at each site, for each dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab771712",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"\n",
"vis.show_stats(data=data)"
]
},
{
"cell_type": "markdown",
"id": "4986dd14",
"metadata": {},
"source": [
"### Select feature statistics using white_list_features\n",
"You can optionally show only specific features via the `white_list_features` argument. In the following, only three features are selected instead of all features:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "563a8bb7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"vis.show_stats(data=data, white_list_features=['Age', 'fnlwgt', 'Hours per week'])"
]
},
{
"cell_type": "markdown",
"id": "95e42829",
"metadata": {},
"source": [
"### Histogram Visualization\n",
"You can use `vis.show_histograms()` to visualize the histograms. Before doing that, you can adjust the IPython display settings so that the graph uses the full cell width."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fcdfb197",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from IPython.display import display, HTML\n",
"display(HTML(\"<style>.container { width:100% !important; }</style>\"))"
]
},
{
"cell_type": "markdown",
"id": "3e86860e",
"metadata": {},
"source": [
"The following command displays histograms for numeric features. The result shows both the main plot and the subplots:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3dd3821",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"vis.show_histograms(data=data)"
]
},
{
"cell_type": "markdown",
"id": "49b74579",
"metadata": {},
"source": [
"### Display Options\n",
"As with the other statistics, you can use `white_list_features` to select the features to display in the histograms. You can also use `display_format=\"percent\"` so that all datasets and sites are displayed on the same scale. You can set:\n",
"\n",
"* display_format: \"percent\" or \"sample_count\"\n",
"* white_list_features: feature names\n",
"* plot_type : \"both\" or \"main\" or \"subplot\"\n",
"\n",
"#### Show percent display format with selected features\n",
"In the following, only the feature \"Age\" is displayed, using the \"percent\" display_format and the default \"both\" plot_type."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e07b9266",
"metadata": {},
"outputs": [],
"source": [
"vis.show_histograms(data=data, display_format=\"percent\", white_list_features=['Age'])"
]
},
{
"cell_type": "markdown",
"id": "cddf21af",
"metadata": {},
"source": [
"#### Display main plot_type with selected features\n",
"In this example, two features are displayed using the \"sample_counts\" display_format and the \"main\" plot_type."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "038e238e",
"metadata": {},
"outputs": [],
"source": [
"vis.show_histograms(data, \"sample_counts\", ['Age', 'Hours per week'], plot_type=\"main\")"
]
},
{
"cell_type": "markdown",
"id": "b06195ac",
"metadata": {},
"source": [
"#### Selected features with subplot plot_type\n",
"In the next example, the same two features are displayed using the \"sample_counts\" display_format and the \"subplot\" plot_type."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8958e124",
"metadata": {},
"outputs": [],
"source": [
"vis.show_histograms(data, \"sample_counts\", ['Age', 'Hours per week'], plot_type=\"subplot\")"
]
},
{
"cell_type": "markdown",
"id": "2f330eb6",
"metadata": {},
"source": [
"### Tip: Avoid repeated calculation\n",
"If you intend to plot the histogram main plot and subplot separately, calling `show_histograms()` repeatedly with different plot_types is inefficient, because each call recalculates the same set of DataFrames. To avoid the duplicated calculations, call `get_histogram_dataframes()` once and pass its result to `show_dataframe_plots()` for each plot type. If you intend to show both plots together, use `show_histograms()` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "395315a4",
"metadata": {},
"outputs": [],
"source": [
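"feature_dfs = vis.get_histogram_dataframes(data, display_format=\"percent\")\n",
"# Reuse the same DataFrames for each plot type instead of recomputing them\n",
"vis.show_dataframe_plots(feature_dfs, plot_type=\"main\")\n",
"vis.show_dataframe_plots(feature_dfs, plot_type=\"subplot\")"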
]
}
],
"metadata": {
"kernelspec": {
"display_name": "nvflare_example",
"language": "python",
"name": "nvflare_example"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
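Why the percent display format in the notebook helps can be sketched in plain Python. This is a minimal illustration, assuming that percentages are taken relative to each site's total count for a feature; the data layout and the `to_percent` and `site_columns` helpers are hypothetical and are not NVFlare's statistics JSON schema or the `Visualization` implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not NVFlare's statistics JSON schema):
# one feature's histogram per site, stored as (bin_low, bin_high, count) triples.
Bin = Tuple[float, float, int]


def to_percent(histogram: List[Bin]) -> List[float]:
    """Convert raw bin counts into percentages of the site's total count."""
    total = sum(count for _, _, count in histogram)
    if total == 0:
        # An empty histogram has no distribution to normalize; report zeros.
        return [0.0 for _ in histogram]
    return [100.0 * count / total for _, _, count in histogram]


def site_columns(site_histograms: Dict[str, List[Bin]], display_format: str = "sample_count") -> Dict[str, List[float]]:
    """Return one column of bin values per site, as raw counts or as percentages."""
    columns: Dict[str, List[float]] = {}
    for site, histogram in site_histograms.items():
        if display_format == "percent":
            columns[site] = to_percent(histogram)
        else:
            # Any other value is treated as raw sample counts in this sketch.
            columns[site] = [float(count) for _, _, count in histogram]
    return columns


# Two sites with the same age distribution but very different sample sizes.
age_histograms = {
    "site-1": [(0, 20, 10), (20, 40, 30), (40, 60, 10)],
    "site-2": [(0, 20, 200), (20, 40, 600), (40, 60, 200)],
}

print(site_columns(age_histograms))
# {'site-1': [10.0, 30.0, 10.0], 'site-2': [200.0, 600.0, 200.0]}
print(site_columns(age_histograms, display_format="percent"))
# {'site-1': [20.0, 60.0, 20.0], 'site-2': [20.0, 60.0, 20.0]}
```

With raw counts, site-2's histogram is twenty times taller than site-1's on a shared axis even though the two distributions are identical; after normalization the columns match, which is what makes a cross-site comparison readable.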