-
Notifications
You must be signed in to change notification settings - Fork 185
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add tabular stats content for DLI (#3219)
Add tabular stats content. ### Description Add tabular stats content since it was missed in the copy with the image stats. ### Types of changes <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated.
- Loading branch information
Showing
11 changed files
with
755 additions
and
9 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file added
BIN
+137 KB
...atistics/federated_statistics_with_tabular_data/code/df_stats/demo/df_stats.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+234 KB
...tistics/federated_statistics_with_tabular_data/code/df_stats/demo/hist_plot.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+213 KB
...atistics/federated_statistics_with_tabular_data/code/df_stats/demo/stats_df.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
293 changes: 293 additions & 0 deletions
293
..._statistics/federated_statistics_with_tabular_data/code/df_stats/demo/visualization.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,293 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "7c4be7b0", | ||
"metadata": {}, | ||
"source": [ | ||
"# NVFlare Federated Statistics Visualization" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "987e6028", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Dependencies\n", | ||
"\n", | ||
"To run the examples, you will need to install the following dependencies:\n", | ||
"* numpy\n", | ||
"* pandas\n", | ||
"* wget\n", | ||
"* matplotlib\n", | ||
"* jupyter\n", | ||
"* notebook\n", | ||
"\n", | ||
"These are captured in [requirements.txt](../../requirements.txt)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "665dc17e", | ||
"metadata": {}, | ||
"source": [ | ||
"## Tabular Data Statistics Visualization\n", | ||
"In this example, we demonstate how to visualize the results from the statistics of tabular data. The visualization requires json, pandas, matplotlib modules as well as nvflare visualization utlities. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c44a0217", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import json\n", | ||
"import pandas as pd\n", | ||
"\n", | ||
"from nvflare.app_opt.statistics.visualization.statistics_visualization import Visualization" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "30c79d1a", | ||
"metadata": {}, | ||
"source": [ | ||
"First, copy the resulting json file to demo directory. In this example, the resulting file is called `adults_stats.json`. Then load json file:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "44f6bed2", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"with open('adults_stats.json', 'r') as f:\n", | ||
" data = json.load(f)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c5cdbcc0", | ||
"metadata": {}, | ||
"source": [ | ||
"Initialize the Visualization utilities:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "93c62d5e", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"vis = Visualization()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "1b0f21fd", | ||
"metadata": {}, | ||
"source": [ | ||
"### Overall Statistics" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b49588c2", | ||
"metadata": {}, | ||
"source": [ | ||
"vis.show_statis() will show the statistics for each features, at each site for each dataset:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "ab771712", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"vis.show_stats(data = data)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4986dd14", | ||
"metadata": {}, | ||
"source": [ | ||
"### Select features statistics using white_list_features \n", | ||
"You can optionally select to show only specified features via the white_list_features argument. In the following, only three features are selected instead of all the features:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "563a8bb7", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"vis.show_stats(data = data, white_list_features= ['Age', 'fnlwgt', 'Hours per week'])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "95e42829", | ||
"metadata": {}, | ||
"source": [ | ||
"### Histogram Visualization\n", | ||
"You can use `vis.show_histograms()` to visualize the histogram. Before doing that, you can set some iPython display settings to make the graph display in a full cell. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "fcdfb197", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from IPython.display import display, HTML\n", | ||
"display(HTML(\"<style>.container { width:100% depth:100% !important; }</style>\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3e86860e", | ||
"metadata": {}, | ||
"source": [ | ||
"The following command displays histograms for numeric features. The result shows both the main plot and sub-plots:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f3dd3821", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"vis.show_histograms(data = data)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "49b74579", | ||
"metadata": {}, | ||
"source": [ | ||
"# Display Options\n", | ||
"Similar to other statistics, you can use white_list_features to select features to display on histograms. you can also use display_format=\"percent\" to allow all dataset and sites to be displayed in the same scale. You can set \n", | ||
"\n", | ||
"* display_format: \"percent\" or \"sample_count\"\n", | ||
"* white_list_features: feature names\n", | ||
"* plot_type : \"both\" or \"main\" or \"subplot\"\n", | ||
"\n", | ||
"#### Show percent display format with selected features\n", | ||
"In the following, only the feature \"Age\" is displayed, in \"percent\" display_format, with \"both\" as the plot_type (since that is the default setting)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "e07b9266", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"vis.show_histograms(data = data, display_format = \"percent\", white_list_features= ['Age'])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "cddf21af", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Display main plot_type with selected features\n", | ||
"In this example, two features are displayed in \"sample_counts\" display_format, with \"main\" plot_type" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "038e238e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"vis.show_histograms(data, \"sample_counts\", ['Age', 'Hours per week' ], plot_type=\"main\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b06195ac", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Selected features with subplot plot_type\n", | ||
"In next example, one feature is displayed in \"sample_counts\" display_format, with \"subplot\" plot_type" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8958e124", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"vis.show_histograms(data, \"sample_counts\", ['Age', 'Hours per week' ], plot_type=\"subplot\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "2f330eb6", | ||
"metadata": {}, | ||
"source": [ | ||
"### Tip: Avoid repeated calculation\n", | ||
"If you intend to plot the histogram main plot and subplot separately, repeatedly calling `show_histograms()` with different plot_types is not efficicent, as it repeatedly calculates the same set of Dataframes. To do it efficiently, you can use the following functions instead of `show_histograms()` to avoid the duplicated calculations. If you intend to show both plots, then `show_histograms()` should be used." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "395315a4", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"feature_dfs = vis.get_histogram_dataframes(data, display_format=\"percent\")\n", | ||
" \n", | ||
"vis.show_dataframe_plots(feature_dfs, plot_type=\"main\")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "nvflare_example", | ||
"language": "python", | ||
"name": "nvflare_example" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.2" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Oops, something went wrong.