From 59bc72549f5cbddaa0f3d2852e232bbff7c24562 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Thu, 13 Jun 2024 05:03:04 -0400 Subject: [PATCH 1/4] docs: import multi-backend documentation Signed-off-by: Jinzhe Zeng --- doc/freeze/freeze.md | 4 +++- doc/model/sel.md | 18 +++++++++++++++++- doc/test/model-deviation.md | 2 +- doc/third-party/gromacs.md | 2 +- doc/third-party/lammps-command.md | 1 + doc/train/training-advanced.md | 2 +- doc/train/training.md | 18 +++++++++++++++++- doc/troubleshooting/howtoset_num_nodes.md | 21 ++++++++++++++++++++- 8 files changed, 61 insertions(+), 7 deletions(-) diff --git a/doc/freeze/freeze.md b/doc/freeze/freeze.md index 5bd63a4840..c3800917a6 100644 --- a/doc/freeze/freeze.md +++ b/doc/freeze/freeze.md @@ -1,6 +1,7 @@ # Freeze a model -The trained neural network is extracted from a checkpoint and dumped into a protobuf(.pb) file. This process is called "freezing" a model. The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc). To freeze a model, typically one does +The trained neural network is extracted from a checkpoint and dumped into a model file. This process is called "freezing" a model. +To freeze a model, typically one does ::::{tab-set} @@ -11,6 +12,7 @@ $ dp freeze -o model.pb ``` in the folder where the model is trained. The output model is called `model.pb`. +The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc). ::: diff --git a/doc/model/sel.md b/doc/model/sel.md index 8455c242a9..4908954618 100644 --- a/doc/model/sel.md +++ b/doc/model/sel.md @@ -6,10 +6,26 @@ All descriptors require to set `sel`, which means the expected maximum number of To determine a proper `sel`, one can calculate the neighbor stat of the training data before training: +::::{tab-set} + +:::{tab-item} TensorFlow {{ tensorflow_icon }} + ```sh -dp neighbor-stat -s data -r 6.0 -t O H +dp --tf neighbor-stat -s data -r 6.0 -t O H ``` +::: + +:::{tab-item} PyTorch {{ pytorch_icon }} + +```sh +dp --pt neighbor-stat -s data -r 6.0 -t O H +``` + +::: + +:::: + where `data` is the directory of data, `6.0` is the cutoff radius, and `O` and `H` is the type map. The program will give the `max_nbor_size`. For example, `max_nbor_size` of the water example is `[38, 72]`, meaning an atom may have 38 O neighbors and 72 H neighbors in the training data. The `sel` should be set to a higher value than that of the training data, considering there may be some extreme geometries during MD simulations. As a result, we set `sel` to `[46, 92]` in the water example. diff --git a/doc/test/model-deviation.md b/doc/test/model-deviation.md index 441d1aabc6..097b987c6c 100644 --- a/doc/test/model-deviation.md +++ b/doc/test/model-deviation.md @@ -59,7 +59,7 @@ One can also use a subcommand to calculate the deviation of predicted forces or dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out ``` -where `-m` specifies graph files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. Here is more information on this sub-command: +where `-m` specifies model files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. 
Here is more information on this sub-command: ```bash usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] diff --git a/doc/third-party/gromacs.md b/doc/third-party/gromacs.md index c9779611e7..32531dcf7b 100644 --- a/doc/third-party/gromacs.md +++ b/doc/third-party/gromacs.md @@ -105,7 +105,7 @@ Then, in your working directories, we have to write `input.json` file: Here is an explanation for these settings: -- `graph_file` : The graph file (with suffix .pb) generated by `dp freeze` command +- `graph_file` : The [model file](../backend.md) generated by `dp freeze` command - `type_file` : File to specify DP atom types (in space-separated format). Here, `type.raw` looks like ``` diff --git a/doc/third-party/lammps-command.md b/doc/third-party/lammps-command.md index 63f9d8e3bd..89c89b24fe 100644 --- a/doc/third-party/lammps-command.md +++ b/doc/third-party/lammps-command.md @@ -70,6 +70,7 @@ pair_style deepmd models ... keyword value ... pair_style deepmd graph.pb pair_style deepmd graph.pb fparam 1.2 pair_style deepmd graph_0.pb graph_1.pb graph_2.pb out_file md.out out_freq 10 atomic relative 1.0 +pair_style deepmd graph_0.pb graph_1.pth out_file md.out out_freq 100 pair_coeff * * O H pair_style deepmd cp.pb fparam_from_compute TEMP diff --git a/doc/train/training-advanced.md b/doc/train/training-advanced.md index a0f6759256..075556eb16 100644 --- a/doc/train/training-advanced.md +++ b/doc/train/training-advanced.md @@ -170,7 +170,7 @@ One can set other environmental variables: | DP_AUTO_PARALLELIZATION | 0, 1 | 0 | Enable auto parallelization for CPU operators. | | DP_JIT | 0, 1 | 0 | Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. | -## Adjust `sel` of a frozen model +## Adjust `sel` of a frozen model {{ tensorflow_icon }} One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of a existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`. diff --git a/doc/train/training.md b/doc/train/training.md index 5b7bbd32a8..5e8f8db498 100644 --- a/doc/train/training.md +++ b/doc/train/training.md @@ -8,10 +8,26 @@ $ cd $deepmd_source_dir/examples/water/se_e2_a/ After switching to that directory, the training can be invoked by +::::{tab-set} + +:::{tab-item} TensorFlow {{ tensorflow_icon }} + ```bash -$ dp train input.json +$ dp --tf train input.json ``` +::: + +:::{tab-item} PyTorch {{ pytorch_icon }} + +```bash +$ dp --pt train input.json +``` + +::: + +:::: + where `input.json` is the name of the input script. By default, the verbosity level of the DeePMD-kit is `INFO`, one may see a lot of important information on the code and environment showing on the screen. Among them two pieces of information regarding data systems are worth special notice. diff --git a/doc/troubleshooting/howtoset_num_nodes.md b/doc/troubleshooting/howtoset_num_nodes.md index 532fa39e66..d5800d380b 100644 --- a/doc/troubleshooting/howtoset_num_nodes.md +++ b/doc/troubleshooting/howtoset_num_nodes.md @@ -72,13 +72,32 @@ There is no one general parallel configuration that works for all situations, so Here are some empirical examples. 
If you wish to use 3 cores of 2 CPUs on one node, you may set the environmental variables and run DeePMD-kit as follows: +::::{tab-set} + +:::{tab-item} TensorFlow {{ tensorflow_icon }} + +```bash +export OMP_NUM_THREADS=3 +export DP_INTRA_OP_PARALLELISM_THREADS=3 +export DP_INTER_OP_PARALLELISM_THREADS=2 +dp --tf train input.json +``` + +::: + +:::{tab-item} PyTorch {{ pytorch_icon }} + ```bash export OMP_NUM_THREADS=3 export DP_INTRA_OP_PARALLELISM_THREADS=3 export DP_INTER_OP_PARALLELISM_THREADS=2 -dp train input.json +dp --pt train input.json ``` +::: + +:::: + For a node with 128 cores, it is recommended to start with the following variables: ```bash From 1c5c114873819f76b5437d413e5f3ae595b7bd92 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Thu, 13 Jun 2024 06:35:28 -0400 Subject: [PATCH 2/4] Apply suggestions from code review Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Jinzhe Zeng --- doc/test/model-deviation.md | 2 +- doc/train/training-advanced.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/test/model-deviation.md b/doc/test/model-deviation.md index 097b987c6c..98c0edf227 100644 --- a/doc/test/model-deviation.md +++ b/doc/test/model-deviation.md @@ -59,7 +59,7 @@ One can also use a subcommand to calculate the deviation of predicted forces or dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out ``` -where `-m` specifies model files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. Here is more information on this sub-command: +where `-m` specifies model files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results are dumped. Here is more information on this sub-command: ```bash usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] diff --git a/doc/train/training-advanced.md b/doc/train/training-advanced.md index 075556eb16..5051d981e8 100644 --- a/doc/train/training-advanced.md +++ b/doc/train/training-advanced.md @@ -172,7 +172,7 @@ One can set other environmental variables: ## Adjust `sel` of a frozen model {{ tensorflow_icon }} -One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of a existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`. +One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of an existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`. 
```json "model": { From e935163563f53670e7641caebfc752373ee29756 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Thu, 13 Jun 2024 06:40:01 -0400 Subject: [PATCH 3/4] Update train-fitting-dos.md Signed-off-by: Jinzhe Zeng --- doc/model/train-fitting-dos.md | 45 ++++++++++++++++++++++++++++++---- 1 file changed, 40 insertions(+), 5 deletions(-) diff --git a/doc/model/train-fitting-dos.md b/doc/model/train-fitting-dos.md index 7b68525a45..60261a5871 100644 --- a/doc/model/train-fitting-dos.md +++ b/doc/model/train-fitting-dos.md @@ -1,7 +1,7 @@ -# Fit electronic density of states (DOS) {{ tensorflow_icon }} +# Fit electronic density of states (DOS) {{ tensorflow_icon }} {{ pytorch_icon }} {{ dpmodel_icon }} :::{note} -**Supported backends**: TensorFlow {{ tensorflow_icon }} +**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, DP {{ dpmodel_icon }} ::: Here we present an API to DeepDOS model, which can be used to fit electronic density of state (DOS) (which is a vector). @@ -82,10 +82,26 @@ To prepare the data, we recommend shifting the DOS data by the Fermi level. The training command is the same as `ener` mode, i.e. +::::{tab-set} + +:::{tab-item} TensorFlow {{ tensorflow_icon }} + +```bash +dp --tf train input.json +``` + +::: + +:::{tab-item} PyTorch {{ pytorch_icon }} + ```bash -dp train input.json +dp --pt train input.json ``` +::: + +:::: + The detailed loss can be found in `lcurve.out`: ``` @@ -117,13 +133,32 @@ The detailed loss can be found in `lcurve.out`: In this earlier version, we can use `dp test` to infer the electronic density of state for given frames. +::::{tab-set} + +:::{tab-item} TensorFlow {{ tensorflow_icon }} + +```bash + +dp --tf freeze -o frozen_model.pb + +dp --tf test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100 +``` + +::: + +:::{tab-item} PyTorch {{ pytorch_icon }} + ```bash -$DP freeze -o frozen_model.pb +dp --pt freeze -o frozen_model.pth -$DP test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100 +dp --pt test -m frozen_model.pth -s ../data/111/$k -d ${output_prefix} -a -n 100 ``` +::: + +:::: + if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame is output in the working directory ``` From 6ca1ce295d1c2c21f55a3cb4c26d3691667d6a20 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Thu, 13 Jun 2024 12:32:26 -0400 Subject: [PATCH 4/4] Update doc/model/train-fitting-dos.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Jinzhe Zeng --- doc/model/train-fitting-dos.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/model/train-fitting-dos.md b/doc/model/train-fitting-dos.md index 60261a5871..4c4366a1e1 100644 --- a/doc/model/train-fitting-dos.md +++ b/doc/model/train-fitting-dos.md @@ -159,7 +159,7 @@ dp --pt test -m frozen_model.pth -s ../data/111/$k -d ${output_prefix} -a -n 100 :::: -if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame is output in the working directory +if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame are output in the working directory ``` ${output_prefix}.ados.out.0 ${output_prefix}.ados.out.1 ${output_prefix}.ados.out.2 ${output_prefix}.ados.out.3
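For reference, here is a minimal end-to-end sketch for the PyTorch backend that strings together the commands documented in the hunks above; the data directory `validation_data` and the output prefix `dos_pred` are placeholder names introduced only for illustration, not paths from the original examples:

```bash
# Minimal sketch, assuming a prepared input.json and a data directory named
# validation_data (both placeholder names, not taken from the original docs).
dp --pt train input.json            # train with the PyTorch backend
dp --pt freeze -o frozen_model.pth  # freeze the checkpoint into a .pth model file
# test the frozen model as in the example above; -d sets the output prefix and
# -a additionally dumps the per-atom (atomic) DOS predictions
dp --pt test -m frozen_model.pth -s validation_data -d dos_pred -a -n 100
```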