This repository is the code implementation of the paper DeepPhysiNet: Bridging Deep Learning and Atmospheric Physics for Accurate and Continuous Weather Modeling.
The current branch has been tested under PyTorch 2.x and CUDA 12.1, supports Python 3.7+, and is compatible with most CUDA versions.
- Download and make label dataset
- Parameter details in config file
- Model inference at grid and station level
- More comprehensive validation and code testing
- Introduction
- TODO
- Table of Contents
- Installation
- Dataset Preparation
- Model Training
- Model Inference
- Citation
- License
- Contact
- Linux or Windows
- GDAL 3.0 or higher, recommended 3.6.2
- Python 3.7+, recommended 3.10
- PyTorch 2.0 or higher, recommended 2.1
- CUDA 11.7 or higher, recommended 12.1
- cfgrib 0.9 or higher, recommended 0.9.10
- xarray 2023.12.0
- netCDF4 1.6.5 or higher, recommended 1.6.5
- MetPy 1.0 or higher, recommended 1.6
We recommend using Miniconda for installation. The following commands will create a virtual environment named DeepPhysiNet
and install PyTorch, GDAL, and the other required libraries.
Note: If you are already familiar with Conda, PyTorch, and GDAL and have them installed, you can skip to the next section. Otherwise, follow the steps below to prepare the environment.
Step 0: Install Miniconda.
Step 1: Create a virtual environment named DeepPhysiNet
and activate it.
conda create -n DeepPhysiNet python=3.10 -y
conda activate DeepPhysiNet
Step 2: Install GDAL:
conda install gdal==3.6.2
Step 3: Install PyTorch.
Linux:
pip install torch torchvision torchaudio
Windows:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Step 4: Install MMCV.
pip install -U openmim
mim install "mmcv>=2.0.0"
Step 5: Install cfgrib.
conda install -c conda-forge cfgrib
Step 6: Install other dependencies.
pip install -r requirement.txt
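Optionally, you can check that the environment is set up correctly. The snippet below is not part of the repository; it simply imports the key dependencies and prints their versions.

```python
# Optional sanity check of the installed dependencies (not part of the repository).
import torch
from osgeo import gdal
import xarray as xr
import cfgrib
import netCDF4
import metpy

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("GDAL:", gdal.__version__)
print("xarray:", xr.__version__)
print("cfgrib:", cfgrib.__version__)
print("netCDF4:", netCDF4.__version__)
print("MetPy:", metpy.__version__)
```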
The IFS data we use for training is collected from TIGGE and is in GRIB format. The variables used for training are listed as follows:
The reanalysis data we use as ground truth comes from ERA5.
Here we give detailed instructions for preparing the training dataset, including data downloading, conversion, and variable extraction. If you want to use your own dataset, just make sure its organization meets the following requirements:
- Each variable listed in the above table is saved as a separate TIFF image file.
- The number of pressure levels equals the number of image channels (see the sketch after this list).
- Training data are organized by year.
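As a reference for this layout, a multi-level variable can be stored as one multi-band GeoTIFF with GDAL, one band per pressure level. The sketch below only illustrates the expected organization; the file name and level list are placeholders, not the repository's conventions.

```python
# Sketch: write one variable as a multi-band GeoTIFF, one band per pressure level.
# The file name and level list are illustrative placeholders.
import numpy as np
from osgeo import gdal

levels = [1000, 925, 850, 700, 500, 300, 200]                     # example pressure levels (hPa)
data = np.random.rand(len(levels), 181, 360).astype(np.float32)   # (level, lat, lon)

driver = gdal.GetDriverByName("GTiff")
ds = driver.Create("t_2021070100.tif", data.shape[2], data.shape[1],
                   len(levels), gdal.GDT_Float32)
for i, level in enumerate(levels):
    band = ds.GetRasterBand(i + 1)        # GDAL band indices are 1-based
    band.WriteArray(data[i])
    band.SetDescription(f"{level} hPa")   # label each channel with its level
ds.FlushCache()
ds = None                                 # close and flush the file
```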
Input data from TIGGE
Step 1: Download
You can follow the instructions from TIGGE.
Alternatively, you can access the data directly through the following addresses:
Please select the same study area as used in the paper (the boundary values are given in the paper).
The downloaded data are in GRIB format. Save the surface-level and pressure-level datasets to separate directories, as in the example below.
${GRIB_DATASET_ROOT} # Dataset root directory, for example: /home/username/data/TIGGE
├── pressure
│ ├── pressure_202107.grib
│ ├── pressure_202108.grib
│ └── pressure_202109.grib
└── surface
├── surface_202107.grib
├── surface_202108.grib
└── surface_202109.grib
Step 2: Data Conversion and Extraction
For convenience, convert the downloaded GRIB files to NetCDF (NC) format.
python tools/cvt_grib_to_nc.py --data_path $GRIB_DATASET_ROOT/pressure --result_path $NC_DATASET_ROOT --pressure --num_threads 0
python tools/cvt_grib_to_nc.py --data_path $GRIB_DATASET_ROOT/surface --result_path $NC_DATASET_ROOT --num_threads 0
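Under the hood, this kind of conversion can be done with xarray and the cfgrib engine. The following is only a minimal sketch (file names and the level-type filter are illustrative), not the exact logic of cvt_grib_to_nc.py.

```python
# Sketch: convert a GRIB file to NetCDF with xarray + cfgrib.
# The file names and the filter on pressure-level fields are illustrative.
import xarray as xr

ds = xr.open_dataset(
    "pressure_202107.grib",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "isobaricInhPa"}},
)
ds.to_netcdf("pressure_202107.nc")
```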
Then extract the variables for training and inference.
python tools/extract_variable_from_nc.py --data_path $NC_DATASET_ROOT --result_path $Variable_ROOT --pressure --num_threads 0
python tools/extract_variable_from_nc.py --data_path $NC_DATASET_ROOT --result_path $Variable_ROOT --num_threads 0
Step 3: Calculate Extra Variables and Statistics
The input variable of air density needs to be calculated:
python tools/calc_rho.py --data_path $Variable_ROOT --num_threads 0
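For reference, air density follows from the ideal gas law, ρ = p / (R_d · T_v), with T_v the virtual temperature. The sketch below uses MetPy to illustrate the calculation; it is not the exact implementation of calc_rho.py.

```python
# Sketch: derive air density from pressure, temperature and specific humidity
# via the ideal gas law (rho = p / (R_d * T_v)). Values are illustrative.
import metpy.calc as mpcalc
from metpy.units import units

pressure = 850 * units.hPa
temperature = 280 * units.kelvin
specific_humidity = 0.005 * units("kg/kg")

mixing_ratio = mpcalc.mixing_ratio_from_specific_humidity(specific_humidity)
rho = mpcalc.density(pressure, temperature, mixing_ratio)
print(rho.to("kg/m^3"))
```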
Now we have collected all the variables required for training. Before training, however, we still need to calculate each variable's mean and standard deviation for normalization.
python tools/calc_mean_std.py --data_path $Variable_ROOT --result_path $Variable_ROOT --num_threads 0
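The normalization itself is the usual z-score transform; here is a minimal sketch with simplified shapes and file handling, shown only to make the statistics step concrete.

```python
# Sketch: z-score normalization of one variable with per-level statistics.
# Array shapes and file handling are simplified for illustration.
import numpy as np

samples = np.random.rand(100, 7, 181, 360)            # (time, level, lat, lon)

mean = samples.mean(axis=(0, 2, 3), keepdims=True)    # one value per level
std = samples.std(axis=(0, 2, 3), keepdims=True)

normalized = (samples - mean) / (std + 1e-8)          # small epsilon avoids division by zero
```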
Also, for convenience, we need to check the input data and generate a temporary index file for training.
python tools/generate_input_map.py --data_path $Variable_ROOT --result_file $Variable_ROOT/input_map.pickle --start_time 2007-01-01-00:00:00 --end_time 2020-12-31-12:00:00
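The generated file is essentially an index of the available samples. The structure below is a hypothetical sketch (file naming, time step, and dictionary layout are assumptions); the actual content written by generate_input_map.py may differ.

```python
# Sketch: build a timestamp -> file list index and save it as a pickle.
# File naming, the 12-hour step and the dictionary layout are assumptions.
import glob
import os
import pickle
from datetime import datetime, timedelta

variable_root = "/path/to/variables"                  # $Variable_ROOT
start, end = datetime(2007, 1, 1, 0), datetime(2020, 12, 31, 12)

input_map = {}
t = start
while t <= end:
    stamp = t.strftime("%Y%m%d%H")
    files = sorted(glob.glob(os.path.join(variable_root, f"*{stamp}*.tif")))
    if files:                                          # keep only time steps with data present
        input_map[stamp] = files
    t += timedelta(hours=12)                           # 00/12 UTC initialization times

with open(os.path.join(variable_root, "input_map.pickle"), "wb") as f:
    pickle.dump(input_map, f)
```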
Other data can be found in auxiliary data.
Labels from ERA5
Step 1: Download
ERA5 data can be downloaded from ERA5.
Please select the NetCDF format and the same study area as in the paper, with the same boundary as above.
Also note that the target resolution should match the setting used in the paper.
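If you download through the CDS API, the request looks roughly like the sketch below. The dataset name is the standard ERA5 pressure-level product; the variables, levels, dates, and area are placeholders that you should replace with the settings used in the paper.

```python
# Sketch: download ERA5 pressure-level data with the CDS API (cdsapi).
# Variables, levels, dates and area are placeholders -- replace them with the
# study-area boundary and resolution from the paper (a "grid" entry sets the resolution).
import cdsapi

c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": ["temperature", "u_component_of_wind", "v_component_of_wind"],
        "pressure_level": ["500", "850", "1000"],
        "year": "2020",
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": ["00:00", "12:00"],
        "area": [90, -180, -90, 180],  # [North, West, South, East]; replace with the paper's boundary
    },
    "era5_pressure_2020.nc",
)
```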
Step 2: Variable Extraction
Use the following commands to extract variables from ERA5 as training labels.
python tools/extract_variable_from_ERA5.py --data_path $ERA_ROOT --result_path $LABEL_ROOT --start_time 2020-01-01-00:00:00 --end_time 2020-12-31-12:00:00 --num_threads 2
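Conceptually, this step selects the label variables and the requested time range from the ERA5 NetCDF files. The xarray sketch below is only an illustration (the file name and the set of short names are assumptions), not the logic of extract_variable_from_ERA5.py.

```python
# Sketch: select label variables and a time range from an ERA5 NetCDF file.
# The file name and variable short names (t2m, u10, v10, msl) are illustrative.
import xarray as xr

ds = xr.open_dataset("era5_surface_2020.nc")
labels = ds[["t2m", "u10", "v10", "msl"]].sel(
    time=slice("2020-01-01T00:00:00", "2020-12-31T12:00:00")
)
labels.to_netcdf("labels_2020.nc")
```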
Step 3: Calculate Extra Variables
As with the input data, the air density needs to be calculated:
python tools/calc_rho.py --data_path $LABEL_ROOT --num_threads 0
We provide the configuration files used in the paper, which can be found in the configs folder.
Below we provide an analysis of some of the main parameters.
Parameter Parsing:
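Since the exact keys depend on the files in the configs folder, the block below is only a hypothetical, annotated example of what such a configuration might contain; the real parameter names may differ.

```python
# Hypothetical, annotated configuration sketch -- the actual keys in configs/ may differ.
data = dict(
    input_root="/path/to/variables",                  # $Variable_ROOT from the steps above
    label_root="/path/to/labels",                     # $LABEL_ROOT
    input_map="/path/to/variables/input_map.pickle",  # index generated earlier
    mean_std="/path/to/variables/mean_std",           # normalization statistics
)
train = dict(
    batch_size=4,          # samples per GPU
    lr=1e-4,               # base learning rate
    max_epochs=100,
    num_workers=8,
)
physics = dict(
    use_physics_loss=True,       # enable physics-informed constraints
    physics_loss_weight=0.1,     # relative weight of the physics loss terms
)
```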
python tools/train.py configs/xxx.py # xxx.py is the configuration file you want to use
TODO
If you use the code or performance benchmarks of this project in your research, please cite it with the BibTeX entry below.
@misc{li2024deepphysinet,
      title={DeepPhysiNet: Bridging Deep Learning and Atmospheric Physics for Accurate and Continuous Weather Modeling},
      author={Wenyuan Li and Zili Liu and Keyan Chen and Hao Chen and Shunlin Liang and Zhengxia Zou and Zhenwei Shi},
      year={2024},
      eprint={2401.04125},
      archivePrefix={arXiv},
      primaryClass={physics.ao-ph}
}
This project is licensed under the Apache 2.0 license.
If you have any other questions or suggestions, please contact Wenyuan Li ([email protected] or [email protected]).