Fix restart and clean up pcgsoi #108
Comments
One GFS-GDAS experiment from Russ.
Without restart: < The total amount of wall time = 2213.345014 With restart:
For this run, using the restart file took an additional 53 sec.
Looking at PCGSOI.f90 a bit more, I realized that the calls to the control2state routines that create sval and sbias at the beginning of the main pcgsoi loop are unnecessary. The control2state operators are linear, so sval and sbias can instead be updated within stpcalc. This also allows us to get rid of the xhat array. The 2 calls to control2state after the end of the loop can be removed as well. While the results should be the same, there are some small round-off differences. These changes result in a significant improvement in the wall time. At the end of the first outer iteration (50 inner iterations) this is the comparison of the results.
At the end of the second outer iteration (150 iterations) the results are still very close.
The wall time and some of the other measures are nicely reduced.
< The total amount of wall time = 4424.262244

Note that both of these runs wrote a restart file. I would expect the wall times to be reduced by the same amount for both runs if the write was removed. Both runs were made on Orion. Russ made a test on the operational machine (Venus) without writing a restart file. Results from Russ below:

The suggested test has been run on Venus for 2021020912. The gsi was run in the proposed operational v16 configuration: 1000 pe, 250 nodes, ptile=4, threads=7. The restart global_gsi.x was run with iguess = -1 (no restart). The v16 gfs gsi runs 2 outer loops, 50 & 100 iterations. Here are the wall times for the gfs gsi.
The wall time reduction is impressive. Here is the final "cost,grad" printout from the two runs:

master: cost,grad,step,b,step? = 2 100 1.218883268288697116E+06 1.066994509267928493E+01 2.022357024387213986E+00 8.657558371966026511E-01 good

These terms are very similar after 150 total iterations.

The same cycle was repeated but using gdas dumps and the gdas configuration (50, 150 iterations). Here are the results:
master: cost,grad,step,b,step? = 2 150 1.452479838349013124E+06 1.828125936910788285E+00 1.132785455596231117E+00 9.418740811004355784E-01 good

Another impressive reduction in wall time with very similar fort.220 numbers on the last iteration.

To put these numbers in perspective, below are the min,max range of 12Z gfs and gdas analysis gsi wall times from v15 operations over the first 9 days of February 2021.
The restart branch gets the v16 gdas analysis wall time within 5 minutes of v15. The v15 gfs analysis runs 50, 150 iterations, hence the longer v15 wall time with respect to v16. If we were to run the v16 gfs analysis with 50, 150 iterations, we'd likely see a similar result for the restart branch, within 5 minutes of v15.
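The linearity argument used above to remove the control2state calls from the main pcgsoi loop can be illustrated with a small sketch. This is not GSI code: `control2state` here is a hypothetical toy linear map, and the step sizes and directions are made-up values. It assumes the state-space search direction is already available when the step is taken, so the state increment can be accumulated rather than recomputed from xhat.

```python
# Toy illustration (hypothetical, not GSI code): for a linear operator C,
# C(xhat + alpha*d) == C(xhat) + alpha*C(d), so the state-space increment
# can be accumulated step by step instead of being recomputed from xhat.

def control2state(x):
    """Hypothetical linear control-to-state map (mixes neighbouring values)."""
    n = len(x)
    return [2.0 * x[i] + 0.5 * x[(i + 1) % n] for i in range(n)]

def axpy(alpha, d, x):
    """Return x + alpha * d, element by element."""
    return [xi + alpha * di for xi, di in zip(x, d)]

def run(steps, directions, n):
    """Accumulate xhat and sval side by side over a few made-up iterations."""
    xhat = [0.0] * n
    sval = control2state(xhat)          # all zeros at the start
    for alpha, d in zip(steps, directions):
        sdir = control2state(d)         # state-space search direction
        xhat = axpy(alpha, d, xhat)     # control-space update
        sval = axpy(alpha, sdir, sval)  # incremental state-space update
    return xhat, sval

steps = [0.7, 1.3, 0.2]
directions = [
    [1.0, -2.0, 0.5, 3.0],
    [0.25, 0.0, -1.0, 2.0],
    [4.0, 1.0, 1.0, -0.5],
]
xhat, sval_incremental = run(steps, directions, 4)

# Recomputing from xhat gives the same answer up to round-off.
sval_recomputed = control2state(xhat)
max_diff = max(abs(a - b) for a, b in zip(sval_incremental, sval_recomputed))
print(max_diff)
```

The two results agree only to round-off, which is consistent with the small differences reported in the comparison runs.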
Additional changes were made to optimize the reading of the ensemble. Several changes were made to reduce unnecessary data motion. Also, it was noted that the pole points were determined only after reading in the fields, distributing them to the processors, converting to real, and then calculating pole point values. This can be done more efficiently by creating the pole point values right after the fields are read in. The calculation is done on the single-precision numbers as read in (using double-precision arithmetic). Small differences are noted in the pole points (first line north pole, second south pole), e.g.,

208.156585693359 208.156583815813 1.512769813416526E-006 1.512769825721497E-006 4.136569131674150E-008 4.136569120571920E-008

U,V (U north pole, V north pole, U south pole, V south pole)

-75.6385803222656 8.52435111999512 -5.14269161224365 5.76587772369385

Comparisons are for north and south pole and different variables.

Also, I received a message from Manuel Pondeca saying that the Gram-Schmidt code was no longer necessary. It has been removed from the branch.
WCOSS_D Regression Tests

Ran the authoritative master at 9c1fc15 and forked restart at 5618311 on Mars. The following tests passed.
The remaining tests returned Failed. Examination of the failed cases shows that, for each, the master and restart global_gsi.x compute identical total cost functions. Identical refers to the cost function value printed in
While the final total penalties for global_4denvar_T126 remain identical to 19 printed digits, the gradient values differ as shown below
As John noted, differences are expected and are relatively small.
GitHub Issue #108. Fix restart and clean up pcgsoi.
This update is to fix the restart process, optimize it and clean up pcgsoi.
With the inclusion of ensembles, the restart option was broken. It was relatively straightforward to get it to work; however, the cost of the straightforward fix was too high. Both the addition to the run time and the size of the restart file (gesfile) were very large.
I realized that the equality By = x is maintained by the GSI. For that reason, it is only necessary to save yhat and not xhat. In the previous code, both xhat and yhat were saved and distributed. By saving only yhat, the file size is cut in half, and so is the amount of data distributed to the various processors.
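A minimal sketch of why saving yhat alone is enough, assuming a hypothetical stand-in for B (a small symmetric tridiagonal operator) and made-up step sizes and directions. None of the function names below are GSI routines. As long as every update keeps B yhat = xhat, xhat can be rebuilt from the restart record with one application of B.

```python
# Toy illustration (hypothetical, not GSI code): if the iteration keeps
# B*yhat == xhat, a restart file only needs yhat; xhat is rebuilt on read.

def apply_B(y):
    """Hypothetical stand-in for B: a symmetric tridiagonal operator."""
    n = len(y)
    out = []
    for i in range(n):
        v = 2.0 * y[i]
        if i > 0:
            v += 0.5 * y[i - 1]
        if i < n - 1:
            v += 0.5 * y[i + 1]
        out.append(v)
    return out

def write_restart(yhat):
    """Store only yhat (half the data of storing both xhat and yhat)."""
    return list(yhat)

def read_restart(record):
    """Recover yhat from the record and rebuild xhat = B * yhat."""
    yhat = list(record)
    return apply_B(yhat), yhat

# A few made-up updates that keep the pair consistent: dir_x = B * dir_y.
n = 5
xhat = [0.0] * n
yhat = [0.0] * n
updates = [(0.5, [1.0, 0.0, -1.0, 2.0, 0.0]), (1.5, [0.0, 3.0, 1.0, 0.0, -2.0])]
for alpha, dir_y in updates:
    dir_x = apply_B(dir_y)
    yhat = [y + alpha * d for y, d in zip(yhat, dir_y)]
    xhat = [x + alpha * d for x, d in zip(xhat, dir_x)]

record = write_restart(yhat)
xhat_restored, yhat_restored = read_restart(record)
max_diff = max(abs(a - b) for a, b in zip(xhat, xhat_restored))
print(len(record), max_diff)
```

The trade-off in this sketch is one extra application of B at read time in exchange for halving what is written and distributed.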
Also, the bias correction coefficients are not saved. In the gfs to gdas process, the radiance bias correction does not have the same number of values because of the monitored data, and the aircraft bias correction may not be in the same order or have the same number of values due to the later cut-off of the gdas. Note that it is easy to turn on the radiance bias correction coefficients by setting writebias = .true. in jfunc.f90. If the number of bias correction coefficients differs between the write and the read, the bias correction coefficient guess will be turned off.
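The size check described above can be sketched as follows. The function name and the zero fallback are assumptions made for illustration only; the actual handling in the branch may differ. The sketch only shows the rule that a count mismatch between write and read disables the bias coefficient guess.

```python
# Toy illustration (hypothetical names, not GSI code): use the saved bias
# correction coefficients as a guess only if their count matches what the
# current run expects; otherwise turn the guess off.

def bias_guess_from_restart(saved_coeffs, n_expected):
    """Return (use_guess, coeffs).

    saved_coeffs: list read from the restart file, or None if not written.
    n_expected:   number of coefficients in the current run.
    Falling back to zeros when the guess is off is an assumption made here.
    """
    if saved_coeffs is None or len(saved_coeffs) != n_expected:
        return False, [0.0] * n_expected
    return True, list(saved_coeffs)

# Same count on write and read: the guess is used.
use_same, coeffs_same = bias_guess_from_restart([0.1, 0.2, 0.3], 3)

# Different count (e.g. extra monitored data in the later run): guess is off.
use_diff, coeffs_diff = bias_guess_from_restart([0.1, 0.2], 3)

print(use_same, use_diff)
```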
In pcgsoi.f90, the restart guess file is used to define a search direction for the first iteration. When no bias correction coefficients are used in the guess, the original grad values are used. It is not yet clear how best to use this option. It may be best to add an additional outer iteration, with the first outer iteration being just a few iterations to update the guess and use it for the quality control. Or it may be best just to use the guess to reduce the number of iterations necessary.
The code was also updated to remove much of the code that was duplicated for running with or without preconditioning, based on the diag_precon flag. The results should be the same (within round-off) if diag_precon is false and the diagonal preconditioning is equal to a constant. Note that changes to this constant will only result in a commensurate change to the stepsize. The default when diag_precon is false is to use the step_start value. By choosing the step_start value well, the stepsizes should be kept close to 1.
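The effect of a constant preconditioner on the stepsize can be checked on a toy quadratic cost function. This is only a sketch of a single steepest-descent step with an exact line search, not the pcgsoi algorithm; the matrix, the right-hand side, and the constant `c` (standing in for a step_start-like value) are made up.

```python
# Toy illustration (not GSI code): with a constant preconditioner c, the
# search direction scales by c and the exact stepsize scales by 1/c, so the
# update to x is unchanged.

def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def first_step(A, b, x, c):
    """One preconditioned steepest-descent step on J = 0.5 x'Ax - b'x."""
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient A x - b
    d = [-c * gi for gi in g]                          # constant preconditioning
    alpha = -dot(g, d) / dot(d, matvec(A, d))          # exact line search
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    return alpha, x_new

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x0 = [0.0, 0.0]

alpha1, x1 = first_step(A, b, x0, 1.0)
alpha4, x4 = first_step(A, b, x0, 4.0)
print(alpha1, alpha4, x1, x4)
```

In this sketch, quadrupling the constant divides the stepsize by four and leaves the new iterate unchanged, which is why the constant can be tuned to keep stepsizes near 1 without changing the result.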
Changes are put into branch restart on jderber's fork.