This repository contains code, data, and figures that support:
Eskew, E.A., E. Clancey, D. Singh, S. Situma, L. Nyakarahuka, M. K. Njenga, and S. L. Nuismer. Projecting climate change impacts on inter-epidemic risk of Rift Valley fever across East Africa.
Models of inter-epidemic Rift Valley fever (RVF) relied on a suite of spatially-explicit predictor variables. All predictors were ultimately processed to a resolution of 2.5 arcminutes, but here we provide details about the sourcing and native resolution of all predictors:
- Hydrology
  - Lake data from HydroLAKES (shapefile of lakes globally)
  - River data from HydroRIVERS (shapefile of rivers globally)
- Soils
  - Multiple variables from SoilGrids (250 m resolution [~8 arcseconds])
- Topography
  - Elevation data from SRTM (1 arcsecond resolution)
  - Slope was calculated using the elevation data described above
- Disease detection
  - Travel time to healthcare data from Weiss et al. 2020, Nature Medicine (30 arcsecond resolution)
- Livestock density
  - Cattle, goat, and sheep density data from Gridded Livestock of the World version 4 (5 arcminute resolution)
- Human population density
  - Historical human population data from WorldPop (30 arcsecond resolution)
  - Projected human population data from Wang et al. 2022, Scientific Data (30 arcsecond resolution)
- Precipitation and temperature
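
To make the resolution differences listed above concrete, the following base R sketch compares each native resolution with the 2.5 arcminute target. It is only an illustration: the vector names and the equatorial circumference constant are assumptions introduced here, and the repository's processing scripts may handle resampling differently.

```r
# Target resolution used for all predictors, expressed in arcseconds
target_arcsec <- 2.5 * 60  # 2.5 arcminutes = 150 arcseconds

# Native resolutions in arcseconds, taken from the list above
# (the element names are hypothetical labels, not file names)
native_arcsec <- c(
  SoilGrids   = 8,       # approximate, from 250 m
  SRTM        = 1,
  travel_time = 30,
  livestock   = 5 * 60,  # 5 arcminutes
  WorldPop    = 30
)

# Factor > 1: native cells are finer than the target and must be aggregated
# Factor < 1: native cells are coarser and must be split (disaggregated)
scale_factor <- target_arcsec / native_arcsec
print(scale_factor)

# Rough check of the "250 m is about 8 arcseconds" conversion at the equator,
# assuming an equatorial circumference of about 40,075 km
metres_per_arcsec <- 40075017 / (360 * 3600)
arcsec_250m <- 250 / metres_per_arcsec
print(round(arcsec_250m, 1))
```

Note that the SoilGrids factor is not a whole number, so that layer would need resampling rather than simple block aggregation.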

To help explain the project scripts, the overall workflow is as follows:
- `get_SoilGrids_data.R` programmatically downloads the soil predictor data. All other predictor data were manually downloaded from the online resources described above.
- `process_all_predictors.R` processes all predictor data into rasters of 2.5 arcminute resolution. This script calls the various `process_*_data.R` scripts that each handle a certain type of predictor data. Note that these scripts do need to be called in the order prescribed by `process_all_predictors.R` so that intermediate files are available, as needed.
- `generate_predictor_flat_files.R` takes the 2.5 arcminute raster predictor files and generates flat CSV files describing the predictor data for each grid cell across the study region. Predictor data in this format are necessary for downstream modeling. Note that these flat predictor files are generated for both historical and future climate conditions.
- `generate_absence_data.R` generates the background (i.e., pseudo-absence) data for use in inter-epidemic RVF modeling. Produces the `data_*_pseudoabsences.csv` files in the `data/outbreak_data` subdirectory.
- `extract_outbreak_absence_predictors.R` uses the predictor flat files to generate a data frame with predictor data for all observed inter-epidemic RVF outbreak events as well as the background points. Produces the `outbreak_*_predictors.csv` files in the `data/outbreak_data` subdirectory.
- `fit_model.R` fits and saves an XGBoost model of the disease outbreak and background data. These objects are saved in the `data/saved_objects` subdirectory.
- `model_postprocessing.R` uses saved XGBoost model objects to generate ROC curve, variable importance, and partial dependence plots. Also calculates the cutoff value that maximizes the true skill statistic (TSS) for use in downstream analyses.
- `generate_prediction_rasters.R` uses saved XGBoost model objects to generate prediction rasters showing the relative likelihood of RVF across the study region. These prediction rasters are generated for all months of the calendar year using predictor data describing historical climate (1970-2000), historical weather (2008-2021), and future climate conditions. Summary data is written to `prediction_raster_summary.csv`.
- `model_validation.R` calculates grid cell-level RVFV force of infection (FOI) and combines these estimates with RVF relative likelihood values from the prediction raster layers to validate our model's predictive ability. Also generates the accompanying figure. Data written to `serology_data_for_validation.csv`.
- `calculate_pop_at_risk.R` combines predicted RVF relative likelihood values from the prediction rasters with estimates of future human population to calculate the future population at risk. Estimates written to `human_pop_at_risk.csv`.
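
As a rough illustration of two steps in the workflow above, the base R sketch below finds the cutoff that maximizes the true skill statistic (TSS = sensitivity + specificity - 1) and then sums population over grid cells at or above that cutoff. All data and object names here are made up for the example, and using the TSS cutoff to define at-risk cells is an assumption, not necessarily what `model_postprocessing.R` or `calculate_pop_at_risk.R` do.

```r
# Toy labels (1 = outbreak, 0 = background point) and model predictions;
# these values are invented for illustration only
observed  <- c(1, 1, 1, 1, 1, 0, 0, 0, 0)
predicted <- c(0.9, 0.8, 0.45, 0.7, 0.6, 0.3, 0.2, 0.5, 0.1)

# TSS at a single cutoff: sensitivity + specificity - 1
tss_at_cutoff <- function(cutoff, observed, predicted) {
  called_positive <- predicted >= cutoff
  sensitivity <- sum(called_positive & observed == 1) / sum(observed == 1)
  specificity <- sum(!called_positive & observed == 0) / sum(observed == 0)
  sensitivity + specificity - 1
}

# Treat every distinct predicted value as a candidate cutoff
candidate_cutoffs <- sort(unique(predicted))
tss_values <- vapply(
  candidate_cutoffs, tss_at_cutoff, numeric(1),
  observed = observed, predicted = predicted
)

# Keep the cutoff with the highest TSS (the first one if there are ties)
best_cutoff <- candidate_cutoffs[which.max(tss_values)]

# Hypothetical grid cells: predicted relative likelihood and projected population
cell_likelihood <- c(0.15, 0.65, 0.72, 0.40, 0.90)
cell_population <- c(1200, 5300, 800, 9100, 2500)

# Assumption: a cell counts as "at risk" when its likelihood meets the cutoff
at_risk <- cell_likelihood >= best_cutoff
pop_at_risk <- sum(cell_population[at_risk])

print(best_cutoff)
print(pop_at_risk)
```

Scanning only the observed prediction values is sufficient because TSS can change only where the cutoff crosses one of them.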