Common Fails and How to Fix Them
Covered in PR https://github.com/JGCRI/gcamdata/pull/390.

The line `mutate(value = value / first(value, order_by = year))` worked fine on BBL's machine (Mac). However, despite passing local Test Package and Check Package, once BBL opened the PR the Travis tests failed: the driver aborted when trying to run the chunk (Mac and Linux). ACS checked out the branch and had the same issue (PC). The problem turned out to be the `order_by = year` argument in `first()`. It was fixed by removing `order_by = year` and adding an `arrange(year)` call before the `mutate` in question. With that change the driver runs on ACS's machine, Test and Check pass fine, and the Travis tests pass.
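A minimal sketch of the fix (the data frame here is made up; the real chunk divides each value by the first-year value):

```r
library(dplyr)

# Made-up data standing in for the real chunk data; note the
# deliberately unsorted years
df <- tibble(year = c(2010, 2005, 2015), value = c(20, 10, 30))

# Portable version of value / first(value, order_by = year):
# sort by year first, then take the plain first element
result <- df %>%
  arrange(year) %>%
  mutate(value = value / first(value))
```

Because the rows are sorted before `first()` is called, the `order_by` argument (which misbehaved on some platforms) is no longer needed.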
The `gcamdata` package needs to know about the relationships of data objects and chunks to each other, so that it can provide tracing and graphing services to the user. But this information is only available after the entire data system is run, which is expensive and slow. As a workaround, the package keeps an internal data object (it's just a data frame) named `GCAM_DATA_MAP` that records all this information. If you make changes to the data system -- to precursors, data flow, output names, etc. -- then what the data system actually produces gets out of sync with `GCAM_DATA_MAP`. In this case, Travis will throw the (relatively informative) error above. All you need to do is `source("data-raw/generate_package_data.R")` and commit the new `GCAM_DATA_MAP` object.
```
Failure: matches old data system output (@test_oldnew.R#XX) -----------------
length(oldf) == 1 isn't true. Either zero, or multiple, comparison datasets found for XXXX.csv
```

Check to make sure you added the file from the old data system into the comparison data folder (`tests/testthat/comparison_data/`).
Check that the `@export` tag is in the file header. If not, add it, then Document, Build and Reload, etc. Alternatively, prefix your chunk name with `gcamdata:::`, e.g. `debug(gcamdata:::my_chunk_name)`.
If you add `FLAG_LONG_YEAR_FORM`, the test code attempts to reshape your data to a 'wide' format, with years in individual columns. To do so, there need to be columns named `year` and `value`.
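A sketch of that long-to-wide reshape, using `tidyr::pivot_wider` (the actual test code may use a different reshaping function; the data here are made up):

```r
library(tidyr)

# Hypothetical long-form chunk output with the required
# `year` and `value` columns
long <- data.frame(region = c("USA", "USA"),
                   year   = c(2005, 2010),
                   value  = c(1.5, 2.0))

# Reshape so each year becomes its own column -- this is the step
# that fails if `year` or `value` is missing
wide <- pivot_wider(long, names_from = year, values_from = value)
```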
When `HISTORICAL_YEARS` and/or `FUTURE_YEARS` are shifted, the tables being joined no longer have matching years. One of the tables is likely unresponsive to the shift or doesn't contain data for the shifted years. You may need to interpolate the data in this table to all years (`approx_fun` will be of use).
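`approx_fun` is a gcamdata helper built around `stats::approx`; a minimal base-R sketch of filling in a shifted year (the years and values here are made up):

```r
# Hypothetical table that only covers the original years
years_have  <- c(2005, 2010, 2015)
values_have <- c(1.0, 2.0, 3.0)

# Years required after the shift; rule = 2 extends the endpoint values
# (constant extrapolation) rather than returning NA outside the range
years_need <- c(2005, 2010, 2015, 2020)
filled <- approx(years_have, values_have, xout = years_need, rule = 2)$y
```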
WARNING: This Fail is no longer relevant since #1073, where we explicitly specify column types in the CSV file. See Input Files for details on how to do this.

This is a failure of `readr::read_csv`. There's some column that looks like an integer for the first 1000+ rows, but then has decimal places. Whatever that column is, change the first row's value to add a ".0". (Hint: you can highlight multiple cells in Excel and click "Increase Decimal," which is handy if there are many of them.) This will force `read_csv` to see it as a float, not an integer (it scans only the first 1000 lines to determine column types). Alternatively, re-sort the file in some way so that a decimal value in that column is within the first 1000 rows. More info if you're interested: https://github.com/tidyverse/readr/issues/645.
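The approach referenced above is to declare column types explicitly rather than rely on type guessing; a sketch with made-up columns (an inline string stands in for a real input file):

```r
library(readr)

# The `value` column looks integer-like in early rows but must be
# read as a double; declaring col_types avoids the guessing problem
csv_text <- "iso,year,value\nusa,2005,1\nusa,2010,2.5\n"

dat <- read_csv(csv_text,
                col_types = cols(iso   = col_character(),
                                 year  = col_integer(),
                                 value = col_double()))
```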
```
Error in get_data(all_data, "common/iso_GCAM_regID") : Data system: couldn't find common/iso_GCAM_regID
```

This error was thrown due to calling `stop_after_mychunk`. If you want to debug your chunk, you need `stop_before_yourchunk`, so that you ONLY have the chunk inputs; `stop_after_yourchunk` assumes your chunk runs and produces outputs.
You may get this error message in a chunk where you generated a chunk output from the Level 2 output of some other chunk, such as:

```r
L270.CreditInput_elec %>%
  select(-coefficient) ->
  # Since none of the operations made any modifications to the actual data,
  # our new output retains a reference to the original, including its
  # metadata, such as title!
  L2233.DeleteCreditInput_elec
```

To fix this, add `strip_attributes = TRUE` when loading required inputs with `get_data()`:

```r
L270.CreditInput_elec <- get_data(all_data, "L270.CreditInput_elec", strip_attributes = TRUE)
```
Another workaround is to force it to make a copy. The simplest way is to include a call to `mutate` in the pipeline, even if it doesn't actually modify anything:

```r
L270.CreditInput_elec %>%
  # a mutate will force a copy and drop the metadata from
  # L270.CreditInput_elec, which we do not care to bring along
  mutate(sector.name = sector.name) %>%
  select(-coefficient) ->
  L2233.DeleteCreditInput_elec
```
This is most often because (i) a particular dataset is created that doesn't, under timeshift conditions, have all the necessary years, and then later (ii) something tried to `left_join_error_no_match` it. Boom! Fix it at the location of (i), i.e. upstream, by making sure all the necessary years are there (extrapolating if necessary). Here's an example from `module_energy_L232.industry`, which was causing a timeshift failure in `module_water_L232.water.demand.manufacturing` downstream:

```r
# extrapolate to all model years if necessary
complete(nesting(region, GCAM_region_ID), year = MODEL_YEARS) %>%
  arrange(region, GCAM_region_ID, year) %>%
  group_by(region, GCAM_region_ID) %>%
  mutate(calOutputValue = approx_fun(year, calOutputValue, rule = 2)) %>%
  ungroup %>%
```

To run under timeshift conditions, for debugging and to examine intermediate products, uncomment the last lines in `constants.R`.
When modifying `gcamdata` in a way that requires updating or adding a new "Header", or when merging two branches that have each done this, you may run into a "Circular Fail": you need to update the package data to reflect the updated `LEVEL2_DATA_NAMES` in order to run the `driver()` without error, but you cannot `source("data-raw/generate_package_data.R")` because it contains calls to the `driver()` to update `GCAM_DATA_MAP` and `PREBUILT_DATA`. In principle we should be able to decouple updating each of these data objects, eliminating the circular logic. In the meantime, the following steps should successfully work around the problem:

1. Comment out the `driver()` calls that update `GCAM_DATA_MAP` and `PREBUILT_DATA`:
```r
GCAM_DATA_MAP <- driver(return_data_map_only = TRUE)
PREBUILT_DATA <- driver(write_outputs = FALSE,
                        write_xml = FALSE,
                        return_data_names = c(
                            # outputs of module_energy_LA101.en_bal_IEA
                            "L101.en_bal_EJ_R_Si_Fi_Yh_full",
                            "L101.en_bal_EJ_ctry_Si_Fi_Yh_full",
                            "L101.in_EJ_ctry_trn_Fi_Yh",
                            "L101.in_EJ_ctry_bld_Fi_Yh",
                            # output of module_energy_LA111.rsrc_fos_Prod
                            "L111.RsrcCurves_EJ_R_Ffos",
                            # output of module_energy_LA118.hydro
                            "L118.out_EJ_R_elec_hydro_Yfut",
                            # outputs of module_energy_LA121.oil
                            "L121.in_EJ_R_unoil_F_Yh",
                            "L121.in_EJ_R_TPES_crude_Yh",
                            "L121.in_EJ_R_TPES_unoil_Yh"
                        ))
```
2. Instead, set them to the previously calculated values:

```r
GCAM_DATA_MAP <- gcamdata:::GCAM_DATA_MAP
PREBUILT_DATA <- gcamdata:::PREBUILT_DATA
```
3. Run `source("data-raw/generate_package_data.R")`, which will then update `LEVEL2_DATA_NAMES` and leave the others as they were before.
4. Reload / re-install `gcamdata`.
5. Remove the statements from Step 2, un-comment the calls from Step 1, then `source("data-raw/generate_package_data.R")` to fully update all the package data.
6. Don't forget to reload / re-install `gcamdata` once more to get this fully updated package data for when you next run the `driver()`.
We have encountered a situation where the library runs (from within RStudio after opening `gcamdata.Rproj`) and XMLs are built in the `xml` directory, but the XMLs do not reflect the underlying data. This can occur if there is more than one instance of `gcamdata` built on a system: evidently the system can sometimes read data from a previously installed location instead of the location in which the system is run.

To solve this problem, use `devtools::load_all()`, then rebuild and document (under the More menu). Running `driver()` should now result in XMLs that reflect the data in the same directory path as `gcamdata.Rproj`.