
Common Fails and How to Fix Them

enlochner edited this page Sep 1, 2021 · 20 revisions

Not a "Fail" but a problem that keeps you from even getting to the point of being able to fail:

Covered in PR https://github.com/JGCRI/gcamdata/pull/390. The line mutate(value = value / first(value, order_by = year)) worked fine on BBL's machine (Mac). However, although the local Test Package and Check Package both passed, once BBL opened the PR the Travis tests failed (Mac and Linux) because the driver aborted when trying to run the chunk. ACS checked out the branch and had the same issue (PC). The cause turned out to be the order_by = year argument in first(). The fix was to remove order_by = year and add an arrange(year) line before the mutate in question. After that, the driver runs on ACS's machine, Test and Check pass, and the Travis tests pass.
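
The fix described above can be sketched on toy data. This is only an illustration, assuming dplyr is installed; the table and its region/year/value columns are made up and are not the chunk from the PR.

```r
library(dplyr)

# Hypothetical toy data, deliberately out of year order
toy <- tibble(
  region = c("A", "A", "A", "B", "B"),
  year   = c(2010, 2000, 2005, 2005, 2000),
  value  = c(40, 10, 20, 9, 3)
)

indexed <- toy %>%
  # sort first, so that first(value) is the earliest year in each group
  arrange(region, year) %>%
  group_by(region) %>%
  # no order_by argument needed: the rows are already in year order
  mutate(value = value / first(value)) %>%
  ungroup()
```

Each region's values end up indexed to that region's earliest year, without relying on order_by inside first().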

Failed -------------------

GCAM_DATA_MAP precursors doesn't match. Rerun generate_package_data to update.

The gcamdata package needs to know how data objects and chunks relate to each other, so that it can provide tracing and graphing services to the user. But this information is only available after the entire data system is run, which is expensive and slow. As a workaround, the package keeps an internal data object (it's just a data frame) named GCAM_DATA_MAP that records all this information. If you make changes to the data system (to precursors, data flow, output names, etc.), then what the data system actually produces gets out of sync with GCAM_DATA_MAP. In this case, Travis will throw the (relatively informative) error above. All you need to do is source("data-raw/generate_package_data.R") and commit the new GCAM_DATA_MAP object.

Conflicting files: "R/sysdata.rda". Rerun generate_package_data to update.

Failure: matches old data system output (@test_oldnew.R#XX) ----------------- length(oldf) == 1 isn't true. Either zero, or multiple, comparison datasets found for XXXX.csv

Check to make sure you added the file from the old data system into the comparison data folder (tests/testthat/comparison_data/).
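
The length(oldf) == 1 condition can be illustrated with a small sketch. It assumes the test looks for comparison files with something like list.files(); the helper find_comparison() and the file names below are hypothetical and only mimic the idea in a temporary directory.

```r
# Build a throwaway stand-in for tests/testthat/comparison_data/
comparison_dir <- file.path(tempdir(), "comparison_data_demo")
dir.create(file.path(comparison_dir, "energy"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(comparison_dir, "energy", "L101.demo.csv")))

# Hypothetical helper: find every file called <name>.csv below dir
find_comparison <- function(name, dir) {
  list.files(dir,
             pattern = utils::glob2rx(paste0(name, ".csv")),
             recursive = TRUE, full.names = TRUE)
}

found  <- find_comparison("L101.demo", comparison_dir)    # exactly one match
absent <- find_comparison("L999.absent", comparison_dir)  # no match: the failing case
```

Zero matches means the file was never added; more than one match means the same file name exists in several subfolders.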

Unable to Debug after failing tests

Check that the @export tag is in the file header. If it is not, add it, then Document, Build and Reload, etc. Alternatively, prefix your chunk name with gcamdata:::, e.g. debug(gcamdata:::my_chunk_name).

FLAG_LONG_YEAR_FORM specified in XXX but no 'year' and 'value' columns present

If you add FLAG_LONG_YEAR_FORM the test code attempts to reshape your data to a 'wide' format, with years in individual columns. To do so there need to be columns named year and value.
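
A minimal sketch of why both columns are needed, assuming tidyr and tibble are installed. pivot_wider() is used here as a stand-in for whatever reshaping the test code actually performs, and has_long_year_form() is a hypothetical helper, not a gcamdata function.

```r
library(tibble)
library(tidyr)

# Hypothetical long-format output: one row per region and year
long <- tibble(
  region = c("A", "A", "B", "B"),
  year   = c(2000, 2005, 2000, 2005),
  value  = c(1, 2, 3, 4)
)

# Hypothetical check: the reshape below only makes sense with these two columns
has_long_year_form <- function(d) all(c("year", "value") %in% names(d))

# Years become individual columns, filled from the value column
wide <- pivot_wider(long, names_from = year, values_from = value)
```

If your output names these columns differently (say Year or val), rename them before adding the flag.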

Timeshift Error in left_join_error_no_match(...): left_join_no_match: NA values in new data columns

When HISTORICAL_YEARS and/or FUTURE_YEARS are shifted, the tables being joined no longer have matching years. One of the tables is likely unresponsive to the shift or doesn't contain data for the shifted years. You may need to interpolate the data in this table to all years (approx_fun will be of use).
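
To find out which years are missing, it can help to repeat the join with a plain left_join() and look at the rows that come back NA. The sketch below uses made-up tables and column names and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical: years requested after the shift
shifted <- tibble(region = "A", year = c(2000, 2005))

# Hypothetical table that did not respond to the shift
coefs <- tibble(region = "A", year = c(2005, 2010), coef = c(0.5, 0.6))

joined <- left_join(shifted, coefs, by = c("region", "year"))

# Years with no match: these are what left_join_error_no_match complains about
missing_years <- joined %>%
  filter(is.na(coef)) %>%
  pull(year)
```

Here missing_years holds 2000, so the coefficient table is the one that needs to be extended to the shifted years.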

Warning: parsing failures when reading in a file

WARNING: This Fail is no longer relevant since #1073 where we explicitly specify column types in the CSV file. See Input Files for details on how to do this.

This is a failure of readr::read_csv. Some column looks like an integer for the first 1000+ rows, but then has decimal places. Whatever that column is, change the first row's value to add a ".0". (Hint: you can highlight multiple cells in Excel and click "increase decimal," which is handy if there are many of them.) This forces read_csv to see the column as a float, not an integer (it scans only the first 1000 lines to determine column types). Or re-sort the file in some way so that a decimal value in that column falls within the first 1000 rows. More info if you're interested: https://github.com/tidyverse/readr/issues/645.
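
For reference, here is a generic readr illustration of declaring column types up front so that nothing is guessed from the first rows. It assumes readr is installed and uses a throwaway file; it is not gcamdata's own mechanism for specifying types in its input files.

```r
library(readr)

# Throwaway CSV whose value column looks like integers until the last row
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,value", "A,1", "B,2", "C,3.5"), csv_path)

# Declaring the types explicitly means no guessing from the first rows
dat <- read_csv(csv_path,
                col_types = cols(region = col_character(),
                                 value  = col_double()))
```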

Error: get_data in debug mode after running the driver

Error in get_data(all_data, "common/iso_GCAM_regID") : Data system: couldn't find common/iso_GCAM_regID

This error was thrown due to calling stop_after_mychunk. If you want to debug your chunk, you need to stop_before_yourchunk, so that you have ONLY the chunk inputs. stop_after_yourchunk assumes your chunk runs and produces outputs.

Error in add_title(., ...) : Not allowed to overwrite current title ...

You may get this error message in a chunk where you generated a chunk output from the Level 2 output of some other chunk, such as:

L270.CreditInput_elec %>%
  select(-coefficient) ->
  # Since none of the operations made any modifications to the actual data,
  # our new output retains a reference to the original, including its
  # metadata such as title!
  L2233.DeleteCreditInput_elec

To fix this, add strip_attributes = TRUE when loading required inputs with get_data():

L270.CreditInput_elec <- get_data(all_data, "L270.CreditInput_elec", strip_attributes = TRUE)

Another workaround is to force it to make a copy. The simplest way is to include a call to mutate in the pipeline, even if it doesn't actually modify anything:

L270.CreditInput_elec %>%
  # a mutate will force a copy and drop meta data from L270.CreditInput_elec
  # which we do not care to bring along
  mutate(sector.name = sector.name) %>%
  select(-coefficient) ->
  L2233.DeleteCreditInput_elec

Timeshift failures

This is most often because (i) a particular dataset is created that doesn't, under timeshift conditions, have all the necessary years, and then later (ii) something tries to left_join_error_no_match it. Boom! Fix it at the location of (i), i.e. upstream, by making sure all the necessary years are there (extrapolating if necessary). Here's an example from module_energy_L232.industry, which was causing a timeshift failure in module_water_L232.water.demand.manufacturing downstream:

      # extrapolate to all model years if necessary
      complete(nesting(region, GCAM_region_ID), year = MODEL_YEARS) %>%
      arrange(region, GCAM_region_ID, year) %>%
      group_by(region, GCAM_region_ID) %>%
      mutate(calOutputValue = approx_fun(year, calOutputValue, rule = 2)) %>%
      ungroup %>%
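
The fragment above is the middle of a longer pipeline. A self-contained sketch of the same pattern follows, assuming dplyr and tidyr are installed. MODEL_YEARS_DEMO and fill_years() are hypothetical stand-ins: the real chunk uses MODEL_YEARS and gcamdata's approx_fun, whereas fill_years() calls stats::approx directly.

```r
library(dplyr)
library(tidyr)

MODEL_YEARS_DEMO <- c(2000, 2005, 2010, 2015)  # hypothetical stand-in for MODEL_YEARS

# Hypothetical sparse table: only two of the four years are present
sparse <- tibble(region = c("A", "A"),
                 year = c(2000, 2010),
                 calOutputValue = c(1, 3))

# Stand-in for approx_fun: linear interpolation between known years;
# rule = 2 repeats the nearest known value outside the known range
fill_years <- function(year, value) {
  stats::approx(year, value, xout = year, rule = 2)$y
}

filled <- sparse %>%
  # add a row (with NA value) for every missing region/year combination
  complete(nesting(region), year = MODEL_YEARS_DEMO) %>%
  arrange(region, year) %>%
  group_by(region) %>%
  mutate(calOutputValue = fill_years(year, calOutputValue)) %>%
  ungroup()
```

In this toy case 2005 is interpolated between its neighbours and 2015 repeats the 2010 value, so a downstream join on year finds a match for every model year.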

To run under timeshift conditions, for debugging and to examine intermediate products, uncomment the last lines in constants.R.

Can not update package data due to circular failures

When modifying gcamdata in a way that requires updating or adding a new "Header", or when merging two branches that have each done this, you may run into a "Circular Fail": you need to update the package data to reflect the updated LEVEL2_DATA_NAMES before driver() can run without error, but you cannot source("data-raw/generate_package_data.R") because that script itself calls driver() to update GCAM_DATA_MAP and PREBUILT_DATA. In principle we should be able to decouple updating each of these data objects, eliminating the circular logic. In the meantime, the following steps should work around the problem:

  1. Comment out the driver() calls to update GCAM_DATA_MAP and PREBUILT_DATA:
GCAM_DATA_MAP <- driver(return_data_map_only = TRUE)

PREBUILT_DATA <- driver(write_outputs = FALSE,
                        write_xml = FALSE,
                        return_data_names = c(
                          # outputs of module_energy_LA101.en_bal_IEA
                          "L101.en_bal_EJ_R_Si_Fi_Yh_full",
                          "L101.en_bal_EJ_ctry_Si_Fi_Yh_full",
                          "L101.in_EJ_ctry_trn_Fi_Yh",
                          "L101.in_EJ_ctry_bld_Fi_Yh",

                          # output of module_energy_LA111.rsrc_fos_Prod
                          "L111.RsrcCurves_EJ_R_Ffos",

                          # output of module_energy_LA118.hydro
                          "L118.out_EJ_R_elec_hydro_Yfut",

                          # outputs of module_energy_LA121.oil
                          "L121.in_EJ_R_unoil_F_Yh",
                          "L121.in_EJ_R_TPES_crude_Yh",
                          "L121.in_EJ_R_TPES_unoil_Yh"
                        ))
  2. Instead set them to the previously calculated values:
GCAM_DATA_MAP <- gcamdata:::GCAM_DATA_MAP
PREBUILT_DATA <- gcamdata:::PREBUILT_DATA
  3. Run source("data-raw/generate_package_data.R"), which will then update LEVEL2_DATA_NAMES and leave the others as they were before.
  4. Reload / re-install gcamdata.
  5. Remove the statements from Step 2 and un-comment the calls from Step 1, then source("data-raw/generate_package_data.R") again to fully update all the package data.
  6. Don't forget to reload / re-install gcamdata once more, so that the fully updated package data is in place the next time you run driver().

Does not produce correct XMLs if multiple instances of gcamdata are on a system

We have encountered a situation where the library runs (from within RStudio after opening gcamdata.Rproj) and XMLs are built in the xml directory, but the XMLs do not reflect the underlying data. This can occur if there is more than one instance of gcamdata built on a system. Evidently the system can sometimes read data from a previously installed location instead of the location in which the system is run.

To solve this problem, use devtools::load_all(), then rebuild and document (under the More menu in the upper left). Running driver() should now result in XMLs that reflect the data in the same directory path as gcamdata.Rproj.
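
If you suspect this situation, one quick check is to see how many of your R library paths contain an installed copy of the package. This base-R sketch is only a diagnostic aid and rests on an assumption: it detects installed copies, not additional source checkouts elsewhere on disk.

```r
pkg <- "gcamdata"

# Every library path R searches, in search order
libs <- .libPaths()

# Keep only the library paths that contain an installed copy of the package
installed_in <- libs[file.exists(file.path(libs, pkg, "DESCRIPTION"))]

print(installed_in)
if (length(installed_in) > 1) {
  message("More than one installed copy of ", pkg, " found.")
}
```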
