Switch 2.0.5 (csv import/export) #119
Conversation
…bug fixes to rhosetter.py.
…ic examples that use the PySP module of the Pyomo package.
- Output files now obey the current format printed by the PySP module.
- The model-directory option was changed to model-location, since the former is deprecated.
- Updated console output files to reflect these changes.
- Replaced references to old names of various variables with the ones currently in use in Switch.
…sponding outputs folder).
There has been an extra 1 value in the Hydro_RoR row of upgrade_dat/hydro_simple/inputs/proj_build_costs.tab for a long time. In addition, the Hydro and Hydro_RoR rows were delimited with spaces (often multiple) instead of tab characters. These problems also occurred in the original examples/hydro_simple directory, which has been repeatedly upgraded, creating more malformed files along the way. Spaces were also used in some of the other example files, creating similar problems.

These were not caught previously because the following sequence happened:

1. Pyomo was happy to load the original version of these files (2.0.0b0), because it allows space or tab delimiters. It also ignores extra fields on each row, so it just used the first four fields (including the extra 1) and ignored the fifth field. So this example ran with the wrong values for Hydro_RoR in the third and fourth columns until now.
2. The upgrade scripts for later versions read these files using pandas.read_csv with the separator set to '\t'. For every row delimited with spaces, the upgrade script read the whole row into the first field and filled the other fields with nulls. When the script wrote the file back, it preserved the spaces between the values that had been pulled into the first field, and appended extra '.' values for the empty fields.
3. Subsequent upgrade scripts ran fine, because they still read all the space-delimited content into the first field and read the '.' values into the other fields.
4. Switch was also happy to run with these files at each step, because Pyomo used the spaces as field delimiters and ignored the extra '.' values at the end of the rows.

However, when we change to .csv inputs, this no longer works. The upgrade script could read these files with '\t' delimiters, but then most of the values end up in the first column, and they can't be read correctly by Pyomo when it reads these files expecting ',' delimiters.
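The failure mode described above can be reproduced in miniature (this is an illustrative sketch, not the actual Switch upgrade code, and the column names are made up rather than the real proj_build_costs.tab schema): a tab-delimited header plus a space-delimited data row, read with a strict tab separator.

```python
# Hypothetical miniature of a malformed .tab file: the header uses tabs,
# but the data row was delimited with spaces.
import io
import pandas as pd

content = "PROJECT\tperiod\tcost1\tcost2\nHydro_RoR 2020 50 60\n"

# Reading with a strict '\t' separator, as the old upgrade scripts did.
df = pd.read_csv(io.StringIO(content), sep="\t")

# The whole space-delimited row collapses into the first field,
# and the remaining columns are filled with nulls.
print(df.iloc[0, 0])                # Hydro_RoR 2020 50 60
print(df.iloc[0, 1:].isna().all())  # True

# Writing back with na_rep='.' reproduces the trailing '.' padding
# observed in the malformed upgraded files.
print(df.to_csv(sep="\t", index=False, na_rep="."))
```

Pyomo then re-reads such a file happily, splitting on the embedded spaces and discarding the trailing '.' fields, which is why the corruption went unnoticed until the switch to ',' delimiters.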
The solution is as follows:

1. Update the upgrade scripts to use r'\s+' (any whitespace) as the delimiter when reading pre-2.0.5 .tab files, so that they write out correctly formatted, \t-delimited .tab files without extra columns. This fixes the upgrade tests.
2. Remove the extra "1" value from the Hydro_RoR rows and update total_cost.txt to reflect this (in examples and tests).
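A minimal sketch of fix (1), assuming illustrative column names rather than the real .tab schema: read with an any-whitespace separator, then write back with an explicit tab separator.

```python
# Sketch of reading a legacy .tab file whose rows mix space and tab
# delimiters, then normalizing it to clean \t-delimited output.
import io
import pandas as pd

content = "PROJECT\tperiod\tcost1\tcost2\nHydro_RoR 2020 50 60\n"

# r'\s+' treats any run of spaces or tabs as a single delimiter,
# so mixed-delimiter rows parse into the correct fields.
df = pd.read_csv(io.StringIO(content), sep=r"\s+")

# Writing with an explicit '\t' separator normalizes the file.
print(df.to_csv(sep="\t", index=False))
```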
…i models (with warning)
This commit updates Switch to use two-row .csv files instead of .dat files for non-indexed parameters. The first row holds a list of parameter names and the second row holds their values. The upgrade script for 2.0.5 also converts .dat files into .csv files. This means non-indexed inputs can be viewed and edited with the same tools as all other inputs, and also means users don't need to learn the ampl/pyomo .dat data description language.

The following files have been converted to .csv: financials.dat, trans_params.dat, spillage_penalty.dat, spinning_reserve_params.dat, lost_load_cost.dat and hydrogen.dat. However, Switch still uses gen_multiple_fuels.dat, which defines indexed sets (future work will convert this to .csv too).
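The two-row layout can be parsed with nothing more than the standard csv module; a minimal sketch (the parameter names below are illustrative, not the actual contents of any converted file):

```python
# Sketch of reading a two-row .csv of non-indexed parameters:
# first row = parameter names, second row = values.
import csv
import io

text = "base_financial_year,interest_rate,discount_rate\n2020,0.07,0.05\n"

rows = list(csv.reader(io.StringIO(text)))
names, values = rows[0], rows[1]
params = dict(zip(names, values))  # values stay as strings here
print(params)
```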
I just finished testing & pushing the implementation. Although this discussion isn't fully resolved, it's thankfully a minor edit for modules to call load_key_value_inputfile directly, vs. load_aug(..., singleton=True) calls that call load_key_value_inputfile.

My point was that we're not using, nor planning to use, most of those behaviors for key/value files. If we were using those things, I'd agree with your general sentiment to avoid duplicating code. For .dat files, load_aug ignores all parameters except optional (i.e., the auto_select/params arguments, autodetection of optional params, and soon --input-aliases are all ignored). The index-column auto-selection is also irrelevant in this context. Given that 90% of those behaviors are irrelevant here, and the goal of load_aug is to provide a convenient wrapper for underlying calls to load, it seems a less-than-ideal entry point.

I agree that the interface to load_aug could use streamlining & updating. I like your idea for params to appear exactly once in either mandatory_params or optional_params, and to accept the same style of arguments for both. auto_select defaulting to true sounds nice in theory, but I'd want to double-check the codebase to look for any exceptions. I don't know if anyone else is using load_aug with custom modules not in our codebase. So, when we do those API changes to load_aug, I suggest implementing them in a new function with a better name, leaving load_aug the way it is, and having it issue a deprecation warning saying we'll be dropping load_aug in release 2.1.

Just a reminder, semantic versioning recommends bumping the minor version number when we add new features and the patch version number for bug fixes. Given what's going into this release, I'm on the fence about calling it 2.0.5 or 2.1.

Re: more automation of documentation & input file mapping
I started replying to your earlier comment on commit e1f7999, but then had to go into a meeting.

I think we want singleton parameter files to support the same capabilities as our other input files. It is only by accident that we didn't support those capabilities with .dat files, and I don't think there's any argument for why we should support them for most .csv files but not the singleton .csv files. I've certainly seen a mix of optional and mandatory arguments implemented in .dat files, and it's just as plausible for users to want to apply aliases to these files as to any others.

Really, the singleton .csv files are just data like the other .csvs, and should be treated the same way as the others. We're just requiring them to be rotated, and adding code to support that, to soothe your concern about viewing wide vs. tall csvs. It makes the code, documentation and learning curve a little messier, but I'm ok with that because we have to tell people something about singleton parameters anyway. So it might as well be "put singleton parameters in a vertical csv with headers of 'name' and 'value'" instead of "put singleton parameters in a csv with parameter names on the first row and values on the second row." But I don't want to go further and say, "by the way, you also have to use this different function to load them, and it doesn't accept the same arguments or offer the same options as the normal load_aug function."
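For comparison with the two-row layout, the vertical 'name'/'value' layout under discussion parses just as simply; a hedged sketch with made-up parameter names:

```python
# Sketch of reading a vertical singleton-parameter csv with
# headers 'name' and 'value', one parameter per row.
import csv
import io

text = "name,value\ninterest_rate,0.07\ndiscount_rate,0.05\n"

params = {row["name"]: row["value"]
          for row in csv.DictReader(io.StringIO(text))}
print(params)
```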
This is increasingly a departure from the original, which I was personally happy with. I see one-line-per-argument as a convenience to both end users and developers who have to read and debug input files.

For developers who are writing new modules, I think it would be clearest to either provide a transparent wrapper around the standard DataPortal.load() function (like your initial implementation), or write a separate function to do simple parsing (like my current implementation). Mashing up the concepts adds confusion and is unappealing.

If you feel really strongly about this, then we can just revert my contributions and go with Pyomo's crappy format for stuffing simple parameters into wide and shallow csv files. I've seen other pyomo models that abused this with dozens of parameters in an extremely awkward file, and I wished to avoid that slippery slope into lower usability. So far, the only one of our modules that does that is your hydrogen model, which I have yet to use, so it won't be a direct encumbrance to me or the users that I support. All of the core modules only have a handful of params, which aren't that bad in practice for wide format.

FTR, Pyomo supports usable and scalable specifications of simple parameters in
I don't mind departing from .dat files, because my goal here is to provide more simplicity, rather than more power. I want it to be as easy as possible to learn (and then predict) how to create and read input files. We are currently not there -- I've been using this for 4 years and I still have to copy and modify an existing load_aug() call every time I want a new one, and I still get it wrong half the time. And then I have to refresh my memory of ampl's .dat language anytime I want to create a file for a singleton param (OK, I remember that much, but I still have to double-check by looking at an existing file), then go and figure out what function to call to read those. So I am going for several principles here:
Those principles point to always using the same loading interface. They also lead pretty directly to wide csvs for singletons, but it's not a huge detour to use vertical csvs for singletons instead (or maybe even allow both, heaven help us). But using
I'm hearing that you've had a lot of chronic frustration with input files, and anything to improve that process would have significant value to you. I'm a little surprised that you are bothered by copying+pasting+editing prior examples; to me that's an invaluable technique for speeding up development as well as reducing cognitive load, repetitive stress from typing, and the rate of typo-based errors. It sounds like reverting to the wide and flat tables is probably the least bad solution, given how much this bothers you and your perspective on API calls. The more I think about overloading load_aug with a custom parser for key/value csv files, the less I like it. The goal of that was to provide a thin wrapper around

I appreciate your explanations of the principles you are working with. I have a different take on those, but it's helpful to understand your reasoning. For example, I appreciated that I didn't have to explicitly specify the contents of a key/value file when I wanted to load data from it, and specifying those (as in the wide format) is a little cumbersome. To me, key/value inputs are different than indexed tabular data; while mashing them together into a single format &

Anyway, the least bad compromise solution seems to be the wide csv format that DataPortal can directly parse. And moving forward, we'll be doing an overhaul of our DataPortal wrapper (
Yes, that sounds workable. I can't say the copy-paste caused me a whole lot of trouble, but having to do it was a sign that something wasn't right. Almost everything in these calls is invariant, so it should be as easy as

I currently create all the singleton input files from dictionaries with values assigned in my get_scenario_data.py scripts, so it doesn't make a difference to me whether they are vertical or wide. Eventually, some of these (e.g., the hydrogen parameters) might come from a database, where I might also have a wide table to hold them. Then it's an easy export to a wide .csv. Or I suppose people might want to create a tall table in the database, so I guess that's the same debate.

I can sort of see the argument for yaml or some other data format for key-value pairs, but then we're effectively back to the .dat files again. And I find it easier to say "if there's no index, just make a csv with one header row, one data row and no index column" rather than "if there's no index, put the key-value pairs in a .dat/.yaml/tall-csv file".

I think I'm going to release commit 6c3feb as 2.0.5 (after all my pre-release tests), so people can begin installing it today for the tutorial. I'll leave the tall-csv commits as a separate feature branch. And at some point we can rebase the

I know semantic versioning says you should update minor version numbers for feature releases, but pretty much everything we release is a feature upgrade. So we would quickly move up to 2.5, 2.23, 2.71, etc. That may be fine, but it would require us to talk about "Switch 2" instead of "Switch 2.0". This may just be a consequence of (hopefully) having a faster release process than Python (where we'll be talking about "3.7" for quite a while yet). On the other hand, I don't think anyone thinks about which version of pandas, numpy or matplotlib they're using -- they just want the latest.
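The dict-to-wide-csv export described here is a one-liner with pandas; a sketch under the assumption of hypothetical parameter names and a hypothetical get_scenario_data.py-style dict:

```python
# Illustrative export of a singleton-parameter dict to the wide layout:
# one header row of names, one data row of values, no index column.
import pandas as pd

params = {"interest_rate": 0.07, "discount_rate": 0.05}
wide = pd.DataFrame([params])  # one row, one column per parameter
print(wide.to_csv(index=False))
```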
Sounds good. I just pushed those rearrangements to various git branches.
I haven't deleted

Re: Semantic versioning

I reviewed the commits since 2.0.0, and most of them were bug fixes, although there were a few minor new features thrown in. My 5 outstanding pull requests do qualify as new features. We could fudge things and say those features are too minor to bump the minor version, but I don't recommend doing that indefinitely.

I just re-reviewed https://semver.org/ and the section about deprecating functionality. Given that we are changing the input file specifications and effectively deprecating the .tab & .dat functionality, I think setting this version to 2.1 instead of 2.0.5 would be a much clearer signal about the impacts, at least to people who understand semantic versioning and use it as a shorthand. At the moment, I think our audience of professional developers is small, so not following semantic versioning guidelines in this case probably isn't that big of a deal in practice.
PS. After the
This pull request bumps some of the ideas from #115 down the road to 2.0.6, and fast-tracks conversion to .csv as Switch 2.0.5. This is being moved ahead quickly because I will be doing training with significant new users on Aug. 22, and it was really messy to explain which inputs and outputs should be .tab vs. .csv, .tsv or .txt. Now all inputs and outputs are .csv.
This also includes some of the code previously slated for 2.0.5 (up through 7/1/19). However, commits to the `next_version` branch after that date will need to be merged later. This was done because some of those changes need further discussion before we can move ahead. Unfortunately, this leaves out some uncontroversial changes too. Maybe we can cherry-pick some of those into this release?