Feedback on the new parameter_value schema
#3045
Replies: 10 comments 3 replies
Duration
I believe ambiguous durations are used only by SpineOpt. Duration is used to define, e.g., how long the model should run. It can also be used to define the length of time steps. For example, in an investment model, the duration of the investment time step can be one year (annual investments). Duration works as long as there is always a real timestamp to which the duration relates. I think that's the only way SpineOpt uses Durations (and hence SpineOpt can deal with it).

Other models tend to use a less rigid time system. This is also the case in ines-spec, where timestamps (time series using datetime) are used for historical data. This makes sense because one can then correctly consider leap years and other such things actually present in real data. But when modelling future years using those historical profiles, it would be problematic to use datetimes of those future years, since the leap years (or weekdays) will often not match the historical profile data. So, when dealing with future data, ines-spec uses 'periods' where each period has a start_time (using datetime) and 'years_presented', which is basically the same as Duration in years. I'm currently using float for that, since, as you say, Duration is ambiguous.

The model that gets the data from ines-spec has to deal with timestamps that are from the past and model years that are from the future (but that's what the data really is about). I think the best solution on the model side is to drop formal datetime, replace the year, and use these forged datetime strings as indexes. Your results can then show a leap year in 2029, but it doesn't really affect the quality of the results. But that shouldn't be the problem of Toolbox (it's model internals), other than letting users use string-indexed data efficiently (as I believe will happen with this new schema). Also, sometimes we model multiple years at once, so the seams between years need to work without jumps (i.e. cutting leap years to 8760 hours is not a good solution).

All that said, Duration can be quite handy. E.g. when you want to test a model and run only two weeks, it's nice to use 2w as the model Duration. Using time steps would be more involved (especially when crossing month boundaries, which is actually quite error-prone). So, having a Duration that would not accept months, but would know how to deal with a year in the complex way I outlined above, would be ideal in my view. I don't know if there is some data that is given on a monthly basis in SpineOpt. If there is, then it's a bit of a problem, but I think your solution to use integer months would be suitable in those cases (as long as it's about specific historical months). Just prevent using month in Duration.

PS. I used 'historical' and 'past' to distinguish from future. However, sometimes time series data can be generated and as such have future datetimes. That doesn't change anything, though.
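To make the "replace the year and use forged datetime strings as indexes" idea concrete, here is a minimal sketch. The function name `forge_timestamp` and the ISO-like string layout are assumptions for illustration only; they are not part of ines-spec, SpineOpt, or Toolbox.

```python
from datetime import datetime


def forge_timestamp(historical: datetime, model_year: int) -> str:
    """Return an ISO-like string with the year swapped for model_year.

    The result is a plain string, not a datetime, so a 29 February taken
    from a historical leap year survives even if model_year is not one.
    (Hypothetical helper, for illustration only.)
    """
    return f"{model_year:04d}-" + historical.strftime("%m-%dT%H:%M:%S")


# Three stamps from a historical leap year (2020), reused for model year 2029.
profile = [
    datetime(2020, 2, 28, 23),
    datetime(2020, 2, 29, 0),
    datetime(2020, 3, 1, 0),
]
forged = [forge_timestamp(t, 2029) for t in profile]
print(forged)
```

Because the strings are zero-padded, sorting them lexically keeps the chronological order of the original profile, which is what makes them usable as string indexes.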
Pydantic
Claude tells me that Pydantic is lightweight and quite stable (in version 2), so I don't see it as a problem to have it as a dependency. I suppose lots of applications use it.
Map
The analysis time is the timestep from which the stochastic data originates (when the data about the future was created). Time is the actual timestep for which the data is given (in a particular branch/scenario). In modelling practice the analysis time and the first time step are the same, but in principle one could have an analysis_time earlier than the first time. Data is usually created earlier, but model structures tend to ignore any lags there may be (they don't affect the workings of the model). From the example, I don't know why Scenario 2 does not have a 'time' stamp, just analysis_time. It doesn't really make sense. Deterministic data would not have analysis_time (nor scenario), but it would have time.
As Manuel is on leave, maybe @DillonJ can provide examples of time patterns.
It's all documented on the Toolbox side, I believe.
https://spine-toolbox.readthedocs.io/en/latest/parameter_value_editor.html#time-patterns
I feel we should assume all the functionality is used. We use it and we should assume others use it too. It's incredibly useful for defining data that varies by time of day, e.g. reserve requirements.
Regarding Maps - they are used for other things in SpineOpt other than for stochastic data:
Durations
Related issue spine-tools/Spine-Database-API#321 and, to some extent, spine-tools/Spine-Database-API#319.
I have no strong opinion on whatever we choose to do with years and months, as I think it is better that the model developers who actually need to deal with them have a say here. The only place where we do any calculations with durations in
Yes.
I would not mind, if the model developers feel it could be useful.
Time Patterns
Not sure if I understand the question. A time pattern is a "time series" of sorts on its own; is that what you mean?
Map
I do not think we want to specialize our data structures too heavily for model-specific cases.
Pydantic
We could maybe utilize Pydantic in Spine DB Server some day. Also, it is possible to paste JSON directly into the DB editor, which could be validated with Pydantic. But why add it as a dependency before we actually use it?
Implementation choices
I approve the index name ambiguity guards [official approval stamp here].
@jkiviluo @soininen I'll consolidate my Duration-related response here for brevity.
Duration
The reasons for our questions about
So far we have been using
Based on your comments above, I understand that Spine users indeed use the
IMO, for both uses, the model knows best how to do the calculation. If we choose
rel_delta = ...  # my relativedelta array in pyarrow
# we have to implement the following, and document how to use it with examples
durations = rel_delta.to_durations(anchor_time=some_datetime_index[-1])  # anchor_time: datetime
new_datetimes = some_datetime_index + durations
The model will have to do something equivalent to the above because only the model knows what is the appropriate value of
It seems to me this would be the preferred solution.
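As a rough illustration of why the anchor matters, the sketch below resolves a relative month count into a fixed duration using only the standard library. The helpers `add_months` and `to_duration` are hypothetical stand-ins, not the proposed pyarrow-based `to_durations` API, and clamping the day to the end of the month is an assumed convention.

```python
import calendar
from datetime import datetime, timedelta


def add_months(anchor: datetime, months: int) -> datetime:
    """Shift anchor by whole calendar months, clamping the day to the month's end."""
    total = anchor.year * 12 + (anchor.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return anchor.replace(year=year, month=month0 + 1, day=min(anchor.day, last_day))


def to_duration(months: int, anchor_time: datetime) -> timedelta:
    """Resolve a relative month count into a fixed duration at anchor_time."""
    return add_months(anchor_time, months) - anchor_time


# The same "1 month" or "12 months" yields different fixed durations
# depending on where it is anchored.
feb_2023 = to_duration(1, datetime(2023, 2, 1))    # timedelta(days=28)
feb_2024 = to_duration(1, datetime(2024, 2, 1))    # timedelta(days=29)
year_2024 = to_duration(12, datetime(2024, 1, 1))  # timedelta(days=366)
print(feb_2023, feb_2024, year_2024, sep="\n")
```

The point of the sketch is only that the conversion cannot be done without an anchor, which is why the choice of anchor is left to the model.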
The new schema flattens all data to a tabular form: a list of index/value arrays. Briefly, it can be expressed as the following:
For the full description, see the Python models and the corresponding JSON schema [1]. It also includes a couple of compressed encodings, such as run-end and dictionary encoding, for arrays with repeated values. Missing values are natively supported. It supports all the original underlying types currently supported by Spine, with a few restrictions discussed below.
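For readers unfamiliar with the two compressed encodings mentioned above, here is a minimal sketch of the general techniques in plain Python. The function names and the tuple layout of the results are assumptions for illustration; the field names and structure used by the actual schema may differ.

```python
from itertools import groupby


def run_end_encode(values):
    """Return (run_ends, run_values); each run end is the exclusive end index of a run."""
    run_ends, run_values, position = [], [], 0
    for value, group in groupby(values):
        position += len(list(group))
        run_ends.append(position)
        run_values.append(value)
    return run_ends, run_values


def dictionary_encode(values):
    """Return (dictionary, indices); indices point into the list of distinct values."""
    dictionary, lookup, indices = [], {}, []
    for value in values:
        if value not in lookup:
            lookup[value] = len(dictionary)
            dictionary.append(value)
        indices.append(lookup[value])
    return dictionary, indices


column = ["a", "a", "a", "b", "b", "a"]
print(run_end_encode(column))     # ([3, 5, 6], ['a', 'b', 'a'])
print(dictionary_encode(column))  # (['a', 'b'], [0, 0, 0, 1, 1, 0])
```

Run-end encoding pays off when equal values are adjacent, as in the outer index columns of a flattened nested structure, while dictionary encoding helps whenever a column has few distinct values regardless of their order.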
Questions
Duration
"Year" and "month" are ambiguous and not part of numpy/pandas/etc., "weeks" is the longest regular interval; e.g. month can mean anything between 28-31 days. We would like to avoid this ambiguity in the new format.
Time Patterns
Map
Consider this stochastic data from Antti (in the current map format):
Pydantic
So far we have avoided pydantic as a dependency. We plan to keep it in a separate repository for schema generation. This repo could also include tools for debugging user issues. If we need runtime validation, we need pydantic as a dependency within spinedb_api. What are your thoughts on this?
Implementation Choices
The old format uses the default name 'x' when the user doesn't provide anything. This can be ambiguous for the flattened data structure. This is why we chose the following:
If users supply the same column name twice, raise an error message: "Columns 5 and 3 conflict, having the same name '{name}'"
If column names are missing, we assign one based on the following template: "col_{sequence}" (nesting depth in the old format)

@soininen @jkiviluo @manuelma I (and @OleMussmann) are posting this so that you can think ahead a bit before next week's Toolbox dev meeting, and anyone else can also share their feedback.
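The two index-name rules from the implementation choices could be sketched roughly as follows. The function name `resolve_column_names`, the use of `None` for a missing name, and the 1-based numbering of both the sequence and the column positions are assumptions for illustration, not the actual implementation.

```python
def resolve_column_names(names):
    """Fill in missing index names and reject duplicates.

    names: one entry per index column, outermost first; None means unnamed.
    (Hypothetical helper; 1-based numbering is an assumption.)
    """
    resolved = [
        name if name is not None else f"col_{depth}"
        for depth, name in enumerate(names, start=1)
    ]
    seen = {}
    for position, name in enumerate(resolved, start=1):
        if name in seen:
            raise ValueError(
                f"Columns {seen[name]} and {position} conflict, "
                f"having the same name '{name}'"
            )
        seen[name] = position
    return resolved


print(resolve_column_names(["scenario", None, None]))  # ['scenario', 'col_2', 'col_3']

try:
    resolve_column_names(["time", "time"])
except ValueError as error:
    print(error)  # Columns 1 and 2 conflict, having the same name 'time'
```

Checking for duplicates after filling in the generated names also catches the corner case where a user-supplied name happens to collide with a generated one.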
Footnotes
[1] https://json-schema.org/learn/getting-started-step-by-step