Hi,
I was going through the Quickstart and ran into series.split_before(0.75). The AirPassengersDataset has 144 rows (time steps) in total. Taking 75% of that gives a split index of 108. From my perspective, split_before at index 108 means the first part of the series will contain time steps 0 to 107, i.e. 108 time steps.
However, series.split_before(0.75) actually returns a first part covering index 0 to index 106, without the time step at 107, which is counter-intuitive. I've found the line of code that leads to this "issue". In timeseries.py, point_index = int((len(self) - 1) * point). The split index is computed as (length - 1) times point (0.75 in the case above), which is int((144 - 1) * 0.75) = 107. The series is then split at index 107, with time step 107 excluded from the first part.
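To make the arithmetic concrete, here is a minimal pure-Python sketch that does not use darts. It only applies the formula quoted above to a plain list standing in for the series; the helper name split_before_index is hypothetical and not part of the library.

```python
def split_before_index(n_steps: int, point: float) -> int:
    """Apply the formula quoted from timeseries.py (hypothetical helper name)."""
    return int((n_steps - 1) * point)


n_steps = 144  # number of time steps in AirPassengersDataset, as stated above
idx = split_before_index(n_steps, 0.75)  # int(143 * 0.75) = int(107.25)

values = list(range(n_steps))  # stand-in for the time steps of the series
first, second = values[:idx], values[idx:]  # "before": index idx goes to the second part

print(idx, len(first), len(second))
```

With these inputs the first part ends at index 106 and the second part starts at index 107, which matches the behaviour described above.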
I'm not sure whether this is definitely a bug, since the rest of the Quickstart code splits at index 108 rather than 107. For example, just below series.split_before(0.75), there is series1, series2 = series[:-36], series[-36:]. I guess the author's intention is to split the dataset into 75% (108 elements) and 25% (36 elements). But with series.split_before(0.75), the dataset is split into 74% (107 elements) and 26% (37 elements).
I hope someone can address this, since the split produced by series.split_before(0.75) does not fully match the requested fraction.
To Reproduce
Simply run to the 3rd code block in the official Quickstart.
System (please complete the following information):
Python version: 3.9.18
darts version: u8darts-all 0.26.0
OS: Linux
When the argument of split_before is a float, it corresponds to the proportion of the series to be included in the first split (splitting point excluded, as described in the docstring), not the explicit relative position of the index used for the split. Since 75% of 144 is exactly 108, this splitting point is contained in the second part of the split, resulting in series of lengths 107 and 37 respectively.
I agree that the example is a bit counter-intuitive with respect to the next cell, and I think an easy way to clarify it would be to use split_after instead, to obtain identical splits in the two cells.
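As a rough illustration of why split_after would line up with the next cell, here is a pure-Python sketch on a plain list rather than a darts TimeSeries. It assumes that split_after uses the same point_index formula quoted in the issue and keeps the splitting point in the first part; that assumption is inferred from the method name and the suggestion above, not checked against the library source.

```python
n_steps = 144
point_index = int((n_steps - 1) * 0.75)  # formula quoted in the issue

values = list(range(n_steps))  # stand-in for the time steps of the series

# "before": the splitting point goes to the second part
before_first, before_second = values[:point_index], values[point_index:]

# "after" (assumed behaviour): the splitting point stays in the first part
after_first, after_second = values[:point_index + 1], values[point_index + 1:]

# The slicing used in the next Quickstart cell
slice_first, slice_second = values[:-36], values[-36:]

print(len(before_first), len(before_second))
print(len(after_first), len(after_second))
print(after_first == slice_first and after_second == slice_second)
```

Under that assumption, the "after" variant yields the same 108/36 split as series[:-36], series[-36:], while the "before" variant yields 107/37.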