Python random split with multiple ratio options fails even when ratios sum to 1
In which platform does it happen?
Python, all OSs
How do we replicate the issue?
from reco_utils.dataset.movielens import load_pandas_df
from reco_utils.dataset.python_splitters import python_random_split
df = load_pandas_df()
x, y, z = python_random_split(df, ratio=[.7, .2, .1])
yields
Traceback (most recent call last):
File "C:\Users\scgraham\AppData\Local\Continuum\anaconda3\envs\reco_base\lib\site-packages\IPython\core\interactiveshell.py", line 3325, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-49ca4e7e6410>", line 1, in <module>
x, y, z = python_random_split(df, ratio=[.7, .2, .1])
File "C:\Users\scgraham\repos\Recommenders\reco_utils\dataset\python_splitters.py", line 38, in python_random_split
splits = split_pandas_data_with_ratios(data, ratio, shuffle=True, seed=seed)
File "C:\Users\scgraham\repos\Recommenders\reco_utils\dataset\split_utils.py", line 155, in split_pandas_data_with_ratios
raise ValueError("The ratios have to sum to 1")
ValueError: The ratios have to sum to 1
This occurs because sum([.7, .2, .1]) evaluates to 0.9999999999999999 in floating-point arithmetic, not exactly 1. The check at split_utils.py:39 fails and renormalizes the ratios; the same check at split_utils.py:154 then also fails, which raises this error.
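To illustrate the rounding problem in isolation, here is a minimal sketch that compares an exact equality check with a tolerance-based one. The helper name `ratios_sum_to_one` and the tolerance value are assumptions for illustration only; they are not part of `reco_utils`, and this is not necessarily how the check in `split_utils.py` is written or should be fixed.

```python
import math

ratios = [0.7, 0.2, 0.1]

# Exact comparison fails because of floating-point rounding.
exact_ok = sum(ratios) == 1


def ratios_sum_to_one(ratios, tol=1e-9):
    """Hypothetical helper: check that the ratios sum to 1 within a tolerance."""
    return math.isclose(sum(ratios), 1.0, abs_tol=tol)


# Tolerance-based comparison accepts the same ratios.
tolerant_ok = ratios_sum_to_one(ratios)

print(exact_ok, tolerant_ok)
```

A tolerance-based comparison such as `math.isclose` avoids rejecting ratios that differ from 1 only by rounding error, while still rejecting ratios that clearly do not sum to 1.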
Expected behavior (i.e. solution)
Data is split into train, test, and validation sets with the expected fractions.
Other Comments