[BUG] Python random split sum to 1 error #866

Closed
gramhagen opened this issue Jul 15, 2019 · 2 comments

@gramhagen
Collaborator

Description

Python random split with multiple ratio options fails even when ratios sum to 1

In which platform does it happen?

Python all OSs

How do we replicate the issue?

from reco_utils.dataset.movielens import load_pandas_df
from reco_utils.dataset.python_splitters import python_random_split
df = load_pandas_df()
x, y, z = python_random_split(df, ratio=[.7, .2, .1])

yields

Traceback (most recent call last):
  File "C:\Users\scgraham\AppData\Local\Continuum\anaconda3\envs\reco_base\lib\site-packages\IPython\core\interactiveshell.py", line 3325, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-49ca4e7e6410>", line 1, in <module>
    x, y, z = python_random_split(df, ratio=[.7, .2, .1])
  File "C:\Users\scgraham\repos\Recommenders\reco_utils\dataset\python_splitters.py", line 38, in python_random_split
    splits = split_pandas_data_with_ratios(data, ratio, shuffle=True, seed=seed)
  File "C:\Users\scgraham\repos\Recommenders\reco_utils\dataset\split_utils.py", line 155, in split_pandas_data_with_ratios
    raise ValueError("The ratios have to sum to 1")
ValueError: The ratios have to sum to 1

This occurs because sum([.7, .2, .1]) evaluates to .999999... rather than exactly 1 due to floating-point rounding. The check at split_utils.py:39 fails on this value and renormalizes the ratios. The same check at split_utils.py:154 then fails as well, which raises this error.
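
The rounding behaviour can be reproduced with the standard library alone. The snippet below is a minimal sketch, not the code in split_utils.py: it adds the ratios left to right with plain float addition, then compares an exact equality check against a tolerance-based one. The use of math.isclose and the 1e-9 tolerance are assumptions chosen for illustration.

```python
import math

ratios = [0.7, 0.2, 0.1]

# Accumulate left to right with plain float addition.
total = 0.0
for r in ratios:
    total += r

# An exact comparison rejects the ratios because of rounding error.
exact_ok = (total == 1.0)

# A tolerance-based comparison accepts them (tolerance is an assumption).
tolerant_ok = math.isclose(total, 1.0, rel_tol=1e-9)

print(repr(total), exact_ok, tolerant_ok)
```

With a tolerance-based check, ratios that a user would reasonably consider to sum to 1 pass validation without any renormalization step.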

Expected behavior (i.e. solution)

Data is split into train, test, and validate sets with the expected fractions.
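
As an illustration of the expected behaviour, here is a hedged sketch of a ratio-based split over a plain list. split_by_ratios is a hypothetical helper written for this example, not the reco_utils implementation; the tolerance, the seed default, and the rounding of split boundaries are all assumptions.

```python
import math
import random


def split_by_ratios(items, ratios, seed=42, tol=1e-9):
    """Hypothetical helper: shuffle items and split them by the given ratios."""
    # Validate with a tolerance instead of exact equality (assumed fix).
    if not math.isclose(math.fsum(ratios), 1.0, rel_tol=tol):
        raise ValueError("The ratios have to sum to 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)

    # Cumulative split boundaries; the last one is pinned to n so that
    # rounding error can never drop trailing rows.
    bounds = []
    running = 0.0
    for r in ratios[:-1]:
        running += r
        bounds.append(round(running * n))
    bounds.append(n)

    parts, start = [], 0
    for end in bounds:
        parts.append(shuffled[start:end])
        start = end
    return parts


train, test, validate = split_by_ratios(range(1000), [0.7, 0.2, 0.1])
print(len(train), len(test), len(validate))
```

Pinning the final boundary to the number of rows means every row lands in exactly one split, even when the cumulative ratios fall slightly short of 1.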

Other Comments

@gramhagen gramhagen added the bug Something isn't working label Jul 15, 2019
@yueguoguo
Collaborator

Good catch. Will fix it.

@yueguoguo yueguoguo self-assigned this Jul 15, 2019
@yueguoguo yueguoguo mentioned this issue Jul 23, 2019
@yueguoguo
Collaborator

Resolved in #874 and validated in #876

@gramhagen gramhagen mentioned this issue Jul 30, 2019