Error: DataBackend did not return the queried rows correctly #172

Open
bommert opened this issue Dec 20, 2024 · 3 comments · May be fixed by #173

bommert commented Dec 20, 2024

I am encountering the error

Error: DataBackend did not return the queried rows correctly: 781 requested, 593 received.
        The resampling was probably instantiated on a different task.
This happened PipeOp performance's $train()

when I run the following code:

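# packages the snippet relies on (assumed; not listed in the original report)
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)
library(mlr3learners)  # provides regr.lm
library(mlr3data)      # provides the ames_housing task

task = tsk("ames_housing")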

# remove columns with missing values (not of interest to problem)
mi = task$missings()
keep = setdiff(names(mi[mi == 0]), task$target_names)
task$select(keep)

# create graph learner:
# impact encoding -> filter for feature selection -> linear regression model
learner = lrn("regr.lm")
filter = flt("performance", learner = learner, 
  resampling = rsmp("holdout", ratio = 2/3), measure = msr("regr.rmse"))
enc_po = po("encodeimpact", affect_columns = selector_type("factor"))
filt_po = po("filter", filter = filter, filter.nfeat = 1)
gl = as_learner(enc_po %>>% filt_po %>>% learner)

# resampling the graph learner results in the error
resample(task, gl, rsmp("cv", folds = 5))

There seems to be no problem when there are no factor variables in the dataset, e.g. when task = tsk("mtcars") is used as task in resample() in the code above.

mb706 commented Dec 21, 2024

Thanks for the report! I believe this happens because the resampling is not cloned between different calls to the filter. It is an mlr3filters bug...
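
To illustrate what "instantiated on a different task" means, here is a minimal sketch. It is not the patch from #173 and does not reproduce the filter's internals; it only assumes that an instantiated resampling stores concrete row ids, so reusing it on a task with different rows can request rows that no longer exist. The names task_full, task_sub, stale_ok and fresh_ok are made up for this example.

```r
library(mlr3)

# Two tasks with different row sets, similar to the differing
# training sets an outer CV hands to the filter in each fold.
task_full = tsk("mtcars")                  # 32 rows
task_sub = task_full$clone()$filter(1:20)  # only rows 1..20

# Instantiate a holdout split once, on the full task.
resampling = rsmp("holdout", ratio = 2/3)
resampling$instantiate(task_full)

# The stored split refers to row ids of task_full; some of them
# are not present in task_sub.
ids = c(resampling$train_set(1), resampling$test_set(1))
stale_ok = all(ids %in% task_sub$row_ids)

# Cloning and re-instantiating on the task actually used gives a
# split that only refers to rows of that task.
fresh = resampling$clone(deep = TRUE)
fresh$instantiate(task_sub)
fresh_ids = c(fresh$train_set(1), fresh$test_set(1))
fresh_ok = all(fresh_ids %in% task_sub$row_ids)

print(c(stale = stale_ok, fresh = fresh_ok))
```

In this sketch the reused split fails the row check while the cloned and re-instantiated one passes, which is the kind of mismatch the error message describes.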

mb706 transferred this issue from mlr-org/mlr3pipelines on Dec 21, 2024

mb706 commented Dec 21, 2024

Can you check if #173 solves it?

remotes::install_github("mlr-org/mlr3filters@filter_performance_clone_resampling")

bommert commented Jan 6, 2025

Yes, it solves the problem. Thank you very much!
