pl.duration(days=pl.all()) also passes values to hours= #19007
Comments
Looks like all the args get consumed:
df = pl.DataFrame(
    {
        "a": [1, 2],
        "b": [3, 4],
        "c": [5, 6],
        "d": [7, 8],
        "e": [9, 10],
        "f": [11, 12],
        "g": [13, 14],
    }
)
df.select(pl.duration(days=pl.all()))
# shape: (2, 1)
# ┌─────────────────────┐
# │ literal             │
# │ ---                 │
# │ duration[μs]        │
# ╞═════════════════════╡
# │ 1d 3h 5m 7s 9011µs  │
# │ 2d 4h 6m 8s 10012µs │
# └─────────────────────┘
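A quick cross-check of that reading, as a plain-Python sketch rather than Polars code: assuming the seven columns land in `days`, `hours`, `minutes`, `seconds`, `milliseconds`, `microseconds` and `nanoseconds` in that order, `datetime.timedelta` reproduces the values shown in the output above. The helper name `spilled_duration` is made up for illustration.

```python
from datetime import timedelta

# Row values of columns a..g from the example frame, one tuple per row.
rows = [(1, 3, 5, 7, 9, 11, 13), (2, 4, 6, 8, 10, 12, 14)]


def spilled_duration(a, b, c, d, e, f, g):
    # Assumption: the wildcard columns fill days, hours, minutes, seconds,
    # milliseconds, microseconds and nanoseconds, in that order.
    # timedelta has no nanosecond field; g (13 or 14 ns) is below the
    # microsecond resolution of the result, so it is ignored here.
    return timedelta(
        days=a, hours=b, minutes=c, seconds=d, milliseconds=e, microseconds=f
    )


durations = [spilled_duration(*row) for row in rows]
for duration in durations:
    print(duration)
```

The sub-second part of the first row is 9 ms + 11 µs = 9011 µs, which matches the `9011µs` in the frame.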
@mcrumiller Yeah, it looks like it is the same thing as the issue with some of the
This should output 7 columns, right?
df = pl.DataFrame(
    {
        "a": [[1, 2]],
        "b": [[3, 4]],
        "c": [[5, 6]],
        "d": [[7, 8]],
        "e": [[9, 10]],
        "f": [[11, 12]],
        "g": [[13, 14]],
    }
)
df.select(pl.all().list.concat(539))
# shape: (1, 1)
# ┌──────────────────────────────────────────────────────┐
# │ a                                                    │
# │ ---                                                  │
# │ list[i64]                                            │
# ╞══════════════════════════════════════════════════════╡
# │ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 539] │
# └──────────────────────────────────────────────────────┘
It seems to be related to the
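To make the difference between the expected and the observed result concrete, here is a plain-Python sketch using ordinary lists in place of Polars list columns. The names `per_column` and `merged` are illustrative only; this models the two outcomes and says nothing about how Polars computes them.

```python
# One list per column, mirroring the single row of the example frame.
columns = {
    "a": [1, 2],
    "b": [3, 4],
    "c": [5, 6],
    "d": [7, 8],
    "e": [9, 10],
    "f": [11, 12],
    "g": [13, 14],
}

# Expected: the operation runs once per column, giving 7 output columns.
per_column = {name: values + [539] for name, values in columns.items()}

# Observed: every column is fed into a single call, and the result
# takes the name of the first column.
first_name = next(iter(columns))
merged = {
    first_name: [v for values in columns.values() for v in values] + [539]
}

print(len(per_column), len(merged))
```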
Not sure if I'm on the right track here - but this also seems to be the cause of the linked struct issue:
df = pl.DataFrame({"X": {"a": 1}, "Y": {"a": 3}})
df.select(pl.all().struct.with_fields())
# shape: (1, 1)
# ┌───────────┐
# │ X         │
# │ ---       │
# │ struct[2] │
# ╞═══════════╡
# │ {1,{3}}   │
# └───────────┘
From what I can tell, it looks like if
It's a bit tricky trying to debug what's happening.
I was looking into this and other issues with pl.all() (#18968). What is happening is that for these functions
polars/crates/polars-plan/src/plans/options.rs, lines 145 to 155 in ee9bafb
For these functions I think the second expansion is correct, i.e. this flag should be disabled.
Below is my understanding.
and
The first expansion is conditional on the flag being true. I tried removing this flag and it seems to work correctly. All tests are also passing.
If this sounds sensible I can raise a PR for this.
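A toy model of the two expansion behaviours under discussion, written in plain Python. The function `expand_wildcard` and its `expand_into_inputs` parameter are hypothetical stand-ins for the flag referenced in options.rs; the real logic lives in the Rust planner and may well differ in detail.

```python
def expand_wildcard(func, inputs, columns, expand_into_inputs):
    """Toy model only; not Polars' actual implementation.

    func: function name, e.g. "concat".
    inputs: argument names, where "*" stands for a wildcard such as pl.all().
    columns: column names in the schema.
    expand_into_inputs: hypothetical stand-in for the flag discussed above.
    """
    if "*" not in inputs:
        return [func + "(" + ", ".join(inputs) + ")"]

    if expand_into_inputs:
        # First expansion: the wildcard is replaced inside the argument
        # list, so a single call receives every column.
        args = []
        for arg in inputs:
            if arg == "*":
                args.extend(columns)
            else:
                args.append(arg)
        return [func + "(" + ", ".join(args) + ")"]

    # Second expansion: the whole expression is repeated once per column.
    expanded = []
    for col in columns:
        args = [col if arg == "*" else arg for arg in inputs]
        expanded.append(func + "(" + ", ".join(args) + ")")
    return expanded


print(expand_wildcard("concat", ["*", "539"], ["a", "b"], True))
print(expand_wildcard("concat", ["*", "539"], ["a", "b"], False))
```

With the flag on, the wildcard yields one expression over all columns; with it off, it yields one expression per column, which is the behaviour the comment above argues for.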
I am not sure what the proper logic is.
df = pl.DataFrame([[{"a": 1, "b": 2}], [{"c": 3, "d": None}]])
result = df.select(pl.concat_list(pl.all()).alias("as_list"))
assert result.to_dict(as_series=False) == {
    "as_list": [
        [
            {"a": 1, "b": 2, "c": None, "d": None},
            {"a": None, "b": None, "c": 3, "d": None},
        ]
    ]
}
So I think it is only clear in the scope of a namespace -
Yes, the input arguments are columns that all get used in the same operation.
we have different types of arguments here, which causes a problem. The issue is that using
I get the "argument overflow" thing, but I still don't get - When you see
We can make a distinction on how
I think this is how the functions have been designed until now, based on the docs for these functions. If required, we can make the docs clearer there so that users know how the wildcard expansion will behave.
In general, the first one, but the Polars API often allows for (and even prefers) the second when we're dealing with variadic arguments. For example, Most polars functions that accept variadic columns do something like the following (but not always):
def foo(a, *extra_args):
    if isinstance(a, list):
        args = a
        if extra_args:
            raise ValueError("Must supply single list or args individually")
    else:
        args = [a, *extra_args]
    return sum(args)

my_list = [1, 2, 3]
foo(1, 2, 3)  # 6
foo(my_list)  # 6
foo(*my_list)  # 6
This is fine, because
def foo(a, b=None, c=None):
    ...
then calling
My guess is that the decision to include variadics is to allow for generators as inputs to functions, as in:
my_func(x for x in my_list)
where instead of simply iterating a list we might be doing something more complex. But this ends up with some of these side-effects.
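A small plain-Python sketch of the pattern described in this comment, extended to accept generators. The helper names `collect_inputs`, `total` and `named` are hypothetical and are not Polars functions. The last part shows how unpacking into a signature with distinct named parameters spills values into `b` and `c`, which is my reading of the "argument overflow" concern in this thread.

```python
from collections.abc import Iterable


def collect_inputs(first, *more):
    """Hypothetical helper: accept f(a, b, c), f([a, b, c]) or f(generator)."""
    if isinstance(first, Iterable) and not isinstance(first, (str, bytes)):
        if more:
            raise ValueError("Must supply a single iterable or args individually")
        return list(first)
    return [first, *more]


def total(first, *more):
    # Homogeneous variadic inputs: every value plays the same role.
    return sum(collect_inputs(first, *more))


def named(a, b=None, c=None):
    # Heterogeneous parameters: each position means something different.
    return (a, b, c)


my_list = [1, 2, 3]

print(total(1, 2, 3))
print(total(my_list))
print(total(x * 2 for x in my_list))  # generator input
print(named(*my_list))  # values spill into b and c
```

For `total`, it does not matter which route the values take. For `named`, unpacking silently assigns the second and third values to parameters with different meanings.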
@mcrumiller @siddharth-vi thanks. In my opinion, generally, for consistency, we need to make a clear distinction -
The issue about Python keyword arguments should be fixed with the above in mind, in my opinion. @ritchie46 @orlp Do you have any clear statement about this subject? Thanks.
Checks
Reproducible example
Log output
No response
Issue description
I think this may actually be another form of: pl.all().list.count_matches results in a SchemaError #18968
It seems to be doing
Instead of
Expected behavior
I was expecting a literal duplicate error, but thought adding .name.keep() would be the same as:
Installed versions