pl.all().list.count_matches results in a SchemaError #18968

Closed
2 tasks done
cmdlineluser opened this issue Sep 27, 2024 · 0 comments · Fixed by #19449

Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

@cmdlineluser (Contributor)

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({"a": [[1, 2]], "b": [[3, 4]]})

df.select(pl.all().list.count_matches(539))
# SchemaError: could not evaluate comparison between series 'a' of dtype: i64 and series '' of dtype: list[i64]

Log output

No response

Issue description

The query plan:

df.lazy().select(pl.all().list.count_matches(539)).explain()
# ' SELECT [col("a").list.count_matches([col("b"), dyn int: 539])] FROM\n  DF ["a", "b"]; PROJECT 2/2 COLUMNS; SELECTION: None'

It produces:

col(a).list.count_matches(col(b), lit(539))

Instead of:

col(a).list.count_matches(lit(539))
col(b).list.count_matches(lit(539))
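To make the difference between the two plans concrete, here is a toy model in plain Python. It is not Polars code: the render, expand_flat and expand_per_column helpers are hypothetical, expressions are plain strings, and "*" stands in for pl.all().

```python
# Toy model only: expressions are strings and "*" stands in for pl.all().
# None of these helpers exist in Polars; they are hypothetical.


def render(name, inputs):
    """Render a list-namespace call in the plan notation used above."""
    head, *args = inputs
    return f"{head}.list.{name}({', '.join(args)})"


def expand_flat(columns, name, inputs):
    """Observed behaviour: '*' is spliced into a single call's inputs."""
    flat = []
    for inp in inputs:
        if inp == "*":
            flat.extend(f"col({c})" for c in columns)
        else:
            flat.append(inp)
    return [render(name, flat)]


def expand_per_column(columns, name, inputs):
    """Expected behaviour: one call per column, '*' only in the first input."""
    head, *rest = inputs
    if head != "*":
        return [render(name, inputs)]
    return [render(name, [f"col({c})", *rest]) for c in columns]


cols = ["a", "b"]
print(expand_flat(cols, "count_matches", ["*", "lit(539)"]))
# ['col(a).list.count_matches(col(b), lit(539))']
print(expand_per_column(cols, "count_matches", ["*", "lit(539)"]))
# ['col(a).list.count_matches(lit(539))', 'col(b).list.count_matches(lit(539))']
```

In the flat version column b ends up as an extra argument of the call on column a, which matches the SchemaError: count_matches ends up comparing i64 elements of a against a list[i64] value.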

Other examples

.list.concat and the .list.set_* methods (#18795) also seem to be affected, causing incorrect results or dropped columns.

df.select(pl.all().list.concat(539))
# shape: (1, 1)
# ┌───────────────────┐
# │ a                 │ # expected: a [1, 2, 539], b [3, 4, 539]
# │ ---               │
# │ list[i64]         │
# ╞═══════════════════╡
# │ [1, 2, 3, 4, 539] │
# └───────────────────┘
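The concat output is consistent with the same mis-expansion. Assuming the concat plan is expanded the same way as the count_matches plan (i.e. col(b) and 539 both become extra inputs of the call on col(a); that plan is not shown here), a plain-Python illustration of the single row gives:

```python
# One row of the example frame, as plain Python lists.
row = {"a": [1, 2], "b": [3, 4]}

# Assumed shape of the mis-expanded plan: col(a).list.concat(col(b), 539).
# Column b is consumed as an argument, so only "a" survives.
observed = {"a": row["a"] + row["b"] + [539]}

# What pl.all().list.concat(539) is expected to do: append 539 per column.
expected = {name: values + [539] for name, values in row.items()}

print(observed)  # {'a': [1, 2, 3, 4, 539]}
print(expected)  # {'a': [1, 2, 539], 'b': [3, 4, 539]}
```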

Debugging

All of these cases seem to end up in this branch:

.contains(FunctionFlags::INPUT_WILDCARD_EXPANSION) =>
{
    *input = rewrite_projections(core::mem::take(input), schema, &[], opt_flags).unwrap();

That is, the problem only occurs when the INPUT_WILDCARD_EXPANSION flag is set.
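The effect of that flag can be sketched with a simplified Python model. This is an assumption-laden illustration, not the Rust implementation: FunctionFlags here is a hypothetical stand-in for the Rust bitflags, expand_call is a hypothetical helper, "*" stands for pl.all(), and the unflagged per-column path is assumed from the expected output above.

```python
from enum import Flag, auto


class FunctionFlags(Flag):
    """Hypothetical stand-in for the Rust bitflags named above."""
    NONE = 0
    INPUT_WILDCARD_EXPANSION = auto()


def expand_call(flags, inputs, columns):
    """Return the input lists of the call(s) one wildcard call expands into.

    Simplified model only; "*" stands for pl.all().
    """
    if FunctionFlags.INPUT_WILDCARD_EXPANSION in flags:
        # Flagged branch: every column is spliced into ONE input list,
        # which is what the quoted rewrite_projections call appears to do.
        spliced = []
        for inp in inputs:
            if inp == "*":
                spliced.extend(f"col({c})" for c in columns)
            else:
                spliced.append(inp)
        return [spliced]
    # Unflagged path (assumed): one call per column.
    return [[f"col({c})" if inp == "*" else inp for inp in inputs] for c in columns]


cols = ["a", "b"]
print(expand_call(FunctionFlags.INPUT_WILDCARD_EXPANSION, ["*", "lit(539)"], cols))
# [['col(a)', 'col(b)', 'lit(539)']]
print(expand_call(FunctionFlags.NONE, ["*", "lit(539)"], cols))
# [['col(a)', 'lit(539)'], ['col(b)', 'lit(539)']]
```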

Expected behavior

The initial example should produce a count for each column.

# shape: (1, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ u32 ┆ u32 │
# ╞═════╪═════╡
# │ 0   ┆ 0   │
# └─────┴─────┘

Installed versions

--------Version info---------
Polars:               1.8.2
Index type:           UInt32
Platform:             macOS-13.6.1-arm64-arm-64bit
Python:               3.12.2 (main, Feb  6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

cmdlineluser added the bug, needs triage, and python labels on Sep 27, 2024