fix: support various reductions in pyspark #1870

FBruzzesi · 2025-01-26T17:59:15Z

What type of PR is this? (check all applicable)

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

narwhals/_spark_like/utils.py

FBruzzesi · 2025-01-26T20:58:01Z

narwhals/_spark_like/dataframe.py

+            ]
+            return self._from_native_frame(self._native_frame.select(*new_columns_list))
+
+    def with_columns(


I just moved this closer to select method - it was easier to debug them while in the same screen 🙈

FBruzzesi · 2025-01-26T20:58:22Z

narwhals/_spark_like/expr.py

-            returns_scalar=self._returns_scalar or returns_scalar,
+            returns_scalar=returns_scalar,


This gave me so much headache before spotting it
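
For readers wondering why that one-line change matters, here is a toy sketch of one plausible reading (an assumption, the thread does not spell it out): OR-ing with the parent's flag makes `returns_scalar` sticky, so an expression derived from a reduction can never be marked as row-wise again. `ToyExpr`, `from_call_sticky` and `from_call_explicit` are hypothetical names for illustration, not the narwhals API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyExpr:
    """Hypothetical stand-in for an expression; not the narwhals class."""

    name: str
    returns_scalar: bool = False

    def from_call_sticky(self, name: str, *, returns_scalar: bool) -> "ToyExpr":
        # Old line: once the parent's flag is True, the child's flag is True too.
        return ToyExpr(name, self.returns_scalar or returns_scalar)

    def from_call_explicit(self, name: str, *, returns_scalar: bool) -> "ToyExpr":
        # New line: the caller alone decides the flag of the resulting expression.
        return ToyExpr(name, returns_scalar)


mean_a = ToyExpr("a").from_call_explicit("a.mean()", returns_scalar=True)

# `a.mean() - a` yields one value per row, so the caller passes returns_scalar=False.
sticky = mean_a.from_call_sticky("a.mean() - a", returns_scalar=False)
explicit = mean_a.from_call_explicit("a.mean() - a", returns_scalar=False)

print(sticky.returns_scalar)    # True: still treated as a reduction
print(explicit.returns_scalar)  # False
```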

MarcoGorelli · 2025-01-26T21:43:21Z

narwhals/_spark_like/dataframe.py

+            new_columns_list = [
+                col.over(Window.partitionBy(F.lit(1))).alias(col_name)
+                if _returns_scalar
+                else col.alias(col_name)
+                for (col_name, col), _returns_scalar in zip(
+                    new_columns.items(), returns_scalar
+                )
+            ]


i think this may be too late to set over

for example, in nw.col('a') - nw.col('a').mean() - I think it's in the binary operation __sub__ that nw.col('a').mean() needs to become nw.col('a').mean().over(lit(1))

as in, we want to translate nw.col('a') - nw.col('a').mean() to F.col('a') - F.col('a').mean().over(F.lit(1)). the code, however, as far as I can tell, translates it to (F.col('a') - F.col('a').mean()).over(F.lit(1))

That happens in maybe_evaluate, and you can see that the tests in reduction_test are now passing.
I know it's not ideal to have the logic for setting over in two places, but I couldn't figure out a single place in which to handle this, as maybe_evaluate is called only to evaluate the other arguments

ah I see! yes this might be fine then, thanks!
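
To make the two-places point concrete, here is a minimal sketch that renders expressions as strings rather than calling PySpark. It assumes, as described above, that `maybe_evaluate` wraps a scalar operand in a window before the binary operation, while a second step handles top-level reductions. `ToyExpr`, `finalize` and the `WINDOW` label are hypothetical; only the name `maybe_evaluate` comes from the thread, and its body here is a stand-in.

```python
from dataclasses import dataclass

# Text label only; no Spark session or real Window object is involved.
WINDOW = "over(partitionBy(lit(1)))"


@dataclass(frozen=True)
class ToyExpr:
    """Hypothetical expression that just carries a rendered string and a flag."""

    sql: str
    returns_scalar: bool = False

    def mean(self) -> "ToyExpr":
        return ToyExpr(f"mean({self.sql})", returns_scalar=True)

    def __sub__(self, other: "ToyExpr") -> "ToyExpr":
        # Place 1: the scalar operand is broadcast before it is combined.
        rhs = maybe_evaluate(other)
        return ToyExpr(f"({self.sql} - {rhs})", returns_scalar=False)


def maybe_evaluate(other: ToyExpr) -> str:
    """Wrap a reduction used as an argument in the window; leave others alone."""
    return f"{other.sql}.{WINDOW}" if other.returns_scalar else other.sql


def finalize(expr: ToyExpr) -> str:
    """Place 2: a top-level reduction is wrapped when the frame is built."""
    return f"{expr.sql}.{WINDOW}" if expr.returns_scalar else expr.sql


a = ToyExpr("a")
print(finalize(a - a.mean()))  # (a - mean(a).over(partitionBy(lit(1))))
print(finalize(a.mean()))      # mean(a).over(partitionBy(lit(1)))
```

In this toy model the window ends up on `mean(a)` only, not on the whole difference, which is the translation asked for above.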

MarcoGorelli

The group by simplification is wonderful 😻 thanks so much for doing this!

MarcoGorelli · 2025-01-26T23:13:55Z

narwhals/_spark_like/utils.py

        native_results: dict[str, list[Column]] = {}
+
+        # `returns_scalar` keeps track if an expression returns a scalar and is not lit.
+        # Notice that lit is quite special case, since it gets broadcasted by pyspark


Out of interest, do we run into any issues if we use over anyway with lit? No objections to special casing it, just curious

We end up with tests/expr_and_series/lit_test.py failing 3 tests due to:

pyspark.errors.exceptions.captured.AnalysisException: [UNSUPPORTED_EXPR_FOR_WINDOW] Expression "1" not supported within a window function.;

isn't this still going to break for, say

df.with_columns(nw.lit(2)+1)

?

Yes correct

Added a test case and now it works!
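
A rough sketch of the `lit` special case discussed above, again with strings instead of PySpark columns. It assumes each expression records a function name chain such as `lit->__add__` and that the check looks at the first segment; that naming convention, `ToyExpr`, and the helper names are assumptions for illustration, not confirmed narwhals internals.

```python
from dataclasses import dataclass

WINDOW = "over(partitionBy(lit(1)))"  # text label only, no Spark involved


@dataclass(frozen=True)
class ToyExpr:
    """Hypothetical expression: rendered string, call chain, scalar flag."""

    sql: str
    function_name: str
    returns_scalar: bool


def lit(value: object) -> ToyExpr:
    return ToyExpr(repr(value), "lit", returns_scalar=True)


def add(expr: ToyExpr, value: object) -> ToyExpr:
    # Adding a Python scalar keeps the flag and extends the call chain.
    name = f"{expr.function_name}->__add__"
    return ToyExpr(f"({expr.sql} + {value!r})", name, expr.returns_scalar)


def naive_needs_window(expr: ToyExpr) -> bool:
    # Compares the whole chain, so `lit->__add__` is not recognised as a literal.
    return expr.returns_scalar and expr.function_name != "lit"


def needs_window(expr: ToyExpr) -> bool:
    # Looks only at the root of the chain, so lit(2) + 1 is still a literal.
    root = expr.function_name.split("->", maxsplit=1)[0]
    return expr.returns_scalar and root != "lit"


def render(expr: ToyExpr) -> str:
    return f"{expr.sql}.{WINDOW}" if needs_window(expr) else expr.sql


print(render(lit(2)))                              # 2
print(render(add(lit(2), 1)))                      # (2 + 1)
print(render(ToyExpr("mean(a)", "col->mean", True)))  # mean(a).over(partitionBy(lit(1)))
print(naive_needs_window(add(lit(2), 1)))          # True
```

The last line shows the failure mode raised in the question: a check on the full name would still put `lit(2) + 1` inside a window.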

narwhals/_spark_like/utils.py

MarcoGorelli

nice, well done @FBruzzesi !

just one comment/question

this seems correct! or at least, I couldn't think of a way to break it

MarcoGorelli · 2025-01-27T09:51:53Z

cool, release? 🚀

FBruzzesi added 3 commits January 26, 2025 18:47

test and test env

c250928

working solution

a2c3679

Merge branch 'main' into feat/support-pyspark-reductions

6e07d05

FBruzzesi added the fix label Jan 26, 2025

FBruzzesi commented Jan 26, 2025

narwhals/_spark_like/utils.py (outdated, resolved)

FBruzzesi added the pyspark Issue is related to pyspark backend label Jan 26, 2025

MarcoGorelli reviewed Jan 26, 2025

narwhals/_spark_like/utils.py (outdated, resolved)

FBruzzesi added 2 commits January 26, 2025 21:37

move logic into select and with_columns

e27e38b

also the great group by refactor

1c91326

FBruzzesi commented Jan 26, 2025

FBruzzesi marked this pull request as ready for review January 26, 2025 21:01

MarcoGorelli reviewed Jan 26, 2025

MarcoGorelli mentioned this pull request Jan 27, 2025

fix: address & / | operator errors for PySpark / chore: use F.lit in maybe_evaluate for pyspark, like we do for duckdb #1872

Merged

10 tasks

FBruzzesi added 3 commits January 27, 2025 10:12

additional test

a1c8a2b

allow nw.lit(x) + y

3caf2b0

one duckdb xfail

598df7d

MarcoGorelli reviewed Jan 27, 2025

narwhals/_spark_like/utils.py (outdated, resolved)

MarcoGorelli approved these changes Jan 27, 2025

split func name in maybe_evaluate as well

a93c0fa

MarcoGorelli merged commit 702eea5 into main Jan 27, 2025
23 checks passed

MarcoGorelli deleted the feat/support-pyspark-reductions branch January 27, 2025 09:51

MarcoGorelli mentioned this pull request Jan 27, 2025

feat: support more scalar operations for duckdb, Increase width for ipython #1877

Merged

10 tasks
