chore: move scalar_funcs into spark-expr #712

Blizzara · 2024-07-24T10:51:39Z

Which issue does this PR close?

Part of #659

Rationale for this change

Moves scalar_funcs defined for Comet into spark-expr crate to facilitate use in other projects.

What changes are included in this PR?

How are these changes tested?

Existing CI

Blizzara · 2024-07-24T12:01:03Z

native/core/src/execution/datafusion/expressions/comet_scalar_funcs.rs

@@ -0,0 +1,186 @@
+// Licensed to the Apache Software Foundation (ASF) under one


This file is an extract of what used to be scalar_funcs.rs. The create_comet_physical_fun function isn't easily reusable for others, so it seems reasonable to keep it here.

It could be nice to provide ready-made ScalarUDFs for these in spark-expr, plus a function to register all of them into the session context, as DF's default functions do. However, the way these take in the output data_type makes that a tad challenging, so I didn't do it here.

This looks good. I wonder if having comet in the file name is a bit redundant though

It kinda is, but I had it there just to distinguish it from spark-expr/src/scalar_funcs.rs, and because what this file does is related to CometScalarUDFs 🤷

Blizzara · 2024-07-24T12:01:46Z

native/core/src/execution/datafusion/shuffle_writer.rs

@@ -1413,6 +1413,14 @@ impl RecordBatchStream for EmptyStream {
    }
 }

+fn pmod(hash: u32, n: usize) -> usize {


this was not used anywhere else so I moved it here
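
The hunk above shows only the signature of `pmod`. As a rough sketch of what a positive modulus over a 32-bit hash could look like (the body below is an assumption for illustration, not copied from the PR): the hash is reinterpreted as a signed 32-bit integer, as a JVM would see it, and a negative remainder is shifted back into `0..n`.

```rust
/// Sketch of a "positive modulus": maps a 32-bit hash onto `0..n`.
/// Assumption: the hash is reinterpreted as a signed i32, so the
/// remainder can be negative and has to be moved back into range.
/// Assumes 0 < n <= i32::MAX.
fn pmod(hash: u32, n: usize) -> usize {
    let hash = hash as i32; // bit-for-bit reinterpretation, may be negative
    let n = n as i32;
    let r = hash % n; // Rust's % keeps the sign of the dividend
    let result = if r < 0 { (r + n) % n } else { r };
    result as usize
}
```

Under these assumptions, `pmod(u32::MAX, 10)` yields 9, because `u32::MAX` reinterprets to -1. In a shuffle writer, a helper like this would presumably turn a row hash into a partition index.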

Blizzara · 2024-07-24T12:19:09Z

native/spark-expr/src/scalar_funcs/chr.rs

-}
-
-pub fn chr(args: &[ArrayRef]) -> Result<ArrayRef> {
+fn chr(args: &[ArrayRef]) -> Result<ArrayRef> {


These changes are not necessary, so I can revert them if that's preferable. However, given we already have the ScalarUDFImpl here, it seems like a waste not to use it (not using it also means the CometScalarUDF wraps a function that wraps a ScalarUDFImpl, which may have some nanoseconds of perf impact).

Blizzara · 2024-07-24T12:20:07Z

FYI @andygrove - thanks for doing the pre-work to split out the spark-exprs package!


andygrove · 2024-07-24T21:14:27Z

native/spark-expr/src/spark_hash.rs


 #[inline]
-pub(crate) fn spark_compatible_murmur3_hash<T: AsRef<[u8]>>(data: T, seed: u32) -> u32 {
+pub fn spark_compatible_murmur3_hash<T: AsRef<[u8]>>(data: T, seed: u32) -> u32 {


For consistency with other spark functions, we should probably rename this to spark_murmur3_hash

done! 6780bb6

Actually, I reverted the rename since this is different from spark_murmur3_hash, which we also have (that one operates on ColumnarValues, while this one handles a single value).

Preferably this wouldn't be pub but it's used in the core crate for non-expression stuff (like shuffles) so I think it has to be.
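
To make the "single value" variant concrete, here is a minimal sketch of Murmur3 (x86, 32-bit) over a byte slice. It assumes the tail handling that, to my knowledge, Spark uses: trailing bytes are sign-extended and mixed one at a time instead of being packed into one word. The name `murmur3_sketch` is hypothetical, and this is not the code from spark_hash.rs.

```rust
// Mixes one 32-bit block before it is folded into the running hash.
fn mix_k1(k1: u32) -> u32 {
    k1.wrapping_mul(0xcc9e_2d51)
        .rotate_left(15)
        .wrapping_mul(0x1b87_3593)
}

// Folds a mixed block into the running hash.
fn mix_h1(h1: u32, k1: u32) -> u32 {
    (h1 ^ k1)
        .rotate_left(13)
        .wrapping_mul(5)
        .wrapping_add(0xe654_6b64)
}

// Final avalanche step; `len` is the input length in bytes.
fn fmix(mut h: u32, len: u32) -> u32 {
    h ^= len;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Hypothetical single-value hash: full 4-byte little-endian words first,
/// then (assumed Spark behaviour) each leftover byte sign-extended and
/// mixed on its own.
fn murmur3_sketch(data: &[u8], seed: u32) -> u32 {
    let mut h1 = seed;
    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        let k1 = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        h1 = mix_h1(h1, mix_k1(k1));
    }
    for &b in chunks.remainder() {
        let k1 = b as i8 as i32 as u32; // sign-extend like a JVM byte
        h1 = mix_h1(h1, mix_k1(k1));
    }
    fmix(h1, data.len() as u32)
}
```

With this sketch, `murmur3_sketch(&1i32.to_le_bytes(), 42)` evaluates to `0xdea578e3`, which is -559580957 when read as a signed integer. A column-level function would apply the same per-value hash across a whole array, which is why the two kinds of function need distinct names.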

andygrove

LGTM. I left a couple comments on naming, and it would be nice if all the public functions had rustdocs, but I would be fine with merging this without those changes.

spark_compatible_xxhash64 -> spark_xxhash64

Blizzara · 2024-07-24T22:53:09Z

> LGTM. I left a couple comments on naming, and it would be nice if all the public functions had rustdocs, but I would be fine with merging this without those changes.

Thanks! I did the rename in 6780bb6, and added docs in ac29169 and 8bb99ea

We have separate spark_xxhash64 and spark_murmur3_hash functions that already match those names, so the renamed functions would collide with them.

codecov-commenter · 2024-07-27T15:09:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.60%. Comparing base (ded3dd6) to head (abf05cd).
Report is 5 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main     #712      +/-   ##
============================================
+ Coverage     33.57%   33.60%   +0.03%     
  Complexity      830      830              
============================================
  Files           110      110              
  Lines         42608    42564      -44     
  Branches       9352     9361       +9     
============================================
- Hits          14306    14304       -2     
+ Misses        25347    25300      -47     
- Partials       2955     2960       +5



Blizzara changed the title ~~chore: move scalar_funcs and some hashing stuff into spark-expr~~ chore: move scalar_funcs into spark-expr Jul 24, 2024

Blizzara commented Jul 24, 2024


Blizzara marked this pull request as ready for review July 24, 2024 12:09

Blizzara commented Jul 24, 2024


Blizzara added 5 commits July 24, 2024 14:33

move scalar_funcs and some hashing stuff into spark-expr

b008016

move create_comet_physical_fun back into core and make scalar funcs public

63298bf

make funcs for sha2 variants

8db466f

cleanup

4e53345

simplify chr func a bit

5feb777

Blizzara force-pushed the avo/move-scalar-funcs-to-spark-expr branch from 34fdec0 to 5feb777 Compare July 24, 2024 12:33

andygrove reviewed Jul 24, 2024


andygrove approved these changes Jul 24, 2024


Blizzara added 4 commits July 25, 2024 00:35

spark_compatible_murmur3_hash -> spark_murmur3_hash

6780bb6

spark_compatible_xxhash64 -> spark_xxhash64

fix build

ac29169

add docs

8bb99ea

add one more doc

525146a

Blizzara added 3 commits July 25, 2024 09:33

Merge branch 'main' into avo/move-scalar-funcs-to-spark-expr

4ecae8d

fix semantic merge conflict

4e1d084

revert the earlier naming change

abf05cd

We have separate spark_xxhash64 and spark_murmur3_hash functions that already match those names, so the renamed functions would collide with them.

andygrove merged commit b04baa5 into apache:main Jul 28, 2024
74 checks passed

Blizzara deleted the avo/move-scalar-funcs-to-spark-expr branch July 28, 2024 16:06

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024

chore: move scalar_funcs into spark-expr (apache#712)

d1f840b

(cherry picked from commit b04baa5)
