-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Datasets] Add logical operator for map() #31912
Conversation
Signed-off-by: Cheng Su <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG mostly.
@@ -22,44 +23,99 @@ | |||
from typing_extensions import Literal | |||
|
|||
|
|||
class MapBatches(LogicalOperator): | |||
"""Logical operator for map_batches.""" | |||
class BaseMapLike(LogicalOperator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name looks strange, maybe LogicalMap or AbstractMap or MapInterface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jianoaix - sure, changed to AbstractMap
.
self._zero_copy_batch = zero_copy_batch | ||
|
||
|
||
class Map(BaseMapLike): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Map/MapRow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm still want to keep Map
, to be consistent as Dataset API name, to reduce our mental overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think need to use the same as public API though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, renamed to MapRows
, to be consistent with MapBatches
operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think the iteration API names are quite intuitive: iter_rows
and iter_batches
. The map
has been leading users to take it as the primary mapping API whereas map_batches
is recommended.
Signed-off-by: Cheng Su <[email protected]>
Signed-off-by: Cheng Su <[email protected]>
Signed-off-by: Cheng Su [email protected]
Why are these changes needed?
This PR is to add logical operator for
map()
, with change:(1).introduce
AbstractMap
abstract logical operator, thatMapBatches
,MapRows
, etc to extend.(2).rename existing
plan_map_batches_op
toplan_map_op
, to make it more general.Related issue number
Closes #31937.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.