-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#610: SampleArray util method replaced by using arrow::compute::Take … #612
Conversation
…te::Take and pass corresponding indices to it. Note: 1)Commented code that was deprecated by the resolution of this issue 2) Note comments in sample_binary_array function related to result status.
…te::Take and pass corresponding indices to it - partial
cpp/src/cylon/util/arrow_utils.cpp
Outdated
auto arr_result = builder.Finish(); | ||
|
||
RETURN_ARROW_STATUS_IF_FAILED(arr_result.status()); | ||
|
||
auto indices = arr_result.ValueOrDie(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto arr_result = builder.Finish(); | |
RETURN_ARROW_STATUS_IF_FAILED(arr_result.status()); | |
auto indices = arr_result.ValueOrDie(); | |
ARROW_ASSIGN_OR_RAISE(auto indices, builder.Finish()); |
This is a convenient macro where arrow::Result
is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be done
cpp/src/cylon/util/arrow_utils.cpp
Outdated
auto sample_array = arrow::compute::Take(out, indices, take_options); | ||
|
||
RETURN_ARROW_STATUS_IF_FAILED(sample_array.status()); | ||
|
||
out = sample_array.ValueOrDie().make_array(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be done
cpp/src/cylon/util/arrow_utils.cpp
Outdated
@@ -166,6 +166,7 @@ arrow::Status Duplicate(const std::shared_ptr<arrow::Table> &table, arrow::Memor | |||
return arrow::Status::OK(); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove all commented out code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be done
…te::Take and pass corresponding indices to it - changes from PR review
…te::Take and pass corresponding indices to it - changes from build to using std:vector and std:sort. Vector data is moved to an arrow::Buffer::Wrap (wrapper) that can be used by the actual arrow::Array.
…te::Take and pass corresponding indices to it - changes from build to using std:vector and std:sort. Vector data is moved to an arrow::Buffer::Wrap (wrapper) that can be used by the actual arrow::Array (continued - adds call to WrapNumericVector function added by cylondata#615-cylondata#616)
…nges to table.cpp
cpp/src/cylon/util/arrow_utils.hpp
Outdated
cylon::Status SampleTableUniform(const std::shared_ptr<Table> &local_sorted, | ||
int num_samples, std::vector<int32_t> sort_columns, | ||
std::shared_ptr<Table> &sample_result, | ||
const std::shared_ptr<CylonContext> &ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use all arrow objects in this header, and at the call site convert the arrow table to cylon table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand why we would want to do this. Can you please elaborate on the reasoning.
cpp/src/cylon/util/arrow_utils.cpp
Outdated
std::vector<int64_t> vector_indices(num_samples); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should do a reserve here. Otherwise vector would instantiate num_samples
number of 0 in the vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just syntactic sugar. All utils in this header are accepting/ returning arrow objects. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok just pushed a change for this. Note my question on std::move. Since arrow::Table is an abstract class, std::move will not compile. Thoughts?
…ns and remove the incorrect initialization in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made some minor comments. Can you also run a code formatter with Google styling (100 chars)?
cpp/src/cylon/table.cpp
Outdated
auto status = util::SampleTableUniform(merged_table->get_table(), num_split_points, sort_columns, | ||
const_cast<std::shared_ptr<arrow::Table> &>(split_points->get_table()), ctx); | ||
|
||
return cylon::Status(static_cast<int>(status.code()), status.message()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mstaylor this could be problematic. Hmm... This will actually create a NPE AFAIU.
auto status = util::SampleTableUniform(merged_table->get_table(), num_split_points, sort_columns, | |
const_cast<std::shared_ptr<arrow::Table> &>(split_points->get_table()), ctx); | |
return cylon::Status(static_cast<int>(status.code()), status.message()); | |
std::shared_ptr<arrow::Table> atable; | |
RETURN_CYLON_STATUS_IF_ARROW_FAILS(util::SampleTableUniform(merged_table->get_table(), | |
num_split_points, sort_columns, atable, ctx); | |
return Table::FromArrowTable(ctx, std::move(atable), split_points); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed
cpp/src/cylon/table.cpp
Outdated
|
||
std::shared_ptr<Table> cylon_sample_result; | ||
|
||
Table::FromArrowTable(ctx, sample_result, cylon_sample_result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Table::FromArrowTable(ctx, sample_result, cylon_sample_result); | |
RETURN_CYLON_STATUS_IF_TAILED(Table::FromArrowTable(ctx, sample_result, cylon_sample_result)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed
@mstaylor BTW please resolve the conflicts as well 🙂 |
# Conflicts: # cpp/src/cylon/table.cpp
Yes - the next build includes that requested change. |
… applied google formatter per PR request. Validated tests complete successfully.
I've validated the build via the following command: python build.py --cpp --test --python --pytest |
…and pass corresponding indices to it. Note:
1)Commented code that was deprecated by the resolution of this issue
2) Note comments in sample_binary_array function related to result status.