-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: [sc-58279] [core] add tiledb_array_schema_get_enumeration
API
#5359
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just left some small comments, looks great so far 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me. @ypatia already suggested the REST test which is all I had thought of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! 👍
…r attribute name, and also use context to do the I/O
After using this branch in my While doing so I also noticed that So the most recent commit does the following:
I reckon this merits a re-review since it's a pretty big change. |
@@ -36,6 +36,7 @@ commence(object_library capi_array_schema_stub) | |||
this_target_sources(${SOURCES}) | |||
this_target_link_libraries(export) | |||
this_target_object_libraries(array_schema) | |||
this_target_object_libraries(array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because tiledb/sm/array_schema/array_schema.cc
calls upon tiledb/sm/array/array_directory.cc
. So now the unit test needs to link with that.
…b_array_schema_get_attribute_from_name
…ay-schema-get-enumeration
I think this is all set, CI passes and I don't intend to make any more changes. I'd recommend re-reviewing as this has drfited a bit since it was approved. Since the last review I determined I was unsatisfied with the "return the enumeration only if loaded" approach and have updated it to load the enumeration as part of the API request. The code was easy enough but the troubles come with avoiding circular dependencies. The other change since the last review is tweaking the API to be The good news is that my downstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Most comments were nits on updating docs since things have changed a bit since the first pass. I don't think there should be too many changes from my comments but if you want me to take another look let me know. Just as a heads up we should probably update the PR description before merging.
auto tracker = ctx.resources().ephemeral_memory_tracker(); | ||
// Pass an empty list of enumeration names. REST will use timestamps to | ||
// load all enumerations on all schemas for the array within that range. | ||
auto ret = rest_client.post_enumerations_from_rest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been great to have avoided the second REST request here, but probably this is the best we can do without changing REST.. I'm ok with this for the time being, #5181 is a WIP and once that's finished this can definitely happen in a single request. Also the rest.load_enumerations_on_array_open
option defaults to false so until that work is finished we won't make this second request unless the client asks for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't love this implementation, but... it gets the job done without having any obvious problems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -717,6 +717,32 @@ class Array { | |||
return ArraySchema(ctx, schema); | |||
} | |||
|
|||
/** | |||
* Loads the array schema from an array. | |||
* Options to load additional features are read from the optionally-provided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optionally-provided
} | ||
|
||
std::string enumeration_name(enumeration_name_inner->view()); | ||
return api_entry_with_context< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do return tiledb_array_schema_get_enumeration_from_name(ctx, array_schema, enumeration_name.c_str(), enumeration);
to shorten this a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly I think the answer is "no" without making changes elsewhere.
tiledb_string_t
has no c_str()
, just view()
and I was initially surprised to see that string_view
also has no c_str()
, though in retrospect it makes sense.
* @param enmr_name the requested enumeration | ||
* @param schema the target schema | ||
*/ | ||
void load_enumeration_into_schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this maybe go into array_schema_operations.cc
, or did you also run into problems there with dependencies? At a glance it seems like it could work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it won't work there either. I am pretty new to cmake but my impression is that all the array_schema/*.cc
files are compiled into the same "object library", hence putting this code in array_schema_operations.cc
would also have the circular dependency between array
and array_schema
|
||
namespace tiledb::test { | ||
|
||
bool is_equivalent_enumeration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this here, I'm likely going to use it when you merge
Co-authored-by: Shaun M Reed <[email protected]>
Co-authored-by: Shaun M Reed <[email protected]>
Co-authored-by: Shaun M Reed <[email protected]>
Co-authored-by: Shaun M Reed <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was sick Thursday and this went in before I could review it. I am sorry for the delayed comments, but the ones around the new C-API body would need to be addressed.
auto array_schema = serialization::array_schema_deserialize( | ||
serialization_type_, returned_data, memory_tracker_); | ||
|
||
array_schema->set_array_uri(uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some background on why this was needed? Some test failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this change came about when testing against the REST server. The REST server does not serialize the array URI (and I think it may also be different on the backend).
I will remove this line and run the unit tests to see what happened - probably will get to that later this week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this we get an error on the client side in tiledb::sm::load_enumeration_into_schema
.
if (array_schema.array_uri().is_tiledb()) {
/* it's remote, talk to REST server */
} else {
/* it's local, read filesystem */
}
If we don't set the array URI here then it is blank. As a result is_tiledb
is false and the schema tries to read the local filesystem, which results in error when it tries to open the empty path.
auto tracker = ctx.resources().ephemeral_memory_tracker(); | ||
// Pass an empty list of enumeration names. REST will use timestamps to | ||
// load all enumerations on all schemas for the array within that range. | ||
auto ret = rest_client.post_enumerations_from_rest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#5359 was merged and then there were some additional changes requested. This pull request implements those changes. - [fix a comment typo](#5359 (comment)) - [avoid nested C API calls and fix memory leak](#5359 (comment)) - [add a test with schema evolution](#5359 (comment)) --- TYPE: BUG DESC: Follow up on #5359
Story details: https://app.shortcut.com/tiledb-inc/story/58279
This pull request goes all the way and adds functions
tiledb_array_schema_get_enumeration_from_name
andtiledb_array_schema_get_enumeration_from_attribute_name
which load the contents of an enumeration and return a handle to the user.See this comment for further discussion.
TYPE: FEATURE | C_API | CPP_API
DESC: add tiledb_array_schema_get_enumeration