-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query Condition NOT support #3844
Conversation
This pull request has been linked to Shortcut Story #24503: Implement QueryCondition NOT support. |
585c838
to
417ba87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation somewhere should explain how NOT is realized with this implementation (though I suppose that is true for all of query_condition). At least the PR should explain a bit more than it is about NOT.
IIUC it is using DeMorgan's theorem to (recursively) convert a subtree to its complement. Kind of interesting how little needs to be done to implement NOT.
That being said -- the negate is applied while the AST is being built (in the C API). Is evaluating NOT while the tree is being built the best way to implement NOT? One thing we lose when doing it this way is that we can no longer recover the original query expression from the AST. Do we lose any ability to optimize or debug if we partially evaluate expressions this way?
Is support for NOT complete? Lines 1118, 1721, and 2429 in query_condition.cc still have cases in their switch statements that have NOT as an error (not currently supported). I suppose they don't really matter with this implementation of NOT, but in that case, there should be a different error message, because having NOT in the AST is impossible.
NOT will not be in the AST because it is evaluated while the tree is being built -- in the C API. Even if it seems like a good idea to partially evaluate NOT expressions, only being able to do that in the C API seems kind of limiting.
There are not very many unit tests (afaict) for NOT -- and there really should be many more. There do seem to be some tests of negate()
and those only seem to be at the top level, i.e., negating an entire expression and looking at the results. But there aren't any with QueryConditionCombinationOp::NOT
. There should be more complicated test expressions where NOT is not the root node. Similarly there should be tests of NOT NOT, although maybe those are really integration tests. But, per earlier comment, the NOT is not in the expression, so the tests have to consider a partially-evaluated AST, the NOT is gone.
Does NOT of a value ever make any sense? Do users expect to be able to? Perhaps on some type they are using as a bool?
delete (*combined_cond)->query_condition_; | ||
delete *combined_cond; | ||
return TILEDB_ERR; | ||
if (combination_op == TILEDB_NOT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful here to have a comment that we are partially evaluating the tree and applying NOT via DeMorgan's theorem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
sanity_check(ctx, right_cond) == TILEDB_ERR) | ||
(combination_op != TILEDB_NOT && | ||
sanity_check(ctx, right_cond) == TILEDB_ERR) || | ||
(combination_op == TILEDB_NOT && right_cond != nullptr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I like using one half of a binary operator to be a unary operator. OTOH, not sure if it is worth changing if all we are going to have is AND, OR, NOT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this same exact thought. My first reaction was to do away with the existing implementation and then provide a tiledb_query_condition_negate
function. I polled the language binding crowd and the only responses I got were in favor of keeping things as is, reusing tiledb_query_condition_combine
.
If I had it to do all over again, I would change the combine API from:
TILEDB_EXPORT int32_t tiledb_query_condition_combine(
tiledb_ctx_t* ctx,
const tiledb_query_condition_t* left_cond,
const tiledb_query_condition_t* right_cond,
tiledb_query_condition_combination_op_t combination_op,
tiledb_query_condition_t** combined_cond) TILEDB_NOEXCEPT;
to:
TILEDB_EXPORT int32_t tiledb_query_condition_combine(
tiledb_ctx_t* ctx,
const tiledb_query_condition_t** conditions,
size_t num_conditions,
tiledb_query_condition_combination_op_t combination_op,
tiledb_query_condition_t** combined_cond) TILEDB_NOEXCEPT;
The second definition removes the baked in 2-arity semantics of the combination_op
so that NOT
and >2
-arity AND/OR operators would also be naturally expressed. I briefly considered combining this approach with #3814 but @eric-hughes-tiledb had some other concerns around the general approach of #3814 so I'm leaving it be for now.
CHECK( | ||
tiledb::test::ast_node_to_str(combined_and.ast()) == | ||
"(x LT 12 ef cd ab AND y GT 33 33 33 33)"); | ||
check_ast_str(combined_and, "(x LT 12 ef cd ab AND y GT 33 33 33 33)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests negate()
but I don't see anything in this file that tests NOT
. It doesn't seem like we are actually testing expressions with NOT
in them, but rather just testing negative()
on different expressions, meaning what is being tested is only NOT
at the root of a tree. I think we need to test more complicated uses of NOT
than that.
If NOT
is only going to be applicable from the C API, it needs to be tested there.
(My own opinion is that NOT
should not be evaluated only in the C API and that we should be able to build expressions that have QueryConditionCombinationOp::NOT
in them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current implementation/assumption that NOT can't exist in the AST, there's nothing extra to test. I do need to add C/C++ API tests to prove the the changes in tiledb.cc are correct. The current tests are just covering correctness.
That's fine. Though, I'm not sure where best to put that. I can certainly add a comment in the code along with a similar write up in the PR for the code history. If there's another place in documentation about query conditions and the AST that needs updating I can certainly do that as well. Though I'm guessing this sort of documentation is fairly diffuse in each of the language binding implementations of query conditions.
Correct, its a standard negation.
Not only do I have no idea, I suspect this is vaguely unanswerable in a philosophical manner. I would downgrade to a simple "Not off the top of my head" if we relaxed it to "Can you think of a better way to implement NOT?" though.
This also seems a bit philosophical. Given the existing implementation, when we receive the negated AST, as far as the library is concerned, that is the original query. I'd also push back (at least within the given implementation) that this is a partial evaluation. The current implementation has two distinct phases: build the ast/query condition, then evaluate it. Currently, negation is a concept in the build phase and doesn't exist at all in the evaluation phase. For comparison, when we AND two AND clauses together, i.e.,
Define "finished", I guess? These three error messages are a bit belt and suspender-y (or belt and bracer-y for my UK English friends) defensive coding. Going back and preventing any of the constructors from allowing a NOT to exist in the AST and adding assertions/stdx::unreachable/std::logic_error exceptions might make things more clear, though functionally the same in that NOT should not exist in the AST (as things are currently implemented).
I'm not 100% certain I understand this part. Applying NOT conditions can be done from the internal API, the C API, and the C++ API (which should by extension cover all language bindings). This PR is just allowing for the usage in the C API which is then automatically available to the C++ API.
For the test coverage part of this code, the reason it looks like there are so few and shallow tests is probably due to how I implemented the tests by re-using a bunch of the existing tests. Basically, I just gave the The NOT NOT tests are a good point. I'll add those to be explicit. Though, technically, I don't think they add anything other than an explicit assertion of correctness via demonstration of the isomorphism of De Morgan's laws? The reason there aren't any tests with
Currently, I don't think so because 100% of our conditionals are of the form A thought on why all of the above might be completely wrongOne interesting thought I had while writing everything above is that our current AST/query condition implementation is significantly faster when applying AND combinations as opposed to OR combinations. If we allowed NOT into the AST, I'm pretty sure we could use De Morgan's laws to implement any arbitrary AST as a combination of AND and NOT instead of the current AND and OR approach. With the assumption that NOT would be easier/faster to implement than OR, I could see switching things around for that particular case. Come to think of it, I don't think we even need to allow NOT in the tree as the current implementation of NOT should work to satisfy De Morgan's laws. I may have just been nerd sniped into poking at the current implementation to rewrite the tree at evaluation time to see if that actually works. |
tl;dr I’m pretty sure the counting bitmap aspect of apply sparse prevents NOT from being allowed in the AST for at least that evaluation path. So, I tried implementing the NAND approach I described above and ran into two issues. The first is that our handling of NULL comparisons is not symmetric with respect to negation. For instance, we can’t blindly invert the result of “foo < 5” when foo is null. Although we can blindly invert “foo == NULL” so we can’t also just blindly not invert when foo’s validity buffer is zero. I’m pretty sure this could be overcome with fairly minor changes to the existing implementation. However, the second issue is how our “counting bitmaps” work in the apply sparse implementation. If I’m not mistaken, this logic isn’t invertible at all based on how the counters interact. Perhaps there’s a clever solution here, but I’m pretty sure we have an insurmountable issue around false results truncating our boolean signal by setting the result bitmap to zero. For the first issue, I can see multiple approaches. Having a separate validity buffer that we carefully update in multiple places to indicate when negation is appropriate is the simplest. Then there’s an obvious optimization around merging the invertibility boolean array into the result bitmap by treating it as a 8 boolean flags per cell instead of uint8 values of zero or one. Granted, that approach doesn’t help at all for the sparse version. For the sparse version, I could maybe see some sort of signed counting bitmap maybe something something perhaps? I’m not even convinced this is plausible. This half formed vague idea should probably be ignored. The end result being, I don’t think we can have negations in this evaluation path. |
417ba87
to
dd9ef69
Compare
dd9ef69
to
7396a8b
Compare
7396a8b
to
097aa8f
Compare
097aa8f
to
4902c00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Implementation of QC NOT support.
Things changed:
Things to do
TYPE: FEATURE
DESC: Query Condition NOT support