ORC-1796: [C++] fix return wrong result if lack of hasnull
This PR fixes a bug in the C++ reader: if the column statistics in an ORC file are not fully written and lack the hasNull field, a read may return a wrong result.
For example, take a file with schema struct<string col1, string col2> and 10 rows, where every col1 value is set and every col2 value is null. The statistics Trino writes for col1 may be
numberOfValues: 10
stringStatistics {
  minimum: "10"
  maximum: "100"
  sum: 565
}
and for col2 just numberOfValues: 0. Neither includes the hasNull field. Because an unset hasNull reads as false, a query for the rows where col2 is null returns nothing.
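The failure mode can be sketched in isolation. The snippet below is a simplified illustration, not the ORC library's code: `ColumnStats`, `evaluateIsNull`, and the local `TruthValue` enum are hypothetical stand-ins, and an empty `std::optional` models statistics written without the hasNull field. It assumes C++17.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for illustration only; these are not ORC's types.
enum class TruthValue { YES, NO, YES_NO, YES_NO_NULL };

struct ColumnStats {
  std::optional<bool> hasNull;  // empty when the writer omitted the field
  uint64_t numberOfValues = 0;  // count of non-null values
};

// Evaluate "col IS NULL" against column statistics alone.
TruthValue evaluateIsNull(const ColumnStats& stats) {
  // Without hasNull the statistics cannot rule anything out,
  // so return the most conservative answer and let the data be read.
  if (!stats.hasNull.has_value()) {
    return TruthValue::YES_NO_NULL;
  }
  const bool hasNull = *stats.hasNull;
  const bool allNull = hasNull && stats.numberOfValues == 0;
  if (allNull) {
    return TruthValue::YES;  // every row is null
  }
  // Some rows may be null, or none are.
  return hasNull ? TruthValue::YES_NO : TruthValue::NO;
}
```

If the missing field were instead treated as `false`, the col2 statistics from the example (numberOfValues: 0, no hasNull) would evaluate to `NO`, and the reader would skip data that is in fact entirely null.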

Without this fix, users may get a wrong result.

Add unit tests.

No

Closes #2055 from shuai-xu/2054.

Authored-by: shuai-xu <[email protected]>
Signed-off-by: Gang Wu <[email protected]>
shuai-xu authored and wgtmac committed Oct 31, 2024
1 parent 92e0dcc commit c6e7f28
Showing 2 changed files with 15 additions and 0 deletions.
3 changes: 3 additions & 0 deletions c++/src/sargs/PredicateLeaf.cc
@@ -701,6 +701,9 @@ namespace orc {
}
}

// files written by trino may lack of hasnull field.
if (!colStats.has_has_null()) return TruthValue::YES_NO_NULL;

bool allNull = colStats.has_null() && colStats.number_of_values() == 0;
if (operator_ == Operator::IS_NULL ||
((operator_ == Operator::EQUALS || operator_ == Operator::NULL_SAFE_EQUALS) &&
12 changes: 12 additions & 0 deletions c++/test/TestPredicateLeaf.cc
@@ -168,6 +168,12 @@ namespace orc {
return colStats;
}

static proto::ColumnStatistics createIncompleteNullStats() {
proto::ColumnStatistics colStats;
colStats.set_number_of_values(0);
return colStats;
}

static TruthValue evaluate(const PredicateLeaf& pred, const proto::ColumnStatistics& pbStats,
const BloomFilter* bf = nullptr) {
return pred.evaluate(WriterVersion_ORC_135, pbStats, bf);
@@ -663,4 +669,10 @@ namespace orc {
evaluate(pred8, createTimestampStats(2114380800, 1109000, 2114380800, 6789100)));
}

TEST(TestPredicateLeaf, testLackOfSataistics) {
PredicateLeaf pred(PredicateLeaf::Operator::IS_NULL, PredicateDataType::STRING, 1, {});
EXPECT_EQ(TruthValue::YES_NO, evaluate(pred, createStringStats("c", "d", true)));
EXPECT_EQ(TruthValue::YES_NO_NULL, evaluate(pred, createIncompleteNullStats()));
}

} // namespace orc
