Security for _field_names field should not override field statistics #33261
In Lucene 8, the statistics for a field (doc_count, sum_doc_freq, ...) are checked, and invalid values (v < 0) are rejected. However, for the _field_names field we hide these statistics when security is enabled, because some terms (field names) may be filtered out. These statistics are never actually used: this field is not used for ranking and cannot be used to generate term vectors. For these reasons, this commit restores the original statistics for the field in order to be compliant with Lucene 8.
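To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use Lucene: `FieldStats`, `StatsDemo`, and `checkStats` are hypothetical stand-ins, `checkStats` models only the rule stated above (a negative statistic is rejected) rather than Lucene 8's actual validation code, and it assumes hidden statistics were reported as -1, the older "not available" marker.

```java
/**
 * Hypothetical model of two field-level statistics named in the PR.
 * This is NOT Lucene's Terms API; it only mirrors getDocCount()/getSumDocFreq().
 */
final class FieldStats {
    final int docCount;
    final long sumDocFreq;

    FieldStats(int docCount, long sumDocFreq) {
        this.docCount = docCount;
        this.sumDocFreq = sumDocFreq;
    }
}

final class StatsDemo {
    /** Assumed "hidden" value: -1, the old "statistic not available" marker. */
    static final FieldStats HIDDEN = new FieldStats(-1, -1L);

    /** Models only the rule described in the PR: a negative statistic is rejected. */
    static void checkStats(String field, FieldStats stats) {
        if (stats.docCount < 0 || stats.sumDocFreq < 0) {
            throw new IllegalArgumentException("invalid statistics for field [" + field + "]");
        }
    }

    /** Old behavior: security hides the statistics of _field_names. */
    static FieldStats hiddenStats(FieldStats original) {
        return HIDDEN;
    }

    /** New behavior: the original, unfiltered statistics are passed through. */
    static FieldStats originalStats(FieldStats original) {
        return original;
    }

    /** Returns true if the modelled check accepts these statistics. */
    static boolean accepted(FieldStats stats) {
        try {
            checkStats("_field_names", stats);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Only the pass-through variant survives the check, which is why restoring the original statistics is enough for compatibility once nothing downstream relies on the filtered numbers.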
Pinging @elastic/es-search-aggs
@@ -109,11 +114,13 @@ public CacheHelper getReaderCacheHelper() {
private final FieldInfos fieldInfos;
/** An automaton that only accepts authorized fields. */
private final CharacterRunAutomaton filter;
/** {@link Terms} cache with filtered stats for the {@link FieldNamesFieldMapper} field. */
private Terms fieldNamesFilterTerms;
let's make it final?
++
@@ -371,37 +375,47 @@ private Terms wrapTerms(Terms terms, String field) {
 * representing fields that should not be visible in this reader.
 */
class FieldNamesTerms extends FilterTerms {
long size = 0;
long sumDocFreq;
int docCount;
can we make them final somehow?
+1
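One common way to satisfy the "can we make them final" suggestion is to accumulate the values in local variables inside the constructor and assign each final field exactly once. The sketch below is hypothetical: `FilteredFieldNamesStats` is not an Elasticsearch class, a `Map` from field name to docFreq stands in for the `_field_names` term dictionary, and a `Predicate<String>` stands in for the `CharacterRunAutomaton` filter. `docCount` is deliberately left out because its computation is discussed separately in the thread.

```java
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical stand-in for FieldNamesTerms: statistics are computed once,
 * in the constructor, so the fields can be declared final.
 */
final class FilteredFieldNamesStats {
    final long size;       // number of authorized terms (field names)
    final long sumDocFreq; // sum of docFreq over the authorized terms

    FilteredFieldNamesStats(Map<String, Integer> docFreqByFieldName, Predicate<String> isAuthorized) {
        long terms = 0;
        long sum = 0;
        for (Map.Entry<String, Integer> entry : docFreqByFieldName.entrySet()) {
            // Skip field names the current user is not allowed to see.
            if (isAuthorized.test(entry.getKey())) {
                terms++;
                sum += entry.getValue();
            }
        }
        // Each final field is assigned exactly once, after the loop.
        this.size = terms;
        this.sumDocFreq = sum;
    }
}
```

The trade-off is that the term dictionary is scanned eagerly when the object is built, even if the statistics are never requested.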
while (e.next() != null) {
size++;
sumDocFreq += e.docFreq();
docCount = Math.max(e.docFreq(), docCount);
I don't think this is correct... Maybe we should assume docCount = maxDoc.
Oops, thanks. I changed it to return maxDoc instead.
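The reviewer's objection can be made concrete. A field's docCount is the number of documents that have at least one term in that field, so the largest docFreq of any single term is only a lower bound: if one document contains only the field `title` and another only `body`, each term has docFreq 1 while docCount is 2. An exact value needs the union of all postings, which is costly to compute; maxDoc is a cheap upper bound. The sketch below is self-contained and hypothetical (`DocCountDemo`, with postings modelled as a `Map<String, Set<Integer>>`), not the Elasticsearch code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical comparison of the two docCount estimates discussed in the review. */
final class DocCountDemo {
    /** Original patch: the largest docFreq of any single term (a lower bound only). */
    static int maxDocFreq(Map<String, Set<Integer>> postings) {
        int max = 0;
        for (Set<Integer> docs : postings.values()) {
            max = Math.max(max, docs.size());
        }
        return max;
    }

    /** Exact docCount: documents containing at least one term of the field. */
    static int exactDocCount(Map<String, Set<Integer>> postings) {
        Set<Integer> union = new HashSet<>();
        for (Set<Integer> docs : postings.values()) {
            union.addAll(docs);
        }
        return union.size();
    }
}
```

Any true docCount lies between `maxDocFreq(...)` and maxDoc, so returning maxDoc never reports fewer documents than actually have the field.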