Security for _field_names field should not override field statistics #33261

jimczi · 2018-08-30T07:52:55Z

In Lucene 8 the statistics for a field (doc_count, sum_doc_count, ...) are
checked and invalid values (v < 0) are rejected. For the _field_names field,
however, we hide the statistics when security is enabled, since some terms
(field names) may be filtered. These statistics are never used: the field is
not used for ranking and cannot be used to generate term vectors. For these
reasons this commit restores the original statistics for the field in order
to be compliant with Lucene 8.
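
To illustrate the failure mode described above, here is a minimal sketch of a check that rejects negative statistics. It is not Lucene's code: the class `FieldStatsCheck`, the method `requireValid`, and the `HIDDEN` sentinel of -1 are all hypothetical names chosen for this example, and the assumption that a hidden statistic is reported as a negative value is only inferred from the "(v < 0)" wording.

```java
// Hypothetical stand-in for a field statistics check; this is NOT Lucene's actual code.
final class FieldStatsCheck {

    // Assumed sentinel meaning "statistic hidden / not available".
    static final long HIDDEN = -1L;

    /** Returns the value unchanged if it is non-negative, otherwise rejects it. */
    static long requireValid(String name, long value) {
        if (value < 0) {
            throw new IllegalArgumentException(name + " must be >= 0, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // A real statistic passes through the check.
        System.out.println(requireValid("doc_count", 3L));
        // A hidden statistic is rejected.
        try {
            requireValid("doc_count", HIDDEN);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this assumption, any reader that reports a hidden statistic trips the check, which is why the field has to expose valid values again.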


elasticmachine · 2018-08-30T07:52:56Z

Pinging @elastic/es-search-aggs

jpountz · 2018-08-31T09:06:11Z

...c/main/java/org/elasticsearch/xpack/core/security/authz/accesscontrol/FieldSubsetReader.java

@@ -109,11 +114,13 @@ public CacheHelper getReaderCacheHelper() {
    private final FieldInfos fieldInfos;
    /** An automaton that only accepts authorized fields. */
    private final CharacterRunAutomaton filter;
+    /** {@link Terms} cache with filtered stats for the {@link FieldNamesFieldMapper} field. */
+    private Terms fieldNamesFilterTerms;


let's make it final?

jpountz · 2018-08-31T09:07:05Z

...c/main/java/org/elasticsearch/xpack/core/security/authz/accesscontrol/FieldSubsetReader.java

@@ -371,37 +375,47 @@ private Terms wrapTerms(Terms terms, String field) {
     * representing fields that should not be visible in this reader.
     */
    class FieldNamesTerms extends FilterTerms {
+        long size = 0;
+        long sumDocFreq;
+        int docCount;


can we make them final somehow?

jpountz · 2018-08-31T09:07:49Z

...c/main/java/org/elasticsearch/xpack/core/security/authz/accesscontrol/FieldSubsetReader.java

+            while (e.next() != null) {
+                size ++;
+                sumDocFreq += e.docFreq();
+                docCount = Math.max(e.docFreq(), docCount);


I don't think this is correct... Maybe we should assume docCount = maxDoc.

Oops, thanks. I changed it to return maxDoc instead.
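
A small Lucene-free model may help show why the largest docFreq is not the docCount. This is only a sketch: documents are modelled as sets of field names, and `DocCountBounds`, `docFreq`, `maxDocFreq`, and `exactDocCount` are hypothetical names that do not appear in the PR.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model: each document is the set of field names it contains,
// i.e. the terms it would contribute to the _field_names field.
final class DocCountBounds {

    /** Number of documents that contain the given field name (the term's docFreq). */
    static int docFreq(List<Set<String>> docs, String fieldName) {
        int freq = 0;
        for (Set<String> doc : docs) {
            if (doc.contains(fieldName)) {
                freq++;
            }
        }
        return freq;
    }

    /** Largest docFreq over all visible field names: the estimate questioned in the review. */
    static int maxDocFreq(List<Set<String>> docs, Predicate<String> visible) {
        int max = 0;
        for (Set<String> doc : docs) {
            for (String name : doc) {
                if (visible.test(name)) {
                    max = Math.max(max, docFreq(docs, name));
                }
            }
        }
        return max;
    }

    /** Exact docCount: documents that have at least one visible field name. */
    static int exactDocCount(List<Set<String>> docs, Predicate<String> visible) {
        int count = 0;
        for (Set<String> doc : docs) {
            if (doc.stream().anyMatch(visible)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Four documents; the last one only has a field the user may not see.
        List<Set<String>> docs = List.of(Set.of("a"), Set.of("b"), Set.of("a"), Set.of("secret"));
        Predicate<String> visible = name -> !name.equals("secret");
        System.out.println("max docFreq    = " + maxDocFreq(docs, visible));
        System.out.println("exact docCount = " + exactDocCount(docs, visible));
        System.out.println("maxDoc         = " + docs.size());
    }
}
```

In this model the largest docFreq undercounts as soon as documents carry different visible fields, while maxDoc can only overcount. Computing the exact value requires visiting the documents, which iterating over the terms alone does not give.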

jimczi · 2018-08-31T09:10:03Z

I discussed this with @jpountz and he proposed that we recompute the stats for this field rather than keep the wrong stats. This should be fast since the number of terms is bounded by the number of fields in the index, and the result can be cached per leaf reader. This idea is implemented in c3c8ca3.
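
The recompute-and-cache idea can be sketched without Lucene as follows. This is an illustration under stated assumptions, not the code from c3c8ca3: the term dictionary is modelled as a map from field name to docFreq, and `FilteredFieldNamesStats`, `LeafModel`, and the `computations` counter are hypothetical names. docCount is set to maxDoc, as suggested in the review.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical holder for the recomputed statistics of the filtered _field_names terms.
final class FilteredFieldNamesStats {
    final long size;       // number of visible terms (field names)
    final long sumDocFreq; // sum of docFreq over visible terms
    final int docCount;    // assumed equal to maxDoc, per the review suggestion

    private FilteredFieldNamesStats(long size, long sumDocFreq, int docCount) {
        this.size = size;
        this.sumDocFreq = sumDocFreq;
        this.docCount = docCount;
    }

    /** One pass over the terms; the cost is bounded by the number of field names. */
    static FilteredFieldNamesStats compute(Map<String, Integer> termDocFreqs,
                                           Predicate<String> visible, int maxDoc) {
        long size = 0;
        long sumDocFreq = 0;
        for (Map.Entry<String, Integer> entry : termDocFreqs.entrySet()) {
            if (visible.test(entry.getKey())) {
                size++;
                sumDocFreq += entry.getValue();
            }
        }
        return new FilteredFieldNamesStats(size, sumDocFreq, maxDoc);
    }
}

// Hypothetical stand-in for a per-segment (leaf) reader that caches the result.
final class LeafModel {
    private final Map<String, Integer> termDocFreqs;
    private final Predicate<String> visible;
    private final int maxDoc;
    private FilteredFieldNamesStats cached;
    int computations = 0; // only here to make the caching observable

    LeafModel(Map<String, Integer> termDocFreqs, Predicate<String> visible, int maxDoc) {
        this.termDocFreqs = termDocFreqs;
        this.visible = visible;
        this.maxDoc = maxDoc;
    }

    /** Computes the filtered statistics on first use and reuses them afterwards. */
    synchronized FilteredFieldNamesStats stats() {
        if (cached == null) {
            cached = FilteredFieldNamesStats.compute(termDocFreqs, visible, maxDoc);
            computations++;
        }
        return cached;
    }
}
```

Doing the pass in a static factory before the object is constructed also lets the statistic fields be final.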

jimczi · 2018-08-31T13:36:09Z

run gradle build tests

* master: (197 commits)
  Prevent NPE parsing the stop datafeed request. (elastic#33347)
  HLRC: Add ML get overall buckets API (elastic#33297)
  Core: Fix epoch millis java time formatter (elastic#33302)
  [Docs] Improve tuning for speed advice (elastic#33315)
  [Rollup] Fix Caps Comparator to handle calendar/fixed time (elastic#33336)
  [CI] Mute IndexShardTests#testIndexCheckOnStartup fails elastic#33345
  [CI] Mute LuceneChangesSnapshotTests#testUpdateAndReadChangesConcurrently
  Security for _field_names field should not override field statistics (elastic#33261)
  Add early termination support to BucketCollector (elastic#33279)
  Fix extractjar task ci (elastic#33272)
  Mute testFollowIndexAndCloseNode
  Logging: Drop Settings from some logging ctors (elastic#33332)
  HLREST: add update by query API (elastic#32760)
  TEST: Increase timeout testFollowIndexAndCloseNode (elastic#33333)
  HLRC: ML Flush job (elastic#33187)
  HLRC: Adding ML Job stats (elastic#33183)
  LLREST: Drop deprecated methods (elastic#33223)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (elastic#31118)
  Mute test watcher usage stats output
  ...

jimczi added >non-issue :Search/Search Search-related issues that do not fall into other categories v7.0.0 team-discuss labels Aug 30, 2018

jimczi mentioned this pull request Aug 30, 2018

Lucene 8 upgrade checklist #32899

Closed

18 tasks

jimczi added 2 commits August 31, 2018 09:49

Merge branch 'master' into field_names_field_subset

ce39dd7

recompute filtered stats for the field_names field

c3c8ca3

jpountz reviewed Aug 31, 2018


apply feedbacks

8c1d4cd

jimczi mentioned this pull request Aug 31, 2018

Upgrade to a Lucene 8 snapshot #33310

Merged

jimczi removed the team-discuss label Aug 31, 2018

Merge branch 'master' into field_names_field_subset

0079f98

jpountz approved these changes Aug 31, 2018


jimczi merged commit f0a61b6 into elastic:master Sep 3, 2018

jimczi deleted the field_names_field_subset branch September 3, 2018 07:36

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
