Frozen Indices #34352
Today if a wildcard, date-math expression or alias expands/resolves to an index that is search-throttled, we still search it. This is likely not the desired behavior since it can unexpectedly slow down searches significantly. This change adds a new indices option that allows `search`, `count` and `msearch` to ignore throttled indices by default. Users can force expansion to throttled indices by setting `ignore_throttled=false` on the REST request. Relates to elastic#34352
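A minimal Python sketch of the default behaviour described above, assuming a toy in-memory cluster. The `CLUSTER` and `ALIASES` tables and the `resolve` function are hypothetical, not Elasticsearch code. In this model, wildcard and alias expansions drop throttled indices unless `ignore_throttled` is false; date-math expressions are left out, and an explicitly named index is passed through unfiltered.

```python
from fnmatch import fnmatchcase

# Hypothetical cluster state: index name -> is the index search-throttled?
CLUSTER = {
    "logs-2018.10": True,   # frozen, therefore search-throttled
    "logs-2018.11": False,
    "logs-2018.12": False,
}

# Hypothetical alias table: alias -> concrete indices.
ALIASES = {"logs": ["logs-2018.10", "logs-2018.11", "logs-2018.12"]}


def resolve(expression, ignore_throttled=True, cluster=CLUSTER, aliases=ALIASES):
    """Resolve an index expression to a sorted list of concrete indices.

    Only wildcard and alias expansions are filtered; a concrete index name
    is returned as-is in this simplified model.
    """
    if expression in aliases:
        expanded = aliases[expression]
    elif any(ch in expression for ch in "*?"):
        expanded = [name for name in cluster if fnmatchcase(name, expression)]
    else:
        return [expression]
    return sorted(
        name for name in expanded
        if not (ignore_throttled and cluster.get(name, False))
    )


print(resolve("logs-*"))                          # throttled index skipped
print(resolve("logs-*", ignore_throttled=False))  # opt in to throttled indices
```

On a real cluster the opt-in is the `ignore_throttled=false` query parameter, for example `GET /logs-*/_search?ignore_throttled=false`.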
This change adds a `frozen` engine that lazily opens a directory reader on a read-only shard. The engine wraps general-purpose searchers in a `LazyDirectoryReader` that also allows the underlying index readers to be released after any search phase and reset before secondary search phases. Relates to elastic#34352
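The lazy-open/release idea can be sketched in a few lines of Python. `LazyReader` is a hypothetical toy, not the Elasticsearch `LazyDirectoryReader`: it opens the expensive resource only on first access and lets the caller release it between search phases, so nothing stays resident in between.

```python
class LazyReader:
    """Toy stand-in for a lazily opened reader (hypothetical, not the
    Elasticsearch LazyDirectoryReader)."""

    def __init__(self, open_fn):
        self._open_fn = open_fn  # callable that performs the expensive open
        self._reader = None
        self.open_count = 0      # how often the real reader was opened

    @property
    def is_open(self):
        return self._reader is not None

    def get(self):
        # Open on first use only; later calls reuse the open reader.
        if self._reader is None:
            self._reader = self._open_fn()
            self.open_count += 1
        return self._reader

    def release(self):
        # Close (if supported) and drop the reader so it no longer holds
        # resources; the next get() transparently re-opens it.
        if self._reader is not None:
            close = getattr(self._reader, "close", None)
            if callable(close):
                close()
            self._reader = None


# Usage: two search phases with a release in between.
reader = LazyReader(lambda: ["doc-1", "doc-2"])
query_phase_hits = len(reader.get())
reader.release()                     # nothing held between phases
fetch_phase_docs = reader.get()[:1]
reader.release()
```

The trade-off is the one the issue accepts: each phase may pay the cost of re-opening the reader in exchange for not holding it in memory.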
This change adds a high-level freeze API that allows an index to be marked as frozen and vice versa. Indices must be closed in order to become frozen, and an open but frozen index must be closed to be defrosted. This change also adds an `index.frozen` setting to mark frozen indices and integrates the frozen engine with the `SearchOperationListener`, which releases the directory reader after each search phase and resets it before the next one. Relates to elastic#34352. Depends on elastic#34357
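The state rules in this change can be modelled as a small state check. This is a hypothetical Python sketch (`IndexState`, `freeze`, `unfreeze` and `IllegalStateError` are illustrative names), where `frozen` stands in for the `index.frozen` setting.

```python
from dataclasses import dataclass


class IllegalStateError(Exception):
    """Raised when a freeze/unfreeze is attempted in the wrong state."""


@dataclass
class IndexState:
    name: str
    is_open: bool = True
    frozen: bool = False  # models the index.frozen setting


def freeze(index: IndexState) -> None:
    # Only a closed index can be marked as frozen.
    if index.is_open:
        raise IllegalStateError(f"{index.name} must be closed before freezing")
    index.frozen = True


def unfreeze(index: IndexState) -> None:
    # An open but frozen index must be closed before it can be defrosted.
    if index.is_open:
        raise IllegalStateError(f"{index.name} must be closed before unfreezing")
    index.frozen = False


idx = IndexState("logs-2018.10")
idx.is_open = False      # close the index
freeze(idx)
idx.is_open = True       # re-open it: now an open, frozen index
try:
    unfreeze(idx)
    unfreeze_while_open_failed = False
except IllegalStateError:
    unfreeze_while_open_failed = True
```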
@debadair where do you see the documentation for this feature going?
Pinging @elastic/es-distributed
I thought about some future work on this and wanted to put it here for awareness and potential discussion:
This change adds a special caching reader that caches all values a range query needs to rewrite correctly in the can_match phase, without actually opening the underlying directory reader. This allows frozen indices to be filtered with can_match and, in turn, searched efficiently with wildcards, since shards that won't match based on their date ranges can be excluded without opening their directory readers. Relates to elastic#34352. Depends on elastic#34357
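The pre-filtering idea can be sketched as a simple interval-overlap test. In this hypothetical Python model (`CachedShardStats`, `can_match` and `shards_to_search` are illustrative names), each shard caches only the min/max of one date field; the real caching reader may cache more than this, so treat it as a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedShardStats:
    """Per-shard values cached up front so can_match never opens a reader.
    Here only the min/max of one date field (epoch millis) are cached."""
    shard_id: int
    min_ts: int
    max_ts: int


def can_match(stats: CachedShardStats, gte: int, lte: int) -> bool:
    # The range [gte, lte] can only hit this shard if it overlaps
    # the shard's [min_ts, max_ts] interval.
    return stats.min_ts <= lte and gte <= stats.max_ts


def shards_to_search(shards, gte, lte):
    """Return the ids of shards that survive the can_match pre-filter."""
    return [s.shard_id for s in shards if can_match(s, gte, lte)]


shards = [
    CachedShardStats(0, min_ts=0, max_ts=99),
    CachedShardStats(1, min_ts=100, max_ts=199),
    CachedShardStats(2, min_ts=200, max_ts=299),
]
```

With a query range of 150 to 250 only shards 1 and 2 would be searched, and shard 0's reader is never opened.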
This commit adds a REST endpoint for freezing and unfreezing an index. Among other cleanups, it mainly fixes an issue with accessing package-private APIs from a plugin that was caught by integration tests. This change also adds documentation for frozen indices. Note: frozen indices are marked as `beta` and available as a basic feature. Relates to elastic#34352
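A minimal Python sketch of calling the new endpoints with the standard library, assuming an unsecured cluster at `http://localhost:9200` (an assumption) and an illustrative index name. The helper `freeze_request` is hypothetical; it only builds the `POST /<index>/_freeze` or `POST /<index>/_unfreeze` request and does not send it.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumption: local cluster, no auth/TLS


def freeze_request(index: str, unfreeze: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to /<index>/_freeze or /_unfreeze."""
    endpoint = "_unfreeze" if unfreeze else "_freeze"
    url = f"{BASE_URL}/{quote(index, safe='')}/{endpoint}"
    return urllib.request.Request(url, method="POST")


req = freeze_request("logs-2018.10")
# urllib.request.urlopen(req) would send it to a running cluster.
```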
This change adds support for `_freeze` and `_unfreeze` to the high-level REST client (HLRC). Relates to elastic#34352
When benchmarking frozen indices to get an idea of the overhead and added latency they introduce, I can see two common scenarios:

1) Dedicated cold nodes
Here all indices will be frozen, and due to the low heap usage it may be possible to hold a lot of data with a small heap. In order to efficiently use the available resources and allow parallel processing of frozen indices, I assume it may make sense to set the number of threads used for processing frozen indices to the number of CPU cores on the host. This naturally assumes there is enough heap available to support this.

2) Warm nodes
Frozen indices could also be used to let warm nodes handle larger amounts of data. Here the most recent and generally most queried data would be open, while older indices that are not queried frequently could be frozen. The ratio of frozen to open indices will naturally vary from use case to use case, but for the benchmark a 50% ratio might be a good starting point. In this scenario, a single thread for handling frozen indices is probably appropriate.

Does this make sense?

In addition to benchmarking these two scenarios, I would also like to benchmark indices that have and have not been force-merged down to a single segment, since restoration time may vary here. If we add a reference benchmark for each scenario with all indices open, we get a total of 8 benchmark runs.

Benchmark set-up
Test data would be generated using the rally-eventdata-track. A single index with 2 primary shards and around 50GB in size would probably be reasonably realistic. Snapshots would be created before and after force-merging it down to a single segment. This index can then be restored and renamed multiple times to reach a reasonable data volume (~1TB?). Querying would simulate the standard Kibana dashboards available in the rally-eventdata-track. As Kibana dashboards can require a good amount of processing, I would probably run querying in a single thread.

As querying using Rally is quite light, we should be able to run this on a single host. An AWS EC2 d2.xlarge instance might be a good choice, as it has enough disk space and uses the type of slower storage we often see for these node types.
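The run count and data volume in this plan can be checked with a short Python sketch. The scenario labels are paraphrased from the plan, and "~1TB" is taken as 1000GB, which is an assumption.

```python
from itertools import product

# Dimensions taken from the benchmark plan above.
NODE_SCENARIOS = ["dedicated cold nodes", "warm nodes (50% frozen)"]
SEGMENT_LAYOUTS = ["force-merged to 1 segment", "not force-merged"]
INDEX_STATES = ["frozen", "all open (reference)"]

RUNS = list(product(NODE_SCENARIOS, SEGMENT_LAYOUTS, INDEX_STATES))

# Data volume: one ~50GB index restored under new names until ~1TB is reached.
INDEX_SIZE_GB = 50
TARGET_GB = 1000  # "~1TB", taken as 1000GB here
COPIES_NEEDED = -(-TARGET_GB // INDEX_SIZE_GB)  # ceiling division

for run in RUNS:
    print(" | ".join(run))
```

Two node scenarios times two segment layouts gives four frozen runs, and one all-open reference per combination brings the total to the 8 runs mentioned above; reaching about 1TB needs roughly 20 restored copies of the 50GB index.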
I would like to keep the thread settings at their defaults for this. We can still test this, but that is for a later iteration.
In general I think this scenario is rare and benchmarking it will be tricky. I don't want to give the impression that this is a recommended setup, so I'd rather not advertise it too much.
This issue is basically done. I am keeping this open in order to publish numbers from benchmarks in the near future.
I have now run a first set of benchmarks for frozen indices where the number of indices in the cluster gradually increases from 1 to 18. Each index consists of 2 shards and is ~50GB in size and was generated and queried using the rally-eventdata-track. This was run against a build with the build hash Querying was done using two types of simulated Kibana queries:
For indices force-merged down to a single segment, two different heap sizes were used when querying frozen indices: 1GB and 15GB. For the larger heap, a reference run with all indices open was performed at every step. Each run consists of 20 queries with a varying time interval, and the 50th and 100th percentile service times are shown. The results are shown below in the form of Kibana screenshots from the Rally metrics store.

[Figure: Discover query, 15GB heap, force-merged indices, all indices open and not frozen]
[Figure: Discover query, 1GB/15GB heap, force-merged indices, all indices frozen]
[Figure: Content Issues Dashboard query, 15GB heap, force-merged indices, all indices open and not frozen]
[Figure: Content Issues Dashboard query, 1GB/15GB heap, force-merged indices, all indices frozen]
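To make the reported statistics concrete, here is a nearest-rank percentile in Python. Rally computes its own percentiles and may use a different method (for example interpolation), so this is only an illustration of what the 50th and 100th percentiles of a 20-query run mean. The service times below are made up, not benchmark results.

```python
import math


def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile of `samples` for 0 < p <= 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# 20 illustrative service times in ms (made up, not benchmark results).
service_times_ms = list(range(100, 2100, 100))
p50 = percentile_nearest_rank(service_times_ms, 50)
p100 = percentile_nearest_rank(service_times_ms, 100)  # equals the maximum
```

With only 20 samples per run, the 100th percentile is simply the slowest query, so it is sensitive to a single outlier.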
Thank you @cdahlqvist, great results. I will now close this issue. Thanks to everyone involved.
Given the benchmark results on elastic#34352, it's important to recommend that users `_force_merge` their indices to a single segment before freezing.
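The recommended order of operations can be sketched as two requests built with Python's standard library, assuming an unsecured cluster at `http://localhost:9200` (an assumption) and an illustrative index name. `force_merge_then_freeze` is a hypothetical helper; it uses the `_forcemerge` API with `max_num_segments=1` followed by `_freeze`, and only builds the requests.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumption: local cluster, no auth/TLS


def force_merge_then_freeze(index: str):
    """Return the requests to send, in order: merge to one segment, then freeze."""
    name = quote(index, safe="")
    return [
        urllib.request.Request(
            f"{BASE_URL}/{name}/_forcemerge?max_num_segments=1", method="POST"
        ),
        urllib.request.Request(f"{BASE_URL}/{name}/_freeze", method="POST"),
    ]


steps = force_merge_then_freeze("logs-2018.10")
# for req in steps: urllib.request.urlopen(req)  # send against a live cluster
```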
Today it's very difficult to see which indices are frozen (or rather, search-throttled) via the commonly used monitoring APIs. This change adds a column to the `_cat/indices` API that shows whether an index is `search.throttled`. Relates to elastic#34352
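A small Python sketch for picking throttled indices out of the `_cat/indices` text output. It assumes the request `GET _cat/indices?v&h=i,sth` (where `sth` is the short alias of the `search.throttled` column) and that the header row echoes the requested column names. The sample response is illustrative, not captured from a real cluster.

```python
def throttled_indices(cat_output: str):
    """Return index names whose `sth` (search.throttled) column is true.

    Expects plain-text output of `GET _cat/indices?v&h=i,sth`, i.e. a header
    row followed by whitespace-separated rows.
    """
    rows = [line.split() for line in cat_output.strip().splitlines()]
    header, data = rows[0], rows[1:]
    name_col, sth_col = header.index("i"), header.index("sth")
    return [row[name_col] for row in data if row[sth_col] == "true"]


# Illustrative response text, not captured from a real cluster.
SAMPLE = """
i            sth
logs-2018.10 true
logs-2018.11 false
"""
```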
@s1monw Maybe I missed the answer in this thread, but this is/won't be allowed? But it took me a while to figure out that
Frozen indices are intended to allow much higher ratios of disk storage to heap, at the expense of search latency. The idea is to keep frozen indices in an unloaded state (i.e. not loading Lucene data structures into heap) until they are searched. When a search targets frozen indices, each index would be searched sequentially (instead of in parallel), and each index would be loaded, searched, then unloaded again. Frozen indices would be replicated, unlike closed indices today. In fact, frozen indices are more like open indices with metadata in memory and data fully on disk until it needs to be loaded.
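The load, search, unload cycle described above can be modelled in Python with a single-worker executor so that indices are processed strictly one after another. `ToyFrozenIndex` and `search_frozen` are hypothetical names; the concurrency counter exists only to demonstrate that at most one index is active at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyFrozenIndex:
    """Hypothetical index whose data is only 'loaded' while it is searched."""

    def __init__(self, name, docs_on_disk):
        self.name = name
        self._docs_on_disk = docs_on_disk  # stands in for on-disk data
        self.loaded = None                 # nothing held in memory initially

    def load(self):
        self.loaded = list(self._docs_on_disk)

    def unload(self):
        self.loaded = None


def search_frozen(indices, term):
    """Search indices one at a time: load, search, unload each in turn."""
    lock = threading.Lock()
    state = {"active": 0, "max_active": 0}  # tracks concurrency for the demo

    def search_one(index):
        with lock:
            state["active"] += 1
            state["max_active"] = max(state["max_active"], state["active"])
        try:
            index.load()
            return [doc for doc in index.loaded if term in doc]
        finally:
            index.unload()
            with lock:
                state["active"] -= 1

    # A single worker thread serializes the per-index searches.
    with ThreadPoolExecutor(max_workers=1) as pool:
        results = list(pool.map(search_one, indices))
    hits = [hit for per_index in results for hit in per_index]
    return hits, state["max_active"]


indices = [
    ToyFrozenIndex("logs-2018.10", ["error: disk full", "ok"]),
    ToyFrozenIndex("logs-2018.11", ["error: timeout"]),
]
hits, max_parallel = search_frozen(indices, "error")
```

Peak memory is bounded by the largest single index rather than the sum of all targeted indices, which is the point of the design; the cost is latency that grows with the number of frozen indices searched.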
Frozen indices will be available in the default distribution but not in the pure OSS distribution.
In order to implement this, the feature is broken down into the following steps:
- … `index.search.throttled` that will force searches as well as other data access like `_get` or `_explain` through a dedicated `search_throttled` threadpool, which has only 1 thread by default. (Introduce a `search_throttled` threadpool #33732)
- … `_search` and `_msearch` by default. This will prevent frozen indices from being included in wildcard or alias searches by default, but allows opting in to searching these indices. At the same time, these indices are treated like ordinary indices in all other APIs. (Prevent throttled indices to be searched through wildcards by default #34354)
- … `can_match` phase in order to exclude frozen indices in `can_match` without opening up their index reader. (Allow efficient can_match phases on frozen indices #35431)
- … `_freeze` / `_unfreeze` API (#35592)
- … `_freeze` / `_unfreeze` (Add high-level REST client API for `_freeze` and `_unfreeze` #35723)
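The threadpool routing in the first step can be sketched in Python: the flat `index.search.throttled` setting decides whether a search, `_get` or `_explain` style task goes to a single-threaded `search_throttled` pool or to the regular `search` pool. `POOLS`, `pool_for` and `run_on_index` are hypothetical names, and the regular pool's size of 4 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pools: the issue gives search_throttled 1 thread by default;
# the size of the regular search pool here is arbitrary.
POOLS = {
    "search": ThreadPoolExecutor(max_workers=4, thread_name_prefix="search"),
    "search_throttled": ThreadPoolExecutor(
        max_workers=1, thread_name_prefix="search_throttled"
    ),
}


def pool_for(index_settings: dict) -> str:
    """Pick the pool name from the (flat) index.search.throttled setting."""
    throttled = str(index_settings.get("index.search.throttled", "false")).lower()
    return "search_throttled" if throttled == "true" else "search"


def run_on_index(index_settings: dict, task, *args):
    """Submit a search/_get/_explain-style task to the matching pool."""
    return POOLS[pool_for(index_settings)].submit(task, *args)
```

Because the throttled pool has a single worker, tasks against throttled indices queue up behind each other instead of competing with regular searches for threads.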