
ILM shrink action does not retain creation_date #42206

Closed
GoodMirek opened this issue May 18, 2019 · 11 comments
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management

Comments

@GoodMirek
Contributor

Elasticsearch version (bin/elasticsearch --version):
6.7.2 (Elastic Cloud)

Plugins installed: [repository-s3]

JVM version (java -version):
not sure, as it is hosted by elastic

OS version (uname -a if on a Unix-like system):
not sure, as it is hosted by elastic

Description of the problem including expected versus actual behavior:
When an index is shrunk by ILM, the shrunk index does not retain the original index settings. Most importantly, it does not retain creation_date, which causes further ILM processing to not happen at the originally planned time.
Similar to elastic/curator#1347

Steps to reproduce (a sketch of the corresponding API calls follows this list):

  1. Create an ILM policy that shrinks an index in the warm phase
  2. Associate the policy with an existing index that is old enough to trigger the warm phase
  3. Observe the settings of the shrunk index
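A minimal sketch of these steps, assuming a hypothetical policy named shrink-warm and an existing index logs-000001 (all names and the min_age are placeholders; by default ILM names the shrunk index with a shrink- prefix):

PUT _ilm/policy/shrink-warm
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}

PUT logs-000001/_settings
{
  "index.lifecycle.name": "shrink-warm"
}

GET shrink-logs-000001/_settings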
@GoodMirek
Contributor Author

It seems the issue has already been fixed for the 6.8 release by this commit.
However, I am not able to validate that it works now.

@dakrone dakrone added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label May 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@dakrone
Member

dakrone commented May 18, 2019

The 6.7.2 release tag does copy the settings with the resize request:

Most importantly, it does not retain creation_date, which causes further ILM processing to not happen at the originally planned time.

Can you share more of your policy and example use case so I can understand what you're trying to do?

@GoodMirek GoodMirek changed the title ILM shrink action does not set copy_settings=true ILM shrink action does not retain creation_date May 19, 2019
@GoodMirek
Contributor Author

@dakrone Thanks for your answer. So there must be some other issue, as I still experience the described behavior.

Please find my ILM policy below:

{
  "logs-ilm-policy" : {
    "version" : 4,
    "modified_date" : "2019-05-18T19:25:43.876Z",
    "policy" : {
      "phases" : {
        "warm" : {
          "min_age" : "8d",
          "actions" : {
            "forcemerge" : {
              "max_num_segments" : 1
            },
            "set_priority" : {
              "priority" : 50
            },
            "shrink" : {
              "number_of_shards" : 1
            }
          }
        },
        "cold" : {
          "min_age" : "21d",
          "actions" : {
            "allocate" : {
              "number_of_replicas" : 0,
              "include" : { },
              "exclude" : { },
              "require" : {
                "data" : "warm"
              }
            },
            "set_priority" : {
              "priority" : 10
            }
          }
        },
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : {
          "min_age" : "100d",
          "actions" : {
            "delete" : { }
          }
        }
      }
    }
  }
}

What happens is that after the transition to the warm phase, index.creation_date is reset to the current time. The issue did not happen for indices that already had just one shard and thus were not shrunk by the shrink action.
Also, running POST test-1/_shrink/shrink-test-1?copy_settings=true manually retains the index.creation_date.
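A sketch of how that comparison can be made, reusing the index names from the request above and filter_path to trim the responses (the usual shrink prerequisites, a read-only source index with all shards co-located on one node, are assumed):

GET test-1/_settings?filter_path=*.settings.index.creation_date

POST test-1/_shrink/shrink-test-1?copy_settings=true

GET shrink-test-1/_settings?filter_path=*.settings.index.creation_date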

Use case

There are multiple logs indices. Some of them are created with a time pattern in their name, e.g. "logs_web.%{+xxxx.ww}", which creates a new index every week. It is important that none of the indices is written to again after 8 days from its creation. At that point, I want to speed up search and save memory, so I shrink all indices to one shard, forcemerge them to one segment, and set their priority to 50. Everything works well and the goal is achieved, but the issue with index.creation_date causes shrunk indices to transition to the cold and delete phases eight days later than indices that were not shrunk.
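For the weekly indices, the policy is presumably applied to new indices through an index template; a sketch using the 6.x legacy template API (the template name and index pattern are assumptions based on the naming scheme above):

PUT _template/logs_web
{
  "index_patterns": ["logs_web.*"],
  "settings": {
    "index.lifecycle.name": "logs-ilm-policy"
  }
}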

@dakrone
Member

dakrone commented May 29, 2019

Everything works well and the goal is achieved, but the issue with index.creation_date causes shrunk indices to transition to the cold and delete phases eight days later than indices that were not shrunk.

ILM doesn't use index.creation_date for determining when to transition to the next phase (like cold or delete). In the "init" phase for an index (injected as the first step in any policy) we set custom metadata for when the index was created, see:

This is what is used when determining whether to transition to a state:

Are you seeing your indices transition later than expected? I can try to reproduce this behavior and see whether there's a bug there.
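The date ILM compares against min_age can be inspected with the explain API; a minimal sketch with an illustrative index name, the response trimmed to the relevant fields and the values invented for illustration:

GET logs_web.2019.20/_ilm/explain

{
  "indices": {
    "logs_web.2019.20": {
      "index": "logs_web.2019.20",
      "managed": true,
      "policy": "logs-ilm-policy",
      "lifecycle_date_millis": 1558137600000,
      "phase": "warm",
      "action": "shrink",
      "step": "shrink"
    }
  }
}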

@GoodMirek
Contributor Author

GoodMirek commented May 30, 2019

Are you seeing your indices transition later than expected?

Yes, I do.

The issue is that lifecycle_date_millis is the same as creation_date. I do not think the issue was in the ILM lifecycle itself, but that the ILM shrink action reset both lifecycle_date_millis and creation_date. I cannot confirm whether the issue also happens in 6.8.0, as I have stopped shrinking indices altogether.

@dakrone
Member

dakrone commented Jun 4, 2019

I ran a test today to try and reproduce this:

First, setting the poll interval to 5 seconds and enabling TRACE logging for ILM:

PUT /_cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.xpack.core.indexlifecycle": "TRACE",
    "logger.org.elasticsearch.xpack.indexlifecycle": "TRACE",
    "indices.lifecycle.poll_interval": "5s"
  }
}

I created a policy that had only shrink and delete, waiting 1 minute to shrink and deleting when the index is 2 minutes old:

PUT _ilm/policy/shrink-only
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1m",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "delete": {
        "min_age": "2m",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

And then created an index using it:

PUT /shrink-test
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0,
    "index.lifecycle.name": "shrink-only"
  }
}

Here are excerpts from the logs.

First, the index is created, ILM is waiting for it to be at least 1 minute old before moving into the "warm" phase:

[elasticsearch] [2019-06-04T10:40:06,677][INFO ][o.e.c.m.MetaDataCreateIndexService] [node-0] [shrink-test] creating index, cause [api], templates [], shards [2]/[0], mappings []
...
[elasticsearch] [2019-06-04T10:40:06,829][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] checking for index age to be at least [1m] before performing actions in the "warm" phase. Now: 1559666406, lifecycle date: 1559666406, age: [159ms/0s]
...
[elasticsearch] [2019-06-04T10:40:36,760][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] checking for index age to be at least [1m] before performing actions in the "warm" phase. Now: 1559666436, lifecycle date: 1559666406, age: [30s/30s]

After being > 1 minute old, it moves to the warm phase and eventually shrinks at 10:41:16 (~70 seconds after it was created), creating the shrink-shrink-test index:

[elasticsearch] [2019-06-04T10:41:06,761][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] checking for index age to be at least [1m] before performing actions in the "warm" phase. Now: 1559666466, lifecycle date: 1559666406, age: [1m/60s]
...
[elasticsearch] [2019-06-04T10:41:16,767][DEBUG][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] moving to step [shrink-only] {"phase":"warm","action":"shrink","name":"wait-for-shard-history-leases"} -> {"phase":"warm","action":"shrink","name":"readonly"}
...
[elasticsearch] [2019-06-04T10:41:16,903][TRACE][o.e.x.i.ExecuteStepsUpdateTask] [node-0] [shrink-test] cluster state step condition met successfully (CheckShrinkReadyStep) [{"phase":"warm","action":"shrink","name":"check-shrink-allocation"}], moving to next step {"phase":"warm","action":"shrink","name":"shrink"}
...
[elasticsearch] [2019-06-04T10:41:16,922][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] maybe running async action step (ShrinkStep) with current step {"phase":"warm","action":"shrink","name":"shrink"}
[elasticsearch] [2019-06-04T10:41:16,923][DEBUG][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] running policy with async action step [{"phase":"warm","action":"shrink","name":"shrink"}]
...
[elasticsearch] [2019-06-04T10:41:16,964][INFO ][o.e.c.m.MetaDataCreateIndexService] [node-0] [shrink-shrink-test] creating index, cause [shrink_index], templates [], shards [1]/[0], mappings []

The original shrink-test index is deleted:

[elasticsearch] [2019-06-04T10:41:17,199][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node-0] [shrink-test/f2XRN2ZJRiKLHVMmSsDZfQ] deleting index

Now it waits for the shrink-shrink-test index to be 2 minutes old before moving into the delete phase:

[elasticsearch] [2019-06-04T10:41:17,290][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-shrink-test] checking for index age to be at least [2m] before performing actions in the "delete" phase. Now: 1559666477, lifecycle date: 1559666406, age: [1.1m/70s]

At 10:42:06 it moves into the delete phase, finally deleting the index at 10:42:11, 2 minutes and 5 seconds after the original index was created:

[elasticsearch] [2019-06-04T10:42:06,758][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-shrink-test] checking for index age to be at least [2m] before performing actions in the "delete" phase. Now: 1559666526, lifecycle date: 1559666406, age: [2m/120s]
[elasticsearch] [2019-06-04T10:42:06,758][DEBUG][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-shrink-test] moving to step [shrink-only] {"phase":"warm","action":"complete","name":"complete"} -> {"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}
[elasticsearch] [2019-06-04T10:42:06,758][TRACE][o.e.x.i.MoveToNextStepUpdateTask] [node-0] moving [shrink-shrink-test] to next step ({"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"})
...
[elasticsearch] [2019-06-04T10:42:11,777][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node-0] [shrink-shrink-test/0NTGfF_2SEK5islAHhrCrg] deleting index

The behavior you were describing sounds like it would only be deleted after 3 minutes instead of after 2. Are you able to reproduce this behavior in your environment?

@GoodMirek
Contributor Author

@dakrone Thanks for your effort.
What version of ES did you use for your test?

My version was 6.7.2, and my scenario was a bit different:

  1. I had existing indices without an ILM policy attached to them.
  2. I created an ILM policy.
  3. I attached the ILM policy to the existing indices (a sketch of this step follows below).
  4. Then I observed the issue described above.
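A sketch of step 3, attaching the existing policy to the pre-existing weekly indices in one call (the wildcard pattern is an assumption based on the naming scheme mentioned earlier):

PUT logs_web.*/_settings
{
  "index.lifecycle.name": "logs-ilm-policy"
}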

@dakrone
Member

dakrone commented Jun 4, 2019

@GoodMirek I was doing this with the master branch. Maybe it behaves differently for pre-existing indices; I'll try it with 6.7.2 and your scenario and see if I can reproduce it that way.

@dakrone
Member

dakrone commented Jun 10, 2019

I tested the scenario that you described @GoodMirek, but I was unable to reproduce any scenario where the date did not work correctly for shrunk indices.

@GoodMirek
Contributor Author

@dakrone Thanks for your effort. I have no further ideas why and how that happened.
