
ILM shrink action does not retain creation_date #42206

Closed
GoodMirek opened this issue May 18, 2019 · 11 comments
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management

Comments

@GoodMirek
Contributor

Elasticsearch version (bin/elasticsearch --version):
6.7.2 (Elastic Cloud)

Plugins installed: [repository-s3]

JVM version (java -version):
not sure, as it is hosted by elastic

OS version (uname -a if on a Unix-like system):
not sure, as it is hosted by elastic

Description of the problem including expected versus actual behavior:
When an index is shrunk by ILM, the shrunk index does not retain the original index settings. Most importantly, it does not retain creation_date, which causes further ILM processing to not happen at the originally planned time.
Similar to elastic/curator#1347

Steps to reproduce (a sketch of the corresponding API calls follows this list):

  1. Create an ILM policy that shrinks an index in the warm phase
  2. Associate the policy with an existing index that is old enough to trigger the warm phase
  3. Observe the settings of the shrunk index
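A minimal sketch of these steps, assuming a hypothetical policy named shrink-warm and an existing index logs-000001 (all names and the min_age are placeholders; by default ILM names the shrunk index with a shrink- prefix):

PUT _ilm/policy/shrink-warm
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}

PUT logs-000001/_settings
{
  "index.lifecycle.name": "shrink-warm"
}

GET shrink-logs-000001/_settings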
@GoodMirek
Contributor Author

It seems the issue has already been fixed for the 6.8 release by this commit.
However, I am not able to validate that it works now.

@dakrone dakrone added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label May 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@dakrone
Member

dakrone commented May 18, 2019

The 6.7.2 release tag does copy the settings with the resize request:

Most importantly, it does not retain creation_date, which causes further ILM processing to not happen at the originally planned time.

Can you share more of your policy and example use case so I can understand what you're trying to do?

@GoodMirek GoodMirek changed the title ILM shrink action does not set copy_settings=true ILM shrink action does not retain creation_date May 19, 2019
@GoodMirek
Contributor Author

@dakrone Thanks for your answer. So there must be some other issue, as I still experience the described behavior.

Please find my ILM policy below:

{
  "logs-ilm-policy" : {
    "version" : 4,
    "modified_date" : "2019-05-18T19:25:43.876Z",
    "policy" : {
      "phases" : {
        "warm" : {
          "min_age" : "8d",
          "actions" : {
            "forcemerge" : {
              "max_num_segments" : 1
            },
            "set_priority" : {
              "priority" : 50
            },
            "shrink" : {
              "number_of_shards" : 1
            }
          }
        },
        "cold" : {
          "min_age" : "21d",
          "actions" : {
            "allocate" : {
              "number_of_replicas" : 0,
              "include" : { },
              "exclude" : { },
              "require" : {
                "data" : "warm"
              }
            },
            "set_priority" : {
              "priority" : 10
            }
          }
        },
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : {
          "min_age" : "100d",
          "actions" : {
            "delete" : { }
          }
        }
      }
    }
  }
}

What happens is that after the transition to the warm phase, index.creation_date is reset to the current time. The issue did not happen for indices that already had just one shard and thus were not shrunk by the shrink action.
Also, running POST test-1/_shrink/shrink-test-1?copy_settings=true manually retains the index.creation_date.
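A sketch of how that comparison can be made, reusing the index names from the request above and filter_path to trim the responses (the usual shrink prerequisites, a read-only source index with all shards co-located on one node, are assumed):

GET test-1/_settings?filter_path=*.settings.index.creation_date

POST test-1/_shrink/shrink-test-1?copy_settings=true

GET shrink-test-1/_settings?filter_path=*.settings.index.creation_date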

Use case

There are multiple logs indices. Some of them are created with a time pattern in their name, e.g. "logs_web.%{+xxxx.ww}", which creates a new index every week. It is important that none of the indices is written to again after 8 days from its creation. At that point, I want to speed up search and save memory, so I shrink all indices to one shard, forcemerge them to one segment, and set their priority to 50. Everything works well and the goal is achieved, but the issue with index.creation_date causes shrunk indices to transition to the cold and delete phases eight days later than indices that were not shrunk.
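For the weekly indices, the policy is presumably applied to new indices through an index template; a sketch using the 6.x legacy template API (the template name and index pattern are assumptions based on the naming scheme above):

PUT _template/logs_web
{
  "index_patterns": ["logs_web.*"],
  "settings": {
    "index.lifecycle.name": "logs-ilm-policy"
  }
}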

@dakrone
Member

dakrone commented May 29, 2019

Everything works well and the goal is achieved, but the issue with index.creation_date causes shrunk indices to transition to the cold and delete phases eight days later than indices that were not shrunk.

ILM doesn't use index.creation_date for determining when to transition to the next phase (like cold or delete). In the "init" phase for an index (injected as the first step in any policy) we set custom metadata for when the index was created, see:

This is what is used when determining whether to transition to a state:

Are you seeing your indices transition later than expected? I can try to reproduce this behavior and see whether there's a bug there.
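The date ILM compares against min_age can be inspected with the explain API; a minimal sketch with an illustrative index name, the response trimmed to the relevant fields and the values invented for illustration:

GET logs_web.2019.20/_ilm/explain

{
  "indices": {
    "logs_web.2019.20": {
      "index": "logs_web.2019.20",
      "managed": true,
      "policy": "logs-ilm-policy",
      "lifecycle_date_millis": 1558137600000,
      "phase": "warm",
      "action": "shrink",
      "step": "shrink"
    }
  }
}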

@GoodMirek
Contributor Author

GoodMirek commented May 30, 2019

Are you seeing your indices transition later than expected?

Yes, I do.

The issue is that lifecycle_date_millis is the same as creation_date. I do not think the issue was in the ILM lifecycle itself, but that the ILM shrink action reset both lifecycle_date_millis and creation_date. I cannot confirm whether the issue also happens in 6.8.0, as I have stopped shrinking indices altogether.

@dakrone
Member

dakrone commented Jun 4, 2019

I ran a test today to try and reproduce this:

First, setting the poll interval to 5 seconds and enabling TRACE logging for ILM:

PUT /_cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.xpack.core.indexlifecycle": "TRACE",
    "logger.org.elasticsearch.xpack.indexlifecycle": "TRACE",
    "indices.lifecycle.poll_interval": "5s"
  }
}

I created a policy that had only shrink and delete, waiting 1 minute to shrink and deleting when the index is 2 minutes old:

PUT _ilm/policy/shrink-only
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1m",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "delete": {
        "min_age": "2m",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

And then created an index using it:

PUT /shrink-test
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0,
    "index.lifecycle.name": "shrink-only"
  }
}

Here are excerpts from the logs.

First, the index is created, ILM is waiting for it to be at least 1 minute old before moving into the "warm" phase:

[elasticsearch] [2019-06-04T10:40:06,677][INFO ][o.e.c.m.MetaDataCreateIndexService] [node-0] [shrink-test] creating index, cause [api], templates [], shards [2]/[0], mappings []
...
[elasticsearch] [2019-06-04T10:40:06,829][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] checking for index age to be at least [1m] before performing actions in the "warm" phase. Now: 1559666406, lifecycle date: 1559666406, age: [159ms/0s]
...
[elasticsearch] [2019-06-04T10:40:36,760][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] checking for index age to be at least [1m] before performing actions in the "warm" phase. Now: 1559666436, lifecycle date: 1559666406, age: [30s/30s]

After being > 1 minute old, it moves to the warm phase and eventually shrinks at 10:41:16 (~70 seconds after it was created), creating the shrink-shrink-test index:

[elasticsearch] [2019-06-04T10:41:06,761][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] checking for index age to be at least [1m] before performing actions in the "warm" phase. Now: 1559666466, lifecycle date: 1559666406, age: [1m/60s]
...
[elasticsearch] [2019-06-04T10:41:16,767][DEBUG][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] moving to step [shrink-only] {"phase":"warm","action":"shrink","name":"wait-for-shard-history-leases"} -> {"phase":"warm","action":"shrink","name":"readonly"}
...
[elasticsearch] [2019-06-04T10:41:16,903][TRACE][o.e.x.i.ExecuteStepsUpdateTask] [node-0] [shrink-test] cluster state step condition met successfully (CheckShrinkReadyStep) [{"phase":"warm","action":"shrink","name":"check-shrink-allocation"}], moving to next step {"phase":"warm","action":"shrink","name":"shrink"}
...
[elasticsearch] [2019-06-04T10:41:16,922][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] maybe running async action step (ShrinkStep) with current step {"phase":"warm","action":"shrink","name":"shrink"}
[elasticsearch] [2019-06-04T10:41:16,923][DEBUG][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-test] running policy with async action step [{"phase":"warm","action":"shrink","name":"shrink"}]
...
[elasticsearch] [2019-06-04T10:41:16,964][INFO ][o.e.c.m.MetaDataCreateIndexService] [node-0] [shrink-shrink-test] creating index, cause [shrink_index], templates [], shards [1]/[0], mappings []

The original shrink-test index is deleted:

[elasticsearch] [2019-06-04T10:41:17,199][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node-0] [shrink-test/f2XRN2ZJRiKLHVMmSsDZfQ] deleting index

Now it waits for the shrink-shrink-test index to be 2 minutes old before moving into the delete phase:

[elasticsearch] [2019-06-04T10:41:17,290][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-shrink-test] checking for index age to be at least [2m] before performing actions in the "delete" phase. Now: 1559666477, lifecycle date: 1559666406, age: [1.1m/70s]

At 10:42:06 it moves into the delete phase, finally deleting the index at 10:42:11, 2 minutes and 5 seconds after the original index was created:

[elasticsearch] [2019-06-04T10:42:06,758][TRACE][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-shrink-test] checking for index age to be at least [2m] before performing actions in the "delete" phase. Now: 1559666526, lifecycle date: 1559666406, age: [2m/120s]
[elasticsearch] [2019-06-04T10:42:06,758][DEBUG][o.e.x.i.IndexLifecycleRunner] [node-0] [shrink-shrink-test] moving to step [shrink-only] {"phase":"warm","action":"complete","name":"complete"} -> {"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}
[elasticsearch] [2019-06-04T10:42:06,758][TRACE][o.e.x.i.MoveToNextStepUpdateTask] [node-0] moving [shrink-shrink-test] to next step ({"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"})
...
[elasticsearch] [2019-06-04T10:42:11,777][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node-0] [shrink-shrink-test/0NTGfF_2SEK5islAHhrCrg] deleting index

The behavior you were describing sounds like it would only be deleted after 3 minutes instead of after 2. Are you able to reproduce this behavior in your environment?

@GoodMirek
Contributor Author

@dakrone Thanks for your effort.
What version of ES did you use for your test?

My version was 6.7.2, and my scenario was a bit different:

  1. I had existing indices without an ILM policy attached to them.
  2. I created an ILM policy.
  3. I attached the ILM policy to the existing indices (a sketch of this step follows below).
  4. Then I observed the issue described above.
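A sketch of step 3, attaching the existing policy to the pre-existing weekly indices in one call (the wildcard pattern is an assumption based on the naming scheme mentioned earlier):

PUT logs_web.*/_settings
{
  "index.lifecycle.name": "logs-ilm-policy"
}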

@dakrone
Member

dakrone commented Jun 4, 2019

@GoodMirek I was doing this with the master branch. Maybe it behaves differently for pre-existing indices; I'll try it with 6.7.2 and your scenario and see if I can reproduce it that way.

@dakrone
Member

dakrone commented Jun 10, 2019

I tested the scenario that you described @GoodMirek, but I was unable to reproduce any scenario where the date did not work correctly for shrunk indices.

@GoodMirek
Contributor Author

@dakrone Thanks for your effort. I have no further ideas why and how that happened.
