Dangling meta character '*' near index 0 #24749

Closed
marc-lebourdais opened this issue May 17, 2017 · 10 comments
Labels
>bug help wanted adoptme :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@marc-lebourdais

Elasticsearch version: 2.4.1 (docker)

Plugins installed: [ license, marvel-agent ]

JVM version (java -version): java-8-openjdk-amd64 (openjdk:8-jre docker)

OS version (uname -a if on a Unix-like system): Linux HOSTNAME 4.4.0-34-generic #53~14.04.1-Ubuntu SMP Wed Jul 27 16:56:40 UTC 2016 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior: Elasticsearch is periodically (and seemingly randomly) rejecting log messages due to a mapper parsing exception with a nested pattern syntax exception. See below for a sample log message. There is nothing special in our setup. We have separate master/client/data nodes. We have two kinds of index templates: one that matches all indexes ("*") at order 0, and one for each dedicated index, e.g. "storage-*", at order 1. Each of those index templates defines some index properties (that don't overlap), as well as some mapping properties and dynamic mapping templates, e.g.:

{
  "long_fields": {
    "mapping": {
      "doc_values": true,
      "type": "long"
    },
    "match_mapping_type": "long",
    "match": "*"
  }
}

The issue only affects some logs in each index, and the affected logs don't have anything special in their content. The error appears to be spurious, but any insight would be greatly appreciated.

Steps to reproduce:

  1. Run elasticsearch cluster
  2. Index data
  3. See problem

Provide logs (if relevant):

[2017-05-17 17:43:51,725][DEBUG][action.bulk              ] [d-d08p03r05u05-t0g2] [storage-2017.05.17][0] failed to execute bulk item (index) index {[storage-2017.05.17][events][AVwXgkKowptZJ_-jLodQ], source[REDACTED_JSON_SOURCE]}
MapperParsingException[failed to parse]; nested: PatternSyntaxException[Dangling meta character '*' near index 0
@spinscale
Contributor

spinscale commented May 18, 2017

hey, can you provide a full stacktrace from your logs?

@spinscale spinscale added :Search Foundations/Mapping Index mappings, including merging and defining field types feedback_needed labels May 18, 2017
@clintongormley
Contributor

...plus the full index templates that you're using, and some examples of failed docs

@clintongormley
Contributor

Also, are you using an ingest pipeline?

@marc-lebourdais
Author

With some values obscured, here's the full stack trace with the document data:

[2017-05-17 23:37:17,348][DEBUG][action.bulk              ] [d-d08p03r05u05-t0g2] [storage-2017.05.17][0] failed to execute bulk item (index) index {[storage-2017.05.17][events][AVwYxdUg3Xm6BAlG1BiH], source[{"@timestamp":"2017-05-17T23:37:17.216887+00:00","syslog_host":"somehost","syslog_program":"cleaner_tuna","syslog_severity":"info","syslog_facility":"local7","syslog_tag":"cleaner_tuna[1]:","syslog_region":"someregion","noidx_rawmsg":"<190>2017-05-17T23:37:17.216887+00:00 somehost cleaner_tuna[1]: @cee:{\"egid\":999,\"eid\":999,\"env\":\"production\",\"host\":\"ephemeralhost\",\"intent\":{\"Intent\":{\"DeleteSnapshot\":{\"snapshot\":{\"allocation_id\":\"someallocationid\",\"backend\":{\"Details\":{\"Ceph\":{\"pool\":\"rbd\"}},\"cluster_id\":\"someclusterid\"},\"created_at\":\"2017-03-23T22:20:12Z\",\"id\":\"someid\",\"lifecycle_event_count\":1,\"name\":\"somename\",\"old_ceph_details\":{\"cluster\":\"somecluster\",\"pool\":\"rbd\"},\"parent_size_bytes\":107374182400,\"pending_events\":[{\"Event\":{\"SnapshotCreate\":{\"created_at\":\"2017-03-23T22:20:12Z\",\"name\":\"somename\"}},\"parent_id\":\"someparentid\",\"user_id\":117102}],\"user_id\":123456}}},\"claimed_at\":\"2017-05-17T23:37:17Z\",\"created_at\":\"2017-03-23T22:22:13Z\",\"delay_secs\":120,\"id\":\"someid\",\"owner\":{\"name\":\"somename\"}},\"level\":\"info\",\"msg\":\"releasing intent\",\"pid\":1,\"pname\":\"\/cleanerd\",\"time\":\"2017-05-17T23:37:17Z\",\"version\":\"d8c41ea964\"}", "egid": 999, "eid": 999, "env": "production", "host": "cleaner-2795265912-tqeu6", "intent": { "Intent": { "DeleteSnapshot": { "snapshot": { "allocation_id": "someallocationid", "backend": { "Details": { "Ceph": { "pool": "rbd" } }, "cluster_id": "fra1-prod" }, "created_at": "2017-03-23T22:20:12Z", "id": "someid", "lifecycle_event_count": 1, "name": "volume-fra1-test-snapshot-1", "old_ceph_details": { "cluster": "fra1-prod", "pool": "rbd" }, "parent_size_bytes": 107374182400, "pending_events": [ { "Event": { 
"SnapshotCreate": { "created_at": "2017-03-23T22:20:12Z", "name": "somename" } }, "parent_id": "someparentid", "user_id": 12345 } ], "user_id": 12345 } } }, "claimed_at": "2017-05-17T23:37:17Z", "created_at": "2017-03-23T22:22:13Z", "delay_secs": 120, "id": "someid", "owner": { "name": "somename" } }, "level": "info", "msg": "releasing intent", "pid": 1, "pname": "\/cleanerd", "time": "2017-05-17T23:37:17Z", "version": "d8c41ea964" }]}
MapperParsingException[failed to parse]; nested: PatternSyntaxException[Dangling meta character '*' near index 0
*
^];
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:156)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
	at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
	at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
	at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:214)
	at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:223)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:327)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:120)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:657)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:287)
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
	at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*
^
	at java.util.regex.Pattern.error(Pattern.java:1955)
	at java.util.regex.Pattern.sequence(Pattern.java:2123)
	at java.util.regex.Pattern.expr(Pattern.java:1996)
	at java.util.regex.Pattern.compile(Pattern.java:1696)
	at java.util.regex.Pattern.<init>(Pattern.java:1351)
	at java.util.regex.Pattern.compile(Pattern.java:1028)
	at java.util.regex.Pattern.matches(Pattern.java:1133)
	at java.lang.String.matches(String.java:2121)
	at org.elasticsearch.index.mapper.object.DynamicTemplate.patternMatch(DynamicTemplate.java:163)
	at org.elasticsearch.index.mapper.object.DynamicTemplate.match(DynamicTemplate.java:131)
	at org.elasticsearch.index.mapper.object.RootObjectMapper.findTemplate(RootObjectMapper.java:263)
	at org.elasticsearch.index.mapper.object.RootObjectMapper.findTemplateBuilder(RootObjectMapper.java:248)
	at org.elasticsearch.index.mapper.object.RootObjectMapper.findTemplateBuilder(RootObjectMapper.java:244)
	at org.elasticsearch.index.mapper.DocumentParser.createBuilderFromDynamicValue(DocumentParser.java:557)
	at org.elasticsearch.index.mapper.DocumentParser.parseDynamicValue(DocumentParser.java:619)
	at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:444)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:264)
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
	at org.elasticsearch.index.mapper.DocumentParser.parseAndMergeUpdate(DocumentParser.java:740)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:354)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
	at org.elasticsearch.index.mapper.DocumentParser.parseAndMergeUpdate(DocumentParser.java:740)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:354)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
	at org.elasticsearch.index.mapper.DocumentParser.parseAndMergeUpdate(DocumentParser.java:740)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:354)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
	at org.elasticsearch.index.mapper.DocumentParser.parseAndMergeUpdate(DocumentParser.java:740)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:354)
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:124)
	... 18 more
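
For anyone reading the trace: the innermost frames show `String.matches` being handed the bare pattern `*`. Below is a minimal sketch that reproduces the same JDK exception outside the cluster. It assumes only the JDK; the class and method names are hypothetical and are not Elasticsearch code.

```java
import java.util.regex.PatternSyntaxException;

class DanglingStarRepro {
    /**
     * Tries to use `pattern` as a Java regex against `fieldName`.
     * Returns null if the pattern compiled, otherwise a short description of the syntax error.
     */
    static String tryRegexMatch(String fieldName, String pattern) {
        try {
            fieldName.matches(pattern); // String.matches compiles the pattern as a regex on every call
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        // "*" is a quantifier with nothing in front of it, so the regex compiler rejects it.
        System.out.println(tryRegexMatch("user_id", "*"));
        // ".*" is the regex spelling of "anything" and compiles fine, so this prints null.
        System.out.println(tryRegexMatch("user_id", ".*"));
    }
}
```

The point is only that `*` is valid as a simple wildcard but never as a regex, so the exception implies the pattern reached the regex code path.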

Our setup has queue-backed rsyslog instances writing bulk indexing requests to ES. The full index templates are as follows:

{
  "template": "*",
  "order": 0,
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "geoip": {
          "properties": {
            "longitude": {
              "doc_values": true,
              "type": "float"
            },
            "latitude": {
              "doc_values": true,
              "type": "float"
            },
            "location": {
              "doc_values": true,
              "type": "geo_point"
            },
            "ip": {
              "doc_values": true,
              "type": "ip"
            }
          },
          "dynamic": true,
          "type": "object"
        },
        "@version": {
          "doc_values": true,
          "index": "not_analyzed",
          "type": "string"
        },
        "@timestamp": {
          "doc_values": true,
          "type": "date"
        }
      },
      "dynamic_templates": [
        {
          "numeric_fields": {
            "mapping": {
              "index": "not_analyzed",
              "type": "double"
            },
            "match": "(?i)(.*[_-])?(code|size|length|count|seconds?|sec|us|ms|s)",
            "unmatch": "grpc_code",
            "match_pattern": "regex"
          }
        },
        {
          "date_fields": {
            "mapping": {
              "index": "not_analyzed",
              "type": "date"
            },
            "path_match": "(?i)(.*[_-])?(timestamp|time|at)",
            "match_pattern": "regex"
          }
        },
        {
          "large_text_fields": {
            "mapping": {
              "index": "no",
              "type": "string"
            },
            "path_match": "(?i)(noidx.*)",
            "match_pattern": "regex"
          }
        },
        {
          "message_field": {
            "mapping": {
              "omit_norms": true,
              "index": "analyzed",
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "message"
          }
        },
        {
          "id_field": {
            "mapping": {
              "fields": {
                "raw": {
                  "ignore_above": 256,
                  "doc_values": true,
                  "index": "not_analyzed",
                  "type": "string"
                }
              },
              "omit_norms": true,
              "index": "analyzed",
              "type": "string"
            },
            "type": "string",
            "match": "*_id"
          }
        },
        {
          "process_id_field": {
            "mapping": {
              "fields": {
                "raw": {
                  "ignore_above": 256,
                  "doc_values": true,
                  "index": "not_analyzed",
                  "type": "string"
                }
              },
              "omit_norms": true,
              "index": "analyzed",
              "type": "string"
            },
            "type": "string",
            "match": "pid"
          }
        },
        {
          "req_params_image_field": {
            "mapping": {
              "fields": {
                "raw": {
                  "ignore_above": 256,
                  "doc_values": true,
                  "index": "not_analyzed",
                  "type": "string"
                }
              },
              "omit_norms": true,
              "index": "analyzed",
              "type": "string"
            },
            "type": "string",
            "match": "req_params_image"
          }
        },
        {
          "string_fields": {
            "mapping": {
              "fields": {
                "raw": {
                  "ignore_above": 256,
                  "doc_values": true,
                  "index": "not_analyzed",
                  "type": "string"
                }
              },
              "omit_norms": true,
              "index": "analyzed",
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        },
        {
          "float_fields": {
            "mapping": {
              "doc_values": true,
              "type": "float"
            },
            "match_mapping_type": "float",
            "match": "*"
          }
        },
        {
          "double_fields": {
            "mapping": {
              "doc_values": true,
              "type": "double"
            },
            "match_mapping_type": "double",
            "match": "*"
          }
        },
        {
          "byte_fields": {
            "mapping": {
              "doc_values": true,
              "type": "byte"
            },
            "match_mapping_type": "byte",
            "match": "*"
          }
        },
        {
          "short_fields": {
            "mapping": {
              "doc_values": true,
              "type": "short"
            },
            "match_mapping_type": "short",
            "match": "*"
          }
        },
        {
          "integer_fields": {
            "mapping": {
              "doc_values": true,
              "type": "integer"
            },
            "match_mapping_type": "integer",
            "match": "*"
          }
        },
        {
          "long_fields": {
            "mapping": {
              "doc_values": true,
              "type": "long"
            },
            "match_mapping_type": "long",
            "match": "*"
          }
        },
        {
          "date_fields": {
            "mapping": {
              "doc_values": true,
              "type": "date"
            },
            "match_mapping_type": "date",
            "match": "*"
          }
        },
        {
          "geo_point_fields": {
            "mapping": {
              "doc_values": true,
              "type": "geo_point"
            },
            "match_mapping_type": "geo_point",
            "match": "*"
          }
        }
      ],
      "_all": {
        "omit_norms": true,
        "enabled": true
      }
    }
  }
}
{
  "template": "storage-*",
  "order" : 1,
  "settings": {
    "number_of_shards" : "1",
    "number_of_replicas": "2"
  }
}

@clintongormley
Contributor

Do you have any examples of failed documents?

@marc-lebourdais
Author

There was an example in the exception trace:

{
  "@timestamp": "2017-05-17T23:37:17.216887+00:00",
  "syslog_host": "somehost",
  "syslog_program": "cleaner_tuna",
  "syslog_severity": "info",
  "syslog_facility": "local7",
  "syslog_tag": "cleaner_tuna[1]:",
  "syslog_region": "someregion",
  "noidx_rawmsg": "<190>2017-05-17T23:37:17.216887+00:00 somehost cleaner_tuna[1]: @cee:{\"egid\":999,\"eid\":999,\"env\":\"production\",\"host\":\"ephemeralhost\",\"intent\":{\"Intent\":{\"DeleteSnapshot\":{\"snapshot\":{\"allocation_id\":\"someallocationid\",\"backend\":{\"Details\":{\"Ceph\":{\"pool\":\"rbd\"}},\"cluster_id\":\"someclusterid\"},\"created_at\":\"2017-03-23T22:20:12Z\",\"id\":\"someid\",\"lifecycle_event_count\":1,\"name\":\"somename\",\"old_ceph_details\":{\"cluster\":\"somecluster\",\"pool\":\"rbd\"},\"parent_size_bytes\":107374182400,\"pending_events\":[{\"Event\":{\"SnapshotCreate\":{\"created_at\":\"2017-03-23T22:20:12Z\",\"name\":\"somename\"}},\"parent_id\":\"someparentid\",\"user_id\":117102}],\"user_id\":123456}}},\"claimed_at\":\"2017-05-17T23:37:17Z\",\"created_at\":\"2017-03-23T22:22:13Z\",\"delay_secs\":120,\"id\":\"someid\",\"owner\":{\"name\":\"somename\"}},\"level\":\"info\",\"msg\":\"releasing intent\",\"pid\":1,\"pname\":\"\/cleanerd\",\"time\":\"2017-05-17T23:37:17Z\",\"version\":\"d8c41ea964\"}",
  "egid": 999,
  "eid": 999,
  "env": "production",
  "host": "cleaner-2795265912-tqeu6",
  "intent": {
    "Intent": {
      "DeleteSnapshot": {
        "snapshot": {
          "allocation_id": "someallocationid",
          "backend": {
            "Details": {
              "Ceph": {
                "pool": "rbd"
              }
            },
            "cluster_id": "fra1-prod"
          },
          "created_at": "2017-03-23T22:20:12Z",
          "id": "someid",
          "lifecycle_event_count": 1,
          "name": "somename",
          "old_ceph_details": {
            "cluster": "fra1-prod",
            "pool": "rbd"
          },
          "parent_size_bytes": 107374182400,
          "pending_events": [
            {
              "Event": {
                "SnapshotCreate": {
                  "created_at": "2017-03-23T22:20:12Z",
                  "name": "somename"
                }
              },
              "parent_id": "someparentid",
              "user_id": 12345
            }
          ],
          "user_id": 12345
        }
      }
    },
    "claimed_at": "2017-05-17T23:37:17Z",
    "created_at": "2017-03-23T22:22:13Z",
    "delay_secs": 120,
    "id": "someid",
    "owner": {
      "name": "somename"
    }
  },
  "level": "info",
  "msg": "releasing intent",
  "pid": 1,
  "pname": "\/cleanerd",
  "time": "2017-05-17T23:37:17Z",
  "version": "d8c41ea964"
}

@clintongormley
Contributor

sorry, didn't see that

@clintongormley
Contributor

Hmmm nothing obvious that I can see. Your regexes should use [_\-] instead of [_-], but I doubt that's the issue. It doesn't replicate for me, but you said it happens only sometimes. I wonder if a simple pattern like * is being parsed as a regex somewhere?

Have you been able to narrow it down at all, e.g. by removing parts of the template to see if the problem goes away? I'd also be interested to know if upgrading ES or Java helps.

I don't have any ideas - I'll leave it for somebody else to investigate.
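
To make the hypothesis concrete, here is a rough sketch of the difference between the two matching modes. It is not the actual `DynamicTemplate` code: `MatchType`, `simpleMatch` and `matches` are hypothetical names, and the simple-mode matcher is an illustrative implementation that treats `*` as "any run of characters".

```java
import java.util.regex.Pattern;

class TemplateMatchSketch {
    /** Hypothetical stand-in for the two modes selected by "match_pattern". */
    enum MatchType { SIMPLE, REGEX }

    /** Simple mode: '*' stands for any run of characters; everything else is literal. */
    static boolean simpleMatch(String pattern, String value) {
        String[] literals = pattern.split("\\*", -1); // limit -1 keeps leading/trailing empty pieces
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' in the simple pattern becomes ".*"
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.matches(regex.toString(), value);
    }

    /** Dispatches on the mode; REGEX hands the pattern to the JDK regex engine unchanged. */
    static boolean matches(MatchType type, String pattern, String value) {
        return type == MatchType.REGEX ? value.matches(pattern) : simpleMatch(pattern, value);
    }

    public static void main(String[] args) {
        System.out.println(matches(MatchType.SIMPLE, "*", "user_id"));    // true
        System.out.println(matches(MatchType.SIMPLE, "*_id", "user_id")); // true
        System.out.println(matches(MatchType.SIMPLE, "*_id", "pid"));     // false
        // Both spellings of the character class accept the same field name.
        System.out.println(matches(MatchType.REGEX, "(?i)(.*[_-])?(timestamp|time|at)", "created_at"));
        System.out.println(matches(MatchType.REGEX, "(?i)(.*[_\\-])?(timestamp|time|at)", "created_at"));
        try {
            matches(MatchType.REGEX, "*", "user_id"); // wrong mode for a simple pattern
        } catch (java.util.regex.PatternSyntaxException e) {
            System.out.println("regex mode rejected '*': " + e.getDescription());
        }
    }
}
```

In this sketch the same pattern string `*` succeeds or throws depending only on which mode is selected, which is the behavior the stack trace would be consistent with if the mode were picked incorrectly for some templates.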

@marc-lebourdais
Author

This is our production cluster ingesting about 2TB of log data per day, with ~40TB total data. We don't have a lot of freedom to play around with arbitrary tests to see if the issue goes away, when it's seemingly sporadic as is. We're already on the 2.4.1 series running in docker, so I could bump to 2.4.4 but I don't have any reason to believe that that would fix the underlying problem.

At this point, we don't have the operational capacity to migrate to 5.x, so I was hoping the issue would be pretty obvious from the stack trace.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Mar 13, 2018
Today you would only get these errors at index time.

Relates elastic#24749
@jpountz
Contributor

jpountz commented Mar 13, 2018

I don't think 2.4.4 would fix the problem either. I suspect @clintongormley is right that a simple pattern is sometimes parsed as a regex even though it shouldn't be. The fact that you mentioned that this issue is sporadic also leads me to think that it might be specific to one node of the cluster.

I will close this issue and improve validation of the match pattern for now: #29013. Let's revisit if/when someone reports this issue with a newer version. We will have more information to understand what is happening.
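
As an illustration of the general idea of validating up front rather than at index time, here is a hedged sketch. It is not the change in #29013: the class and method names are hypothetical, and it only shows compiling each regex-mode pattern once when a template is defined and reporting failures then.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class MatchPatternValidationSketch {
    /** Returns one message per pattern that fails to compile as a Java regex. */
    static List<String> validateRegexPatterns(List<String> patterns) {
        List<String> errors = new ArrayList<>();
        for (String pattern : patterns) {
            try {
                Pattern.compile(pattern); // compile once, up front, instead of on every document
            } catch (PatternSyntaxException e) {
                errors.add("invalid regex [" + pattern + "]: " + e.getDescription());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Two regex patterns taken from the template in this issue, plus a simple
        // wildcard that would be invalid if it were ever treated as a regex.
        List<String> patterns = Arrays.asList(
            "(?i)(.*[_-])?(code|size|length|count|seconds?|sec|us|ms|s)",
            "(?i)(noidx.*)",
            "*");
        for (String error : validateRegexPatterns(patterns)) {
            System.out.println(error);
        }
    }
}
```

With a check like this, a bad pattern would surface as a single error when the template is created instead of as sporadic bulk item failures.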

@jpountz jpountz closed this as completed Mar 13, 2018
jpountz added a commit that referenced this issue Mar 15, 2018
Today you would only get these errors at index time.

Relates #24749
@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024