Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5.0.0-rc Painless support for working with dates #21121

Closed
niemyjski opened this issue Oct 25, 2016 · 31 comments
Closed

5.0.0-rc Painless support for working with dates #21121

niemyjski opened this issue Oct 25, 2016 · 31 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache discuss

Comments

@niemyjski
Copy link
Contributor

I'm upgrading from groovy to painless and it's not so painless so far. I wanted to post this here as I think this would be a pretty common scenario and I'm having trouble with dates (gotta love dates)

PUT /test-stacks-v1
{
  "aliases": {
    "test-stacks": {}
  },
  "mappings": {
    "stacks": {
      "_all": {
        "enabled": false
      },
      "dynamic": false,
      "properties": {
        "id": {
          "type": "keyword"
        },
        "first_occurrence": {
          "type": "date"
        },
        "last_occurrence": {
          "type": "date"
        },
        "total_occurrences": {
          "type": "float"
        }
      }
    }
  }
}

PUT /test-stacks/stacks/1ecd0826e447a44e78877ab1
{
  "id": "1ecd0826e447a44e78877ab1",
  "total_occurrences": 0,
  "first_occurrence": "0001-01-01T00:00:00",
  "last_occurrence": "0001-01-01T00:00:00"
}

POST /test-stacks/stacks/1ecd0826e447a44e78877ab1/_update?retry_on_conflict=3
{
  "script": { 
    "params": {
      "minOccurrenceDateUtc": "2016-10-25T22:04:19.3272056Z",
      "maxOccurrenceDateUtc": "2016-10-25T22:04:19.3272056Z",
      "count": 1
    },
    "inline": "if (ctx._source.total_occurrences == 0 || ctx._source.first_occurrence > params.minOccurrenceDateUtc) { ctx._source.first_occurrence = params.minOccurrenceDateUtc; } if (ctx._source.last_occurrence < params.maxOccurrenceDateUtc) { ctx._source.last_occurrence = params.maxOccurrenceDateUtc; } ctx._source.total_occurrences += params.count;"
  }
}

The last script update returns

json
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[8mJiEA3][local[1]][indices:data/write/update[s]]"
}
],
"type": "illegal_argument_exception",
"reason": "failed to execute script",
"caused_by": {
"type": "script_exception",
"reason": "runtime error",
"caused_by": {
"type": "class_cast_exception",
"reason": "Cannot apply [<] operation to types [java.lang.String] and [java.lang.String]."
},
"script_stack": [
"if (ctx._source.last_occurrence < params.maxOccurrenceDateUtc) { ",
" ^---- HERE"
],
"script": "if (ctx._source.total_occurrences == 0 || ctx._source.first_occurrence > params.minOccurrenceDateUtc) { ctx._source.first_occurrence = params.minOccurrenceDateUtc; } if (ctx._source.last_occurrence < params.maxOccurrenceDateUtc) { ctx._source.last_occurrence = params.maxOccurrenceDateUtc; } ctx._source.total_occurrences += params.count;",
"lang": "painless"
}
},
"status": 400
}


The original groovy script worked flawlessly in 1.7.5 (including linebreaks)

``` cs
            var request = new UpdateRequest<Stack, Stack>(GetIndexById(stackId), ElasticType.Type, stackId) {
                RetryOnConflict = 3,
                Script = @"if (ctx._source.total_occurrences == 0 || ctx._source.first_occurrence > minOccurrenceDateUtc) {
                            ctx._source.first_occurrence = minOccurrenceDateUtc;
                            }
                            if (ctx._source.last_occurrence < maxOccurrenceDateUtc) {
                            ctx._source.last_occurrence = maxOccurrenceDateUtc;
                            }
                            ctx._source.total_occurrences += count;",
                Params = new Dictionary<string, object>(3) {
                    { "minOccurrenceDateUtc", minOccurrenceDateUtc },
                    { "maxOccurrenceDateUtc", maxOccurrenceDateUtc },
                    { "count", count }
                }
            };

Is there anything special I need to do to cast a field (which is mapped as a date) and stored as a string to a date reliably?

@niemyjski niemyjski changed the title 5.0.0-rc Help with upgrading groovy script to painless 5.0.0-rc Help with upgrading groovy script containing dates to painless Oct 25, 2016
@niemyjski
Copy link
Contributor Author

I'll try to update (https://www.elastic.co/guide/en/elasticsearch/reference/5.0/docs-update.html) with a date example once this is figured out :)

@niemyjski
Copy link
Contributor Author

cc @jdconrad #17992

@jasontedor
Copy link
Member

Elastic reserves GitHub for bug reports and feature requests; please post general questions on the Elastic Discourse forum.

@niemyjski
Copy link
Contributor Author

niemyjski commented Oct 25, 2016

Considering you don't have a single painless test for dates or docs (https://github.com/elastic/elasticsearch/blob/master/modules/lang-painless/src/test/java/org/elasticsearch/painless/CastTests.java) I think this is a bug/feature report.

@niemyjski niemyjski changed the title 5.0.0-rc Help with upgrading groovy script containing dates to painless 5.0.0-rc Painless support for working with dates Oct 25, 2016
@niemyjski
Copy link
Contributor Author

niemyjski commented Oct 25, 2016

I've created a post on https://discuss.elastic.co/t/comparing-dates-in-painless/63943 and renamed this title

@niemyjski
Copy link
Contributor Author

Looks like it's almost impossible to parse an ISO-8601 date with painless (http://stackoverflow.com/questions/2201925/converting-iso-8601-compliant-string-to-java-util-date) am I missing something?

@niemyjski
Copy link
Contributor Author

niemyjski commented Oct 25, 2016

To anyone wondering I was able to get this working at least locally, I'm not sure how reliable it is at parsing different date formats as I can't specify Zulu because min date time doesn't have it (pre epoch)

def sf = new SimpleDateFormat(\""yyyy-MM-dd'T'HH:mm:ss\"");
if (ctx._source.total_occurrences == 0 || sf.parse(ctx._source.first_occurrence).after(sf.parse(params.minOccurrenceDateUtc))) {
  ctx._source.first_occurrence = params.minOccurrenceDateUtc;
}
if (sf.parse(ctx._source.last_occurrence).before(sf.parse(params.maxOccurrenceDateUtc))) {
  ctx._source.last_occurrence = params.maxOccurrenceDateUtc;
}
ctx._source.total_occurrences += params.count;

@nik9000
Copy link
Member

nik9000 commented Oct 31, 2016

I'm reopening this because I think it is interesting. I mean, you have a workaround but it kind of sucks for a few reasons:

  1. We end up re-parsing the params on every execution.
  2. It isn't painless.
  3. first_occurrence is a date and should be exposed to painless as a date.
  4. We really need to document examples like this.

I don't know how many changes should stem from this issue but my instinct is "a couple".

I do want to be clear on point number 2. Painless's list of priorities are:

  1. Security
  2. Performance
  3. Awesomeness/Painlessness
  4. Compatibility with Groovy

The work around you have has trouble with 2 and 3 and 4. I don't know if we'll ever get 4, but we should be able to get 2 and 3.

@niemyjski
Copy link
Contributor Author

Yeah, I ended up having to do even more work as stated in here (https://discuss.elastic.co/t/comparing-dates-in-painless/63943). I had to create a function to parse the date into a Instant (in a try block) and return a default value. It definitely wasn't straight forward. I appreciate you reopening this.

@jdconrad
Copy link
Contributor

jdconrad commented Oct 31, 2016

@nik9000 Point 1 to me is certainly the most interesting here. On point 3, I thought @rjernst made the change to Date fields to expose these things as dates? Maybe I'm misremembering, though. I think for compatibility with Groovy, it's best effort, but if something doesn't make sense or we agree it's a poor feature we should definitely not be forced to implement it. Groovy has a lot of really insecure/overly lenient features.

@rjernst
Copy link
Member

rjernst commented Nov 2, 2016

The change I made was for booleans, and only applicable for doc values (which are not available in update scripts).

@niemyjski
Copy link
Contributor Author

Some additional feedback. I seem to be hitting more issues around this (I'm working to get to the bottom of this.).. I have a date "date": "2017-01-03T14:12:36.2770782-06:00" and I'm doing an explain on it .

_explain
{
  "query": {
    "script": {
      "script": "Debug.explain(doc.date)"
    }
  }
}

which returns

        "type": "script_exception",
        "reason": "runtime error",
        "to_string": "[1483474356277]",

So I tried to convert it by doing both Instant.parse(doc.date) returns "org.elasticsearch.index.fielddata.ScriptDocValues$Longs cannot be cast to java.lang.CharSequence" and new Date(doc.date) returns "org.elasticsearch.index.fielddata.ScriptDocValues$Longs cannot be cast to java.lang.Long" Seems at least on the second one that the conversion should be much easier in painless.

@rjernst
Copy link
Member

rjernst commented Jan 3, 2017

and new Date(doc.date) returns "org.elasticsearch.index.fielddata.ScriptDocValues$Longs cannot be cast to java.lang.Long"

ScriptDocValues$Longs already provides a date accessor. Use doc['date'].date.

@niemyjski
Copy link
Contributor Author

niemyjski commented Jan 3, 2017

@rjernst good to know, I just also figured I could do this as well (get the utc time) def x = OffsetDateTime.parse('2017-01-01T00:00:00.0000000+12:00'); Debug.explain(x.toInstant())

@niemyjski
Copy link
Contributor Author

@rjernst is that (doc['date'.date) ]available on reindex scripts? Getting a null reference when I try that outside of regular explain.

@rjernst
Copy link
Member

rjernst commented Jan 3, 2017

That would be available in any script that has doc available. I think that is just search scripts. But if you see a message about ScriptDocValues, it should be available.

@nik9000
Copy link
Member

nik9000 commented Jan 3, 2017

In reindex ctx._source.date should return a date. You should be able to have a look at it with Debug.explain(ctx._source.date) (5.1.1+). That should return an error with the class of the date and you can go from there. This has some instructions that should be useful. I'd love to have a proper API reference to point you at and I've been working on and off on it for a bit. I've uncovered quite a few crabs along the way though....

@niemyjski
Copy link
Contributor Author

niemyjski commented Jan 3, 2017

@nik9000 thanks!, looks like it's a string, I can do OffsetDateTime.parse(ctx._source.date).toInstant();

@cp2587
Copy link

cp2587 commented Feb 6, 2017

Hello,

I also encounter a lot of painful problems when trying to work with datetime in painless... For instance, i wanted to apply a TZ to two dates and get the difference but cannot find a way to do it in painless...

In groovy, i could do something like:
(doc['t1'].date.setZone(org.joda.time.DateTimeZone.forID(tz)) - doc['t2'].date.setZone(org.joda.time.DateTimeZone.forID(tz))).days

Do you plan to improve this aspect of painless or do i need to keep using groovy if i want to do complex operations on date object ?

@clintongormley
Copy link
Contributor

@cp2587 see #22875 and #22948

@nik9000
Copy link
Member

nik9000 commented Feb 6, 2017

(doc['t1'].date.setZone(org.joda.time.DateTimeZone.forID(tz)) - doc['t2'].date.setZone(org.joda.time.DateTimeZone.forID(tz))).days

I don't think a lot of these are whitelisted. I'm not sure why, but right now most of java's time API is whitelisted and joda isn't.... I expect something like this should work:

ZoneId zone = ZoneId.of(params.tz); ChronoUnit.DAYS. between(Instant.ofEpochMilli(doc.t2.date.millis).atZone(zone), Instant.ofEpochMilli(doc.t1.date.millis).atZone(zone))

@cp2587
Copy link

cp2587 commented Feb 6, 2017

Thank you very much @nik9000, it works well !!

@nik9000
Copy link
Member

nik9000 commented Feb 6, 2017

Thank you very much @nik9000, it works well !!

And now I'm torn! Painless supports java's time api well because we expected to port Elasticsearch to java's time api and drop joda time at some point. But we haven't yet. Either we should push the migration through in 6.0 or fix up joda's date support in painless or both....

@rjernst
Copy link
Member

rjernst commented Feb 6, 2017

or fix up joda's date support in painless

I don't think we should do that. The java 8 time api in painless works fine.

@nik9000
Copy link
Member

nik9000 commented Feb 6, 2017

The java 8 time api in painless works fine.

Yeah. I'm fairly sure the right thing to do is to complete the port. For the time being things like date aggregations can work if you return millis-since-epoch from the script.

@jdconrad
Copy link
Contributor

jdconrad commented Feb 6, 2017

I'm also against adding support for Joda. Dates are already challenging to use, but having two libraries would really step up the difficulty level.

@jasontedor
Copy link
Member

Removing Joda is an eventuality.

@clintongormley
Copy link
Contributor

I agree with not adding support for joda to painless. We should (at some stage) start working on #12829 to replace joda with java time instead

@niemyjski
Copy link
Contributor Author

niemyjski commented Feb 6, 2017

@rjernst I wouldn't say it works fine, but it works.. It's been a PITA working with dates in elastic painless...

@eranhirs
Copy link

eranhirs commented Aug 24, 2017

@niemyjski thanks for the help.
Here is my experience with manipulating dates.
Maybe this will help someone, I eventually succeed using the .plusHours() function.

@s1monw
Copy link
Contributor

s1monw commented Nov 29, 2017

it seems like this issue can be closed since we have a way forward with java time... closing

@s1monw s1monw closed this as completed Nov 29, 2017
@clintongormley clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Plugin Lang Painless labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache discuss
Projects
None yet
Development

No branches or pull requests

9 participants