
Detect error on _bulk api on elasticsearch plugin #681

Closed

HarukaMa opened this issue Feb 20, 2018 · 9 comments

Comments

@HarukaMa
Contributor

I am trying the elasticsearch plugin and encountered this issue:

HTTP log:

POST /_bulk HTTP/1.1
Host: localhost:9200
User-Agent: libcrp/0.1
Accept: */*
Content-Type: application/json
Content-Length: 46770
Expect: 100-continue

{ "index" : { "_index" : "graphene-2018-02", "_type" : "data", "op_type" : "create", "_id" : "2.9.145258579" } }
...

HTTP/1.1 200 OK
access-control-allow-credentials: true
content-type: application/json; charset=UTF-8
content-length: 10534

{"took":0,"errors":true,"items":[{"create":{"_index":"graphene-2018-02","_type":"data","_id":"2.9.145258579","status":403,"error":{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}}},{"create"
...

The error [FORBIDDEN/12/index read-only / allow delete (api)] seems related to low free disk space while indexing and prevents further insert operations, but the plugin currently ignores the error, causing all subsequent operations to be missing from this index.

I think it would be better if the plugin could detect such errors instead of silently ignoring them, to prevent data loss, since recovery could require a full replay, which is lengthy with elasticsearch. The wiki should also be updated with an estimated disk space requirement to help prevent this from happening. My partial data up to 2018-01 takes about 60 GB of disk space (excluding translog) using the best_compression codec setting, with reduced translog size and age settings and 2 shards per index.
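For reference, even after disk space is freed, this read-only block is not removed automatically; it has to be cleared manually. Something like this should work (the index name is just an example; _all can be used to clear every index):

$ curl -XPUT 'http://localhost:9200/graphene-2018-02/_settings' -H 'Content-Type: application/json' -d '{
  "index.blocks.read_only_allow_delete": null
}'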

@abitmore
Member

I heard that 160 GB of disk space is not enough for an up-to-date full history.

@HarukaMa
Contributor Author

The data of the indices shouldn't take more than 100 GB; I guess it's mainly because of the translog: by default every index has 5 shards, and each of them could take up to 512 MB, which means there could be 2.5 GB of temporary storage overhead for every index during replay. In my opinion, 5 shards is a bit of overkill, as the largest index is still below 20 GB currently. Also, enabling best_compression could save about 10% of the space at the expense of slower operations.
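Note that best_compression is a static setting: it can be set at index creation (or through a template), but not changed on an open index. A sketch, with an example index name:

$ curl -XPUT 'http://localhost:9200/graphene-2018-03' -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 2,
    "index.codec": "best_compression"
  }
}'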

@oxarbitrage
Member

we have considered doing something with the http error logs (https://github.com/bitshares/bitshares-core/blob/master/libraries/plugins/elasticsearch/elasticsearch_plugin.cpp#L285-L295).

we can include code 413 there, but then do what? send a log msg to the node? try to kill the node?

let me know if you have a good idea, I will be happy to add it.

in regards to the shards, I am not an expert in ES, but the settings for an index are applied at creation: when the first insert is sent, it creates the index with the default settings. in order to control the index settings we need to send a query beforehand with the custom options we want, something like the sketch below.
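for example, this should pre-create an index with custom settings before the plugin writes to it (the index name is just an example):

$ curl -XPUT 'http://localhost:9200/graphene-2018-02' -H 'Content-Type: application/json' -d '{
  "settings": { "number_of_shards": 2 }
}'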

if you have a set of good settings I can try to add that.

in regards to the wiki, you are right, I added a note here: https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin#checking-if-it-is-working with 160 gigs; even if it is less, better to have people prepared with a big hd.

@HarukaMa
Contributor Author

HarukaMa commented Feb 21, 2018

I'm using a template to pre-define the settings:

$ curl -XPUT 'http://localhost:9200/_template/graphene' -d '{
  "index_patterns" : ["graphene-*"],
  "settings": { "number_of_shards": 2,
    "index": {
      "translog": {
        "retention": {
          "size": "512mb", "age": "300s"
        }
      }
    }
  }
}' -H 'Content-Type: application/json'

This template applies those settings to every newly created index prefixed with graphene-. It's a one-time setup, so there is no need to specify the settings for every new index. In these settings I have also reduced the translog retention age (300s in the template above) to minimize storage usage, but I think that's optional.
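To verify the template was registered, you can fetch it back:

$ curl -XGET 'http://localhost:9200/_template/graphene?pretty'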

Also, some errors are returned with a 200 status, so checking the status code alone is not enough, I think.
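The top-level errors flag of the bulk response is what has to be inspected. For example, from the shell (batch.json is a hypothetical payload file; jq is assumed to be available):

$ curl -s -XPOST 'http://localhost:9200/_bulk' -H 'Content-Type: application/json' \
    --data-binary @batch.json \
  | jq '{errors: .errors, failed: [.items[][] | select(.status >= 300)]}'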

Is killing the node normal if one of the plugins encounters errors during normal operation? I'm not quite sure about it... I think we should have a way to "fix" partial indices, like replaying from a specific point instead of replaying from the start, to save (a lot of) time.

I have an additional question: can we somehow make the get_account_history api call use ES to get its data? Currently it only returns 1 op, which matches the behavior of the plugin, but this will affect the functionality of light wallets and various applications relying on this call, as they would need to interact with ES instead to get the data.

@oxarbitrage
Member

as each server can define its own pre-settings, I think it is better to add the command to the wiki instead of making the call from the plugin itself; added:
https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin#pre-define-settings

killing the node is not something any other plugin does as far as I know; a msg in the witness console would at least be better than doing nothing. in the case of a full disk, a msg for error 413 will do it.

errors inside 200 are generally from a malformed query; I saw a lot of them while building the plugin but never saw any after release. the details for these can be obtained from the log index.
the other kind of error I saw inside 200 is "document already exists": as described at https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin#note-on-duplicates, documents with the same id will not be added. this can be caused by a block arriving twice or something like that, and is ok to ignore.

in regards to get_account_history, I definitely think the call should do: if the elasticsearch plugin is active, use elastic; else use the normal call code.
I need approval from @abitmore and @pmconrad in order to do this.

@oxarbitrage
Member

the get_account_history changes to use elasticsearch when available are a no-go for the bitshares core development team. we already discussed it before but I forgot about it.
the reason is that we do not want more api calls inside bitshares-core if possible. if we add an elasticsearch version of get_account_history we need to do the same for get_account_history_operations, and once we have them, new calls will be requested, like get_account_history_by_date, get_account_history_by_block, etc.

this is against what we initially tried to do with the plugin, which is to remove the api call load from the nodes. to make all the queries imaginable with elasticsearch, the api node can 1) expose full elasticsearch access to the application (not recommended for security, but if the app is on the same machine as the node this is an option), 2) expose a wrapper like https://github.com/oxarbitrage/bitshares-es-wrapper, or 3) develop its own wrapper to expose the data the app will need.

we need to educate elasticsearch api node operators to use one of these options depending on their needs, but one thing is sure in the short term: bitshares-core will not make use of elasticsearch to pull data out.

@HarukaMa
Contributor Author

Then we need a way to let clients know whether the server is using the ES plugin, and that should belong to #626. For something like the reference wallet or similar applications I think we may still need some "standardized" way, if possible...

@abitmore abitmore added this to the Future Non-Consensus-Changing Release milestone Feb 25, 2018
@oxarbitrage
Member

The error handling has been improved in the latest version. There is a dedicated function looking for the error code here https://github.com/bitshares/bitshares-core/blob/develop/libraries/utilities/elasticsearch.cpp#L96 and returning true or false.

When sending data to ES fails, a plugin_exception will be raised: https://github.com/bitshares/bitshares-core/blob/develop/libraries/plugins/elasticsearch/elasticsearch_plugin.cpp#L405
This makes the plugin stop processing blocks and keep retrying until the problem is solved (ES can be down, in which case it will resume when ES is restarted; if there is no space, it will continue when space is freed, etc.).
So basically it will not keep going until the problem is fixed and the data can be sent.

For this reason I think this issue can be closed, but feel free to reopen it if you think this is not enough.

@oxarbitrage
Member

reference to the pull request where the error handling was added: #1201
