Index-append operation only indexing bulk-size * clients documents #377

dakrone · 2017-12-05T17:46:13Z

Rally version (get with esrally --version):
Latest from master, 425d8f6

Invoked command:

./rally --track-path=/home/hinmanm/es/mytrack --target-hosts=127.0.0.1:9200 --pipeline=benchmark-only

Configuration file (located in ~/.rally/rally.ini):

[meta]
config.version = 12

[system]
env.name = local

[node]
root.dir = /home/hinmanm/.rally/benchmarks
src.root.dir = /home/hinmanm/es

[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch

[build]
gradle.bin = /home/hinmanm/.sdkman/candidates/gradle/current/bin/gradle

[runtime]
java.home = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc26.x86_64

[benchmarks]
local.dataset.cache = ${node:root.dir}/data

[reporting]
datastore.type = elasticsearch
datastore.host = localhost
datastore.port = 9900
datastore.secure = False
datastore.user = 
datastore.password = 

[tracks]
default.url = https://github.com/elastic/rally-tracks

[teams]
default.url = https://github.com/elastic/rally-teams

[defaults]
preserve_benchmark_candidate = False

[distributions]
release.1.url = https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-{{VERSION}}.tar.gz
release.2.url = https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/{{VERSION}}/elasticsearch-{{VERSION}}.tar.gz
release.url = https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{{VERSION}}.tar.gz
release.cache = true

JVM version:
JDK 8

OS version:
Fedora 26

Description of the problem including expected versus actual behavior:

I have a track with an index-append operation defined inline in the challenge like so:

      "schedule": [
        {
          "operation": {
            "name": "index-append",
            "operation-type": "bulk",
            "bulk-size": {{bulk_size | default(100)}}
          },
          "clients": 4
        },

The documents.json contains 1967 documents; however, only 400 are actually indexed.

Steps to reproduce:

1. Using a track with many documents, add a challenge schedule with a low bulk-size and multiple clients.
2. Run the track.
3. Observe that only bulk-size x clients documents are indexed; in my case, 100 x 4 = 400 documents were actually indexed.
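
The arithmetic behind that number can be sketched in a few lines of Python. This is only an illustration, not Rally code: the function name is hypothetical, and it assumes every client sends full bulks and that the total is capped by the size of the corpus.

```python
def documents_indexed(bulk_size, clients, iterations_per_client, total_docs):
    """Rough upper bound on documents indexed when each client sends a fixed number of bulks."""
    requested = bulk_size * clients * iterations_per_client
    # Cannot index more documents than the corpus contains.
    return min(requested, total_docs)


# One bulk request per client, as observed in this report.
print(documents_indexed(bulk_size=100, clients=4, iterations_per_client=1, total_docs=1967))
```

With one bulk per client this gives the 400 documents reported above, well short of the 1967 in documents.json.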

I've noticed that this didn't affect me when the indexing was defined in a separate operation; it only started affecting me when I defined it inline in the challenge.

Provide logs (if relevant):
The data is from a private repo, so I cannot provide it here.


danielmitterdorfer · 2017-12-06T11:54:31Z

I could reproduce the behavior that you are seeing. It is caused by the fact that you did not specify any iterations or time-periods on the task. If you add "warmup-time-period": 0 to the task definition, then it will index all documents, i.e. this will do what you want:

      "schedule": [
        {
          "clients": 4,
          "warmup-time-period": 0,
          "operation": {
            "name": "index-append",
            "operation-type": "bulk",
            "bulk-size": {{bulk_size | default(100)}}
          }
        }

The reason for this - admittedly - strange behavior is that you can either have a time-period-based or an iteration-based task. If you do not specify anything, Rally will run the provided operation once without warmup by default and that's what you see here.

While we could argue that it makes no sense to execute a bulk operation only once, Rally does not impose any semantics on the operation on that level. It simply executes what you give it.
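
As a rough aid for spotting this trap in a track, the following sketch checks whether a schedule entry is a bulk task without any of the four iteration or time-period properties. It is not part of Rally; the helper name is hypothetical, and it assumes the track's Jinja template has already been rendered to plain JSON (hence the literal 100 for bulk-size).

```python
import json

# Task properties that control how long or how often an operation runs.
ITERATION_KEYS = ("warmup-time-period", "time-period", "warmup-iterations", "iterations")


def runs_single_bulk(task):
    """Return True if the task is an inline bulk operation with no iteration or time-period set."""
    operation = task.get("operation", {})
    # An operation may also be a string referring to a separately defined operation; skip those.
    is_bulk = isinstance(operation, dict) and operation.get("operation-type") == "bulk"
    return is_bulk and not any(key in task for key in ITERATION_KEYS)


task = json.loads("""
{
  "operation": {"name": "index-append", "operation-type": "bulk", "bulk-size": 100},
  "clients": 4
}
""")
print(runs_single_bulk(task))

# Adding "warmup-time-period": 0 makes the task ingest all documents.
fixed = dict(task, **{"warmup-time-period": 0})
print(runs_single_bulk(fixed))
```

The check only looks at inline operation definitions, since that is the case described in this issue.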

dakrone · 2017-12-06T15:31:03Z

Very odd. Okay, I wonder if it'd be nice to have a different operation type that always consumes all of the documents from the file? That's the only thing I can think of that would help alleviate the weirdness.

danielmitterdorfer · 2017-12-06T15:38:49Z

Yes. I'll leave this ticket open as a reminder for now, but I need to think about how to make this less trappy in the future.

danielmitterdorfer · 2018-02-19T13:14:25Z

Another user just hit this in https://discuss.elastic.co/t/bulk-index-operation-for-multiple-indices/120373. Hence, I have changed the milestone now so we do something about this earlier.

danielmitterdorfer · 2018-03-09T10:46:19Z

Rally 0.9.4 will implement the following behavior in case the user did not specify warmup-time-period, time-period, warmup-iterations or iterations: It will still default to an iteration-based approach (as opposed to a time-based approach). However, instead of defaulting to no warmup iterations and one measurement iteration, it will first check the corresponding parameter source. For bulk operations, the parameter source is able to determine the necessary number of bulks upfront (for all other operations this has never been a problem). Consequently, we will now ingest all data by default.
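
To illustrate what "determine the necessary number of bulks upfront" could mean, here is a minimal sketch. It is not Rally's implementation: the function name is hypothetical, and it assumes the documents are split evenly across clients, which may differ from how Rally actually partitions the corpus.

```python
import math


def default_bulk_iterations(total_docs, bulk_size, clients):
    """Number of bulk requests each client needs to ingest its share of the corpus."""
    # Assumption: the corpus is divided evenly among the clients.
    docs_per_client = math.ceil(total_docs / clients)
    return math.ceil(docs_per_client / bulk_size)


# Values from this issue: 1967 documents, bulk-size 100, 4 clients.
print(default_bulk_iterations(total_docs=1967, bulk_size=100, clients=4))
```

Under these assumptions each client would need five bulk requests rather than one, which is enough to cover all 1967 documents.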

With this commit we also query the parameter source when determining the default number of iterations. Previously, when the user specified neither a time-period nor a number of iterations, we always defaulted to zero warmup iterations and one measurement iteration. This led to surprising behavior for bulk-indexing when the user forgot to add a warmup time period, because we only issued one bulk request. Closes #377 Relates #436

danielmitterdorfer added the :Track Management, :Usability, and enhancement labels Dec 6, 2017

danielmitterdorfer added this to the 0.9.x milestone Feb 19, 2018

danielmitterdorfer modified the milestones: 0.9.x, 0.9.4 Mar 9, 2018

danielmitterdorfer mentioned this issue Mar 9, 2018

Bulk index all data by default #436

Merged

danielmitterdorfer closed this as completed in #436 Mar 9, 2018
