
Expected size discrepancy #1240

Closed
alsyia opened this issue Apr 14, 2021 · 6 comments
Labels
bug Something's wrong

Comments

@alsyia

alsyia commented Apr 14, 2021

Hello,

Just stumbled upon what looks like a bug in Rally! The tool looks great and I would really like to use it, so hopefully someone can help me understand what's going on here. Apologies if this is not a bug! I looked in the repo and on the Discuss forum but could not find anything related. The fact that it happens everywhere, with different tracks, and that the downloaded size (see below) matches the listed size on the bucket makes me think it might be a bug.

Thanks for your help!


Rally version (get with esrally --version): esrally 2.1.0

Invoked command: esrally race --track=so --target-hosts=<node1_ip>:9200,<node2_ip>:9200,<node3_ip>:9200 --pipeline=benchmark-only (node IPs replaced with placeholders)

Configuration file (located in ~/.rally/rally.ini):

[meta]
config.version = 17

[system]
env.name = local

[node]
root.dir = /home/<username>/.rally/benchmarks
src.root.dir = /home/<username>/.rally/benchmarks/src

[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch

[benchmarks]
local.dataset.cache = /home/<username>/.rally/benchmarks/data

[reporting]
datastore.type = in-memory
datastore.host =
datastore.port =
datastore.secure = False
datastore.user =
datastore.password =


[tracks]
default.url = https://github.com/elastic/rally-tracks

[teams]
default.url = https://github.com/elastic/rally-teams

[defaults]
preserve_benchmark_candidate = false

[distributions]
release.cache = true

JVM version: My understanding is that this is not required since I'm running --pipeline=benchmark-only

OS version: Ubuntu 20.04.2 LTS

Description of the problem including expected versus actual behavior:

Every track data download, whichever track I choose, fails because of a size discrepancy.
Example with track so:

[INFO] Downloading track data (8.9 GB total size)                                 [100.0%]
[ERROR] Cannot race. Error in track preparator
	Download of [/home/<username>/.rally/benchmarks/data/so/posts.json.bz2] is corrupt. Downloaded [9600716233] bytes but [9599137228] bytes are expected. Please retry.

I've tried different tracks, different servers, and different networks, and retried a few times; nothing changed. What makes me think this might be a bug is that at http://benchmarks.elasticsearch.org.s3.amazonaws.com/, the listed file size for corpora/so/posts.json.bz2 is indeed 9600716233 bytes. So it looks like the file is not corrupted, but esrally expects it to have a different size for some reason.
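For context on the error itself: Rally compares the number of bytes it downloaded with the size recorded in the track and aborts if they differ. The sketch below illustrates that kind of check under that assumption; it is not Rally's `net.download` code, and `check_size`, `verify_download` and the local `DataError` class are hypothetical stand-ins. Only the message format is copied from the error above.

```python
import os


class DataError(Exception):
    """Hypothetical stand-in for esrally.exceptions.DataError."""


def check_size(target_path: str, actual_size: int, expected_size: int) -> None:
    # Any mismatch, larger or smaller, is treated as a corrupt download.
    if actual_size != expected_size:
        raise DataError(
            "Download of [%s] is corrupt. Downloaded [%d] bytes but [%d] bytes are expected. Please retry."
            % (target_path, actual_size, expected_size)
        )


def verify_download(target_path: str, expected_size: int) -> None:
    # Compare the size on disk with the size taken from the track definition.
    check_size(target_path, os.path.getsize(target_path), expected_size)


# Numbers from this report: the object on the bucket is 1,579,005 bytes larger
# than the size the track expects, so an exact comparison fails on every retry.
SIZE_ON_BUCKET = 9600716233
SIZE_IN_TRACK = 9599137228
print(SIZE_ON_BUCKET - SIZE_IN_TRACK)  # 1579005
```

Because the comparison is exact, retrying cannot help when the expected value itself is stale.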

Steps to reproduce:

  1. Get a node with ES on it (the bug might also occur with ESRally's integrated provisioning, but I can't install Java to test on these VMs)
  2. Run esrally race --track=so --target-hosts=<node1_ip>:9200 --pipeline=benchmark-only
  3. Observe how the download seemingly fails

Provide logs (if relevant):

2021-04-14 14:37:05,860 ActorAddr-(T|:41297)/PID:9089 esrally.track.loader INFO Preparing track [so]
2021-04-14 14:37:05,862 ActorAddr-(T|:41297)/PID:9089 esrally.track.loader INFO Resolved data root directory for document corpus [so] in track [so] to [['/home/<username>/.rally/benchmarks/data/so']].
2021-04-14 14:37:05,863 ActorAddr-(T|:41297)/PID:9089 esrally.track.loader INFO Downloading data from [http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/so/posts.json.bz2] (9154 MB) to [/home/<username>/.rally/benchmarks/data/so/posts.json.bz2].
2021-04-14 14:41:51,166 ActorAddr-(T|:41297)/PID:9089 esrally.actor ERROR Track preparator has detected a benchmark failure. Notifying master...
Traceback (most recent call last):

  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 415, in prepare_track
    tp.on_prepare_track(t, data_root_dir)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 88, in on_prepare_track
    if not t.on_prepare_track(track, data_root_dir):

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 437, in on_prepare_track
    prep.prepare_document_set(document_set, data_root[0])

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 585, in prepare_document_set
    self.downloader.download(document_set.base_url, target_path, expected_size)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 500, in download
    net.download(data_url, target_path, size_in_bytes, progress_indicator=progress)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/utils/net.py", line 223, in download
    raise exceptions.DataError("Download of [%s] is corrupt. Downloaded [%d] bytes but [%d] bytes are expected. Please retry." %

esrally.exceptions.DataError: Download of [/home/<username>/.rally/benchmarks/data/so/posts.json.bz2] is corrupt. Downloaded [9600716233] bytes but [9599137228] bytes are expected. Please retry.

2021-04-14 14:41:51,169 ActorAddr-(T|:42963)/PID:9088 esrally.actor ERROR Main driver received a fatal exception from a load generator. Shutting down.
2021-04-14 14:41:51,169 ActorAddr-(T|:42963)/PID:9088 esrally.metrics INFO Closing metrics store.
2021-04-14 14:41:51,170 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO Received a benchmark failure from [ActorAddr-(T|:42963)] and will forward it now.
2021-04-14 14:41:51,172 -not-actor-/PID:9052 esrally.racecontrol ERROR A benchmark failure has occurred
2021-04-14 14:41:51,172 -not-actor-/PID:9052 esrally.racecontrol INFO Telling benchmark actor to exit.
2021-04-14 14:41:51,173 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO BenchmarkActor received unknown message [ActorExitRequest] (ignoring).
2021-04-14 14:41:51,174 ActorAddr-(T|:42963)/PID:9088 esrally.actor INFO Main driver received ActorExitRequest and will terminate all load generators.
2021-04-14 14:41:51,174 ActorAddr-(T|:35223)/PID:9087 esrally.actor INFO MechanicActor#receiveMessage unrecognized(msg = [<class 'thespian.actors.ActorExitRequest'>] sender = [ActorAddr-(T|:44985)])
2021-04-14 14:41:51,175 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:35223)] (ignoring).
2021-04-14 14:41:51,176 ActorAddr-(T|:42963)/PID:9088 esrally.actor INFO A track preparator has exited.
2021-04-14 14:41:51,177 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:42963)] (ignoring).
2021-04-14 14:41:54,176 -not-actor-/PID:9052 esrally.rally INFO Attempting to shutdown internal actor system.
2021-04-14 14:41:54,178 -not-actor-/PID:9067 root INFO ActorSystem Logging Shutdown
2021-04-14 14:41:54,199 -not-actor-/PID:9066 root INFO ---- Actor System shutdown
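The log line before the download reports `(9154 MB)`. Assuming that figure is the expected size converted with 1024-based units and truncated (an assumption about Rally's formatting, not something the log states), it corresponds to the value from the track rather than to the real object, while the `8.9 GB` banner rounds both sizes to the same number:

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

SIZE_IN_TRACK = 9599137228   # "compressed-bytes" the track expects
SIZE_ON_BUCKET = 9600716233  # bytes actually downloaded


def to_whole_mib(n: int) -> int:
    # Truncating conversion to binary megabytes.
    return n // MIB


print(to_whole_mib(SIZE_IN_TRACK))   # 9154
print(to_whole_mib(SIZE_ON_BUCKET))  # 9155
# Both sizes round to 8.9 GiB, so the progress banner cannot tell them apart.
print(round(SIZE_IN_TRACK / GIB, 1), round(SIZE_ON_BUCKET / GIB, 1))  # 8.9 8.9
```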
@dliappis
Contributor

@alsyia what version of Elasticsearch are you pointing Rally against? I just tested locally against ES 7.12.0 without issues.

One thing to watch out for is old definitions in your track files. I'd start by wiping the contents of ~/.rally/benchmarks/ (if you don't have custom work there, of course) and letting Rally redownload the track definitions it needs from the GH repos.

For the future: note that for unconfirmed bug reports we prefer posting in our discuss forums first: https://discuss.elastic.co/tag/rally as there's an active community there and it allows us to use GitHub for verified bug reports, feature requests, and pull requests.

@dliappis dliappis self-assigned this Apr 14, 2021
@alsyia
Author

alsyia commented Apr 14, 2021

@dliappis Wow, you're reactive!

I'm running Rally against 5.5.3. I know it's not really supported anymore, but I still needed to benchmark this version! I thought that since the issue happens at download this wouldn't be related to ES version.

I deleted the entire ~/.rally directory and tried again (this is a fresh install on a fresh server anyway) but it didn't change anything :/

I will make sure to post in the discuss forum next time, sorry!

EDIT: I tried downloading the data file manually with the download.sh script, but then got:

[ERROR] Cannot race. Error in track preparator
	Cannot find [/home/<username>/.rally/benchmarks/data/so/posts.json.bz2]. Please disable offline mode and retry.

Even though the file is very much there:

<username>@<hostname>:~$ ls -la ~/.rally/benchmarks/data/so/
total 9375896
drwxrwxr-x 2 <username> <username>       4096 Apr 14 15:38 .
drwxrwxr-x 3 <username> <username>       4096 Apr 14 15:24 ..
-rw-rw-r-- 1 <username> <username>     185694 Apr 14 15:37 posts-1k.json.bz2
-rw-rw-r-- 1 <username> <username> 9600716233 Apr 14 15:37 posts.json.bz2

I then edited the track file so that the expected size matches the real size, and it looks like it's working.
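A minimal Python sketch of the same manual edit, assuming the stale value appears exactly once as `"compressed-bytes": <n>` in the track file. `replace_compressed_bytes` and `patch_track_file` are hypothetical helpers. The sketch substitutes text instead of parsing the file with `json`, on the assumption that a track file may contain template syntax a JSON parser would reject; everything else in the file is left untouched.

```python
import os
import re


def replace_compressed_bytes(track_text: str, old_size: int, new_size: int) -> str:
    # Match only the "compressed-bytes" key followed by the exact stale value.
    pattern = re.compile(r'("compressed-bytes"\s*:\s*)%d\b' % old_size)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(new_size), track_text)
    if count != 1:
        raise ValueError("expected exactly one match for %d, found %d" % (old_size, count))
    return new_text


def patch_track_file(track_file: str, data_file: str, old_size: int) -> int:
    # Use the size of the file that was actually downloaded as the new value.
    new_size = os.path.getsize(data_file)
    with open(track_file, encoding="utf-8") as f:
        text = f.read()
    with open(track_file, "w", encoding="utf-8") as f:
        f.write(replace_compressed_bytes(text, old_size, new_size))
    return new_size
```

This is a local workaround only; the durable fix is correcting the value in the track repository itself.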

@dliappis
Copy link
Contributor

I see the problem.

As you are benchmarking ES 5, Rally pulls the branch 5 of github.com/elastic/rally-tracks.
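Rally picks the rally-tracks branch from the Elasticsearch version it benchmarks. A simplified sketch of such a fallback, assuming it tries the most specific version first and ends at `master`; the exact rules and the helper names `candidate_branches` and `pick_track_branch` are assumptions for illustration, not Rally's implementation:

```python
def candidate_branches(es_version: str) -> list:
    # "5.5.3" -> ["5.5.3", "5.5", "5", "master"]; pre-release suffixes are ignored here.
    parts = es_version.split(".")
    candidates = [".".join(parts[:i]) for i in range(len(parts), 0, -1)]
    candidates.append("master")
    return candidates


def pick_track_branch(es_version: str, available: set) -> str:
    # Return the first candidate that exists in the track repository.
    for branch in candidate_branches(es_version):
        if branch in available:
            return branch
    raise LookupError("no suitable track branch for %s" % es_version)


print(pick_track_branch("5.5.3", {"master", "6", "5"}))  # 5
```

With a rule like this, a fix that lands only on `master` never reaches a 5.x benchmark until it is backported to branch `5`.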

Trouble is that ES 5 is EOL and we haven't backported elastic/rally-tracks#109, which brought those changes, to branch 5.

master: https://github.com/elastic/rally-tracks/blob/e02b939c32714e508a7cb8b95daa9b0f3544720e/so/track.json#L18-L21
5: https://github.com/elastic/rally-tracks/blob/490b7a2384091504a00b0b69bdd134cffc159db4/so/track.json#L17-L23

i.e., the diff is:

~/source/elastic/rally-tracks (5) $ git diff 6 5 so/track.json
diff --git a/so/track.json b/so/track.json
index 18bf1bd..1472c19 100644
--- a/so/track.json
+++ b/so/track.json
@@ -18,7 +18,7 @@
         {
           "source-file": "posts.json.bz2",
           "document-count": 36062278,
-          "compressed-bytes": 9600716233,
+          "compressed-bytes": 9599137228,
           "uncompressed-bytes": 35564808298
         }
       ]

I raised elastic/rally-tracks#167 to fix this.

@dliappis dliappis added the bug Something's wrong label Apr 14, 2021
@dliappis
Contributor

dliappis commented Apr 14, 2021

@alsyia could you attempt to use elastic/rally-tracks#167 and see if it works for you? (you can leverage the --track-path parameter to bypass Rally's automatic selection of the branch)
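A sketch of what that test could look like, assuming a local clone checked out at the PR branch and assuming `--track-path` is passed instead of `--track`, so Rally loads that track directory as-is rather than selecting a branch. `race_command` and the paths are hypothetical; the resulting list can be handed to `subprocess.run`.

```python
import shlex


def race_command(track_dir: str, hosts: list, pipeline: str = "benchmark-only") -> list:
    # Build the esrally argv; --track-path replaces --track in this sketch.
    return [
        "esrally",
        "race",
        f"--track-path={track_dir}",
        "--target-hosts=" + ",".join(hosts),
        f"--pipeline={pipeline}",
    ]


argv = race_command("/tmp/rally-tracks-pr/so", ["127.0.0.1:9200"])
print(shlex.join(argv))  # shell-quoted form, for copying into a terminal
```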

@alsyia
Author

alsyia commented Apr 15, 2021

@dliappis Very clear explanation, thanks!

I added hotfix.url = https://github.com/dliappis/rally-tracks in the [tracks] section of rally.ini, cleaned up the existing data, and ran

esrally race --track=so --target-hosts=es-<host_1>:9200,<host_2>:9200,<host_3>:9200 --pipeline=benchmark-only --track-repository=hotfix --track-revision=update-size-in-5

It works! 🎉

Thank you so much for the quick response :) I guess we can close this ticket since I have a workaround and I see you have a PR for the fix, but I'll leave it to you in case you want to keep it open for tracking!
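The workaround registers an extra track repository named `hotfix` in the `[tracks]` section of `rally.ini` and then selects it with `--track-repository=hotfix --track-revision=update-size-in-5`. A minimal sketch of adding that entry with Python's standard `configparser`; the helper name `add_track_repository` is hypothetical. Note that `configparser` drops comments and lowercases keys when it rewrites a file, which is harmless for the configuration shown in this issue but worth checking before overwriting a real `rally.ini`.

```python
import configparser
import io


def add_track_repository(ini_text: str, name: str, url: str) -> str:
    # interpolation=None so values containing '%' are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    if not config.has_section("tracks"):
        config.add_section("tracks")
    config.set("tracks", f"{name}.url", url)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


src = "[tracks]\ndefault.url = https://github.com/elastic/rally-tracks\n"
print(add_track_repository(src, "hotfix", "https://github.com/dliappis/rally-tracks"))
```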

@dliappis
Contributor

Thanks so much for the verification, @alsyia. elastic/rally-tracks#167 has been merged; closing.
