Additional metrics for bulk requests #286
Conversation
Thanks for your PR @cdahlqvist! I left a couple of comments.
@@ -114,6 +114,8 @@ def __call__(self, es, params):

* ``index``: name of the affected index. May be `None` if it could not be derived.
* ``bulk-size``: bulk size, e.g. 5.000.
* ``bulk-request-size-bytes``: size of the full bulk requset in bytes
nit: typo "requset" -> "request"
esrally/driver/runner.py
bulk_request_size_bytes = 0
total_document_size_bytes = 0

for i in range(len(params["body"])):
This adds some overhead which is, depending on the track, at worst in the single-digit percentage range (I've measured up to 95ms). I think it would make sense to move this loop to the method `detailed_stats`. Users can then enable it for their tracks by setting `detailed-results` to `true`. Complete example:
```json
{
    "name": "bulk-index",
    "operation-type": "index",
    "bulk-size": 5000,
    "detailed-results": true
}
```
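A minimal, self-contained sketch of what that suggestion could look like; the function shape, the sample `params`, and reading the flag via `params.get("detailed-results", False)` are assumptions for illustration, not esrally's actual code:

```python
# Sketch only: compute the extra byte counts in a separate helper and
# call it solely when the track opts in via "detailed-results".
def detailed_stats(params):
    bulk_request_size_bytes = 0
    for data in params["body"]:
        bulk_request_size_bytes += len(data)
    return {"bulk-request-size-bytes": bulk_request_size_bytes}

params = {"detailed-results": True, "body": ['{"index": {}}', '{"f": 1}']}
result = {}
if params.get("detailed-results", False):
    result.update(detailed_stats(params))
print(result)  # {'bulk-request-size-bytes': 21}
```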
Btw, you could also use `enumerate` for the loop instead of `range` to simplify it a bit:
```python
for line_number, data in enumerate(params["body"]):
    line_size = len(data)
    if params["action_metadata_present"]:
        total_document_size_bytes += line_size
    # ...
```
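(Compared to the `range`-based version, `enumerate` yields each line together with its index, so no separate `params["body"][i]` lookup is needed.)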
Another fine detail: the `len` function in Python returns the number of characters (although it's not really clearly documented), but combining the following two facts shows that this is the case:

- "[len] [r]eturn[s] the length (the number of items) of an object. The argument may be a sequence (such as a string [...])" (source)
- "Strings are immutable sequences of Unicode code points." (source)

Therefore, we can reason that `len` returns the number of Unicode code points (i.e. characters) of a string. For characters that can be represented with 1 byte, the number of characters is identical to the number of bytes, but strictly speaking it's the number of characters. To determine the number of bytes you need to encode the string as UTF-8, e.g. `len(s.encode('utf-8'))`.
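A quick illustration of the distinction (the sample string is made up):

```python
s = "héllo"  # 'é' is a single code point but takes two bytes in UTF-8

print(len(s))                  # 5 -> number of Unicode code points
print(len(s.encode("utf-8")))  # 6 -> number of bytes sent over the wire
```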
@danielmitterdorfer Have updated the PR according to the given suggestions.
LGTM. I think it makes sense to add the following two assertions to `BulkIndexRunnerTests#test_mixed_bulk_with_detailed_stats(self, es)`:

```python
self.assertEqual(158, result["bulk-request-size-bytes"])
self.assertEqual(62, result["total-document-size-bytes"])
```
Added the assertions to the test.
LGTM. You can merge at any time.
When comparing bulk indexing throughput for different types of data, pure EPS (events per second) can be very misleading if events have different sizes. I have found the total volume of JSON records indexed per time period to sometimes be a more useful measure, as it takes event size into consideration.

This pull request adds the total size of the bulk request, as well as the total size of the documents indexed, to the reported metrics, allowing the amount of JSON indexed per timeframe and the total volume sent over the wire to be calculated.
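As an illustration of the calculation this enables, a small post-processing sketch; the metric names follow this PR, while the sample values and the `took` field (assumed here to be the bulk response time in milliseconds) are made up:

```python
# Hypothetical sample; only the metric names come from this PR.
result = {
    "total-document-size-bytes": 62_000_000,  # JSON payload actually indexed
    "took": 5_000,                            # assumed: response time in ms
}

mb_per_second = result["total-document-size-bytes"] / 1_000_000 / (result["took"] / 1_000)
print(f"{mb_per_second:.1f} MB of JSON indexed per second")  # 12.4
```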