From f907c3d87c20fbedf2b15261b56621692cf3589e Mon Sep 17 00:00:00 2001
From: Dimitrios Liappis
Date: Wed, 3 Feb 2021 11:25:21 +0200
Subject: [PATCH 1/2] Clarify num of docs in corpora when action and metadata is used

In the typical case of corpora that don't include an
action-and-meta-data line, defining the number of documents is simple
and equivalent to the number of lines.

This commit clarifies that this calculation differs when using the
`includes-action-and-meta-data` property and that the number of
documents should not include action-and-meta-data lines.
---
 docs/track.rst | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/track.rst b/docs/track.rst
index 1e518595a..822287758 100644
--- a/docs/track.rst
+++ b/docs/track.rst
@@ -298,7 +298,12 @@ Each entry in the ``documents`` list consists of the following properties:
   * Google Storage: Either using `client library authentication `_ or by presenting an `oauth2 token `_ via the ``GOOGLE_AUTH_TOKEN`` environment variable, typically done using: ``export GOOGLE_AUTH_TOKEN=$(gcloud auth print-access-token)``.
 * ``source-format`` (optional, default: ``bulk``): Defines in which format Rally should interpret the data file specified by ``source-file``. Currently, only ``bulk`` is supported.
 * ``source-file`` (mandatory): File name of the corresponding documents. For local use, this file can be a ``.json`` file. If you provide a ``base-url`` we recommend that you provide a compressed file here. The following extensions are supported: ``.zip``, ``.bz2``, ``.gz``, ``.tar``, ``.tar.gz``, ``.tgz`` or ``.tar.bz2``. It must contain exactly one JSON file with the same name. The preferred file extension for our official tracks is ``.bz2``.
-* ``includes-action-and-meta-data`` (optional, defaults to ``false``): Defines whether the documents file contains already an action and meta-data line (``true``) or only documents (``false``).
+* ``includes-action-and-meta-data`` (optional, defaults to ``false``): Defines whether the documents file contains already an `action and meta-data `_ line (``true``) or only documents (``false``).
+
+  .. note::
+
+    In this case the ``documents`` property should only reflect the number of documents and not additionally include the number of action and metadata lines.
+
 * ``document-count`` (mandatory): Number of documents in the source file. This number is used by Rally to determine which client indexes which part of the document corpus (each of the N clients gets one N-th of the document corpus). If you are using parent-child, specify the number of parent documents.
 * ``compressed-bytes`` (optional but recommended): The size in bytes of the compressed source file. This number is used to show users how much data will be downloaded by Rally and also to check whether the download is complete.
 * ``uncompressed-bytes`` (optional but recommended): The size in bytes of the source file after decompression. This number is used by Rally to show users how much disk space the decompressed file will need and to check that the whole file could be decompressed successfully.

From 20c839e0f6fe5329d47b8f60edfb1eae63738941 Mon Sep 17 00:00:00 2001
From: Dimitrios Liappis
Date: Wed, 3 Feb 2021 11:47:44 +0200
Subject: [PATCH 2/2] Address PR comment

---
 docs/track.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/track.rst b/docs/track.rst
index 822287758..d59849f40 100644
--- a/docs/track.rst
+++ b/docs/track.rst
@@ -302,7 +302,7 @@ Each entry in the ``documents`` list consists of the following properties:
 
   .. note::
 
-    In this case the ``documents`` property should only reflect the number of documents and not additionally include the number of action and metadata lines.
+    When this is ``true``, the ``documents`` property should only reflect the number of documents and not additionally include the number of action and metadata lines.
 
 * ``document-count`` (mandatory): Number of documents in the source file. This number is used by Rally to determine which client indexes which part of the document corpus (each of the N clients gets one N-th of the document corpus). If you are using parent-child, specify the number of parent documents.
 * ``compressed-bytes`` (optional but recommended): The size in bytes of the compressed source file. This number is used to show users how much data will be downloaded by Rally and also to check whether the download is complete.
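
The counting rule described in the note can be illustrated with a minimal sketch. It assumes a bulk-format file in which, when ``includes-action-and-meta-data`` is ``true``, every document line is preceded by exactly one action-and-meta-data line. The helper name ``count_documents`` and the sample data are hypothetical and are not part of Rally.

```python
import io


def count_documents(lines, includes_action_and_meta_data=False):
    """Return a value suitable for ``document-count`` (hypothetical helper).

    Assumption: with action-and-meta-data lines present, the file strictly
    alternates between one action line and one document line.
    """
    # Ignore blank lines such as a trailing newline at the end of the file.
    non_empty = sum(1 for line in lines if line.strip())
    if not includes_action_and_meta_data:
        # Plain corpus: every non-empty line is a document.
        return non_empty
    if non_empty % 2 != 0:
        raise ValueError("expected one action-and-meta-data line per document")
    # Only every second line is a document; action lines are not counted.
    return non_empty // 2


# Hypothetical two-document corpus that includes action-and-meta-data lines.
sample = io.StringIO(
    '{"index": {"_id": "1"}}\n'
    '{"name": "a"}\n'
    '{"index": {"_id": "2"}}\n'
    '{"name": "b"}\n'
)
print(count_documents(sample, includes_action_and_meta_data=True))  # prints 2
```

For a real corpus, the same function could be fed an open file handle instead of the in-memory sample. A corpus containing actions without a following document line would need a different counting rule.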