Add 'Tracker' track-generation tool #875

Closed. This pull request wants to merge 42 commits into master from tracker.

The file changes shown below cover 4 of the 42 commits.

Commits
a4be1ff  Add 'tracker' track-generation tool (Jan 17, 2020)
e0fb9ba  Add Tracker to lint list (Jan 17, 2020)
bcb7dff  Fix pylint 'import-self' warning in Tracker (Jan 17, 2020)
545dd46  Implement multi-index support, logging fix (Jan 17, 2020)
ef36af2  Prefix tracker with 'es' for disambiguation. (Jan 21, 2020)
3d32262  Addressing code review comments for corpus.py (Jan 21, 2020)
ff42833  Merge operations into challenges. (Jan 21, 2020)
e588aec  Change default trackname to first index argument. (Jan 21, 2020)
0dfdbfd  Use Rally client factory + options (Jan 21, 2020)
7d47ae9  Fix name + description of default challenge. (Jan 22, 2020)
1ef7f2a  Drop extraneous operations. (Jan 22, 2020)
0467224  Fix divide-by-zero with small corpus. (Jan 23, 2020)
50c7f74  Move index_test to tests, add corpus test. (Feb 4, 2020)
ed0ab42  Add Tracker to docs. (Feb 7, 2020)
31d9624  Merge branch 'master' into tracker (Feb 7, 2020)
f5cfc1c  Add standard license header to __init__ files. (Feb 12, 2020)
8f9e846  Change package name from 'tracker' to 'estracker'. (Feb 13, 2020)
0848be0  Code review comments (Feb 12, 2020)
796c68f  More code review (Feb 14, 2020)
46dc2c5  Move to corpora vs multi-document corpus. (Feb 14, 2020)
140b9f2  Add basic integration test for Tracker. (Feb 14, 2020)
e7cdae4  Fixes uncovered by testing. (Feb 14, 2020)
4e5ebbb  Filter 'store' in settings, parameterize replicas + shards (Feb 14, 2020)
47cadb2  Race the new track in integration-test. (Feb 14, 2020)
340ddca  Refactor corpus.py, add testmode output + better testing (Feb 14, 2020)
4be9f72  Define template_vars later to avoid update(). (Feb 19, 2020)
988cdf8  Just make entire challenge template raw for now. (Feb 14, 2020)
99938b2  Fixes to integration-test: (Feb 19, 2020)
02441f4  Fix comma usage in track template. (Feb 19, 2020)
5249fa9  Add estracker tests to pytest tests/ estracker/tests/ (Feb 19, 2020)
df0f247  Merge branch 'master' into tracker (Feb 20, 2020)
6bfdc31  Quick fix for dumping whole index in test-mode. (Feb 20, 2020)
16097ac  Fix tracker integration test, add some additional checks. (Feb 21, 2020)
3e4fac1  Refactoring + improvements in corpus + track (Feb 21, 2020)
4e23575  Improve output prettiness by moving comma() to previous line (Feb 21, 2020)
c557a1e  Drop warmup-time-period. (Feb 21, 2020)
244a888  Fix update_index_setting_parameters, add test. (Feb 28, 2020)
f962abf  Change indices to csv_to_list format, require track-name when more th… (Feb 28, 2020)
3c9ed60  As extraction failures are warnings, we should still fail if all extr… (Feb 28, 2020)
0072116  Use compressed filename for all paths. (Feb 28, 2020)
59e9bb3  Fix trackname -> track-name and add example client-options (Feb 28, 2020)
1089724  Disallow hidden + empty index names (Feb 28, 2020)
3 changes: 3 additions & 0 deletions .gitignore
@@ -115,3 +115,6 @@ recipes/ccr/ccr-target-hosts.json
*~
/.project
/.pydevproject

# Tracker tracks
tracks/
2 changes: 1 addition & 1 deletion Makefile
@@ -88,7 +88,7 @@ tox-env-clean:
rm -rf .tox

lint: check-venv
@find esrally benchmarks scripts tests -name "*.py" -exec $(VEPYLINT) -j0 -rn --load-plugins pylint_quotes --rcfile=$(CURDIR)/.pylintrc \{\} +
@find esrally benchmarks scripts tests tracker -name "*.py" -exec $(VEPYLINT) -j0 -rn --load-plugins pylint_quotes --rcfile=$(CURDIR)/.pylintrc \{\} +

docs: check-venv
@. $(VENV_ACTIVATE_FILE); cd docs && $(MAKE) html
3 changes: 2 additions & 1 deletion setup.py
@@ -118,7 +118,8 @@ def str_from_file(name):
entry_points={
"console_scripts": [
"esrally=esrally.rally:main",
"esrallyd=esrally.rallyd:main"
"esrallyd=esrally.rallyd:main",
"tracker=tracker.tracker:main"
],
},
classifiers=[
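The new console_scripts entry point wires a tracker executable to tracker.tracker:main. That module is not part of the hunks shown in this diff, so the snippet below is only a hypothetical sketch of the shape a console_scripts target takes (an importable, argument-less callable); the option names are assumptions, not the PR's actual CLI.

# Hypothetical sketch only; tracker/tracker.py itself is not shown in this diff.
# A console_scripts target just needs an importable callable, conventionally main().
import argparse


def main():
    parser = argparse.ArgumentParser(description="Generate a Rally track from existing indices")
    parser.add_argument("--hosts", default="localhost:9200", help="assumed option name")
    parser.add_argument("--indices", required=True, help="assumed option name")
    args = parser.parse_args()
    print("would extract {} from {}".format(args.indices, args.hosts))


if __name__ == "__main__":
    main()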
Empty file added tracker/__init__.py
Empty file.
81 changes: 81 additions & 0 deletions tracker/corpus.py
@@ -0,0 +1,81 @@
# Licensed to Elasticsearch B.V. under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Elasticsearch B.V. licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import bz2
import json
import logging
import os
import pathlib

from elasticsearch import helpers


def template_vars(index_name, out_path, comp_outpath, doc_count):
corpus_path = pathlib.Path(out_path)
compressed_corpus_path = pathlib.Path(comp_outpath)
return {
"index_name": index_name,
"base_url": corpus_path.parent.as_uri(),
"filename": corpus_path.name,
"path": corpus_path,
"doc_count": doc_count,
"uncompressed_bytes": os.stat(corpus_path.as_posix()).st_size,
"compressed_bytes": os.stat(compressed_corpus_path.as_posix()).st_size
}


def extract(client, outdir, index):
"""
Scroll an index with a match-all query, dumping document source to
outdir/<index>-documents.json
:param client: Elasticsearch client to scroll
:param outdir: Destination directory for corpus dump
:param index: Name of index to dump
:return: dict of properties describing the corpus for templates
"""
outpath = os.path.join(outdir, "{}-documents.json".format(index))

total_docs = client.count(index=index)["count"]
logging.info("%d total docs in index %s", total_docs, index)
freq = total_docs // 1000

compressor = bz2.BZ2Compressor()
comp_outpath = outpath + ".bz2"

with open(outpath, "wb") as outfile:
with open(comp_outpath, "wb") as comp_outfile:
logging.info("Now dumping corpus to %s...", outpath)

query = {"query": {"match_all": {}}}
for n, doc in enumerate(helpers.scan(client, query=query, index=index)):
docsrc = doc["_source"]
data = (json.dumps(docsrc, separators=(',', ':')) + "\n").encode("utf-8")

outfile.write(data)
comp_outfile.write(compressor.compress(data))

render_progress(n+1, total_docs, freq)

print() # progress prints didn't have a newline
comp_outfile.write(compressor.flush())
return template_vars(index, outpath, comp_outpath, total_docs)


def render_progress(cur, total, freq):
if cur % freq == 0 or total - cur < freq:
percent = (cur * 100) / total
print("\r{n}/{total_docs} ({percent:.1f}%)".format(n=cur, total_docs=total, percent=percent), end="")
72 changes: 72 additions & 0 deletions tracker/index.py
@@ -0,0 +1,72 @@
# Licensed to Elasticsearch B.V. under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Elasticsearch B.V. licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import json
import os


INDEX_SETTINGS_EPHEMERAL_KEYS = ["uuid",
"creation_date",
"version",
"provided_name"]


def filter_ephemeral_index_settings(settings):
"""
Some of the 'settings' reported by Elasticsearch for an index are
ephemeral values, not useful for re-creating the index.
:param settings: Index settings reported by index.get()
:return: settings with ephemeral keys removed
"""
return {k: v for k, v in settings.items() if k not in INDEX_SETTINGS_EPHEMERAL_KEYS}


def extract_index_mapping_and_settings(client, index):
"""
Calls index GET to retrieve mapping + settings, filtering settings
so they can be used to re-create this index
:param client: Elasticsearch client
:param index: name of index
:return: index creation dictionary
"""
response = client.indices.get(index)
details = response[index]

mappings = details["mappings"]
index_settings = filter_ephemeral_index_settings(details["settings"]["index"])
return {"mappings": mappings, "settings": {"index": index_settings}}


def extract(client, outdir, index):
Member commented:
I think the function name is not particularly helpful in understanding what exactly is extracted?

"""
Request index mappings and settings and write them to "<index>.json" for Rally
:param client: Elasticsearch client
:param outdir: destination directory
:param index: name of index
:return: dict describing the generated index file (name, path, filename)
"""
filename = index + ".json"
index_obj = extract_index_mapping_and_settings(client, index)
outpath = os.path.join(outdir, filename)
with open(outpath, "w") as outfile:
json.dump(index_obj, outfile, indent=4, sort_keys=True)
outfile.write('\n')
return {
"name": index,
"path": outpath,
"filename": filename,
}
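A similar hedged sketch for index.extract, which writes <index>.json next to the corpus dump and returns the name/path/filename entries the templates reference; again, the directory and index name are placeholders.

# Illustrative usage of index.extract; directory and index name are assumptions.
import os

import elasticsearch

from tracker import index as index_info

client = elasticsearch.Elasticsearch(hosts=["localhost:9200"])
os.makedirs("tracks/geonames", exist_ok=True)
index_vars = index_info.extract(client, "tracks/geonames", "geonames")
# index_vars == {"name": "geonames", "path": "tracks/geonames/geonames.json",
#                "filename": "geonames.json"}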
101 changes: 101 additions & 0 deletions tracker/index_test.py
@@ -0,0 +1,101 @@
# Licensed to Elasticsearch B.V. under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Elasticsearch B.V. licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from unittest import mock
from tracker.index import filter_ephemeral_index_settings, extract_index_mapping_and_settings


def test_index_setting_filter():
unfiltered_index_settings = {
"number_of_shards": "5",
"provided_name": "queries",
"creation_date": "1579230289084",
"requests": {
"cache": {
"enable": "false"
}
},
"number_of_replicas": "0",
"queries": {
"cache": {
"enabled": "false"
}
},
"uuid": "jdzVt-dDS1aRlqdZWK4pdA",
"version": {
"created": "7050099"
}
}
settings = filter_ephemeral_index_settings(unfiltered_index_settings)
assert settings.keys() == {"number_of_shards", "number_of_replicas", "requests", "queries"}


@mock.patch("elasticsearch.Elasticsearch")
def test_extract_index_create(client):
client.indices.get.return_value = {
"osmgeopoints": {
"aliases": {},
"mappings": {
"dynamic": "strict",
"properties": {
"location": {
"type": "geo_point"
}
}
},
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "osmgeopoints",
"creation_date": "1579210032233",
"requests": {
"cache": {
"enable": "false"
}
},
"number_of_replicas": "0",
"uuid": "vOOsPNfxTJyQekkIo9TjPA",
"version": {
"created": "7050099"
}
}
}
}
}
expected = {
"mappings": {
"dynamic": "strict",
"properties": {
"location": {
"type": "geo_point"
}
}
},
"settings": {
"index": {
"number_of_replicas": "0",
"number_of_shards": "5",
"requests": {
"cache": {
"enable": "false"
}
}
}
}
}
res = extract_index_mapping_and_settings(client, "osmgeopoints")
assert res == expected
38 changes: 38 additions & 0 deletions tracker/resources/logging.json
@@ -0,0 +1,38 @@
{
"version": 1,
"formatters": {
"normal": {
"format": "%(asctime)s,%(msecs)d PID:%(process)d %(name)s %(levelname)s %(message)s",
"datefmt": "%Y-%m-%d %H:%M:%S",
"()": "esrally.log.configure_utc_formatter"
}
},
"handlers": {
"default_log_handler": {
Member commented:
file_log_handler?

Contributor Author replied:
Addressed in: 796c68f

"class": "logging.handlers.WatchedFileHandler",
"filename": "tracker.log",
"encoding": "UTF-8",
"formatter": "normal"
},
"console_handler": {
"class": "logging.StreamHandler",
"level": "INFO"
}
},
"root": {
"handlers": [
"default_log_handler",
"console_handler"
],
"level": "INFO"
},
"loggers": {
"elasticsearch": {
"handlers": [
"default_log_handler"
],
"level": "WARNING",
"propagate": false
}
}
}
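The file above is in the standard dictConfig format; a small sketch, not code from this PR, of how such a file can be loaded with the standard library (it assumes esrally is importable so the configure_utc_formatter factory referenced above resolves):

# Sketch only: load resources/logging.json through the stdlib dictConfig machinery.
import json
import logging.config

with open("tracker/resources/logging.json") as f:
    logging.config.dictConfig(json.load(f))

logging.getLogger("tracker").info("logging configured")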
49 changes: 49 additions & 0 deletions tracker/templates/challenges.json.j2
@@ -0,0 +1,49 @@
{
"name": "append-no-conflicts",
"description": "Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas as we benchmark a single node cluster and Rally will only start the benchmark if the cluster turns green. Document ids are unique so all index operations are append only. After that a couple of queries are run.",
"default": true,
"schedule": [
{
"operation": "delete-index"
},
{
"operation": {
"operation-type": "create-index",
"settings": {% raw %}{{index_settings | default({}) | tojson}}{% endraw %}
}
},
{
"operation": "index-append",
"warmup-time-period": 120,
"clients": {% raw %}{{bulk_indexing_clients | default(8)}}{% endraw %}
},
{
"name": "refresh-after-index",
"operation": "refresh",
"clients": 1
},
{
"operation": "force-merge",
"clients": 1
},
{
"name": "refresh-after-force-merge",
"operation": "refresh",
"clients": 1
},
{
"operation": "index-stats",
"clients": 1,
"warmup-iterations": 500,
"iterations": 1000,
"target-throughput": 90
},
{
"operation": "node-stats",
"clients": 1,
"warmup-iterations": 100,
"iterations": 1000,
"target-throughput": 90
}
]
}
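Note that the index_settings and bulk_indexing_clients expressions are wrapped in {% raw %} blocks, so generation-time rendering leaves them intact and Rally can substitute them later, for example via --track-params. A one-line sketch, not from the PR, of what raw does:

# Sketch: {% raw %} makes Jinja2 emit the inner expression literally instead of evaluating it.
import jinja2

template = jinja2.Template("{% raw %}{{bulk_indexing_clients | default(8)}}{% endraw %}")
print(template.render())  # prints: {{bulk_indexing_clients | default(8)}}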
16 changes: 16 additions & 0 deletions tracker/templates/operations.json.j2
@@ -0,0 +1,16 @@
{
"name": "index-append",
"operation-type": "bulk",
"bulk-size": {{bulk_size | default(5000)}},
"ingest-percentage": {{ingest_percentage | default(100)}}
},
{
"name": "index-update",
"operation-type": "bulk",
"bulk-size": {{bulk_size | default(5000)}},
"ingest-percentage": {{ingest_percentage | default(100)}},
"conflicts": "{{conflicts | default('random')}}",
"on-conflict": "{{on_conflict | default('index')}}",
"conflict-probability": {{conflict_probability | default(25)}},
"recency": {{recency | default(0)}}
}
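Unlike the challenge template, the operations template uses live Jinja variables with defaults. A quick sketch, not part of the PR, of rendering it standalone to inspect those defaults:

# Sketch: render operations.json.j2 with no parameters to see the default values applied.
import jinja2

with open("tracker/templates/operations.json.j2") as f:
    rendered = jinja2.Template(f.read()).render()
print(rendered)  # bulk-size falls back to 5000, ingest-percentage to 100, conflicts to 'random'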