Feature: Measuring the airspeed velocity of an unladen signac #629
@@ -23,6 +23,7 @@ __pycache__
pip-log.txt

# Unit test / coverage reports
.asv
.noseids
.coverage
.tox

@@ -0,0 +1,160 @@
{
    // The version of the config file format. Do not change, unless
    // you know what you are doing.
    "version": 1,

    // The name of the project being benchmarked
    "project": "signac",

    // The project's homepage
    "project_url": "https://signac.io/",

    // The URL or local path of the source code repository for the
    // project being benchmarked
    "repo": ".",

    // The Python project's subdirectory in your repo. If missing or
    // the empty string, the project is assumed to be located at the root
    // of the repository.
    // "repo_subdir": "",

    // Customizable commands for building, installing, and
    // uninstalling the project. See asv.conf.json documentation.
    //
    // "install_command": ["in-dir={env_dir} python -mpip install {wheel_file}"],
    // "uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],
    // "build_command": [
    //     "python setup.py build",
    //     "PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}"
    // ],

    // List of branches to benchmark. If not provided, defaults to "master"
    // (for git) or "default" (for mercurial).
    // "branches": ["master"], // for git
    // "branches": ["default"], // for mercurial

    // The DVCS being used. If not set, it will be automatically
    // determined from "repo" by looking at the protocol in the URL
    // (if remote), or by looking for special directories, such as
    // ".git" (if local).
    "dvcs": "git",

    // The tool to use to create environments. May be "conda",
    // "virtualenv" or other value depending on the plugins in use.
    // If missing or the empty string, the tool will be automatically
    // determined by looking for tools on the PATH environment
    // variable.
    "environment_type": "virtualenv",

    // timeout in seconds for installing any dependencies in environment
    // defaults to 10 min
    //"install_timeout": 600,

    // the base URL to show a commit for the project.
    "show_commit_url": "https://github.com/glotzerlab/signac/commit/",

    // The Pythons you'd like to test against. If not provided, defaults
    // to the current version of Python used to run `asv`.
    // "pythons": ["3.9"],

    // The list of conda channel names to be searched for benchmark
    // dependency packages in the specified order
    // "conda_channels": ["conda-forge"],

    // The matrix of dependencies to test. Each key is the name of a
    // package (in PyPI) and the values are version numbers. An empty
    // list or empty string indicates to just test against the default
    // (latest) version. null indicates that the package is to not be
    // installed. If the package to be tested is only available from
    // PyPi, and the 'environment_type' is conda, then you can preface
    // the package name by 'pip+', and the package will be installed via
    // pip (with all the conda available packages installed first,
    // followed by the pip installed packages).
    //
    // "matrix": {
    //     "numpy": ["1.6", "1.7"],
    //     "six": ["", null], // test with and without six installed
    //     "pip+emcee": [""], // emcee is only available for install with pip.
    // },

    // Combinations of libraries/python versions can be excluded/included
    // from the set to test. Each entry is a dictionary containing additional
    // key-value pairs to include/exclude.
    //
    // An exclude entry excludes entries where all values match. The
    // values are regexps that should match the whole string.
    //
    // An include entry adds an environment. Only the packages listed
    // are installed. The 'python' key is required. The exclude rules
    // do not apply to includes.
    //
    // In addition to package names, the following keys are available:
    //
    // - python
    //     Python version, as in the *pythons* variable above.
    // - environment_type
    //     Environment type, as above.
    // - sys_platform
    //     Platform, as in sys.platform. Possible values for the common
    //     cases: 'linux2', 'win32', 'cygwin', 'darwin'.
    //
    // "exclude": [
    //     {"python": "3.2", "sys_platform": "win32"}, // skip py3.2 on windows
    //     {"environment_type": "conda", "six": null}, // don't run without six on conda
    // ],
    //
    // "include": [
    //     // additional env for python2.7
    //     {"python": "2.7", "numpy": "1.8"},
    //     // additional env if run on windows+conda
    //     {"platform": "win32", "environment_type": "conda", "python": "2.7", "libpython": ""},
    // ],

    // The directory (relative to the current directory) that benchmarks are
    // stored in. If not provided, defaults to "benchmarks"
    "benchmark_dir": "benchmarks",

    // The directory (relative to the current directory) to cache the Python
    // environments in. If not provided, defaults to "env"
    "env_dir": ".asv/env",

    // The directory (relative to the current directory) that raw benchmark
    // results are stored in. If not provided, defaults to "results".
    "results_dir": ".asv/results",

    // The directory (relative to the current directory) that the html tree
    // should be written to. If not provided, defaults to "html".
    "html_dir": ".asv/html",

    // The number of characters to retain in the commit hashes.
    // "hash_length": 8,

    // `asv` will cache results of the recent builds in each
    // environment, making them faster to install next time. This is
    // the number of builds to keep, per environment.
    // "build_cache_size": 2,

    // The commits after which the regression search in `asv publish`
    // should start looking for regressions. Dictionary whose keys are
    // regexps matching to benchmark names, and values corresponding to
    // the commit (exclusive) after which to start looking for
    // regressions. The default is to start from the first commit
    // with results. If the commit is `null`, regression detection is
    // skipped for the matching benchmark.
    //
    // "regressions_first_commits": {
    //     "some_benchmark": "352cdf", // Consider regressions only after this commit
    //     "another_benchmark": null, // Skip regression detection altogether
    // },

    // The thresholds for relative change in results, after which `asv
    // publish` starts reporting regressions. Dictionary of the same
    // form as in ``regressions_first_commits``, with values
    // indicating the thresholds. If multiple entries match, the
    // maximum is taken. If no entry matches, the default is 5%.
    //
    // "regressions_thresholds": {
    //     "some_benchmark": 0.01, // Threshold of 1%
    //     "another_benchmark": 0.5, // Threshold of 50%
    // },
}

@@ -0,0 +1,147 @@
# Copyright 2021 The Regents of the University of Michigan
# All rights reserved.
# This software is licensed under the BSD 3-Clause License.
"""Benchmarks for use with asv (airspeed velocity).

This script defines benchmarks of common signac operations, used to assess the
performance of the framework over time. The asv tools allow for profiling,
comparison, and visualization of benchmark results. This complements the file
``benchmark.py`` in the root directory of the repository, which is primarily
intended for CI tests.
"""

import random
import string
from itertools import islice
from multiprocessing import Pool
from tempfile import TemporaryDirectory

from tqdm import tqdm

import signac


def _random_str(size):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(size))


def _make_json_data(i, num_keys=1, data_size=0):
    assert num_keys >= 1
    assert data_size >= 0

    data = {f"b_{j}": _random_str(data_size) for j in range(num_keys - 1)}
    data["a"] = f"{i}{_random_str(max(0, data_size - len(str(i))))}"
    return data
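
To make the shape of the generated data concrete, the following sketch copies the two helpers above and inspects one result. The argument values (`i=7`, `num_keys=3`, `data_size=5`) and the seed are arbitrary choices for illustration, not values used by the benchmarks.

```python
import random
import string


def _random_str(size):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(size))


def _make_json_data(i, num_keys=1, data_size=0):
    assert num_keys >= 1
    assert data_size >= 0

    data = {f"b_{j}": _random_str(data_size) for j in range(num_keys - 1)}
    data["a"] = f"{i}{_random_str(max(0, data_size - len(str(i))))}"
    return data


random.seed(0)  # arbitrary seed, only so repeated runs produce the same strings
example = _make_json_data(7, num_keys=3, data_size=5)

# num_keys - 1 random "b_<j>" keys, plus one "a" key whose value starts with i.
print(sorted(example))                # ['a', 'b_0', 'b_1']
print(example["a"].startswith("7"))   # True
print({k: len(v) for k, v in example.items()})

# With the defaults the value of "a" is just the index, so each i stays distinct.
minimal = _make_json_data(42)
print(minimal)                        # {'a': '42'}
```

The value under `"a"` is padded up to `data_size` but never truncated, so it always begins with the job index even when `data_size` is smaller than the number of digits in `i`.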


def _make_job(project, num_keys, num_doc_keys, data_size, data_std, i):
    size = max(0, int(random.gauss(data_size, data_std)))
    job = project.open_job(_make_json_data(i, num_keys, size))
    if num_doc_keys > 0:
        size = max(0, int(random.gauss(data_size, data_std)))
        job.document.update(_make_json_data(i, num_doc_keys, size))
    else:
        job.init()


def generate_random_data(
    project,
    N,
    num_keys=1,
    num_doc_keys=0,
    data_size_mean=0,
    data_size_std=0,
    parallel=True,
):
    assert len(project) == 0

    if parallel:
        with Pool() as pool:
            p = [
                (project, num_keys, num_doc_keys, data_size_mean, data_size_std, i)
                for i in range(N)
            ]
            list(pool.starmap(_make_job, tqdm(p, desc="init random project data")))
    else:
        from functools import partial

        make = partial(
            _make_job, project, num_keys, num_doc_keys, data_size_mean, data_size_std
        )
        list(map(make, tqdm(range(N), desc="init random project data")))


def setup_random_project(
    N, num_keys=1, num_doc_keys=0, data_size_mean=0, data_size_std=0, seed=0, root=None
):
    random.seed(seed)
    if not isinstance(N, int):
        raise TypeError("N must be an integer!")

    temp_dir = TemporaryDirectory()
    project = signac.init_project(f"benchmark-N={N}", root=temp_dir.name)
    generate_random_data(
        project, N, num_keys, num_doc_keys, data_size_mean, data_size_std
    )
    return project, temp_dir


PARAMETERS = {
    "N": [100, 1_000],
    "num_statepoint_keys": [10],
    "num_document_keys": [0],
    "data_size_mean": [100],
    "data_size_std": [0],
}


class _ProjectBenchBase:
    param_names = PARAMETERS.keys()
    params = PARAMETERS.values()

    def setup(self, *params):
        N, num_keys, num_doc_keys, data_size_mean, data_size_std = params
        self.project, self.temp_dir = setup_random_project(
            N,
            num_keys=num_keys,
            num_doc_keys=num_doc_keys,
            data_size_mean=data_size_mean,
            data_size_std=data_size_std,
        )

    def teardown(self, *params):
        self.temp_dir.cleanup()
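
When `params` holds several lists, asv runs a parameterized benchmark once per combination of their entries (the Cartesian product) and passes one value per entry of `param_names` to `setup`, the benchmark method, and `teardown`. The sketch below only enumerates those combinations with `itertools.product`; it does not call asv, and the name `combinations` is introduced here purely for illustration.

```python
from itertools import product

# Same dictionary as in the benchmark file above.
PARAMETERS = {
    "N": [100, 1_000],
    "num_statepoint_keys": [10],
    "num_document_keys": [0],
    "data_size_mean": [100],
    "data_size_std": [0],
}

# Illustrative only: the parameter tuples a parameterized benchmark iterates over.
combinations = list(product(*PARAMETERS.values()))

for combo in combinations:
    # Pair each value with its parameter name for readability.
    print(dict(zip(PARAMETERS.keys(), combo)))
```

With the values above only `N` varies, so each benchmark method is timed for two parameter sets; adding a second value to any other list would double that number.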


class ProjectBench(_ProjectBenchBase):
    def time_determine_len(self, *params):
        len(self.project)

    def time_iterate_single_pass(self, *params):
        list(self.project)

    def time_iterate(self, *params):
        for _ in range(10):
            list(self.project)

    def time_iterate_load_sp(self, *params):
        for _ in range(10):
            [job.sp() for job in self.project]

Comment on lines +127 to +129

Given lazy loading, it would be interesting to see single-pass speeds as well, unless this does it every pass (I am not too familiar with the loading that takes place here).

I'd like to leave the scope of what we benchmark the same in this PR as we have currently implemented in


class ProjectRandomJobBench(_ProjectBenchBase):
    def setup(self, *params):
        super().setup(*params)
        self.random_job = random.choice(list(self.project))
        self.random_job_sp = self.random_job.statepoint()
        self.random_job_id = self.random_job.id
        self.lean_filter = {k: v for k, v in islice(self.random_job_sp.items(), 1)}

    def time_select_by_id(self, *params):
        self.project.open_job(id=self.random_job_id)

    def time_search_lean_filter(self, *params):
        len(self.project.find_jobs(self.lean_filter))

    def time_search_rich_filter(self, *params):
        len(self.project.find_jobs(self.random_job_sp))
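
The two search benchmarks differ only in how many keys the filter contains. The sketch below uses a plain dictionary as a stand-in for a job's state point (the keys and values are made up) to show what the `islice` expression in `setup` produces; no signac calls are involved.

```python
from itertools import islice

# Hypothetical state point, shaped like the output of _make_json_data.
statepoint = {"b_0": "qwert", "b_1": "asdfg", "a": "7zxcv"}

# "Lean" filter: only the first key-value pair in the mapping's iteration order.
lean_filter = {k: v for k, v in islice(statepoint.items(), 1)}

# "Rich" filter: the complete state point.
rich_filter = dict(statepoint)

print(lean_filter)       # {'b_0': 'qwert'}
print(len(rich_filter))  # 3
```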
Any reason to not use (a potentially extended/adapted variant of)
signac/signac/contrib/project.py
Line 2412 in e60bfbc
signac/signac/testing.py
Line 8 in 8a804d0

The `TemporaryProject` yields a project when used as a context manager. The benchmark script requires something we can store (like `self.project` and `self.temp_dir`) in `setup` and then destroy during the `teardown` method. I'm not aware of a clean way to use that context manager here.
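
One general-purpose way to hold a context manager open between two separate method calls is `contextlib.ExitStack` from the standard library: enter the context in `setup`, keep the stack on `self`, and close it in `teardown`. The sketch below is not part of this PR and was not proposed in the thread. It uses `tempfile.TemporaryDirectory` as a stand-in context manager, and the class name `ContextManagerBench` is hypothetical. Whether the same pattern suits `TemporaryProject` is an assumption that has not been verified here.

```python
import os
from contextlib import ExitStack
from tempfile import TemporaryDirectory


class ContextManagerBench:
    """Hypothetical benchmark class; TemporaryDirectory stands in for any context manager."""

    def setup(self, *params):
        self._stack = ExitStack()
        # enter_context() calls __enter__ now and registers __exit__ for later.
        self.workdir = self._stack.enter_context(TemporaryDirectory())

    def teardown(self, *params):
        # close() runs the registered __exit__ callbacks, removing the directory.
        self._stack.close()

    def time_listdir(self, *params):
        os.listdir(self.workdir)


# Drive the methods by hand, in the order a benchmark runner would call them.
bench = ContextManagerBench()
bench.setup()
path = bench.workdir
exists_during = os.path.isdir(path)
bench.time_listdir()
bench.teardown()
exists_after = os.path.isdir(path)
print(exists_during, exists_after)  # True False
```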

Well, even if you don't use `TemporaryProject`, you could still use the `testing.init_jobs()` function?

Hmm, I had forgotten about that function. It looks like it is currently designed for tests of projects with complex schemas, not tests of performance with varying data sizes. We would probably have to change that function to support the arguments like `num_keys`, `num_doc_keys`, `data_size`, `data_std`, etc., and that feels out of scope for this PR. I'll finalize and merge as-is, since you indicated this is a non-blocking issue.