Add a hardware benchmark to test memory, disk, and network bandwidths #5966
Conversation
@quasiben you all might find this interesting |
Unit Test Results: 12 files ±0, 12 suites ±0, 6h 9m 16s ⏱️ (−9m 38s). For more details on these failures, see this check. Results for commit 67f7514. Comparison against base commit 2fbf9eb. ♻️ This comment has been updated with latest results. |
cc @crusaderky if you have time to take a quick look |
@mrocklin quick comment: can you add the corresponding entry to this doc-page please https://github.com/dask/distributed/blob/main/docs/source/http_services.rst |
This is awesome! I think we should also build client.go_burr() cc @jacobtomlinson |
I've gone ahead and done this, but I'm not sure that I agree with that page generally. Historically distributed.dask.org gets almost no traffic. We've really focused on docs.dask.org. I don't think that we should be investing much in writing these docs unless we really invest in them. Instead I think that we should be focusing on docs.dask.org. I'm also not sure how much value there is in a listing of some of the dashboard pages. I might suggest a few alternatives:
Personally, I would do 1 or 3 |
Yeah, last week twelve people went to that page. 100% of them bounced in about a minute. Rather than continue spending dev time developing that page in its current style I'm inclined instead to remove it. Thoughts? |
Thank you for the thorough feedback @crusaderky. I've handled or responded to all of your comments and pushed an update. |
Matt, I agree that there should be more dashboard documentation, and there has been an effort at starting that (see https://docs.dask.org/en/latest/dashboard.html). The truth is that outside the main status page we don't have proper docs, and the dashboard docstrings are not necessarily good. The HTTP endpoints page is the only place with all the names of the dashboard pages, and it's a good reference for finding which tab corresponds to which plot.
Those twelve people, I believe, were Bryan, myself, and a couple more trying to find some information about certain plots, and it was helpful. I agree it's not the best page. But until we have better docs on every page, I think we should keep it. Besides, the effort in maintaining this page is adding one line every time a plot is added, which I don't find to be much of a problem |
To be clear, when I say "twelve" I really mean that I'm rounding that down to zero. If there weren't at least hundreds of views (preferably thousands) then in my mind the doc page doesn't really provide value.
Eh, things like this add inertia to the dev process. Asking everyone who makes a plot to add a line to a doc page that no one sees feels like unnecessary and useless work to me. I'd rather that we trim off things that aren't useful and focus more on critical issues. If we want to document the dashboard then great, but we need to work on documentation that has an impact. I'm against asking people to do work that doesn't have an impact. If the answer is "we should document this somewhere" then sure, I agree, but it needs to be somewhere that people actually look. Currently the dashboard is undocumented. That page isn't currently helping to solve the problem. |
dashboard test is failing, likely because of the task spawned by the new widget: |
I can't find the new widget in the [More...] dropdown in the GUI splash page.
|
That's also my read of how things worked. I've added this but am still not seeing it locally. I'm not yet sure why. |
Co-authored-by: crusaderky <[email protected]>
@@ -7329,6 +7331,51 @@ async def get_call_stack(self, keys=None):
         response = {w: r for w, r in zip(workers, results) if r}
         return response

+    async def benchmark_hardware(self) -> "dict[str, dict[str, float]]":
Could you add a docstring and explain the output?
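For context, the return annotation `"dict[str, dict[str, float]]"` implies a nested mapping of benchmark category to size-labelled bandwidths. A sketch of a plausible output shape (the category names and all numeric values here are illustrative, not taken from the PR):

```python
# Illustrative shape of the result implied by the annotation
# "dict[str, dict[str, float]]": category -> {size label -> bandwidth in bytes/s}.
# All values below are made up for demonstration.
result = {
    "memory": {"1 kiB": 2.1e9, "100 MiB": 1.5e10},
    "disk": {"1 kiB": 4.0e8, "10 MiB": 9.0e8},
    "network": {"1 MiB": 1.2e9},
}

# Every leaf value is a float bandwidth measurement
assert all(
    isinstance(v, float)
    for sizes in result.values()
    for v in sizes.values()
)
```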
Outstanding points:
|
This didn't pass pre-commit checks. I don't understand mypy well enough to diagnose why it's upset here.
```
distributed/dashboard/scheduler.py:133: error: No overload variant of "sorted" matches argument types "object", "Callable[[Any], Any]"
distributed/dashboard/scheduler.py:133: note: Possible overload variants:
distributed/dashboard/scheduler.py:133: note:     def [SupportsRichComparisonT] sorted(Iterable[SupportsRichComparisonT], *, key: None = ..., reverse: bool = ...) -> List[SupportsRichComparisonT]
distributed/dashboard/scheduler.py:133: note:     def [_T] sorted(Iterable[_T], *, key: Callable[[_T], Union[SupportsDunderLT, SupportsDunderGT]], reverse: bool = ...) -> List[_T]
distributed/dashboard/scheduler.py:138: error: "object" has no attribute "insert"
Found 2 errors in 1 file (checked 1 source file)
```
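The errors suggest mypy inferred `template_variables["pages"]` as `object`, which supports neither `sorted(..., key=...)` nor `.insert()`. A minimal, runnable sketch of the annotation fix that the diff later in this thread applies (the page names here are illustrative):

```python
# Annotating the dict as a plain `dict` makes the values type `Any`,
# so mypy no longer rejects sorted()/insert() on template_variables["pages"].
template_variables: dict = {
    "pages": ["status", "workers", "tasks"],
}

# Sort the page list, then put a hypothetical new page first
pages = sorted(template_variables["pages"], key=lambda name: name)
pages.insert(0, "hardware")
```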
I'm also happy to just close this PR. I've probably spent about as much time as I can justify here. If we can get this closed out soon then awesome, I'll keep pushing on it. If it's going to take more than, say, 30m of my time though then I'll need to prioritize elsewhere. |
Oh, I see, the |
See also dask/dask#8848 |
It's visible in the latest CI run:
diff --git a/distributed/dashboard/scheduler.py b/distributed/dashboard/scheduler.py
index 154aaafb2..a7dc0a053 100644
--- a/distributed/dashboard/scheduler.py
+++ b/distributed/dashboard/scheduler.py
@@ -108,7 +108,7 @@ applications = {
}
-template_variables = {
+template_variables: dict = {
"pages": [
"status",
"workers",
diff --git a/distributed/worker.py b/distributed/worker.py
index 5501d50ac..16d6d789a 100644
--- a/distributed/worker.py
+++ b/distributed/worker.py
@@ -54,11 +54,11 @@ from distributed.comm.utils import OFFLOAD_THRESHOLD
from distributed.compatibility import randbytes
from distributed.core import (
CommClosedError,
+ ConnectionPool,
Status,
coerce_to_address,
error_message,
pingpong,
- rpc,
send_recv,
)
from distributed.diagnostics import nvml
@@ -4657,7 +4657,7 @@ def benchmark_memory(
async def benchmark_network(
address: str,
- rpc: rpc,
+ rpc: ConnectionPool,
sizes: Iterable[str] = ("1 kiB", "10 kiB", "100 kiB", "1 MiB", "10 MiB", "50 MiB"),
duration="1 s",
) -> dict[str, float]:
I think we're almost done. |
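The `benchmark_network` signature in the diff above takes a `ConnectionPool` and a tuple of message sizes. The PR's actual measurement goes through the distributed comm layer; as a rough, self-contained illustration of the measurement idea only (not the PR's code), here is a loopback sketch that times round-trips of a fixed-size payload:

```python
import socket
import time


def measure_local_bandwidth(size: int, duration: float = 0.1) -> float:
    """Rough loopback bandwidth estimate in bytes/second (illustrative only)."""
    a, b = socket.socketpair()
    payload = b"x" * size
    total = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        a.sendall(payload)          # write the payload on one end...
        received = 0
        while received < size:      # ...and drain it fully on the other
            received += len(b.recv(65536))
        total += size
    a.close()
    b.close()
    return total / (time.perf_counter() - start)


bw = measure_local_bandwidth(1024)
```

The real benchmark sweeps several sizes ("1 kiB" through "50 MiB") because small messages are dominated by per-message latency while large ones approach raw throughput.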
Otherwise we let a lingering thread start, which makes some tests unhappy. There is some chance that the worker might have an executor which doesn't work well with asyncio. That seems rare enough and this feature seems fringe enough that I'm totally willing to take that chance.
LGTM. Thanks!
Woot. Thanks for shepherding this through
…On Mon, Mar 28, 2022 at 4:49 AM crusaderky ***@***.***> wrote:
Merged #5966 <#5966> into main.
|
This includes both a client-side function and a /hardware dashboard page
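Of the three measurements named in the title, memory bandwidth is the simplest to picture. A minimal sketch of the timed-copy idea, assuming bandwidth is estimated as bytes copied per second (this is an illustration of the general technique, not the PR's implementation):

```python
import time


def benchmark_memory_sketch(size: int = 10_000_000, duration: float = 0.2) -> float:
    """Estimate memory copy bandwidth in bytes/second (illustrative only)."""
    data = bytearray(size)
    total = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        _ = bytes(data)  # force a full copy through memory
        total += size
    return total / (time.perf_counter() - start)


bw_mem = benchmark_memory_sketch()
```

Disk bandwidth follows the same pattern but writes and flushes a temporary file instead of copying a buffer.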