
Add a hardware benchmark to test memory, disk, and network bandwidths #5966

Merged · 23 commits into dask:main · Mar 28, 2022

Conversation

@mrocklin (Member)

This includes both a client-side function and a `/hardware` dashboard page.

[Screenshot of the new /hardware dashboard page]
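For reference, a minimal usage sketch of the client-side entry point (the method name matches the PR; the exact output shape shown in the comments is an assumption based on the `Scheduler.benchmark_hardware` signature later in this thread):

```python
# Hedged sketch: run the hardware benchmark from a client connected to a
# live cluster. Per Scheduler.benchmark_hardware below, the return type
# is dict[str, dict[str, float]].
from distributed import Client

client = Client()  # assumes a scheduler is reachable (or starts a local one)
result = client.benchmark_hardware()

# Assumed shape (values are bandwidths, presumably bytes per second):
# {"memory": {"1 kiB": ..., ...},
#  "disk":   {"1 kiB": ..., ...},
#  "network": {"1 kiB": ..., ...}}
for subsystem, bandwidths in result.items():
    print(subsystem, bandwidths)
```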

@mrocklin (Member, Author)

@quasiben you all might find this interesting

@github-actions bot (Contributor) commented Mar 20, 2022

Unit Test Results

12 files ±0 · 12 suites ±0 · 6h 9m 16s ⏱️ −9m 38s
2 673 tests +3: 2 591 ✔️ +4, 81 💤 −1, 1 ❌ +1
15 942 runs +18: 15 083 ✔️ +24, 858 💤 −5, 1 ❌ ±0

For more details on these failures, see this check.

Results for commit 67f7514. ± Comparison against base commit 2fbf9eb.

♻️ This comment has been updated with latest results.

@mrocklin changed the title from "Adds a hardware benchmark to test memory, disk, and network bandwidths" to "Add a hardware benchmark to test memory, disk, and network bandwidths" on Mar 21, 2022
@mrocklin (Member, Author)

cc @crusaderky if you have time to take a quick look

@ncclementi (Member)

@mrocklin, quick comment: could you add the corresponding entry to this doc page, please? https://github.com/dask/distributed/blob/main/docs/source/http_services.rst

@quasiben (Member)

This is awesome!

I think we should also build `client.go_burr()` cc @jacobtomlinson

@mrocklin (Member, Author)

> could you add the corresponding entry to this doc page, please?

I've gone ahead and done this, but I'm not sure that I agree with that page generally.

Historically distributed.dask.org gets almost no traffic. We've really focused on docs.dask.org. I don't think we should be investing much in writing these docs unless we're prepared to really invest in them. Instead, I think we should be focusing on docs.dask.org. I'm also not sure how much value there is in a listing of some of the dashboard pages. I might suggest a few alternatives:

  1. If we're going to document the dashboard then let's really document the dashboard, showing what pages do what with images, diagrams, videos, and so on
  2. If we're going for more of API documentation then we might consider adding these to the docstrings and using automatically generated sphinx style docs. If we do this, I'd recommend moving this over to docs.dask.org
  3. Just don't bother doing this, especially if it's not getting read

Personally, I would do 1 or 3

@mrocklin (Member, Author)

Yeah, last week twelve people went to that page. 100% of them bounced in about a minute. Rather than continue spending dev time developing that page in its current style I'm inclined instead to remove it. Thoughts?

@mrocklin (Member, Author)

Thank you for the thorough feedback @crusaderky. I've handled or responded to all of your comments and pushed an update.

@ncclementi (Member) commented Mar 22, 2022

> Historically distributed.dask.org gets almost no traffic. We've really focused on docs.dask.org. I don't think we should be investing much in writing these docs unless we're prepared to really invest in them. Instead, I think we should be focusing on docs.dask.org. I'm also not sure how much value there is in a listing of some of the dashboard pages. I might suggest a few alternatives:
>
> 1. If we're going to document the dashboard then let's really document the dashboard, showing what pages do what with images, diagrams, videos, and so on
> 2. If we're going for more of API documentation then we might consider adding these to the docstrings and using automatically generated sphinx-style docs. If we do this, I'd recommend moving this over to docs.dask.org
> 3. Just don't bother doing this, especially if it's not getting read
>
> Personally, I would do 1 or 3

Matt, I agree that there should be more dashboard documentation, and there has been an effort to start that (see https://docs.dask.org/en/latest/dashboard.html). The truth is that outside the main status page we don't have proper docs, and the dashboard docstrings are not necessarily good. The HTTP endpoints page is the only place that lists all the names of the dashboard pages, and it's a good reference for finding which tab corresponds to which plot.

> Yeah, last week twelve people went to that page. 100% of them bounced in about a minute. Rather than continue spending dev time developing that page in its current style I'm inclined instead to remove it. Thoughts?

Those twelve people, I believe, were Bryan, myself, and a couple more trying to find information about certain plots, and it was helpful. I agree it's not the best page. But until we have better docs on every page, I think we should keep it. Besides, the effort of maintaining this page is adding one line every time a plot is added, which I don't find to be much of a problem.

@mrocklin (Member, Author)

> Those twelve people, I believe, were Bryan, myself, and a couple more trying to find information about certain plots,

To be clear, when I say "twelve" I really mean that I'm rounding that down to zero. If there weren't at least hundreds of views (preferably thousands) then in my mind the doc page doesn't really provide value.

> But until we have better docs on every page, I think we should keep it. Besides, the effort of maintaining this page is adding one line every time a plot is added, which I don't find to be much of a problem.

Eh, things like this add inertia to the dev process. Asking everyone who makes a plot to add a line to a doc page that no one sees feels like unnecessary and useless work to me. I'd rather we trim off things that aren't useful and focus more on critical issues.

If we want to document the dashboard then great, but we need to work on documentation that has an impact. I'm against asking people to do work that doesn't have an impact. If the answer is "we should document this somewhere" then sure, I agree, but it needs to be somewhere that people actually look. Currently the dashboard is undocumented, and that page isn't helping to solve the problem.

@crusaderky (Collaborator)

The dashboard test is failing, likely because of the task spawned by the new widget:
https://github.com/dask/distributed/runs/5660831076?check_suite_focus=true

@crusaderky (Collaborator)

I can't find the new widget in the [More...] dropdown in the GUI splash page.
I guess it's because of this line?

"plots": [x.replace("/", "") for x in applications if "individual" in x],

@mrocklin (Member, Author)

> I can't find the new widget in the [More...] dropdown in the GUI splash page.
> I guess it's because of this line?

That's also my read of how things worked. I've added this but am still not seeing it locally. I'm not yet sure why.

```diff
@@ -7329,6 +7331,51 @@ async def get_call_stack(self, keys=None):
         response = {w: r for w, r in zip(workers, results) if r}
         return response
 
+    async def benchmark_hardware(self) -> "dict[str, dict[str, float]]":
```
@crusaderky (Collaborator) commented on this diff:

Could you add a docstring and explain the output?
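A sketch of the kind of docstring being requested here (illustrative wording based on the signature above; the text that was eventually merged may differ):

```python
async def benchmark_hardware(self) -> "dict[str, dict[str, float]]":
    """Run a benchmark on worker hardware.

    (Illustrative docstring only; not the wording that was merged.)

    Returns
    -------
    result : dict
        Maps the names "memory", "disk", and "network" to dictionaries
        that map payload sizes (e.g. "1 kiB") to measured bandwidths,
        averaged across the workers in the cluster.
    """
```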

@crusaderky (Collaborator) commented Mar 25, 2022

Outstanding points:

  • add docstring to Scheduler.benchmark_hardware
  • failing test test_scheduler_bokeh.py::test_simple
  • failing linters
  • plot does not appear in [More...] tab

This didn't pass pre-commit checks. I don't understand mypy well enough to diagnose why it's upset here.

```
distributed/dashboard/scheduler.py:133: error: No overload variant of "sorted" matches argument types "object", "Callable[[Any], Any]"
distributed/dashboard/scheduler.py:133: note: Possible overload variants:
distributed/dashboard/scheduler.py:133: note:     def [SupportsRichComparisonT] sorted(Iterable[SupportsRichComparisonT], *, key: None = ..., reverse: bool = ...) -> List[SupportsRichComparisonT]
distributed/dashboard/scheduler.py:133: note:     def [_T] sorted(Iterable[_T], *, key: Callable[[_T], Union[SupportsDunderLT, SupportsDunderGT]], reverse: bool = ...) -> List[_T]
distributed/dashboard/scheduler.py:138: error: "object" has no attribute "insert"
Found 2 errors in 1 file (checked 1 source file)
```
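For context, a minimal reproduction of this mypy behaviour (an illustration, not code from the PR): mypy joins the value types of a heterogeneous dict literal to `object`, so every lookup comes back typed as `object`, which neither `sorted` nor `.insert` accepts.

```python
# Hypothetical names, illustrating the inference problem only:
template_variables = {
    "pages": ["status", "workers"],  # list[str]
    "version": 1,                    # int -> value type widens to object
}

sorted(template_variables["pages"])         # mypy: no overload of "sorted"
template_variables["pages"].insert(0, "x")  # mypy: "object" has no attribute

# Annotating the variable as plain `dict` (i.e. dict[Any, Any]) erases the
# narrow inference, which matches the `template_variables: dict` fix in
# the diff crusaderky posts below:
template_variables_fixed: dict = {
    "pages": ["status", "workers"],
    "version": 1,
}
```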
@mrocklin (Member, Author)

  • add docstring to Scheduler.benchmark_hardware

    Done

  • failing test test_scheduler_bokeh.py::test_simple

    I'm not seeing this

  • failing linters

    I'm not seeing this either, but I'll check CI when it starts up again

  • plot does not appear in [More...] tab

    I found out what was going on here and fixed it. However mypy is unhappy and I'm not sure why.

I'm also happy to just close this PR. I've probably spent about as much time as I can justify here. If we can get this closed out soon then awesome, I'll keep pushing on it. If it's going to take more than, say, 30m of my time though then I'll need to prioritize elsewhere.

@mrocklin (Member, Author)

> failing linters
> I'm not seeing this either, but I'll check CI when it starts up again

Oh, I see: the `duration: str` change that you suggested forced an issue with `parse_timedelta`. I reverted the typing change.
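The likely mechanics (my reading of the conflict; the thread doesn't spell it out): `parse_timedelta` deliberately accepts more than strings, so narrowing the parameter annotation to `str` makes mypy reject the other inputs it supports.

```python
# parse_timedelta lives in dask.utils and accepts strings, numbers, and
# timedeltas alike; all of these are valid:
from datetime import timedelta

from dask.utils import parse_timedelta

parse_timedelta("1 s")                        # -> 1
parse_timedelta(2.5)                          # -> 2.5 (numbers pass through)
parse_timedelta(timedelta(milliseconds=500))  # -> 0.5

# So a signature like `duration: str = "1 s"` is stricter than what
# parse_timedelta supports, and reverting the annotation (as done here)
# keeps non-string durations working.
```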

@mrocklin (Member, Author)

See also dask/dask#8848

@crusaderky (Collaborator)

> • failing test test_scheduler_bokeh.py::test_simple
>   I'm not seeing this

It's visible in the latest CI run:
https://github.com/dask/distributed/runs/5697032918?check_suite_focus=true

```
def check_thread_leak():
E               AssertionError: (<Thread(ThreadPoolExecutor-100_1, started daemon 140416036894464)>, ['  File "distributed/worker.py", line 4622, in benchmark_disk
```

> failing linters
> I found out what was going on here and fixed it. However mypy is unhappy and I'm not sure why.

```diff
diff --git a/distributed/dashboard/scheduler.py b/distributed/dashboard/scheduler.py
index 154aaafb2..a7dc0a053 100644
--- a/distributed/dashboard/scheduler.py
+++ b/distributed/dashboard/scheduler.py
@@ -108,7 +108,7 @@ applications = {
 }
 
 
-template_variables = {
+template_variables: dict = {
     "pages": [
         "status",
         "workers",
diff --git a/distributed/worker.py b/distributed/worker.py
index 5501d50ac..16d6d789a 100644
--- a/distributed/worker.py
+++ b/distributed/worker.py
@@ -54,11 +54,11 @@ from distributed.comm.utils import OFFLOAD_THRESHOLD
 from distributed.compatibility import randbytes
 from distributed.core import (
     CommClosedError,
+    ConnectionPool,
     Status,
     coerce_to_address,
     error_message,
     pingpong,
-    rpc,
     send_recv,
 )
 from distributed.diagnostics import nvml
@@ -4657,7 +4657,7 @@ def benchmark_memory(
 
 async def benchmark_network(
     address: str,
-    rpc: rpc,
+    rpc: ConnectionPool,
     sizes: Iterable[str] = ("1 kiB", "10 kiB", "100 kiB", "1 MiB", "10 MiB", "50 MiB"),
     duration="1 s",
 ) -> dict[str, float]:
```
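Beyond the annotation that silences mypy, the worker hunk also resolves a name collision (my reading of the diff): the `rpc` parameter of `benchmark_network` shadowed the `rpc` class imported from `distributed.core`, making `rpc: rpc` self-referential. Annotating the parameter as `ConnectionPool`, which is what the scheduler's `self.rpc` actually is, keeps the annotation meaningful.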

> I'm also happy to just close this PR

I think we're almost done.

From follow-up commit messages:

Otherwise we let a lingering thread start, which makes some tests unhappy.

There is some chance that the worker might have an executor which doesn't work well with asyncio. That seems rare enough and this feature seems fringe enough that I'm totally willing to take that chance.
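The pattern these commit messages describe is roughly the following (a sketch under that reading, not the exact PR code): run the blocking benchmark on an executor the worker already manages, so no stray thread outlives the test.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def benchmark_disk() -> dict[str, float]:
    # Stand-in for the blocking disk benchmark
    return {"1 kiB": 0.0}


async def run_benchmark(executor: ThreadPoolExecutor) -> dict[str, float]:
    loop = asyncio.get_running_loop()
    # Reusing a managed executor (rather than spawning a fresh thread)
    # avoids the lingering-thread assertion from check_thread_leak.
    return await loop.run_in_executor(executor, benchmark_disk)
```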
@crusaderky (Collaborator) left a review:

LGTM. Thanks!

@crusaderky merged commit 06170d5 into dask:main on Mar 28, 2022
@mrocklin (Member, Author) commented Mar 28, 2022 via email
