DHT Benchmark with asynchronous w/r #406

Merged
merged 20 commits into from
Nov 30, 2021
Conversation

MuXauJl11110
Contributor

Based on #350

@justheuristic justheuristic self-requested a review November 7, 2021 19:14
@justheuristic
Member

justheuristic commented Nov 7, 2021

Hi!
Please refer to contributing.md#code-style for code style and how to convert the code automatically.

return value, expiration


async def corouting_task(
Member

@justheuristic justheuristic Nov 7, 2021

Consider renaming this into something more informative, e.g. store_and_get_task

@codecov

codecov bot commented Nov 7, 2021

Codecov Report

Merging #406 (6bdb734) into master (5d31c3b) will decrease coverage by 0.05%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master     #406      +/-   ##
==========================================
- Coverage   83.52%   83.46%   -0.06%     
==========================================
  Files          77       77              
  Lines        7783     7785       +2     
==========================================
- Hits         6501     6498       -3     
- Misses       1282     1287       +5     
Impacted Files Coverage Δ
hivemind/utils/mpfuture.py 94.09% <66.66%> (-1.33%) ⬇️
hivemind/optim/experimental/progress_tracker.py 98.28% <0.00%> (-1.15%) ⬇️

return store_ok


async def get_task(peer, key):
Member

@justheuristic justheuristic Nov 7, 2021

This function is essentially a single expression that is used only once;
it would be best to inline it into the main coroutine so that one can read it top-to-bottom without looking up auxiliary functions.
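The inlining suggested above might look like this sketch; `FakePeer` and the surrounding names are illustrative stand-ins, not the hivemind API:

```python
import asyncio

class FakePeer:
    """Hypothetical stand-in for a DHT node (illustrative only)."""
    def __init__(self, value):
        self.value = value

    async def get(self, key, latest=False):
        return self.value

# Before: a single-expression helper that is used exactly once
async def get_task(peer, key):
    return await peer.get(key, latest=True)

# After inlining, the main coroutine reads top-to-bottom with no lookups:
async def run_gets(peers, key):
    return await asyncio.gather(*(peer.get(key, latest=True) for peer in peers))

print(asyncio.run(run_gets([FakePeer(1), FakePeer(2)], "some_key")))  # [1, 2]
```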

@justheuristic
Member

justheuristic commented Nov 7, 2021

Let's create a task that runs this benchmark (and the two remaining ones) on pull-request using the same python version as codecov_in_develop_mode

More on other benchmarks:
https://learning-at-home.readthedocs.io/en/latest/user/benchmarks.html

python benchmark_tensor_compression.py
python benchmark_throughput.py --preset minimalistic
[enter whichever command runs your benchmark here]

For instance, you can create a separate job in run_tests.yml

Rationale:

  • why CI: we often break benchmarks, introducing CI will ensure that all benchmarks work
    • for example, benchmark_averaging is currently broken, probably my fault
  • why separate CI job: we could add this to tests, but it would extend the (already long) test runtime
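A separate job along these lines could be added to run_tests.yml. This is a minimal sketch assuming a GitHub Actions workflow; the job name, Python version, and install step are illustrative, not the merged configuration:

```yaml
# Hypothetical extra job for .github/workflows/run_tests.yml (illustrative)
benchmark:
  runs-on: ubuntu-latest
  timeout-minutes: 15
  steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
      with:
        python-version: 3.8   # keep in sync with codecov_in_develop_mode
    - run: pip install -e .
    - name: Run benchmarks
      run: |
        cd benchmarks
        python benchmark_tensor_compression.py
        python benchmark_throughput.py --preset minimalistic
        python benchmark_dht.py
```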

@justheuristic
Member

justheuristic commented Nov 7, 2021

Before merge

If you still have time after that, let's implement the failure rate as described in #350

logger.info(f"Sampled {len(expert_uids)} unique ids (after deduplication)")
random.shuffle(expert_uids)
task_list = [
loop.create_task(
Member

@justheuristic justheuristic Nov 7, 2021

consider using asyncio.run with asyncio.create_task

optionally, make the whole benchmark async and asyncio.run it from main
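The suggested pattern could be sketched as follows; `store_and_get_task` here is a trivial placeholder, not the benchmark's real workload:

```python
import asyncio

async def store_and_get_task(i):
    # placeholder for one store+get round against the DHT
    await asyncio.sleep(0)
    return i * 2

async def run_benchmark(num_tasks):
    # asyncio.create_task schedules coroutines on the already-running loop,
    # so no manual get_event_loop()/run_until_complete() bookkeeping is needed
    tasks = [asyncio.create_task(store_and_get_task(i)) for i in range(num_tasks)]
    return await asyncio.gather(*tasks)

print(asyncio.run(run_benchmark(4)))  # [0, 2, 4, 6]
```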

logger.warning(
"keys expired midway during get requests. If that isn't desired, increase expiration_time param"
)
loop.run_until_complete(asyncio.wait(task_list))
Member

Suggested change
loop.run_until_complete(asyncio.wait(task_list))
loop.run_until_complete(asyncio.gather(*task_list))
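A minimal illustration of why `gather` is usually preferable to `wait` here: results come back in input order and exceptions propagate immediately. The `fetch` coroutine is a made-up stand-in:

```python
import asyncio

async def fetch(i):
    if i < 0:
        raise ValueError("simulated failure")
    return i

async def main():
    # asyncio.wait returns (done, pending) sets of futures in arbitrary
    # order, and exceptions stay hidden until each .result() call (it also
    # no longer accepts bare coroutines in recent Python versions).
    # asyncio.gather returns results in input order and re-raises the
    # first exception automatically.
    ordered = await asyncio.gather(fetch(3), fetch(1), fetch(2))
    try:
        await asyncio.gather(fetch(1), fetch(-1))
    except ValueError:
        failure_seen = True
    return ordered, failure_seen

print(asyncio.run(main()))  # ([3, 1, 2], True)
```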

on: [ push ]
on: [ push, pull_request ]
Member

I'm not sure if these changes are necessary for this PR; if possible, I'd keep the diff as short as possible

Member

@justheuristic justheuristic Nov 8, 2021

They are, the required tests do not run in a forked PR w/o this change

@borzunov borzunov self-requested a review November 8, 2021 13:47
args = vars(parser.parse_args())
parser.add_argument("--expiration", type=float, default=300, required=False)
parser.add_argument("--latest", type=bool, default=True, required=False)
parser.add_argument("--failure_rate", type=float, default=0.1, required=False)
Member

Let's keep the option to increase the file limit, for the sake of benchmarking with a very large number of peers.


store_start = time.perf_counter()
store_peers = random.sample(peers, min(num_store_peers, len(peers)))
store_tasks = [store_task(peer, key, value, expiration) for peer in store_peers]
Member

Suggested change
store_tasks = [store_task(peer, key, value, expiration) for peer in store_peers]
subkeys = [uuid.uuid4().hex for peer in store_peers]
subkeys = [uuid.uuid4().hex for peer in store_peers]
store_tasks = [
    peer.store(key, value, get_dht_time() + expiration, subkey=subkey, return_future=True)
    for peer, subkey in zip(store_peers, subkeys)
]

To the best of my knowledge, this coro is only used once. I would recommend either of:

  • inlining it: see suggestion above
  • or formatting it: add docstring and type hints
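The subkey pattern from the suggestion above can be shown self-contained, using a hypothetical `FakePeer` in place of a real DHT node (the real hivemind `store` also takes an absolute expiration time and a `return_future` flag):

```python
import asyncio
import uuid

class FakePeer:
    """Hypothetical stand-in for a DHT peer (illustrative only)."""
    def __init__(self):
        self.storage = {}

    async def store(self, key, value, expiration_time, subkey=None):
        self.storage[(key, subkey)] = (value, expiration_time)
        return True

async def store_with_subkeys(peers, key, value, expiration_time):
    # one unique subkey per writer, so concurrent stores under the same
    # key do not overwrite each other
    subkeys = [uuid.uuid4().hex for _ in peers]
    tasks = [peer.store(key, value, expiration_time, subkey=sk)
             for peer, sk in zip(peers, subkeys)]
    return await asyncio.gather(*tasks)

peers = [FakePeer() for _ in range(3)]
print(asyncio.run(store_with_subkeys(peers, "expert.0", b"v", 1234.5)))  # [True, True, True]
```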

expiration: float,
latest: bool,
failure_rate: float,
):
Member

Suggested change
):
) -> Tuple[int, int, int, int]:
"""Iteratively choose random peers to store data onto the DHT, then retrieve it with another random subset of peers"""

get_start = time.perf_counter()
get_peers = random.sample(peers, min(num_get_peers, len(peers)))
get_tasks = [peer.get(key, latest, return_future=True) for peer in get_peers]
get_result, _ = await asyncio.wait(get_tasks, return_when=asyncio.ALL_COMPLETED)
Member

Suggested change
get_result, _ = await asyncio.wait(get_tasks, return_when=asyncio.ALL_COMPLETED)
get_result = await asyncio.gather(*get_tasks)

cd benchmarks
python benchmark_throughput.py
python benchmark_tensor_compression.py
python benchmark_dht.py
Member

Suggested change
python benchmark_dht.py
python benchmark_dht.py

[add a trailing newline]

- name: Benchmark
run: |
cd benchmarks
python benchmark_throughput.py
Member

please choose presets that fit into the time limit, e.g. --preset minimalistic

@justheuristic justheuristic marked this pull request as ready for review November 22, 2021 12:30
python benchmark_throughput.py --preset minimalistic
python benchmark_tensor_compression.py
python benchmark_dht.py

Member

[add a trailing newline]

@justheuristic justheuristic merged commit a960438 into learning-at-home:master Nov 30, 2021
@justheuristic justheuristic mentioned this pull request Dec 15, 2021