Process-wide channel cache for gRPC+aio #120

justheuristic · 2020-11-17T01:24:06Z

process-wide grpc channel/stub caching
- implemented hivemind.utils.grpc.ChannelCache
- extracted TimedStorage from hivemind.dht.storage to hivemind.utils.timed_storage
- use in hivemind/client/expert.py
  - verify no performance degradation
- use in hivemind/dht/protocol.py
  - verify no performance degradation (upd: got faster)
- use in hivemind/client/allreduce.py
  - verify no performance degradation
- dht.get_endpoint() to discover this peer's current public endpoint from peers
- implement a basic synchronization check for get_dht_time()
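For readers unfamiliar with TimedStorage: it is a key-value store whose entries carry an expiration time. The sketch below illustrates the idea, assuming a min-heap of expirations and a caller-supplied clock. The class `SimpleTimedStorage` and its methods are hypothetical and are not the actual hivemind.utils.timed_storage API.

```python
import heapq
import itertools
import time
from typing import Callable, Dict, Generic, List, Optional, Tuple, TypeVar

KeyType = TypeVar("KeyType")
ValueType = TypeVar("ValueType")


class SimpleTimedStorage(Generic[KeyType, ValueType]):
    """Hypothetical sketch: a key-value store whose entries expire at a given time."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self.data: Dict[KeyType, Tuple[ValueType, float]] = {}
        # min-heap of (expiration_time, tiebreaker, key); may hold stale entries for overwritten keys
        self.expiration_heap: List[Tuple[float, int, KeyType]] = []
        self._counter = itertools.count()

    def store(self, key: KeyType, value: ValueType, expiration_time: float) -> bool:
        """Store value until expiration_time; refuse values that are already expired."""
        if expiration_time < self.clock():
            return False
        self.data[key] = (value, expiration_time)
        heapq.heappush(self.expiration_heap, (expiration_time, next(self._counter), key))
        return True

    def _remove_expired(self) -> None:
        now = self.clock()
        while self.expiration_heap and self.expiration_heap[0][0] < now:
            expiration_time, _, key = heapq.heappop(self.expiration_heap)
            # skip stale heap entries whose key was overwritten with a different expiration
            if key in self.data and self.data[key][1] == expiration_time:
                del self.data[key]

    def get(self, key: KeyType) -> Optional[Tuple[ValueType, float]]:
        """Return (value, expiration_time) if the key is present and not expired, else None."""
        self._remove_expired()
        return self.data.get(key)

    def __len__(self) -> int:
        self._remove_expired()
        return len(self.data)
```

Expired entries are dropped lazily on each read. This keeps the sketch free of background threads.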

process-wide channel caching

Currently, each time we query a remote peer with gRPC, we have to create a new channel.
In contrast, gRPC best practices recommend reusing channels across multiple RPC calls.

hivemind/client/expert.py introduced channel caching, but it had the side effect of keeping channels open forever.

In this PR we implement a process-wide ChannelCache object that keeps track of open channels. The code is largely inspired by https://github.com/grpc/grpc/blob/master/src/python/grpcio/grpc/_simple_stubs.py , but with the added support for grpc.aio channels.
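The general technique works as follows. One cache is shared by the whole process, and channels are keyed by target and by sync/aio flavor. Entries are evicted when the cache grows past a maximum size or when a channel has been idle longer than an eviction period. The sketch below makes several assumptions. `ChannelCacheSketch`, `get_channel`, and `on_evict` are hypothetical names, not the PR's ChannelCache API. The channel factory is caller-supplied; in real code it could wrap `grpc.insecure_channel` or `grpc.aio.insecure_channel`. A real key would likely also include channel options.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class ChannelCacheSketch:
    """Hypothetical sketch of a process-wide channel cache with LRU and idle-time eviction.

    Evicted channels are handed to `on_evict`, which is where real code would close them.
    """

    _singleton: Optional["ChannelCacheSketch"] = None
    _singleton_lock = threading.Lock()

    def __init__(self, maxsize: int = 4096, eviction_period: float = 600.0,
                 clock: Callable[[], float] = time.monotonic,
                 on_evict: Callable[[Any], None] = lambda channel: None):
        self.maxsize, self.eviction_period = maxsize, eviction_period
        self.clock, self.on_evict = clock, on_evict
        self._lock = threading.Lock()
        # (target, aio) -> (channel, last_access_time), ordered from least to most recently used
        self._entries = OrderedDict()

    @classmethod
    def get_singleton(cls) -> "ChannelCacheSketch":
        """Return the one cache shared by the whole process, creating it on first use."""
        with cls._singleton_lock:
            if cls._singleton is None:
                cls._singleton = cls()
            return cls._singleton

    def get_channel(self, target: str, aio: bool, factory: Callable[[str], Any]) -> Any:
        key = (target, aio)
        with self._lock:
            self._evict_idle()
            if key in self._entries:
                channel, _ = self._entries.pop(key)
            else:
                channel = factory(target)
            self._entries[key] = (channel, self.clock())  # (re)insert as most recently used
            while len(self._entries) > self.maxsize:
                _, (oldest_channel, _) = self._entries.popitem(last=False)
                self.on_evict(oldest_channel)
            return channel

    def _evict_idle(self) -> None:
        # entries are kept in access order, so all idle ones sit at the front
        now = self.clock()
        while self._entries:
            key, (channel, last_access) = next(iter(self._entries.items()))
            if now - last_access < self.eviction_period:
                break
            del self._entries[key]
            self.on_evict(channel)
```

Judging by its `_eviction_thread` attribute, the PR evicts in a background thread. Evicting lazily on each lookup, as above, is a simplification that avoids that thread, at the cost of leaving idle channels open until the next call.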

justheuristic · 2020-11-17T16:15:23Z

upd - from discussion with @mryab -

we decided to postpone these two to #98, as they are not essential for the ICML experiments

hivemind/utils/timed_storage.py

mryab · 2020-11-17T17:11:13Z

hivemind/client/allreduce.py

@@ -150,9 +150,7 @@ async def accumulate(self, source: Endpoint, part: torch.Tensor) -> torch.Tensor
        return await self.averaged_part

    def _get(self, peer: Endpoint) -> averaging_pb2_grpc.DecentralizedAveragingStub:


I believe _get is slightly overloaded here; how about _get_peer_stub?

It's mostly a convention shared with DHTProtocol._get; should we rename both?

mryab · 2020-11-17T17:18:07Z

hivemind/utils/grpc.py

+from __future__ import annotations
+import os
+import threading
+from typing import NamedTuple, Sequence, Tuple, Optional, Union, Any, Dict, TypeVar, Type


If we're using from __future__ import annotations, do we need to import Tuple and Dict from typing?

I agree, but I would request to keep it as is AND write a TODO for #98, ok?
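Some background on the reviewer's question. With `from __future__ import annotations` (PEP 563), annotations are stored as strings and are not evaluated at definition time. Built-in generics such as `tuple[...]` and `dict[...]` can therefore appear in annotations even on Python 3.7 and 3.8. The function below is a hypothetical illustration. Anything that evaluates the hints at runtime, such as `typing.get_type_hints`, would still fail on Python versions before 3.9.

```python
from __future__ import annotations


def count_by_peer(endpoints: tuple[str, ...]) -> dict[str, int]:
    """Hypothetical helper: count how many times each endpoint occurs."""
    counts: dict[str, int] = {}
    for endpoint in endpoints:
        counts[endpoint] = counts.get(endpoint, 0) + 1
    return counts
```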


mryab · 2020-11-17T17:35:15Z

hivemind/utils/grpc.py

+    MAXIMUM_CHANNELS = int(os.environ.get("GRPC_PYTHON_MANAGED_CHANNEL_MAXIMUM", 4096))
+    EVICTION_PERIOD_SECONDS = float(os.environ.get("GRPC_PYTHON_MANAGED_CHANNEL_EVICTION_SECONDS", 10 * 60))
+    logger.debug(f"Eviction period = {EVICTION_PERIOD_SECONDS}s, max channels = {MAXIMUM_CHANNELS}")


These constants should be module-level.

(fixed, thanks!)

upd - it complicates testing, added to #98

mryab · 2020-11-17T17:40:39Z

hivemind/utils/grpc.py

+    _eviction_thread: threading.Thread
+    _nearest_expiration_time: DHTExpiration
+    _is_active: bool


These are instance attributes and are declared in __init__.
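The reviewer's point can be shown concretely. A bare annotation in the class body such as `_is_active: bool` only records a type hint; it does not create a class attribute. The value exists only after `__init__` assigns it. In the minimal sketch below, the class name is hypothetical and `DHTExpiration` is replaced by `float` as an assumption.

```python
import threading
import time


class EvictingCacheSketch:
    """Hypothetical class showing instance attributes assigned in __init__."""

    _is_active: bool  # annotation only: no value is stored on the class

    def __init__(self):
        self._is_active = False
        self._nearest_expiration_time: float = float("inf")  # nothing is scheduled to expire yet
        self._eviction_thread = threading.Thread(target=self._evict_loop, daemon=True)

    def _evict_loop(self) -> None:
        while self._is_active:
            time.sleep(0.1)  # placeholder for the actual eviction logic
```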

Co-authored-by: Max Ryabinin <[email protected]>


justheuristic force-pushed the dht_update_nov branch 2 times, most recently from e7ca5e9 to e9e47d1 November 17, 2020 16:13

justheuristic requested a review from mryab November 17, 2020 16:16

justheuristic marked this pull request as ready for review November 17, 2020 16:16

justheuristic changed the title ~~miscellaneous DHT updates~~ Process-wide channel cache for gRPC+aio Nov 17, 2020

justheuristic and others added 23 commits November 17, 2020 19:21

a harsher maxsize test

f1ac8c1

use rtol/atol

623079b

add protobuf dependency

60769c5

WIP

25a77fd

extract TimedStorage to utils

8d9f707

unused imports

9f51d85

WIP do not start eviction thread

de4334c

clear event

f9a54e7

better event

03dac7c

better event

1bfafe1

better event

12df59d

better event

d83e5c7

4096

5165696

reorder

f87a3c4

better typing

71ff54e

terminate early

ea8733c

lock init

5d7c46f

clarify method name

69376f5

use channel cache in averager

43fe041

super typo

b1164aa

fix size

4e2ddb6

add test for ChannelCache

87efecd

bump version

f6b484e

justheuristic force-pushed the dht_update_nov branch from 36efba0 to f6b484e November 17, 2020 16:29

justheuristic and others added 3 commits November 17, 2020 20:01

remove rtol

749e71b

review

1931881

rm rtol

3b862de

mryab requested changes Nov 17, 2020


justheuristic and others added 7 commits November 17, 2020 20:47

Update hivemind/utils/grpc.py

ea7b603

Co-authored-by: Max Ryabinin <[email protected]>

Update hivemind/utils/grpc.py

1771a37

Co-authored-by: Max Ryabinin <[email protected]>

_get_*_stub

c7b1020

Merge branch 'dht_update_nov' of github.com:learning-at-home/hivemind into dht_update_nov

90c9c5e

make global variables actually global

9eb803a

address review by mryab@

6036139

rollback

9ccc9dc

mryab approved these changes Nov 17, 2020


trigger rebuild

ed5e346

justheuristic merged commit 1754792 into master Nov 17, 2020

justheuristic deleted the dht_update_nov branch November 17, 2020 21:40
