As before ( #64 ), we have a number of things we may want to do right before the v0.9 release
RemoteMixtureOfExperts uses grid_size while the server side uses expert_pattern (e.g. ffn.[0:256].[0:256]); should we switch to expert_pattern everywhere? (@justheuristic) (see the parsing sketch below)
Should we use __slots__ for data structures such as _IntermediateResult?
Should we switch to declaring objects bottom-up vs top-down? For instance, _IntermediateResult is used before it is declared (@mryab)
theguyshetoldyounottoworryabout: should we rename it? (@mryab)
Can we get rid of the return_futures=True option in node.get_*?
rename _IntermediateResult to SearchState, rationale: it is not necessarily a result, it also stores search data other than the result (@justheuristic )
rename LocalStorage to TimedStorage (or similar)? Reason: it is not always used as the node's local storage (@justheuristic)
make alias DHTID.from(x) = DHTID.generate(source=x)? (@justheuristic )
why are we using umsgpack instead of just msgpack? (@justheuristic )
we force the same vector compression rules during the forward & backward pass (see "Vector compressing only for forward pass", #99) (@Vsevolod-pl)
update documentation on hivemind.dht: subkeys, beam_search, no first_k_active
update documentation on hivemind.server: compression
consider setting a maxsize for DHTValueType, e.g. 10k
similar: consider forcing maxsize on any DHT record
rename hivemind.utils.grpc -> hivemind.utils.protobuf ?
tests still raise warning about non-copyable ndarray in (de)serialize_torch_tensor
remove receiver_threads in DHT and DecentralizedAverager, hard-code 1 thread (@justheuristic)
update the grpc version, replace grpc.experimental.aio with grpc.aio (see "[Aio] Graduation from experimental folder", grpc/grpc#23240)
in hivemind.dht.DHT._get(endpoint) we're currently not caching channels at all. Should we?
Investigate segfault in our circleci env (@mryab @justheuristic )
perhaps we should move all heavy class definitions out of __init__.py modules (@mryab)
Allow different tensor compression schemes for the backward pass? (see "Vector compressing only for forward pass", #99)
rewrite test_dht_protocol, test_empty_table and test_dht_node to use await instead of loop.run_until_complete (see the async test sketch below)
what will the server do if it is given an input whose requires_grad does not match the schema?
proper channel caching: we could implement a centralized process-local channel cache (see the sketch below)
since we're using gRPC for all requests, we may be able to share one port for everything
should we replace prints in tests/benchmarks with logger.info? Do we need as many prints in the dht tests? (@mryab, see the comments in "added torch1.7 support, switch to grpc 1.33, grpc bump, improved tests & logging", #116)
currently test_utils is a package containing a single function that is used on just one occasion; should we just move that function to the place where it is used?
should we make our daemons resistant to KeyboardInterrupt when they are running in the background? Problem symptom: you're running in jupyter, you press Ctrl+C, and it kills all DHT daemons. Applies to: DHT, DecentralizedAverager, Server. (see the sketch below)
naming collision: hivemind.utils.grpc actually points to the external package, not hivemind/utils/grpc.py. This is due to wildcard imports (from hivemind.utils.grpc import *). Should we switch away from them?
DHT.get_my_endpoint(): implement it by pinging k random peers
currently we have no limit on the gRPC message size in ConnectionHandler; this is a potential vulnerability for open infrastructure. Should we impose an upper limit and send large tensors in chunks? (see the sketch below)
TimedStorage: add a .pop() method (see the sketch below)
DHT/DecentralizedAverager - rename return_future to something intuitive, e.g. sync=False
MPFuture: make sure await asyncio.create_task(mpfuture) works without extra wrappers (see the sketch below)
In DecentralizedAverager there is a lock_averaged_tensor which is not an asyncio-friendly lock. If we acquire it for long, the averager will be paralyzed! We should either acquire it in an executor or make a special mp+aio lock (see the sketch below).
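Rough sketches for some of the items above follow. Every helper name, signature, and constant that does not appear in the issue text is an assumption for illustration, not the actual hivemind API.

For the grid_size vs. expert_pattern item: one way to unify the two conventions is to derive grid_size from the pattern string on the client side. A minimal sketch, assuming a hypothetical parse_expert_pattern helper:

```python
import re
from typing import List, Tuple

def parse_expert_pattern(pattern: str) -> Tuple[str, List[int]]:
    """Convert a pattern like "ffn.[0:256].[0:256]" into a prefix and grid_size.
    Hypothetical helper; the actual server-side parsing may differ."""
    prefix, *dims = pattern.split('.')
    grid_size = []
    for dim in dims:
        match = re.fullmatch(r'\[(\d+):(\d+)\]', dim)
        if match is None:
            raise ValueError(f"malformed grid dimension: {dim}")
        low, high = map(int, match.groups())
        grid_size.append(high - low)
    return prefix, grid_size

assert parse_expert_pattern("ffn.[0:256].[0:256]") == ("ffn", [256, 256])
```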
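For the test rewrite item: instead of loop.run_until_complete, the tests themselves could become coroutines, e.g. via pytest-asyncio (assumed to be an acceptable dev dependency). The body below is a placeholder, not the real test:

```python
import asyncio
import pytest

# before: loop = asyncio.get_event_loop(); loop.run_until_complete(some_coroutine())
# after: the test function is itself a coroutine and simply awaits

@pytest.mark.asyncio
async def test_dht_node_placeholder():
    await asyncio.sleep(0)  # stand-in for the actual DHT protocol calls
```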
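For the channel caching item: a centralized process-local cache could be as small as a lock-guarded dict keyed by the target address. A sketch with illustrative names; a real version would also handle eviction and channel shutdown:

```python
import threading
import grpc

_channel_cache: dict = {}
_cache_lock = threading.Lock()

def get_cached_channel(target: str) -> grpc.Channel:
    """Return a cached insecure channel to `target`, creating it on first use."""
    with _cache_lock:
        if target not in _channel_cache:
            _channel_cache[target] = grpc.insecure_channel(target)
        return _channel_cache[target]
```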
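For the KeyboardInterrupt item: one option is to have the background process ignore SIGINT, so that Ctrl+C in jupyter only interrupts the foreground cell while the DHT/DecentralizedAverager/Server processes keep running until they are shut down explicitly. A sketch with an illustrative class, not the actual daemon code:

```python
import multiprocessing as mp
import signal

class BackgroundDaemon(mp.Process):
    """Illustrative background process that survives Ctrl+C in the parent."""

    def run(self):
        # the parent (e.g. a jupyter kernel) still receives KeyboardInterrupt;
        # the daemon ignores it and keeps serving until terminated explicitly
        signal.signal(signal.SIGINT, signal.SIG_IGN)
        self._serve_forever()

    def _serve_forever(self):
        ...  # the event loop / gRPC server would run here
```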
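For the gRPC message size item: grpc channels and servers accept standard options that cap inbound/outbound message sizes (requires a grpc release where grpc.aio is available, per the upgrade item above). The limit below is an arbitrary placeholder, and chunked tensor transfer would still be a separate change:

```python
import grpc

MAX_MESSAGE_SIZE = 64 * 1024 * 1024  # placeholder 64 MiB cap, not a tuned value

GRPC_OPTIONS = [
    ("grpc.max_receive_message_length", MAX_MESSAGE_SIZE),
    ("grpc.max_send_message_length", MAX_MESSAGE_SIZE),
]

async def make_limited_server() -> grpc.aio.Server:
    # the same options can be passed to grpc.aio.insecure_channel(...) on the client side
    return grpc.aio.server(options=GRPC_OPTIONS)
```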
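For the TimedStorage item: .pop() could mirror dict.pop, removing the key and returning its entry (or a default if the key is missing or expired). The sketch below uses a plain dict as a stand-in rather than TimedStorage's actual internals:

```python
import time

class SimplifiedTimedStorage:
    """Simplified stand-in for TimedStorage: each key maps to (value, expiration_time)."""

    def __init__(self):
        self._data = {}

    def store(self, key, value, expiration_time: float):
        self._data[key] = (value, expiration_time)

    def pop(self, key, default=None):
        """Remove the key and return (value, expiration_time), or `default` if absent/expired."""
        entry = self._data.pop(key, None)
        if entry is None or entry[1] < time.monotonic():
            return default
        return entry
```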
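For the MPFuture item: one way to make a future awaitable without extra wrappers is to give it an __await__ that bridges to asyncio. This sketch is built on a plain concurrent.futures.Future and only illustrates the asyncio side, not the cross-process machinery of MPFuture:

```python
import asyncio
from concurrent.futures import Future

class AwaitableFuture(Future):
    """concurrent.futures.Future that can be awaited directly inside a coroutine."""

    def __await__(self):
        # wrap_future registers a done-callback, so the event loop is never blocked
        return asyncio.wrap_future(self).__await__()

async def demo():
    future = AwaitableFuture()
    asyncio.get_running_loop().call_later(0.1, future.set_result, 42)
    assert await future == 42

asyncio.run(demo())
```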
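For the lock_averaged_tensor item: as a stopgap, the multiprocessing lock can be acquired in a thread executor so the event loop stays responsive; a dedicated mp+aio lock would be the cleaner long-term fix. A minimal sketch of the executor approach:

```python
import asyncio
import multiprocessing as mp
from contextlib import asynccontextmanager

@asynccontextmanager
async def acquire_in_executor(lock):
    """Acquire a multiprocessing lock without blocking the asyncio event loop."""
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, lock.acquire)  # blocks a worker thread, not the loop
    try:
        yield
    finally:
        lock.release()

async def demo():
    lock_averaged_tensor = mp.Lock()
    async with acquire_in_executor(lock_averaged_tensor):
        ...  # read or update the averaged tensors here

asyncio.run(demo())
```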