1.0.0: hivemind.Optimizer, improved averaging stability, better logging
What's Changed
- Fix averager speed for TCP connections by @borzunov in #373
- Fix "Too many open files" and load state freezing by @justheuristic in #371
- Prefetch while reading rpc_aggregate_part() outputs by @borzunov in #370
- Use ModeClient in libp2p DHT in case of --client_mode by @borzunov in #374
- Integrate p2pd logs and outputs into hivemind logging by @borzunov in #375
- Split compression strategies into separate classes by @justheuristic in #366
- Implement colored logs by @borzunov in #377
- Parametrize max message size for persistent connections by @deniskamazur in #376
- Make log handlers configurable, shorten entries by @borzunov in #378
- Enable log handler in benchmarks and run_server by @borzunov in #380
- Fix step_tolerance in CollaborativeOptimizer by @justheuristic in #383
- Fix pickle vulnerability by @deniskamazur in #386
- Remove arguments with default values from example instructions by @borzunov in #388
- Implement weight as part of the allreduce protocol, not matchmaking by @justheuristic in #384
- Support different AMP & buffer configurations in one experiment, fix minor bugs by @justheuristic in #389
- Fix codecov_in_develop_mode with pip>=21.2 by @justheuristic in #393
- Fix minor issues in documentation by @borzunov in #392
- Apply averager updates asynchronously by @justheuristic in #395
- Fix schema typing by @justheuristic in #396
- Backport PerformanceEMA from server_side_averaging by @justheuristic in #397
- Add an option to pre-schedule averaging by @justheuristic in #398
- Move DHT to dht/dht.py, update DHT figure by @justheuristic in #399
- Hotfix: replace StepControl.can_modify with began_allreduce by @justheuristic in #402
- Move PerformanceEMA to utils, TrainingAverager to optim, update utils by @justheuristic in #405
- Add GradientAverager with support for delayed averaging by @justheuristic in #404
- [hivemind.Optimizer] TrainingStateAverager by @justheuristic in #407
- Catch OSError in MPFuture by @artek0chumak in #409
- [hivemind.Optimizer] ProgressTracker by @justheuristic in #408
- Fix minor bugs in GradientAverager by @justheuristic in #410
- Make target group size optional by @justheuristic in #412
- Prepare GradScaler for hivemind.Optimizer by @justheuristic in #413
- Patch recursive cancel in StepControl by @justheuristic in #411
- Replace the invalid link to discord by @artek0chumak in #414
- Implement state sharing priority by @justheuristic in #415
- Implement core functionality of hivemind.Optimizer by @justheuristic in #403
- DHT benchmark with asynchronous writes/reads by @MuXauJl11110 in #406
- Hotfix: load_state_from_peers with offload_optimizer by @justheuristic in #417
- Improve Optimizer docs, update quickstart to use Optimizer by @justheuristic in #416
- Quickstart: typos and references by @justheuristic in #420
- Remove trailing dots in log messages and errors by @borzunov in #419
- Do not log caller for INFO messages by @borzunov in #418
- Improve hivemind.optim.experimental and averager stability by @borzunov in #421
- Add minor tweaks learned from the NeurIPS demo run by @justheuristic in #422
- Improve All-Reduce fault-tolerance by @justheuristic in #423
- Fix All-Reduce fault-tolerance: catch Exception instead of BaseException by @justheuristic in #424
- Fix "Task was destroyed but it is pending" (put items) by @justheuristic in #427
- Use hivemind.Optimizer in examples/albert by @mryab in #426
New Contributors
- @artek0chumak made their first contribution in #409
- @MuXauJl11110 made their first contribution in #406
Full Changelog: 0.10.0...1.0.0