
Feature/correct bootstrapping #384

Merged: 9 commits merged into libp2p:master from feature/correct-bootstrapping on Oct 11, 2019

Conversation

@aarshkshah1992 (Contributor) commented Aug 19, 2019

For issue #375 & #387

K-bucket work at libp2p/go-libp2p-kbucket#38

@Stebalien (Member) left a comment:

Thanks!

dht.go Outdated
// reset the timer for the k-bucket we just searched in ONLY if there was no error
// so that we can retry during the next bootstrap
bucket := dht.routingTable.BucketForPeer(id)
bucket.ResetLastQueriedAt(time.Now())
Member

Ah, I'm sorry, I think I got this wrong. I think we have to update buckets based on the key we're querying (and we can do it whether or not the query succeeds, really), not based on whether or not we've contacted a peer in that bucket. Otherwise, we won't fill that bucket, we'll just add a single node.

We'll probably need to do this in several places.

@aarshkshah1992 (Contributor Author) commented Aug 26, 2019

@Stebalien Ah... I see exactly what you mean. Any lookup for a key in a bucket will result in attempts to add peers closer to that key to the bucket and that is essentially a lookup/query on the bucket.

Now, a key for a bucket can be either a peer or content since we combine them into the same keyspace for k-buckets. So, we should refresh a bucket when we lookup either of them.

I've updated the following functions in routing.go to refresh a bucket when we lookup a peer/content in it.

  1. PutValue
  2. SearchValue (This covers GetValue as well)
  3. GetValues
  4. Provide
  5. FindProvidersAsync (This covers FindProviders as well)
  6. FindPeer
  7. FindPeersConnectedToPeer

Let me know what you think.
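
For reference, the shared-keyspace point above as a minimal sketch (assuming the bucket helpers exercised in libp2p/go-libp2p-kbucket#38, i.e. BucketForID and ResetRefreshedAt, with kb being go-libp2p-kbucket):

// Both peer IDs and content keys convert into the same k-bucket keyspace,
// so a lookup for either kind of key can refresh the corresponding bucket.
peerBucket := dht.routingTable.BucketForID(kb.ConvertPeerID(p))   // p is a peer.ID
contentBucket := dht.routingTable.BucketForID(kb.ConvertKey(key)) // key is a record/provider key (string)

peerBucket.ResetRefreshedAt(time.Now())
contentBucket.ResetRefreshedAt(time.Now())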

dht_bootstrap.go Outdated
// Note there is a tradeoff between the bootstrap period and the
// number of queries. We could support a higher period with less
// queries.
// BootstrapConfig specifies parameters used for bootstrapping the DHT.
type BootstrapConfig struct {
Member

cc @hsanjuan, @whyrusleeping, @frrist:

This will be an API breaking change, modifying the BootstrapConfig structure. This change is necessary to bring the DHT's bootstrapping logic in line with the Kademlia paper (properly refreshing each bucket) instead of just making random queries in the dark.

Thoughts? Objections? Questions?

Contributor

Thank you for the ping, no objection from me.

Member

Sounds good to me.

@aarshkshah1992 (Contributor Author) commented Aug 26, 2019

@Stebalien Have made the changes as per your review. Please take a look. Thanks a lot!

@Stebalien (Member) left a comment:

Almost there. Thanks for sticking with this.

routing.go Outdated
@@ -74,6 +74,11 @@ func (dht *IpfsDHT) PutValue(ctx context.Context, key string, value []byte, opts
return err
}

// refresh the k-bucket containing this key
defer func() {
dht.routingTable.BucketForID(kb.ConvertKey(key)).ResetRefreshedAt(time.Now())
Member

Let's just move this into GetClosestPeers. Also, we should probably check if the query succeeded.

Member

(or, if it didn't succeed, that it lasted more than... 30 seconds?)
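
Read as code, the suggested condition would be something like this (a sketch only: runQuery is a hypothetical stand-in for the lookup being wrapped, and the 30-second cut-off is just the value floated above):

start := time.Now()
err := runQuery(ctx, key) // hypothetical stand-in for the actual lookup

// refresh the bucket if the query succeeded, or if it ran long enough that the
// bucket was clearly exercised even though the query ultimately failed
if err == nil || time.Since(start) > 30*time.Second {
	dht.routingTable.BucketForID(kb.ConvertKey(key)).ResetRefreshedAt(time.Now())
}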

routing.go Outdated
@@ -157,6 +163,11 @@ func (dht *IpfsDHT) SearchValue(ctx context.Context, key string, opts ...routing
responsesNeeded = getQuorum(&cfg, -1)
}

// refresh the k-bucket containing this key
defer func() {
dht.routingTable.BucketForID(kb.ConvertKey(key)).ResetRefreshedAt(time.Now())
Member

Ditto on making sure we don't cancel the request.

routing.go Outdated
// refresh the k-bucket containing this key
defer func() {
dht.routingTable.BucketForID(kb.ConvertKey(key.KeyString())).ResetRefreshedAt(time.Now())
}()
Member

Push into GetClosestPeers.

@aarshkshah1992 (Contributor Author) commented Aug 30, 2019

@Stebalien

I've moved the bucket refresh to the 'peripheries' for all the calls:

  1. findProvidersAsyncRoutine (covers FindProviders & FindProvidersAsync)
  2. getValues (covers SearchValue, GetValue & GetValues)
  3. GetClosestPeers (covers PutValue & Provide)
  4. FindPeer & FindPeersConnectedToPeer

Also, we now refresh ONLY when the call is 'successful'.

Please take a look & let me know what you think!

@Stebalien (Member) left a comment:

This looks correct. I'm now going to run this on a machine for a few days to see if it works as expected.

@aarshkshah1992 (Contributor Author)

> This looks correct. I'm now going to run this on a machine for a few days to see if it works as expected.

Thanks a lot. Do let me know if I can help in some way.

@Stebalien (Member)

Take a look at https://github.com/libp2p/go-libp2p-kad-dht/tree/feature/correct-bootstrapping-debug. A couple of things I've noticed so far:

  1. Unfortunately, we tend to try bootstrapping the DHT before connecting to any peers. Instead, we should probably bootstrap once we receive our first connection. My implementation in that branch is probably not the best way to do it but it gives you a rough picture.
  2. We might want to consider querying a bucket if it goes from non-empty to empty. The old logic would just repeatedly query everything every 5 minutes but the new logic can leave us in an un-bootstrapped state for up to an hour.
  3. Our DHT queries are slooooow. For now, we might want to set timeouts.

That should be enough to not introduce regressions. Try playing around with that branch.

@Stebalien (Member)

I think the next step here is #387 (hopefully that issue makes sense).

@aarshkshah1992 (Contributor Author) commented Aug 31, 2019

Based on @Stebalien's comments above & #387, here's what we need to accomplish now:

TODO

  • Get Persisting/seeding a routing table #383 merged so we can use the seeder with known bootstrap peers/Default peers to populate the RT when it's empty (This PR is effectively blocked till Persisting/seeding a routing table #383 goes in).

  • Have a dedicated proc for RT recovery. Use the RT peer removed notifications to identify when RT becomes empty & send requests to the proc.

  • Before running the periodic bootstrap (query for self / buckets that haven't been queried for an hour), use the above proc if the RT is empty.

  • Add a Bootstrap option to the dht for passing bootstrapping configuration when constructing the DHT.

  • Deprecate the current bootstrap methods in favour of a single Bootstrap(ctx, options...) method.

  • Tests for the RT recovery proc & bootstrap queries when the RT is empty, once Persisting/seeding a routing table #383 is merged.

Questions

> Our DHT queries are slooooow. For now, we might want to set timeouts

I'm not sure what you mean here. Our bootstrap query in runBootstrap does have a timeout (defaults to 10 seconds) for a target.

> We might want to consider querying a bucket if it goes from non-empty to empty

Do we need this? The kad paper states that a bucket should usually get queried during the normal flow. In the adverse scenario that it doesn't, we need to step in. Hence the period of 1 hour to wait & watch. Even in the current implementation, though we do query every 5 minutes, we query randomly & aren't really targeting a specific empty bucket. Though, I do see the value in doing it 12 times an hour.

@aarshkshah1992 force-pushed the feature/correct-bootstrapping branch 2 times, most recently from e1b5f75 to 75ece93 on September 1, 2019 15:24
@aarshkshah1992 (Contributor Author) commented Sep 1, 2019

@Stebalien Have finished the TODO tasks that do not depend on 383. Will push @raulk to review 383 & get us there sooner. Please take a look & let me know what you think.

Also, please take a look at the questions. Will add the corresponding tasks to TODO if required.
Thanks !

dht_bootstrap.go Outdated (resolved)
@raulk requested a review from bigs on September 3, 2019 16:48
@Stebalien (Member) commented Sep 3, 2019

> Our DHT queries are slooooow. For now, we might want to set timeouts

> I'm not sure what you mean here. Our bootstrap query in runBootstrap does have a timeout (defaults to 10 seconds) for a target.

Ah. Ok:

  1. We're not applying the timeout on self bootstrap.
  2. I put my logging statements in the wrong places.
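
For item 1, the fix is roughly to bound the self walk the same way the per-bucket queries are bounded (a sketch only; cfg.Timeout is the BootstrapConfig timeout discussed above, and the self walk is written here as a FindPeer on our own ID):

walkCtx, cancel := context.WithTimeout(ctx, cfg.Timeout)
defer cancel()

// the self walk: look up our own peer ID; routing.ErrNotFound is expected
// when searching for ourselves, anything else is worth logging
_, err := dht.FindPeer(walkCtx, dht.self)
if err != nil && err != routing.ErrNotFound {
	logger.Infof("self walk failed: %s", err)
}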

@Stebalien (Member) commented Sep 3, 2019

> We might want to consider querying a bucket if it goes from non-empty to empty

> Do we need this? The kad paper states that a bucket should usually get queried during the normal flow. In the adverse scenario that it doesn't, we need to step in.

You're probably right. However, we do need to handle the case where we completely disconnect from the network. Maybe some kind of threshold?

To back this up: I've been running this patch on a stable peer and it has a pretty good routing table.

@aarshkshah1992 (Contributor Author) commented Sep 4, 2019

@Stebalien

> However, we do need to handle the case where we completely disconnect from the network

I've added the code to recover from an empty RT in the latest commit. Please take a look. The only thing left there is to call the seeder in #383 to add the default bootstrap peers to the RT.

@bigs (Contributor) left a comment:

the bootstrapBucket function and related material looks great. i'm generally quite confused by the "recovery" channel you've built. what purpose does it serve? could it be simplified? it seems like the real work is being done by the two goroutines that occasionally bootstrap each bucket and "self walk".

dht.go Outdated
case <-ctx.Done():
return
case <-req.errorChan:
// TODO Do we need to do anything here ?
Contributor

let's log error here

for {
select {
case req := <-dht.rtRecoveryChan:
if dht.routingTable.Size() == 0 {
Contributor

this check seems redundant, as both processes that send recovery requests down the channel check before sending

edit: i'm actually a bit confused by this code generally. i get the idea of serializing recovery requests, but this will work its way through them in a pretty tight loop, so i could see multiple recovery request being dispatched without error in a row.

Contributor

in fact, it seems like all this process does is serialize checks of the routing table, but does nothing to actually bootstrap it. it's up to the func that actually creates a rtRecoveryReq to do any work, and i see a few instances where this is essentially just an info log.

@aarshkshah1992 (Contributor Author) commented Sep 4, 2019

> this check seems redundant, as both processes that send recovery requests down the channel check before sending

> edit: i'm actually a bit confused by this code generally. i get the idea of serializing recovery requests, but this will work its way through them in a pretty tight loop, so i could see multiple recovery request being dispatched without error in a row.

The check is to ensure that if multiple callers 'simultaneously' observe that the RT has become empty & send a request to the channel, only one request results in the RT being seeded & the rest become no-ops.
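
In code, the proc side of that looks roughly like this (a sketch assembled from the fragments above; seedRT is a hypothetical stand-in for the #383 seeder that still needs to be wired in):

for {
	select {
	case req := <-dht.rtRecoveryChan:
		// re-check inside the serialized proc: only the first request that still
		// sees an empty RT triggers seeding, later duplicates become no-ops
		if dht.routingTable.Size() == 0 {
			logger.Infof("rt recovery proc: received request with reqID=%s, RT is empty. initiating recovery", req.id)
			seedRT(dht) // hypothetical stand-in for the #383 seeder
		}
	case <-ctx.Done():
		return
	}
}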

Contributor Author

> in fact, it seems like all this process does is serialize checks of the routing table, but does nothing to actually bootstrap it. it's up to the func that actually creates a rtRecoveryReq to do any work, and i see a few instances where this is essentially just an info log.

Apologies if this wasn't clear. Yes, we need to add a call to the default seeder implementation in #383 to seed the RT with the 'default bootstrap peers'/'known peers'. There is a TODO for this in the code (line 214) & in the TODO list on this PR.

Contributor

@aarshkshah1992 totally! it really felt like something was missing! definitely missed the comment. with that in mind, the rest are just small comments/docs improvements. generally looking good :)

if dht.routingTable.Size() == 0 {
logger.Infof("rt recovery proc: received request with reqID=%s, RT is empty. initiating recovery", req.id)
// TODO Call Seeder with default bootstrap peers here once #383 is merged
if dht.routingTable.Size() > 0 {
Contributor

unreachable?

@aarshkshah1992 (Contributor Author) commented Sep 4, 2019

Answered above

dht_bootstrap.go Outdated (resolved)
dht_bootstrap.go Outdated
// checks on the config so that callers aren't oblivious.
if cfg.Queries <= 0 {
return fmt.Errorf("invalid number of queries: %d", cfg.Queries)
seedRTIfEmpty := func(tag string) {
Contributor

is this not essentially a no-op? just logging whether or not the table is full

Contributor Author

The recovery proc will seed the RT later as explained here

dht_bootstrap.go Outdated (4 resolved threads)
@aarshkshah1992 (Contributor Author) commented Sep 4, 2019

> the bootstrapBucket function and related material looks great. i'm generally quite confused by the "recovery" channel you've built. what purpose does it serve? could it be simplified? it seems like the real work is being done by the two goroutines that occasionally bootstrap each bucket and "self walk".

The main purpose of the "rtRecovery" channel/proc is to recover from a state where the Routing Table becomes empty. We cannot even 'bootstrap' the buckets/'self walk' from such a state, as they wouldn't know where to start their 'search'/'walk' from. The only way to recover the RT from this state is to use the known 'bootstrap' peers/default peers to seed the RT (#295). The PR in #383 has a 'default seeder' implementation that does this, and the purpose of the recovery proc is to call that seeder. However, I am waiting for 383 to be merged before I can insert that call (hence the item in the TODO list & in the code here).

There are two places where we pass a request to the recovery proc:

  1. If RT is empty when we receive a peer removed notification from the RT
  2. If RT is empty before 'bootstrapping' a bucket/'self walk' as 1 might still not have fired

Let me know what you think.
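
Sketched out, the two send sites look like this (names taken from the diff fragments in this PR; the shape is illustrative rather than the merged code):

// 1. RT "peer removed" notification: the callback runs with the RT lock held,
//    so the size check happens in a goroutine to avoid deadlocking on that lock
rt.PeerRemoved = func(p peer.ID) {
	cmgr.UntagPeer(p, "kbucket")
	go func() {
		if rt.Size() == 0 {
			dht.rtRecoveryChan <- mkRtRecoveryReq()
		}
	}()
}

// 2. before bootstrapping a bucket / doing a self walk, in case 1 hasn't fired yet
seedRTIfEmpty := func(tag string) {
	if dht.routingTable.Size() == 0 {
		logger.Infof("%s: RT is empty, requesting recovery", tag)
		dht.rtRecoveryChan <- mkRtRecoveryReq()
	}
}
seedRTIfEmpty("bootstrap")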

@aarshkshah1992 (Contributor Author) commented Sep 4, 2019

@bigs

Thank you so much for your time. I've answered your questions on the RT recovery mechanism. Please take a look and let me know what you think. The other changes you've suggested look good to me.

@bigs (Contributor) left a comment:

looking good! once that dependent PR(s) lands, i think we're good to go. thanks so much for helping out!

dht.go Outdated
rt.PeerRemoved = func(p peer.ID) {
cmgr.UntagPeer(p, "kbucket")
go func(rtRecoveryChan chan *rtRecoveryReq) {
Member

Let's avoid launching a goroutine every time we drop a peer.

Contributor Author

If we don't, we deadlock on the Routing Table lock, because rt.RemovePeer is waiting for this callback to return and rt.Size() also needs the same lock.

dht.go Outdated
rt.PeerRemoved = func(p peer.ID) {
cmgr.UntagPeer(p, "kbucket")
go func(rtRecoveryChan chan *rtRecoveryReq) {
if rt.Size() == 0 {
req := mkRtRecoveryReq()
Member

Why do we need to send a request object with a UUID?

dht.go Outdated
select {
case <-ctx.Done():
return
case <-req.errorChan:
Member

There's no need to wait here (might as well fire and forget).

dht.go Outdated
rt.PeerRemoved = func(p peer.ID) {
cmgr.UntagPeer(p, "kbucket")
go func(rtRecoveryChan chan *rtRecoveryReq) {
Member

There's a simpler solution:

First, add a channel (protected with a lock) to signal when bootstrapping is done. Tasks that need to wait for peers in the routing table can wait for this channel to be closed.

On peer add:

  1. If the recovery channel is closed, return.
  2. Otherwise, take the lock.
  3. Check 1 again.
  4. Close the channel (we've recovered).
  5. Trigger a bootstrap run.

On peer remove:

  1. Check if the channel is closed. If not, return as we're currently bootstrapping.
  2. Check to see if the table is empty. If not, return as we have peers.
  3. Take the lock, defer a release of the lock.
  4. Re-run steps 1-2 with the lock held.
  5. Replace the channel with a new open channel.
  6. Launch a recovery goroutine.

In the recovery goroutine, keep recovering, sleeping, recovering, etc. until the recovery channel is closed.

Note: this is just one approach. The key parts are:

  1. We launch fewer goroutines.
  2. We stop the recovery process when we've actually recovered (i.e., the routing table has entries).
  3. We do very little work unless we actually have work to do.

My concerns with the current approach are:

  1. We launch a goroutine every time we drop a peer.
  2. We consider the recovery process "done" when it finishes instead of when the routing table gains a peer.
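
A sketch of that approach, with illustrative names (this is a shape, not merged code):

type rtRecoveryState struct {
	mu        sync.Mutex
	recovered chan struct{} // closed once the routing table has entries again
}

func isClosed(ch chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

// on peer add
func (s *rtRecoveryState) peerAdded(triggerBootstrap func()) {
	if isClosed(s.recovered) {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if isClosed(s.recovered) {
		return
	}
	close(s.recovered)    // we've recovered
	go triggerBootstrap() // kick off a bootstrap run
}

// on peer remove
func (s *rtRecoveryState) peerRemoved(rtSize func() int, runRecovery func(done <-chan struct{})) {
	if !isClosed(s.recovered) || rtSize() > 0 {
		return // already recovering, or we still have peers
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if !isClosed(s.recovered) || rtSize() > 0 {
		return
	}
	s.recovered = make(chan struct{}) // re-open: the table is empty again
	// keep recovering/sleeping in the background until the channel is closed
	go runRecovery(s.recovered)
}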

Contributor Author

@Stebalien Have linked this comment in #387. This particular task will now be done as part of that issue.

@aarshkshah1992 force-pushed the feature/correct-bootstrapping branch from e20d579 to f49447d on September 5, 2019 15:30
@aarshkshah1992 (Contributor Author) commented Sep 5, 2019

@Stebalien

Have updated the PR.

  1. Code & test for triggering self & bucket bootstrap if RT size is below a threshold upon connecting to a new peer (same as the one in your patch; see the sketch after this list)
  2. Have addressed the formatting & doc suggestions made by @bigs
  3. Commented out the RT recovery proc & removed its callers. This issue will now be done as a part of Active bootstrapping #387. Have added a TODO to 387 for this

Please let me know if we need any more changes to this before merging. Thanks! :)
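
A rough sketch of the trigger in item 1 (illustrative names for the threshold constant, the hook, and the single-round helper; this is not the merged wiring):

const minRTBootstrapThreshold = 4 // illustrative value

// called when we connect to a new peer
func (dht *IpfsDHT) peerConnected(p peer.ID) {
	dht.Update(dht.Context(), p) // add the peer to the routing table
	if dht.routingTable.Size() <= minRTBootstrapThreshold {
		// the table is nearly empty: run a self walk plus per-bucket bootstrap now
		// rather than waiting for the next periodic round
		go dht.bootstrapOnce(dht.Context()) // hypothetical helper for a single round
	}
}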

@aarshkshah1992 mentioned this pull request on Sep 8, 2019
@aarshkshah1992 (Contributor Author) commented Oct 3, 2019

@Stebalien Why was the routing table able to cross the flooded road?
🥁 🥁 🥁
Because it had a lot of empty buckets ...

Please take a look at this PR when you can :)

@Stebalien (Member) left a comment:

This is beautiful and correct. Thank you so much for your persistence and patience!

aarshkshah1992 and others added 9 commits October 11, 2019 13:13
2) seed RT if empty before starting bootstrap in case 1 hasn't fired
3) pass bootstrap config as option while creating Dht
4) replace all bootstrap functions with 1 function
1) on connecting to a new peer -> trigger self & bucket bootstrap if RT size goes below threshold
2) accept formatting & doc suggestions in the review
3) remove RT recovery code for now -> will address in a separate PR once libp2p#383 goes in

changes as per review
@Stebalien force-pushed the feature/correct-bootstrapping branch from f49447d to 00fffba on October 11, 2019 05:12
@Stebalien merged commit 315504e into libp2p:master on Oct 11, 2019
@aarshkshah1992 (Contributor Author)

@Stebalien Thank you so much for all your help & time :)
I'll get to #387 once #383 is merged.

var wg sync.WaitGroup
errChan := make(chan error)

for bucketID, bucket := range buckets {
Member

We should stagger these lookups, instead of launching them all at once. This can explode at the dialer.

Contributor Author

@raulk Please can you explain this in the context of #383 (comment)?

Member

I might be missing something, because in my head this logic here is not related to #383. I'll explain what I mean anyway. If we have 10 buckets to refresh, this logic performs 10 "find peer" queries, which multiplies into as many as 200 simultaneous dials (worst case scenario), as permitted by the dial queue. I don't think that will scale well.

Member

Pinging @Stebalien ^^

Member

I wouldn't expect 10 concurrent requests to be an issue. The dialer overload issues came from the fact that the gateways are making many concurrent DHT requests per user request. That means 1000s of DHT requests.

We could stagger them when running them in the background but:

  1. I'm not sure if it's really worth it.
  2. We definitely want to fire them all off as fast as possible on start.
  3. We only need to bootstrap when we're not under heavy DHT load. When we're under load, the buckets will all be "fresh" and this code will be a no-op.

Member

10 concurrent requests x 20 max. pending dials in dial queue = potentially 200 outstanding dials, which is larger than the default FD limit we ship with (160 dials). In the worst case scenario, the DHT could max out the dialer.
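
For what it's worth, staggering here would be a small change (a sketch only; the 500ms delay and the bootstrapBucket signature are illustrative, and the PR as merged launches the lookups concurrently, per the reasoning above):

for bucketID, bucket := range buckets {
	bucketID, bucket := bucketID, bucket // capture loop variables for the goroutine
	wg.Add(1)
	go func() {
		defer wg.Done()
		// the existing per-bucket refresh query from this PR; signature illustrative
		if err := dht.bootstrapBucket(ctx, bucketID, bucket); err != nil {
			errChan <- err
		}
	}()

	// stagger the launches so every bucket doesn't hit the dialer at once
	select {
	case <-time.After(500 * time.Millisecond):
	case <-ctx.Done():
	}
}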

}

func (dht *IpfsDHT) BootstrapSelf(ctx context.Context) error {
Member

Are we OK simply dropping methods from the public interface, @Stebalien? We usually frown upon this. Couldn't these methods be turned into shims, and marked Deprecated?

Member

We try to avoid silently breaking APIs or breaking APIs on a whim (e.g., preference, style, etc.). In this case:

  1. This function was recently added.
  2. We completely revamped the bootstrap system anyways and had to make breaking changes to the bootstrap config (Feature/correct bootstrapping #384 (comment)).

On the other hand, you're right. This change wasn't completely necessary and probably should have gone through a deprecation period.
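
For completeness, the suggested shim would look something like this (a sketch only; this is not what was merged, and selfWalk is a hypothetical name for the internal self lookup):

// BootstrapSelf walks the DHT towards our own peer ID once.
//
// Deprecated: use Bootstrap instead; it runs a round immediately and then
// keeps the routing table refreshed periodically.
func (dht *IpfsDHT) BootstrapSelf(ctx context.Context) error {
	return dht.selfWalk(ctx) // hypothetical internal helper
}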

@@ -103,6 +103,9 @@ func (dht *IpfsDHT) GetClosestPeers(ctx context.Context, key string) (<-chan pee
}

if res != nil && res.queriedSet != nil {
// refresh the k-bucket containing this key as the query was successful
dht.routingTable.BucketForID(kb.ConvertKey(key)).ResetRefreshedAt(time.Now())
Member

Does this behave well with the "catch all" bucket?

@aarshkshah1992 (Contributor Author) commented Oct 12, 2019

type BootstrapConfig struct {
BucketPeriod time.Duration // how long to wait for a k-bucket to be queried before doing a random walk on it
Timeout time.Duration // how long to wait for a bootstrap query to run
RoutingTableScanInterval time.Duration // how often to scan the RT for k-buckets that haven't been queried since the given period
Member

Why don't we refresh every BucketPeriod?

Contributor Author

Great question. I cannot think of why. @Stebalien, would you happen to remember why we agreed on a RoutingTableScanInterval of 30 minutes here?

Member

I can't remember. I believe what we really want is for the scan interval to be some fraction of the bucket period. I'd be fine getting rid of this.

Member

Really, we technically don't need a scan interval. We could just take the minimum of all bucket update times and add the bucket period to that. That would have the added benefit of spreading bucket polls out over time instead of batching them together.
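
Something like this, presumably (sketch only; RefreshedAt is the accessor implied by ResetRefreshedAt, and its exact name is illustrative):

// next refresh is due when the least-recently-refreshed bucket ages past BucketPeriod
next := time.Now().Add(cfg.BucketPeriod) // upper bound if every bucket is fresh
for _, b := range dht.routingTable.Buckets {
	if due := b.RefreshedAt().Add(cfg.BucketPeriod); due.Before(next) {
		next = due
	}
}
// sleep until `next`, refresh only the stale buckets, then recompute;
// this spreads bucket polls out over time instead of batching them together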

@hsanjuan (Contributor)

Hello, I just saw that BootstrapOnce is removed. This is breaking for me (it seems I was pinged on whether a change to a struct was problematic but not on whether dropping functionality was).

I already fought against removal of this method last time. It is very difficult to have peers Join a network in full swing without a way to trigger a dht bootstrap right away when that happens. Has an alternative way to do this been added? How can I make a peer start discovering other peers at a certain point without waiting for a bootstrap round to trigger? cc @Stebalien

@aarshkshah1992 (Contributor Author) commented Oct 31, 2019

@hsanjuan A call to Bootstrap will first run a bootstrap round & then schedule it periodically. So, Bootstrap = BootstrapOnce + periodic Bootstrap. Please do let me know if you have any concerns around this.
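
A minimal usage sketch (the Bootstrap(ctx) signature is the one from the routing interface; error handling kept trivial):

// first round runs right away; subsequent rounds are scheduled in the background.
// Call Bootstrap once; see the note below about repeated calls.
if err := dht.Bootstrap(ctx); err != nil {
	logger.Errorf("bootstrap failed: %s", err)
}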

@hsanjuan (Contributor)

@aarshkshah1992 it seems calling Bootstrap() several times will launch additional "periodic bootstrap" goroutines?

In this case my peers have already called Bootstrap(), and they need to trigger a single bootstrap round later.

@aarshkshah1992 (Contributor Author) commented Oct 31, 2019

@hsanjuan Yup, you are right. Apologies for breaking this. Have raised #401 to fix this (small change).

@aarshkshah1992 (Contributor Author)

@Stebalien Please can you also take a look at some of the concerns/questions by @raulk on this PR when you get time?

@hsanjuan (Contributor)

Thanks @aarshkshah1992 , I appreciate it!
