firehose connection loadbalance #4083

Merged
mangas merged 1 commit into master from 3879-firehose-conn-lb
Oct 21, 2022

Conversation

mangas
Contributor

@mangas mangas commented Oct 20, 2022

Fixes #3879

@mangas mangas force-pushed the 3879-firehose-conn-lb branch from 1710c0a to cce4da9 Compare October 20, 2022 12:59
@leoyvens leoyvens marked this pull request as ready for review October 20, 2022 13:12
@mangas mangas force-pushed the 3879-firehose-conn-lb branch from cce4da9 to fef1e62 Compare October 20, 2022 13:35
@mangas mangas requested review from leoyvens and maoueh October 20, 2022 14:40
- volumes:
-   - ./data/postgres:/var/lib/postgresql/data
+ # volumes:
+ #   - ./data/postgres:/var/lib/postgresql/data
Collaborator

Why this change to docker-compose?

@@ -75,8 +73,7 @@ impl FirehoseEndpoint {
// Timeout on each request, so the timeout to estabilish each 'Blocks' stream.
.timeout(Duration::from_secs(120));

// Load balancing on a same endpoint is useful because it creates a connection pool.
- let channel = Channel::balance_list(iter::repeat(endpoint).take(conn_pool_size as usize));
+ let channel = Channel::balance_list(vec![endpoint].into_iter());
Collaborator

We can make this `let channel = endpoint.connect_lazy();` as it used to be.

@@ -267,10 +264,12 @@ impl FirehoseEndpoints {
self.0.len()
}

// selects the FirehoseEndpoint with the lest amount of references, which will help with spliting
Collaborator

Suggested change
- // selects the FirehoseEndpoint with the lest amount of references, which will help with spliting
+ // selects the FirehoseEndpoint with the least amount of references, which will help with splitting

- self.0.iter().choose(&mut rng)
+ self.0
+ .iter()
+ .min_by(|x, y| Arc::strong_count(x).cmp(&Arc::strong_count(y)))
Collaborator

`min_by_key` is nicer.
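To illustrate the suggestion, here is a minimal, std-only sketch of picking the least-referenced `Arc` with `min_by_key`. The `Endpoint` struct and the `least_referenced` function are hypothetical stand-ins for illustration, not the PR's actual `FirehoseEndpoint` or its method.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `FirehoseEndpoint`; only the name matters here.
struct Endpoint {
    name: &'static str,
}

/// Returns the endpoint with the fewest strong references, i.e. the one
/// currently shared by the fewest holders. On ties, the first one wins.
fn least_referenced(endpoints: &[Arc<Endpoint>]) -> Option<&Arc<Endpoint>> {
    endpoints.iter().min_by_key(|e| Arc::strong_count(e))
}

fn main() {
    let endpoints = vec![
        Arc::new(Endpoint { name: "a" }),
        Arc::new(Endpoint { name: "b" }),
    ];

    // Simulate a consumer holding on to endpoint "a".
    let _held = Arc::clone(&endpoints[0]);

    let picked = least_referenced(&endpoints).expect("list is not empty");
    println!("picked {}", picked.name);
}
```

Compared with `min_by` and an explicit `cmp`, `min_by_key` states the ordering key once, so the closure cannot accidentally compare two different properties.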

@@ -267,10 +264,12 @@ impl FirehoseEndpoints {
self.0.len()
}

// selects the FirehoseEndpoint with the lest amount of references, which will help with spliting
// the load naively across the entire list.
pub fn random(&self) -> Option<&Arc<FirehoseEndpoint>> {
Collaborator

One aspect that concerns me is a silent hang if we ever hit the limit of 100 streams per connection. Perhaps a simple way to make the error explicit would be to add a

const SUBGRAPHS_PER_CONN: usize = 100;

and return an error if the ref count reaches this number. Or, even better, auto-scale based on this.
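A rough, std-only sketch of the explicit-error idea follows. `Endpoint` and `pick` are hypothetical names, a plain `String` stands in for the error type, and the sketch assumes each stream holds one clone of the endpoint's `Arc` while the list itself holds one more.

```rust
use std::sync::Arc;

/// Per-connection stream limit, taken from the comment above.
const SUBGRAPHS_PER_CONN: usize = 100;

/// Hypothetical stand-in for `FirehoseEndpoint`.
struct Endpoint;

/// Picks the least-referenced endpoint, or reports saturation explicitly
/// instead of letting a new stream hang silently.
fn pick(endpoints: &[Arc<Endpoint>]) -> Result<&Arc<Endpoint>, String> {
    let endpoint = endpoints
        .iter()
        .min_by_key(|e| Arc::strong_count(e))
        .ok_or_else(|| "no available endpoints".to_string())?;

    // The list holds one reference itself, so a count above the limit means
    // at least SUBGRAPHS_PER_CONN outside holders (assumed: one per stream).
    if Arc::strong_count(endpoint) > SUBGRAPHS_PER_CONN {
        return Err(format!(
            "all connections saturated with {} streams each",
            SUBGRAPHS_PER_CONN
        ));
    }
    Ok(endpoint)
}

fn main() {
    let endpoints = vec![Arc::new(Endpoint)];

    // Simulate SUBGRAPHS_PER_CONN streams, each holding a clone.
    let _streams: Vec<_> = (0..SUBGRAPHS_PER_CONN)
        .map(|_| Arc::clone(&endpoints[0]))
        .collect();

    match pick(&endpoints) {
        Ok(_) => println!("endpoint available"),
        Err(e) => println!("error: {}", e),
    }
}
```

Because the least-referenced endpoint is checked, an error here means every endpoint in the list is at or above the limit.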

Contributor Author

I will address this in a separate PR, as it could benefit from some metrics to detect a stall.

@mangas mangas force-pushed the 3879-firehose-conn-lb branch from fef1e62 to b45207f Compare October 20, 2022 18:17
.min_by_key(|x| Arc::strong_count(x))
.ok_or(anyhow!("no available firehose endpoints"))?;
if Arc::strong_count(endpoint) > SUBGRAPHS_PER_CONN {
return Err(anyhow!("all connections saturated with {} connections, increase the firehose conn_pool_size", SUBGRAPHS_PER_CONN));
Collaborator

Nice work with the actionable error message!

@leoyvens
Collaborator

Oh, and let's bump the default to 20.

@mangas mangas requested a review from leoyvens October 21, 2022 11:50
@mangas mangas force-pushed the 3879-firehose-conn-lb branch from 37ce27d to 03c2911 Compare October 21, 2022 16:11
@mangas mangas force-pushed the 3879-firehose-conn-lb branch from abb00a5 to dc91e76 Compare October 21, 2022 17:45
@mangas mangas merged commit 8559a1e into master Oct 21, 2022
@mangas mangas deleted the 3879-firehose-conn-lb branch October 21, 2022 18:22

Successfully merging this pull request may close these issues.

Firehose connections hang when more than 100 subgraphs are deployed
2 participants