Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry DLL queries upon group 0 concurrent modification error #1197

Merged
merged 1 commit into from
Feb 4, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 47 additions & 1 deletion scylla/src/utils/test_utils.rs
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
use scylla_cql::frame::response::error::DbError;
use tracing::{error, warn};

use crate::client::caching_session::CachingSession;
use crate::client::execution_profile::ExecutionProfile;
use crate::client::session::Session;
use crate::client::session_builder::{GenericSessionBuilder, SessionBuilderKind};
use crate::cluster::ClusterState;
use crate::cluster::NodeRef;
use crate::errors::ExecutionError;
use crate::errors::{ExecutionError, RequestAttemptError};
use crate::network::Connection;
use crate::policies::load_balancing::{FallbackPlan, LoadBalancingPolicy, RoutingInfo};
use crate::policies::retry::{RequestInfo, RetryDecision, RetryPolicy, RetrySession};
use crate::query::Query;
use crate::routing::Shard;
use std::sync::Arc;
Expand Down Expand Up @@ -142,12 +146,54 @@ impl LoadBalancingPolicy for SchemaQueriesLBP {
}
}

#[derive(Debug, Default)]
struct SchemaQueriesRetrySession {
count: usize,
}

impl RetrySession for SchemaQueriesRetrySession {
fn decide_should_retry(&mut self, request_info: RequestInfo) -> RetryDecision {
match request_info.error {
RequestAttemptError::DbError(DbError::ServerError, s)
if s == "Failed to apply group 0 change due to concurrent modification" =>
{
self.count += 1;
// Give up if there are many failures.
// In this case we really should do something about it in the
// core, because it is absurd for DDL queries to fail this often.
if self.count >= 10 {
error!("Received TENTH(!) group 0 concurrent modification error during DDL. Please fix Scylla Core.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please file a bug on Scylla Core (and attach logs!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that ~2 months ago: scylladb/scylladb#21779

RetryDecision::DontRetry
} else {
warn!("Received group 0 concurrent modification error during DDL. Performing retry #{}.", self.count);
RetryDecision::RetrySameNode(None)
}
}
_ => RetryDecision::DontRetry,
}
}

fn reset(&mut self) {
*self = Default::default()
}
}

#[derive(Debug)]
struct SchemaQueriesRetryPolicy;

impl RetryPolicy for SchemaQueriesRetryPolicy {
fn new_session(&self) -> Box<dyn RetrySession> {
Box::new(SchemaQueriesRetrySession::default())
}
}

fn apply_ddl_lbp(query: &mut Query) {
let policy = query
.get_execution_profile_handle()
.map(|profile| profile.pointee_to_builder())
.unwrap_or(ExecutionProfile::builder())
.load_balancing_policy(Arc::new(SchemaQueriesLBP))
.retry_policy(Arc::new(SchemaQueriesRetryPolicy))
.build();
query.set_execution_profile_handle(Some(policy.into_handle()));
}
Expand Down