Cluster state api fixes - a small refactor #13

Open
wants to merge 14 commits into cluster-state-api-fixes from cluster-state-api-fixes-wprzytula
Conversation

wprzytula

No description provided.

Lorak-mmk and others added 13 commits February 28, 2025 19:44
This method is not used anywhere, and it is not useful for serialization
because columns must be serialized in the correct order anyway.
This function only accepts `T` through a reference, so the `T` itself
doesn't need to have a known size.
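A minimal illustration of this point (not the driver's actual code): because the value is taken by reference, relaxing the implicit `Sized` bound with `?Sized` lets callers pass dynamically sized types such as `str` or `dyn Trait`, while sized types keep working.

```rust
// Hypothetical function: taking `&T` means `T` itself never needs a
// known size, so `?Sized` widens the set of accepted types for free.
fn describe<T: std::fmt::Display + ?Sized>(value: &T) -> String {
    format!("value = {}", value)
}

fn main() {
    let s: &str = "hello"; // `str` is unsized; only `&str` has a size
    println!("{}", describe(s));
    println!("{}", describe(&42)); // sized types still work as before
}
```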
In later commits we will introduce another error condition that
doesn't fail the whole metadata fetch, but only a single keyspace.
Previously it was possible that some positions of `partition_key` and
`clustering_key` would remain unfilled and thus be empty strings.

The probability was very low - Scylla would have to return very weird
data - but the possibility was there.

This commit verifies that this is not happening, and returns an error
if it is.
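The gap check described above can be sketched as follows. This is a hedged illustration, not the driver's code: `build_key` and its `(position, name)` input shape are hypothetical, but the idea is the same: fill slots by position, then fail if any slot was never written.

```rust
// Hypothetical sketch of the gap check: primary-key columns arrive with
// explicit positions; metadata that leaves any position unfilled is an
// error instead of silently producing empty strings.
fn build_key(columns: Vec<(usize, String)>) -> Result<Vec<String>, String> {
    let len = columns.iter().map(|(pos, _)| pos + 1).max().unwrap_or(0);
    let mut slots: Vec<Option<String>> = vec![None; len];
    for (pos, name) in columns {
        slots[pos] = Some(name);
    }
    // Convert each slot, turning a gap into an error for the whole key.
    slots
        .into_iter()
        .enumerate()
        .map(|(i, slot)| slot.ok_or_else(|| format!("gap at position {}", i)))
        .collect()
}
```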
Now that we verify there are no gaps in pk and ck, we can explicitly
provide this guarantee to users.
We want to move token calculation methods in ClusterState to accept
SerializeRow. This in turn means we need to serialize those values,
so we have to create RowSerializationContext from the Table struct.

RowSerializationContext currently needs a slice of ColumnSpec.
Table has no such slice. Instead it has a hashmap from column name to
a ColumnSpec, and a Vec of primary key column names.

We have three options:
- Add a field with the required slice to Table struct.
- Modify RowSerializationContext somehow so it can be created from the
  data that we already have in Table. I'm not sure how to do that; ideas
  would be appreciated.
- Hybrid: Modify both Table and RowSerializationContext to make them
  work together.

This commit takes the first approach because it seemed to be the easiest
one. Doing it differently is of course open to discussion.
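The first option can be sketched like this. The struct shapes below are hypothetical stand-ins, not the driver's actual definitions: `Table` gains a slice of pk `ColumnSpec`s, and a `RowSerializationContext` borrows it directly.

```rust
// Hypothetical shapes illustrating option 1: store the pk column specs
// as an ordered Vec on Table, so the serialization context can borrow
// the slice it needs without any conversion.
struct ColumnSpec {
    name: String,
}

struct Table {
    // ...existing fields (column hashmap, pk name list) elided...
    pk_column_specs: Vec<ColumnSpec>, // new field: pk columns in order
}

struct RowSerializationContext<'a> {
    columns: &'a [ColumnSpec],
}

impl Table {
    fn row_serialization_context(&self) -> RowSerializationContext<'_> {
        RowSerializationContext {
            columns: &self.pk_column_specs,
        }
    }
}
```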
There is no easy way for the users to create SerializedValues, which
makes the current APIs cumbersome to use. Instead they should accept
`&dyn SerializeRow` and perform serialization based on table metadata.

This change means that those methods can now also return
SerializationError, and they also need to handle a table missing from
metadata, preferably by returning an error in that case as well.
No existing error type fits here, so we either need to extend an
existing one or create a new one. The first idea was extending
PartitionKeyError, but that type needs to be convertible to
ExecutionError, which has no need for those new variants.
For that reason I introduced a new error type for those methods,
called ClusterStateTokenError.
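A hedged sketch of what such an error type might look like; the variant names and payloads below are illustrative assumptions, not the driver's actual definition (in practice the first variant would wrap the real `SerializationError`).

```rust
use std::fmt;

// Hypothetical error type covering the two new failure modes of the
// token-calculation methods: serialization failed, or the requested
// table is absent from the fetched metadata.
#[derive(Debug)]
enum ClusterStateTokenError {
    Serialization(String), // would wrap SerializationError in practice
    UnknownTable { keyspace: String, table: String },
}

impl fmt::Display for ClusterStateTokenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Serialization(e) => write!(f, "serialization failed: {}", e),
            Self::UnknownTable { keyspace, table } => {
                write!(f, "table {}.{} not found in metadata", keyspace, table)
            }
        }
    }
}
```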
We may sometimes need to benchmark / integration test some private stuff.
There is no good way to do this. The least bad way seems to be to
re-export such private APIs under an unstable feature, and mark tests or
benchmarks that require them as depending on this feature.

It's worth noting that using the "required-features" Cargo attribute won't
automatically enable the feature: it only prevents the target from
existing unless the feature is enabled.
This means that after this commit `cargo bench` won't run this benchmark.
Instead you need to run `cargo bench --features unstable-testing` or
`cargo bench --all-features`.
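A minimal sketch of the Cargo.toml wiring this describes; the feature name matches the commit, but the bench target name is a made-up placeholder.

```toml
# Hypothetical Cargo.toml excerpt: the bench target only exists when the
# feature is enabled; plain `cargo bench` simply skips it.
[features]
unstable-testing = []

[[bench]]
name = "internal_bench"   # placeholder name
required-features = ["unstable-testing"]
```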
…e flag

This will allow us to unpub the original calculate_token_for_partition_key,
which is the last step towards eliminating SerializedValues from public API.
This function operates on SerializedValues, and so is not type-safe.
Given that we already provide token-calculation API on PreparedStatement
and ClusterState, exposing this method seems redundant.

If it turns out that there are users who need this method and can't use
the existing APIs, then we can think about providing an appropriate safe
API, or making this pub once again as a last resort.
@wprzytula wprzytula force-pushed the cluster-state-api-fixes-wprzytula branch from e9ae9d3 to e9dd1a2 Compare March 1, 2025 11:25
The logic is deduplicated (at least a bit), and HashMap is no longer
used: a (lightweight) Vec plus sorting is a perfect fit for this use case.
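The Vec-plus-sorting pattern mentioned above can be illustrated like this (a generic sketch, not the driver's code): instead of inserting `(position, name)` pairs into a HashMap keyed by position, collect them into a Vec, sort by position, and strip the positions.

```rust
// Hypothetical helper: recover the in-order column names from unordered
// (position, name) pairs using a Vec and a sort instead of a HashMap.
fn ordered_names(mut cols: Vec<(usize, String)>) -> Vec<String> {
    cols.sort_by_key(|(pos, _)| *pos); // cheap: small, contiguous data
    cols.into_iter().map(|(_, name)| name).collect()
}
```

For the handful of primary-key columns a table has, the Vec stays cache-friendly and avoids HashMap's allocation and hashing overhead.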
@wprzytula wprzytula force-pushed the cluster-state-api-fixes-wprzytula branch from e9dd1a2 to 5c9c50b Compare March 1, 2025 11:45
@Lorak-mmk Lorak-mmk force-pushed the cluster-state-api-fixes branch 2 times, most recently from 73ca2cc to 473cc84 Compare March 1, 2025 17:17