Cluster state api fixes - a small refactor #13

Open
wants to merge 14 commits into cluster-state-api-fixes from cluster-state-api-fixes-wprzytula
Conversation

wprzytula

No description provided.

Lorak-mmk and others added 13 commits February 28, 2025 19:44
This method is not used anywhere, and it is not useful for serialization
because columns must be serialized in the correct order anyway.
This function only accepts `T` through a reference, so the `T` itself
doesn't need to have a known size.
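A minimal illustration of this point (not the driver's actual code): because the value is taken by reference, relaxing the implicit `Sized` bound with `?Sized` lets callers pass dynamically sized types such as `str` or `dyn Trait`, while sized types keep working.

```rust
// Hypothetical function: taking `&T` means `T` itself never needs a
// known size, so `?Sized` widens the set of accepted types for free.
fn describe<T: std::fmt::Display + ?Sized>(value: &T) -> String {
    format!("value = {}", value)
}

fn main() {
    let s: &str = "hello"; // `str` is unsized; only `&str` has a size
    println!("{}", describe(s));
    println!("{}", describe(&42)); // sized types still work as before
}
```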
In later commits we will introduce another error condition that
doesn't fail the whole metadata fetch, but only a single keyspace.
Previously it was possible that some positions of `partition_key` and
`clustering_key` would remain unfilled and thus be empty strings.

The probability was very low - Scylla would have to return very weird
data - but the possibility was there.

This commit verifies that this is not happening, and returns an error
if it is.
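The gap check described above can be sketched as follows. This is a hedged illustration, not the driver's code: `build_key` and its `(position, name)` input shape are hypothetical, but the idea is the same: fill slots by position, then fail if any slot was never written.

```rust
// Hypothetical sketch of the gap check: primary-key columns arrive with
// explicit positions; metadata that leaves any position unfilled is an
// error instead of silently producing empty strings.
fn build_key(columns: Vec<(usize, String)>) -> Result<Vec<String>, String> {
    let len = columns.iter().map(|(pos, _)| pos + 1).max().unwrap_or(0);
    let mut slots: Vec<Option<String>> = vec![None; len];
    for (pos, name) in columns {
        slots[pos] = Some(name);
    }
    // Convert each slot, turning a gap into an error for the whole key.
    slots
        .into_iter()
        .enumerate()
        .map(|(i, slot)| slot.ok_or_else(|| format!("gap at position {}", i)))
        .collect()
}
```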
Now that we verify there are no gaps in pk and ck, we can explicitly
provide this guarantee to users.
We want to move token calculation methods in ClusterState to accept
SerializeRow. This in turn means we need to serialize those values,
so we have to create RowSerializationContext from the Table struct.

RowSerializationContext currently needs a slice of ColumnSpec.
Table has no such slice. Instead it has a hashmap from column name to
a ColumnSpec, and a Vec of primary key column names.

We have three options:
- Add a field with the required slice to Table struct.
- Modify RowSerializationContext somehow so it can be created from the
  data that we already have in Table. I'm not sure how to do that; ideas
  would be appreciated.
- Hybrid: Modify both Table and RowSerializationContext to make them
  work together.

This commit takes the first approach because it seemed to be the easiest
one. Doing it differently is of course open to discussion.
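The first option can be sketched like this. The struct shapes below are hypothetical stand-ins, not the driver's actual definitions: `Table` gains a slice of pk `ColumnSpec`s, and a `RowSerializationContext` borrows it directly.

```rust
// Hypothetical shapes illustrating option 1: store the pk column specs
// as an ordered Vec on Table, so the serialization context can borrow
// the slice it needs without any conversion.
struct ColumnSpec {
    name: String,
}

struct Table {
    // ...existing fields (column hashmap, pk name list) elided...
    pk_column_specs: Vec<ColumnSpec>, // new field: pk columns in order
}

struct RowSerializationContext<'a> {
    columns: &'a [ColumnSpec],
}

impl Table {
    fn row_serialization_context(&self) -> RowSerializationContext<'_> {
        RowSerializationContext {
            columns: &self.pk_column_specs,
        }
    }
}
```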
There is no easy way for the users to create SerializedValues, which
makes the current APIs cumbersome to use. Instead they should accept
`&dyn SerializeRow` and perform serialization based on table metadata.

This change means that those methods can now also return
SerializationError, and they also need to handle a table missing from
metadata, preferably by returning an error in that case as well.
No existing error type fits here, so we either need to extend an
existing one or create a new one. The first idea was extending
PartitionKeyError, but that type needs to be convertible to
ExecutionError, which has no need for those new variants.
For that reason I introduced a new error type for those methods,
called ClusterStateTokenError.
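A hedged sketch of what such an error type might look like; the variant names and payloads below are illustrative assumptions, not the driver's actual definition (in practice the first variant would wrap the real `SerializationError`).

```rust
use std::fmt;

// Hypothetical error type covering the two new failure modes of the
// token-calculation methods: serialization failed, or the requested
// table is absent from the fetched metadata.
#[derive(Debug)]
enum ClusterStateTokenError {
    Serialization(String), // would wrap SerializationError in practice
    UnknownTable { keyspace: String, table: String },
}

impl fmt::Display for ClusterStateTokenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Serialization(e) => write!(f, "serialization failed: {}", e),
            Self::UnknownTable { keyspace, table } => {
                write!(f, "table {}.{} not found in metadata", keyspace, table)
            }
        }
    }
}
```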
We may sometimes need to benchmark / integration test some private stuff.
There is no good way to do this. The least bad way seems to be to
re-export such private APIs under an unstable feature, and mark tests or
benchmarks that require them as depending on this feature.

It's worth noting that using the "required-features" Cargo attribute won't
automatically enable the feature: it only prevents the target from
existing unless the feature is enabled.
This means that after this commit `cargo bench` won't run this benchmark.
Instead you need to run `cargo bench --features unstable-testing` or
`cargo bench --all-features`.
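A minimal sketch of the Cargo.toml wiring this describes; the feature name matches the commit, but the bench target name is a made-up placeholder.

```toml
# Hypothetical Cargo.toml excerpt: the bench target only exists when the
# feature is enabled; plain `cargo bench` simply skips it.
[features]
unstable-testing = []

[[bench]]
name = "internal_bench"   # placeholder name
required-features = ["unstable-testing"]
```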
…e flag

This will allow us to unpub the original calculate_token_for_partition_key,
which is the last step towards eliminating SerializedValues from public API.
This function operates on SerializedValues, and so is not type-safe.
Given that we already provide token-calculation API on PreparedStatement
and ClusterState, exposing this method seems redundant.

If it turns out that there are users who need this method and can't use
the existing APIs, then we can think about providing an appropriate safe
API, or making this pub once again as a last resort.
@wprzytula wprzytula force-pushed the cluster-state-api-fixes-wprzytula branch from e9ae9d3 to e9dd1a2 Compare March 1, 2025 11:25
The logic is deduplicated (at least a bit), and HashMap is no longer
used: a (lightweight) Vec plus sorting is a perfect fit for this use case.
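The Vec-plus-sorting pattern mentioned above can be illustrated like this (a generic sketch, not the driver's code): instead of inserting `(position, name)` pairs into a HashMap keyed by position, collect them into a Vec, sort by position, and strip the positions.

```rust
// Hypothetical helper: recover the in-order column names from unordered
// (position, name) pairs using a Vec and a sort instead of a HashMap.
fn ordered_names(mut cols: Vec<(usize, String)>) -> Vec<String> {
    cols.sort_by_key(|(pos, _)| *pos); // cheap: small, contiguous data
    cols.into_iter().map(|(_, name)| name).collect()
}
```

For the handful of primary-key columns a table has, the Vec stays cache-friendly and avoids HashMap's allocation and hashing overhead.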
@wprzytula wprzytula force-pushed the cluster-state-api-fixes-wprzytula branch from e9dd1a2 to 5c9c50b Compare March 1, 2025 11:45
@Lorak-mmk Lorak-mmk force-pushed the cluster-state-api-fixes branch 2 times, most recently from 73ca2cc to 473cc84 Compare March 1, 2025 17:17