Remove remote read proto definitions from Mimir #8424

Merged
merged 10 commits into from
Jun 20, 2024

pracucci
Collaborator

@pracucci pracucci commented Jun 19, 2024

What this PR does

In Mimir we define many protobuf messages. Apart from one special case, they're all used for internal Mimir communication, so it's correct that they're defined in Mimir. The special case is remote read.

The remote read endpoint works with protobuf-encoded messages: it takes a protobuf-encoded request as input and returns protobuf-encoded messages. The remote read endpoint spec is defined by Prometheus, which also defines the protobuf messages.

In Mimir, we define our own remote read protobuf messages, which are essentially a copy-paste of the Prometheus ones (except for ReadHints, which is not defined in Mimir). This may cause drift between Prometheus and Mimir. Given that the remote read endpoint is defined by Prometheus, we want to be Prometheus-compatible.

To reduce the risk of drift, I would like to stop using Mimir-defined protobuf messages for the remote read endpoint. I couldn't find any good reason to keep them. Prometheus already offers all the functions to convert data types (e.g. histograms) to and from the protobuf messages.

In this PR I propose to delete the Mimir-defined remote read protobuf messages and use the Prometheus ones instead.

Note to reviewers:

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@pracucci pracucci changed the title Remote remote read proto definitions from Mimir Remove remote read proto definitions from Mimir Jun 19, 2024
@pracucci pracucci force-pushed the remove-readrequest-proto branch from b7d481f to bd1c2d1 Compare June 19, 2024 13:33
message QueryRequest {
// This QueryRequest message is also used for remote read requests, which includes a hints field we don't support.
Collaborator Author

Note to reviewers: previously, QueryRequest was referenced by ReadRequest, so it was right that field 4 (hints) was reserved. After this PR, QueryRequest is only used internally in Mimir and is no longer used to decode a remote read request, so we don't have any Prometheus-compatibility requirement anymore.

@pracucci pracucci force-pushed the remove-readrequest-proto branch from 62ab932 to 5d15381 Compare June 19, 2024 14:45
Contributor

@krajorama krajorama left a comment

Please add two tests to remote_read_test.go that make it explicit that the provided Step, StartMs, EndMs hints are ignored. There are already some mock storage.Querier implementations there, so it should be simple to do. This ensures we don't get out of sync with assumptions in the frontend.

@pracucci
Collaborator Author

pracucci commented Jun 19, 2024

Protobuf comparison between Mimir and Prometheus

I've only stripped comments that were not useful for the comparison.

ReadRequest

Mimir:

message ReadRequest {
  repeated QueryRequest queries = 1;

  enum ResponseType {
    SAMPLES = 0;
    STREAMED_XOR_CHUNKS = 1;
  }
  repeated ResponseType accepted_response_types = 2;
}

Prometheus:

message ReadRequest {
  repeated Query queries = 1;

  enum ResponseType {
    SAMPLES = 0;
    STREAMED_XOR_CHUNKS = 1;
  }

  repeated ResponseType accepted_response_types = 2;
}

Query

Mimir:

message QueryRequest {
  reserved 4;
  reserved "hints";

  int64 start_timestamp_ms = 1;
  int64 end_timestamp_ms = 2;
  repeated LabelMatcher matchers = 3;
  
  uint64 streaming_chunks_batch_size = 100;
}

Prometheus:

message Query {
  int64 start_timestamp_ms = 1;
  int64 end_timestamp_ms = 2;
  repeated prometheus.LabelMatcher matchers = 3;
  prometheus.ReadHints hints = 4;
}

NOTE: streaming_chunks_batch_size in Mimir is unused when unmarshalling the remote read request. Prometheus has hints that we didn't support in Mimir. We're still not supporting them in Mimir after this PR.
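
To illustrate why the old Mimir QueryRequest could decode a Prometheus Query even though field 4 was only reserved, here is a minimal sketch of the protobuf wire format using only the Go standard library. It is not Mimir or Prometheus code: `encodeQuery` and `decodeTimeRange` are hypothetical helpers, the hints payload is an opaque placeholder, and the decoder only understands varint and length-delimited fields. Fields are identified on the wire by number, so a decoder that doesn't know field 4 can simply skip it.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Protobuf wire types used in this sketch.
const (
	wireVarint = 0
	wireBytes  = 2
)

// appendTag appends a field key (field number plus wire type).
func appendTag(b []byte, field, wireType uint64) []byte {
	return binary.AppendUvarint(b, field<<3|wireType)
}

// encodeQuery hand-encodes start_timestamp_ms (1), end_timestamp_ms (2)
// and an opaque hints payload (4). Hypothetical helper, for illustration only.
func encodeQuery(start, end int64, hints []byte) []byte {
	var b []byte
	b = appendTag(b, 1, wireVarint)
	b = binary.AppendUvarint(b, uint64(start))
	b = appendTag(b, 2, wireVarint)
	b = binary.AppendUvarint(b, uint64(end))
	b = appendTag(b, 4, wireBytes)
	b = binary.AppendUvarint(b, uint64(len(hints)))
	return append(b, hints...)
}

// decodeTimeRange reads fields 1 and 2 and skips every other field,
// returning the numbers of the fields it skipped.
func decodeTimeRange(b []byte) (start, end int64, skipped []uint64, err error) {
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return 0, 0, nil, errors.New("malformed field key")
		}
		b = b[n:]
		field, wireType := key>>3, key&7

		switch wireType {
		case wireVarint:
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return 0, 0, nil, errors.New("malformed varint")
			}
			b = b[n:]
			switch field {
			case 1:
				start = int64(v)
			case 2:
				end = int64(v)
			default:
				skipped = append(skipped, field)
			}
		case wireBytes:
			l, n := binary.Uvarint(b)
			if n <= 0 || uint64(len(b)-n) < l {
				return 0, 0, nil, errors.New("malformed length-delimited field")
			}
			b = b[n+int(l):]
			skipped = append(skipped, field)
		default:
			return 0, 0, nil, errors.New("wire type not handled by this sketch")
		}
	}
	return start, end, skipped, nil
}

func main() {
	msg := encodeQuery(1000, 2000, []byte{0x08, 0x0f})
	start, end, skipped, err := decodeTimeRange(msg)
	fmt.Println(start, end, skipped, err)
}
```

The same mechanism explains why streaming_chunks_batch_size (field 100) is harmless for remote read: a Prometheus client never sends it, so it stays at its zero value.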

ReadResponse

Mimir:

message ReadResponse {
  repeated QueryResponse results = 1;
}

Prometheus:

message ReadResponse {
  repeated QueryResult results = 1;
}

QueryResponse

Mimir:

message QueryResponse {
  repeated cortexpb.TimeSeries timeseries = 1 [(gogoproto.nullable) = false];
}

Prometheus:

message QueryResult {
  repeated prometheus.TimeSeries timeseries = 1;
}

TimeSeries

Mimir:

message TimeSeries {
  repeated LabelPair labels = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "LabelAdapter"];
  repeated Sample samples = 2 [(gogoproto.nullable) = false];
  repeated Exemplar exemplars = 3 [(gogoproto.nullable) = false];
  repeated Histogram histograms = 4 [(gogoproto.nullable) = false];
}

Prometheus:

message TimeSeries {
  repeated Label labels         = 1 [(gogoproto.nullable) = false];
  repeated Sample samples       = 2 [(gogoproto.nullable) = false];
  repeated Exemplar exemplars   = 3 [(gogoproto.nullable) = false];
  repeated Histogram histograms = 4 [(gogoproto.nullable) = false];
}

LabelPair

Mimir:

message LabelPair {
  bytes name  = 1;
  bytes value = 2;
}

Prometheus:

message Label {
  string name  = 1;
  string value = 2;
}
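
The bytes vs string difference above does not change what goes on the wire: protobuf encodes both as a length-delimited field (wire type 2), with the extra requirement that a string holds valid UTF-8. A minimal standard-library sketch, assuming non-empty, valid UTF-8 label names and values; the helper names are hypothetical and this is not generated protobuf code:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLengthDelimited appends one length-delimited field (wire type 2).
// Protobuf uses this same encoding for both `string` and `bytes` fields.
func appendLengthDelimited(b []byte, field uint64, payload []byte) []byte {
	b = binary.AppendUvarint(b, field<<3|2)
	b = binary.AppendUvarint(b, uint64(len(payload)))
	return append(b, payload...)
}

// encodeBytesLabel mimics a message declaring `bytes name = 1; bytes value = 2;`.
func encodeBytesLabel(name, value []byte) []byte {
	var b []byte
	b = appendLengthDelimited(b, 1, name)
	return appendLengthDelimited(b, 2, value)
}

// encodeStringLabel mimics a message declaring `string name = 1; string value = 2;`.
func encodeStringLabel(name, value string) []byte {
	var b []byte
	b = appendLengthDelimited(b, 1, []byte(name))
	return appendLengthDelimited(b, 2, []byte(value))
}

func main() {
	fromBytes := encodeBytesLabel([]byte("__name__"), []byte("up"))
	fromString := encodeStringLabel("__name__", "up")
	fmt.Printf("% x\n% x\n", fromBytes, fromString)
	fmt.Println("identical:", string(fromBytes) == string(fromString))
}
```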

Sample

Mimir:

message Sample {
  // Fields order MUST match promql.FPoint so that we can cast types between them.
  int64 timestamp_ms = 2;
  double value       = 1;
}

Prometheus:

message Sample {
  double value    = 1;
  int64 timestamp = 2;
}

NOTE: in the case of a remote read response we never cast types, so the change in Sample field ordering doesn't matter.
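
A sketch of what "cast types" means here, and why the ordering only matters in Go memory and not on the wire. All three types below are local stand-ins, not the real promql.FPoint, mimirpb.Sample or prompb.Sample; the assumption is that generated Go structs follow the field declaration order of the .proto file. When the layouts match, a slice can be reinterpreted without copying; when they don't, the conversion has to copy field by field.

```go
package main

import (
	"fmt"
	"unsafe"
)

// fPoint is a local stand-in for promql.FPoint: timestamp first, then value.
type fPoint struct {
	T int64
	F float64
}

// mimirSample is a stand-in for the Mimir-generated Sample, which declares
// the timestamp before the value to match fPoint's memory layout.
type mimirSample struct {
	TimestampMs int64
	Value       float64
}

// promSample is a stand-in for the Prometheus-generated Sample: value first.
type promSample struct {
	Value     float64
	Timestamp int64
}

// castSamples reinterprets the slice in place. This is only valid because
// mimirSample and fPoint have identical field types in identical order.
func castSamples(in []mimirSample) []fPoint {
	return *(*[]fPoint)(unsafe.Pointer(&in))
}

// copySamples converts field by field, which works for any field order.
func copySamples(in []promSample) []fPoint {
	out := make([]fPoint, 0, len(in))
	for _, s := range in {
		out = append(out, fPoint{T: s.Timestamp, F: s.Value})
	}
	return out
}

func main() {
	fmt.Println(castSamples([]mimirSample{{TimestampMs: 10, Value: 1.5}}))
	fmt.Println(copySamples([]promSample{{Value: 2.5, Timestamp: 20}}))
}
```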

Exemplar

Mimir:

message Exemplar {
  repeated LabelPair labels = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "LabelAdapter"];
  double value = 2;
  int64 timestamp_ms = 3;
}

Prometheus:

message Exemplar {
  repeated Label labels = 1 [(gogoproto.nullable) = false];
  double value = 2;
  int64 timestamp = 3;
}

Histogram

@krajorama Can we skip the histogram comparison? They're expected to be 1:1 with the Prometheus ones, right?

StreamReadResponse

Mimir:

message StreamReadResponse {
  repeated StreamChunkedSeries chunked_series = 1;

  int64 query_index = 2;
}

Prometheus:

message ChunkedReadResponse {
  repeated prometheus.ChunkedSeries chunked_series = 1;

  int64 query_index = 2;
}

StreamChunkedSeries

Mimir:

message StreamChunkedSeries {
  repeated cortexpb.LabelPair labels = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "github.com/grafana/mimir/pkg/mimirpb.LabelAdapter"];
  repeated StreamChunk chunks = 2 [(gogoproto.nullable) = false];
}

Prometheus:

message ChunkedSeries {
  repeated Label labels = 1 [(gogoproto.nullable) = false];
  repeated Chunk chunks = 2 [(gogoproto.nullable) = false];
}

StreamChunk

Mimir:

message StreamChunk {
  int64 min_time_ms = 1;
  int64 max_time_ms = 2;

  enum Encoding {
    UNKNOWN = 0;
    XOR     = 1;
    HISTOGRAM = 2;
    FLOAT_HISTOGRAM = 3;
  }
  Encoding type  = 3;
  bytes data     = 4 [(gogoproto.nullable) = false, (gogoproto.customtype) = "github.com/grafana/mimir/pkg/mimirpb.UnsafeByteSlice"];
}

Prometheus:

message Chunk {
  int64 min_time_ms = 1;
  int64 max_time_ms = 2;

  enum Encoding {
    UNKNOWN         = 0;
    XOR             = 1;
    HISTOGRAM       = 2;
    FLOAT_HISTOGRAM = 3;
  }
  Encoding type  = 3;
  bytes data     = 4;
}

Note: Mimir uses UnsafeByteSlice. It's an optimization for the unmarshalling, but we never unmarshal StreamChunk because we just use it to send the response (so it's just marshalled).

@pracucci
Collaborator Author

Please add two tests to remote_read_test.go that make it explicit that the provided Step, StartMs, EndMs hints are ignored.

I've added the assertion to the existing tests in 677c9a3. I suggest reviewing the changes with "hide whitespace changes" enabled.

@krajorama
Contributor

@krajorama Can we skip the histogram comparison? They're expected to be 1:1 with the Prometheus ones, right?

Yes, although there's no explicit test for it, which is a little strange; I thought we had one. I'm adding a unit test in a different PR.

@krajorama
Contributor

Unit test for checking histogram equivalence: checks the names and types and order of fields, except for Gogo proto meta fields: #8426
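
For readers who want a feel for that kind of check, a rough sketch using reflection follows. It is not the test from #8426: the two struct types are hypothetical stand-ins for generated code, and the only assumption about Gogo proto meta fields is that their names start with `XXX_`. The comparison uses the Go type name of each field, so nested message types from different packages would need extra handling.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// mimirSpan and promSpan are hypothetical stand-ins for two generated structs
// that should describe the same message.
type mimirSpan struct {
	Offset int32
	Length uint32
}

type promSpan struct {
	Offset           int32
	Length           uint32
	XXX_unrecognized []byte // meta field, ignored by the comparison
}

// dataFields lists "name type" for each field in declaration order,
// skipping fields whose names start with XXX_.
func dataFields(t reflect.Type) []string {
	var out []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if strings.HasPrefix(f.Name, "XXX_") {
			continue
		}
		out = append(out, f.Name+" "+f.Type.String())
	}
	return out
}

// sameDataFields reports whether two struct values have the same field names,
// types and order, ignoring meta fields.
func sameDataFields(a, b any) bool {
	return reflect.DeepEqual(dataFields(reflect.TypeOf(a)), dataFields(reflect.TypeOf(b)))
}

func main() {
	fmt.Println(sameDataFields(mimirSpan{}, promSpan{}))
}
```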

krajorama previously approved these changes Jun 19, 2024
Contributor

@krajorama krajorama left a comment

LGTM

@krajorama krajorama dismissed their stale review June 19, 2024 16:54

more work expected

@pracucci pracucci force-pushed the remove-readrequest-proto branch 3 times, most recently from ded700e to d901c16 Compare June 20, 2024 05:30
@@ -92,8 +93,8 @@ func runBackwardCompatibilityTest(t *testing.T, previousImage string, oldFlagsMa
 // Push some series to Mimir.
 series1Timestamp := time.Now()
 series2Timestamp := series1Timestamp.Add(blockRangePeriod * 2)
-series1, expectedVector1, _ := generateFloatSeries("series_1", series1Timestamp, prompb.Label{Name: "series_1", Value: "series_1"})
-series2, expectedVector2, _ := generateFloatSeries("series_2", series2Timestamp, prompb.Label{Name: "series_2", Value: "series_2"})
+series1, expectedVector1, _ := generateFloatSeries("series_1", series1Timestamp, prompb.Label{Name: "label_1", Value: "label_1"})
Collaborator Author

Note to reviewers: this change is not a fix. It just makes the diff easier to read if the assertion fails; otherwise it could be misleading to see series_1 both as the metric name and as an additional key-value label pair.

@@ -1070,13 +1070,3 @@ func getMetricName(lbls []prompb.Label) string {

panic(fmt.Sprintf("series %v has no metric name", lbls))
}

func prompbLabelsToModelMetric(pbLabels []prompb.Label) model.Metric {
Collaborator Author

Note to reviewers: moved to integration/util.go, close to the new utilities.

@pracucci pracucci marked this pull request as ready for review June 20, 2024 05:33
@pracucci pracucci requested a review from a team as a code owner June 20, 2024 05:33
Contributor

@krajorama krajorama left a comment

LGTM, except that the test reinvents mocking to some extent; I'll propose a PR.

@pracucci pracucci force-pushed the remove-readrequest-proto branch from d901c16 to 688bd93 Compare June 20, 2024 09:29
@pracucci pracucci enabled auto-merge (squash) June 20, 2024 09:29
@pracucci pracucci merged commit 491685e into main Jun 20, 2024
29 checks passed
@pracucci pracucci deleted the remove-readrequest-proto branch June 20, 2024 11:06