
NETOBSERV-333: GRPC ingest metrics #223

Merged: 4 commits from grpc-metrics into netobserv:main on Jun 10, 2022

Conversation

@mariomac commented Jun 3, 2022

Implements, for the GRPC ingestor, the following metrics that the IPFIX ingestor already provides:

flow_decoder_count
flow_process_nf_delay_summary_seconds
flow_process_nf_delay_summary_seconds_sum
flow_process_nf_delay_summary_seconds_count
flow_traffic_summary_size_bytes
flow_traffic_summary_size_bytes_sum
flow_traffic_summary_size_bytes_count
ingest_collector_flow_logs_processed
flow_process_nf_errors_count

It does not implement the following metrics:

  • flow_traffic_bytes, because it's equivalent to the existing flow_traffic_summary_size_bytes_sum.
  • flow_process_nf_count, because it's equivalent to the existing ingest_collector_flow_logs_processed.
  • flow_process_nf_flowset_sum and flow_process_nf_flowset_records_sum, because the GRPC+Protobuf service does not have different message types as IPFIX does (DataFlowSet, OptionsDataFlowSet, TemplateFlowSet, etc.). All the values here would be equivalent to ingest_collector_flow_logs_processed.
  • flow_process_nf_templates_count, because GRPC+Protobuf does not use templates.
  • flow_traffic_packets_counter, because GRPC does not work at the packet level and there is no affordable way to inspect the GRPC communication at the IP/TCP level.
  • flow_summary_decoding_time, because GRPC+Protobuf does not provide an affordable way to measure the raw→object decoding time.
  • ingest_collector_queue_length, because its value would always be 0: the GRPC ingestor does not implement any internal queue to batch/forward the flow records.

Also, flow_decoder_count may have very different values than in the IPFIX scenario. This metric counts how many messages have been decoded. In IPFIX, its value would match ingest_collector_flow_logs_processed, since Goflow2 decodes each flow record individually. In GRPC+Protobuf (eBPF agent), the value would be much lower, because flow records are sent in big batches.
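
For illustration only, a minimal sketch of how GRPC-side metrics of this kind can be registered and updated with the standard Prometheus Go client. The actual PR reuses Goflow2's predefined metric vectors (visible in the review excerpts below); this sketch instead defines standalone vectors with the same metric names, and the variable and function names are assumptions, not the PR's code.

// Hypothetical sketch, not the PR's actual code: standalone metric vectors
// with the metric/label names discussed above, updated once per GRPC message.
package ingest

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// counts flow records extracted from received GRPC messages
	flowLogsProcessed = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "ingest_collector_flow_logs_processed",
		Help: "Number of flow logs processed by the GRPC ingester",
	}, []string{"worker"})

	// observes the size in bytes of each received GRPC message
	trafficSizeSummary = promauto.NewSummaryVec(prometheus.SummaryOpts{
		Name: "flow_traffic_summary_size_bytes",
		Help: "Size in bytes of each received message",
	}, []string{"type", "remote_ip"})
)

// observeBatch is called once per received GRPC message: the size summary is
// observed per message, while the counter grows by the number of flow records
// contained in the batch.
func observeBatch(remoteIP string, msgBytes, records int) {
	trafficSizeSummary.
		With(prometheus.Labels{"type": "protobuf", "remote_ip": remoteIP}).
		Observe(float64(msgBytes))
	flowLogsProcessed.
		With(prometheus.Labels{"worker": "grpc"}).
		Add(float64(records))
}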

@memodi commented Jun 3, 2022

@mariomac - is it possible to include flow_process_nf_errors_count metric as well?

@mariomac (Author) commented Jun 3, 2022

@memodi adding it

@codecov-commenter commented Jun 3, 2022

Codecov Report

Merging #223 (aaeae92) into main (debee4d) will increase coverage by 0.31%.
The diff coverage is 82.00%.

@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
+ Coverage   60.96%   61.28%   +0.31%     
==========================================
  Files          66       66              
  Lines        3784     3833      +49     
==========================================
+ Hits         2307     2349      +42     
- Misses       1336     1339       +3     
- Partials      141      145       +4     
Flag Coverage Δ
unittests 61.28% <82.00%> (+0.31%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/pipeline/decode/decode_protobuf.go 74.54% <ø> (ø)
pkg/pipeline/ingest/ingest_grpc.go 77.10% <82.00%> (+6.52%) ⬆️
pkg/pipeline/ingest/ingest_collector.go 53.96% <0.00%> (+1.58%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update debee4d...aaeae92.

@mariomac changed the title from "(WIP) NETOBSERV-333: GRPC ingest metrics" to "NETOBSERV-333: GRPC ingest metrics" on Jun 7, 2022
@ronensc requested review from jotak and ronensc on June 9, 2022 07:28
	}
	trafficLabels := prometheus.Labels{
		"type":      "protobuf",
		"remote_ip": remoteIP,

Collaborator:

Isn't there an issue with cardinality when adding IPs to the labels?


Author:

This does not refer to the IPs in the flows but to the IPs of the eBPF agents. The number of remote IPs should be small (one agent per node).

I added this field to replicate the behavior of the equivalent metric in the IPFIX ingestor.
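
As an aside on the cardinality point, an illustration under the assumption above (one agent per node), not code from the PR: with the Prometheus Go client, each distinct label combination creates exactly one child time series, so the series count grows with the number of agent nodes rather than with the number of flows. Imports are as in the sketch earlier in this thread.

// Hypothetical illustration: one child time series per distinct label set.
sizeSummary := prometheus.NewSummaryVec(prometheus.SummaryOpts{
	Name: "flow_traffic_summary_size_bytes",
	Help: "Size in bytes of each received message",
}, []string{"type", "remote_ip"})

// Messages from the same agent reuse the same child series; a new series
// only appears when a new agent (one per node) connects.
sizeSummary.With(prometheus.Labels{"type": "protobuf", "remote_ip": "10.0.0.1"}).Observe(1500)
sizeSummary.With(prometheus.Labels{"type": "protobuf", "remote_ip": "10.0.0.1"}).Observe(2048)
sizeSummary.With(prometheus.Labels{"type": "protobuf", "remote_ip": "10.0.0.2"}).Observe(900)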

	flowDecoderCount = flow.DecoderStats.With(
		prometheus.Labels{"worker": "", "name": "protobuf"})
	processDelaySummary = flow.NetFlowTimeStatsSum.With(
		prometheus.Labels{"version": "protobuf1.0", "router": ""})

Collaborator:

Maybe move protobuf1.0 to const ???

// Prometheus metrics describing the performance of the eBPF ingest
var (
	flowDecoderCount = flow.DecoderStats.With(
		prometheus.Labels{"worker": "", "name": "protobuf"})

Collaborator:

maybe move protobuf to const ... or change to GRPC instead
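
One possible way to address both of these suggestions, sketched from the reviewers' idea rather than taken from the code that was actually committed, is to hoist the repeated string literals into package-level constants:

// Hypothetical refactor of the excerpt above; whether "protobuf" was kept or
// renamed to "grpc" is not taken from the PR. Imports are as in the file above.
const (
	decoderName     = "protobuf" // or "grpc", as suggested in the review
	protobufVersion = "protobuf1.0"
)

var (
	flowDecoderCount = flow.DecoderStats.With(
		prometheus.Labels{"worker": "", "name": decoderName})
	processDelaySummary = flow.NetFlowTimeStatsSum.With(
		prometheus.Labels{"version": protobufVersion, "router": ""})
)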


@eranra (Collaborator) left a review:

Minor comments. LGTM

@mariomac merged commit 532816c into netobserv:main on Jun 10, 2022
@mariomac deleted the grpc-metrics branch on June 20, 2022 07:36