
[internal/coreinternal/processor/filterset/regexp] LRU cache is not safe for concurrent access #11829

Closed
matianjun1 opened this issue Jun 30, 2022 · 10 comments · Fixed by #29869
Labels
bug (Something isn't working), closed as inactive, priority:p2 (Medium), processor/filter (Filter processor), Stale

Comments

@matianjun1
Contributor

Describe the bug
The processor/filterprocessor with the regexp cache enabled can hit fatal error: concurrent map read and map write

Steps to reproduce
Filter metrics concurrently through the filterprocessor with a regexp filter and the LRU cache enabled

What did you expect to see?
The regexp filterset with an LRU cache should be safe for concurrent use

What did you see instead?


fatal error: concurrent map read and map write
fatal error: concurrent map writes

goroutine 310 [running]:
runtime.throw({0x5c1410e?, 0x0?})
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/panic.go:992 +0x71 fp=0xc000e7d6e0 sp=0xc000e7d6b0 pc=0x4038dd1
runtime.mapaccess2(0x56d6f45?, 0xc000e7d8b8?, 0xc0006829f0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/map.go:476 +0x205 fp=0xc000e7d720 sp=0xc000e7d6e0 pc=0x4010085
github.com/golang/groupcache/lru.(*Cache).Get(0xc000cfe000, {0x584dde0?, 0xc000e7d7d8?})
	/Users/xxx/go/pkg/mod/github.com/golang/[email protected]/lru/lru.go:78 +0x51 fp=0xc000e7d758 sp=0xc000e7d720 pc=0x55245d1
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/processor/filterset/regexp.(*FilterSet).Matches(0xc000cfa300, {0xc000682ab0, 0x30})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/internal/[email protected]/processor/filterset/regexp/regexpfilterset.go:58 +0x6f fp=0xc000e7d7f8 sp=0xc000e7d758 pc=0x5524def
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/processor/filtermetric.(*nameMatcher).MatchMetric(0xc00375a3f0?, {0xc00375a3c0?})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/internal/[email protected]/processor/filtermetric/name_matcher.go:47 +0x2b fp=0xc000e7d820 sp=0xc000e7d7f8 pc=0x55671ab
github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor.(*filterMetricProcessor).shouldKeepMetric(0xc00053ae00, {0x400ec87?})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/filter_processor.go:161 +0x38 fp=0xc000e7d848 sp=0xc000e7d820 pc=0x556c738
github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor.(*filterMetricProcessor).processMetrics.func1.1.1({0x1b1?})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/filter_processor.go:140 +0x3a fp=0xc000e7d920 sp=0xc000e7d848 pc=0x556c59a
go.opentelemetry.io/collector/pdata/internal.MetricSlice.RemoveIf({0x0?}, 0xc000e7d970)
	/Users/xxx/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/generated_pmetric.go:536 +0x62 fp=0xc000e7d958 sp=0xc000e7d920 pc=0x540daa2
github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor.(*filterMetricProcessor).processMetrics.func1.1({0x0?})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/filter_processor.go:139 +0x47 fp=0xc000e7d990 sp=0xc000e7d958 pc=0x556c527
go.opentelemetry.io/collector/pdata/internal.ScopeMetricsSlice.RemoveIf({0x0?}, 0xc000e7d9e8)
	/Users/xxx/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/generated_pmetric.go:342 +0x62 fp=0xc000e7d9c8 sp=0xc000e7d990 pc=0x540cfc2
github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor.(*filterMetricProcessor).processMetrics.func1({0x0?})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/filter_processor.go:138 +0x7e fp=0xc000e7da08 sp=0xc000e7d9c8 pc=0x556c49e
go.opentelemetry.io/collector/pdata/internal.ResourceMetricsSlice.RemoveIf({0x8b745b8?}, 0xc000e7da58)
	/Users/xxx/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/generated_pmetric.go:148 +0x62 fp=0xc000e7da40 sp=0xc000e7da08 pc=0x540c4e2
github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor.(*filterMetricProcessor).processMetrics(0x400ec87?, {0x10?, 0x585d560?}, {0x1?})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/filter_processor.go:128 +0x3f fp=0xc000e7da78 sp=0xc000e7da40 pc=0x556c39f
github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor.(*filterMetricProcessor).processMetrics-fm({0x72f9240?, 0xc0000a4b80?}, {0x0?})
	<autogenerated>:1 +0x3c fp=0xc000e7daa8 sp=0xc000e7da78 pc=0x556d75c
go.opentelemetry.io/collector/processor/processorhelper.NewMetricsProcessor.func1({0x72f9240, 0xc0000a4b80}, {0x0?})
	/Users/xxx/go/pkg/mod/go.opentelemetry.io/[email protected]/processor/processorhelper/metrics.go:61 +0xf8 fp=0xc000e7db38 sp=0xc000e7daa8 pc=0x5567df8
go.opentelemetry.io/collector/consumer.ConsumeMetricsFunc.ConsumeMetrics(...)
	/Users/xxx/go/pkg/mod/go.opentelemetry.io/[email protected]/consumer/metrics.go:36
go.opentelemetry.io/collector/consumer.(*baseMetrics).ConsumeMetrics(0x0?, {0x72f9240?, 0xc0000a4b80?}, {0x0?})
	<autogenerated>:1 +0x39 fp=0xc000e7db60 sp=0xc000e7db38 pc=0x548bc99
go.opentelemetry.io/collector/processor/processorhelper.(*metricsProcessor).ConsumeMetrics(0x54a1e60?, {0x72f9240?, 0xc0000a4b80?}, {0x400007?})
	<autogenerated>:1 +0x35 fp=0xc000e7db90 sp=0xc000e7db60 pc=0x5568715
go.opentelemetry.io/collector/service/internal/pipelines.(*capMetrics).ConsumeMetrics(0x72f92e8?, {0x72f9240?, 0xc0000a4b80?}, {0x5?})
	<autogenerated>:1 +0x34 fp=0xc000e7dbc0 sp=0xc000e7db90 pc=0x56ee594
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver.(*metricsConsumerGroupHandler).ConsumeClaim(0xc000326540, {0x7301048, 0xc000a8ddd0}, {0x72fc128, 0xc000937bf0})
	/Users/xxx/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/kafka_receiver.go:493 +0x8de fp=0xc000e7ded0 sp=0xc000e7dbc0 pc=0x56d2f1e
github.com/Shopify/sarama.(*consumerGroupSession).consume(0xc000a8ddd0, {0xc000ed7670, 0xe}, 0x6)
	/Users/xxx/go/pkg/mod/github.com/!shopify/[email protected]/consumer_group.go:808 +0x26e fp=0xc000e7df60 sp=0xc000e7ded0 pc=0x4b7a86e
github.com/Shopify/sarama.newConsumerGroupSession.func2({0xc000ed7670?, 0x0?}, 0x0?)
	/Users/xxx/go/pkg/mod/github.com/!shopify/[email protected]/consumer_group.go:733 +0x76 fp=0xc000e7dfb8 sp=0xc000e7df60 pc=0x4b7a0b6
github.com/Shopify/sarama.newConsumerGroupSession.func5()
	/Users/xxx/go/pkg/mod/github.com/!shopify/[email protected]/consumer_group.go:734 +0x31 fp=0xc000e7dfe0 sp=0xc000e7dfb8 pc=0x4b7a011
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000e7dfe8 sp=0xc000e7dfe0 pc=0x406c881
created by github.com/Shopify/sarama.newConsumerGroupSession
	/Users/xxx/go/pkg/mod/github.com/!shopify/[email protected]/consumer_group.go:725 +0x4b4

What version did you use?
Version: v0.54.0

What config did you use?

receivers:
  kafka:
    protocol_version: 0.11.0.2
    brokers: xxxxx:9092
    topic: xxx_metrics

processors:
  filter:
    metrics:
      include:
        match_type: regexp
        regexp:
          cacheenabled: true
          cachemaxnumentries: 2048
        metric_names:
          - prefix_.*

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    metrics:
      receivers: [kafka]
      processors: [filter]
      exporters: [logging]


@matianjun1 added the bug (Something isn't working) label Jun 30, 2022
@TylerHelmuth
Member

pinging @boostchicken @pmm-sumo as code owners of the filterprocessor

@TylerHelmuth added the priority:p2 (Medium) and processor/filter (Filter processor) labels Jul 1, 2022
@matianjun1
Contributor Author

@TylerHelmuth The problem seems to be that internal/coreinternal/processor/filterset/regexp uses github.com/golang/groupcache/lru.Cache, which is not safe for concurrent use
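
For illustration, a minimal stand-alone sketch (not collector code; the key, goroutine count, and iteration count are made up) that triggers the same race by hitting a single golang/groupcache lru.Cache from several goroutines, the way concurrent pipeline consumers hit the filterset's cache. Note that even Get mutates the cache's internal list and map, so running this under go run -race flags it immediately:

package main

import (
    "sync"

    "github.com/golang/groupcache/lru"
)

func main() {
    // One shared, unsynchronized cache, as in the regexp filterset.
    cache := lru.New(2048)
    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 100000; j++ {
                // Get also moves the entry to the front of the LRU list,
                // so even "read-only" lookups write shared state.
                if _, ok := cache.Get("prefix_metric"); !ok {
                    cache.Add("prefix_metric", true)
                }
            }
        }()
    }
    wg.Wait()
}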

@pmm-sumo
Contributor

pmm-sumo commented Jul 4, 2022

I think we could switch to github.com/hashicorp/golang-lru (which is also used by many other components); as I understand it, it uses locks and is thread-safe
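
A rough sketch of what that swap could look like, assuming the v1 github.com/hashicorp/golang-lru API (lru.New returns a cache whose Get/Add take an internal lock); the wrapper type and names below are illustrative, not the actual filterset code:

package regexpcache

import (
    lru "github.com/hashicorp/golang-lru"
)

// matchCache wraps the locking LRU; concurrent lookups no longer race.
type matchCache struct {
    cache *lru.Cache
}

func newMatchCache(size int) (*matchCache, error) {
    c, err := lru.New(size) // thread-safe: mutex held inside Get/Add
    if err != nil {
        return nil, err
    }
    return &matchCache{cache: c}, nil
}

// get returns the cached match result for key, if present.
func (m *matchCache) get(key string) (matched, ok bool) {
    v, found := m.cache.Get(key)
    if !found {
        return false, false
    }
    return v.(bool), true
}

// add records the match result for key.
func (m *matchCache) add(key string, matched bool) {
    m.cache.Add(key, matched)
}

The alternative would be to keep groupcache's cache and wrap every Get/Add in a mutex; the hashicorp module simply ships that locking already.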

@github-actions
Contributor

github-actions bot commented Nov 9, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions bot added the Stale label Nov 9, 2022
@brc

brc commented Apr 7, 2023

Bug is still present in main, though the code has since been split into internal/filter.

Too bad the PR auto-closed!

Interestingly, #16103 ([chore] spanmetricsprocessor: switch to use non-thread safe lru, already protected by proc mu) was merged the day before this very issue went stale: the spanmetrics processor was already using hashicorp/golang-lru but switched to hashicorp/golang-lru/simplelru. @bogdandrutu Do you know off-hand whether the situation is similar between these two processors, i.e. can/should the filter processor use the same pattern you implemented in spanmetrics? Thanks.
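
For reference, the pattern described above for #16103 amounts to a non-thread-safe simplelru.LRU guarded by a mutex the owner already holds. A hedged sketch of that same shape applied to a cached matcher (type and field names are illustrative, not the real filterset code):

package filtercache

import (
    "sync"

    "github.com/hashicorp/golang-lru/simplelru"
)

// cachedMatcher pairs a non-thread-safe LRU with its own mutex, mirroring
// the "already protected by proc mu" approach referenced above.
type cachedMatcher struct {
    mu      sync.Mutex
    cache   *simplelru.LRU
    matchFn func(string) bool // the underlying regexp match
}

func newCachedMatcher(size int, matchFn func(string) bool) (*cachedMatcher, error) {
    c, err := simplelru.NewLRU(size, nil) // nil: no eviction callback
    if err != nil {
        return nil, err
    }
    return &cachedMatcher{cache: c, matchFn: matchFn}, nil
}

// Matches consults the cache under the lock and falls back to the regexp.
func (m *cachedMatcher) Matches(key string) bool {
    m.mu.Lock()
    defer m.mu.Unlock()
    if v, ok := m.cache.Get(key); ok {
        return v.(bool)
    }
    matched := m.matchFn(key)
    m.cache.Add(key, matched)
    return matched
}

Holding the lock across the regexp match keeps the sketch simple; an actual fix could narrow the critical section if match latency matters.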

@dmitryax removed the Stale label May 3, 2023
@github-actions
Contributor

github-actions bot commented Jul 3, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@sfc-gh-rscott

I maintain an internal fork that uses hashicorp LRU to work around this bug. It would be great to have this fixed.

@TylerHelmuth
Member

We are working towards removing the internal/filter code that is impacted by this bug, but that is still a while off. If someone has a fix we would take it.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions bot added the Stale label Sep 11, 2023
@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions bot closed this as not planned (Won't fix, can't repro, duplicate, stale) Nov 10, 2023
@TylerHelmuth reopened this Dec 13, 2023
bogdandrutu added a commit that referenced this issue Dec 13, 2023
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024