[exporterhelper] Proposal: enable Persistent Queue feature by default #5457

Closed

Contributor

@pmm-sumo commented Jun 2, 2022

Description:

Persistent Queue is a feature that currently requires the "enable_unstable" build tag to be enabled. We have been using it in our distro for several months and it appears stable. Others seem to be using this capability as well (example).

I would like to propose enabling it by default, which also simplifies the build process a little bit.

@pmm-sumo pmm-sumo requested review from a team and tigrannajaryan June 2, 2022 19:06
codecov bot commented Jun 2, 2022

Codecov Report

Merging #5457 (00a397b) into main (e77f3d4) will decrease coverage by 0.34%.
The diff coverage is 74.76%.

@@            Coverage Diff             @@
##             main    #5457      +/-   ##
==========================================
- Coverage   90.94%   90.59%   -0.35%     
==========================================
  Files         191      196       +5     
  Lines       11375    11877     +502     
==========================================
+ Hits        10345    10760     +415     
- Misses        807      873      +66     
- Partials      223      244      +21     
| Impacted Files | Coverage Δ |
|---|---|
| ...xporterhelper/internal/persistent_storage_batch.go | 80.48% <ø> (ø) |
| exporter/exporterhelper/queued_retry_inmemory.go | 77.53% <70.32%> (-18.06%) ⬇️ |
| ...porter/exporterhelper/internal/persistent_queue.go | 100.00% <100.00%> (ø) |
| ...rter/exporterhelper/internal/persistent_storage.go | 89.58% <100.00%> (ø) |
| extension/experimental/storage/nop_client.go | 16.66% <0.00%> (ø) |
| extension/experimental/storage/storage.go | 100.00% <0.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pmm-sumo pmm-sumo force-pushed the enable-persistent-queue-in-build branch 2 times, most recently from 5c6d9b7 to 7e7b019 Compare June 2, 2022 19:18
@pmm-sumo pmm-sumo changed the title [expoterhelper] Proposal: enable Persistent Queue feature by default [exporterhelper] Proposal: enable Persistent Queue feature by default Jun 2, 2022
Member

@bogdandrutu left a comment

Definitely we should not drop the in-memory version.

Member

@bogdandrutu left a comment

I read more of the code, and my previous concern is NOT valid. But the queuedRetrySender implementation is different, the exposed metrics are different, etc. This has more implications, so please make the change by deleting the "_experimental" file and bringing the needed changes into the "_inmemory" file, so we can see exactly what changes.

@pmm-sumo pmm-sumo force-pushed the enable-persistent-queue-in-build branch from 7e7b019 to 6dd9c1a Compare June 3, 2022 11:02
Contributor Author

pmm-sumo commented Jun 6, 2022

> I read more of the code, and my previous concern is NOT valid. But the queuedRetrySender implementation is different, the exposed metrics are different, etc. This has more implications, so please make the change by deleting the "_experimental" file and bringing the needed changes into the "_inmemory" file, so we can see exactly what changes.

Sure thing, @bogdandrutu. I updated the PR with the requested change.

BTW, we had a discussion a few months ago on the additional metrics that I proposed to include with the persistent queue. Do you think it's the right time to get back to it, or should we rather give the collector metrics more time to stabilize?

Comment on lines 92 to 100
func (qrs *queuedRetrySender) fullName() string {
	if qrs.signal == "" {
		return qrs.id.String()
	}
	return fmt.Sprintf("%s-%s", qrs.id.String(), qrs.signal)
}
Member

The change to use this new "name" for the metric label/attribute does not seem to be necessary to enable the persistent queue. Can it be a separate PR?

Contributor Author

Each signal must have a different name for the persistent queue; otherwise we would be writing multiple signal types to the same underlying storage. In any case, I changed the code a bit so that for the in-memory queue the name does not depend on the signal type (and stays as it was before).

Member

The name is also used for the "queue_size" label. That's why I don't like this change: you are changing the metrics that we emit.

Member

So, you need to rename the func to "buildStorageName" and call it only when calling the NewPersistentQueue constructor. The fullName in the struct should be kept the same, since we use that in the metrics.
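
A minimal sketch of the split suggested above, assuming a trimmed-down `sender` type with plain string fields in place of the real queuedRetrySender (the type, its fields, and the exact signature of `buildStorageName` are hypothetical). The point is only that the metrics name stays untouched while the storage name gains the signal suffix:

```go
package main

import "fmt"

// sender is a hypothetical, trimmed-down stand-in for queuedRetrySender.
type sender struct {
	id     string // exporter component ID, e.g. "otlp"
	signal string // "traces", "metrics", "logs", or "" when not known
}

// fullName is left unchanged, so the value used as a metric label
// stays the same as before the PR.
func (s sender) fullName() string {
	return s.id
}

// buildStorageName derives the per-signal name that would be passed to the
// persistent queue only, so each signal writes to its own storage.
func buildStorageName(id, signal string) string {
	if signal == "" {
		return id
	}
	return fmt.Sprintf("%s-%s", id, signal)
}

func main() {
	s := sender{id: "otlp", signal: "traces"}
	fmt.Println(s.fullName())                     // name used for metrics
	fmt.Println(buildStorageName(s.id, s.signal)) // name used for storage
}
```

With this shape, only the call site that constructs the persistent queue needs to know about the signal suffix.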

@@ -54,7 +53,8 @@ func NewDefaultQueueSettings() QueueSettings {
 		// This is a pretty decent value for production.
 		// User should calculate this from the perspective of how many seconds to buffer in case of a backend outage,
 		// multiply that by the number of requests per seconds.
-		QueueSize: 5000,
+		QueueSize:                5000,
+		PersistentStorageEnabled: false,
Member

@bogdandrutu Jun 6, 2022

Would it be better to ask users for an "extension" name here, instead of requiring exactly one storage extension and having a "bool" here? Something like what Auth does; see https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configauth/configauth.go#L32

type QueueSettings struct {
	// Enabled indicates whether to not enqueue batches before sending to the consumerSender.
	Enabled bool `mapstructure:"enabled"`
	// NumConsumers is the number of consumers from the queue.
	NumConsumers int `mapstructure:"num_consumers"`
	// QueueSize is the maximum number of batches allowed in queue at a given time.
	QueueSize int `mapstructure:"queue_size"`
	// StorageID if not empty, uses the component specified as a storage extension.... bla bla bla 
	StorageID config.ComponentID `mapstructure:"storage"`
}

Contributor Author

@pmm-sumo Jun 7, 2022

I recall there was a limitation of file_storage that allowed only a single instance (due to the original use case, which was related to the filelog receiver and did not really require several extensions). I think it's no longer relevant; let me check this and I will update the code.

EDIT: I think it should work now. I will prepare an update.

Member

But if someone has both a file_extension and a DB extension (storage), it is still better to force users to specify exactly which extension to use.

Member

bogdandrutu commented Jun 6, 2022

> BTW, we had #4117 (review) a few months ago on the additional metrics that I proposed to include with the persistent queue. Do you think it's the right time to get back to it, or should we rather give the collector metrics more time to stabilize?

I think we should definitely do this in a separate PR. Second, I think (need to confirm, but this is my first thought) we should include the data type as a separate label instead of adding it as part of the full name. Because of that, let's have a separate PR.

@pmm-sumo pmm-sumo force-pushed the enable-persistent-queue-in-build branch from 6dd9c1a to b73e90c Compare June 7, 2022 17:36
@pmm-sumo pmm-sumo force-pushed the enable-persistent-queue-in-build branch from b73e90c to 9800347 Compare June 8, 2022 15:58
Comment on lines 119 to 122
if !qCfg.PersistentStorageEnabled {
	qrs.queue = internal.NewBoundedMemoryQueue(qrs.cfg.QueueSize, func(item interface{}) {})
}
// The Persistent Queue is initialized separately as it needs extra information about the component
Member

Let's initialize both in the same place?
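
One way to read this suggestion, sketched under heavy assumptions: a single constructor picks the queue type and returns a start function that finishes any setup that has to wait for component information. The `queue` interface, both queue types, `startFunc`, and `newQueue` below are hypothetical simplifications, not the exporterhelper API; the storage client is reduced to a plain string.

```go
package main

import "fmt"

// queue is a hypothetical minimal interface; the real queues expose more.
type queue interface {
	Kind() string
}

type memoryQueue struct{ capacity int }

func (memoryQueue) Kind() string { return "memory" }

type persistentQueue struct {
	name     string
	capacity int
	started  bool
}

func (*persistentQueue) Kind() string { return "persistent" }

// startFunc completes initialization once the storage client is known.
type startFunc func(storageClient string) error

// newQueue creates either queue type in one place. The persistent queue
// defers the storage-dependent part of its setup to the returned startFunc.
func newQueue(persistent bool, name string, capacity int) (queue, startFunc) {
	if !persistent {
		// Nothing left to do at start time for the in-memory queue.
		return memoryQueue{capacity: capacity}, func(string) error { return nil }
	}
	pq := &persistentQueue{name: name, capacity: capacity}
	return pq, func(storageClient string) error {
		if storageClient == "" {
			return fmt.Errorf("queue %q: no storage client provided", name)
		}
		pq.started = true
		return nil
	}
}

func main() {
	q, start := newQueue(true, "otlp-traces", 5000)
	fmt.Println(q.Kind(), start("file_storage"))
}
```

The trade-off is that callers always receive a start function, even when it is a no-op, in exchange for a single branch that decides the queue type.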

@pmm-sumo pmm-sumo force-pushed the enable-persistent-queue-in-build branch from 9800347 to ce5871e Compare June 9, 2022 13:16
@pmm-sumo pmm-sumo marked this pull request as draft June 9, 2022 15:03
@pmm-sumo pmm-sumo force-pushed the enable-persistent-queue-in-build branch 3 times, most recently from 4ae150b to 9491cdd Compare June 13, 2022 13:18
@pmm-sumo pmm-sumo force-pushed the enable-persistent-queue-in-build branch from 9491cdd to 00a397b Compare June 15, 2022 18:51
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Contributor Author

pmm-sumo commented Jul 7, 2022

@swiatekm-sumo FYI

@tigrannajaryan
Member

@pmm-sumo do you plan to continue working on this?

@swiatekm
Contributor

@tigrannajaryan I'm planning to pick this up from @pmm-sumo sometime this week.

@tigrannajaryan
Member

@swiatekm-sumo sounds good, thank you.

@swiatekm
Contributor

All right, from what I can see @pmm-sumo addressed all the review comments from @bogdandrutu and we're mostly missing some tests to make codecov happy. I'll add those in my own branch, then open a new PR referencing this one. Is that OK, @bogdandrutu @tigrannajaryan?

-- `persistent_storage_enabled` (default = false): When set, enables persistence via a file storage extension
-(note, `enable_unstable` build tag needs to be enabled first, see below for more details)
+- `storage` (default = none): When set, enables persistence and uses the component specified as a storage extension for persistent queue
+- `persistent_storage_enabled` (default = false): When set, enables persistence using the only available storage extension (fails when no storage extension is enabled or more than one storage extension is available)
Member

Is this setting necessary? If the storage is unspecified the persistent queue is disabled. Isn't that enough for control? Or do we want to have this so that we can enable the storage without knowing the name of the extension?

Alternate suggestion: can we delete persistent_storage_enabled setting and make * (or some other character sequence) a special value for storage setting which finds the only available storage extension?

Contributor Author

@pmm-sumo Jul 16, 2022

storage is a new setting that allows explicitly selecting which storage extension to use. However, when only one storage extension is specified in the config, we want to be able to attach it automatically, so specifying storage: ... is not needed. In that case, it would still be helpful to control whether a file-backed or memory-backed queue is used; hence persistent_storage_enabled. Suggestions on how to make this more elegant are more than welcome, but I think we want to retain this kind of functionality.

package exporterhelper // import "go.opentelemetry.io/collector/exporter/exporterhelper"

// TODO: rename the file to queued_retry_sender.go after merging the PR
Member

Why this TODO? Why can't we rename before merging the PR?

StorageID *config.ComponentID `mapstructure:"storage"`
// StorageEnabled describes whether persistence via a file storage extension is enabled using the single
// default storage extension.
StorageEnabled bool `mapstructure:"persistent_storage_enabled"`
Member

What happens if storage is specified and persistent_storage_enabled is also true? Which setting wins?

Contributor Author

I think the trickier example is persistent_storage_enabled: false with storage referring to some existing storage extension. Based on the name, I would assume persistent_storage_enabled should take precedence.

Contributor

The way I understand this logic is that persistent_storage_enabled turns the feature on. Then there's an implicit default for storage: if we only have one storage extension, we use that; otherwise we error. If storage is explicitly set, we use its value. I believe this is what the code does, and it makes the most intuitive sense to me.
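
The resolution rule described in this comment can be sketched as follows. This is an illustration only: `queueSettings` and `resolveStorage` are hypothetical names, and extension IDs are plain strings rather than config.ComponentID. It treats an explicit storage setting as enabling persistence on its own, which matches the `StorageEnabled || StorageID != nil` condition in the code under review but not the precedence suggested in the previous comment, where persistent_storage_enabled: false would win.

```go
package main

import (
	"errors"
	"fmt"
)

// queueSettings is a hypothetical, simplified stand-in for QueueSettings.
type queueSettings struct {
	StorageEnabled bool    // persistent_storage_enabled
	StorageID      *string // storage; nil when not set
}

// resolveStorage returns the storage extension to use and whether the queue
// should be persistent. available lists the configured storage extensions.
func resolveStorage(cfg queueSettings, available []string) (string, bool, error) {
	if cfg.StorageID != nil {
		// An explicit storage setting is used as-is and implies persistence.
		for _, name := range available {
			if name == *cfg.StorageID {
				return name, true, nil
			}
		}
		return "", false, fmt.Errorf("storage extension %q not found", *cfg.StorageID)
	}
	if !cfg.StorageEnabled {
		return "", false, nil // in-memory queue
	}
	// Persistence is enabled without an explicit extension: use the only one.
	switch len(available) {
	case 0:
		return "", false, errors.New("persistence enabled but no storage extension is configured")
	case 1:
		return available[0], true, nil
	default:
		return "", false, errors.New("multiple storage extensions found; set storage explicitly")
	}
}

func main() {
	explicit := "file_storage/queue"
	fmt.Println(resolveStorage(queueSettings{StorageID: &explicit}, []string{"file_storage", "file_storage/queue"}))
	fmt.Println(resolveStorage(queueSettings{StorageEnabled: true}, []string{"file_storage"}))
	fmt.Println(resolveStorage(queueSettings{StorageEnabled: true}, nil))
}
```

Writing the rule out this way also makes the ambiguous case visible: nothing in the sketch gives persistent_storage_enabled: false a way to override an explicit storage value.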

if qCfg.StorageEnabled || qCfg.StorageID != nil {
	qrs.queue, qrs.queueStartFunc = internal.NewPersistentQueue(qrs.fullName, qrs.signal, qrs.cfg.QueueSize, qrs.logger, qrs.requestUnmarshaler)
	// TODO: following can be further exposed as a config param rather than relying on a type of queue
	qrs.requeuingEnabled = true
Member

Looks like this is new functionality that didn't exist before. Can we remove this from this PR and add it in a subsequent PR? This PR should be only about enabling Persistent Queue by default.

Contributor

It used to be enabled here:

so this actually preserves existing behaviour.

@@ -56,7 +56,7 @@ func TestQueuedRetry_DropOnPermanentError(t *testing.T) {
 	})

 	ocs.run(func() {
-		// This is asynchronous so it should just enqueue, no errors expected.
+		// This is asynchronous so it should just enqueue, no errors expectedError.
Member

Is this change and other similar changes below intentional?

Contributor

Looks like an overzealous replace, I'll fix it.

@tigrannajaryan
Member

> All right, from what I can see @pmm-sumo addressed all the review comments from @bogdandrutu and we're mostly missing some tests to make codecov happy. I'll add those in my own branch, then open a new PR referencing this one. Is that OK, @bogdandrutu @tigrannajaryan?

@swiatekm-sumo sounds good. I also left some comments above.

@tigrannajaryan
Member

Is this entirely superseded by #5711?
If yes, then close this PR.

@pmm-sumo pmm-sumo closed this Jul 20, 2022