-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporterhelper] Proposal: enable Persistent Queue feature by default #5457
[exporterhelper] Proposal: enable Persistent Queue feature by default #5457
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5457 +/- ##
==========================================
- Coverage 90.94% 90.59% -0.35%
==========================================
Files 191 196 +5
Lines 11375 11877 +502
==========================================
+ Hits 10345 10760 +415
- Misses 807 873 +66
- Partials 223 244 +21
Continue to review full report at Codecov.
|
5c6d9b7
to
7e7b019
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely we should not drop the in-memory version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read more about the code, and my previous concern is NOT valid. But queuedRetrySender
implementation is different, metrics exposed are different, etc. This has more implications, so please make the change by deleting the "_experimental" file and bring to "_inmemory" file changes needed. So we can see exactly what changes.
7e7b019
to
6dd9c1a
Compare
Sure thing @bogdandrutu. I updated the PR with requested change BTW, we had a discussion few months ago on the additional metrics that I proposed to include with persistent queue. Do you think it's the right time to get back to it or we should rather give more time the collector metrics to stabilize? |
func (qrs *queuedRetrySender) fullName() string { | ||
if qrs.signal == "" { | ||
return qrs.id.String() | ||
} | ||
return fmt.Sprintf("%s-%s", qrs.id.String(), qrs.signal) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to use this new "name" for the metric label/attribute, does not seem to be necessary to enable the persistent queue, can it be a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each signal must have a different name for the persistent queue as otherwise we would be writing multiple signal types to the same underlying storage. In any case, I changed the code a bit so for in-memory queue the name is not depending on the signal type (and stays as it was before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is also used for the "queue_size" label, that's why I don't like this change because you are changing the metrics that we emit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you need to rename the func to "buildStorageName" and call it only when calling the constructor of the NewPersistenQueue
the one in the struct fullName
should be kept the same since we use that in the metrics.
@@ -54,7 +53,8 @@ func NewDefaultQueueSettings() QueueSettings { | |||
// This is a pretty decent value for production. | |||
// User should calculate this from the perspective of how many seconds to buffer in case of a backend outage, | |||
// multiply that by the number of requests per seconds. | |||
QueueSize: 5000, | |||
QueueSize: 5000, | |||
PersistentStorageEnabled: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to actually ask users for a "extension" name here, instead of enforcing to have only one storage extension and have a "bool" here? Something like what Auth does, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configauth/configauth.go#L32
type QueueSettings struct {
// Enabled indicates whether to not enqueue batches before sending to the consumerSender.
Enabled bool `mapstructure:"enabled"`
// NumConsumers is the number of consumers from the queue.
NumConsumers int `mapstructure:"num_consumers"`
// QueueSize is the maximum number of batches allowed in queue at a given time.
QueueSize int `mapstructure:"queue_size"`
// StorageID if not empty, uses the component specified as a storage extension.... bla bla bla
StorageID config.ComponentID `mapstructure:"storage"`
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall there was a limitation of the file_storage which resulted in a single instance available only (that was due to the original use-case which was related to filelog receiver and did not really require several extensions). I think it's no longer relevant, let me check this and I will update the code
EDIT: I think it should work now, I will prepare an update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if someone has a file_extension and a DB extension (storage) both, still better to force users to specify exactly what extension to use.
I think we should definitely do this in a separate PR, second I think (need to confirm, but this is my first thought) we should include data type as a separate label instead of adding it as part of full name. Because of that, let's have a separate PR. |
6dd9c1a
to
b73e90c
Compare
b73e90c
to
9800347
Compare
if !qCfg.PersistentStorageEnabled { | ||
qrs.queue = internal.NewBoundedMemoryQueue(qrs.cfg.QueueSize, func(item interface{}) {}) | ||
} | ||
// The Persistent Queue is initialized separately as it needs extra information about the component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's initialize both in the same place?
9800347
to
ce5871e
Compare
4ae150b
to
9491cdd
Compare
9491cdd
to
00a397b
Compare
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
@swiatekm-sumo FYI |
@pmm-sumo do you plan to continue working on this? |
@tigrannajaryan I'm planning to pick this up from @pmm-sumo sometime this week. |
@swiatekm-sumo sounds good, thank you. |
Allright, from what I can see @pmm-sumo addressed all the review comments from @bogdandrutu and we're mostly missing some tests to make codecov happy. I'll add those in my own branch, then open a new PR referencing this one. Is that ok @bogdandrutu @tigrannajaryan ? |
- `persistent_storage_enabled` (default = false): When set, enables persistence via a file storage extension | ||
(note, `enable_unstable` build tag needs to be enabled first, see below for more details) | ||
- `storage` (default = none): When set, enables persistence and uses the component specified as a storage extension for persistent queue | ||
- `persistent_storage_enabled` (default = false): When set, enables persistence using the only available storage extension (fails when no storage extension is enabled or more than one storage extension is available) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this setting necessary? If the storage
is unspecified the persistent queue is disabled. Isn't that enough for control? Or do we want to have this so that we can enable the storage without knowing the name of the extension?
Alternate suggestion: can we delete persistent_storage_enabled
setting and make *
(or some other character sequence) a special value for storage
setting which finds the only available storage extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage
is a new setting which allows to explicitly select which storage extension to use. However, when only one storage extension is specified in the config, we want to be able to automatically attach it and having storage: ...
specified is not needed. In such case, it would be still helpful to have control over selecting if file-backed or memory-backed queue is used. Hence persistent_storage_enabled
. I think any suggestions on how to make it more elegant are more than welcome but I think we want to retain such kind of functionality
package exporterhelper // import "go.opentelemetry.io/collector/exporter/exporterhelper" | ||
|
||
// TODO: rename the file to queued_retry_sender.go after merging the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this TODO? Why can't we rename before merging the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tigrannajaryan this is due to this suggestion
StorageID *config.ComponentID `mapstructure:"storage"` | ||
// StorageEnabled describes whether persistence via a file storage extension is enabled using the single | ||
// default storage extension. | ||
StorageEnabled bool `mapstructure:"persistent_storage_enabled"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens is storage
is specified and persistent_storage_enabled
is also true? Which setting wins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the more trickier example is persistent_storage_enabled: false
and storage
referring some existing storage extension. Basing on the name, I would assume persistent_storage_enabled
should have precedence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I understand this logic, is that persistent_storage_enabled
turns the feature on. Then there's a certain implicit default for storage
- if we only have one storage extension, then we use that, otherwise we error. If the storage is explicitly set, we use its value. I believe this is what the code does, and it makes the most intuitive sense to me.
if qCfg.StorageEnabled || qCfg.StorageID != nil { | ||
qrs.queue, qrs.queueStartFunc = internal.NewPersistentQueue(qrs.fullName, qrs.signal, qrs.cfg.QueueSize, qrs.logger, qrs.requestUnmarshaler) | ||
// TODO: following can be further exposed as a config param rather than relying on a type of queue | ||
qrs.requeuingEnabled = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is a new functionality that didn't exist before. Can we remove this from this PR and add in a subsequent PR? This PR should be only about enabling Persistent Queue by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It used to be enabled here:
qrs.requeuingEnabled = true |
@@ -56,7 +56,7 @@ func TestQueuedRetry_DropOnPermanentError(t *testing.T) { | |||
}) | |||
|
|||
ocs.run(func() { | |||
// This is asynchronous so it should just enqueue, no errors expected. | |||
// This is asynchronous so it should just enqueue, no errors expectedError. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change and other similar changes below intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an overzealous replace, I'll fix it.
@swiatekm-sumo sounds good. I also left some comments above. |
Is this entirely superseded by #5711 |
Description:
Persistent Queue is a feature that currently requires "enable_unstable" tag provided to the the build to enable it. We have been using it in our distro for several months already and it looks stable. It seems that others are using this capability as well (example)
I would like to propose enabling it by default, which also simplifies the build process a little bit.