-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update routine for migration of jaeger remote sampling in version 0.61.0 #1116
Merged
pavolloffay
merged 5 commits into
open-telemetry:main
from
frzifus:remote_sampling_upgrade_routine
Oct 12, 2022
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
3fa287b
upgrade routine for jaeger remote sampling
frzifus c38db31
upgrade routine to fail if jaeger receiver uses remote sapmling
frzifus 0bbe115
improve error message and create event when upgrade fails
frzifus 557a3ff
always inform users about failed upgrades
frzifus 56c792a
follow recommendations
frzifus File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
receivers: | ||
jaeger: | ||
protocols: | ||
grpc: | ||
remote_sampling: | ||
strategy_file: "/etc/strategy.json" | ||
strategy_file_reload_interval: 10s |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
receivers: | ||
jaeger: | ||
protocols: | ||
grpc: |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package upgrade | ||
|
||
import ( | ||
"errors" | ||
"fmt" | ||
"strings" | ||
|
||
"github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1" | ||
"github.com/open-telemetry/opentelemetry-operator/pkg/collector/adapters" | ||
) | ||
|
||
func upgrade0_61_0(u VersionUpgrade, otelcol *v1alpha1.OpenTelemetryCollector) (*v1alpha1.OpenTelemetryCollector, error) { | ||
if len(otelcol.Spec.Config) == 0 { | ||
return otelcol, nil | ||
} | ||
|
||
otelCfg, err := adapters.ConfigFromString(otelcol.Spec.Config) | ||
if err != nil { | ||
return otelcol, fmt.Errorf("couldn't upgrade to v0.61.0, failed to parse configuration: %w", err) | ||
} | ||
|
||
// Search for removed Jaeger remote sampling settings. (https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/14163) | ||
receiversConfig, ok := otelCfg["receivers"].(map[any]any) | ||
if !ok { | ||
// In case there is no extensions config. | ||
return otelcol, nil | ||
} | ||
|
||
for key, rc := range receiversConfig { | ||
k, ok := key.(string) | ||
if !ok { | ||
continue | ||
} | ||
cfg, ok := rc.(map[any]any) | ||
// check if jaeger is configured | ||
if !ok || !strings.HasPrefix(k, "jaeger") { | ||
continue | ||
} | ||
|
||
// check if remote sampling settings exit | ||
if _, ok := cfg["remote_sampling"]; !ok { | ||
continue | ||
} | ||
|
||
const issueID = "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/14707" | ||
errStr := fmt.Sprintf( | ||
"jaegerremotesampling is no longer available as receiver configuration. "+ | ||
"Please use the extension instead with a different remote sampling port. See: %s", | ||
issueID, | ||
) | ||
u.Recorder.Event(otelcol, "Error", "Upgrade", errStr) | ||
return nil, errors.New(errStr) | ||
} | ||
return otelcol, nil | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package upgrade_test | ||
|
||
import ( | ||
"context" | ||
_ "embed" | ||
"testing" | ||
|
||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/client-go/tools/record" | ||
|
||
"github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1" | ||
"github.com/open-telemetry/opentelemetry-operator/internal/version" | ||
"github.com/open-telemetry/opentelemetry-operator/pkg/collector/upgrade" | ||
) | ||
|
||
var ( | ||
//go:embed testdata/v0_61_0-valid.yaml | ||
valid string | ||
//go:embed testdata/v0_61_0-invalid.yaml | ||
invalid string | ||
) | ||
|
||
func Test0_61_0Upgrade(t *testing.T) { | ||
|
||
collectorInstance := v1alpha1.OpenTelemetryCollector{ | ||
TypeMeta: metav1.TypeMeta{ | ||
Kind: "OpenTelemetryCollector", | ||
APIVersion: "v1alpha1", | ||
}, | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Name: "otel-my-instance", | ||
Namespace: "somewhere", | ||
}, | ||
Spec: v1alpha1.OpenTelemetryCollectorSpec{}, | ||
} | ||
|
||
tt := []struct { | ||
name string | ||
config string | ||
expectErr bool | ||
}{ | ||
{ | ||
name: "no remote sampling config", // valid | ||
config: valid, | ||
expectErr: false, | ||
}, | ||
{ | ||
name: "has remote sampling config", // invalid | ||
config: invalid, | ||
expectErr: true, | ||
}, | ||
} | ||
|
||
for _, tc := range tt { | ||
t.Run(tc.name, func(t *testing.T) { | ||
collectorInstance.Spec.Config = tc.config | ||
collectorInstance.Status.Version = "0.60.0" | ||
|
||
versionUpgrade := &upgrade.VersionUpgrade{ | ||
Log: logger, | ||
Version: version.Get(), | ||
Client: k8sClient, | ||
Recorder: record.NewFakeRecorder(upgrade.RecordBufferSize), | ||
} | ||
|
||
_, err := versionUpgrade.ManagedInstance(context.Background(), collectorInstance) | ||
if (err != nil) != tc.expectErr { | ||
t.Errorf("expect err: %t but got: %v", tc.expectErr, err) | ||
} | ||
}) | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that the message points to the GH issue, but the root cause is not obvious. I would suggest putting either here a sentence that the ports should be changed for receiver and extension or make it super clear in the GH issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second question:
This is the first time when we actually expect the upgrade to fail. How is this error propagated to the end-user? It would be good to have this in the operator logs but as well recoded as k8s evet.
Third question: What happens if the upgrade fails? The instance version in the status will not be upgraded/the image will not be upgraded? If the user fixes the CR by splitting the ports will the upgrade kick in again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i tried to make it clear by saying:
The jaeger remote sampling extension collides when converting old jaeger receiver settings. The problem can be solved by manually switching to another port, but this leads to potentially invalid configurations for reporters.
What would you recommend instead? Or what error message do you have in mind?Good point. All actions are simply ignored. Right now there is only a single log entry.
Your CR status will not change. If you fix this issue manually, its all fine. There is no other upgrade routine for
0.61.0
. Everytime you restart or deploy a newer operator version, all upgrade routines are triggered again.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the fix is done, will the operator upgrade the instance by bumping the OTELcol version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to double check that. But i assume when you modified your CR, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please report back if the upgrade was tested when Jaeger receiver was used with enabled remote sampling. We need to make sure the instance will be upgraded after the CR is fixed (e.g. extension mapped to a different port)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pavolloffay i have now been able to test it locally. after a configuration update, the version is not automatically raised. To achieve this, either the operator must be restarted or the collector configuration must be deleted and recreated.
Details...
Operator log message, once the config is detected.
procedure:
Should we simply add that to the issue description, add a auto update flag or do you have other ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good finding, we should definitely document this in the codebase that if the error is thrown during the upgrade then the version is never updated.
There are several ways how we can proceed:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of a failing update, we will now create a log entry and event. See: 557a3ff