Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcplogreceiver corrupts messages during decoding. #24980

Closed
adam-mateen opened this issue Aug 7, 2023 · 9 comments
Closed

tcplogreceiver corrupts messages during decoding. #24980

adam-mateen opened this issue Aug 7, 2023 · 9 comments
Labels
bug Something isn't working help wanted Extra attention is needed receiver/tcplog

Comments

@adam-mateen
Copy link

Component(s)

receiver/tcplog

What happened?

Description

Customer has multiple processes (clients) connecting via TCP to our agent.
They noticed the final "Log Events" were corrupted (i.e. random extra characters and/or replaced characters)

The root cause is this line of code:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/input/tcp/tcp.go#L292

func (t *Input) handleMessage(ctx context.Context, conn net.Conn, log []byte) {
	decoded, err := t.encoding.Decode(log)

There are separate goroutines for each accepted TCP connection.
And each one can call this in parallel.
However, the Encoding struct uses a single shared buffer:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/helper/encoding.go#L30-L52

decodeBuffer: make([]byte, 1<<12),

This is not the case in the open source version of Stanza.
Latest Stanza created a new buffer in each call to Decode():
https://github.com/observIQ/stanza/blob/main/operator/helper/encoding.go#L44-L62

So I recommend sync with latest Stanza.

Steps to Reproduce

  1. Start many client programs that write different "LogEvents" messages. Vary the size and contents.
  2. Monitor the output of the collector/agent regardless of which exporter is used it should reproduce.

Expected Result

  1. The message sent by client is exactly what gets exported.

Actual Result

  1. corruption

Collector version

0.77.0

Environment information

Environment

OS: AL2
Compiler(if manually compiled): go 1.20

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@adam-mateen adam-mateen added bug Something isn't working needs triage New item requiring triage labels Aug 7, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Aug 7, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@adam-mateen
Copy link
Author

This looks like the offending commit - #16027

@djaglowski
Copy link
Member

Thanks for reporting this @adam-mateen. Would you be willing to contribute your fix?

@jefchien
Copy link
Contributor

jefchien commented Aug 8, 2023

@djaglowski I can do it. Please assign the issue to me.

@djaglowski
Copy link
Member

Thanks @jefchien!

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 10, 2023
Copy link
Contributor

github-actions bot commented Dec 9, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 9, 2023
@djaglowski djaglowski reopened this Dec 11, 2023
@djaglowski djaglowski added the help wanted Extra attention is needed label Dec 11, 2023
@jefchien
Copy link
Contributor

@djaglowski The PR above should've fixed this particular issue. Have there been other reports of message corruption since then?

@djaglowski
Copy link
Member

Thanks @jefchien. I saw the "time out" and didn't scroll up enough to recall that there was already a PR. I agree this solved unless further reports come in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed receiver/tcplog
Projects
None yet
Development

No branches or pull requests

4 participants