high memory usage of avro.Parse/avro.ParseWithCache #510

Open
Anton-Kot opened this issue Mar 15, 2025 · 2 comments

Comments

@Anton-Kot

Hi, thanks for this package. I found it quite convenient and efficient compared with goavro for my use case. However, I had to add some code to work around a problem with the caching of the parsed schema. I ran a consumer for 30 seconds and took a memory profile, looking in particular at the alloc_space sample in pprof. I got:

  • 7+ GB for avro.ParseWithCache
    • 4+ GB for go.Unmarshall
    • 2+ GB for avro.parseType
  • 4+ GB for avro.Unmarshal

This does not look very efficient, and it seems that the cache is not used. When I stored the parsed schema in a package variable, allocations decreased to 300 MB. So I have temporarily worked around the problem, but I want to know why the cache is not used.

  • Maybe I am using the package incorrectly by calling avro.Parse() on each iteration of message processing?
  • Maybe there are problems with caching the schemas produced by goavro.Codec?
  • Maybe my measurements are not entirely correct and do not reflect real memory consumption?
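
On the last question: pprof's alloc_space sample counts every byte allocated since the program started, including memory that has already been garbage-collected, while inuse_space counts only what is still live. The sketch below illustrates that difference with the standard library only. It does not use the avro package, and `churn` and `sink` are hypothetical names chosen for the illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is a package variable so every buffer escapes to the heap.
var sink []byte

// churn allocates n short-lived buffers of the given size and returns
// the cumulative bytes allocated during the loop (what alloc_space
// tracks) and the heap still live after a GC (closer to inuse_space).
func churn(n, size int) (allocated, live uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		sink = make([]byte, size) // the previous buffer becomes garbage
	}
	sink = nil
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	allocated, live := churn(100, 1<<20)
	fmt.Printf("cumulative allocated: %d MiB\n", allocated>>20)
	fmt.Printf("live heap after GC:   %d KiB\n", live>>10)
}
```

A large alloc_space figure therefore points to allocation churn and GC pressure rather than to that much resident memory. Comparing it with the inuse_space sample of the same profile shows how much is actually retained.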

The original solution, simplified:

func (h *MessagesHandler) HandleMessage(ctx context.Context, message consumer.Message) error {
	decodedValue, err := avrolocal.Decode(message.Value()) // parse version from the header and reject it
	if err != nil {
		return err
	}

	avroCodec, err := h.schemaRegistryClient.GetSchema(decodedValue.SchemaID) // goavro.Codec
	if err != nil {
		return err
	}

	messageSchema, err := hamba.Parse(avroCodec.Schema())
	if err != nil {
		return err
	}

	var e entity.ThreadMessageEvent

	if err := hamba.Unmarshal(messageSchema, decodedValue.Content, &e); err != nil {
		return err
	}

	// call usecases

	return nil
}

My solution with the package var, simplified:

var messageSchema hamba.Schema

func (h *MessagesHandler) HandleMessage(ctx context.Context, message consumer.Message) error {
	...

	if messageSchema == nil {
		messageSchema, err = hamba.Parse(avroCodec.Schema())
		if err != nil {
			return err
		}
	}

	...
}
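
A single package variable holds only one schema and is not synchronized across goroutines. A possible generalization is a small cache keyed by the registry schema ID, sketched below under stated assumptions: `parsedSchema` is a hypothetical stand-in for the parsed schema type, the injected `parse` function stands in for the real parser call, and `schemaCache`, `newSchemaCache`, and `get` are names invented for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// parsedSchema is a hypothetical stand-in for the parsed schema type.
type parsedSchema struct{ raw string }

// schemaCache memoizes parsed schemas by schema ID so that each schema
// is parsed at most once, however many messages refer to it.
type schemaCache struct {
	mu      sync.RWMutex
	schemas map[int]*parsedSchema
	parse   func(raw string) (*parsedSchema, error) // assumed parser hook
}

func newSchemaCache(parse func(string) (*parsedSchema, error)) *schemaCache {
	return &schemaCache{schemas: make(map[int]*parsedSchema), parse: parse}
}

// get returns the cached schema for id, parsing raw only on a miss.
func (c *schemaCache) get(id int, raw string) (*parsedSchema, error) {
	c.mu.RLock()
	s, ok := c.schemas[id]
	c.mu.RUnlock()
	if ok {
		return s, nil
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Re-check: another goroutine may have filled the entry meanwhile.
	if s, ok := c.schemas[id]; ok {
		return s, nil
	}
	s, err := c.parse(raw)
	if err != nil {
		return nil, err
	}
	c.schemas[id] = s
	return s, nil
}

func main() {
	calls := 0
	cache := newSchemaCache(func(raw string) (*parsedSchema, error) {
		calls++ // only runs while the write lock is held
		return &parsedSchema{raw: raw}, nil
	})
	for i := 0; i < 1000; i++ {
		if _, err := cache.get(42, `{"type":"string"}`); err != nil {
			panic(err)
		}
	}
	fmt.Println("parse calls:", calls) // prints "parse calls: 1"
}
```

In the handler, the parse hook would wrap the hamba.Parse call and the key would be decodedValue.SchemaID, so a new schema version gets its own entry instead of reusing the first schema seen.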

Heap dumps: [three pprof screenshots attached to the original issue]

@nrwiersma
Member

By cache I assume you mean the SchemaCache, and it is very much used, see https://github.com/hamba/avro/blob/main/schema_parse.go#L119. Without it, it would not be possible to refer to named schemas.

@nrwiersma
Member

I do not see anything overly wrong with what you are doing. The amount of memory it uses is surprising, but I cannot say what is happening without a lot more information.
