[BUG] ByteBuf.release() io.netty.util.ResourceLeakDetector - Cosmos SDK 4.0.1-beta.1 #10101

Closed
tikooamit opened this issue Apr 11, 2020 · 2 comments
Labels: Client, cosmos:v4-item, Cosmos, customer-reported, pillar-reliability

Comments

@tikooamit

Hey Guys,
We are using Cosmos SDK 4.0.1-beta.1 and sporadically receive the exception shown below. The application in which we observe it uses Change Feed.
Connection policy (from the client initialization log):

{
  "thread": "main",
  "level": "INFO",
  "loggerName": "com.azure.cosmos.implementation.RxDocumentClientImpl",
  "message": "Initializing DocumentClient with serviceEndpoint [https://xxx:443], connectionPolicy [ConnectionPolicy{requestTimeout=PT1M, mediaRequestTimeout=PT5M, connectionMode=DIRECT, maxPoolSize=1000, idleConnectionTimeout=PT1M, userAgentSuffix='', retryOptions=RetryOptions{maxRetryAttemptsOnThrottledRequests=9, maxRetryWaitTime=PT30S}, enableEndpointDiscovery=true, preferredLocations=null, usingMultipleWriteLocations=true, inetSocketProxyAddress=null}], consistencyLevel [Eventual], directModeProtocol [Tcp]",
  "contextMap": {},
  "timestamp": "2020-04-08 20:42:04.550"
}

Exception:

{
  "thread": "cosmos-rntbd-nio-2-1",
  "level": "ERROR",
  "loggerName": "io.netty.util.ResourceLeakDetector",
  "message": "LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.\nRecent access records: \nCreated at:\n\tio.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:332)\n\tio.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:168)\n\tio.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:159)\n\tio.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:117)\n\tio.netty.handler.ssl.SslHandler.allocate(SslHandler.java:2136)\n\tio.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1319)\n\tio.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)\n\tio.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)\n\tio.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)\n\tio.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)\n\tio.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)\n\tio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)\n\tio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)\n\tio.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)\n\tio.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)\n\tio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)\n\tio.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tio.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)\n\tio.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)\n\tio.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)\n\tio.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)\n\tio.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tio.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\n\tio.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tio.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tjava.base/java.lang.Thread.run(Unknown Source)",
  "contextMap": {},
  "timestamp": "2020-04-08 20:47:50.751"
}
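
For readers unfamiliar with the message above: Netty's ByteBuf is reference-counted. A new buffer starts with a count of 1, release() decrements it, and the leak detector complains when a buffer is garbage-collected while its count is still above zero. The sketch below is a toy model of that contract only. It does not use Netty, the class and method names (RefCountDemo, Buffer, consumeAndRelease, consumeAndForget) are hypothetical, and it says nothing about where the SDK actually dropped the buffer.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountDemo {

    // Toy stand-in for a reference-counted buffer. NOT Netty's ByteBuf:
    // the name and fields are hypothetical; only the contract is mirrored.
    static final class Buffer {
        private final AtomicInteger refCnt = new AtomicInteger(1); // a new buffer starts at 1
        private final byte[] data;

        Buffer(int capacity) { this.data = new byte[capacity]; }

        int refCnt() { return refCnt.get(); }

        int capacity() { return data.length; }

        // Take an additional reference (for example before handing the buffer to another owner).
        Buffer retain() { update(+1); return this; }

        // Drop one reference; returns true when this call brought the count to zero.
        boolean release() { return update(-1) == 0; }

        private int update(int delta) {
            while (true) {
                int current = refCnt.get();
                if (current == 0) {
                    throw new IllegalStateException("buffer already released");
                }
                if (refCnt.compareAndSet(current, current + delta)) {
                    return current + delta;
                }
            }
        }
    }

    // Correct pattern: the last consumer releases in a finally block.
    static int consumeAndRelease(Buffer buf) {
        try {
            return buf.capacity();
        } finally {
            buf.release();
        }
    }

    // Leaky pattern: the buffer is read but never released, so its count stays at 1.
    static int consumeAndForget(Buffer buf) {
        return buf.capacity();
    }

    public static void main(String[] args) {
        Buffer ok = new Buffer(16);
        consumeAndRelease(ok);
        System.out.println("released path refCnt = " + ok.refCnt());

        Buffer leaked = new Buffer(16);
        consumeAndForget(leaked);
        System.out.println("leaky path refCnt = " + leaked.refCnt());
    }
}
```

In the leaky path the count never reaches zero, which is the condition the detector reports once the object becomes unreachable. In the trace above the buffer was allocated inside SslHandler.unwrap, so the missing release would be somewhere downstream of TLS decoding rather than in application code.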

The application was under continuous load and, luckily, not in PROD. We have been experiencing the same exception with older versions as well.
I found a similar issue reported before (see this), but I think the use case there was not Change Feed.

Can you please help us figure out what the issue might be?

Thanks.

@joshfree added the Client, Cosmos, cosmos:v4-item, and customer-reported labels Apr 14, 2020
@joshfree
Member

Thanks for reporting this Cosmos 4.0.1-beta.1 issue, @tikooamit. @kushagraThapar, could you please follow up with @tikooamit?

@joshfree added the pillar-reliability label Apr 14, 2020
@kushagraThapar
Member

kushagraThapar commented Apr 21, 2020

@tikooamit - this has been fixed and released in version 4.0.1-beta.2.
Closing this issue; feel free to reopen if you see this again.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023