test failures in version 1.1.0 #170
Check here: #163
I'm running Antergos Linux.
This is odd, and not something I've seen previously. Sorry you're experiencing this! Obviously, based on the stack trace, it looks as though Netty (which is quite new to docker-java and therefore Testcontainers) is being constrained by a lack of available file descriptors. Quite why that is, I'm not sure yet... Some possibilities that we could try to eliminate:
Given that two of you are experiencing this on different environments, it looks like there's clearly something wrong in the libs, though - and I'm confused why this hasn't arisen before! Just to start eliminating (3), please could you check the output of the following (on OS X):
or on Linux:
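The exact commands being asked for here aren't shown above. As a rough JVM-side equivalent (an assumption on my part, not the original instructions), the HotSpot-specific `UnixOperatingSystemMXBean` reports both the file descriptor limit and the current open count:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FileDescriptorLimits {
    public static void main(String[] args) {
        Object os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
            // Limit on file descriptors for this process, and how many are open right now
            System.out.println("Max file descriptors:  " + unixOs.getMaxFileDescriptorCount());
            System.out.println("Open file descriptors: " + unixOs.getOpenFileDescriptorCount());
        } else {
            System.out.println("Not a Unix-style JVM; check the limit via OS tools instead.");
        }
    }
}
```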
Just one more thing to check (though I really hope this isn't relevant):
I get:
For me (OS X):
Docker info (Mac):
Also for Mac:
For Linux:
Thanks, I think we can probably disregard (3) for now. I'm wondering if (1) is correct and we're just seeing the resource limit being hit on your machines earlier than elsewhere. By the looks of it we should be closing down docker client instances after test method/class execution but aren't, which could well be the cause of a resource leak. I'll patch with a simple
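The patch itself isn't shown above. A minimal sketch of that kind of cleanup (my own illustration using docker-java's `DockerClientBuilder` and JUnit 4, not the actual Testcontainers change) would be to close each client once the test class is done with it:

```java
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.core.DockerClientBuilder;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class DockerClientCleanupExample {

    private static DockerClient client;

    @BeforeClass
    public static void createClient() {
        client = DockerClientBuilder.getInstance().build();
    }

    @AfterClass
    public static void closeClient() throws Exception {
        if (client != null) {
            // DockerClient is Closeable; closing it releases the underlying
            // transport and the file descriptors it holds.
            client.close();
        }
    }
}
```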
If you wouldn't mind, could you try this branch some time and see if it makes a difference?
Almost the same failures here on that branch:
and the same stack trace:
😢 Then I think the problem may not be a leak, but a surge in the number of clients being used at startup. Right now each container has its own DockerClient instance; this used to be required because the Jersey implementation of docker-java was not thread-safe. Now that we're using the netty docker-java interface, it is apparently thread-safe, so we could go back to a single shared client. That would be less resource-heavy. I'll have a go at this tomorrow - sorry for keeping you waiting!
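A hedged sketch of that shared-client idea (my illustration, not the eventual Testcontainers implementation; it assumes the netty transport really is thread-safe, as noted above):

```java
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.core.DockerClientBuilder;

public final class SharedDockerClient {

    private static volatile DockerClient instance;

    private SharedDockerClient() {
    }

    // Lazily create one DockerClient and hand the same instance to every container.
    public static DockerClient get() {
        if (instance == null) {
            synchronized (SharedDockerClient.class) {
                if (instance == null) {
                    instance = DockerClientBuilder.getInstance().build();
                }
            }
        }
        return instance;
    }
}
```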
On the contrary, thanks for the quick response.
You're doing a great job.
Just to follow up - I've hit a problem adapting Testcontainers to use a single instance of docker client (which would otherwise solve the resource leak): docker-java/docker-java#632
I've hit this as well. It's
I've hacked at this a bit. Note that I'm not even getting to the point of running tests. This is just the initial instantiation of
Yeah. Basically, the problem/solution is a bit nested:
There's something very odd going on, so I'm going to try to narrow down docker-java/docker-java#632 to something that helps identify exactly what the root cause is. I'm partly hopeful that I'll find I've done something silly in the instantiation of DockerClient! Hopefully we'll get there soon, but it's a bit more difficult than I'd have liked. Sorry to anyone afflicted by this. If anybody has any thoughts/solutions/data points to add, please feel free!
I'm seeing an initial explosion of ~8000 file descriptors on just the initial creation of
@valdisrigdon would you mind sending the list of file descriptors (either email to [email protected] or a gist would be fine)? This might yield more clues...
This should help.
Yikes - that definitely doesn't look healthy. I was looking at slow growth of KQUEUE file descriptors but this is rather different. This is what I'm getting right now (mid test on testcontainers core module):
That's all - just 1 TCP connection to the docker daemon! This is running docker-machine (0.8.0-rc1) on OS X. Working out how to reproduce this is going to be half the challenge, I suspect...
@rnorth I can (obviously) easily reproduce this. I can test out a branch, suggested changes, or run with some extra debug options if you need.
Thanks @valdisrigdon - I'm afraid I'll probably have to take you up on that.
I've put together a small test project to do some diagnostics in/around the area that seems to be problematic: https://github.com/rnorth/testcontainers-fd-debug The idea is to narrow down or eliminate exactly where the leak is occurring. Would you mind checking it out, having a look over it, running it, and then sending the output to me?
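The contents of that repository aren't reproduced here; as a rough sketch of the kind of loop such a diagnostic might run (my own assumption, not the actual project), one could repeatedly create and close a client while watching the process's open file descriptor count:

```java
import java.lang.management.ManagementFactory;
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.core.DockerClientBuilder;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdLeakProbe {
    public static void main(String[] args) throws Exception {
        // HotSpot-specific interface; assumes a Unix-like JVM
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        for (int i = 0; i < 20; i++) {
            try (DockerClient client = DockerClientBuilder.getInstance().build()) {
                client.pingCmd().exec(); // force the client to actually open a connection
            }
            System.out.println("iteration " + i + ": open fds = " + os.getOpenFileDescriptorCount());
        }
    }
}
```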
@rnorth Working on running it. Running it through the Maven project worked fine -- I didn't see the huge number of open sockets. When adapting the test to run through the IDE or a Gradle project, I'm still seeing the errors. And of course, since I've run out of file descriptors, it can't run
After converting
@valdisrigdon thanks. A conflict with another version of Netty on the classpath sounds like a plausible cause of odd behaviour, so it's worth looking at. However, I think @dbyron0 and @ihabsoliman were encountering this just when building testcontainers from source. Confirmation of this would be useful, though!
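One way to check for that kind of classpath conflict (my suggestion, not something prescribed in the thread) is to print where the JVM is actually loading Netty from, and which Netty artifacts it can see:

```java
public class NettyClasspathCheck {
    public static void main(String[] args) throws Exception {
        String[] classes = {
                "io.netty.channel.nio.NioEventLoopGroup",
                "io.netty.bootstrap.Bootstrap",
                "io.netty.util.Version"
        };
        // Which JAR each Netty class is loaded from - two different locations
        // would suggest a duplicate/conflicting Netty on the classpath.
        for (String name : classes) {
            Class<?> clazz = Class.forName(name);
            System.out.println(name + " -> "
                    + clazz.getProtectionDomain().getCodeSource().getLocation());
        }
        // Netty's own view of the artifact versions it finds on the classpath
        io.netty.util.Version.identify().forEach((artifact, version) ->
                System.out.println(artifact + " = " + version));
    }
}
```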
I've been using it just by including it from Maven, but the project I'm using it in
…tions) to resolve as a potential cause of #170
@ihabsoliman thanks. Re the commit I've just linked above: since I can't reproduce the issue, I've not been able to derive a test where it makes a difference. However, it's one of those things that I think ought to be done anyway.
Yup, I was building from source. Likely won't get to https://github.com/rnorth/testcontainers-fd-debug til tomorrow.
FWIW I don't get tons of file handles when running
So this may be something. I was able to use
I shaded
Ah, that's great to hear. Thanks for identifying that as a cause of the problem. I'm a little concerned that there's an underlying/alternative root cause
Shaded netty dependencies, pushed to branch. Testing of this would be much appreciated as always!
At 6f41fc5 on #170-fix-resource-leak I'm still seeing some failures. I apologize in advance as I've changed versions of some underlying docker stuff since I reported this, so it's not the best test. Here's my current docker info:
And the results:
Same stack trace as before...
Hi David,
Thanks for running this so quickly. It's useful to know that this is only
I think the next logical step is the shared DockerClient model rather than
Richard
I wish I could help more. I'm happy to try the next thing, and the thing after that, etc. -DB
Tests pass for me on this branch.
Hey guys,
2nd run
@ihabsoliman thanks - that looks like it might be a different problem, though: there are timeouts while pulling images. Testcontainers is less aggressive about retrying container startup in this release, and I think what's happened is that the startup retries were masking slow image pulls before. Now, container startup can more easily time out if it takes a while to pull an image. This is a bug, and I'll raise a separate ticket to address it.
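As a hedged illustration of the situation described (not an official Testcontainers workaround), the image can be pulled up front with docker-java so that a slow pull doesn't eat into the container startup timeout; the image name and tag below are placeholders:

```java
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.core.DockerClientBuilder;
import com.github.dockerjava.core.command.PullImageResultCallback;

public class PrePullExample {
    public static void main(String[] args) throws Exception {
        try (DockerClient client = DockerClientBuilder.getInstance().build()) {
            client.pullImageCmd("redis")              // placeholder image
                  .withTag("3.0.2")                   // placeholder tag
                  .exec(new PullImageResultCallback())
                  .awaitCompletion();                 // block until the pull has finished
        }
    }
}
```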
Shade io.netty dependencies into Testcontainers JAR
Improve cleanup of docker clients
I'm in the process of releasing v1.1.1 with the bug fixes discussed above. @dbyron0 I'm afraid this may still not work for you - until we've figured out a way to reproduce the issue you're seeing, it's going to be difficult.
Just gave the head of master (d445d75) a whirl. I still see some failures, but something's different. I see a (> 5 minute) hang here:
Up to that point, here are the failures:
Things got better after I ran
Same as before with a86827b, effectively 1.1.2. |
David,
Richard
As far as we've been able to tell the changes in 1.1.4 should fix this. Will close. Thanks to all for your patience and efforts in diagnosis. |
At commit 248befb, I'm seeing test failures on my local box. There's enough output that I hesitate to paste it all. I'm happy to add more to help figure out what's going on.
with a stack trace that shows up a bunch:
I'm running OS X 10.10.5, using docker toolbox 1.11.2.