Concurrency issues in 1.3.0+ #315
This seems to be due to my use of multiple regions. Adding a second describe_volumes call with no filter inside wait_until_volumes_ready returns all of the volumes for the wrong region. I can work around it by always recreating the EC2 client (there is probably a better way to do it than just blindly re-instantiating the object each time).
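A minimal sketch of that kind of workaround, assuming the aws-sdk Ruby client (the method and variable names here are hypothetical, not kitchen-ec2's actual code): build the client with an explicit region each time, and filter describe_volumes down to the instance's own volumes instead of listing everything.

```ruby
require "aws-sdk-ec2" # aws-sdk v3; "aws-sdk" for v2 also exposes Aws::EC2::Client

# Hypothetical helper: always build the client for the region this instance
# lives in, rather than reusing a client another thread may have created
# for a different region.
def ec2_client_for(region)
  Aws::EC2::Client.new(region: region)
end

# Hypothetical wait loop: only look at volumes attached to this instance,
# so volumes from other regions or instances cannot stall the wait.
def wait_until_volumes_ready(instance_id, region, timeout: 300)
  client = ec2_client_for(region)
  deadline = Time.now + timeout
  loop do
    volumes = client.describe_volumes(
      filters: [{ name: "attachment.instance-id", values: [instance_id] }]
    ).volumes
    return true if volumes.any? && volumes.all? { |v| v.state == "in-use" }
    raise "Timed out waiting for volumes on #{instance_id}" if Time.now > deadline
    sleep 5
  end
end
```

Whether the client really needs to be re-instantiated on every call, or could instead be created once per region and simply not shared across regions, depends on how the driver hands clients to its worker threads.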
I have seen this behavior as well. It seems to occur when I am trying to test an instance created from an instance-store-backed AMI. If I use an EBS-backed AMI there are no problems.
This should also be fixed by #343, at least for the root issue being seen here; there are other concurrency demons, but they are already tracked elsewhere.
I'm seeing concurrency issues in 1.3.0+ that do not occur with 1.2.0. They seem to be related to the "wait_until_volumes_ready" function (or at least that function is where things stall).
The instances get created, and then the process hangs at "waiting for volumes to be ready". This continues until the timeout, at which point the instances are deleted. Checking the EC2 instance manually via the console, etc., shows it is alive and ready.
For example, if I start with a concurrency of 5, 2 may work and 3 may hang waiting for volumes (I've gone down to a concurrency of 2 and the problem remains). This seems to happen only on the initial set of EC2 instances created. For example, say I have 10 platform/suite combinations with a concurrency of 5: some number of the first 5 will fail, but the remaining 5 will all start and run successfully. I'm guessing that whatever race condition is happening is avoided by the slight delays introduced by terminating the existing "failed" instances before starting up the next ones.
Is anyone else seeing this?