acceptance: make docker more resilient to timeout in ContainerStart
Docker sometimes never responds to us, and we usually don't have a cancellation
on the context (which wouldn't help anyway; it would just fail the test
right there). Instead, retry the call a few times.

The problem looks similar to

golang/go#16060
golang/go#5103

Another possibility mentioned in usergroups is that some file descriptor limit
is hit. Since I've never seen this locally, perhaps that's the case on our
agent machines. Unfortunately, those are hard to SSH into.

This may not be a good idea (after all, perhaps `Start()` succeeded), and we'd
have to do something similar for `ContainerWait`. But at least it should
give us an additional data point: do the retries also just block? Is the
container actually started when we retry?
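
For reference, a similar wrapper for `ContainerWait` might look like the sketch below. This is purely illustrative and not part of this commit; it assumes the older client signature `ContainerWait(ctx, containerID) (int64, error)` (the exact signature depends on the vendored Docker client), and a fixed per-attempt timeout may be a poor fit for a call that legitimately blocks until the container exits.

func (cli resilientDockerClient) ContainerWait(
    clientCtx context.Context, id string,
) (int64, error) {
    for {
        code, err := func() (int64, error) {
            ctx, cancel := context.WithTimeout(clientCtx, 20*time.Second)
            defer cancel()

            return cli.APIClient.ContainerWait(ctx, id)
        }()

        // As with ContainerStart, retry only when the per-attempt timeout
        // fired and the caller's context is still live.
        if err == context.DeadlineExceeded && clientCtx.Err() == nil {
            log.Warningf(clientCtx, "ContainerWait timed out, retrying")
            continue
        }
        return code, err
    }
}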
tbg committed Aug 23, 2017
1 parent b085a91 commit 50554e9
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions pkg/acceptance/cluster/docker.go
@@ -320,6 +320,27 @@ type resilientDockerClient struct {
    client.APIClient
}

func (cli resilientDockerClient) ContainerStart(
    clientCtx context.Context, id string, opts types.ContainerStartOptions,
) error {
    for {
        err := func() error {
            ctx, cancel := context.WithTimeout(clientCtx, 20*time.Second)
            defer cancel()

            return cli.APIClient.ContainerStart(ctx, id, opts)
        }()

        // Keep going if ContainerStart timed out, but client's context is not
        // expired.
        if err == context.DeadlineExceeded && clientCtx.Err() == nil {
            log.Warningf(clientCtx, "ContainerStart timed out, retrying")
            continue
        }
        return err
    }
}

func (cli resilientDockerClient) ContainerCreate(
    ctx context.Context,
    config *container.Config,
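
One detail of the new code worth calling out: the 20-second timeout lives in an anonymous function so that each attempt gets its own context and `defer cancel()` releases that attempt's timer as soon as the call returns, rather than all the cancels piling up until `ContainerStart` itself returns. A minimal, self-contained sketch of the same pattern in isolation (hypothetical helper and names, not from this repository):

package main

import (
    "context"
    "fmt"
    "time"
)

// retryWithTimeout retries op with a fresh per-attempt timeout for as long
// as the parent context is still live.
func retryWithTimeout(parent context.Context, perAttempt time.Duration, op func(context.Context) error) error {
    for {
        err := func() error {
            ctx, cancel := context.WithTimeout(parent, perAttempt)
            defer cancel() // release this attempt's timer as soon as op returns
            return op(ctx)
        }()
        // Retry only on a per-attempt timeout while the parent is still live.
        if err == context.DeadlineExceeded && parent.Err() == nil {
            continue
        }
        return err
    }
}

func main() {
    attempts := 0
    err := retryWithTimeout(context.Background(), 50*time.Millisecond, func(ctx context.Context) error {
        attempts++
        if attempts < 3 {
            <-ctx.Done() // simulate a hung call; unblocks when the attempt times out
            return ctx.Err()
        }
        return nil
    })
    fmt.Println(attempts, err) // prints: 3 <nil>
}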
