Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Container won't start: Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces #318

Closed
lindycoder opened this issue Feb 15, 2019 · 14 comments

Comments

@lindycoder
Copy link

Hi we've been having issues starting this container for the last 2 to 6 days, it happens about 80% of the time:
Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces

We run this image (latest rabbitmq:management-alpine) alone (not part of a cluster) on a Linux web1 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux with Docker version 18.03.1-ce, build 9ee9f40.

This is part of a continuous integration test so it breaks our pipeline for no reason.

I saw that the image latest was updated 6 days ago, so i'll try to pin it down 1 version an see if it still breaks.

Here's how we launch the image:

version: '3.2'

services:
  rabbitmq:
    image: rabbitmq:management-alpine

    ports:
      - target: 5672
        published: 5672
        protocol: tcp
        mode: host
      - target: 15672
        published: 15672
        protocol: tcp
        mode: host

Here's a log from the container that won't start

  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:23.822818117Z"}
-- | --
  | {"log":"17:53:23.795 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:23.822874637Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:26.251371617Z"}
  | {"log":"17:53:26.251 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:26.251411015Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:26.252264839Z"}
  | {"log":"17:53:26.252 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:26.252276334Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:27.569508861Z"}
  | {"log":"17:53:27.569 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:27.569540686Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:27.573220152Z"}
  | {"log":"17:53:27.573 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:27.573232593Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:28.729536906Z"}
  | {"log":"17:53:28.729 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:28.729569364Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:28.730370551Z"}
  | {"log":"17:53:28.730 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:28.730382041Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:30.085369001Z"}
  | {"log":"17:53:30.085 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:30.085389145Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:30.086226698Z"}
  | {"log":"17:53:30.086 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:30.086237326Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:31.206243372Z"}
  | {"log":"17:53:31.204 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:31.206291953Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:31.2069083Z"}
  | {"log":"17:53:31.206 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:31.206919453Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:32.685782793Z"}
  | {"log":"17:53:32.685 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:32.685825074Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:32.686782748Z"}
  | {"log":"17:53:32.686 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:32.686794627Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:33.948984408Z"}
  | {"log":"17:53:33.948 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:33.949017824Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:33.95096452Z"}
  | {"log":"17:53:33.950 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:33.950976266Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:35.549782544Z"}
  | {"log":"17:53:35.549 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:35.549836484Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:35.550760395Z"}
  | {"log":"17:53:35.550 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:35.550771855Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:36.773777033Z"}
  | {"log":"17:53:36.773 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:36.773821808Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:36.774830053Z"}
  | {"log":"17:53:36.774 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:36.774841081Z"}
  | {"log":"\n","stream":"stderr","time":"2019-02-15T17:53:38.325481248Z"}
  | {"log":"17:53:38.325 [error] Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\n","stream":"stderr","time":"2019-02-15T17:53:38.325515128Z"}
  | {"log":"Distribution failed: {{:shutdown, {:failed_to_start_child, :auth, {'Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces', [{:auth, :init_cookie, 0, [file: 'auth.erl', line: 286]}, {:auth, :init, 1, [file: 'auth.erl', line: 140]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 374]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 342]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:\"rabbitmqcli-49-rabbit@5b33ae66bc93\", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}\n","stream":"stderr","time":"2019-02-15T17:53:38.343599968Z"}
  | {"log":"2019-02-15 17:53:38.746320 \r\n","stream":"stderr","time":"2019-02-15T17:53:38.75105244Z"}
  | {"log":"[]\r\n","stream":"stdout","time":"2019-02-15T17:53:38.751112274Z"}
  | {"log":"\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\"\r\n","stream":"stdout","time":"2019-02-15T17:53:38.751136561Z"}
  | {"log":"{error_logger,error_msg}\r\n","stream":"stdout","time":"2019-02-15T17:53:38.751141634Z"}
  | {"log":"#{label=\u003e{proc_lib,crash},report=\u003e[[{initial_call,{auth,init,['Argument__1']}},{pid,\u003c0.59.0\u003e},{registered_name,[]},{error_info,{error,\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\",[{auth,init_cookie,0,[{file,\"auth.erl\"},{line,286}]},{auth,init,1,[{file,\"auth.erl\"},{line,140}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,374}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,249}]}]}},{ancestors,[net_sup,kernel_sup,\u003c0.46.0\u003e]},{message_queue_len,0},{messages,[]},{links,[\u003c0.57.0\u003e]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,987},{stack_size,27},{reductions,936}],[]]}\r\n","stream":"stdout","time":"2019-02-15T17:53:38.751146067Z"}
  | {"log":"#{label=\u003e{supervisor,start_error},report=\u003e[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\",[{auth,init_cookie,0,[{file,\"auth.erl\"},{line,286}]},{auth,init,1,[{file,\"auth.erl\"},{line,140}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,374}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,249}]}]}},{offender,[{pid,undefined},{id,auth},{mfargs,{auth,start_link,[]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}\r\n","stream":"stdout","time":"2019-02-15T17:53:38.751155987Z"}
  | {"log":"#{label=\u003e{supervisor,start_error},report=\u003e[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_child,auth,{\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\",[{auth,init_cookie,0,[{file,\"auth.erl\"},{line,286}]},{auth,init,1,[{file,\"auth.erl\"},{line,140}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,374}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,249}]}]}}}},{offender,[{pid,undefined},{id,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}\r\n","stream":"stdout","time":"2019-02-15T17:53:38.751163258Z"}
  | {"log":"#{label=\u003e{proc_lib,crash},report=\u003e[[{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{pid,\u003c0.45.0\u003e},{registered_name,[]},{error_info,{exit,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,auth,{\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\",[{auth,init_cookie,0,[{file,\"auth.erl\"},{line,286}]},{auth,init,1,[{file,\"auth.erl\"},{line,140}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,374}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,249}]}]}}}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,\"application_master.erl\"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,249}]}]}},{ancestors,[\u003c0.44.0\u003e]},{message_queue_len,1},{messages,[{'EXIT',\u003c0.46.0\u003e,normal}]},{links,[\u003c0.44.0\u003e,\u003c0.43.0\u003e]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,987},{stack_size,27},{reductions,184}],[]]}\r\n","stream":"stdout","time":"2019-02-15T17:53:38.751171051Z"}
  | {"log":"#{label=\u003e{application_controller,exit},report=\u003e[{application,kernel},{exited,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,auth,{\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\",[{auth,init_cookie,0,[{file,\"auth.erl\"},{line,286}]},{auth,init,1,[{file,\"auth.erl\"},{line,140}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,374}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,249}]}]}}}}},{kernel,start,[normal,[]]}}},{type,permanent}]}\r\n","stream":"stdout","time":"2019-02-15T17:53:38.753508327Z"}
  | {"log":"{\"Kernel pid terminated\",application_controller,\"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,auth,{\\\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\\\",[{auth,init_cookie,0,[{file,\\\"auth.erl\\\"},{line,286}]},{auth,init,1,[{file,\\\"auth.erl\\\"},{line,140}]},{gen_server,init_it,2,[{file,\\\"gen_server.erl\\\"},{line,374}]},{gen_server,init_it,6,[{file,\\\"gen_server.erl\\\"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,\\\"proc_lib.erl\\\"},{line,249}]}]}}}}},{kernel,start,[normal,[]]}}}\"}\r\n","stream":"stdout","time":"2019-02-15T17:53:39.333745782Z"}
  | {"log":"    args:     format:     label: 2019-02-15 17:53:38.748716 crash_report        2019-02-15 17:53:38.748878 supervisor_report   2019-02-15 17:53:38.749620 supervisor_report   2019-02-15 17:53:38.750308 crash_report        2019-02-15 17:53:38.752482 std_info            Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,auth,{\"Error when reading /var/lib/rabbitmq\r\n","stream":"stderr","time":"2019-02-15T17:53:39.336497708Z"}
  | {"log":"\r\n","stream":"stderr","time":"2019-02-15T17:53:39.336514454Z"}
  | {"log":"Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done\r\n","stream":"stderr","time":"2019-02-15T17:53:39.643190583Z"}
  |  

Thank you!

@wglambert
Copy link

We've only had one other issue with your specific error #114
I'm not able to reproduce it with your compose file, it seems intermittent for those that run into it, but if you could get a reproducer then we could probe around

@michaelklishin
Copy link
Collaborator

What is very weird here is that cookie file read operations fail for several seconds, so it's not a cookie generation/read race condition that a basic retry would work around.

@lindycoder
Copy link
Author

lindycoder commented Feb 18, 2019

Just seen it happen with the rabbitmq:3.7.10-management-alpine that was updated 19 days ago, so it's probably not a problem from the last builds... although it started happening only a couple days ago. Will try reproduce and bump the docker version maybe...

@lindycoder
Copy link
Author

Hi, i managed to get the erlang crash dump from a failure, it's at https://gist.github.com/lindycoder/e16fd1a23ceaf292311978fc3d2620e5

was reproduced with

20:10:06      [exec] 3.7.10-management-alpine: Pulling from library/rabbitmq
20:10:11      [exec] Digest: sha256:3c6558f3f463cf6ad4f9e679228d713eed1f4f01aca823a5a6c97bdf28f86a67
20:10:11      [exec] Status: Downloaded newer image for rabbitmq:3.7.10-management-alpine

@michaelklishin
Copy link
Collaborator

@lindycoder the dump doesn't add much to what you already posted. The node terminates very early because it fails to load the cookie file. There are no RabbitMQ modules loaded on the node, which is a good hint at how early it happens.

I'm not a Docker expert by any stretch but it sounds like a volume initialization race condition with RabbitMQ node startup.

@lindycoder
Copy link
Author

Not a bad suggestion, since the dockerfile declares a VOLUME $RABBITMQ_DATA_DIR
i'll give it a try with more recent docker version, thanks

@michaelklishin
Copy link
Collaborator

Not sure what in a more recent Docker version would help. @tianon @yosifkit have you seen this before? Should the image inject a short delay to try to work such issues around (as unfortunate as it is)? Is there a different way to await volume initialisation?

@lindycoder
Copy link
Author

I'm also retrying with that directory mounted, so that it's a bound volume, will see if it gets better, if it does it might be the volume driver... #shotinthedark

@tianon
Copy link
Member

tianon commented Feb 19, 2019

Honestly, I've never seen a delay in Docker's volume/bind mounting -- it does all that before it ever launches the user process, so there shouldn't ever be a timing issue there. 😕

@yosifkit
Copy link
Member

Random idea: AppArmor is now "helpfully" getting in the way?

@lindycoder
Copy link
Author

Well seems it magically solved itself for no apparent reason, it might be a problem with the hypervisor of the vm running the docker i was running this with, i don't know. It might be a just a weird behavior, if it comes back i might just put a restart: unless-stopped in the compose file cause simply restarting the container does fix the issue.

Thanks for the help!

@Antiarchitect

This comment has been minimized.

@egakupi
Copy link

egakupi commented Nov 16, 2020

The same issue #114. It's not always reproduced.

Adding user: rabbitmq to docker-compose file helps

@nigelhorne

This comment has been minimized.

@docker-library docker-library locked as resolved and limited conversation to collaborators Sep 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants