Direct reply-to during a rolling upgrade v3.12.14 -> v3.13.3: function_clause error in mc_compat #11380

Closed
lukebakken opened this issue Jun 5, 2024 Discussed in #11365 · 1 comment

@lukebakken
Collaborator

Discussed in #11365

Originally posted by stoft June 4, 2024

Summary

When performing a rolling upgrade from v3.12.14 to v3.13.2 or v3.13.3, we get a function_clause error on the upgraded node. The following is from v3.13.3:

Error
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0> ** Reason for termination ==
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0> ** {function_clause,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>        [{mc_compat,convert_to,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>             [mc_amqpl,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>              {delivery,false,false,<14531.1563.0>,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                  {basic_message,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                      {resource,<<"/">>,exchange,<<>>},
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                      [<<"amq.rabbitmq.reply-to.g1h2AA5yZXBseUAyNzIyNTAxMAAABrcAAAAAZl7V7w==.oyvwJ9+5nczD2h7KpiJN0Q==">>],
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                      {content,60,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                          {'P_basic',<<"application/json">>,<<"utf8">>,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                              [{<<"sequence_end">>,bool,true}],
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                              undefined,undefined,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                              <<"1d35fbb5-690a-4364-b617-500a53d0c8a2">>,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                              <<"insurala.local.node.73171.response.queue">>,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                              undefined,undefined,1717491235294,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                              <<"test.rabbot.request.reply">>,undefined,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                              undefined,undefined},
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                          <<230,96,16,97,112,112,108,105,99,97,116,105,111,110,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            47,106,115,111,110,4,117,116,102,56,0,0,0,15,12,115,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            101,113,117,101,110,99,101,95,101,110,100,116,1,36,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            49,100,51,53,102,98,98,53,45,54,57,48,97,45,52,51,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            54,52,45,98,54,49,55,45,53,48,48,97,53,51,100,48,99,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            56,97,50,40,105,110,115,117,114,97,108,97,46,108,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            111,99,97,108,46,110,111,100,101,46,55,51,49,55,49,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            46,114,101,115,112,111,110,115,101,46,113,117,101,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            117,101,0,0,1,143,226,116,121,222,25,116,101,115,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            116,46,114,97,98,98,111,116,46,114,101,113,117,101,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                            115,116,46,114,101,112,108,121>>,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                          rabbit_framing_amqp_0_9_1,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                          [<<"{\"foo\":\"bar asdfghjklkjhgfdsasdfghjkjhgfdsasdfghjkjhgfdsasdfghjkjhgfdsasdfghjkjhgfdsasdfghjkjhgfdsasdfghj\"}">>]},
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                      <<>>,false},
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>                  undefined,noflow}],
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>             [{file,"mc_compat.erl"},{line,134}]},
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>         {rabbit_channel,handle_cast,2,
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>             [{file,"rabbit_channel.erl"},{line,694}]},
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>         {gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]},
rabbitmq2-1  | 2024-06-04 08:53:55.294611+00:00 [error] <0.1719.0>         {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}
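The crash report prints the message's content header twice: once decoded as the 'P_basic' record and once as the raw encoded properties binary (the <<230,96,16,...>> term). The Python sketch below was written for this write-up and is not RabbitMQ code. It decodes that binary using the AMQP 0-9-1 basic properties layout, which is a 16-bit presence-flag word followed by each present field in a fixed order. Its only purpose is to confirm that the two representations in the log agree. It handles only the field-table value types needed here (boolean and long string).

```python
import struct

# Raw properties binary copied verbatim from the crash log above.
RAW = bytes([
    230, 96, 16, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110,
    47, 106, 115, 111, 110, 4, 117, 116, 102, 56, 0, 0, 0, 15, 12, 115,
    101, 113, 117, 101, 110, 99, 101, 95, 101, 110, 100, 116, 1, 36,
    49, 100, 51, 53, 102, 98, 98, 53, 45, 54, 57, 48, 97, 45, 52, 51,
    54, 52, 45, 98, 54, 49, 55, 45, 53, 48, 48, 97, 53, 51, 100, 48, 99,
    56, 97, 50, 40, 105, 110, 115, 117, 114, 97, 108, 97, 46, 108,
    111, 99, 97, 108, 46, 110, 111, 100, 101, 46, 55, 51, 49, 55, 49,
    46, 114, 101, 115, 112, 111, 110, 115, 101, 46, 113, 117, 101,
    117, 101, 0, 0, 1, 143, 226, 116, 121, 222, 25, 116, 101, 115,
    116, 46, 114, 97, 98, 98, 111, 116, 46, 114, 101, 113, 117, 101,
    115, 116, 46, 114, 101, 112, 108, 121,
])

# AMQP 0-9-1 basic properties in presence-flag order, starting at bit 15.
FIELDS = [
    ("content_type", "shortstr"), ("content_encoding", "shortstr"),
    ("headers", "table"), ("delivery_mode", "octet"),
    ("priority", "octet"), ("correlation_id", "shortstr"),
    ("reply_to", "shortstr"), ("expiration", "shortstr"),
    ("message_id", "shortstr"), ("timestamp", "timestamp"),
    ("type", "shortstr"), ("user_id", "shortstr"),
    ("app_id", "shortstr"), ("cluster_id", "shortstr"),
]


def read_shortstr(buf, pos):
    """Read a 1-byte-length-prefixed string; return (value, new_pos)."""
    n = buf[pos]
    return buf[pos + 1:pos + 1 + n].decode("utf-8"), pos + 1 + n


def read_table(buf, pos):
    """Read a field table (4-byte length, then name/type/value entries)."""
    (size,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    end = pos + size
    table = {}
    while pos < end:
        key, pos = read_shortstr(buf, pos)
        kind = chr(buf[pos])
        pos += 1
        if kind == "t":  # boolean: one octet
            table[key] = buf[pos] != 0
            pos += 1
        elif kind == "S":  # long string: 4-byte length prefix
            (n,) = struct.unpack_from(">I", buf, pos)
            table[key] = buf[pos + 4:pos + 4 + n].decode("utf-8")
            pos += 4 + n
        else:
            raise ValueError(f"field type {kind!r} not handled in this sketch")
    return table, end


def decode_basic_properties(buf):
    """Decode flags + present fields; absent fields map to None."""
    (flags,) = struct.unpack_from(">H", buf, 0)
    pos = 2
    props = {}
    for i, (name, kind) in enumerate(FIELDS):
        if not flags & (1 << (15 - i)):
            props[name] = None
            continue
        if kind == "shortstr":
            props[name], pos = read_shortstr(buf, pos)
        elif kind == "table":
            props[name], pos = read_table(buf, pos)
        elif kind == "octet":
            props[name] = buf[pos]
            pos += 1
        else:  # timestamp: unsigned 64-bit big-endian
            (props[name],) = struct.unpack_from(">Q", buf, pos)
            pos += 8
    return props
```

Running decode_basic_properties(RAW) yields the same values as the decoded 'P_basic' record: reply_to is "insurala.local.node.73171.response.queue", timestamp is 1717491235294, and headers is {"sequence_end": True}.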

Our setup

  • We are using the official Docker images (rabbitmq:3.x-management) and a custom init script adapted for running RabbitMQ on ECS Fargate.
  • Classic clustering with known nodes.
  • Clients in both .NET and NodeJS (using amqplib).

When the error presents

So far the error only presents under these circumstances:

  • Interaction scenario: request-response with direct replies
  • Responding client (NodeJS) on rabbit node1 (v3.12.14)
  • Requesting client (.NET) on rabbit node2 (v3.13.2 and v3.13.3)

The error does not present when:

  • The responding client is .NET
  • The responding client is NodeJS but connected to v3.13.x
  • The cluster is fully upgraded.

We've reproduced this locally using Docker Compose; see the attached log files:

node1v3_12.log
node2v3_13_2.log
node2v3_13_3.log

@lukebakken lukebakken added the bug label Jun 5, 2024
@michaelklishin michaelklishin changed the title Rolling upgrade v3.12.14 -> v3.13.3: function_clause error in mc_compat Direct reply-to during a rolling upgrade v3.12.14 -> v3.13.3: function_clause error in mc_compat Jun 5, 2024
@michaelklishin
Member

From #11365, we know that the conditions are:

  • Client A is connected to a 3.12.x node and uses direct reply-to
  • Client B is connected to a 3.13.x node and also uses direct reply-to

Preliminary findings suggest that direct reply-to specifically needs a feature flag check in mixed-version clusters.
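To illustrate the general shape of such a check, here is a minimal Python model. It is not RabbitMQ's actual code: the flag name, the Node and Cluster types, and the "legacy"/"new" labels are all hypothetical. It captures the usual feature-flag pattern for mixed-version clusters. A flag can be enabled only once every node supports it, and until then a node that may be talking to an older peer keeps using the encoding both versions understand.

```python
from dataclasses import dataclass, field

# Hypothetical flag name, used only for illustration.
REPLY_FORMAT_FLAG = "hypothetical_direct_reply_format"


@dataclass
class Node:
    name: str
    # Flags this node's RabbitMQ version knows about.
    supported_flags: set = field(default_factory=set)


@dataclass
class Cluster:
    nodes: list
    enabled_flags: set = field(default_factory=set)

    def enable(self, flag: str) -> bool:
        """Enable a flag only if every node supports it (roughly how
        RabbitMQ feature flags behave); return whether it was enabled."""
        if all(flag in n.supported_flags for n in self.nodes):
            self.enabled_flags.add(flag)
            return True
        return False


def reply_encoding(cluster: Cluster) -> str:
    """Pick the encoding for a direct reply that may cross node boundaries.

    Until the flag is enabled, an older node may be on the other end,
    so stay with the encoding both versions understand.
    """
    return "new" if REPLY_FORMAT_FLAG in cluster.enabled_flags else "legacy"
```

With a node in the cluster that does not know the flag (as a 3.12.x node would not), enable() returns False and reply_encoding() returns "legacy". Once that node is upgraded, enabling succeeds and the new encoding is used.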
