[Bug] Alpine-based images quit with fatal error on aarch64 #23306

Closed
2 of 3 tasks
ozangunalp opened this issue Sep 13, 2024 · 6 comments
Labels
triage/lhotari/important: lhotari's triaging label for important issues or PRs
type/bug: The PR fixed a bug or issue reported a bug
Milestone

Comments

@ozangunalp

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Official Pulsar images with 3.3.0 and 3.3.1

Minimal reproduce step

Run the Alpine-based container images on an aarch64 machine.
We could reproduce it on RHEL 8 and on a Raspberry Pi, but not on an M1 Mac.

What did you expect to see?

The Pulsar server continues to run.

What did you see instead?

Here is the complete log of the container: pulsar.txt

Last lines of the log before the fatal error:

2024-09-10T14:02:58,193+0000 [pulsar-io-18-4] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0xa1215d54, L:/127.0.0.1:6650 - R:/127.0.0.1:34536] [SR:127.0.0.1, state:Connected]] Subscribing on topic persistent://public/default/__change_events / reader-936c229a0f. consumerId: 0
2024-09-10T14:02:58,269+0000 [pulsar-io-18-4] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/__change_events
2024-09-10T14:02:58,271+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - Creating '/managed-ledgers/public/default/persistent/__change_events'
2024-09-10T14:02:58,340+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [192.168.144.2:46605] for ledger: 1
2024-09-10T14:02:58,344+0000 [BookKeeperClientWorker-OrderedExecutor-18-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/__change_events] Created ledger 1 after closed null
2024-09-10T14:02:58,352+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/__change_events] Successfully initialize managed ledger
2024-09-10T14:02:58,394+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/__change_events] Disabled replicated subscriptions controller
2024-09-10T14:02:58,428+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/__change_events] Cursor __compaction recovered to position 1:-1
2024-09-10T14:02:58,444+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/__change_events] Opened new cursor: ManagedCursorImpl{ledger=public/default/persistent/__change_events, name=__compaction, ackPos=1:-1, readPos=1:0}
2024-09-10T14:02:58,455+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic persistent://public/default/__change_events - dedup is disabled
2024-09-10T14:02:58,501+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [192.168.144.2:46605] for ledger: 2
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffffa0b43e78, pid=10, tid=280
#
# JRE version: OpenJDK Runtime Environment Corretto-21.0.3.9.1 (21.0.3+9) (build 21.0.3+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-21.0.3.9.1 (21.0.3+9-LTS, mixed mode, tiered, compressed class ptrs, z gc, linux-aarch64)
# Problematic frame:
# 2024-09-10T14:03:28,153+0000 [pulsar-io-18-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [/192.168.144.1:44996] Closing consumer: consumerId=0
2024-09-10T14:03:28,154+0000 [pulsar-io-18-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [/192.168.144.1:44996] Closed consumer before its creation was completed. consumerId=0
2024-09-10T14:03:28,174+0000 [pulsar-io-18-5] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /192.168.144.1:44996
2024-09-10T14:03:28,174+0000 [pulsar-io-18-1] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /192.168.144.1:44986
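
When comparing logs from several hosts, it can help to pull out just the JVM crash banner. The following is a minimal sketch, not part of the original report. It assumes the banner lines start with `#`, as in the excerpt above, and that regular broker log lines do not. The function name `extract_jvm_crash` is hypothetical.

```shell
# Print the first contiguous run of lines starting with '#' in a log file.
# Assumption: only the JVM fatal-error banner starts lines with '#'.
# $1: path to the container log (for example the attached pulsar.txt)
extract_jvm_crash() {
  awk '
    /^#/  { found = 1; print; next }  # inside the banner: print the line
    found { exit }                    # first non-# line after it: stop
  ' "$1"
}
```

Because container output is interleaved, a broker log line can end up fused onto a banner line (as on the "Problematic frame" line above); this sketch prints such lines unchanged.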

Anything else?

Originally posted on quarkusio/quarkus#43187

Are you willing to submit a PR?

  • I'm willing to submit a PR!
ozangunalp added the type/bug label on Sep 13, 2024
@lhotari
Member

lhotari commented Sep 16, 2024

Thanks for reporting this issue @ozangunalp.
Most of the Pulsar developers use Macs with Apple Silicon so I guess that's why we haven't caught this issue earlier.

Run the Alpine-based container images on an aarch64 machine.
We could reproduce it on RHEL 8 and on a Raspberry Pi, but not on an M1 Mac.

Any hints on a practical way to reproduce this? Would a cloud VM on aarch64 work? Any recommendations?

lhotari added the release/blocker label on Sep 16, 2024
lhotari added this to the 4.0.0 milestone on Sep 16, 2024
@ozangunalp
Author

Most of the Pulsar developers use Macs with Apple Silicon so I guess that's why we haven't caught this issue earlier.

Same for me. I was able to reproduce it on a Raspberry Pi running Podman:
Raspberry Pi 5 Model B Rev 1.0
Linux raspberrypi 6.1.0-rpi7-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux

But yes, a cloud VM on aarch64 should work.
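
Since the crash shows up on some aarch64 hosts and not on others, it may be useful to record the same host details everywhere before comparing results. The following is a minimal sketch using only `uname` and `getconf`. The function name `host_summary` and the choice of fields are assumptions; nothing in this thread establishes which of these properties, if any, is related to the crash.

```shell
# Print a few host properties in key=value form for side-by-side comparison.
# Assumption: which fields matter is unknown; these are just cheap to collect.
host_summary() {
  echo "arch=$(uname -m)"               # e.g. aarch64
  echo "kernel=$(uname -r)"             # kernel release string
  echo "page_size=$(getconf PAGESIZE)"  # memory page size in bytes
}
```

Running `host_summary` on each reproducing and non-reproducing machine and posting the output alongside the result keeps the reports comparable.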

@lhotari
Member

lhotari commented Oct 14, 2024

I tried to reproduce on GCP t2a-standard-1 / Ampere Altra Arm64 with Debian Bookworm and Docker installed following the instructions at https://docs.docker.com/engine/install/debian/. I couldn't reproduce the issue.

@lhotari
Member

lhotari commented Oct 14, 2024

I tried to reproduce on GCP t2a-standard-1 / Ampere Altra Arm64 with Debian Bookworm and Podman, and couldn't reproduce the issue.

@lhotari
Member

lhotari commented Oct 14, 2024

It didn't reproduce with RHEL 9 on GCP t2a-standard-1 / Ampere Altra Arm64.
GCP doesn't have a RHEL 8 image available for Arm64, so I used the RHEL 9 Arm64 image.

[lari_hotari@instance-20241014-100511 ~]$ uname -a
Linux instance-20241014-100511 5.14.0-427.37.1.el9_4.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 17:15:09 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

I used these commands:

yum install -y podman tmux
tmux
# in one tmux window
podman run --rm -it --name pulsar docker.io/apachepulsar/pulsar:3.3.1 bin/pulsar standalone
# in another tmux window (CTRL-B C)
podman exec -it pulsar bin/pulsar-perf produce test
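
For unattended runs on a cloud VM, the manual check above could be replaced by polling the container logs for the JVM banner. The following is a minimal sketch, not something used in this thread. The function name `watch_for_crash` and the `ENGINE` variable are hypothetical; it relies only on the `logs` subcommand, which both podman and docker provide.

```shell
# Container engine to call; podman as in the commands above.
ENGINE="${ENGINE:-podman}"

# Poll a container's logs for the JVM fatal-error banner.
# $1: container name, $2: number of polls, $3: seconds between polls.
# Returns 1 as soon as the banner is seen, 0 if it never shows up.
watch_for_crash() {
  name=$1; polls=$2; interval=$3
  i=0
  while [ "$i" -lt "$polls" ]; do
    # Tolerate a failing logs call (e.g. container not started yet).
    logs=$("$ENGINE" logs "$name" 2>&1) || true
    case "$logs" in
      *"A fatal error has been detected by the Java Runtime Environment"*)
        echo "fatal error found in logs of $name"
        return 1
        ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "no fatal error seen in logs of $name"
  return 0
}
```

One possible usage: start the broker detached with `podman run -d --name pulsar docker.io/apachepulsar/pulsar:3.3.1 bin/pulsar standalone` (without `--rm`, so the logs remain available if the JVM dies), generate load with `pulsar-perf` as above, and call `watch_for_crash pulsar 60 5`.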

@ozangunalp Do you have any suggestions for reproducing on a cloud VM? Which commands should I use?

lhotari added the triage/lhotari/important label and removed the release/blocker label on Oct 14, 2024
lhotari changed the milestone from 4.0.0 to 4.1.0 on Oct 14, 2024
@lhotari
Member

lhotari commented Jan 5, 2025

This is most likely resolved by #23762, which will be included in the Pulsar 3.3.4 and Pulsar 4.0.2 releases.

lhotari closed this as completed on Jan 5, 2025