Server terminates with "out of memory" periodically #762

Closed
bselwe opened this issue Mar 20, 2023 · 6 comments
Labels
bug (Something isn't working), need info (Need to look into this)

Comments

bselwe commented Mar 20, 2023

Hey @vlidholt, we're using Serverpod with around 200 daily users. Each user is connected to Serverpod through a WebSocket connection, and we run a periodic future call for each user every minute. We've been consistently experiencing "Out of memory" crashes every few hours. The server runs in a Docker container on AWS with Dart 2.19.2, Serverpod 0.9.18, and --old_gen_heap_size=0 to encourage more aggressive garbage collection.

One thing we managed to reproduce locally is the steadily increasing memory usage of a long-running server. We found that it was caused by a large number of allocations of _ZLibInflateFilter and _ZLibDeflateFilter instances, which are related to the compression of WebSocket messages. This happens when WebSocket clients reconnect frequently. See below:

Image: Increasing memory usage due to WebSocket compression
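
The reconnect pattern described above can be approximated with a small standalone program. This is a minimal sketch using only dart:io, not Serverpod; the function names (startEchoServer, churnConnections), the cycle counts, and the loopback echo server are illustrative assumptions. It leaves compression at the dart:io default and prints the process RSS after each round, so any growth across rounds can be compared between environments (for example inside and outside a container).

```dart
import 'dart:async';
import 'dart:io';

/// Starts a loopback echo server that upgrades every request to a WebSocket.
/// Compression is left at the dart:io default.
Future<HttpServer> startEchoServer() async {
  final server = await HttpServer.bind(InternetAddress.loopbackIPv4, 0);
  server.listen((HttpRequest request) async {
    final socket = await WebSocketTransformer.upgrade(request);
    socket.listen(
      (message) => socket.add(message), // echo back
      onDone: () => socket.close(),
    );
  });
  return server;
}

/// Connects, exchanges one message, and disconnects, [cycles] times.
/// Returns the number of echoed messages received.
Future<int> churnConnections(int port, int cycles) async {
  var echoes = 0;
  for (var i = 0; i < cycles; i++) {
    final client = await WebSocket.connect('ws://127.0.0.1:$port');
    final closed = Completer<void>();
    client.listen(
      (message) {
        echoes++;
        client.close(); // disconnect as soon as the echo arrives
      },
      onDone: () => closed.complete(),
    );
    client.add('ping $i');
    await closed.future;
  }
  return echoes;
}

Future<void> main() async {
  final server = await startEchoServer();
  for (var round = 0; round < 5; round++) {
    await churnConnections(server.port, 200);
    final rssMib = ProcessInfo.currentRss ~/ (1024 * 1024);
    print('round $round: rss=$rssMib MiB');
  }
  await server.close(force: true);
}
```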

By disabling the compression of WebSocket messages, we were able to fix the above issue and get stable memory usage on the server. This is only a workaround, though; ideally, the server's memory usage should be stable with compression enabled. See below:

Image: Memory usage graph with WebSocket compression enabled
Image: Memory usage graph with WebSocket compression disabled (fix)
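
For readers who want to try the same workaround outside Serverpod, the sketch below shows how compression can be switched off at the dart:io level. It is an assumption that this mirrors what the workaround does; the thread does not say how compression was disabled in Serverpod itself. The helper names (startUncompressedEchoServer, echoOnce) are hypothetical.

```dart
import 'dart:async';
import 'dart:io';

/// Echo server whose WebSocket upgrade does not negotiate compression.
Future<HttpServer> startUncompressedEchoServer() async {
  final server = await HttpServer.bind(InternetAddress.loopbackIPv4, 0);
  server.listen((HttpRequest request) async {
    final socket = await WebSocketTransformer.upgrade(
      request,
      // Turns off the per-message compression extension on the server side.
      compression: CompressionOptions.compressionOff,
    );
    socket.listen(
      (message) => socket.add(message),
      onDone: () => socket.close(),
    );
  });
  return server;
}

/// Sends [message] over an uncompressed connection and returns the echo.
Future<String> echoOnce(int port, String message) async {
  final client = await WebSocket.connect(
    'ws://127.0.0.1:$port',
    // The client can opt out as well, so neither side offers compression.
    compression: CompressionOptions.compressionOff,
  );
  final reply = Completer<String>();
  client.listen((data) {
    if (!reply.isCompleted) reply.complete(data as String);
    client.close();
  });
  client.add(message);
  return reply.future;
}

Future<void> main() async {
  final server = await startUncompressedEchoServer();
  print(await echoOnce(server.port, 'hello'));
  await server.close(force: true);
}
```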

With WebSocket compression enabled, the server crashed periodically at around 25-30% of the total available memory. With WebSocket compression disabled, memory usage is stable, but the "Out of memory" failures still occur periodically, every 2-3 hours. These failures happen in the main isolate, and we aren't able to reproduce them locally. The issue is most likely not related to WebSocket compression, as the "Out of memory" exception occurs in both cases. Even though the server is crashing with "Out of memory", DevTools does not show any increased memory usage, as seen in the graphs.
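
When crashes cannot be reproduced locally, one option is to log the process's resident set size from inside the running container and compare it with the DevTools graphs around the time of a crash. The following is a minimal sketch under that assumption; startRssLogger and formatMemorySample are hypothetical helper names, and the interval is arbitrary.

```dart
import 'dart:async';
import 'dart:io';

/// Formats current and peak RSS (both in bytes) as whole MiB.
String formatMemorySample(int currentRssBytes, int maxRssBytes) {
  const mib = 1024 * 1024;
  return 'rss=${currentRssBytes ~/ mib} MiB peak=${maxRssBytes ~/ mib} MiB';
}

/// Writes one timestamped memory sample to stdout every [interval].
/// Cancel the returned timer to stop logging.
Timer startRssLogger({Duration interval = const Duration(seconds: 30)}) {
  return Timer.periodic(interval, (_) {
    final sample =
        formatMemorySample(ProcessInfo.currentRss, ProcessInfo.maxRss);
    stdout.writeln('${DateTime.now().toUtc().toIso8601String()} $sample');
  });
}

void main() {
  // Short demo: sample once per second, then stop after three seconds.
  final timer = startRssLogger(interval: const Duration(seconds: 1));
  Timer(const Duration(seconds: 3), timer.cancel);
}
```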

Our issue seems to be very similar to dart-lang/sdk#50642, where memory usage is normal but an "Out of memory" error is still thrown.

It is quite difficult to provide any reproduction steps for this issue. Any tips or insights on how to troubleshoot this issue would be very helpful and appreciated.

Below is one of the error reports that we are getting.

Error output and stack trace
Exhausted heap space, trying to allocate 279280 bytes.
--
  | 2023-03-13 23:47:45.945221Z Internal server error. Zoned exception.
  | Out of Memory
  | #0 _rootRun (dart:async/zone.dart:1390:47)
  | #1 _CustomZone.run (dart:async/zone.dart:1300:19)
  | #2 _CustomZone.runGuarded (dart:async/zone.dart:1208:7)
  | #3 _BufferingStreamSubscription._sendDone (dart:async/stream_impl.dart:402:7)
  | #4 _HttpClientResponse.listen. (dart:_http/http_impl.dart:714:15)
  | #5 _rootRun (dart:async/zone.dart:1390:47)
  | #6 _CustomZone.run (dart:async/zone.dart:1300:19)
  | #7 _CustomZone.runGuarded (dart:async/zone.dart:1208:7)
  | #8 _BufferingStreamSubscription._sendDone (dart:async/stream_impl.dart:402:7)
  | #9 _ConverterStreamEventSink.close (dart:convert/chunked_conversion.dart:81:18)
  | #10 _SinkTransformerStreamSubscription._handleDone (dart:async/stream_transformers.dart:132:24)
  | #11 _rootRun (dart:async/zone.dart:1390:47)
  | #12 _CustomZone.run (dart:async/zone.dart:1300:19)
  | #13 _CustomZone.runGuarded (dart:async/zone.dart:1208:7)
  | #14 _BufferingStreamSubscription._sendDone (dart:async/stream_impl.dart:402:7)
  | #15 _FilterSink.close (dart:io/data_transformer.dart:534:11)
  | #16 _ConverterStreamEventSink.close (dart:convert/chunked_conversion.dart:81:18)
  | #17 _SinkTransformerStreamSubscription._handleDone (dart:async/stream_transformers.dart:132:24)
  | #18 _rootRun (dart:async/zone.dart:1390:47)
  | #19 _CustomZone.run (dart:async/zone.dart:1300:19)
  | #20 _CustomZone.runGuarded (dart:async/zone.dart:1208:7)
  | #21 _BufferingStreamSubscription._sendDone (dart:async/stream_impl.dart:402:7)
  | #22 _rootRun (dart:async/zone.dart:1390:47)
  | #23 _CustomZone.run (dart:async/zone.dart:1300:19)
  | #24 _CustomZone.runGuarded (dart:async/zone.dart:1208:7)
  | #25 _BufferingStreamSubscription._sendDone (dart:async/stream_impl.dart:402:7)
  | #26 _StreamController.close (dart:async/stream_controller.dart:630:5)
  | #27 _HttpParser._closeIncoming (dart:_http/http_parser.dart:1147:18)
  | #28 _HttpParser._doParse (dart:_http/http_parser.dart:813:11)
  | #29 _HttpParser._parse (dart:_http/http_parser.dart:319:7)
  | #30 _rootRunUnary (dart:async/zone.dart:1406:47)
  | #31 _CustomZone.runUnary (dart:async/zone.dart:1307:19)
  | #32 _CustomZone.runUnaryGuarded (dart:async/zone.dart:1216:7)
  | #33 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339:11)
  | #34 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:774:19)
  | #35 _StreamController._add (dart:async/stream_controller.dart:648:7)
  | #36 _Socket._onData (dart:io-patch/socket_patch.dart:2355:41)
  | #37 _rootRunUnary (dart:async/zone.dart:1406:47)
  | #38 _CustomZone.runUnary (dart:async/zone.dart:1307:19)
  | #39 _CustomZone.runUnaryGuarded (dart:async/zone.dart:1216:7)
  | #40 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339:11)
  | #41 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:774:19)
  | #42 _StreamController._add (dart:async/stream_controller.dart:648:7)
  | #43 _RawSecureSocket._sendReadEvent (dart:io/secure_socket.dart:1111:19)
  | #44 _rootRun (dart:async/zone.dart:1390:47)
  | #45 _CustomZone.run (dart:async/zone.dart:1300:19)
  | #46 _CustomZone.runGuarded (dart:async/zone.dart:1208:7)
  | #47 _CustomZone.bindCallbackGuarded. (dart:async/zone.dart:1248:23)
  | #48 _rootRun (dart:async/zone.dart:1398:13)
  | #49 _CustomZone.run (dart:async/zone.dart:1300:19)
  | #50 _CustomZone.bindCallback. (dart:async/zone.dart:1232:23)
  | #51 Timer._createTimer. (dart:async-patch/timer_patch.dart:18:15)
  | #52 _Timer._runTimers (dart:isolate-patch/timer_impl.dart:398:19)
  | #53 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:192:26)
  | #54 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:192:26)
  | ===== CRASH =====
  | si_signo=Segmentation fault(11), si_code=128, si_addr=(nil)
  | version=2.19.2 (stable) (Tue Feb 7 18:37:17 2023 +0000) on "linux_x64"
  | pid=1, thread=18, isolate_group=main(0x55b22cc9a000), isolate=main(0x55b22cd3d000)
  | os=linux, arch=x64, comp=no, sim=no
  | isolate_instructions=55b22a57ccc0, vm_instructions=55b22a57ccc0
  | pc 0x000055b22a5699ce fp 0x00007f60f90fdb40 dart::bin::Builtin_Filter_Processed(_Dart_NativeArguments*)+0xae
  | pc 0x000055b22a75281b fp 0x00007f60f90fdba0 dart::NativeEntry::AutoScopeNativeCallWrapperNoStackCheck(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*))+0x8b
  | pc 0x00007f6120702dc4 fp 0x00007f60f90fdbe0 Unknown symbol
  | pc 0x00007f60f9c38564 fp 0x00007f60f90fdc58 Unknown symbol
  | pc 0x00007f60f912a648 fp 0x00007f60f90fdcf8 Unknown symbol
  | pc 0x00007f60f5c9d13a fp 0x00007f60f90fdd30 Unknown symbol
  | pc 0x00007f60f9b644db fp 0x00007f60f90fddc0 Unknown symbol
  | pc 0x00007f60f91378bb fp 0x00007f60f90fde10 Unknown symbol
  | pc 0x00007f60f50c4ffd fp 0x00007f60f90fdec0 Unknown symbol
  | pc 0x00007f60f86dd292 fp 0x00007f60f90fdf08 Unknown symbol
  | pc 0x00007f60f86e14cc fp 0x00007f60f90fdf40 Unknown symbol
  | pc 0x00007f60f502e31f fp 0x00007f60f90fdf78 Unknown symbol
  | pc 0x00007f60f50243ed fp 0x00007f60f90fdfa0 Unknown symbol
  | pc 0x00007f612070738e fp 0x00007f60f90fdfd0 Unknown symbol
  | pc 0x00007f60f50144e0 fp 0x00007f60f90fe018 Unknown symbol
  | pc 0x00007f60f500b9fc fp 0x00007f60f90fe058 Unknown symbol
  | pc 0x00007f60f917c778 fp 0x00007f60f90fe088 Unknown symbol
  | pc 0x00007f60f9b644db fp 0x00007f60f90fe118 Unknown symbol
  | pc 0x00007f60f91378bb fp 0x00007f60f90fe168 Unknown symbol
  | pc 0x00007f60f50c4ffd fp 0x00007f60f90fe218 Unknown symbol
  | pc 0x00007f60f914c677 fp 0x00007f60f90fe250 Unknown symbol
  | pc 0x00007f60f9b6448b fp 0x00007f60f90fe2e0 Unknown symbol
  | pc 0x00007f60f91378bb fp 0x00007f60f90fe330 Unknown symbol
  | pc 0x00007f60f50c4ffd fp 0x00007f60f90fe3e0 Unknown symbol
  | pc 0x00007f60f914c677 fp 0x00007f60f90fe418 Unknown symbol
  | pc 0x00007f60f78889a8 fp 0x00007f60f90fe480 Unknown symbol
  | pc 0x00007f60f91719f3 fp 0x00007f60f90fe4a8 Unknown symbol
  | pc 0x00007f60f9173313 fp 0x00007f60f90fe4e8 Unknown symbol
  | pc 0x00007f612070300c fp 0x00007f60f90fe560 Unknown symbol
  | pc 0x000055b22a6f5fb9 fp 0x00007f60f90fe600 dart::DartEntry::InvokeCode(dart::Code const&, unsigned long, dart::Array const&, dart::Array const&, dart::Thread*)+0x139
  | pc 0x000055b22a6f5e35 fp 0x00007f60f90fe660 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)+0x145
  | pc 0x000055b22a6f8194 fp 0x00007f60f90fe6a0 dart::DartLibraryCalls::HandleMessage(long, dart::Instance const&)+0x144
  | pc 0x000055b22a71b778 fp 0x00007f60f90fec30 dart::IsolateMessageHandler::HandleMessage(std::__2::unique_ptr>)+0x348
  | pc 0x000055b22a74400a fp 0x00007f60f90fecb0 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)+0x15a
  | pc 0x000055b22a74472b fp 0x00007f60f90fed00 dart::MessageHandler::TaskCallback()+0x1db
  | pc 0x000055b22a86f95b fp 0x00007f60f90fed80 dart::ThreadPool::WorkerLoop(dart::ThreadPool::Worker*)+0x13b
  | pc 0x000055b22a86fda8 fp 0x00007f60f90fedb0 dart::ThreadPool::Worker::Main(unsigned long)+0x78
  | pc 0x000055b22a7e1216 fp 0x00007f60f90fee70 dart+0x2228216
  | -- End of DumpStackTrace
  | pc 0x0000000000000000 fp 0x00007f60f90fdbe0 sp 0x0000000000000000 [Stub] CallAutoScopeNative
  | pc 0x00007f60f9c38564 fp 0x00007f60f90fdc58 sp 0x00007f60f90fdbf0 [Unoptimized] [email protected]
  | pc 0x00007f60f912a648 fp 0x00007f60f90fdcf8 sp 0x00007f60f90fdc68 [Optimized] [email protected]
  | pc 0x00007f60f5c9d13a fp 0x00007f60f90fdd30 sp 0x00007f60f90fdd08 [Unoptimized] [email protected]
  | pc 0x00007f60f9b644db fp 0x00007f60f90fddc0 sp 0x00007f60f90fdd40 [Optimized] _rootRun@4048458
  | pc 0x00007f60f91378bb fp 0x00007f60f90fde10 sp 0x00007f60f90fddd0 [Optimized] _rootRun@4048458
  | pc 0x00007f60f50c4ffd fp 0x00007f60f90fdec0 sp 0x00007f60f90fde20 [Optimized] [email protected]
  | pc 0x00007f60f86dd292 fp 0x00007f60f90fdf08 sp 0x00007f60f90fded0 [Optimized] _BufferingStreamSubscription@4048458._sendDone@4048458
  | pc 0x00007f60f86e14cc fp 0x00007f60f90fdf40 sp 0x00007f60f90fdf18 [Optimized] [email protected]
  | pc 0x00007f60f502e31f fp 0x00007f60f90fdf78 sp 0x00007f60f90fdf50 [Optimized] [email protected]
  | pc 0x00007f60f50243ed fp 0x00007f60f90fdfa0 sp 0x00007f60f90fdf88 [Optimized] _SuspendState@4048458._returnAsyncStar@4048458
  | pc 0x00007f612070738e fp 0x00007f60f90fdfd0 sp 0x00007f60f90fdfb0 [Stub] ReturnAsyncStar
  | pc 0x00007f60f50144e0 fp 0x00007f60f90fe018 sp 0x00007f60f90fdfe0 [Optimized] _SuspendState@4048458._createAsyncStarCallback@4048458.
  | pc 0x00007f60f500b9fc fp 0x00007f60f90fe058 sp 0x00007f60f90fe028 [Optimized] [email protected]
  | pc 0x00007f60f917c778 fp 0x00007f60f90fe088 sp 0x00007f60f90fe068 [Optimized] [email protected]
  | pc 0x00007f60f9b644db fp 0x00007f60f90fe118 sp 0x00007f60f90fe098 [Optimized] _rootRun@4048458
  | pc 0x00007f60f91378bb fp 0x00007f60f90fe168 sp 0x00007f60f90fe128 [Optimized] _rootRun@4048458
  | pc 0x00007f60f50c4ffd fp 0x00007f60f90fe218 sp 0x00007f60f90fe178 [Optimized] [email protected]
  | pc 0x00007f60f914c677 fp 0x00007f60f90fe250 sp 0x00007f60f90fe228 [Optimized] [email protected].
  | pc 0x00007f60f9b6448b fp 0x00007f60f90fe2e0 sp 0x00007f60f90fe260 [Optimized] _rootRun@4048458
  | pc 0x00007f60f91378bb fp 0x00007f60f90fe330 sp 0x00007f60f90fe2f0 [Optimized] _rootRun@4048458
  | pc 0x00007f60f50c4ffd fp 0x00007f60f90fe3e0 sp 0x00007f60f90fe340 [Optimized] [email protected]
  | pc 0x00007f60f914c677 fp 0x00007f60f90fe418 sp 0x00007f60f90fe3f0 [Optimized] [email protected].
  | pc 0x00007f60f78889a8 fp 0x00007f60f90fe480 sp 0x00007f60f90fe428 [Optimized] _startMicrotaskLoop@4048458
  | pc 0x00007f60f91719f3 fp 0x00007f60f90fe4a8 sp 0x00007f60f90fe490 [Optimized] _startMicrotaskLoop@4048458
  | pc 0x00007f60f9173313 fp 0x00007f60f90fe4e8 sp 0x00007f60f90fe4b8 [Optimized] _RawReceivePort@1026248._handleMessage@1026248
  | pc 0x00007f612070300c fp 0x00007f60f90fe560 sp 0x00007f60f90fe4f8 [Stub] InvokeDartCode

cc @rafaelortizzableh

vlidholt added the bug and need info labels Mar 23, 2023
Author

bselwe commented Apr 4, 2023

@vlidholt Have you had a chance to look at this? This issue is critical for us because it prevents the server from scaling: the "out of memory" crashes happen even though memory usage is stable. Please let me know if there's anything I can do; I'd be happy to provide more details about this issue.

Collaborator

vlidholt commented Apr 4, 2023

Hi @bselwe! I tried to reproduce this issue, but couldn't. Do you have a minimal test case? That would help immensely. I have a Pixorama server that has been running for months without a restart, and its memory profile looks very stable. Several thousand connections have been made to that server.

Collaborator

vlidholt commented Apr 5, 2023

@bselwe a thought. Have you tried running the server with a value other than 0 for --old_gen_heap_size? I don't think 0 makes Dart run the GC more aggressively; rather, it allows Dart to use as much memory as it wants. The default limit is somewhat low, so you may want to experiment with setting it slightly lower than the amount of memory you have available on your server instance.
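
As a rough illustration of picking a value slightly below the available memory, the sketch below derives a candidate from the container's memory limit. Several things here are assumptions rather than anything stated in the thread: that the flag is expressed in megabytes, the 25% headroom, the helper names, and the cgroup file paths (the usual v2 and v1 locations, which may differ on a given host).

```dart
import 'dart:io';

/// Returns a candidate --old_gen_heap_size value in MB, leaving
/// [headroomFraction] of [availableBytes] for everything outside the old
/// generation (new space, native allocations, other processes).
int suggestedOldGenHeapMb(int availableBytes,
    {double headroomFraction = 0.25}) {
  final usableBytes = availableBytes * (1 - headroomFraction);
  return usableBytes ~/ (1024 * 1024);
}

/// Tries to read the container memory limit in bytes. Returns null when no
/// numeric limit is found (cgroup v2 reports "max" when unlimited). On
/// cgroup v1 an unlimited container reports a very large number instead,
/// which this sketch does not special-case.
int? readCgroupMemoryLimitBytes() {
  const candidates = [
    '/sys/fs/cgroup/memory.max', // cgroup v2
    '/sys/fs/cgroup/memory/memory.limit_in_bytes', // cgroup v1
  ];
  for (final path in candidates) {
    final file = File(path);
    if (!file.existsSync()) continue;
    final value = int.tryParse(file.readAsStringSync().trim());
    if (value != null) return value;
  }
  return null;
}

void main() {
  final limit = readCgroupMemoryLimitBytes();
  if (limit == null) {
    print('No numeric cgroup memory limit found.');
    return;
  }
  print('--old_gen_heap_size=${suggestedOldGenHeapMb(limit)}');
}
```

The printed value is only a starting point for experimentation, in line with the suggestion above.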

@lukehutch
Contributor

Possibly related?

dart-lang/sdk#44009

See especially this comment:

dart-lang/sdk#44009 (comment)

SandPod moved this to Bug fixes 🪲 in Serverpod Roadmap Jun 13, 2024
Collaborator

Isakdl commented Nov 12, 2024

I did some investigation into this issue. There seems to be some odd interplay between Docker and Dart when WebSockets are involved. I'm not able to reproduce this outside of a Docker container.

dart-lang/sdk#27414 (comment)

Collaborator

Isakdl commented Nov 29, 2024

This problem has now been resolved in the Dart SDK, so the next Dart release should fix this bug. Closing this issue as resolved by that fix.
