Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak when handling WebSocket at server side #27414

Closed
jimmyshiau opened this issue Sep 22, 2016 · 25 comments
Closed

Memory Leak when handling WebSocket at server side #27414

jimmyshiau opened this issue Sep 22, 2016 · 25 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-io type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@jimmyshiau
Copy link

jimmyshiau commented Sep 22, 2016

Simple test case

import "dart:io";
import "dart:async";

void main() {
  HttpServer
  .bind(InternetAddress.ANY_IP_V4, 8081)
  .then((server) {
    server.autoCompress = true;
    server.listen((HttpRequest request) async {
      if (request.uri.path == "/q") {
        await socket(request);
      } else {
        request
        ..response.headers.contentType = ContentType.HTML
        ..response.write('<html><body>Hello, world!'
          '<script type="text/javascript">'
          'var ws = new WebSocket("ws://"+window.location.host+"/q");'
          'ws.onopen = function() {console.log("open");};'
          '</script>'
          '</body></html>');
      }
      request.response.close();
    });
  });
}

Future socket(HttpRequest request) async {
  return WebSocketTransformer.upgrade(request)
  .then(handleWebSocket);
}


void handleWebSocket(WebSocket socket){
  print('handleWebSocket');
  socket.listen(_onSocketData, onDone: _onSocketDone);
}

void _onSocketData(data) {print(data);}
void _onSocketDone() {print('_onSocketDone');}

Server RES

66080
133060
147224
193972
193932
224692
245348
256536
278660
259632
261072
281472
261316
263444
266912
266896
287424
287484
288996
304852
345812
347388
358788
screen shot 2016-09-22 at 11 49 33 am

At each run, I opened 100 browser tabs and then close them all. Also, before checking memory use, I ran GC first.

@jimmyshiau
Copy link
Author

Dart Observatory

Uint8List Instances
23
1282
1798

screen shot 2016-09-22 at 10 58 22 am
screen shot 2016-09-22 at 11 07 21 am
screen shot 2016-09-22 at 11 39 15 am

@kasperl kasperl added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Sep 22, 2016
@johnmccutchan
Copy link
Contributor

@zanderso

@zanderso
Copy link
Member

@jimmyshiau Could you say what version of the VM and what OS you're running on? Thanks!

@tomyeh
Copy link

tomyeh commented Sep 22, 2016

Dart 1.19.1 and Linux (Jimmy is out of the office)

@zanderso
Copy link
Member

zanderso commented Sep 23, 2016

I've repro'd on 1.19.1, but this doesn't repro at top of tree. It also doesn't repro in 1.19.1 with tcmalloc patched in. After ~10 runs RSS is reasonably stable at about 180MB, whereas before it would grow ~10-15MB per run as in the report.

@jimmyshiau @tomyeh Can you try on your end with a recent 1.20 dev? E.g.: https://storage.googleapis.com/dart-archive/channels/dev/release/1.20.0-dev.7.0/sdk/dartsdk-linux-x64-release.zip. Thanks!

@jimmyshiau
Copy link
Author

jimmyshiau commented Sep 26, 2016

Hi @zanderso

1.20.0-dev.7.0 is better, but it still grew up about 1MB per run: 100 > 101 > 102

Dart VM version: 1.20.0-dev.7.0 (Tue Sep 20 14:27:56 2016) on "linux_x64"

http://screencast.com/t/RHy2EGpO
http://screencast.com/t/BliLRjpmA

RES

67652
71780
86164
95600
80064
93920
95140
95596
96196
96712
97516
97748
98556
99776
100296
101068
102068
102640
103416
103660
104436
105220
screen shot 2016-09-26 at 10 57 45 am

Dart Observatory

Uint8List Instances
853 > 1859
Stackmap Instances
455 > 1093

screen shot 2016-09-26 at 10 27 13 am
screen shot 2016-09-26 at 10 45 34 am

@tomyeh
Copy link

tomyeh commented Oct 11, 2016

Hi, any update? Can you reproduce it? Or, we mis-interpreted something here?

@zanderso
Copy link
Member

When I tried at top-of-tree a couple of weeks ago, RSS stabilized eventually, but I can take another look. If there is any script you are using to drive the browser, that may help me carry out the exact same steps that you are following.

@jimmyshiau
Copy link
Author

You can save your page link as a bookmark, then put it in a bookmark folder and make 100 copies

screen shot 2016-10-12 at 2 38 23 pm

right click on the folder and select open all bookmarks
screen shot 2016-10-12 at 2 37 29 pm

http://screencast.com/t/RHy2EGpO

@zanderso zanderso added the type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) label Nov 6, 2016
@tomyeh
Copy link

tomyeh commented Dec 30, 2016

Some background information:
In our production environment, the memory usage of one VM grows from 200M to over 1GB within a week (on Dart 1.20/1.21 -- much better than 1.19). And, we can observe a similar pattern in our test environment. If we turned off WebSocket and used periodic Ajax polling instead (in our test environ), the problem was gone (to some extent). It is why this tiny program was created.

Please tell us know if anything we can help to clarify the issue. Thanks.

@mraleph
Copy link
Member

mraleph commented Apr 24, 2018

I am assuming this issue is stale. Please reopen if more work need to happen in this area.

@mraleph mraleph closed this as completed Apr 24, 2018
@mraleph mraleph added the closed-obsolete Closed as the reported issue is no longer relevant label Apr 24, 2018
@tomyeh
Copy link

tomyeh commented Apr 25, 2018

No, the problem is still there. We're forced to restart the server if exceeding a limit. Please look at it. Thanks.

@mraleph mraleph removed the closed-obsolete Closed as the reported issue is no longer relevant label Apr 25, 2018
@mraleph
Copy link
Member

mraleph commented Apr 25, 2018

@zanderso is this something you might have free cycles to look into?

@tomyeh is the program from the first comment representative of the memory leak you are seeing?

@mraleph mraleph reopened this Apr 25, 2018
@tomyeh
Copy link

tomyeh commented Apr 25, 2018

The program in the first comment is what we tried to reproduce with a simple case. The real program is of course much more complicated. Here is what we have done. Hope it helps.

  1. We found memory leak in the production environment
  2. We reproduced it with a stress test locally
  3. Once tracking down, we suspected it was caused by WebSocket, so we ran the stress test again by turning off WebSocket (i.e., we use Ajax to pull periodically). There was no memory leak found in this test.
  4. So, we created a very simple case as shown in the first comment to reproduce the issue. And, it did.

@tomyeh
Copy link

tomyeh commented Jun 20, 2018

I made another simplified test code as follows. From server's log, you'll find RSS keeps growing. It seems the more concurrent connections, the faster memory use grows.

//server
import "dart:io";

void main() {
  HttpServer
  .bind(InternetAddress.ANY_IP_V4, 8081)
  .then((server) {
    server.listen((request) {
      int count = 0;
      WebSocketTransformer.upgrade(request)
      .then((socket){
        socket.listen((data) {
          count += data.length;
        }, onDone: () {print('Done: $count; rss=${ProcessInfo.currentRss~/(1024*1024)}');});
        return socket.done;
      });
    });
  });
}

and

//client
import "dart:io";
import "dart:async";

main() async {
  for (int i = 0; i++ < 10;) {
    final ops = <Future>[];
    for (int j = 0; j++ < 100;) {
      ops.add(WebSocket.connect("ws://localhost:8081")
        .then((socket) {
          for (int k = 0; k++ < 100;)
            socket.add("data: $i,$j,$k".codeUnits);
          socket.close();
        }));
    }
    await Future.wait(ops);
   }
}

@tomyeh
Copy link

tomyeh commented Jul 20, 2018

I found a workaround: cancel subscription when the WebSocket connection is closing.

var subpt
subpt = socket.listen(..., onDone: subpt.cancel);

Not sure if memory consumption is stabilised (I just watched it a day), but at least the growth of memory consumption is reduced from 900M to 400M after online 20 hours.

It seems be cyclic reference issue. Hope this information helps.

jumperchen added a commit to rikulo/socket.io-dart that referenced this issue Jul 20, 2018
@sgehrman
Copy link

sgehrman commented Jun 4, 2020

I'm on the latest flutter beta and latest dart.

Flutter (Channel beta, 1.18.0-11.1.pre, on Linux, locale en_US.UTF-8)
• Flutter version 1.18.0-11.1.pre at /home/steve/google/flutter
• Framework revision 2738a1148b (3 weeks ago), 2020-05-13 15:24:36 -0700
• Engine revision ef9215ceb2
• Dart version 2.9.0 (build 2.9.0-8.2.beta)

I'm seeing a massive memory leak using websockets. I'm sending a 1gb file to my shelf server and I dart observatory shows 3gb of _Uint8Lists leaking. If I close the server, a ton of ram is released, but the server runs all the time, I can't close it everytime someone uploads a file. What can I do?

My server runs in it's own isolate if that matters. The server is very simple. It receives the UInt8List data and writes it to a file.

@tomyeh
Copy link

tomyeh commented Jun 4, 2020

@sgehrman Did you try Dart 2.8 at server side? Based on our experiences, the memory usage is much better than older versions. Our server can run days under heavy load without significant memory surge. However, we don't use Dart 2.9 in production system yet, so no sure if any difference.

@zanderso zanderso removed their assignment Jun 4, 2020
@zanderso
Copy link
Member

zanderso commented Jun 4, 2020

@sgehrman can you share a small code example for the situation you describe that causes that much memory to be retained. (for example like the one in #27414 (comment))

@zichangg @a-siva

@sgehrman
Copy link

sgehrman commented Jun 4, 2020

@tomyeh The server is running in my flutter mobile app. I have a feature where my app hosts a web page locally and uses can view and drag and drop files from their browser on the desktop. The files are send with websockets sendBlob. So, I don't know how easy it is to downgrade dart. I read Dart version is tied to the Flutter SDK?

@zanderso I'm using all these packages like shelf, shelf_web_socket, web_socket_channel. It's not super easy to make a small code example. The code is very simplistic.

The browser side:

final html.WebSocket webSocket =
       html.WebSocket('ws://${hostUri.host}:8764');
  
    // after socket opens
   _transferSocket.sendBlob(File(path));

The server side:

import 'package:shelf/shelf_io.dart' as io;
import 'package:web_socket_channel/web_socket_channel.dart';

Future<void> _serve() async {
    final wsh = webSocketHandler(_webSocketHandler);

    final cascade = shelf.Cascade().add(wsh);

    _server = await io.serve(cascade.handler, InternetAddress.anyIPv4, port);
  }

  void _webSocketHandler(WebSocketChannel webSocket) {
    webSocket.stream.listen((dynamic message) {
      if (message is Uint8List) {
       File(path).writeAsBytesSync(message);
    }
}

If I posted an apk would that be enough to debug it on your end?

I could also add you as a github collaborator if you wanted to get the code and run the app. The code is private. Hoping to release it in a few weeks.

@zichangg
Copy link
Contributor

zichangg commented Jun 8, 2020

@sgehrman Hi, I'd like to help!
Unfortunately, I don't have a mobile device to test with my WFH environment. What platform your app is running? And can you also try not to write into a file? It is possible that file.writeAsBytesSync() is leaking.

If you can also try your code with Http? Package_http is very easy to use. I just wanna make sure the websocket is indeed the culprit.

@Isakdl
Copy link

Isakdl commented Nov 12, 2024

I have similar problems as reported here, after extensive testing, I found no issues running locally while monitoring with Dart DevTools. However, in a Docker container, memory issues arise if the client does not explicitly call socket.close() before disconnecting. In such cases, memory usage on the server increases to about 50% of the allocated 1 GB. Performance degrades with the server handling requests very slowly, ultimately leading to an out-of-memory (OOM) kill by Docker, despite reported memory usage remaining at ~50%.

Note that it also seems like the docker container keeps the reported memory usage relatively constant after new allocations. E.g. I'm not seeing big reductions in memory usage. But when running locally both in JIT and AOT mode I observe that the process gets a big reduction of memory usage now and then.

Another peculiar fact is that the size of the data sent on the socket also seems to impact how fast I can reach the problems.

Dart version: 3.5

I used the code by @sgehrman for my testing, and the official docker example (https://hub.docker.com/_/dart):
Server code

import "dart:async";
import "dart:io";

void main() {
  HttpServer.bind(InternetAddress.anyIPv4, 8081).then((server) {
    server.listen((request) {
      int count = 0;
      late StreamSubscription subscription;
      WebSocketTransformer.upgrade(request).then((socket) {
        socket.pingInterval = const Duration(seconds: 1);
        subscription = socket.listen(
          (dynamic data) {
            count += (data as List).length;
          },
          onDone: () async {
            print(
              'Done: $count; rss=${ProcessInfo.currentRss ~/ (1024 * 1024)}',
            );
            await subscription.cancel(); //This solution does not help
          },
        );
        return socket.done;
      });
    });
  });
}

Client code

import "dart:io";
import "dart:async";

final String data = '<a long string>';

main() async {
  for (int i = 0; i++ < 100;) {
    final ops = <Future>[];
    for (int j = 0; j++ < 100;) {
      ops.add(WebSocket.connect("ws://localhost:8081").then((socket) {
        for (int k = 0; k++ < 100;) {
          socket.add(data.codeUnits);
        }
        //socket.close(); // If this call is not done we get memory problems
      }));
    }
    print('awaiting ops');
    await Future.wait(ops);
  }

  exit(0);
}

Dockerfile

FROM dart:stable AS build

WORKDIR /app
COPY pubspec.* ./
RUN dart pub get

COPY . .
RUN dart pub get --offline
RUN dart compile exe lib/websocket_server.dart -o bin/server

FROM scratch
COPY --from=build /runtime/ /
COPY --from=build /app/bin/server /app/bin/

EXPOSE 8081
CMD ["/app/bin/server"]

@mraleph
Copy link
Member

mraleph commented Nov 18, 2024

I took a look at this today using the code provided by @Isakdl above. It clearly was leaking for me inside and outside of the Docker container: what I saw is _NativeSocket objects leaking.

This seems to be happening because something is wrong with shutdown sequence: we shutdown read direction on the socket first, and then we shutdown write direction - but then we don't actually recognize that both directions are now shutdown and we need to close and destroy the socket. And the socket is left lingering in the zombie state.

The code in shutdownWrite says:

   void shutdownWrite() {
    if (!isClosing && !isClosed) {
      if (closedReadEventSent) {
        close();
      } else {
        sendToEventHandler(1 << shutdownWriteCommand);
      }
      isClosedWrite = true;
    }
  }

We arrive here with isClosedRead but !closedReadEventSent - which I think happens because we have also disabled read events by that time. Because readEventsEnabled is false we will never set closedReadEventSent to true and as a result we will never fully close the socket in this case. This seems wrong.

UPDATE: My initial analysis was incorrect here. It is by design that shutting down read direction does not simply discard the data and instead expects the reader to drain the socket before dispatching closedRead event. The actual problem was that the socket had no outstanding incoming data when its read direction is closed. This means that there are we are not going to schedule any read events to be delivered. The fix is to issueReadEvent when we close read direction.

@mraleph
Copy link
Member

mraleph commented Nov 20, 2024

This CL should fix the leak of _NativeSocket objects. My previous analysis was slightly incorrect, I have amended the comment above for posterity.

While the leak seems to be fixed with my CL, I still observe somewhat suboptimal behavior from the GC - we don't seem to trigger mark sweep GC often enough which means we keep accumulating some external memory (IO buffers?) which causes RSS to grow.

@rmacnak-google could you take a look if we could tweak our GC heuristics here? You would need to apply my CL to fix the leak and then just run the code from the comment above (no Docker is needed to reproduce this).

copybara-service bot pushed a commit that referenced this issue Nov 27, 2024
 - Don't treat heap_growth_max as a lower bound when using the ratio heuristic. For small heaps this can cause us to exceed the working set size by more than 10x.
 - Stick with the ratio heuristic when we're staying under the desired time fraction.
 - Avoid tiny initial thresholds by growing at least by one new-space.

TEST=golem
Bug: #27414
Change-Id: I64b2994ea9a32ce8a8bbec15cd89d8b46a01b1bb
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/397460
Reviewed-by: Slava Egorov <[email protected]>
Commit-Queue: Ryan Macnak <[email protected]>
@Isakdl
Copy link

Isakdl commented Nov 29, 2024

@mraleph Good job! Thank you so much for this! 😀 👍 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-io type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

No branches or pull requests

9 participants