
Cypress 10.x.x hangs under Linux + Docker using cypress/included:10.2.0 #22506

Closed
robrich7 opened this issue Jun 24, 2022 · 21 comments
Labels
E2E (Issue related to end-to-end testing), prevent-stale (mark an issue so it is ignored by stale[bot]), Triaged (Issue has been routed to backlog. This is not a commitment to have it prioritized by the team.), type: performance 🏃‍♀️ (Performance related)

Comments

robrich7 commented Jun 24, 2022

Current behavior

Hello all,

We have the problem that Cypress runs locally without problems: all tests are executed and the test run includes all test specs.
If we run the same code in a pipeline with the Docker image and pipeline integration, it doesn't work anymore.
The problem is that Cypress stops doing anything except logging CPU usage, even though there are still test specs and tests to run.
Sometimes this happens during the first test spec, sometimes during a later one; it is always different.
Via DEBUG=cypress:* npx cypress run I logged everything, and there is no error in the logs when Cypress stops working; it only logs CPU usage until the pipeline times out.
In the logs it looks like this:

cypress:server:util:process_profiler current & mean memory and CPU usage by process group:
  cypress:server:util:process_profiler ┌─────────┬───────────────────┬──────────────┬────────────────────┬────────────┬────────────────┬──────────┬──────────────┬─────────────┐
  cypress:server:util:process_profiler │ (index) │       group       │ processCount │        pids        │ cpuPercent │ meanCpuPercent │ memRssMb │ meanMemRssMb │ maxMemRssMb │
  cypress:server:util:process_profiler ├─────────┼───────────────────┼──────────────┼────────────────────┼────────────┼────────────────┼──────────┼──────────────┼─────────────┤
  cypress:server:util:process_profiler │    0    │    'Electron'     │      1       │       '256'        │   19.08    │     15.89      │  373.65  │    278.85    │   373.65    │
  cypress:server:util:process_profiler │    1    │ 'electron-shared' │      4       │ '36, 192, 37, 213' │   16.49    │     13.73      │  254.15  │    245.68    │   260.08    │
  cypress:server:util:process_profiler │    2    │     'cypress'     │      1       │        '30'        │   55.27    │     47.89      │  248.22  │    243.28    │   255.86    │
  cypress:server:util:process_profiler │    3    │     'plugin'      │      1       │       '238'        │    0.05    │      2.53      │  146.84  │    171.95    │    241.1    │
  cypress:server:util:process_profiler │    4    │     'ffmpeg'      │      1       │       '251'        │    1.25    │      1.2       │  76.43   │    72.46     │    76.43    │
  cypress:server:util:process_profiler │    5    │      'other'      │      2       │    '999, 1000'     │     0      │      0.03      │   3.4    │     3.39     │    3.45     │
  cypress:server:util:process_profiler │    6    │      'TOTAL'      │      10      │        '-'         │   92.14    │     80.68      │  1102.7  │    999.74    │   1106.26   │
  cypress:server:util:process_profiler └─────────┴───────────────────┴──────────────┴────────────────────┴────────────┴────────────────┴──────────┴──────────────┴─────────────┘ +10s

I can't tell for sure since the logs don't indicate anything, but it seems like the test runner is crashing, Chrome crashed ("Aw, Snap!"), Cypress is losing its connection to the browser, or there is a memory leak; unfortunately it is impossible to tell from the missing log entries. It is very hard to trace but happens with every test run. We have had this problem for over a year now.

If we run the Docker image locally, without the pipeline, it works as well. So it must be the interaction between the pipeline and the Docker image.

Currently we are using Cypress 10.2.0 with Chrome 100, but we also had the problem with Cypress 8.3.0.
This also happens in the Electron browser. I've tried "video": false and "numTestsKeptInMemory": 1 and 0, and our index.js in cypress/e2e/plugins looks like this:

module.exports = (on, config) => {
  on("before:browser:launch", (browser = {}, launchOptions) => {
    if (browser.name === "chrome") {
      // disable GPU acceleration in the containerized browser
      launchOptions.args.push("--disable-gpu");
      // drop the default flag so Chrome keeps using /dev/shm instead of /tmp
      launchOptions.args = launchOptions.args.filter(
        (arg) => arg !== "--disable-dev-shm-usage"
      );
    }
    return launchOptions;
  });
};
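
For reference, a minimal sketch of where the settings mentioned above would live in a Cypress 10 cypress.config.js (the values are the ones we tried; the rest of the config is assumed):

// cypress.config.js — sketch only, assuming Cypress 10+
const { defineConfig } = require("cypress");

module.exports = defineConfig({
  video: false,            // disable video recording
  numTestsKeptInMemory: 1, // also tried 0; limits snapshots kept in memory
  e2e: {
    // baseUrl, specPattern, etc. omitted here
  },
});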

Desired behavior

Cypress no longer crashes during the pipeline run, or at least emits proper logs from which you can see what the problem is.

Test code to reproduce

Since it fails on a different spec each time, it doesn't seem to be related to the actual test code.

Cypress Version

10.x.x

Other

Cypress seems to be very resource-heavy. Our local computers run with 8 CPUs and the pipeline with 6 CPUs; in my opinion it cannot be due to our hardware resources.

robrich7 (Author) commented Jun 24, 2022

@jennifer-shehane In my opinion, this is a really serious bug. I have picked out tickets that describe the same or a similar problem: #18885, #19617

@cypress-bot cypress-bot bot added the stage: investigating label Jun 24, 2022
emilyrohrbough (Member) commented:

@pontilicious Thank you for logging an issue. To confirm this hanging is indeed related to the browser crashing or hanging, can you run DEBUG=cypress* to capture the debug & verbose debug logs and share them here?

I was able to produce the hang when Chrome crashes, but want to double-check that this truly is a Chrome issue.

robrich7 (Author) commented Jul 1, 2022

@emilyrohrbough I have an older log file here. Since the problem seems to have existed since Cypress 4.2, it should still serve as a reference.
You said it was a problem in Chrome; I suspect it affects all Chromium-based browsers, because we also have the problem in the Electron browser.
log.txt

Light07 commented Aug 17, 2022

We encountered exactly the same issue as you; it has lasted for several builds, and now it has gotten worse and the logs don't help at all. Is there anyone who could help with this issue?

robrich7 (Author) commented:

> We encountered exactly the same issue as you; it has lasted for several builds, and now it has gotten worse and the logs don't help at all. Is there anyone who could help with this issue?

#22631

@mschile mschile added triage and removed triage labels Aug 18, 2022
Light07 commented Aug 22, 2022

Thanks @robrich7! I read this post, yet it seems there is nothing we can do except wait for the Cypress team's fix.

robrich7 (Author) commented:

> Thanks @robrich7! I read this post, yet it seems there is nothing we can do except wait for the Cypress team's fix.

Unfortunately, you are right.

amcsi commented Sep 8, 2022

I'm also seeing Cypress runs hang in CI. When running locally, the browser crashes regardless of whether it's Chrome or Electron. The crash happens around (but not exactly at) the same part of the test.

Here's some of the output thanks to DEBUG=cypress*:

  cypress:server:socket-base socket-disconnecting transport close +2m
  cypress:server:socket-base socket-disconnect transport close +0ms
  cypress-verbose:server:browsers:cri-client:recv:[<--] received CDP message { method: 'Inspector.targetCrashed', params: {} } +296ms
  cypress:server:util:socket_allowed allowed socket closed, removing { localPort: 37912 } +5s

Node version: 18.8.0
Cypress version: 10.7.0
numTestsKeptInMemory=5 did not help at all.

One thing that may be relevant: the test being run is very long and involves a lot of clicking around and navigating the site, loading a lot of data along the way. The test takes around 5-10 minutes to complete.
I'm not getting the same issue with another test file that covers a similar set of steps but is split into 10-20 tests rather than being one single test; a sketch of that split follows below.
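
A rough sketch of what that split could look like (the test names here are hypothetical); with several smaller it() blocks, Cypress can drop snapshots and command-log entries from earlier tests (subject to numTestsKeptInMemory) instead of accumulating them across a single 5-10 minute test:

// hypothetical split of one long scenario into smaller tests
describe("long scenario", () => {
  it("logs in and opens the dashboard", () => {
    cy.visit("/login");
    // ...login steps...
  });

  it("navigates and loads the data-heavy pages", () => {
    // ...clicking around and navigating the site...
  });

  it("verifies the final state", () => {
    // ...final assertions...
  });
});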

abezzubets commented:

We also started getting this issue a couple of months ago.
Now it has become even worse: every run some of the tests hang, and almost every time it is a different test.

CI: CircleCI
Docker Image: cimg/node:14.20-browsers
Node: 14.20
Cypress: 10.8.0
Chrome: 105

Errors from DEBUG=cypress:*

<--- JS stacktrace ---> +56s
  cypress:launcher:browsers chrome stderr: [0914/090841.552019:ERROR:v8_initializer.cc(730)] V8 process OOM (Oilpan: Reserving memory.). +1ms
  cypress:launcher:browsers chrome stderr: [0914/090841.664210:ERROR:file_io_posix.cc(144)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[0914/090841.664381:ERROR:file_io_posix.cc(144)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2) +112ms
  cypress:launcher:browsers chrome stderr: [0914/090841.667434:ERROR:directory_reader_posix.cc(42)] opendir /tmp/Crashpad/attachments/6065a18c-d8b9-45e4-99b1-2d86220f0d9d: No such file or directory (2) +3ms
  cypress:server:util:socket_allowed allowed socket closed, removing { localPort: 60208 } +7s
  cypress:server:util:socket_allowed allowed socket closed, removing { localPort: 60226 } +203ms
  cypress:server:util:process_profiler current & mean memory and CPU usage by process group:
  cypress:server:util:process_profiler ┌─────────┬───────────────────┬──────────────┬───────────────────────────────────────────────────────┬────────────┬────────────────┬──────────┬──────────────┬─────────────┐
  cypress:server:util:process_profiler │ (index) │       group       │ processCount │                         pids                          │ cpuPercent │ meanCpuPercent │ memRssMb │ meanMemRssMb │ maxMemRssMb │
  cypress:server:util:process_profiler ├─────────┼───────────────────┼──────────────┼───────────────────────────────────────────────────────┼────────────┼────────────────┼──────────┼──────────────┼─────────────┤
  cypress:server:util:process_profiler │    0    │     'Chrome'      │      8       │ '1517, 1521, 1522, 1528, 1541, 1529 ... 2 more items' │    3.32    │      4.21      │ 4100.73  │   2397.43    │   4312.32   │
  cypress:server:util:process_profiler │    1    │     'cypress'     │      1       │                        '1093'                         │    0.17    │      0.45      │  172.5   │    239.27    │   297.34    │
  cypress:server:util:process_profiler │    2    │     'plugin'      │      1       │                        '1293'                         │     0      │      0.02      │  159.61  │    169.95    │   516.51    │
  cypress:server:util:process_profiler │    3    │     'ffmpeg'      │      1       │                        '1595'                         │    0.19    │      0.51      │  130.23  │    138.1     │   142.32    │
  cypress:server:util:process_profiler │    4    │ 'electron-shared' │      4       │               '1095, 1247, 1096, 1313'                │     0      │       0        │  79.53   │    192.96    │   198.76    │
  cypress:server:util:process_profiler │    5    │      'other'      │      2       │                     '3314, 3315'                      │     0      │       0        │   3.41   │     3.44     │    3.54     │
  cypress:server:util:process_profiler │    6    │      'TOTAL'      │      17      │                          '-'                          │    3.68    │      5.07      │   4646   │   3071.78    │   5047.8    │
  cypress:server:util:process_profiler └─────────┴───────────────────┴──────────────┴───────────────────────────────────────────────────────┴────────────┴────────────────┴──────────┴──────────────┴─────────────┘ +10s
  cypress:server:util:socket_allowed allowed socket closed, removing { localPort: 60116 } +1s
  cypress:server:util:socket_allowed allowed socket closed, removing { localPort: 47730 } +23ms
  cypress:server:util:socket_allowed allowed socket closed, removing { localPort: 47740 } +0ms
  cypress:server:socket-base socket-disconnecting transport close +2m
  cypress:server:socket-base socket-disconnect transport close +1ms
  cypress:server:util:socket_allowed allowed socket closed, removing { localPort: 48428 } +6ms

and then the log is full of the following recurring message until CircleCI fails the job with a no-output timeout:

cypress:server:util:process_profiler current & mean memory and CPU usage by process group:
  cypress:server:util:process_profiler ┌─────────┬───────────────────┬──────────────┬───────────────────────────────────────────────────────┬────────────┬────────────────┬──────────┬──────────────┬─────────────┐
  cypress:server:util:process_profiler │ (index) │       group       │ processCount │                         pids                          │ cpuPercent │ meanCpuPercent │ memRssMb │ meanMemRssMb │ maxMemRssMb │
  cypress:server:util:process_profiler ├─────────┼───────────────────┼──────────────┼───────────────────────────────────────────────────────┼────────────┼────────────────┼──────────┼──────────────┼─────────────┤
  cypress:server:util:process_profiler │    0    │     'Chrome'      │      7       │ '1517, 1521, 1522, 1528, 1541, 1529 ... 1 more items' │   94.06    │      6.51      │  171.32  │   2340.35    │   4312.32   │
  cypress:server:util:process_profiler │    1    │     'plugin'      │      1       │                        '1293'                         │     0      │      0.02      │  159.61  │    169.68    │   516.51    │
  cypress:server:util:process_profiler │    2    │     'cypress'     │      1       │                        '1093'                         │    0.14    │      0.44      │  151.41  │    237.07    │   297.34    │
  cypress:server:util:process_profiler │    3    │     'ffmpeg'      │      1       │                        '1595'                         │     0      │      0.5       │  130.23  │    137.9     │   142.32    │
  cypress:server:util:process_profiler │    4    │ 'electron-shared' │      4       │               '1095, 1247, 1096, 1313'                │     0      │       0        │  81.79   │    190.18    │   198.76    │
  cypress:server:util:process_profiler │    5    │      'other'      │      2       │                     '3351, 3352'                      │     0      │       0        │   3.4    │     3.44     │    3.54     │
  cypress:server:util:process_profiler │    6    │      'TOTAL'      │      16      │                          '-'                          │    94.2    │      7.3       │  697.75  │   3012.43    │   5047.8    │
  cypress:server:util:process_profiler └─────────┴───────────────────┴──────────────┴───────────────────────────────────────────────────────┴────────────┴────────────────┴──────────┴──────────────┴─────────────┘ +10s

abezzubets commented:

@robrich7 Could you please try disabling the Command Log:
https://docs.cypress.io/guides/references/troubleshooting#Disable-the-Command-Log

It helped me to solve the issue with hanging tests.
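
For anyone looking for the concrete switch: per the linked troubleshooting page it is the NO_COMMAND_LOG environment variable. A minimal sketch (assuming Cypress 10+; it can also be passed on the command line as CYPRESS_NO_COMMAND_LOG=1):

// cypress.config.js — sketch only
const { defineConfig } = require("cypress");

module.exports = defineConfig({
  env: {
    NO_COMMAND_LOG: 1, // hide the Command Log to reduce runner memory usage
  },
});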

PeterDekkers commented:

On top of disabling the Command Log, disabling video recording has helped me to greatly reduce the frequency of hanging tests.

robrich7 (Author) commented:

> @robrich7 Could you please try disabling the Command Log: https://docs.cypress.io/guides/references/troubleshooting#Disable-the-Command-Log
>
> It helped me to solve the issue with hanging tests.

Yes, this is one way to avoid the crashes, but in that case many of my tests fail, because certain values I need for my tests are no longer passed. So this is not a viable solution for me.

jdimmerman commented:

I know that this isn't a solution for everyone, but disabling the command log fixed this issue for me. I'd love to understand how we can relieve the memory constraint another way in order to save these logs for troubleshooting later.

todd-miller commented Oct 4, 2022

> On top of disabling the Command Log, disabling video recording has helped me to greatly reduce the frequency of hanging tests.

I had some luck doing this, as well as turning off parallel running and manually splitting my runs over different spec files. This helps because when something does time out, re-running the failed specs works pretty well, whereas when running in parallel I don't seem to have that ability.

Edit: I have also disabled the Command Log (it did not solve the issue) and turned on debug logging.

robrich7 (Author) commented Nov 1, 2022

In the last update to 10.11.0, a bugfix was included (#6170): Cypress now shows when the browser crashes and then continues with the next test spec (screenshots omitted). This is a good temporary workaround, but the browser still crashes on every test spec run, which is just not satisfying.

We use a Tekton Pipeline with the following resource settings:

resources:
  requests:
    memory: 2000Mi
    cpu: 1500m
  limits:
    memory: 2500Mi
    cpu: 3500m

and the following code in the index.js:

module.exports = (on, config) => {
  on("before:browser:launch", (browser = {}, launchOptions) => {
    if (browser.name === "chrome") {
      launchOptions.args.push("--disable-gpu");
      launchOptions.args = launchOptions.args.filter(
        (item) => item !== "--disable-dev-shm-usage"
      );
    }
    return launchOptions;
  });
};

EDIT: I think the problem is with the Docker image for Cypress 10.11.0, which uses Chrome 105, and Chrome 105 has the OOM problem (#23391). Maybe it will get better once a Docker image with Chrome 107+ is available.

robrich7 (Author) commented Nov 2, 2022

Tested it with the cypress/browsers:node16.18.0-chrome107-ff106-edge image and Cypress 10.11.0. On the first test spec the Chrome browser crashed. It would have been too good to be true 😀

robrich7 (Author) commented Nov 8, 2022

I have found a solution that works for me.
I use a Docker image (Docker Images) with a Chrome version below 100; in my case Chrome 99, via the image cypress/browsers:node16.14.0-chrome99-ff97.
On top of that I install Cypress 10.11.0 for the test run in the pipeline. I also tried Docker images with Chrome 100-103, but with those the memory problems occurred again and specs broke.
With this setup the tests run through and there are no more memory problems during the test runs.

nagash77 (Contributor) commented:

Possibly related to #23391, which is being actively worked on. We will double-check whether this issue is resolved when work on the linked ticket is complete.

@nagash77 nagash77 added the prevent-stale label Apr 3, 2023
@nagash77 nagash77 added E2E, Triaged and removed routed-to-e2e labels Apr 19, 2023
jennifer-shehane (Member) commented:

Since this issue hasn't had activity in a while, we'll close the issue until we can confirm this is still happening. Please comment if there is new information to provide concerning the original issue and we'd be happy to reopen.

sk25469 commented Feb 12, 2024

I am facing the same issue with cypress-included-9.7.0. I am running tests inside a Docker container with the Chrome browser in headless mode, and the run goes until a certain point and then gets stuck. I tried disabling the Command Log but it didn't solve the issue. Are there any known resolutions for this?

sk25469 commented Feb 13, 2024

> I am facing the same issue with cypress-included-9.7.0. I am running tests inside a Docker container with the Chrome browser in headless mode, and the run goes until a certain point and then gets stuck. I tried disabling the Command Log but it didn't solve the issue. Are there any known resolutions for this?

I changed my Chrome version to < 100, and it is working now. So I think that is a workable solution for this issue.
