nix-copy-closure invoked by nixops runs out of memory (OOM) on low mem systems #38808

Closed
elitak opened this issue Apr 11, 2018 · 11 comments
Labels
2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md

Comments

@elitak
Contributor

elitak commented Apr 11, 2018

Issue description

When I run nixops deploy, some of my smaller systems (VPSes with 1 GB of RAM) fail because the nix-copy-closure step runs out of memory. I've monitored the process and it does indeed consume over a GB while working. Adding some swap lets me work around the issue.

I never had this problem on the master branch a month or two ago. Why does nix-copy-closure suddenly consume so much memory? Why does it do so without backing off, or without offering a setting to limit its allocations? Is this a simple unintended regression? I can't understand why it would even need this much; it should be buffering compressed streams to disk, then unpacking them. That takes almost no memory.
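
To illustrate the expectation above that unpacking a compressed stream needs almost no memory, here is a minimal Python sketch of chunked decompression. It is not Nix's implementation: the function name `unpack_stream`, the 64 KiB chunk size, and the use of gzip are all assumptions made for illustration only.

```python
import gzip
import io

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size; not taken from Nix


def unpack_stream(compressed_src, dest, chunk_size=CHUNK_SIZE):
    """Decompress a gzip stream into dest one chunk at a time.

    Returns the number of decompressed bytes written. Only about
    chunk_size bytes of payload are held in memory at any moment,
    regardless of how large the whole stream is.
    """
    written = 0
    with gzip.GzipFile(fileobj=compressed_src, mode="rb") as unpacked:
        while True:
            chunk = unpacked.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__":
    # Stand-in data; a real transfer would read from a socket or pipe
    # and write to a file on disk.
    payload = b"x" * (5 * 1024 * 1024)
    src = io.BytesIO(gzip.compress(payload))
    out = io.BytesIO()
    print(unpack_stream(src, out), "bytes unpacked")
```

The contrast is with code that reads an entire archive into one buffer before writing it out, where peak memory grows with the size of the closure being copied.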

Steps to reproduce

  1. Have low mem system (1GB or less, with no swap)
  2. Do a big nix-copy-closure --use-substitutes onto it

Technical details

  • system: "x86_64-linux"
  • host os: Linux 4.14.28, NixOS, 18.09.git.50dad060420 (Jellyfish)
  • multi-user?: yes
  • sandbox: no
  • version: nix-env (Nix) 2.0
  • nixpkgs: (master)
@andrewchambers
Contributor

andrewchambers commented Apr 12, 2018

NixOS/nix#1681 related?

@elitak
Contributor Author

elitak commented Apr 12, 2018

If nix-copy-closure invokes nix copy or uses the same code, I suppose it could be a dupe. Again though, this only started happening a month or two ago on master, so that would indicate that something changed recently with nix-copy-closure in particular.

@lheckemann
Member

This is probably caused by the upgrade to nix 2.0, and its increased memory consumption on path imports in general (not just those performed by nix copy).

@elitak
Contributor Author

elitak commented Apr 16, 2018

Where can I find the discussion on how the huge overhead is warranted? Better yet, please point me towards a nix-daemon config option that limits the threads or memory during massive parallel imports.

@lheckemann
Member

There is no good reason for this AFAIK. Comments on NixOS/nix#1681 reference a couple of commits that should fix this; they're just not in 2.0.

@nh2
Contributor

nh2 commented Jun 2, 2018

I'm using this to boot my VMs initially into 17.09 instead, which has the old nix which doesn't have memory problems: NixOS/nix#1988 (comment)

That way I can use nixops again against AWS, at least for the initial deployment (my deployment then itself puts nix 2.0 on it which still has memory issues but at least that one you can patch easily).

Right now I'm still having problems with error: out of memory. I tried the nix on commit 54b1c59 which has a couple memory fixes but it still fails when I upload stuff to the target machine via SSH (in this case, with the libvirt nixops backend). So maybe only the HTTP substituter code was fixed to stream to disk so far.

nh2 added a commit to nh2/nix that referenced this issue Jun 3, 2018
Fixes `error: out of memory` of `nix-store --serve --write`
when receiving packages via SSH (and perhaps other sources).

See NixOS#1681 NixOS#1969 NixOS#1988 NixOS/nixpkgs#38808.

Performance improvement on `nix-store --import` of a 2.2 GB cudatoolkit closure:

When the store path already exists:
  Before:
    10.82user 2.66system 0:20.14elapsed 66%CPU (0avgtext+0avgdata 4204664maxresident)k
  After:
    11.43user 2.94system 0:16.71elapsed 86%CPU (0avgtext+0avgdata   12556maxresident)k
When the store path doesn't yet exist (after `nix-store --delete`):
  Before:
    11.15user 2.09system 0:13.26elapsed 99%CPU (0avgtext+0avgdata 4204732maxresident)k
  After:
     5.27user 1.48system 0:06.80elapsed 99%CPU (0avgtext+0avgdata   12032maxresident)k

The reduction is 4200 MB -> 12 MB RAM usage, and it also takes less time.
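
The reduction quoted above can be checked from the `time` output itself. The following Python sketch parses the two "doesn't yet exist" lines from the commit message; the helper names are hypothetical, and the elapsed-time parser only handles the `m:ss.cc` form seen here.

```python
import re

# The "store path doesn't yet exist" measurements from the commit message.
BEFORE = ("11.15user 2.09system 0:13.26elapsed 99%CPU "
          "(0avgtext+0avgdata 4204732maxresident)k")
AFTER = ("5.27user 1.48system 0:06.80elapsed 99%CPU "
         "(0avgtext+0avgdata   12032maxresident)k")


def max_resident_kb(time_line):
    """Extract the maxresident figure (in KB) from a GNU time summary."""
    match = re.search(r"(\d+)maxresident", time_line)
    if match is None:
        raise ValueError("no maxresident field found")
    return int(match.group(1))


def elapsed_seconds(time_line):
    """Extract wall-clock time from an m:ss.cc elapsed field."""
    match = re.search(r"(\d+):(\d+\.\d+)elapsed", time_line)
    if match is None:
        raise ValueError("no m:ss.cc elapsed field found")
    return int(match.group(1)) * 60 + float(match.group(2))


if __name__ == "__main__":
    before_kb = max_resident_kb(BEFORE)
    after_kb = max_resident_kb(AFTER)
    print(f"peak RSS: {before_kb} KB -> {after_kb} KB "
          f"(about {before_kb / after_kb:.0f}x less)")
    print(f"elapsed: {elapsed_seconds(BEFORE):.2f}s -> "
          f"{elapsed_seconds(AFTER):.2f}s")
```

Dividing the two KB figures by 1000 gives roughly the "4200 MB -> 12 MB" stated in the commit message.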
@nh2
Contributor

nh2 commented Jun 3, 2018

Here's a nix PR that works for my use case: NixOS/nix#2206

For convenience, here is how to try the patch out: NixOS/nix#2206 (comment)

@nh2
Contributor

nh2 commented Jun 27, 2019

I have backported @edolstra's memory fixes to Nix 2.0.4 (because I'm still using that in one place):

NixOS/nix@2.0.4...nh2:nh2-2.0.4-issue-1681-cherry-pick

Note this fixes the case where the machine that's running nixops runs out of memory.

@nh2
Contributor

nh2 commented Jun 27, 2019

@elitak Do you still observe issues with nix 2.2?

With the above backport to 2.0.4, I highlight some of the commits that solved this issue for me. Note that this concerns the RAM of the source machine I run nixops on, not of the target machines I deploy to; for those there are some other fixes in the history, and the target machines must also be running nix 2.2.

Related: Asking whether people still observe problems in NixOS/nix#1681 (comment)

@stale

stale bot commented Jun 2, 2020

Thank you for your contributions.

This has been automatically marked as stale because it has had no activity for 180 days.

If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.

Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 2, 2020
@peterhoeg
Member

This is no longer the case with 20.03.

5 participants