Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With millions of files, process_lint() causes shell to segfault #236

Closed
Nemykal opened this issue Jun 9, 2017 · 8 comments
Closed

With millions of files, process_lint() causes shell to segfault #236

Nemykal opened this issue Jun 9, 2017 · 8 comments
Labels

Comments

@Nemykal
Copy link

Nemykal commented Jun 9, 2017

Hi,

I run rmlint on Arch and noticed it recently got upgraded to 2.6.0.

When I run rmlint -c sh:hardlink -g /storage/backups on a directory that has millions of files underneath it completes without error... but when I run the produced rmlint.sh script, it segfaults since the new version.

e.g. rmlint.sh[4460]: segfault at 7ffc011d4ff0 ip 00007f25cf2c942c sp 00007ffc011d4fe0 error 6 in libc-2.25.so[7f25cf24e000+19c000]

I got the same crash with the dry-run setting. I also tried using busybox sh and zsh - both got the same segfault.

This is a pretty weird one because when I looked into the coredump, bash dies during a call to malloc():

                #0  0x00007fcf842c942c _int_malloc (libc.so.6)
                #1  0x00007fcf842cafb8 malloc (libc.so.6)
                #2  0x000000000047224e xmalloc (bash)
                #3  0x000000000043ca91 copy_command (bash)
                #4  0x000000000043cbe8 copy_command (bash)
                #5  0x000000000043cbe8 copy_command (bash)
                #6  0x000000000043cbe8 copy_command (bash)
                #7  0x000000000043cbe8 copy_command (bash)
                #8  0x000000000043cbe8 copy_command (bash)
... (same line repeated until #63)

I tried this on an Alpine Linux system (i.e. not using glibc's malloc) and got the same crash, so I think it is probably some kind of posix limitation on stuff in a single function. To be fair, there's a lot:

$ wc -l rmlint.sh
14287545 rmlint.sh

It's definitely a regression because I tried reverting back to rmlint 2.4.6 and confirmed the issue doesn't happen. Moving the commands into a function appears to be the cause.

@SeeSpotRun
Copy link
Collaborator

SeeSpotRun commented Jun 9, 2017

Looks like a bash problem.

#!/bin/bash

do_something() {
    :
}

call_something_lots() {
    do_something
    do_something
    do_something
    # (repeat 31817 times)
}

call_something_lots

...that works ok but if I add one more call to do_something (ie 31818 calls) then it segfaults. If I change the shebang line to #!/bin/zsh then it works fine, even with a million calls.

So bash does something odd with stack allocation when calling too many functions from within another function. But only if they are discrete function calls; the following doesn't segfault:

 #!/bin/bash
 
 do_something() {
     :
 }
 
 call_something_n_times() {
     for i in $(seq 1 $1); do
         do_something
     done
 }
 
 call_something_n_times 1000000

Frustrating.

Edit: no limitation found with #!/bin/mksh or #!/bin/dash either.

@sahib
Copy link
Owner

sahib commented Jun 10, 2017

Wow, this bug report resides in my top 10 of weird stuff happening...

@SeeSpotRun: Told you not to use a function. 😂 (j/k)

There are two possible solutions here:

  • Revert back to not using a function (i.e. seek back in script to fill the progress info)
  • Always use dash since it's anyways the shell we expect. Some systems like arch do not seem to install it by default (/bin/sh links to bash), while some other systems might not even call dash (but sh).

I would prefer solution 1 since it's stupid but should work somewhat reliably.

Any thoughts on this?

@sahib sahib added the bug label Jun 10, 2017
@SeeSpotRun
Copy link
Collaborator

I don't particularly like either solution but think the first is preferable to the second.

Other options:

  • Abandon progress indicator in the shell script; implement progress indicator in python script
  • Write file count and end of script make the script parse itself to find total count for progress indicator
  • Write shell script at end of rmlint rather than concurrently with duplicate finding

@SeeSpotRun
Copy link
Collaborator

btw have filed this: http://savannah.gnu.org/support/?109328

Not sure if that's the right place...

@sahib
Copy link
Owner

sahib commented Jun 10, 2017

Other options:

Abandon progress indicator in the shell script; implement progress indicator in python script

The python script is not widely used sadly. Matter of defaults I guess...

Write file count and end of script make the script parse itself to find total count for progress indicator

That feels even hackier than seek-back-fix to me...

Write shell script at end of rmlint rather than concurrently with duplicate finding

This would mean to cache either all RmFiles or all shell script lines in the formatter.
Something I'd like to avoid for big runs...

btw have filed this: http://savannah.gnu.org/support/?109328
Not sure if that's the right place...

Cool! 👍 You might want to also include the info there you'll get from running bashbug. For me:

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' -DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib  -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' -DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' -DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS -Wno-parentheses -Wno-format-security
uname output: Linux werkstatt 4.10.13-1-ARCH #1 SMP PREEMPT Thu Apr 27 12:15:09 CEST 2017 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 12
Release Status: release

@sahib
Copy link
Owner

sahib commented Jun 10, 2017

This issue should be solved as of f546812.
A big run ran through for me on /usr but I'd be happy if someone else could verify.

@Nemykal
Copy link
Author

Nemykal commented Jun 11, 2017

Yep. Just tried the latest development branch - the script now runs properly using bash. I guess Busybox also needs a bug report for their shell implementation.

@sahib
Copy link
Owner

sahib commented Jun 11, 2017

That's good to hear. I'll add a note for myself to open a bug report on ash...
I think the original issue can be closed then. The fix will be in the 2.6.1 bugfix release soon to be released.
Please re-open if any problems persist.

@sahib sahib closed this as completed Jun 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants