-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace AST Vmalloc with standard malloc #396
Comments
I just found core dumps with |
You should be able to use system malloc() functions by setting |
Maybe, but that doesn't really address the issue. We need to eliminate the dependency on the AST Vmalloc code. No one is going to actively maintain that subsystem. It is a potential source of bugs. It is unlikely to be optimized for new architectures like ARM or PPC. It may not interact well with tools like Valgrind and Asan. It is a complicated interface that no one outside of the AST authors is familiar with. The fact that some of its APIs which mirror the UNIX malloc APIs also zeroes the allocated memory while the UNIX malloc version does not is a source of confusion and bugs. Vmalloc has to be eliminated as a ksh dependency. I modified the Some piece of code is incorrectly modifying a buffer that appears adjacent to a |
In theory, yes. But that results in this compilation failure (on macOS):
And despite many hours of trying to figure out why replacing vmalloc with standard malloc causes the |
Jeebus H. Christ! Investigating this problem keeps revealing more bogosities. Like the fact that the macro Also, Too, given that the macro
Note the |
Why should anybody be interested? You are acting like a one-man wrecking crew.. You are doing a great job in assassinating the software legacy of David Korn! |
There's a use after free bug in the code. I added a lot of debug Namval_t *nq = shp->last_table, *mp = (Namval_t *)np->nvenv; Even when Running the test on macOS with env var P.S., David Korn, while I respect him as much as I do Edsger W. Dijkstra or Donald Knuth, is not a god. He did not write all of the AST code. Like all software engineers he sometimes makes mistakes. Removing dependencies on things like the AST Vmalloc subsystem is not "assassinating the software legacy of David Korn". Please have some perspective. If you want to use and maintain AST Vmalloc you're welcome to do so in the |
If I stub out the
And shortly thereafter the address is used in if (!nv_isattr(np, NV_MINIMAL | NV_EXPORT) && np->nvenv) {
Namval_t *nq = shp->last_table, *mp = (Namval_t *)np->nvenv; Because the structure pointed to by address 0x7fe5f4c146f0 wasn't freed due to my stubbing out |
More info: I augmented the Whether the bug is in |
@fpmurphy, Are you interested in helping fix the "use after free()" bug I've found that happens to be masked by the behavior of the AST Vmalloc subsystem? If not is your recommendation to just continue using Vmalloc which masks the bug so that it manifests less often which makes it even harder to debug? |
I removed the "bug" label because the bug has nothing to do with removing the |
@jelmd and @fpmurphy: Do you think we should retain the dependency on AST Vmalloc? If so, why? If we should retain that dependency are you, or your colleagues, willing to add the equivalent of the GNU |
My two cents on the continued use of VMALLOC. Multithreading can be a high no-joke environment. Either core dumps or hangs (on locks) happen in a flash if the correct coding procedures are not followed. As far as I know, the main KSH thread itself performs |fork(2)|s without always following immediately on w/ an |execxxx(2)|. It seems to do this sort of thing when creating sub-shells (or other reasons also). A potential conflict can arise if a background thread touches any of the MALLOC interface subroutines while the main KSH executes its |fork2)|. This is OK if (and it is a big IF) the correct fork-safety procedures are followed in any library facilities (including any MALLOC subsystem) that guard themselves for multithread-safety by using any lock mechanisms (usually a mutex of some sort). Also, some system (LIBC or other) calls that call |fork(2)| may be made (due to legacy library use) from any thread in the process and then its child might go on to touch a MALLOC interface subroutine (because it has always done so in the past) assuming already that the MALLOC subsystem was both multithread-safe and fork-safe. This may be due to legacy system library code that had assumed that its own system MALLOC facility was in place (or at least that some multithread-safe and fork-safe MALLOC facility had replaced it). In general, it was not a good idea to replace a multithread-safe and fork-safe system library MALLOC facility with one that was neither. The MALLOC facility is just too central to the whole of user-space code for anyone to play around with it lightly (making it either not multithread-safe or not fork-safe, by accident or otherwise). The short of it is that either VMALOC should be made both multithread-safe and fork-safe if it is to be continued OR any applications with multithreaded built-ins or background threads will have to continue to compile out VMALLOC in the future just as we have had to do so in the past. In my opinion, KSH should be changed to have a multithread-safe and fork-safe MALLOC facility by default. In this way both single-threaded and multi-threaded built-ins, and background threads, can be used without having to recompile KSH itself (getting rid of VMALLOC) for the multithreaded cases. Making VMALLOC both multithread-safe and fork-safe can certainly be done, but I am just much more comfortable with using the default system MALLOC facility which has already been acid tested in heavy multithreaded and fork environments. In related matters, is VMALLOC really faster, or so much more faster, than the latest default system MALLOC facilities? Default UNIX system MALLOC facilities have generally been carefully tuned to provide the best trade-offs in both speed and memory usage (or waste) for the most general user work-load cases. Of course, "faster" does not help anybody when your process group (collection of cooperating processes) either core dumps or hangs! Enough said (for now). |
@DavidMorano, It is very unlikely that AST vmalloc is faster or otherwise better than the native malloc implementation. As you note the malloc subsystem provided by most distros is already highly optimized. Also, because it will be a dependency of 99.9% of the programs on a given distro it is a safe bet it is bug free. And if a problem is found a huge number of people will pounce on the problem and it will be fixed quickly. Which is not going to be true for AST vmalloc. It is also unlikely that AST vmalloc will be optimal on a new architecture like the open source RISC-V that was created seven years ago. In fact, there is a good chance that due to the age of the AST vmalloc code it will actually behave incorrectly, or at least sub-optimally, on new CPU architectures like RISC-V. Too, So I ask again: @fpmurphy and @jelmd, Can you make even a marginally convincing argument that AST vmalloc should remain a |
@krader1961 wrote:
Yes, I very much agree with this. KSH should be fixed to work correctly with any MALLOC subsystem, and not just with AST VMALLOC. In the interest of full disclosure, since I am not using the AST VMALLOC subsystem in my KSH (having compiled it out for the native MALLOC subsystem instead), I have an ulterior motive for KSH to work correctly without using the AST VMALLOC. But regardless, I would hope that everyone has an interest in seeing all bugs removed from KSH whether or not the bugs are readily manifesting bad problems at the moment with the present use of VMALLOC. It could be that some future order of heap allocations could cause existing bugs to manifest worse behavior: core dumps or just subtly incorrect behavior. |
@DavidMorano, Do you run the ksh unit tests in your build that uses the platform malloc rather than AST vmalloc? I ask because my mechanical transformation of the code (rather than relying on things like |
It has been some years now, so I may not remember correctly, but:
|
Gah! Building with
on macOS results in building the
Which suggests that trying to build the project with standard (i.e., platform) malloc is broken on anything other than SysV (i.e., ATT) or Linux platforms. It definitely means that the existing |
Yes, I built on a SysV (AT&T) derived system. I do not really remember now, but I may have hacked up more than a single #if somewhere in order to get VMALLOC out of the build, obsoleting -D_AST_std_malloc=1. It was about maybe 15 years or more since I first hacked out VMALLOC, so I do not remember what I even did at the time (and had carried forward since). |
FWIW, I was able to build |
In light of the recent work by @siteshwar to fix the use-after-free bugs I resurrected my commit from February to replace Vmalloc with the system malloc. Imagine my surprise when the
The second I doubt this is the only example of a unit test that is treating broken behavior as correct behavior. |
Regarding my previous comment about the topen.c unit test there is another bug. A use-after-free bug. That's because the |
And, not surprisingly, there are other tests, like src/cmd/tests/sfio/ttmpfile.c, which have the same mistake. |
All the tests are now passing on Linux except one: |
You may have noticed that while
That is from this block in src/cmd/ksh93/tests/signal/sigtst1:
Interestingly with this change from Vmalloc to the system malloc those errors disappear. In their place the |
I have no doubt this change will reveal new bugs. The latest Coverity scan now identifies several previously undiagnosed memory leaks because we're no longer using the AST Vmalloc API (which Coverity doesn't know is a replacement for stdlib malloc). On the other hand this will make debugging easier and eliminate one source of leaks. That source being a dynamically allocated buffer returned by a libc function. Prior to this change the |
There is no good reason that
ksh
and related commands should depend on the AST Vmalloc subsystem. In theory just doing-D_std_malloc
when building should use the standard malloc subsystem. But that obviously hasn't been tested in several years since it doesn't build.I did all the mechanical transformations necessary to use standard malloc rather than vmalloc. Only one unit test fails with that change which was not failing before:
sh_match
(all three variants). The failure only happens sporadically. The failure is a SIGSEGV in this statement from line 3177 in src/cmd/ksh93/sh/name.c:Obviously that should never result in a SIGSEGV unless the data structure has been corrupted. Capturing a core file shows that
mp->nvname
is sometimes an invalid address such as 0x8. In fact, every core file I've examined has that pointer set to 0x8 (which is the ASCII BS character). Without a doubt there is an off by one bug somewhere that is causing memory to be corrupted. Something likely to be papered over by the Vmalloc subsystem if it pads the blocks differently than the native malloc subsystem or otherwise alters how dynamically allocated blocks are arranged.The text was updated successfully, but these errors were encountered: