
use jemalloc's default prefix #18678

Closed
wants to merge 1 commit into from

Conversation

thestinger
Contributor

By default, jemalloc only uses a prefix on OS X, iOS and Windows where
attempting to replace the platform allocator is broken. The lack of a
prefix allows jemalloc to override the weak symbols from the C standard
library to replace the system allocator.

This results in a performance improvement for memory allocation in C
along with reduced fragmentation. For example, the time spent on LLVM
passes in the Rust compiler on Linux is cut by 10% and peak memory usage
is reduced by 15%. Even on platforms like FreeBSD where jemalloc is the
system allocator, this eliminates the inevitable external fragmentation
and performance hit caused by using two general purpose allocators.

Closes #18676

@rust-highfive
Collaborator

Warning

  • These commits modify unsafe code. Please review it carefully!

@brson
Contributor

brson commented Nov 6, 2014

I'm a bit wary of this. Replacing malloc is pretty sneaky and I can imagine it causing unintended side effects (I feel like this has come up before - the first time we integrated jemalloc we did not use a custom prefix and had malloc/free mismatches). I'd like to get some other feedback.

@thestinger
Contributor Author

Replacing malloc is pretty sneaky

It's explicitly supported by the C standard library implementations, which go out of their way to use weak symbols for the platform allocator. It's the normal way to use jemalloc and tcmalloc and it isn't a hack or sneaky at all.

the first time we integrated jemalloc we did not use a custom prefix and had malloc/free mismatches

It defaults to a je_ prefix on Windows and OS X so the problems that were encountered there were being wrongly attributed to jemalloc. It wasn't being used on those platforms at all.

@thestinger
Contributor Author

Rust is causing undefined behaviour by opting out of replacing the system allocator but not disabling jemalloc's support for dss. It's not well-defined to call brk / sbrk from outside of malloc.

@thestinger thestinger added I-slow Issue: Problems and improvements with respect to performance of generated code. A-libs labels Nov 6, 2014
@alexcrichton
Member

I'm also somewhat wary of this, but it has been quite some time since the bugs we had last. (cc #9925 and #9933). A lot has changed about the runtime/compiler since then, particularly in the form of how we link all our libraries together. We're also not ourselves calling malloc or free any more, but rather some of the more flavorful jemalloc functions.

I'd be willing to give this a shot as it's pretty easy to undo, I'd just want to keep a close eye on the queue for the next few days. Changes like this often have quite surprising fallout. The wins seems significant enough that it may be worth just trying out.

@brson, does that sound ok to you?

@richo
Contributor

richo commented Nov 6, 2014

It seems that, all things considered, given that Rust itself will still use jemalloc under the hood for allocation, the key change here would be for foreign code linked into a Rust binary? Not suggesting this is bad, but I wanted to check I was reading this correctly.

@thestinger
Contributor Author

@richo: Yes, this means code using the C allocator API (like LLVM) will make use of jemalloc.

Mixed allocator usage is pretty much the definition of external fragmentation as unused capacity in one allocator is not usable by the other. It also means that virtual memory is fragmented into smaller spans managed by either allocator which hurts large allocations / reallocations. Finally, it means that it's not possible to have jemalloc safely use dss (which has the potential to speed up reallocations).

@brson
Contributor

brson commented Nov 6, 2014

@thestinger If it is a problem for jemalloc to be calling sbrk then we could change it to use mmap if we had to. Do you have a source for your assertion that the current behavior is not ok, and do you know why it is working anyway on all platforms?

Are you aware of precedent in other language stacks for overriding the C allocator by default, e.g. does Java override malloc, python?

If we were to do this then it would be a Linux-specific behavior for a major subsystem.

@thestinger
Contributor Author

Do you have a source for your assertion that the current behavior is not ok, and do you know why it is working anyway on all platforms?

It's not okay to call sbrk / brk from outside of the malloc implementation. The default for jemalloc is dss:secondary so it will only be called when mmap fails. That's an edge case, but that doesn't make it okay - jemalloc defends against races expanding the heap by leaking, but it can't do anything about a race shrinking the heap. It's possible to set dss:primary at runtime which should be expected to work but it will not be sane with how Rust is configuring jemalloc.

http://lifecs.likai.org/2010/02/sbrk-is-not-thread-safe.html (or see the man pages)

Are you aware of precedent in other language stacks for overriding the C allocator by default, e.g. does Java override malloc, python?

I'm not aware of a language shipping an alternative allocator implementation like Rust. I'm also not aware of a language implementation providing features like reallocate_inplace or sized deallocation (C++14 has sized deallocation, but implementations don't leverage it). I would think that if another language was providing their own allocator, they would want to eliminate external fragmentation caused by a secondary general purpose allocator in the same process. Replacing the platform allocator is explicitly supported by OS X (zone allocator API) and other *nix operating systems (via weak symbols). It's how it's intended to be done by the people who wrote the standard C libraries.

If we were to do this then it would be a Linux-specific behavior for a major subsystem.

No, it's not Linux-specific. It will be doing it on all platforms that aren't explicitly blacklisted by jemalloc like FreeBSD and the Android variant of Linux. It also works on Solaris, AIX and most other *nix operating systems. The inability to replace the Windows allocator is considered an important missing feature upstream and will be fixed at some point - hopefully soon.

On OS X, jemalloc replaces the system allocator by registering itself as the default zone allocator. Rust isn't passing --disable-zone-allocator so it's already attempting to replace the platform allocator there but I don't have access to OS X to verify whether it already works.

This is the normal way to make use of jemalloc because mixing general purpose allocators is bad for performance and memory usage. On FreeBSD (and perhaps NetBSD), the system allocator is jemalloc and Rust is causing two instances of it to be used in the same process. It means external fragmentation (2 fully independent allocators), lower performance (loss of data locality, separate caches) and fragmented virtual memory.

@pcwalton
Contributor

pcwalton commented Nov 7, 2014

Java, V8, and SpiderMonkey certainly do ship their own allocators and don't replace the C allocator, although at least V8 could in theory. I'm not going to declare myself opposed to this, but I would like to understand why Rust is different from those three languages in this regard. Replacing malloc with jemalloc for the whole process seems rather intrusive for e.g. embedding scenarios. Of course it should be possible to do so for the reasons you describe, but I suspect that, for example, a Java user would not expect this behavior by default.

@thestinger
Contributor Author

Replacing malloc with jemalloc for the whole process seems rather intrusive for e.g. embedding scenarios.

I don't see any problems with this in an embedding scenario. The platform C libraries are designed with support for cleanly replacing the system allocator. Rust already provides the choice between using the C standard library API and the non-standard jemalloc API for allocation via the configure script. If it's using a bundled jemalloc then it doesn't add any bloat to have it replace the system allocator.

If Rust ever intends to use the system jemalloc as required by Debian / Fedora packaging standards (#13945), then it needs to use the default prefix because distribution packages don't add a non-default prefix to the APIs.

Rust mixing two general purpose allocators in the same process increases memory usage through external fragmentation, fragmented spans of virtual memory and poor data locality. If the system allocator is jemalloc, then what Rust is doing is strictly worse than calling into that. I think it's far less intrusive to have jemalloc use the system provided methods for replacing the platform allocator (weak symbols, OS X zone allocator API) than it is to have it live alongside it and fight with it for resources.

I'm not going to declare myself opposed to this, but I would like to understand why Rust is different from those three languages in this regard.

It uses an allocator with the same kind of design and performance characteristics as the platform allocator. Using jemalloc is justified because it provides a consistent performance profile across platforms and allows Rust to leverage advanced API features like sized deallocation, alignment support, in-place reallocation and more. However, that value proposition isn't as strong if jemalloc is going to be hamstrung with a non-default configuration flag that's significantly hurting performance and memory usage.

Java, V8, and SpiderMonkey certainly do ship their own allocators and don't replace the C allocator, although at least V8 could in theory.

Garbage collectors have drastically different performance and memory usage characteristics. Both Chromium and Firefox replace the system allocators with TCMalloc / jemalloc despite shipping those garbage collector implementations.

Of course it should be possible to do so for the reasons you describe, but I suspect that, for example, a Java user would not expect this behavior by default.

I doubt that many Java users have expectations about whether the system allocator is replaced. It wouldn't really matter if they did, because it's a low-level implementation detail with no API impact. The impact is a performance improvement, reduction in memory usage and far more accurate allocator statistics since in practice it will cover everything other than stacks (I'll get to that later...) and memory mapped files (might be interesting to manage those via the same chunk allocator).

@pcwalton
Contributor

pcwalton commented Nov 7, 2014

I think it's far less intrusive to have jemalloc use the system provided methods for replacing the platform allocator (weak symbols, OS X zone allocator API) than it is to have it live alongside it and fight with it for resources.

I'm not disagreeing that we should support this mode, but I'm questioning why it should be the default.

Both Chromium and Firefox replace the system allocators with TCMalloc / jemalloc despite shipping those garbage collector implementations.

Neither are designed for embedding.

I doubt that many Java users have expectations about whether the system allocator is replaced.

I'm not sure that's true.

It wouldn't really matter if they did, because it's a low-level implementation detail with no API impact.

What if your C app has two libraries you want to use, one written in Rust and one written in some other language that uses tcmalloc?

@brson
Contributor

brson commented Nov 8, 2014

Building for Fedora etc. using the system allocator can be done just by disabling jemalloc in the build. That point at least doesn't seem relevant here.

Likewise I don't understand the point about Rust on systems that already use jemalloc. For such systems wouldn't we want to disable our build of jemalloc and use the system allocator?

You claimed that the current configuration is 'significantly hurting performance and memory usage'. I can imagine the memory usage aspect since there are two active allocators, but how does it hurt performance? Aren't we using jemalloc because it has better performance than the system allocator?

Have you done measurements on the increased memory usage we are causing by having two active allocators? ISTM that Rust programs by default should not be calling malloc much (if at all), so the impact may not be great.

I'm still not convinced that it is Rust's place to be messing with C's allocator. If we did this then are others still able to override the allocator in the expected way or do they have to use Rust's allocator?

@thestinger
Contributor Author

Building for Fedora etc. using the system allocator can be done just by disabling jemalloc in the build. That point at least doesn't seem relevant here.

The goal would be to build it against the distribution's jemalloc package via the same non-standard API with features like sized deallocation, not the normal system allocator. It's certainly relevant here because the only choice Rust is offering right now is between lower performance in Rust with the system allocator (whether or not it is jemalloc) and inefficient mixed allocator usage.

Likewise I don't understand the point about Rust on systems that already use jemalloc. For such systems wouldn't we want to disable our build of jemalloc and use the system allocator?

We would still want to use the non-standard API for the significant performance advantages. I would expect the benefits of sized deallocation to cause a widening of the performance gap between the APIs once jemalloc gains support for arena caches.

You claimed that the current configuration is 'significantly hurting performance and memory usage'. I can imagine the memory usage aspect since there are two active allocators, but how does it hurt performance? Aren't we using jemalloc because it has better performance than the system allocator?

Mixing the allocators spreads out data more sparsely, and it's not keeping the thread caches as hot as they would be if the C memory allocations were also hitting it. Since both allocators are grabbing virtual memory via mmap, the virtual memory is fragmented between them. In jemalloc, virtual memory is reused rather than unmapped and it's important for it to be as contiguous as possible to satisfy huge allocations / reallocations via the existing memory. It also means that jemalloc can't safely make use of dss, and that would offer significant performance advantages for reallocations.

Have you done measurements on the increased memory usage we are causing by having two active allocators? ISTM that Rust programs by default should not be calling malloc much (if at all), so the impact may not be great.

Rust programs often call malloc via C libraries, such as the usage of LLVM in rustc. Switching to jemalloc for Rust allocations only reduced memory usage by a few percent, while switching for both reduces it by a total of ~20%. I haven't measured the impact of having two instances of jemalloc in the same process but it means that there are 2x the number of arenas without any concurrency benefit, as each thread is assigned to 2 independent arenas rather than 1.

I'm still not convinced that it is Rust's place to be messing with C's allocator. If we did this then are others still able to override the allocator in the expected way or do they have to use Rust's allocator?

It's still possible to override the system allocator by linking a library like TCMalloc before liballoc. However, mixing allocators in the same process is not a good idea; it would make far more sense to build Rust against the system allocator API if the intent is to use another allocator like TCMalloc.

@thestinger
Contributor Author

I think it makes sense to offer two choices:

  • use the default jemalloc symbols via the non-standard API, with the ability to use a sufficiently up-to-date distribution package if available
  • use the system allocator from Rust - eliminating the dependency on an external / internal jemalloc while hurting performance

I don't think it makes sense to offer the middle ground that Rust is currently using. It still has the disadvantage of an extra library dependency. It doesn't need to have the disadvantage of mixed allocator usage. I don't see any benefits to avoiding the normal configuration where it replaces the system allocator on platforms with support for doing so. It's a significant performance and memory usage loss relative to a default build.

@thestinger
Contributor Author

I'm not disagreeing that we should support this mode, but I'm questioning why it should be the default.

It could build against the platform allocator by default, but I don't see why this crippled build of jemalloc should be supported.

Neither are designed for embedding.

I don't think significantly hurting performance / memory usage in processes making heavy use of the C allocator is friendly to embedding. Rust supports using the platform allocator but if it's built against the non-standard jemalloc API instead then it makes sense to leverage the ability to replace the system allocator that's provided by the platform.

What if your C app has two libraries you want to use, one written in Rust and one written in some other language that uses tcmalloc?

It will work correctly. Only one of the allocators will replace the system allocator, depending on which was loaded first - that's how weak symbols work. Rust doesn't permit mixing the alloc::heap API with the C allocator API so it won't cause any problems for the C allocator to be TCMalloc. However, it's really not ideal to have multiple allocators in the same process. It would make more sense to disable TCMalloc in that language or build Rust against the platform allocator API so it also uses jemalloc. The current non-default jemalloc prefix wouldn't make things any better.

@eddyb
Member

eddyb commented Nov 8, 2014

Was the main concern about embedding the possibility that jemalloc might replace the system allocator during the runtime of a C program?
While that would be extremely dangerous, it wouldn't happen if you were to simply load a dynamic library, e.g. as a plugin.
If it's possible, it would require manual fiddling inside the GOT and PLT sections.
Maybe @thestinger could have been more clear about that, but he did just answer it.

I am not seeing other potential disadvantages being mentioned, and 20% less memory usage is huge, especially given that we get it by doing nothing, instead of sabotaging jemalloc.

@thestinger
Contributor Author

It's not possible to dlopen jemalloc because it marks the TLS variables as initial-exec to get pointer offsets instead of linker calls. So it's already not possible to load a dynamic library bundling a jemalloc-based liballoc as a plugin. The lack of a prefix doesn't really make things any different.

@thestinger
Contributor Author

@eddyb: It's ~10-15% less peak rustc memory usage from disabling the custom prefix, the 20% figure is the total win over a build using glibc's allocator everywhere.

@thestinger
Contributor Author

The Rust compiler and LLVM make a lot of small allocations so metadata overhead is a big deal. They also do a lot of repeated vector reallocations, so fragmented virtual memory doesn't help.

@pcwalton
Contributor

pcwalton commented Nov 9, 2014

I will bring this up during the meeting. My current position is that I am not in favor of doing this by default for libraries because I am concerned about surprising embedders. Sure, jemalloc may well be the best allocator ever, but it's still a somewhat antisocial thing to do. If I'm invited over to your house for dinner, and when I get there I suddenly smash your old CRT TV and replace it with a brand new flat screen, the fact that I upgraded your TV is not going to stop me from getting kicked out of your house and arrested.

Executables, however, are a separate story, because it makes sense that Rust "owns" the runtime environment of a Rust executable. So I would be fine with doing this for executables. (Note that this would address the rustc use case.)

@thestinger
Contributor Author

I don't see how it could just be done for executable crates but not library ones.

@pcwalton
Contributor

I don't see how it could just be done for executable crates but not library ones.

Well, we could just get rid of the prefix for jemalloc and only link it in for executable crates (at the top level). That way, library crates would use whatever the system allocator is (which, for Rust executable projects, would be jemalloc; for non-Rust executable projects that would be jemalloc if the embedder explicitly opts in and otherwise would be their native allocator).

So it's already not possible to load a dynamic library bundling a jemalloc-based liballoc as a plugin. The lack of a prefix doesn't really make things any different.

Wait, does this mean that it's impossible to dlopen most Rust code? This is a huge issue for embedding, and if it cannot be solved that leads me toward the conclusion that we should not use jemalloc for Rust plugins at all without the embedder opting in.

However, it's really not ideal to have multiple allocators in the same process.

The issue is that that's for the embedder to decide; we should not be making that decision for them if we're a guest in their process.

@thestinger
Contributor Author

Well, we could just get rid of the prefix for jemalloc and only link it in for executable crates (at the top level). That way, library crates would use whatever the system allocator is (which, for Rust executable projects, would be jemalloc; for non-Rust executable projects that would be jemalloc if the embedder explicitly opts in and otherwise would be their native allocator).

That won't work because we want to be calling into jemalloc's non-standard API from those libraries.

Wait, does this mean that it's impossible to dlopen most Rust code? This is a huge issue for embedding, and if it cannot be solved that leads me toward the conclusion that we should not use jemalloc for Rust plugins at all without the embedder opting in.

It's one of several reasons that dlopen of Rust code doesn't work correctly (without #![no_std]).

The issue is that that's for the embedder to decide; we should not be making that decision for them if we're a guest in their process.

Rust doesn't make the decision for them. It has a configure option for using the platform allocator API instead of jemalloc's non-standard API. I don't see any way to make the choice at runtime between the more performant jemalloc-specific API (sized deallocation, etc.) and the platform API.

@pcwalton
Contributor

That won't work because we want to be calling into jemalloc's non-standard API from those libraries.

Can't we make weak symbol shims over jemalloc's nonstandard API that call into the system allocator and are overridden by jemalloc if it's linked in?

It's one of several reasons that dlopen of Rust code doesn't work correctly (without #![no_std]).

Well, I'd prefer to fix that instead of sealing off the possibility entirely.

Rust doesn't make the decision for them. It has a configure option for using the platform allocator API instead of jemalloc's non-standard API.

Having to rebuild Rust is an awfully big sledgehammer to hit this problem with.

@pepp-cz

pepp-cz commented Nov 12, 2014

@thestinger The problem I see is that the system is already "broken" (fragile may be a better word) even for C and C++, but people are used to it. The concern here is that Rust libraries should fit into the usual brokenness boundaries. With your change the breakage may be worse.

But I realize that I drew the wrong conclusion from your previous post. If the Rust dynamic library uses only global malloc and free then it can free anything that it receives from the binary. When the alignment restrictions of malloc are stricter than the requirements of freex, then the binary can free anything it receives from the library. The rallocx may fall into the same reasoning category.

@thestinger
Contributor Author

The problem I see is that the system is already "broken" (fragile may be a better word) even for C and C++, but people are used to it. The concern here is that Rust libraries should fit into the usual brokenness boundaries. With your change the breakage may be worse.

This pull request doesn't break anything.

But I realize that I drew the wrong conclusion from your previous post. If the Rust dynamic library uses only global malloc and free then it can free anything that it receives from the binary. When the alignment restrictions of malloc are stricter than the requirements of freex, then the binary can free anything it receives from the library. The rallocx may fall into the same reasoning category.

I don't really think we're talking about the same things.

@Thiez
Contributor

Thiez commented Nov 12, 2014

@bill-myers when you are in another language and you free memory that was allocated by Rust you do not run the drop code, so that is already very very wrong (and could potentially leak a lot of resources, break invariants, etc.). Just say 'no' to freeing the memory allocated by other languages :p

@pepp-cz

pepp-cz commented Nov 12, 2014

@Thiez Of course the performance of the Rust library will be worse with only the malloc/free. The key part is the assumption that it can be done so that library uses the malloc/free interface to allocator while the binary uses mallocx/freex interface to the same allocator and nothing breaks.

The other assumption is that when you decide to use a dynamic library, you do not care about performance so much.

@thestinger
Contributor Author

Anyway, it seems clear from this thread that using the globally visible malloc and free would be slower than using the special api, even if jemalloc is behind malloc and free, so it would be preferable to avoid that. Wouldn't it be better to just document prominently that when mixing Rust code with other languages, one should not go around freeing memory that Rust has allocated, and the Rust code should do the same thing? You probably don't want to do that anyway, because when (in C, for example) you simply free a boxed value you received from Rust you do not run the destructor.

It's already documented, and is not something that would be a good idea to permit in any scenario. It's a really bad idea to mix allocators across library boundaries this way on Windows. There is often more than one "platform" allocator in the same process there and it varies across the library boundaries. You need to free stuff with the API provided by the library / language itself.

In general, it's not something that should be permitted because it would tie the hands of Rust in the future. It would always need to use an allocator stack fully interchangeable with "the" (which?) platform allocator if that was a guarantee. It would hamstring future improvements by forcing the whole allocator to be based around a legacy API.

@bill-myers
Contributor

Also keep in mind that some platform-specific C libraries might have broken code that relies on unspecified behavior of the system malloc, such as the way it aligns allocations, the way it rounds sizes up and the portion of address space it uses (esp. where it is located relative to the 2GB/4GB limit).

If the system allocator is replaced with jemalloc, then jemalloc has to provide those same guarantees, which might possibly result in wasting memory if the system allocator rounds sizes up a lot.

@Thiez
Contributor

Thiez commented Nov 12, 2014

@bill-myers in that case the broken library should be fixed. If it's broken for us it will also be broken in programs that use an alternative allocator, and probably on systems that use jemalloc as the system allocator (e.g. FreeBSD). I see no reason why Rust should suffer to support what is essentially hypothetical wrong third-party code.

@bill-myers
Contributor

Well, for instance the binary-only Adobe Flash on Linux famously used memcpy() on overlapping memory areas and relied on glibc copying in the forward direction.

Since the allocator is usually never replaced by normal C programs, I would definitely expect that there is code that does things like allocate 24 bytes and use 32 and that works reliably because the system malloc rounds up to powers of two.

Thinking more about it, I think prefixing jemalloc, never replacing the system allocator and thus always having two allocators running is the only thing that is both guaranteed to be bug-free and prevents libraries from depending on free on Boxes working.

@thestinger
Contributor Author

I think it would help your case if you stopped for a minute and explained how the whole thing works point-by-point in one place, rather than responding to individual comments in 10 different threads.
This is a complicated subject and things that are obvious to you may be not quite so obvious to someone who did not spend several weeks tuning jemalloc and thinking about allocators.

All of the relevant information is in the original pull request. It doesn't make any changes to the user-facing semantics that are exposed today. It is an under the hood performance improvement, and 95% of the discussion here is off-topic for this pull request. The pull request has nothing to do with the ability to choose the allocator without compiling Rust and has no impact on the ability to dlopen Rust code. It is not a step towards or away from the ability to do these things. I noted that dlopen was already broken for other reasons when that was brought up along with explaining that the lack of a prefix is not incompatible with it.

I've read this exchange twice now and I still don't understand what is going to happen as a result of this change with all different OSes and linking modes involved.

That's exactly what was intended:

It's not a technical disagreement. It's a repeated pattern of obstructionism on my pull requests via misrepresentation of the issues and grasping for any dubious counter-argument you can think of. As your bogus information is refuted countless times you grasp for more mud to sling. Eventually, the waters are muddied enough that it's hard for anyone to tell what the real facts are. It doesn't change the fact that you're being repeatedly dishonest and manipulative.

I have to keep addressing misinformation and repeated misrepresentation of my statements, so there is no way that anyone is going to be able to follow the conversation here.

Why would loading Rust modules into a C program be safe? Is this because Rust-allocated memory never gets freed outside of Rust modules (and vice versa)? Or is this because jemalloc does something to interoperate with the system malloc?

Rust's allocator API is entirely separate from the C allocator API. It is explicitly documented as not being interchangeable with it for reasons that I have gotten into above. Dynamically loading a library with dlopen does not clobber the existing symbols so it really has no impact.

If foreign code is using a function like mallocx, then it is already linked against jemalloc without a prefix and Rust has no say in the matter. It doesn't make any sense to me that it's being used as an argument against this...

How would one in practice ensure that when linking with Rust static libraries, the whole program's allocator does not get replaced with jemalloc?

Rust already provides the option of the alloc::heap API using the C allocator API instead of the jemalloc API. The ability to make that decision for each executable / library is covered by #18838 and is orthogonal to this pull request.

What is "dss", "initial-exec", "zone allocator", and other allocator-specific jargon?

dss (data storage segment) refers to the heap segment managed by sbrk. It can only be used by a single allocator at a time; otherwise the result is unsound. This article covers some of the issues involved: http://lifecs.likai.org/2010/02/sbrk-is-not-thread-safe.html

jemalloc has a runtime option setting the preference of mmap vs. sbrk. The default is dss:secondary, so it will fall back to sbrk if mmap fails to provide the memory it wants. It can be set to dss:primary at runtime via an environment variable (or the mallctl API), and there's a performance argument for doing that by default. However, this is completely unsound if another allocator is expanding / shrinking the heap with sbrk, which is the case when the glibc allocator is used in the same process.

The zone allocator API is an API provided by OS X / iOS for the explicit purpose of replacing the system allocator. Rust is currently making use of it, since it doesn't pass --disable-zone-allocator to configure. I have been mentioning this because I find it odd that people are totally fine with doing that but oppose the same thing on other platforms.

initial-exec is not an allocator thing, it's a model for thread-local variables where the code is guaranteeing that it is not loaded dynamically. Since jemalloc currently uses this for the thread local storage, it is not possible to dlopen it without clobbering TLS. This really has nothing to do with this pull request, but I was countering other off-topic claims.

@thestinger
Copy link
Contributor Author

Since allocators are never replaced by C programs

That's not true. This is the normal way to use the TCMalloc and jemalloc libraries, and C programs already need to cope with endless churn in the design of the glibc allocator and portability to various other allocators.

Thinking more about it, I think prefixing jemalloc, never replacing the system allocator and thus always having two allocators running is the only approach that is both guaranteed to be bug-free and prevents libraries from depending on C's free working on Boxes.

I don't understand how you've managed to draw that conclusion. The FreeBSD, glibc, Windows and OS X allocators have been drastically changed over the years and it has never resulted in what you're claiming.

@thestinger
Copy link
Contributor Author

Rust is already doing stuff like enabling full ASLR by default and having the linker remove unused sections which is far more relevant to the issue of breaking code that depends on undefined behaviour. Rust makes the assumption that unsafe code is correct, and making life friendly to broken code with memory corruption issues has never been part of the design process. The jemalloc guarantees about stuff like alignment are compatible with the operating system's allocator.

@reem
Copy link
Contributor

reem commented Nov 12, 2014

It is also trivially easy to define a CBox<T> which acts just like Box<T> (or at least it will once we get DerefMove) but allocates using libc's malloc and free, and just use that to create allocations meant to be freed by C code.

My impression from past discussion was that C code being able to free Box<T> was an explicit non-goal, and that one should use the appropriate allocator if that is the desired use case, as the Rust allocator is not defined as part of the Rust API.

@pepp-cz
Copy link

pepp-cz commented Nov 12, 2014

@thestinger After rereading the commit and some key parts of the discussion my understading is that the concern is that dynamic Rust libraries become "poisonous" because of the malloc/free. My question is: would it be possible to export unprefixed jemalloc symbols only for Rust binaries and use prefixed symbols in all Rust code? That way the Rust programs will benefit from single memory allocator with low risk of breaking any external code and on the other side dlopening Rust library will not change behaviour at the cost of using additional allocator alongside the system allocator (but when you decide to use dynamic library than performance presumably does not matter so much).

@thestinger
Copy link
Contributor Author

dlopen of a Rust library will not change the system allocator with this pull request.

The current situation is that liballoc can be compiled with or without --cfg jemalloc (--disable-jemalloc configure flag) which toggles whether it uses the jemalloc API and bundles jemalloc or builds on top of the C API. There is an issue filed about making the allocator configurable without recompiling all of Rust (#18838) but doing it well requires some thought.

My question is: would it be possible to export unprefixed jemalloc symbols only for Rust binaries and use prefixed symbols in all Rust code?

I don't know how linking would work in this case. If the libraries use the prefixed symbols, then those symbols need to be provided by something for linking to complete.

@thestinger thestinger closed this Nov 12, 2014
@bill-myers
Copy link
Contributor

If there are any widely-used large programs (Firefox or Chrome perhaps?) replacing the default allocator on Linux, that would give some confidence that at least the basic system libraries are not broken with non-default allocators.

In addition to the bugs issue, there is also the issue of things starting to depend on malloc() and the Rust allocator being interchangeable.

If we really want to allow replacing the system allocator I think it should be optional, and ideally a choice that can be overridden at runtime with an environment variable.

Maybe what could be done is keep jemalloc prefixed or dlopen it with RTLD_LOCAL, and add code to Rust executables (not libraries) to use either glibc malloc hooks or GNU ifuncs to replace the system allocator, depending on both an environment variable and a default setting specified at compile time.

@thestinger
Copy link
Contributor Author

If there are any widely-used large programs (Firefox or Chrome perhaps?) replacing the default allocator on Linux, that would give some confidence that at least the basic system libraries are not broken with non-default allocators.

Firefox, Chrome and MariaDB among others. It's the normal way to use TCMalloc / jemalloc and putting either of them in LD_PRELOAD works perfectly fine across Linux distributions.

@thestinger thestinger deleted the jemalloc branch November 12, 2014 11:11
@netvl
Copy link
Contributor

netvl commented Nov 12, 2014

I'm not experienced in allocation stuff, but I just do not understand why everyone states that this pull request affects dlopen while the pull request author states it does not. Again, I don't know in detail how allocators and dlopen-related functionality work, but it seems very clear to me from @thestinger's explanations that just renaming symbols per this PR does not affect dlopening, and dlopen is broken now anyway due to jemalloc's implementation.
I urge everyone to read the discussion more thoroughly, because there seems to be a great misunderstanding here.

@pepp-cz
Copy link

pepp-cz commented Nov 12, 2014

@bill-myers I have added jemalloc to a couple of large in-house programs running on Linux without any problems. You can easily google benchmark results from the many people who have preloaded jemalloc or tcmalloc into different SQL databases, interpreters and other programs.

@thestinger
Copy link
Contributor Author

I don't care enough about Rust's performance to deal with this. That's all folks.

@pepp-cz
Copy link

pepp-cz commented Nov 12, 2014

@thestinger Excuse me for asking another question, but the pull request says "The lack of a prefix allows jemalloc to override the weak symbols from the C standard library to replace the system allocator." Recently you posted that "dlopen of a Rust library will not change the system allocator with this pull request." Is this a contradiction, or did I just misunderstand something?

@thestinger
Copy link
Contributor Author

It replaces the C allocator if and only if it's linked into the executable at initialization (ignoring RTLD_DEEPBIND, which is broken with nearly everything).

@pepp-cz
Copy link

pepp-cz commented Nov 12, 2014

@thestinger You mean dynamically linked at program initialization? What about RTLD_GLOBAL? Wouldn't it replace the C allocator for other dlopened libraries?

@thestinger
Copy link
Contributor Author

It won't clobber the existing symbols without RTLD_DEEPBIND. I'm not pursuing this anymore so none of it really matters.

@vadimcn
Copy link
Contributor

vadimcn commented Nov 12, 2014

It replaces the C allocator if and only if it's linked into the executable at initialization (ignoring RTLD_DEEPBIND, which is broken with nearly everything).

I think this is the critical piece of information that was missing from the PR, and what most people were confused or wary about. It was mentioned in this thread, but not prominently enough. Very few people will have the energy to carefully read a long-ish thread and then hunt down the rest of the information in the IRC logs, previous PRs, etc. Least of all the Rust core team, who, I imagine, are pretty busy these days.
This is exactly why I suggested that you write a mini-RFC, or a FAQ, to collect all the information in one place that you can refer people to, instead of trying to put out fires in a zillion individual conversations.

I'm not pursuing this anymore so none of it really matters.

I hope things look less bleak in the morning :-)
We value your contributions, but please try to keep your calm when people do not understand you. I assure you, this is not entirely their fault.

@vadimcn
Copy link
Contributor

vadimcn commented Nov 12, 2014

Oh, and regarding LLVM allocator: can't we link rustc_llvm with --defsym=malloc=je_malloc, or something like that?

@vks
Copy link
Contributor

vks commented Nov 12, 2014

@vadimcn Citing pcwalton from the reddit thread:

Eventually we all came to the conclusion that the better way to avoid this is to use jemalloc the way it was designed: Rust executables should link against jemalloc, and shared Rust libraries should use whatever allocator that their host executable was linked with.
The submitter of that PR has stated that he'll be working on that quite soon, actually, and I'd be quite happy to take that revised PR that, as far as I can tell, pleases everyone involved.

This sounds less bleak I think.

lnicola added a commit to lnicola/rust that referenced this pull request Dec 16, 2024
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Successfully merging this pull request may close these issues.

allow jemalloc to replace the platform allocator