runtime: memory not being returned to OS #22439
@6degreeshealth My current suspicion is that we're actually returning pages to the OS, but it isn't taking them. We've seen this before, where the OS doesn't actually remove the pages from the process unless there is demand for them elsewhere in the system. A scavenger trace (the …
@randall77 I replaced your 10-minute sleep with … and I am posting the whole output.
Alex
The Go runtime claims it returned all but 94MB to the OS (the scvg2 lines).
I downloaded the VMMap program to see what is going on with this process, and it does use a lot of memory. This particular program creates quite a few threads, and the OS allocates 2MB of stack space for each thread. @aclements merged CL 49331 recently, which changed oh64.SizeOfStackCommit from 0x00001000 to 0x00200000. I think that was a mistake (and I did not notice it at the time). We should change it back to make the initial thread stack size small again. It would be nice to have a test to prevent mistakes like that in the future. I could probably write a test that measures the committed memory of an external program, but what should I compare that value with? Alex
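One indirect check (a sketch added here for illustration; this is not the test the thread eventually settled on) would skip measuring memory entirely and instead read the stack fields back out of a built binary's PE header with the standard debug/pe package, comparing them against the values the linker is supposed to write:

```go
package main

import (
	"debug/pe"
	"fmt"
	"log"
	"os"
)

func main() {
	// Open a freshly built Windows binary, passed on the command line.
	f, err := pe.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The optional header carries the stack reserve/commit sizes that
	// the OS consults when creating each thread.
	switch oh := f.OptionalHeader.(type) {
	case *pe.OptionalHeader64:
		fmt.Printf("SizeOfStackReserve=%#x SizeOfStackCommit=%#x\n",
			oh.SizeOfStackReserve, oh.SizeOfStackCommit)
	case *pe.OptionalHeader32:
		fmt.Printf("SizeOfStackReserve=%#x SizeOfStackCommit=%#x\n",
			oh.SizeOfStackReserve, oh.SizeOfStackCommit)
	default:
		log.Fatal("no PE optional header found")
	}
}
```

A test built on this could fail whenever SizeOfStackCommit grows beyond a page or two.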
Ah, I see, this might be Ms, not Gs or the heap, that are using up all the memory.
CL 49331 intended to change just the use of virtual memory, but I think there is a mistake in it: the change of oh64.SizeOfStackCommit from 0x00001000 to 0x00200000 actually affects physical memory.

Windows memory management is quite complicated, but for the purpose of this discussion, each page of a process's address space can be in one of three states: free, reserved, or committed (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa366794(v=vs.85).aspx ). Only "committed" pages consume physical memory.

When a process starts, the EXE file describes to the OS how stacks should be managed. In particular, the PE file format specifies the maximum size of stack to be used (stored in oh64.SizeOfStackReserve). When the process creates a new thread, it "reserves" oh64.SizeOfStackReserve bytes for that thread's stack. The PE file also specifies how much of those oh64.SizeOfStackReserve bytes should be "committed" at the start of each thread (this is stored in oh64.SizeOfStackCommit). So a normal stack starts as a large "reserved" region with some small part of it "committed" at the beginning. As the stack grows, the "committed" part grows too: when the program grows beyond the "committed" area and tries to read or write that memory, it causes a system exception, and the memory manager handles the exception by "committing" more memory and restarting the failed code. So for the particular case of non-cgo windows/amd64 we had:
before CL 49331, and:
after CL 49331. So the maximum stack size got increased from 132K to 2M, but the "committed" part of the stack was also increased from 4K to 2M. I think we should not have made the second change, because now every new thread costs us 2M of physical memory, compared to 4K before CL 49331. And once "committed", stack memory cannot be freed - I suspect only exiting the thread will free its stack memory. I hope I explained it well enough. Alex
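To make the reserve/commit distinction concrete, here is a minimal sketch (using golang.org/x/sys/windows; an illustration added here, not code from the thread) that reserves 2MB of address space, the way a thread stack reservation does, but commits only the first page:

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/windows"
)

func main() {
	// Reserve 2MB of address space. Reserved pages cost no physical memory.
	base, err := windows.VirtualAlloc(0, 2<<20,
		windows.MEM_RESERVE, windows.PAGE_NOACCESS)
	if err != nil {
		log.Fatal(err)
	}

	// Commit only the first 4KB page; only committed pages count against
	// physical memory. Touching the uncommitted remainder would fault,
	// which is the mechanism that lets stacks grow on demand.
	if _, err := windows.VirtualAlloc(base, 4096,
		windows.MEM_COMMIT, windows.PAGE_READWRITE); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("reserved 2MB at %#x, committed first 4KB\n", base)
}
```

In VMMap, the reserved-but-uncommitted portion shows up in the virtual size but not in the "committed" column.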
Yes, that all makes sense, thanks.
Are we ever exiting Ms that have been idle for a long time? If not, then doing that might fix the issue. Maybe an additional task for the scavenger to do.
I see. It looks like I just did what we were already doing in the cgo case. I don't know why we were committing basically the whole stack under cgo in the first place. I'm certainly fine with reducing the commit if that works.
I recently added the framework necessary to do this, but right now we only exit locked Ms (when their Gs exit without unlocking). We don't use this for exiting idle Ms currently, but we could.
I did test that yesterday. Setting oh64.SizeOfStackCommit back to 0x00001000 helps with memory usage significantly. We can get @6degreeshealth to check that it helps too, to be sure. I am worried that we broke this without anyone noticing. I wonder if I could write some test to stop this happening in the future.
I don't think so. But Austin will know.
Yes, the fact that thread stacks consume memory (even when threads are idle) might be a good reason to exit them.
The cgo case is very different from the non-cgo case as far as I am concerned. I hope not many people use cgo, especially on Windows, because we have syscall.Syscall to easily call into Windows C code without cgo. And cgo executables are created by the external linker, so I am not sure if we can control the SizeOfStackReserve and SizeOfStackCommit fields in the PE exe. Also, we use the _beginthread function to start threads with cgo. I do not know if _beginthread allows us to control the "commit" stack size. I probably dropped the ball with cgo. Do we want to try and improve this for the cgo case too?
Cool. I will send the change. The change should be simple; I just wonder about the test. Alex
That's true. Though I would expect the external linker to set reasonable "normal" stack sizes for Windows, which presumably have a small commit?
We pass 0 for the stack size, which MSDN says means "the same value as the stack that's specified for the main thread", which I expect means using both the reservation and the commit from the PE header.
Start lots of threads and if the system crashes the test fails? :)
I would be happy to test, but I'm on a business trip away from my Windows computer. I won't be able to test until Saturday. I assume that to test I'll need to build Go from source using the latest source. Never done that before, but I'll give it a whirl.
I will see what we get with the external linker.
I was actually hoping you would be helpful. ;-)
It would be nice if you could test once my change is ready. Thank you.
The change is not ready yet, so do not worry.
Yes, you will need to build Go from source. The instructions are here: https://golang.org/doc/install/source. We will help you if you get into trouble. Alex
For the test, I was thinking the Windows equivalent of this:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"runtime"
	"strconv"
	"strings"
	"sync"
	"syscall"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			// Pin each goroutine to its own OS thread so that 1000
			// thread stacks get created.
			runtime.LockOSThread()
			wg.Done()
			select {} // block forever, keeping the thread alive
		}()
	}
	wg.Wait()
	vsize, rss := mem()
	fmt.Printf("%d bytes RSS (%d bytes virtual)\n", rss, vsize)
}

// mem reads the virtual size and resident set size of the current
// process from /proc/self/stat (Linux-specific).
func mem() (vsize, rss uintptr) {
	stat, err := ioutil.ReadFile("/proc/self/stat")
	if err != nil {
		panic(err)
	}
	parts := strings.Fields(string(stat))
	parseInt := func(x string) uintptr {
		y, err := strconv.ParseInt(x, 10, 64)
		if err != nil {
			panic(err)
		}
		return uintptr(y)
	}
	vsize = parseInt(parts[22]) // vsize is reported in bytes
	rss = parseInt(parts[23]) * uintptr(syscall.Getpagesize()) // rss is reported in pages
	return
}
```

E.g., on Linux/amd64, I get a mere "3973120 bytes RSS (52129792 bytes virtual)". I'm not sure what the Windows equivalent of /proc/self/stat's RSS is. Maybe … Unfortunately, this probably has to be a …
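For reference, one possible Windows counterpart to mem() (a sketch using golang.org/x/sys/windows, added here as an illustration; the thread itself doesn't settle on one) reads the process counters via GetProcessMemoryInfo, whose WorkingSetSize is roughly the analogue of Linux RSS:

```go
package main

import (
	"fmt"
	"log"
	"unsafe"

	"golang.org/x/sys/windows"
)

// mem returns committed (pagefile-backed) memory and the working set of
// the current process; the working set roughly corresponds to Linux RSS.
func mem() (vsize, rss uintptr) {
	var pmc windows.PROCESS_MEMORY_COUNTERS
	err := windows.GetProcessMemoryInfo(
		windows.CurrentProcess(), &pmc, uint32(unsafe.Sizeof(pmc)))
	if err != nil {
		log.Fatal(err)
	}
	return pmc.PagefileUsage, pmc.WorkingSetSize
}

func main() {
	vsize, rss := mem()
	fmt.Printf("%d bytes working set (%d bytes committed)\n", rss, vsize)
}
```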
Thank you, Austin. I will have a go. Alex
Change https://golang.org/cl/74490 mentions this issue: |
Sounds correct to me. That is what I have used in my CL 74490 anyway. @robarchibald if you want to see if your problem is fixed, you need to build a version of Go that uses this source code:
Please let us know how it goes. Thank you. Alex
Thanks @alexbrainman, @aclements, @randall77 for your work on this. Unfortunately, the problem doesn't seem to be resolved. The fix is only marginally better. What's interesting today is that if I run this program more than once, it behaves differently each time. I'm seeing it vary between 70 MB and 1.5 GB of memory. I can only get to 1.5 GB of memory if I run two copies of memoryLeak.exe at the same time. Why would the memory usage skyrocket when system utilization is higher? It's as if the system gets heavily utilized and suddenly it can't keep up and starts over-allocating or forgetting it has the memory allocated. Is it possible (gasp) that the garbage collector has a concurrency problem? I changed the …
Interestingly, today Go is eventually cleaning up after itself, but not until somewhere between 8 and 10 minutes in. I noticed the memory go down on one run, so I switched the wait to 11 minutes instead of 10 so I wouldn't mistake any cleanup for the process just killing itself. That's also why I added more output logging as shown above. Is there a way I can trick the GC into cleaning up sooner? I put runtime.GC() in the code, but it doesn't seem to do anything. It appears that the scavenger doesn't notice that it has an extra 1686 and 1625 MB of memory until 8-10 minutes into program execution. Once it reports this extra memory, the OS dutifully cleans up. Why does Go keep such a messy room for so long? Here is the full gctrace for memoryLeak.exe and memoryLeak_fix.exe (compiled with the recent fix). I ran these concurrently from the command line. memoryLeak.exe
memoryLeak_fix.exe
I suggest you use VMMap to see where all that memory goes. Please report back.
I do not know why. Use VMMap to answer all these questions. When I run your program here, I saw: 1) memory allocated by the Go gc; 2) memory allocated for Windows thread stacks. 1 should go away (I saw it go away) once gc reports "scvg2: 1625 MB released". 2 should be much smaller (see the "committed" column) once you apply CL 74490 - I see about 16-20K per thread.
Yes, that is what I saw too. So once you apply CL 74490 and wait for 10 minutes, most of your memory should be free.
I do not know, but others might help. But reading the source code, I can see that runtime.GC() does not release memory to the OS; it keeps it handy until the scavenger decides that it should be released to the OS.
I guess the Go gc assumes that the program might reuse the memory, so it keeps it around for a few minutes. Your program uses a lot of memory at the start, and then sits and does nothing till the end - normal programs do not do that.
I would like you to use VMMap to see how much "private" memory is left after 10 minutes, and compare that with the same figure before CL 74490. Alex
You can clean up sooner by calling https://golang.org/pkg/runtime/debug/#FreeOSMemory.
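For example, a minimal sketch of that suggestion (the allocation below is a placeholder standing in for the reporter's real workload):

```go
package main

import (
	"runtime/debug"
	"time"
)

func main() {
	// Placeholder for an allocation-heavy phase of work.
	buf := make([]byte, 1<<30)
	for i := range buf {
		buf[i] = 1
	}
	buf = nil // drop the only reference so the GC can reclaim it

	// Force a garbage collection and return as much memory to the OS as
	// possible right now, instead of waiting minutes for the scavenger.
	debug.FreeOSMemory()

	time.Sleep(10 * time.Minute) // idle, as in the original test
}
```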
That means we've at least fixed the thread stacks problem because those wouldn't be cleaned up by the scavenger.
There are a few reasons: it costs to release memory and later re-acquire it from the OS; applications don't typically have spikes like this (though certainly plenty do); and if you're running in a server/container environment, you often have memory dedicated to your job, so it's pointless to return it. That said, I think this is mostly wrong. :) Rewriting the scavenger to release memory much more aggressively has been on my list of things to do for a while, but it doesn't actually come up all that often, so it hasn't been a high priority.
@alexbrainman this is a "normal" program even if the test isn't. The application I'm building is like a job scheduler. My program runs a web server which kicks off background workers to perform work on request. Prior to becoming a Gopher, I would've written this in C#, and I wouldn't be having this problem: C# schedules background workers as separate threads which terminate when they complete processing, so when the processing is done, all the memory is cleaned up as the threads are killed. In Go, we don't have that luxury, so that's all the more reason why the garbage collector and the scavenger must be good at cleaning up after themselves. I've downloaded VMMap and will run it and report back later today. BTW, I've also made a small modification to my test program so that it's more "realistic": it now kicks off another background job every minute, and now Go never releases the memory.
Is this the desired behavior? In the world of CPU and memory, 1 minute is a long time. 5 minutes seems like forever, and 8-10 minutes feels like an eternity. 8 minutes on a quad-core 4 GHz processor is 7.7 trillion operations later. Having a program be completely idle for 8+ minutes before it cleans up seems way too long to me.
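A hedged reconstruction of that modified test (the real code isn't shown in the thread; the names and sizes here are invented): a job that allocates a chunk of memory runs every minute, so the heap spikes repeatedly and the scavenger's multi-minute idle window never elapses:

```go
package main

import (
	"fmt"
	"time"
)

// job stands in for a background worker: it allocates, touches, and then
// abandons ~100MB of memory.
func job() {
	buf := make([]byte, 100<<20)
	for i := range buf {
		buf[i] = 1
	}
	fmt.Println("job done")
}

func main() {
	// Kick off another background job every minute; the heap never stays
	// idle long enough for the scavenger to return memory to the OS.
	for range time.Tick(time.Minute) {
		go job()
	}
}
```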
No, it's not. But see my earlier comment. However, if your application's heap size is going to spike regularly, do you have something else running that would be able to take advantage of that idle memory just between those spikes? This resource-management problem is bigger than a single Go process releasing memory.
The code you have shown us should use very little memory after you apply CL 74490 and wait for 8-10 minutes. If you see differently, please use VMMap, and tell us where the memory is.
It is difficult to speculate about what your problem is. Please show your code and we will investigate (perhaps in a new issue). Thank you. Alex
Thanks @aclements and @alexbrainman. Your comments have caused me to think differently about this problem. I've never had such a massively parallel program as the one I'm working on, and I realize after messing with this little toy project that I was thinking about memory incorrectly. I may not have any "leaks" in my code, but if I have 1000 things all asking for memory at the same time, Go will dutifully request 1000 different small pieces of memory at the same time, which leads to a lot of memory usage. Such a program that requests a lot of data in parallel just takes more resources than I anticipated. The fact that Go holds on to the memory for so long was confusing, but I understand why now thanks to @aclements' comments. I do look forward to the changes @aclements plans to make to the scavenger, though, and I'd love to test those when ready.
Copied from issue #9869.
Reporter thinks the memory used by this program is not returned to the OS during the Sleep(10*minute) call.