cmd/compile: slow to compile large map literals containing dynamic elements #19751
@josharian here's a new issue originally described at #14082
Thanks, @myleshorton. The problematic file is a giant composite literal, a map whose elements contain maps. The compiler converts this into static initialization code in which each element in turn is initialized and then placed into the top-level map. It all ends up in one giant function. Almost all the time is spent in phi insertion, at least on tip.

The file is 300k lines long. 1k lines take 0.2s to compile. 10k lines take 2s. 50k lines take 30s. That's as far as I got. I didn't see any memory blow-up in liveness, and most memory is allocated in the usual places (nodes, stackalloc), although tip's liveness implementation is significantly different than 1.8's, so maybe that particular problem is fixed.

Here's one idea. We could break up giant composite literal initialization into multiple functions. Each function by itself would be much faster to compile, and as long as they don't get too granular, I doubt we'd lose much optimization opportunity or slow things down due to call overhead. The main problem here is probably implementation: coordinating compilation and calling of all the extra functions. Maybe a binary splitting approach until each chunk's size is manageable. This has the additional benefit of (theoretically!) allowing concurrent compilation of giant init functions.

This suggestion pains me a little, since I'm trying to figure out how to eliminate the convolution caused by the "drop everything and compile this function" thing we currently do all over the compiler, but maybe if I find a good general solution to the problem, this can fit into it.

Thoughts, @randall77 or @mdempsky?
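For illustration, here is a minimal sketch of the binary-splitting idea described above. The `Stmt` type, the `split` helper, and the `emit` callback are all hypothetical stand-ins, not the compiler's actual data structures:

```go
package main

import "fmt"

// Stmt stands in for one top-level initialization statement;
// it is a placeholder type for this sketch.
type Stmt string

// split recursively halves stmts until each chunk has at most max
// entries (max must be >= 1), then hands each chunk to emit, which
// would wrap it in its own generated function.
func split(stmts []Stmt, max int, emit func([]Stmt)) {
	if len(stmts) <= max {
		emit(stmts)
		return
	}
	mid := len(stmts) / 2
	split(stmts[:mid], max, emit)
	split(stmts[mid:], max, emit)
}

func main() {
	stmts := make([]Stmt, 10)
	split(stmts, 3, func(chunk []Stmt) {
		fmt.Println("emit one function with", len(chunk), "statements")
	})
}
```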
I took a stab at this but haven't gotten anything working yet. One sticking point is that the dynamic entries may refer to other local variables. Another sticking point is that setting up a new function manually is a pain.

In the meantime, @myleshorton, would you mind running an experiment? (I'd do it myself, but I don't see your template/generator code checked in anywhere obvious. Or just point me to it.) Instead of generating code like:

```go
var targetsDict = map[string]*Targets{
	"": &Targets{
		Plain: map[string]bool{
			"s.lair.io":     true,
			"xserver.*.jp":  true,
			"youtube.co.ke": true,
		},
	},
	"0": &Targets{
		Plain: map[string]bool{
			"bit.0.md": true,
		},
	},
	"0-60": &Targets{
		Plain: map[string]bool{
			"0-60.in": true,
		},
	},
	// ...
}
```

try generating:

```go
var targetsDict = map[string]*Targets{}

func init() {
	targetsDict[""] = &Targets{
		Plain: map[string]bool{
			"s.lair.io":     true,
			"xserver.*.jp":  true,
			"youtube.co.ke": true,
		},
	}
	targetsDict["0"] = &Targets{
		Plain: map[string]bool{
			"bit.0.md": true,
		},
	}
	// ...
}

func init() {
	targetsDict["0-60"] = &Targets{
		Plain: map[string]bool{
			"0-60.in": true,
		},
	}
	// ...
}
```

where each of the init functions contains (say) 100 or 1000 of the entries. Each of the functions should be called `init`.
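Since the offending file was itself generated with text/template (per the issue report at the bottom of this page), the chunking could live in the generator. A rough, hypothetical sketch follows; the `entry` type, the `Targets` shape, and the chunk size are assumptions for illustration, not @myleshorton's actual generator:

```go
package main

import (
	"fmt"
	"os"
	"text/template"
)

// entry is a hypothetical representation of one targetsDict element.
type entry struct {
	Key     string
	Domains []string
}

// chunkTmpl emits one init function covering a batch of entries.
const chunkTmpl = `func init() {
{{- range .}}
	targetsDict[{{printf "%q" .Key}}] = &Targets{Plain: map[string]bool{
{{- range .Domains}}
		{{printf "%q" .}}: true,
{{- end}}
	}}
{{- end}}
}
`

func main() {
	entries := []entry{
		{Key: "", Domains: []string{"s.lair.io", "xserver.*.jp", "youtube.co.ke"}},
		{Key: "0", Domains: []string{"bit.0.md"}},
		{Key: "0-60", Domains: []string{"0-60.in"}},
	}
	const chunkSize = 1000 // entries per init function; tune to taste
	t := template.Must(template.New("chunk").Parse(chunkTmpl))
	fmt.Println(`var targetsDict = map[string]*Targets{}`)
	// Emit one init function per chunk of entries.
	for i := 0; i < len(entries); i += chunkSize {
		end := i + chunkSize
		if end > len(entries) {
			end = len(entries)
		}
		if err := t.Execute(os.Stdout, entries[i:end]); err != nil {
			panic(err)
		}
	}
}
```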
I wrote a quick parser and did some manual editing and hacked together a version of targets.go that puts every single targetsDict assignment in its own init function. (This was easier for a quick hack than grouping them into 100s or 1000s.) This reduced the compilation time to 33 seconds on my machine. (It used to take 22 minutes to finally panic in plive.) So this approach has some promise.
I'd like to understand what isn't linear about the compilation, that makes breaking functions up worthwhile. I'd be much happier if we could find and fix that nonlinearity instead.
@randall77 I'm already looking into it. :)
CL https://golang.org/cl/39294 mentions this issue.
This triggers 119 times during make.bash. This CL reduces the time it takes for the compiler to panic while compiling the code in #19751 from 22 minutes to 15 minutes. Yay, I guess.

Updates #19751

Change-Id: I8ca7f1ae75f89d1eb2a361d67b3055a975221734
Reviewed-on: https://go-review.googlesource.com/39294
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
CL https://golang.org/cl/39351 mentions this issue.
Instead of walking the list of nodes twice, once to find static entries to add to an array and once to find dynamic entries to generate code for, do the split once up front, into two slices. Then process each slice individually. This makes the code easier to read and, more importantly, easier to modify.

While we're here, add a TODO to avoid using temporaries for mapassign_fast calls. It's not an important TODO; the generated code would be basically identical. It would just avoid a minor amount of pointless SSA optimization work.

Passes toolstash-check. No measurable compiler performance impact.

Updates #19751

Change-Id: I84a8f2c22f9025c718ef34639059d7bd02a3c406
Reviewed-on: https://go-review.googlesource.com/39351
Run-TryBot: Josh Bleecher Snyder <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
OK, after the previous commits are in, the compiler panics much faster. Yay! The panic occurs because there are lots of basic blocks, and we try to bulk allocate a slice with length longer than an int32 can hold. Eep. I'll send a CL soon to at least turn that into a Fatal instead of a panic.

The basic blocks in this case all come from branches from checking whether runtime.writeBarrier is set. I see lots of possible fixes, but the one I find most appealing is asserting (and ensuring!) that runtime.writeBarrier is off during init. (Other fixes include more gracefully handling lots of blocks in liveness, or following my earlier suggestion of breaking up giant init functions.)

@randall77 @cherrymui opinions about this?
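For context, here is a hand-written approximation of what a single pointer store expands to once the write barrier pass runs; each such check ends the current basic block and adds two more. The flag and helper names below are illustrative stubs, not the runtime's exact identifiers:

```go
package main

// writeBarrierEnabled and writebarrierptr stand in for the runtime's
// internal flag and write barrier routine; both names are assumptions.
var writeBarrierEnabled bool

func writebarrierptr(dst **int, ptr *int) { *dst = ptr }

// store approximates the compiler's expansion of `*dst = ptr`.
func store(dst **int, ptr *int) {
	if writeBarrierEnabled { // this check ends the current SSA block...
		writebarrierptr(dst, ptr) // ...one successor calls the barrier
	} else {
		*dst = ptr // ...the other does a plain store
	}
	// The two paths rejoin in a third block. With thousands of stores
	// in one giant init function, the block count explodes.
}

func main() {
	var x *int
	v := 42
	store(&x, &v)
}
```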
CL https://golang.org/cl/39354 mentions this issue.
We can't assume write barriers are off during init. What part of the compiler is doing the bulk allocating? The actual block ID allocator will panic if we try to allocate an ID that's too big. Or is something else trying to preallocate and gets too aggressive? Sounds like we should put some more effort into optimizing phi insertion, at least.
When a map is small, it's not worth putting the contents in an array and then looping over the array. Just generate code instead. This makes smaller binaries. It might also be better for cache lines. It also avoids adding control flow in the middle of the init function, which can be very large. Eliminating this source of extra blocks makes phi insertion easier for temp-heavy init functions.

This reduces the time required for the compiler to panic while compiling the code in #19751 from 15 minutes to 45 seconds.

The cutoff of 25 was chosen fairly unscientifically by looking at the size of cmd/go:

Cutoff of 0:   10689604
Cutoff of 5:   10683572
Cutoff of 15:  10682324
Cutoff of 25:  10681700
Cutoff of 50:  10685476
Cutoff of 100: 10689412

There are probably more sophisticated mechanisms available. For example, the smaller the key/value sizes, the better generated code will be vs a table. Nevertheless this is simple and seems like a good start.

Updates #19751

name      old time/op   new time/op   delta
Template  204ms ± 6%    202ms ± 5%    -0.78%  (p=0.027 n=47+45)
Unicode   84.8ms ± 6%   85.2ms ± 7%   ~       (p=0.146 n=46+45)
GoTypes   551ms ± 2%    556ms ± 3%    +0.76%  (p=0.004 n=43+45)
SSA       3.93s ± 3%    3.95s ± 4%    ~       (p=0.179 n=50+49)
Flate     123ms ± 4%    123ms ± 5%    ~       (p=0.201 n=47+49)
GoParser  145ms ± 3%    145ms ± 4%    ~       (p=0.937 n=50+50)
Reflect   356ms ± 3%    354ms ± 5%    -0.44%  (p=0.048 n=46+50)
Tar       107ms ± 6%    106ms ± 6%    ~       (p=0.188 n=50+49)
XML       201ms ± 4%    200ms ± 4%    ~       (p=0.085 n=50+49)

name      old user-ns/op  new user-ns/op  delta
Template  252M ± 9%       250M ± 7%       ~       (p=0.206 n=49+47)
Unicode   106M ± 7%       106M ± 9%       ~       (p=0.331 n=47+46)
GoTypes   724M ± 5%       729M ± 5%       ~       (p=0.160 n=47+49)
SSA       5.64G ± 2%      5.62G ± 4%      ~       (p=0.148 n=47+50)
Flate     147M ± 6%       147M ± 5%       ~       (p=0.466 n=50+49)
GoParser  179M ± 5%       179M ± 6%       ~       (p=0.584 n=50+49)
Reflect   448M ± 6%       441M ± 8%       -1.39%  (p=0.027 n=50+49)
Tar       124M ± 6%       123M ± 5%       ~       (p=0.221 n=50+47)
XML       244M ± 5%       243M ± 4%       ~       (p=0.275 n=49+49)

name      old alloc/op  new alloc/op  delta
Template  39.9MB ± 0%   39.4MB ± 0%   -1.28%  (p=0.008 n=5+5)
Unicode   29.8MB ± 0%   29.8MB ± 0%   ~       (p=0.310 n=5+5)
GoTypes   113MB ± 0%    113MB ± 0%    ~       (p=0.421 n=5+5)
SSA       854MB ± 0%    854MB ± 0%    ~       (p=0.151 n=5+5)
Flate     25.3MB ± 0%   25.3MB ± 0%   ~       (p=1.000 n=5+5)
GoParser  31.8MB ± 0%   31.8MB ± 0%   ~       (p=0.222 n=5+5)
Reflect   78.2MB ± 0%   78.2MB ± 0%   ~       (p=1.000 n=5+5)
Tar       26.7MB ± 0%   26.7MB ± 0%   ~       (p=0.841 n=5+5)
XML       42.3MB ± 0%   42.3MB ± 0%   -0.15%  (p=0.008 n=5+5)

name      old allocs/op  new allocs/op  delta
Template  390k ± 1%      386k ± 1%      -1.05%  (p=0.016 n=5+5)
Unicode   319k ± 0%      320k ± 0%      ~       (p=0.310 n=5+5)
GoTypes   1.14M ± 0%     1.14M ± 0%     ~       (p=0.421 n=5+5)
SSA       7.60M ± 0%     7.59M ± 0%     ~       (p=0.310 n=5+5)
Flate     234k ± 0%      235k ± 1%      ~       (p=1.000 n=5+5)
GoParser  315k ± 1%      317k ± 0%      ~       (p=0.151 n=5+5)
Reflect   978k ± 0%      978k ± 0%      ~       (p=0.841 n=5+5)
Tar       251k ± 1%      251k ± 1%      ~       (p=0.690 n=5+5)
XML       394k ± 0%      392k ± 0%      ~       (p=0.056 n=5+5)

Change-Id: Ic53a18627082abe075a1cbc33330ce015e50850a
Reviewed-on: https://go-review.googlesource.com/39354
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
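To illustrate the strategy choice in this CL, here is a hand-written analogy (not the compiler's actual output): below the cutoff, straight-line assignments; above it, static parallel arrays plus a loop.

```go
package main

import "fmt"

func main() {
	// Small literal (below the cutoff): straight-line assignments,
	// no extra control flow in the enclosing init function.
	small := make(map[string]int, 2)
	small["a"] = 1
	small["b"] = 2

	// Large literal (above the cutoff): static parallel arrays and a
	// loop, which yields smaller code but inserts a loop (extra basic
	// blocks) into the middle of the init function.
	keys := [...]string{"a", "b", "c" /* ... many more ... */}
	vals := [...]int{1, 2, 3 /* ... */}
	large := make(map[string]int, len(keys))
	for i := range keys {
		large[keys[i]] = vals[i]
	}
	fmt.Println(small, large)
}
```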
Could we do it during autogenerated inits? IIRC those all complete before user-provided inits run, or at least, they could probably be made to. But I guess maybe that's overkill for this problem.
It's the call to bvbulkalloc in newliveness. We're calling it with nbit=48645 and count=2074940, so nword=1521 and nword*count=3155983740, leading to an allocation request whose length doesn't fit in an int32. I think it might take some significant refactoring to avoid this. It might be better and easier to try to split up large init functions, as proposed above. Note that there is some history of doing this manually: https://codereview.appspot.com/72590045. I'd say we could just keep asking users to split them up manually, since any init that big must be autogenerated, but splitting them up also has the benefit of allowing them to be compiled concurrently.
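As a quick check of the numbers quoted above (assuming the layout implied by the description: 32-bit words, nword words per bitvector, nword*count words allocated in bulk):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	nbit, count := 48645, 2074940
	nword := (nbit + 31) / 32             // = 1521 words per bitvector
	total := int64(nword) * int64(count)  // words for all bitvectors at once
	fmt.Println(total)                    // 3155983740
	fmt.Println(total > math.MaxInt32)    // true: doesn't fit in an int32
}
```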
That's definitely true in general, but the CLs that I've submitted so far mostly solve the phi problem, at least for this particular bit of code.
CL https://golang.org/cl/39358 mentions this issue.
This provides better diagnostics when it occurs.

Updates #19751

Change-Id: I87db54c22e1345891b418c1741dc76ac5fb8ed00
Reviewed-on: https://go-review.googlesource.com/39358
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
Another idea: have static initialization track and reuse autotmps, much as order.go does. Will investigate.
Yes, the write barrier pass creates many blocks. I filed issue #19838 for grouping write barriers more aggressively. But it actually doesn't help here: there are calls (newobject, mapassign_faststr, makemap) between each write barrier.
When I tried to compile this program with
Challenge with this: We flip back and forth between sinit and walk, so it's hard to maintain state cleanly. Probably still worth doing, but it's going to require some significant refactoring.
I hacked this in with some globals. There's lots of autotmp reuse, but (of course) the exact variable in use disappears by the time we reach liveness anyway, so it doesn't help with the liveness issue. And it doesn't help as much as you'd hope with the speed of getting to the panic. So for now, I think that is a dead end.
On a second look, I think it would. Though there are calls between each writebarrier-enabled check, calls don't create a new SSA block. Basically all the blocks in this function come from write barriers, and the size of the problematic alloc is directly proportional to the number of blocks. That's not to say that if we fix this there wouldn't also be problems with the size of the generated stackmap. But it's possible those could be fruitfully addressed (for all functions, not just this one) by doing liveness compaction as we go, rather than all at the end. Sample SSA output for a random chunk from the middle of this function, at the point that it arrives at liveness:
(sample SSA output omitted)
CL https://golang.org/cl/41192 mentions this issue.
When using a concurrent backend, the overall compilation time is bounded in part by the slowest function to compile. The number of top-level statements in a function is an easily calculated and fairly reliable proxy for compilation time.

Here's a standard compilecmp output for -c=8 with this CL:

name      old time/op   new time/op   delta
Template  127ms ± 4%    125ms ± 6%    -1.33%  (p=0.000 n=47+50)
Unicode   84.8ms ± 4%   84.5ms ± 4%   ~       (p=0.217 n=49+49)
GoTypes   289ms ± 3%    287ms ± 3%    -0.78%  (p=0.002 n=48+50)
Compiler  1.36s ± 3%    1.34s ± 2%    -1.29%  (p=0.000 n=49+47)
SSA       2.95s ± 3%    2.77s ± 4%    -6.23%  (p=0.000 n=50+49)
Flate     70.7ms ± 3%   70.9ms ± 2%   ~       (p=0.112 n=50+49)
GoParser  85.0ms ± 3%   83.0ms ± 4%   -2.31%  (p=0.000 n=48+49)
Reflect   229ms ± 3%    225ms ± 4%    -1.83%  (p=0.000 n=49+49)
Tar       70.2ms ± 3%   69.4ms ± 3%   -1.17%  (p=0.000 n=49+49)
XML       115ms ± 7%    114ms ± 6%    ~       (p=0.158 n=49+47)

name      old user-time/op  new user-time/op  delta
Template  352ms ± 5%        342ms ± 8%        -2.74%  (p=0.000 n=49+50)
Unicode   117ms ± 5%        118ms ± 4%        +0.88%  (p=0.005 n=46+48)
GoTypes   986ms ± 3%        980ms ± 4%        ~       (p=0.110 n=46+48)
Compiler  4.39s ± 2%        4.43s ± 4%        +0.97%  (p=0.002 n=50+50)
SSA       12.0s ± 2%        13.3s ± 3%        +11.33% (p=0.000 n=49+49)
Flate     222ms ± 5%        219ms ± 6%        -1.56%  (p=0.002 n=50+50)
GoParser  271ms ± 5%        268ms ± 4%        -0.83%  (p=0.036 n=49+48)
Reflect   560ms ± 4%        571ms ± 3%        +1.90%  (p=0.000 n=50+49)
Tar       183ms ± 3%        183ms ± 3%        ~       (p=0.903 n=45+50)
XML       364ms ±13%        391ms ± 4%        +7.16%  (p=0.000 n=50+40)

A more interesting way of viewing the data is by looking at the ratio of the time taken to compile the slowest-to-compile function to the overall time spent compiling functions. If this ratio is small (near 0), then increased concurrency might help. If this ratio is big (near 1), then we're bounded by that single function.

I instrumented the compiler to emit this ratio per-package, ran 'go build -a -gcflags=-c=C -p=P std cmd' three times, for varying values of C and P, and collected the ratios encountered into an ASCII histogram.

Here's c=1 p=1, which is a non-concurrent backend, single process at a time:

 90%|
 80%|
 70%|
 60%|
 50%|
 40%|
 30%|
 20%|**
 10%|***
  0%|*********
----+----------
    |0123456789

The x-axis is floor(10*ratio), so the first column indicates the percent of ratios that fell in the 0% to 9.9999% range. We can see in this histogram that more concurrency will help; in most cases, the ratio is small.

Here's c=8 p=1, before this CL:

 90%|
 80%|
 70%|
 60%|
 50%|
 40%|
 30%| *
 20%| *
 10%|* * *
  0%|**********
----+----------
    |0123456789

In 30-40% of cases, we're mostly bound by the compilation time of a single function.

Here's c=8 p=1, after this CL:

 90%|
 80%|
 70%|
 60%|
 50%| *
 40%| *
 30%| *
 20%| *
 10%| *
  0%|**********
----+----------
    |0123456789

The sorting pays off; we are bound by the compilation time of a single function in over half of packages. The single * in the histogram indicates 0-10%. The actual values for this chart are: 0: 5%, 1: 1%, 2: 1%, 3: 4%, 4: 5%, 5: 7%, 6: 7%, 7: 7%, 8: 9%, 9: 55%.

This indicates that efforts to increase or enable more concurrency, e.g. by optimizing mutexes or increasing the value of c, will probably not yield fruit. That matches what compilecmp tells us. Further optimization efforts should thus focus instead on one of:

(1) making more functions compile concurrently
(2) improving the compilation time of the slowest functions
(3) speeding up the remaining serial parts of the compiler
(4) automatically splitting up some large autogenerated functions into small ones, as discussed in #19751

I hope to spend more time on (1) before the freeze.

Adding process parallelism doesn't change the story much. For example, here's c=8 p=8, after this CL:

 90%|
 80%|
 70%|
 60%|
 50%|
 40%| *
 30%| *
 20%| *
 10%| ***
  0%|**********
----+----------
    |0123456789

Since we don't need to worry much about p, these histograms can help us select a good general value of c to use as a default, assuming we're not bounded by GOMAXPROCS. Here are some charts after this CL, for c from 1 to 8:

c=1 p=1

 90%|
 80%|
 70%|
 60%|
 50%|
 40%|
 30%|
 20%|**
 10%|***
  0%|*********
----+----------
    |0123456789

c=2 p=1

 90%|
 80%|
 70%|
 60%|
 50%|
 40%|
 30%|
 20%|
 10%| **** *
  0%|**********
----+----------
    |0123456789

c=3 p=1

 90%|
 80%|
 70%|
 60%|
 50%|
 40%|
 30%|
 20%| *
 10%| ** * *
  0%|**********
----+----------
    |0123456789

c=4 p=1

 90%|
 80%|
 70%|
 60%|
 50%|
 40%|
 30%| *
 20%| *
 10%| * *
  0%|**********
----+----------
    |0123456789

c=5 p=1

 90%|
 80%|
 70%|
 60%|
 50%|
 40%|
 30%| *
 20%| *
 10%| * *
  0%|**********
----+----------
    |0123456789

c=6 p=1

 90%|
 80%|
 70%|
 60%|
 50%|
 40%| *
 30%| *
 20%| *
 10%| *
  0%|**********
----+----------
    |0123456789

c=7 p=1

 90%|
 80%|
 70%|
 60%|
 50%| *
 40%| *
 30%| *
 20%| *
 10%| **
  0%|**********
----+----------
    |0123456789

c=8 p=1

 90%|
 80%|
 70%|
 60%|
 50%| *
 40%| *
 30%| *
 20%| *
 10%| *
  0%|**********
----+----------
    |0123456789

Given the increased user-CPU costs as c increases, it looks like c=4 is probably the sweet spot, at least for now. Pleasingly, this matches (and explains) the results of the standard benchmarking that I have done.

Updates #15756

Change-Id: I82b606c06efd34a5dbd1afdbcf66a605905b2aeb
Reviewed-on: https://go-review.googlesource.com/41192
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Robert Griesemer <[email protected]>
Reviewed-by: Matthew Dempsky <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
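Here is a minimal sketch of the scheduling idea in the commit message above; the `fn` type and the queue are hypothetical (the real compiler sorts its own internal function list), but the heuristic is the one described: compile the functions with the most top-level statements first, so the likely-slowest function starts as early as possible.

```go
package main

import (
	"fmt"
	"sort"
)

// fn is a stand-in for a function awaiting compilation.
type fn struct {
	name  string
	stmts int // top-level statement count, a proxy for compile time
}

func main() {
	queue := []fn{{"a", 3}, {"giantInit", 5000}, {"b", 40}}
	// Largest first: the slowest function overlaps with the rest of
	// the package instead of running alone at the end.
	sort.Slice(queue, func(i, j int) bool {
		return queue[i].stmts > queue[j].stmts
	})
	fmt.Println(queue) // [{giantInit 5000} {b 40} {a 3}]
}
```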
A possible alternative to splitting these init functions up (which I have tried and failed at multiple times) is to prevent generating some of these problematic write barrier calls: avoid the makemap calls by emitting some static empty map buckets to fill in.
CL https://golang.org/cl/42178 mentions this issue.
DO NOT SUBMIT

[I am not sure the best way to identify whether a particular OMAPLIT is safe for a static hmap; see the TODO in the CL]

Prior to this CL, we did code generation for a map literal like this:

	m := make(maptype, number of elements)
	m[k] = v // for all elements, sometimes using a loop

The assignment to m generates a write barrier. Each write barrier generates a new SSA block. In giant composite literals, this can lead to lots of blocks. This led to a panic when compiling the code in golang#19751.

This CL changes code generation when compiling a static composite literal to do:

	var stat hmap // static symbol
	hp := &stat
	m := (map)(hp)
	initmap(hp, number of elements)
	m[k] = v

This involves no assignments, and thus no write barriers. As a result, the code in golang#19751 compiles. Slowly still, but the compilation completes.

Change-Id: Ic0c05f5892e50fadc5e6d6adb5a3f5275ec38224
Looks like CL 42178 is unlikely to make 1.9. Moving to 1.10.
What version of Go are you using (`go version`)?

go version go1.8 darwin/amd64

What operating system and processor architecture are you using (`go env`)?

What did you do?

Used `text/template` to generate a very large Go file to embed data, and then just ran `go build`.

What did you expect to see?

A successfully built binary.

What did you see instead?