This repository was archived by the owner on Nov 1, 2022. It is now read-only.

git notes operation times out #2082

Closed
squaremo opened this issue May 23, 2019 · 7 comments · Fixed by #2086

Comments

@squaremo
Member

squaremo commented May 23, 2019

To recover the set of events represented by a sync, fluxd looks at the notes made in git. With a large enough git repo, this can take longer than the (pretty generous) default git timeout.

Scratch that: it takes a fraction of a second to run that command, in the pod. I suspect instead there is a deadlock in `execGitCmd` that leads to the operation timing out -- the problem started on the day that #2054 was merged :-S

@squaremo squaremo changed the title git notes operation can time out fairly easily git notes operation times out May 23, 2019
@squaremo
Member Author

I can't see how it would deadlock, given that only `Write` and `ReadFrom` take the mutex, and they don't call each other.

@2opremio
Contributor

@squaremo can you reproduce? Did you get a goroutine listing?

@squaremo
Member Author

No, and unfortunately it would be tricky to get a useful stack dump. This happened in our dev environment, and it would be necessary to signal it while it was in the middle of the operation.

My guess right now is that it's from having a large (larger than the starting size of the buffer?) output. The output on stdout of `git notes list` for our dev environment is about 8000 lines.

@squaremo
Member Author

I'll try to write a main() to reproduce it.

@squaremo
Member Author

squaremo commented May 23, 2019

If you compile this: https://gist.github.com/squaremo/68f75a34768195e79333eb338b329f8d
you'll get a program that exhibits the problem we're seeing in dev. You may need to run it from inside a git repo with a lot of notes!

Further note: if you pass `nil` instead of `os.Stdout` to `execGitCmd` in `main()`, it succeeds trivially. So I suspect it is the combination of the `io.MultiWriter` and the mutex.

@squaremo
Member Author

Stack dump:

SIGABRT: abort
PC=0x46ee98 m=0 sigcode=0

goroutine 1 [syscall]:
syscall.Syscall6(0xf7, 0x1, 0x7696, 0xc00009abe8, 0x1000004, 0x0, 0x0, 0x60, 0xc00001c2a0, 0x0)
	/usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5 fp=0xc00009ab90 sp=0xc00009ab88 pc=0x46ee75
os.(*Process).blockUntilWaitable(0xc0000164e0, 0x5, 0xc00009ad50, 0xc0000164e0)
	/usr/local/go/src/os/wait_waitid.go:31 +0x98 fp=0xc00009ac88 sp=0xc00009ab90 pc=0x47d1f8
os.(*Process).wait(0xc0000164e0, 0x4c27c0, 0x1, 0xc00000e2a0)
	/usr/local/go/src/os/exec_unix.go:22 +0x39 fp=0xc00009ad00 sp=0xc00009ac88 pc=0x47ad29
os.(*Process).Wait(0xc0000164e0, 0x4e3990, 0x4e3998, 0x4e3988)
	/usr/local/go/src/os/exec.go:125 +0x2b fp=0xc00009ad30 sp=0xc00009ad00 pc=0x47a3ab
os/exec.(*Cmd).Wait(0xc000094000, 0x0, 0x0)
	/usr/local/go/src/os/exec/exec.go:465 +0x5b fp=0xc00009ada8 sp=0xc00009ad30 pc=0x4a221b
os/exec.(*Cmd).Run(0xc000094000, 0x2, 0x2)
	/usr/local/go/src/os/exec/exec.go:309 +0x5c fp=0xc00009add0 sp=0xc00009ada8 pc=0x4a173c
main.execGitCmd(0x4f3000, 0xc000068180, 0x4f2d60, 0xc00000c018, 0xc000026100, 0x4, 0x4, 0x0, 0x4e1639)
	/home/mikeb/scratch/blorp.go:52 +0x104 fp=0xc00009af18 sp=0xc00009add0 pc=0x4a3ab4
main.main()
	/home/mikeb/scratch/blorp.go:17 +0x112 fp=0xc00009af98 sp=0xc00009af18 pc=0x4a36f2
runtime.main()
	/usr/local/go/src/runtime/proc.go:201 +0x207 fp=0xc00009afe0 sp=0xc00009af98 pc=0x429767
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc00009afe8 sp=0xc00009afe0 pc=0x4538f1

goroutine 6 [semacquire]:
sync.runtime_SemacquireMutex(0xc0000a406c, 0x1000)
	/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc0000a4068)
	/usr/local/go/src/sync/mutex.go:134 +0xff
main.(*threadSafeBuffer).Write(0xc0000a4000, 0xc0000b0000, 0x1000, 0x8000, 0x0, 0x0, 0x0)
	/home/mikeb/scratch/blorp.go:30 +0x4b
io.(*multiWriter).Write(0xc00000a0a0, 0xc0000b0000, 0x1000, 0x8000, 0x1000, 0x0, 0x0)
	/usr/local/go/src/io/multi.go:60 +0x87
io.copyBuffer(0x4f2ce0, 0xc00000a0a0, 0x4f2d40, 0xc00000c040, 0xc0000b0000, 0x8000, 0x8000, 0x4f3000, 0xc000068180, 0x4f2d60)
	/usr/local/go/src/io/io.go:404 +0x201
io.Copy(0x4f2ce0, 0xc00000a0a0, 0x4f2d40, 0xc00000c040, 0x4f3000, 0x0, 0xc000068180)
	/usr/local/go/src/io/io.go:364 +0x5a
os/exec.(*Cmd).writerDescriptor.func1(0x0, 0xc00001c0c0)
	/usr/local/go/src/os/exec/exec.go:279 +0x4d
os/exec.(*Cmd).Start.func1(0xc000094000, 0xc00000a0e0)
	/usr/local/go/src/os/exec/exec.go:400 +0x27
created by os/exec.(*Cmd).Start
	/usr/local/go/src/os/exec/exec.go:399 +0x5af

goroutine 7 [IO wait]:
internal/poll.runtime_pollWait(0x7f5fda0f5d00, 0x72, 0xc00004aca0)
	/usr/local/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000068318, 0x72, 0xffffffffffffff01, 0x4f2e40, 0x578120)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000068318, 0xc0000c8001, 0x200, 0x200)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc000068300, 0xc0000c8000, 0x200, 0x200, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:169 +0x179
os.(*File).read(0xc00000c058, 0xc0000c8000, 0x200, 0x200, 0xc0000c8000, 0x0, 0x0)
	/usr/local/go/src/os/file_unix.go:249 +0x4e
os.(*File).Read(0xc00000c058, 0xc0000c8000, 0x200, 0x200, 0x4d4ff8, 0xc00004ae30, 0xc00004ae18)
	/usr/local/go/src/os/file.go:108 +0x69
bytes.(*Buffer).ReadFrom(0xc0000a4000, 0x4f2d40, 0xc00000c058, 0x4c1720, 0x4d4fc0, 0x1)
	/usr/local/go/src/bytes/buffer.go:206 +0xbd
main.(*threadSafeBuffer).ReadFrom(0xc0000a4000, 0x4f2d40, 0xc00000c058, 0x0, 0x0, 0x0)
	/home/mikeb/scratch/blorp.go:38 +0x93
io.copyBuffer(0x4f2d00, 0xc0000a4000, 0x4f2d40, 0xc00000c058, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/io.go:388 +0x303
io.Copy(0x4f2d00, 0xc0000a4000, 0x4f2d40, 0xc00000c058, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/io.go:364 +0x5a
os/exec.(*Cmd).writerDescriptor.func1(0x0, 0x0)
	/usr/local/go/src/os/exec/exec.go:279 +0x4d
os/exec.(*Cmd).Start.func1(0xc000094000, 0xc00000a120)
	/usr/local/go/src/os/exec/exec.go:400 +0x27
created by os/exec.(*Cmd).Start
	/usr/local/go/src/os/exec/exec.go:399 +0x5af

goroutine 8 [select]:
os/exec.(*Cmd).Start.func2(0xc000094000)
	/usr/local/go/src/os/exec/exec.go:407 +0xc4
created by os/exec.(*Cmd).Start
	/usr/local/go/src/os/exec/exec.go:406 +0x64d

rax    0xf7
rbx    0x0
rcx    0x46ee9a
rdx    0xc00009abe8
rdi    0x1
rsi    0x7696
rbp    0xc00009ac78
rsp    0xc00009ab88
r8     0x0
r9     0x0
r10    0x1000004
r11    0x212
r12    0x30
r13    0x11
r14    0x4f0eac
r15    0x0
rip    0x46ee98
rflags 0x212
cs     0x33
fs     0x0
gs     0x0

@squaremo
Member Author

squaremo commented May 23, 2019

My reading of this is that `ReadFrom` holds the lock while it blocks on its source `io.Reader` (probably the stderr pipe), while the multi-writer is trying to `Write`. Since nothing is forthcoming on stderr, nothing can be written.

Why does it use `Write` in one place and `ReadFrom` in another? In both cases it's using `io.Copy` to convey bytes from the child process to the given `io.Writer`s. But `io.Copy` has an optimisation: if the destination has a `ReadFrom` method, it'll use that instead of a read/write loop. `bytes.Buffer` has a `ReadFrom`, but `io.multiWriter` doesn't.
