Lock all PAM operations to the startup thread #4133
Merged
This fixes the `pam_loginuid.so` flakiness.

Tested with:
- `pam_loginuid.so` configured in PAM policy
- `tsh ssh user@ubuntu true` 1000 times in a loop

Without the fix, this setup reproduced the sporadic problem. With the fix, no errors reported.
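The test loop above could be scripted roughly as follows. This is only a sketch: the `countFailures` helper is hypothetical, and the command and host are taken from the description above, so it assumes `tsh` is on the PATH and already logged in.

```go
package main

import (
	"fmt"
	"os/exec"
)

// countFailures calls run n times and returns how many calls returned an error.
// It is a hypothetical helper, not part of Teleport.
func countFailures(n int, run func() error) int {
	failures := 0
	for i := 0; i < n; i++ {
		if err := run(); err != nil {
			failures++
		}
	}
	return failures
}

func main() {
	// Each run opens an SSH session and executes `true` on the node.
	// A PAM failure on the node shows up as a non-nil error here.
	run := func() error {
		return exec.Command("tsh", "ssh", "user@ubuntu", "true").Run()
	}
	fmt.Printf("%d of 1000 runs failed\n", countFailures(1000, run))
}
```

A non-zero failure count would point at the sporadic problem. The sketch does not distinguish PAM errors from other connection failures.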
Below is the comment I left in the code:
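Lock all PAM commands to the startup thread. From LockOSThread docs:
"All init functions are run on the startup thread. Calling LockOSThread from an init function will cause the main function to be invoked on that thread."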
This is needed for pam_loginuid.so. It writes "/proc/self/loginuid"
which, on Linux, depends on being called from a specific thread. If
it's not running on the right thread, pam_loginuid.so may fail with
EPERM sporadically.
The kernel does some validation based on the thread context. I could
not find what the kernel uses specifically. Some relevant code:
https://github.com/torvalds/linux/blob/9d99b1647fa56805c1cfef2d81ee7b9855359b62/kernel/audit.c#L2284-L2317
Locking to the startup thread seems to make the kernel happy.
The lock is taken at startup rather than inside pam.Open because, by the
time pam.Open gets called, more goroutines could've been spawned. This
means that the main goroutine (running pam.Open) could already have been
re-scheduled to a different thread.
That pam.Open runs on the main goroutine is an assumption. As of today,
this is true because Teleport re-executes itself and calls pam.Open
synchronously. If we change this later, loginuid can become flaky again.
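A minimal sketch of the pattern described above, assuming the lock is taken in an `init` function so that `main` keeps running on the startup thread. `openPAM` is a hypothetical stand-in for pam.Open, not the actual Teleport code.

```go
package main

import (
	"fmt"
	"runtime"
)

// init functions run on the startup thread. Locking here keeps the main
// goroutine on that thread, no matter how many goroutines start later.
func init() {
	runtime.LockOSThread()
}

// openPAM is a hypothetical placeholder for pam.Open. In the real code,
// PAM modules such as pam_loginuid.so would run inside this call.
func openPAM() error {
	return nil
}

func main() {
	// Called synchronously from main, so it runs on the locked thread.
	// Calling it from another goroutine would lose that guarantee.
	if err := openPAM(); err != nil {
		fmt.Println("PAM open failed:", err)
		return
	}
	fmt.Println("PAM context opened on the main goroutine")
}
```

The guarantee only holds while the PAM calls stay on the main goroutine, which is the assumption stated above.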
OpenSSH has a separate "authentication thread" which does all the PAM
stuff:
https://github.com/openssh/openssh-portable/blob/598c3a5e3885080ced0d7c40fde00f1d5cdbb32b/auth-pam.c#L470-L474
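For comparison, a rough Go analogue of a dedicated thread for PAM calls might look like the sketch below. This is not what this PR does, and all names (`pamWorker`, `newPAMWorker`, `Do`, `Close`) are hypothetical. It pins the calls to one OS thread, but not necessarily the startup thread, so it is an assumption that it would satisfy the kernel check discussed above.

```go
package main

import (
	"fmt"
	"runtime"
)

// pamWorker runs every submitted function on one goroutine that is
// locked to a single OS thread.
type pamWorker struct {
	calls chan func()
}

// newPAMWorker starts the worker goroutine and pins it to its OS thread.
func newPAMWorker() *pamWorker {
	w := &pamWorker{calls: make(chan func())}
	go func() {
		// From here on, this goroutine always runs on the same thread.
		runtime.LockOSThread()
		for fn := range w.calls {
			fn()
		}
	}()
	return w
}

// Do runs fn on the worker's thread and waits for its result.
func (w *pamWorker) Do(fn func() error) error {
	done := make(chan error, 1)
	w.calls <- func() { done <- fn() }
	return <-done
}

// Close stops the worker goroutine. Do must not be called afterwards.
func (w *pamWorker) Close() {
	close(w.calls)
}

func main() {
	w := newPAMWorker()
	defer w.Close()

	err := w.Do(func() error {
		// A PAM call would go here.
		return nil
	})
	fmt.Println("call finished, err:", err)
}
```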
Updates #2476