This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

apt-get update failed in kvm-flavor stage1 #1917

Closed
xelatex opened this issue Dec 24, 2015 · 27 comments · Fixed by #1918

Comments

@xelatex
Contributor

xelatex commented Dec 24, 2015

Hi there,

I'm using the docker://ubuntu image with stage1-kvm.aci to run a pod. When I run "apt-get update" inside it, it fails as follows:

root@node4:~/rkt.git# rkt run --insecure-options=all --interactive --stage1-image=/usr/local/bin/stage1-kvm.aci docker://ubuntu
rkt: using image from file /usr/local/bin/stage1-kvm.aci
rkt: using image from local store for url docker://ubuntu
run: group "rkt" not found, will use default gid when rendering images
root@clr:/# apt-get update
Ign http://archive.ubuntu.com trusty InRelease
Get:1 http://archive.ubuntu.com trusty-updates InRelease [64.4 kB]
Get:2 http://archive.ubuntu.com trusty-security InRelease [64.4 kB]
E: Unable to determine file size for fd 7 - fstat (2: No such file or directory)
root@clr:/#

Is this a bug, or am I missing some configuration?

Thanks.

@jonboulle
Contributor

Hmm, looks like this bug: https://bugs.launchpad.net/qemu/+bug/1336794

/cc @jellonek @ppalucki

xelatex added a commit to xelatex/rkt that referenced this issue Dec 25, 2015
Upgrade lkvm to current master branch to fix a 9pfs bug.

Bug details: rkt#1917

Signed-off-by: Arthur Chunqi Li <[email protected]>
@xelatex
Contributor Author

xelatex commented Dec 25, 2015

Hi jonboulle,

I dug into the lkvm code and found that this bug was fixed by lkvm patch 6d7eeb7a1328fcce82b5783d9e4605bf5e4737dd, "kvmtool: set 9p caching mode to support writable mmaps". This patch adds the "cache=loose" mount parameter to 9pfs, which maintains the cache inside the Linux VFS. This is really a workaround with a trade-off, but according to the official v9fs documentation:

Loose cacheing works well for read-only mounts (allowing scalable fanout in clusters with intermediate servers re-exporting read-only v9fs mounts to more clients), or mounts with nonconcurrent users (including only one client mounting a directory, or user home directories under a common directory).

I think this fits the rkt environment well, since the stage2 rootfs is used by only one pod.

My patch is here: xelatex@1f145e8

Thanks.

@jonboulle
Contributor

@xelatex awesome, thanks for doing that research! Would you mind putting up your patch as a PR, and we can get @jellonek or @ppalucki to ack?

blixtra pushed a commit to blixtra/rkt that referenced this issue Jan 8, 2016
Upgrade lkvm to current master branch to fix a 9pfs bug.

Bug details: rkt#1917

Signed-off-by: Arthur Chunqi Li <[email protected]>
blixtra pushed a commit to kinvolk/rkt that referenced this issue Jan 8, 2016
Upgrade lkvm to current master branch to fix a 9pfs bug.

Bug details: rkt#1917

Signed-off-by: Arthur Chunqi Li <[email protected]>
@tjdett
Contributor

tjdett commented May 4, 2016

I'm still seeing this:

$ sudo ~core/rkt-v1.5.1/rkt run --dns=8.8.8.8 --stage1-name=coreos.com/rkt/stage1-kvm:1.5.1 docker://debian:8 --exec sh -- -c "apt-get update"
image: using image from local store for image name coreos.com/rkt/stage1-kvm:1.5.1
image: using image from local store for url docker://debian:8
networking: loading networks from /etc/rkt/net.d
kvm: network unit created: "interface-eth0.service" in "stage1/rootfs/usr/lib/systemd/system" (iface="eth0", addr="172.16.28.55/24")
[    1.402500] sh[121]: Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
[    1.781378] sh[121]: E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
[    2.000992] cgroup: option or name mismatch, new: 0x0 "", old: 0x4 "systemd"
[    2.133925] reboot: Restarting system with command '<rkt: obligatory restart>'

It's not limited to Debian apt-get of course. yum doesn't like file systems like this either:

$ sudo ~core/rkt-v1.5.1/rkt run --dns=8.8.8.8 --stage1-name=coreos.com/rkt/stage1-kvm:1.5.1 docker://centos:7 --exec sh -- -c "yum install -y which"
image: using image from local store for image name coreos.com/rkt/stage1-kvm:1.5.1
image: using image from local store for url docker://centos:7
networking: loading networks from /etc/rkt/net.d
kvm: network unit created: "interface-eth0.service" in "stage1/rootfs/usr/lib/systemd/system" (iface="eth0", addr="172.16.28.54/24")
[    2.940208] sh[121]: Loaded plugins: fastestmirror, ovl
[    9.138312] sh[121]: Determining fastest mirrors
[    9.576521] sh[121]: * base: ftp.swin.edu.au
[    9.580736] sh[121]: * extras: ftp.swin.edu.au
[    9.583725] sh[121]: * updates: ftp.swin.edu.au
[   14.737657] sh[121]: Resolving Dependencies
[   14.741756] sh[121]: --> Running transaction check
[   14.742628] sh[121]: ---> Package which.x86_64 0:2.20-7.el7 will be installed
[   15.923598] sh[121]: --> Finished Dependency Resolution
[   15.935339] sh[121]: Dependencies Resolved
[   15.937328] sh[121]: ================================================================================
[   15.937800] sh[121]: Package          Arch              Version               Repository       Size
[   15.938336] sh[121]: ================================================================================
[   15.938796] sh[121]: Installing:
[   15.939341] sh[121]: which            x86_64            2.20-7.el7            base             41 k
[   15.939866] sh[121]: Transaction Summary
[   15.940403] sh[121]: ================================================================================
[   15.940864] sh[121]: Install  1 Package
[   15.942100] sh[121]: Total download size: 41 k
[   15.943484] sh[121]: Installed size: 75 k
[   15.943952] sh[121]: Downloading packages:
[   16.745615] sh[121]: warning: /var/cache/yum/x86_64/7/base/packages/which-2.20-7.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
[   16.748267] sh[121]: Public key for which-2.20-7.el7.x86_64.rpm is not installed
[   16.774448] sh[121]: Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
[   16.801776] sh[121]: Importing GPG key 0xF4A80EB5:
[   16.802756] sh[121]: Userid     : "CentOS-7 Key (CentOS 7 Official Signing Key) <[email protected]>"
[   16.803738] sh[121]: Fingerprint: 6341 ab27 53d7 8a78 a7c2 7bb1 24c6 a8a7 f4a8 0eb5
[   16.804911] sh[121]: Package    : centos-release-7-2.1511.el7.centos.2.10.x86_64 (@CentOS)
[   16.805890] sh[121]: From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
[   16.996269] sh[121]: Running transaction check
[   17.036896] sh[121]: Running transaction test
[   17.399899] sh[121]: Transaction test succeeded
[   17.403942] sh[121]: Running transaction
[   17.955771] sh[121]: Installing : which-2.20-7.el7.x86_64                                      1/1
[   17.956944] sh[121]: install-info: No such file or directory for /usr/share/info/which.info.gz
[   17.982166] sh[121]: Rpmdb checksum is invalid: dCDPT(pkg checksums): which.x86_64 0:2.20-7.el7 - u
[   18.299838] cgroup: option or name mismatch, new: 0x0 "", old: 0x4 "systemd"
[   18.441417] reboot: Restarting system with command '<rkt: obligatory restart>'

Environment

rkt Version: 1.5.1
appc Version: 0.7.4
Go Version: go1.6.1
Go OS/Arch: linux/amd64
Features: -TPM
--
Linux 4.5.2-coreos x86_64
--
NAME=CoreOS
ID=coreos
VERSION=1032.0.0
VERSION_ID=1032.0.0
BUILD_ID=2016-04-28-0152
PRETTY_NAME="CoreOS 1032.0.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
--
systemd 229
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN

@tjdett
Contributor

tjdett commented May 4, 2016

@jonboulle can you confirm this should be fixed as of CoreOS 1032.0.0, as per details above?

This is going to be a blocker for me moving from Docker to rkt, so I'm rather committed to helping diagnose the problem if it has reappeared.

@jonboulle
Contributor

I can reproduce.
@jellonek any ideas?

@jonboulle jonboulle reopened this May 6, 2016
@jonboulle jonboulle added this to the v1.6.0 milestone May 6, 2016
@jellonek
Contributor

jellonek commented May 6, 2016

In the near future we plan to add support for changing hypervisors (adding qemu), and we will look into this issue then.

@jonboulle
Contributor

sorry, I don't quite follow - is that somehow related or are you just
saying you don't have time to look at this right now?

@jellonek
Contributor

jellonek commented May 9, 2016

Yes, I don't have time for this at the moment, and we want to see if this issue still shows up under qemu.

@iaguis iaguis modified the milestones: v1.7.0, v1.6.0 May 12, 2016
@iaguis
Member

iaguis commented May 12, 2016

Needs more investigation. Moving...

@hectorj2f

I am also facing this issue when using rkt v1.6.0 and stage1-kvm:1.6.0.

root@rkt-59f7c1fa-4480-4ed7-9ce2-87b0223d09be:/# apt-get install -y aufs-tools
Reading package lists... Error!
E: Unable to determine file size for fd 6 - fstat (2: No such file or directory)
E: Problem opening /var/lib/apt/lists/http.debian.net_debian_dists_jessie-updates_main_binary-amd64_Packages.gz
E: The package lists or status file could not be parsed or opened.

@jellonek
Contributor

Please try kvm flavor from branch proposed in this PR.

@iaguis iaguis added this to the v1.8.0 milestone May 24, 2016
@lucab
Member

lucab commented Jun 9, 2016

I'm bumping this to next release, but I think this would need some further assessment. @jellonek can you please re-target this to a reasonable ETA and find an owner?

@jellonek
Contributor

jellonek commented Jun 9, 2016

This should be resolved by moving to the hypervisors branch, which defaults to qemu.
I can't say anything about ETA, as this is still in review - #2684.

@lucab
Member

lucab commented Jun 9, 2016

@jellonek so we don't want to track down the root cause here and see if it can be fixed on lkvm?

@jjlakis

jjlakis commented Jun 9, 2016

@jellonek @lucab Unfortunately, this PR contains only the hypervisor interface. QEMU support will be added in another PR.

@jellonek
Contributor

jellonek commented Jun 9, 2016

@lucab: No one is interested in lkvm any longer. Using qemu solves a lot of pain and opens up further possibilities.
lkvm has been chosen for deprecation.

@lucab
Member

lucab commented Oct 10, 2016

@coreos/rkt-kvm-maintainers can any of you confirm that this is actually fixed in the QEMU-based stage1-kvm flavor that we plan to transition to?

@tjdett
Contributor

tjdett commented Oct 19, 2016

@lucab No, it's not fixed in QEMU.

$ sudo ./rkt run --interactive --dns=8.8.8.8 --stage1-path=`pwd`/stage1-kvm-qemu.aci --insecure-options=image docker://debian:8 --exec /bin/bash -- -c "apt-get update && apt-get install httpfs2" 
image: using image from file /home/uqtdettr/workspace/rkt/build-rkt-1.17.0+git/target/bin/stage1-kvm-qemu.aci
image: using image from local store for url docker://debian:8
networking: loading networks from /etc/rkt/net.d
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
[    2.302834] reboot: Restarting system with command '<rkt: obligatory restart>'

@tjdett
Contributor

tjdett commented Oct 21, 2016

@lucab @coreos/rkt-kvm-maintainers of all the KVM bugs, this is the biggest one preventing me from using KVM. I want KVM so I can run user code, and I want package installation available for those users.

It's also a pain point when submitting examples for other rkt KVM bugs, because I can't do on-the-fly package installation with Fedora/Debian containers to get the utilities required to show problems. At the moment I can use Alpine with apk on QEMU, but nothing much else.

Can we put this on a milestone?

@jjlakis

jjlakis commented Oct 24, 2016

@tjdett @lucab I looked through the previous comments and found the source of the problem. After changing the 9p mount caching mode from "mmap" to "loose", the problem disappears. For LKVM, there is a patch to set this mode.
Unfortunately, I changed this intentionally a couple of months ago (see #2795), because it was the only way to read systemctl logs from the host... We should take a deep look at the 9p mounting code...
cc @pskrzyns @lukasredynk
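
For anyone who wants to check which caching mode a running pod actually got, here is a minimal Python sketch that lists the `cache=` option of every 9p mount. It assumes the option appears in `/proc/mounts` the way it was passed at mount time, which can vary between kernel versions. The function name `nine_p_cache_modes` is made up for this illustration and is not part of rkt, lkvm, or qemu.

```python
import os


def nine_p_cache_modes(mounts_text):
    """Map each 9p mount point to its cache= value (None if no cache= option is listed).

    `mounts_text` is expected to be in /proc/mounts format:
    device, mount point, fs type, options, dump, pass.
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not a 9p mount.
        if len(fields) < 4 or fields[2] != "9p":
            continue
        mount_point = fields[1]
        cache = None
        for option in fields[3].split(","):
            if option.startswith("cache="):
                cache = option[len("cache="):]
        modes[mount_point] = cache
    return modes


if __name__ == "__main__":
    # /proc/mounts only exists on Linux; do nothing elsewhere.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for mount_point, cache in nine_p_cache_modes(f.read()).items():
                print(f"{mount_point}: cache={cache}")
```

Run inside the pod, this should show whether the rootfs is mounted with "mmap", "loose", or no explicit caching option, which helps when comparing stage1 builds.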

@slallema

Could this be related? http://www.spinics.net/lists/stable/msg140096.html

@grahamwhaley

Hi.
I'll see if we can narrow this down, get more details, and find a fix. It is a long-standing known issue with 9pfs, afaik. We have seen it in intel/cc-oci-runtime#47, for instance, and there are some fixes in intel/qemu-lite@e92ce82

And, I think there is probably more than one 9pfs bug in this area. In intel/cc-oci-runtime#152 you can see we believe there is another related, but subtly different, instance where it is an unlink/ftruncate bug, rather than an unlink/fstat bug - same type of bug (using the name of a file that has been unlinked instead of the handle to that file), but I think through a different code path.
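
The pattern described above can be checked in isolation with a small Python sketch: create a file, unlink it while the descriptor is still open, then call fstat and ftruncate on the descriptor. Treat it as an approximation based on the description in this comment, not a trace of what apt-get or yum really do. `check_unlinked_fd` is a hypothetical helper name. On a filesystem that handles open-but-unlinked files correctly, both calls should succeed.

```python
import os
import tempfile


def check_unlinked_fd(directory):
    """Try fstat and ftruncate on a descriptor whose file has been unlinked.

    Returns a dict mapping the operation name to None on success,
    or to a short error description if the call raised OSError.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    results = {}
    try:
        os.write(fd, b"hello")
        # Remove the name; the open descriptor should keep the file alive.
        os.unlink(path)
        operations = (
            ("fstat", lambda: os.fstat(fd)),
            ("ftruncate", lambda: os.ftruncate(fd, 0)),
        )
        for name, operation in operations:
            try:
                operation()
                results[name] = None
            except OSError as exc:
                results[name] = f"{exc.strerror} (errno {exc.errno})"
    finally:
        os.close(fd)
    return results


if __name__ == "__main__":
    # Uses the default temporary directory; pass another directory
    # (for example one on the 9p rootfs) to test that filesystem instead.
    print(check_unlinked_fd(tempfile.gettempdir()))
```

Calling `check_unlinked_fd` on a directory that lives on the 9p rootfs and on one that lives on tmpfs gives a quick comparison of the two code paths without installing any packages.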

@grahamwhaley

Quick update. It looks like just the qemu patch from qemu-lite as referenced above is enough to fix apt-get. Currently the qemu.mk does not have the ability to apply patches, so I'll graft that into the build system and do significantly more testing.
@lucab - is it possible for you to add me to this project so I can assign this item to myself etc.?

@jonboulle
Contributor

@grahamwhaley I've added you to the org

grahamwhaley pushed a commit to grahamwhaley/rkt that referenced this issue Dec 15, 2016
Even though a uid/gid pair are passed over the protocol, they were
not being used for chmod'ing files or dirs being created. This
surfaced inside /tmp with the S_ISVTX 'sticky' bit set, where
all files ended up being owned by root.
Fix it by chmod'ing after creation.

Fixes rkt#2576

As a side effect, it also seems to solve the remaining problems with
'apt-get update' in lkvm, so also:

Fixes rkt#1917
grahamwhaley pushed a commit to grahamwhaley/rkt that referenced this issue Dec 16, 2016
Even though a uid/gid pair are passed over the protocol, they were
not being used for chmod'ing files or dirs being created. This
surfaced inside /tmp with the S_ISVTX 'sticky' bit set, where
all files ended up being owned by root.
Fix it by chmod'ing after creation.

Fixes rkt#2576

As a side effect, it also seems to solve the remaining problems with
'apt-get update' in lkvm, so also:

Fixes rkt#1917
alban added a commit to kinvolk/rkt that referenced this issue Dec 17, 2016
This allows me to use rkt to test ebpf-kprobes without impacting the
host kernel.

The KVM flavor currently includes the kernel version 4.8.6rkt-v1, which
is enough for my current needs.

tracefs is normally available in /sys/kernel/debug/tracing/ and is
necessary for kprobe support.

I could test in the following way:
```
sudo ./build-rkt-1.21.0+git/target/bin/rkt \
  run --interactive \
  --insecure-options=image,all-run \
  --dns=8.8.8.8 \
  --stage1-path=./build-rkt-1.21.0+git/target/bin/stage1-kvm.aci \
  --volume=ebpf,kind=host,source=$GOPATH/src/github.com/kinvolk/gobpf-elf-loader/ \
  docker://debian \
  --mount=volume=ebpf,target=/ebpf
```

And inside rkt:
```
mount -t tmpfs tmpfs /tmp # workaround rkt#1917
mount -t debugfs debugfs  /sys/kernel/debug/
cd /ebpf
./gobpf-elf-loader ./ebpf.o
```

This patch increases the size of stage1-kvm.aci by 5MB (from 43MB to
48MB). Is it acceptable?
@jonboulle
Contributor

jonboulle commented Jan 13, 2017 via email
