Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sendfile regression in ZFS arch linux 5.16 2.1.2_5.16.arch1.1-1 not zfs-linux-2.1.2_5.15.13.arch1.1-1 #12971

Closed
ppickfor opened this issue Jan 14, 2022 · 16 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@ppickfor
Copy link

System information

Distribution Name | Arch Linux
Distribution Version | 5.16.0-arch1-1
Kernel Version | 5.16.0-arch1-1
Architecture | x86_64
OpenZFS Version | zfs-2.1.2-1 zfs-kmod-2.1.2-1

Describe the problem you're observing

sendfile fails with EINVAL after upgrade to linux-5.16.arch1-1 and zfs-linux-2.1.2_5.16.arch1.1-1

works fine with on zfs linux-5.15.13.arch1-1 zfs-linux-2.1.2_5.15.13.arch1.1-1

works fine with an xfs file system on linux-5.16.arch1-1

flexo.strace:48892 sendfile(4, 6, [0], 2147479552)   = -1 EINVAL (Invalid argument)

Describe how to reproduce the problem

attempting to fetch a package from flexo with pacman

ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst failed to download
error: failed retrieving file 'ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst' from 192.168.1.138:8080 : transfer closed with 10864773 bytes remaining to read
warning: failed to retrieve some files
error: failed to commit transaction (download library error)
Errors occurred, no packages were upgraded.

from flexo log

2022-01-14T03:18:27.557Z DEBUG flexo] Schedule new job
[2022-01-14T03:18:27.558Z DEBUG flexo::mirror_flexo] Determine cache state for path "/var/cache/flexo/pkg/extra/os/x86_64/ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst"
[2022-01-14T03:18:27.558Z DEBUG flexo] Order DownloadOrder { requested_path: StrPath { path_buf: "extra/os/x86_64/ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst", inner: "extra/os/x86_64/ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst" }, non_cacheable_unique_id: None } is already cached.
[2022-01-14T03:18:27.558Z DEBUG flexo] Cache hit for request StrPath { path_buf: "extra/os/x86_64/ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst", inner: "extra/os/x86_64/ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst" }
[2022-01-14T03:18:27.643Z DEBUG flexo] Sending header to client: "HTTP/1.1 200 OK\r\nServer: flexo\r\nDate: Fri, 14 Jan 2022 03:18:27 GMT\r\nFlexo-Payload-Origin: Cache\r\nContent-Length: 10864773\r\n\r\n"
[2022-01-14T03:18:27.643Z WARN  flexo] Error while sending payload: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }
[2022-01-14T03:18:27.643Z ERROR flexo] Unable to serve request "extra/os/x86_64/ffmpeg-2:4.4.1-1-x86_64.pkg.tar.zst": IoError(InvalidInput)
[2022-01-14T03:18:27.643Z ERROR flexo] Input/Output Error: InvalidInput
[2022-01-14T03:18:27.643Z WARN  flexo] Closing TCP socket due to error: IoError(InvalidInput)

from strace of flexo process

48892 sendto(4, "HTTP/1.1 200 OK\r\nServer: flexo\r\n"..., 126, MSG_NOSIGNAL, NULL, 0) = 126
48892 sendfile(4, 6, [0], 2147479552)   = -1 EINVAL (Invalid argument)

Include any warning/errors/backtraces from the system logs

no relevant system logs on either end

@ppickfor ppickfor added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 14, 2022
@rincebrain
Copy link
Contributor

rincebrain commented Jan 14, 2022

rincebrain/zfs@f6960eb resolves this for me (possibly with the other two patches in that branch; I tried them in a different order than they're committed, this was the last one, and it worked). I'll test it some more and try to submit it for inclusion, possibly with a test to notice this in the future.

Of course, the fact that it doesn't work without that is still a problem, I think, since I can't find any suggestion that sendfile() support should be constrained like that (EINVAL is perfectly reasonable and a listed valid response if it's not supported, I just don't see any reason it shouldn't be in this case...), but for the immediate issue...

@mcmilk
Copy link
Contributor

mcmilk commented Jan 14, 2022

sendfile(4, 6, [0], 2147479552) = -1 EINVAL (Invalid argument)

The last argument to sendfile() looks a bit to big and seems to be some 2^31 overflow ?
Could you check the code which uses sendfile?
Maybe some cast to size_t sendfile(4, 6, [0], (size_t)var) would help... maybe the error just tells, that count is out of range?

@ppickfor
Copy link
Author

ppickfor commented Jan 14, 2022 via email

@ppickfor
Copy link
Author

Works fine under 5.15
sendfile is called with the same value 2g-4096

end of a long steam of sendfiles for the package

075442 sendfile(4, 8, [10698461] => [10714845], 2147479552) = 16384
2075442 sendfile(4, 8, [10714845] <unfinished ...>
2075442 <... sendfile resumed> => [10731229], 2147479552) = 16384
2075442 sendfile(4, 8, [10731229] => [10747613], 2147479552) = 16384
2075442 sendfile(4, 8, [10747613] => [10763997], 2147479552) = 16384
2075442 sendfile(4, 8, [10763997] => [10780381], 2147479552) = 16384
2075442 sendfile(4, 8, [10780381] => [10796765], 2147479552) = 16384
2075442 sendfile(4, 8, [10796765] => [10813149], 2147479552) = 16384
2075442 sendfile(4, 8, [10813149] <unfinished ...>
2075442 <... sendfile resumed> => [10829533], 2147479552) = 16384
2075442 sendfile(4, 8, [10829533] <unfinished ...>
2075442 <... sendfile resumed> => [10862301], 2147479552) = 32768
2075442 sendfile(4, 8, [10862301] <unfinished ...>
2075442 <... sendfile resumed> => [10864773], 2147479552) = 2472

sendfile for the signature of 310 bytes

2075442 sendfile(4, 8, [0] <unfinished ...>
2075442 <... sendfile resumed> => [310], 2147479552) = 310

Good news code in not buried in some library :)
So I can have a go at improving it.

code

// man 2 read: read() (and similar system calls) will transfer at most 0x7ffff000 bytes.
#[cfg(not(test))]
const MAX_SENDFILE_COUNT: usize = 0x7fff_f000;

fn send_payload<T>(source: &mut File, filesize: u64, bytes_sent: i64, receiver: &mut T) -> io::Result<i64>
    where T: AsRawFd {
    let fd = source.as_raw_fd();
    let sfd = receiver.as_raw_fd();
    let size = unsafe {
        let mut offset = bytes_sent as off64_t;
        while (offset as u64) < filesize {
            let size: isize = libc::sendfile64(sfd, fd, &mut offset, MAX_SENDFILE_COUNT);
            if size == -1 {
                return Err(std::io::Error::last_os_error());
            }
        }
        offset
    };

    Ok(size)
}

I'm not sure it will resolve the issue but its not best according to the man page.

@ppickfor
Copy link
Author

from sendfile man

      EINVAL Descriptor is not valid or locked, or an mmap(2)-like operation is not available for in_fd, or count is negative.

       EINVAL out_fd has the O_APPEND flag set.  This is not currently supported by sendfile().

       EOVERFLOW
              count is too large, the operation would result in exceeding the maximum size of either the input file or the output file.

I would have expected EOVERFLOW when file size is less than 0x7fff_f000 (which it is in both cases)
Does not to seem to generate an error at least in the past but best to be compliant.

@mcmilk
Copy link
Contributor

mcmilk commented Jan 15, 2022

EINVAL Descriptor is not valid or locked, or an mmap(2)-like operation is not available for in_fd, or count is negative.

The count is negative was my thinking, but when it's really just some very big value... then there is a problem within the FS-handling. :-(

Thank you for the hints.

@ppickfor
Copy link
Author

I updated the sendfile call and submitted a PR as it seems more inline with the sendfile manpage.

It made no difference under 5.16 and zfs.

flexo.strace.5.15.txt
flexo.strace.5.16.txt

@ericonr
Copy link
Contributor

ericonr commented Jan 23, 2022

Have the same issue using ZFS 2.1.2 and Linux 5.16.2 on a Void Linux system, would appreciate it if the patches were merged and backported to 2.1.x branch.

I will try to test them locally myself and give feedback in the PR.

@Luflosi
Copy link

Luflosi commented Jan 30, 2022

copy_file_range() also fails, see kovidgoyal/kitty#4568.

@thesamesam
Copy link
Contributor

thesamesam commented Jan 30, 2022

copy_file_range() also fails, see kovidgoyal/kitty#4568.

Is this with the master ZFS fixes for 5.16? What commit?

@Luflosi
Copy link

Luflosi commented Jan 30, 2022

No, this was with the 2.1.2 release.

@xordspar0
Copy link

It looks like this change was backported to Linux 5.15 starting in 5.15.37. I'm seeing sendfile failures on that version with openzfs 2.1.2.

Here's the diff for Linux 5.15.37: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v5.15.37&id2=v5.15.36&dt=2

@kuon
Copy link

kuon commented Sep 20, 2022

What is the current status on this? Is there a current (LTS or mainline) kernel that works with ZFS and sendfile?

@rincebrain
Copy link
Contributor

This has been fixed for several releases now. cf. #12975

@kuon
Copy link

kuon commented Sep 20, 2022

This has been fixed for several releases now. cf. #12975

Great, maybe we could close this? I was holding an upgrade back because of it.

@rincebrain
Copy link
Contributor

Sounds reasonable to me; if there's some reason I'm not seeing this should be open, I'm sure someone will ping it and we can reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

8 participants