SINGLE_FILE_READERS/write_next_chunk race condition? #91

jbielick opened this issue Feb 19, 2019 · 0 comments
Hello maintainer(s)!

Thanks so much for your work and curation. I've recently resurfaced from some deep-sea debugging and have a question about my understanding of the Operations::Upload code: specifically, whether there's a race condition when uploading a single (small) file.

Background

Net::SFTP 2.2.1
Net::SSH 4.2.0
Ruby 2.4.1
protocol version 6

An application I work on regularly uploads small CSV files to an SFTP server, on the order of several dozen per day.

Expected behavior

# nearly identical to the code from our application:
Net::SFTP.start(host, user, port: port, password: password) do |sftp|
  sftp.upload!(local_path, remote_path)
end

Under normal conditions, with a working server, the code above uploads the file without a socket error.

Actual Behavior

The file is uploaded, but we intermittently receive this error:

IOError: closed stream
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/ruby_compat.rb:20:in `select'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/ruby_compat.rb:20:in `io_select'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/packet_stream.rb:75:in `available_for_read?'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/packet_stream.rb:90:in `next_packet'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/session.rb:193:in `block in poll_message'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/session.rb:188:in `loop'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/session.rb:188:in `poll_message'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:544:in `dispatch_incoming_packets'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:246:in `ev_preprocess'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:99:in `each'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:99:in `ev_preprocess'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:27:in `process'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:225:in `process'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `block in loop'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
... 15 levels...
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:27:in `process'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:225:in `process'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `block in loop'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-sftp-2.1.2/lib/net/sftp/session.rb:802:in `loop'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-sftp-2.1.2/lib/net/sftp/session.rb:787:in `connect!'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-sftp-2.1.2/lib/net/sftp.rb:32:in `start'
	from (irb):401
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/console.rb:110:in `start'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/console.rb:9:in `start'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/commands_tasks.rb:68:in `console'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
	from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands.rb:17:in `<top (required)>'
	from bin/rails:5:in `require'
	from bin/rails:5:in `<main>'

Possible Cause

When the size of the file being uploaded is smaller than the read_size, Net::SFTP appears to close the connection before the upload (the written bytes) is complete and acknowledged. When the read_size is less than the file size, the upload always seems to succeed, regardless of file size.

filesize: 10_713, read_size: 32_000 - FAIL
filesize: 10_767, read_size: 32_000 - SUCCESS
filesize: 10_670, read_size: 32_000 - SUCCESS

filesize: 10_713, read_size: 10_713 - FAIL
filesize: 10_713, read_size: 10_714 - FAIL
filesize: 10_713, read_size: 10_712 - SUCCESS

The conclusion isn't straightforward, but it seems like a race condition correlated with file size.
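
For reference, the read_size in these trials was forced through the options hash. A minimal sketch (read_size is the same option the quoted write_next_chunk consults below; local_path points at the 10_713-byte file):

Net::SFTP.start(host, user, port: port, password: password) do |sftp|
  # the file is 10_713 bytes; a read_size at or above that triggers the
  # failure, while read_size: 10_712 completes cleanly
  sftp.upload!(local_path, remote_path, read_size: 32_000) # => IOError: closed stream
end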

When trying to trace the cause I found this:

write_next_chunk is the method that says "read a chunk of size x; if there's no data left, close the connection." Is that correct?

      def write_next_chunk(file)
        if file.io.nil?
          process_next_entry
        else
          @active += 1
          offset = file.io.pos
          data = file.io.read(options[:read_size] || DEFAULT_READ_SIZE)
          if data.nil?
            update_progress(:close, file)
            request = sftp.close(file.handle, &method(:on_close))

The issue I'm seeing arises right when the upload starts and the remote file has just been opened (see on_open):

      def on_open(response)
        @active -= 1
        file = response.request[:file]
        raise StatusException.new(response, "open #{file.remote}") unless response.ok?

        file.handle = response[:handle]

        @uploads << file
        # we start writing chunks here. This will get called recursively
        write_next_chunk(file)

        # recursive? is only true when uploading a directory
        if !recursive?
          # n.times write_next_chunk(same file as above)!
          (options[:requests] || SINGLE_FILE_READERS).to_i.times { write_next_chunk(file) }
        end
      end

We really only need write_next_chunk to be called once; the event loop will call it again after each chunk is sent over the socket, until the file is completely read. The next few lines under if !recursive? are troubling. At that point write_next_chunk has already been called once. Say the file is smaller than the chunk size (read_size): that first call reads the entire file contents, leaving the IO at EOF.

After this has occurred, requests.to_i.times { write_next_chunk(file) } is executed, the default for SINGLE_FILE_READERS being 2:

        if !recursive?
          # n.times write_next_chunk(same file as above)!
          (options[:requests] || SINGLE_FILE_READERS).to_i.times { write_next_chunk(file) }
        end

Thus, write_next_chunk is called two more times. Since the second and any later call to write_next_chunk sees a file handle already at the EOF position, its read(bytes) returns nil.

It will enter the if data.nil? block:

          data = file.io.read(options[:read_size] || DEFAULT_READ_SIZE)
          if data.nil?
            update_progress(:close, file)
            request = sftp.close(file.handle, &method(:on_close))

and the session/connection will be closed.

I believe the issue I'm seeing is that the second call to write_next_chunk occurs synchronously after the first, and the socket is closed before the server's response to the write is received. The result is an uploaded file and a "closed stream" error. The exact error varies with timing: sometimes it is connection closed by remote host, and sometimes connection reset by peer recvfrom(2).
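
If that reading is correct, one possible guard (a rough, untested sketch on my part, using the names from the quoted source; the eof? check is my addition) would be to stop queueing extra readers once the first call has already drained the IO:

        if !recursive?
          (options[:requests] || SINGLE_FILE_READERS).to_i.times do
            # once the local IO is drained, on_write will perform the final
            # nil read and issue the close after the last write is acknowledged
            break if file.io.nil? || file.io.eof?
            write_next_chunk(file)
          end
        end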

Am I reading this correctly? Would you consider a patch for this? We can currently work around it by always ensuring the chunk size is smaller than the file size. Another workaround is passing requests: 0, since we don't need multiple writers. But this is a very core part of the library, and I wouldn't expect to need these workarounds, so I thought it was worth bringing up.
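
For anyone else hitting this in the meantime, the requests: 0 workaround looks like this (same sketch as the snippet under Expected behavior):

Net::SFTP.start(host, user, port: port, password: password) do |sftp|
  # with no extra readers, write_next_chunk is only re-invoked from the
  # on_write callback, so the close is sent after the final write is acked
  sftp.upload!(local_path, remote_path, requests: 0)
end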

How should I proceed?
