Hello maintainer(s)!

Thanks so much for your work and curation. I've recently resurfaced from deep-sea debugging and have a question about my understanding of the `Operations::Upload` code, i.e. whether there's a race condition when uploading a single (small) file.
Background
Net::SFTP 2.2.1
Net::SSH 4.2.0
Ruby 2.4.1
protocol version 6
An application I work on regularly uploads small CSV files to an SFTP server. The volume might be around several dozen per day.
Expected behavior
```ruby
# nearly identical to the code from our application:
Net::SFTP.start(host, user, port: port, password: password) do |sftp|
  sftp.upload!(local_path, remote_path)
end
```
Under normal conditions and with a working server, the code above uploads a file without a socket failure.
Actual Behavior
The file is uploaded, but we sometimes receive this error:
```
IOError: closed stream
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/ruby_compat.rb:20:in `select'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/ruby_compat.rb:20:in `io_select'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/packet_stream.rb:75:in `available_for_read?'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/packet_stream.rb:90:in `next_packet'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/session.rb:193:in `block in poll_message'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/session.rb:188:in `loop'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/transport/session.rb:188:in `poll_message'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:544:in `dispatch_incoming_packets'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:246:in `ev_preprocess'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:99:in `each'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:99:in `ev_preprocess'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:27:in `process'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:225:in `process'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `block in loop'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
... 15 levels...
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/event_loop.rb:27:in `process'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:225:in `process'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `block in loop'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/connection/session.rb:178:in `loop'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-sftp-2.1.2/lib/net/sftp/session.rb:802:in `loop'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-sftp-2.1.2/lib/net/sftp/session.rb:787:in `connect!'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/net-sftp-2.1.2/lib/net/sftp.rb:32:in `start'
from (irb):401
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/console.rb:110:in `start'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/console.rb:9:in `start'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/commands_tasks.rb:68:in `console'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
from /u/apps/bighatads/production/shared/vendor/bundle/ruby/2.4.0/gems/railties-4.2.9/lib/rails/commands.rb:17:in `<top (required)>'
from bin/rails:5:in `require'
from bin/rails:5:in `<main>'
```
Possible Cause
When the size of the file being uploaded is smaller than the `read_size`, Net::SFTP appears to close the connection before the upload (write bytes) is done and acknowledged. When the `read_size` is less than the file size, it always seems to succeed, independently of file size. The conclusion isn't straightforward, but it seems like a race condition correlated with file size.
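For what it's worth, a minimal way to trigger it on demand seems to be forcing the chunk size above the file size via the documented `:read_size` option (a sketch, not our production code):

```ruby
# Hypothetical repro sketch: with read_size larger than the file, the first
# write_next_chunk call reads the entire file to EOF in a single chunk.
Net::SFTP.start(host, user, port: port, password: password) do |sftp|
  sftp.upload!(local_path, remote_path, read_size: File.size(local_path) + 1)
end
```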
When trying to trace the cause I found this: `write_next_chunk` is the method that says "read a chunk of size x; if there's no data left, then close the connection". Is that correct?

The issue I'm seeing is when the upload initially starts and the file is opened on disk (see `on_open`):
```ruby
def on_open(response)
  @active -= 1
  file = response.request[:file]
  raise StatusException.new(response, "open #{file.remote}") unless response.ok?

  file.handle = response[:handle]

  @uploads << file
  # we start writing chunks here. This will get called recursively
  write_next_chunk(file)

  # recursive? is only true when uploading a directory
  if !recursive?
    # n.times write_next_chunk(same file as above)!
    (options[:requests] || SINGLE_FILE_READERS).to_i.times { write_next_chunk(file) }
  end
end
```
We really only need `write_next_chunk` called once; the event handler will continue to call it after each chunk is sent over the socket, until the file is completely read. The next few lines under `if !recursive?` are troubling. At this point `write_next_chunk` has already been called. Let's say the file size is smaller than the chunk size (`read_size`). When `write_next_chunk` is called, it reads the entire file contents to EOF.
After this has occurred, `requests.to_i.times { write_next_chunk(file) }` is executed, with the default being 2.
```ruby
if !recursive?
  # n.times write_next_chunk(same file as above)!
  (options[:requests] || SINGLE_FILE_READERS).to_i.times { write_next_chunk(file) }
end
```
Thus, we call `write_next_chunk` two more times. As the second (or later) caller of `write_next_chunk` will have a file handle at the EOF position, it will call `read(bytes)` and `data` will be `nil`. It will then enter the `if data.nil?` block, and the session/connection will be closed.
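For context, this is roughly the shape of `write_next_chunk` as I read it (a paraphrased sketch from my reading of `Operations::Upload`, not verbatim source):

```ruby
# Paraphrased sketch of write_next_chunk; names approximate the real source.
def write_next_chunk(file)
  @active += 1
  offset = file.io.pos
  data = file.io.read(options[:read_size] || DEFAULT_READ_SIZE)
  if data.nil?
    # EOF: close the local file and the remote handle; for a single-file
    # upload this ends up shutting the whole session down.
    file.io.close
    request = sftp.close(file.handle, &method(:on_close))
    request[:file] = file
  else
    # Send this chunk; the on_write handler calls write_next_chunk again
    # once the server acknowledges the write.
    request = sftp.write(file.handle, offset, data, &method(:on_write))
    request[:file] = file
  end
end
```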
I believe the issue I'm seeing is that the second call to `write_next_chunk` occurs synchronously after the first call, and the server's response is not received before the socket is closed. The result is an uploaded file and a "closed stream" error (the exact error varies depending on how fast this all happens). Sometimes the error is `connection closed by remote host` and sometimes it is `connection reset by peer recvfrom(2)`.
Am I reading this correctly? Would you consider a patch for this? We can currently work around it by always ensuring the chunk size is smaller than the file size. Another workaround is passing `requests: 0`, since we don't need multiple writers. As this is a very core part of the library, I wouldn't expect to have to use these workarounds, so I thought this was worth bringing up.
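Concretely, the second workaround looks like this (a sketch, assuming `upload!` forwards the `:requests` option as it appears to):

```ruby
# Workaround sketch: requests: 0 suppresses the extra synchronous
# write_next_chunk calls for a single-file upload.
Net::SFTP.start(host, user, port: port, password: password) do |sftp|
  sftp.upload!(local_path, remote_path, requests: 0)
end
```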
How should I proceed?