Possible regression in 2.5.0 - iperf test w/packet loss frequently times out #157
Comments
Can you bisect this one back to see if a particular changeset caused it, Magnus? |
I'm about to attempt to confirm/deny that this is still failing since #156 was merged, also. |
Sure - I'll try to find the commit. |
Still failing in a855bac (current latest), unfortunately. |
I'm running a script now to automatically test commits back in time to see when it starts failing. It will probably take a few hours to complete. |
It looks like the test started failing after 7cecc9f. |
Eeeek. The main change here is 7cecc9f#diff-5d0c376089097530d3f7f9c4082b6443R27, which may change the semantics of retransmission. Could you add a test which fails if such retransmissions appear? That would make it much easier to avoid future regressions, and I can try to revert part of the patch to see if that fixes the bug or not. |
Hm - not sure what we should test for yet. The retransmissions mostly look normal until the connection stalls (at least to me, maybe someone else can spot something). It's only in rare cases that I've seen lots of retransmissions, so it could be a separate problem that usually corrects itself. If you want to try to revert some of the changes it's relatively easy to reproduce/test with the for-loop above. It fails after just a few attempts (<10). The pcap output is stored in |
It's probably valuable to generate a trace with mirage-profile when we run this test as well as a packet dump; a trace from a failing test would probably be instructive. |
(if testing with master it's faster to reproduce with |
Hm - would it be useful to add tracing in vnetif as well? Since both client and server are running in the same process we could (in theory at least :-)) track everything that happens as a result of a |
Any reason why we're doing performance testing using bytecode? I'd imagine native would be faster. Anyway, here's a trace showing it being slow: http://test.roscidus.com/traces/2015-07-02/tcp-perf-slow/ Test branch: https://github.com/talex5/mirage-tcpip/tree/trace-perf (aborts if a request takes > 1s) Raw trace: http://test.roscidus.com/traces/2015-07-02/tcp-perf-slow/trace.ctf.bz2 |
@talex5 thanks! The test is not really intended to be a performance test right now. It just verifies that we can transfer data between two stacks over different backends. |
Here's a slightly suspicious bit of another run: This is just before the system sleeps for a long time (the grey area on the right). We're doing a TCP fast transmit, but get stuck waiting for However, I don't know why that should be a problem. |
Looks like the modified test in https://github.com/talex5/mirage-tcpip/tree/trace-perf also fails with the last "good" commit f583cad, so the traces above could be from states the stack is able to recover from. Would it be possible to just label the |
Yes, but you might need a very large trace buffer if you want to do that. If 1s delays are expected, then it might be easier to just raise the threshold to whatever you would consider unacceptable. |
I think, as @talex5 observed, the async fast retransmit may be the change |
@balrajsingh Sure, I can check that - is it this xmit call? |
@MagnusS how about using a fixed seed for the random number generator in |
@balrajsingh @samoht The test still fails unfortunately... |
@talex5 Hm - yes, but it is (or was) very unlikely... but I guess we could get unlucky and drop a lot of packets in a row, for instance. It's probably a good idea to add a seed - if we can seed the TCP/IP stack as well, we could have reproducible test runs. |
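To make the fixed-seed idea concrete, here is a minimal sketch of seeded packet-drop decisions. It assumes the lossy backend decides per packet by drawing a random number; `make_dropper` and `drop_pattern` are hypothetical names for illustration, not functions from mirage-vnetif or the test suite.

```ocaml
(* Hypothetical helper, not part of mirage-vnetif: decide per packet whether
   to drop it, using a private PRNG state built from a fixed seed. *)
let make_dropper ~seed ~drop_probability =
  let state = Random.State.make [| seed |] in
  fun () -> Random.State.float state 1.0 < drop_probability

(* Record the drop decisions for the first [n] packets. *)
let drop_pattern ~seed ~drop_probability n =
  let drop = make_dropper ~seed ~drop_probability in
  List.init n (fun _ -> drop ())

let () =
  let a = drop_pattern ~seed:42 ~drop_probability:0.01 1000 in
  let b = drop_pattern ~seed:42 ~drop_probability:0.01 1000 in
  (* Same seed, same sequence of drops: a failing run can be replayed. *)
  assert (a = b);
  Printf.printf "dropped %d of %d packets\n"
    (List.length (List.filter Fun.id a)) (List.length a)
```

Using a private `Random.State.t` rather than the global generator keeps the drop pattern independent of any other code that draws random numbers, which matters if the stack itself also uses randomness.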
Thanks @MagnusS, is there a pcap trace of a failed flow, or could you |
@MagnusS Given the low rate of dropped packets (1%), is it reasonable that sending a segment between the two test stacks might take more than 10s? I would have thought TCP should recover before then. |
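A rough back-of-the-envelope check of that intuition, as a sketch under stated assumptions: drops are independent at 1%, recovery happens only through the retransmission timer (no fast retransmit, no lost ACKs), and the timer starts at 1 s and doubles on each expiry. The 1 s starting value is an assumption, not something taken from this stack. Under those assumptions the same segment has to be lost four times in a row before the delay exceeds 10 s (1 + 2 + 4 + 8 = 15 s), which has probability 0.01^4 = 10^-8 per segment.

```ocaml
(* Time at which the segment finally gets through if it is lost [losses]
   times in a row, with the timer starting at [initial_rto] and doubling. *)
let delivery_delay ~initial_rto ~losses =
  let rec go rto elapsed n =
    if n = 0 then elapsed else go (rto *. 2.) (elapsed +. rto) (n - 1)
  in
  go initial_rto 0. losses

(* Probability of [losses] independent consecutive drops. *)
let probability ~drop_rate ~losses = drop_rate ** float_of_int losses

let () =
  List.iter
    (fun losses ->
      Printf.printf "%d losses: delay %.0fs, probability %g\n" losses
        (delivery_delay ~initial_rto:1.0 ~losses)
        (probability ~drop_rate:0.01 ~losses))
    [ 1; 2; 3; 4 ]
```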
did you push your patch somewhere? |
@samoht I've pushed it here with @talex5's tracing version of the test (but with exit and debug disabled): https://github.com/magnuss/mirage-tcpip/tree/trace-iperf Trace and pcap from failing test: (the trace buffer may be too small here though) |
@MagnusS would be useful to put a (also, if Looking at the trace, it looks like the |
Good idea - pushed a patch with labels for xmit-[seq] and pkt_drop now. Another trace and pcap: Console output
|
So, we dropped two packets close together. In both cases, the receiver sent a duplicate ack because it saw that a segment was missing. In the first case, we did a fast retransmission, but in the second we didn't for some reason. Is there some kind of back-off going on here? Note: if you reenable logging, that will show up in the trace too, which might be helpful. |
There is indeed! For tl;dr-ers: this TCP is designed to recover from the usual case on the Long version: Fast rexmit is designed to recover with almost no loss in transmission rate When the '3 duplicate ack signal' (I.e. 3 pure ack pkts that don't advance If multiple pkts in a window are lost then the first loss will cause a That said, there is always the possibility that there is a bug in the logic BTW, one common hack used is to set the number of dup acks received to 2 I'll look thru the code again tonight. Thanks.
|
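For readers following along, the "3 duplicate ack" trigger can be sketched roughly like this. It is a simplified illustration, not the mirage-tcpip implementation: the type and function names are made up, sequence-number wraparound is ignored, and the `threshold` parameter is only there to show where the dup-ack count mentioned above would be tuned.

```ocaml
(* Sketch of duplicate-ACK counting on the sender side. The record and
   function names are illustrative, not taken from mirage-tcpip. *)
type sender = {
  mutable highest_ack : int; (* highest cumulative ACK seen so far *)
  mutable dup_acks : int;    (* consecutive duplicates of that ACK *)
}

type action =
  | Advance                 (* new data was acknowledged *)
  | Ignore                  (* duplicate below the threshold, or stale ACK *)
  | Fast_retransmit of int  (* resend the segment starting at this seq *)

let on_ack ?(threshold = 3) s ack =
  if ack > s.highest_ack then begin
    s.highest_ack <- ack;
    s.dup_acks <- 0;
    Advance
  end
  else if ack = s.highest_ack then begin
    s.dup_acks <- s.dup_acks + 1;
    (* Fire exactly once, when the count reaches the threshold. *)
    if s.dup_acks = threshold then Fast_retransmit ack else Ignore
  end
  else Ignore
```

Because the counter only fires when it reaches the threshold and is reset only by an ACK that advances, a second loss in the same window produces further duplicates of the same ACK without triggering a second fast retransmit in this sketch; that loss then has to wait for the retransmission timer.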
Thanks @balrajsingh! I wonder if the
So, it thinks the "smoothed round-trip time" between the two local stacks is 18s? Could it be including the delayed retransmissions in the srtt calculations? If so, this would explain the excessive 108s retransmission delay. |
The rto is supposed to back off exponentially with each loss and then when |
@balrajsingh right - but |
I'm not sure, I'll look. It is supposed to be RFC 2988, if it doesn't There may be a problem there because SRTT is a blend of current SRTT with a |
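As a reference point for the SRTT discussion, here is a standalone sketch of the estimator described in RFC 6298 (which obsoletes RFC 2988), including Karn's rule that RTT samples from retransmitted segments are discarded. It is not the code in this repository. The 1 s initial value and 60 s cap are choices made for the sketch, times are floats in seconds, and clock granularity is ignored.

```ocaml
(* RTT estimator following RFC 6298. Times are in seconds. This is a
   standalone sketch, not the mirage-tcpip code. *)
type rto_state = {
  srtt : float option; (* None until the first sample arrives *)
  rttvar : float;
  rto : float;
}

let alpha = 1. /. 8.
let beta = 1. /. 4.
let min_rto = 1.0  (* lower bound on the timer *)
let max_rto = 60.0 (* cap chosen for this sketch *)

let initial = { srtt = None; rttvar = 0.; rto = min_rto }

let clamp x = Float.min max_rto (Float.max min_rto x)

(* Feed one RTT sample [r]. Karn's algorithm: a sample taken from a
   retransmitted segment is ambiguous and is discarded. *)
let on_sample st ~retransmitted r =
  if retransmitted then st
  else
    match st.srtt with
    | None ->
      let rttvar = r /. 2. in
      { srtt = Some r; rttvar; rto = clamp (r +. 4. *. rttvar) }
    | Some srtt ->
      (* RTTVAR is updated first, using the old SRTT. *)
      let rttvar = (1. -. beta) *. st.rttvar +. beta *. Float.abs (srtt -. r) in
      let srtt = (1. -. alpha) *. srtt +. alpha *. r in
      { srtt = Some srtt; rttvar; rto = clamp (srtt +. 4. *. rttvar) }

(* Retransmission timer expired: back off exponentially, up to the cap. *)
let on_timeout st = { st with rto = Float.min max_rto (st.rto *. 2.) }
```

In this sketch a multi-second gap caused by a retransmission never reaches `srtt`, so an 18 s smoothed RTT between two in-process stacks would point at samples being taken from retransmitted segments, or at the timer not being driven when expected.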
Here's another trace+pcap of a failing test with more debugging enabled (code is here): http://www.skjegstad/data/040715/iperf_test_timeout_11c3aa1.ctf.gz Here's the console output:
The timer went up to 270 seconds when the test failed - the test timeout is 120 sec. Could the timer not be triggered properly after commit 7cecc9f? I'll try to wrap the other xmit-calls in the timer in async as well to see if that helps. |
Looks like it works now! This call to xmit also had to be in Lwt.async: 7cecc9f#diff-5c19dbcb696d8a908ac70c39feacd849R298 I've run about 30 tests so far and the timer seems to never go above 1-2 seconds. I'll run a few more tests and prepare a PR :-) It would be really nice to have some of this debugging and tracing available by default in the tests! Would it be possible to enable profiling automatically when it's supported? |
Yay! Thanks for spotting this. If you have any ideas for testing that we don't have more regressions (apart from forbidding me to push new code to that repository), that would be great. And yes, turning the profiling on during the test is a good idea! |
@MagnusS you'd have to detect that the |
Fix timer issue from #157 + test improvements
(continuing the discussion in mirage/ocaml-git#117) This has started happening again. My last two builds of CueKeeper on Travis failed with timeouts in the tcpip tests. First failure:
Failure on rebuild:
Did something change recently? |
(continuing the discussion from mirage/ocaml-git#117) @talex5 wrote
Hm... How would we advance the virtual clock? I guess we could base it on how fast we can transfer data between the stacks and adjust the clock to a target throughput, but I'm not sure how well that would work with packet loss and varying throughput. I think we should at least change the timeout to be reset for every new byte received (with normal clock for now) - then it would also be independent of the amount of data transferred. And two minutes without a single byte transferred seems very slow even for Travis :-) |
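A minimal sketch of the "reset the timeout for every new byte" idea, assuming the test can call a hook whenever data arrives. The names are hypothetical, and time is passed in explicitly so the same logic works with the normal clock or a virtual one.

```ocaml
(* Inactivity watchdog: the deadline moves forward whenever data arrives.
   Time is passed in explicitly, so any clock can drive it. Names are
   illustrative. *)
type watchdog = {
  timeout : float;               (* allowed seconds without progress *)
  mutable last_progress : float; (* time at which data last arrived *)
}

let create ~timeout ~now = { timeout; last_progress = now }

(* Call this from the receive path with the number of bytes just read. *)
let on_bytes_received w ~now n = if n > 0 then w.last_progress <- now

(* True once more than [timeout] seconds have passed without any data. *)
let expired w ~now = now -. w.last_progress > w.timeout
```

With a 120 s timeout, a transfer that keeps making progress never trips the watchdog regardless of total size, while a stalled connection is still reported 120 s after its last byte.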
@MagnusS when all callbacks scheduled for the current time have run, the clock advances automatically to the next scheduled event (sleep timeout). At least, that's how the CueKeeper test clock works: We should probably add a (virtual) delay to the virtual network to simulate packet transmission time. This would also allow simulation of networks of varying speeds. Resetting the timeout after each byte might work too, but we also want to know when the stack really is behaving slowly, and if we make it too generous we might miss that. After all, the last time this happened it was a real bug... |
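The "advance to the next scheduled event" behaviour can be sketched as a tiny discrete-event loop. This is an illustration only, not the CueKeeper test clock, and it uses plain callbacks rather than Lwt threads to stay self-contained; `sleep_then` and `run` are made-up names.

```ocaml
(* Minimal virtual clock: events are (time, callback) pairs, and running the
   clock jumps straight to the earliest pending event. Illustrative only;
   this is not the CueKeeper test clock. *)
type clock = {
  mutable now : float;
  mutable pending : (float * (unit -> unit)) list; (* sorted by time *)
}

let create () = { now = 0.; pending = [] }

(* Schedule [f] to run [delay] virtual seconds from now. Events with equal
   times keep their insertion order. *)
let sleep_then c delay f =
  let at = c.now +. delay in
  let rec insert = function
    | ((t, _) as e) :: rest when t <= at -> e :: insert rest
    | rest -> (at, f) :: rest
  in
  c.pending <- insert c.pending

(* Run until no events remain; callbacks may schedule further events. *)
let rec run c =
  match c.pending with
  | [] -> ()
  | (t, f) :: rest ->
    c.now <- t;
    c.pending <- rest;
    f ();
    run c
```

A virtual network delay would then just be a `sleep_then` on the delivery of each packet, and a 120 s test timeout costs no wall-clock time when nothing else is scheduled before it.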
I think we've fixed the specific problems mentioned here with recent updates to |
CHANGES: * Use `Lwt_dllist` instead of `Lwt_sequence`, due to the latter being deprecated upstream in Lwt (ocsigen/lwt#361) (mirage/mirage-tcpip#388 by @avsm). * Remove arpv4 and ethif sublibraries, now provided by ethernet and arp-mirage opam packages (mirage/mirage-tcpip#380 by @hannesm). * Upgrade from jbuilder to dune (mirage/mirage-tcpip#391 @avsm) * Switch from topkg to dune-release (mirage/mirage-tcpip#391 @avsm) ### v3.6.0 (2019-01-04) * The IPv4 implementation now supports reassembly of IPv4 fragments (mirage/mirage-tcpip#375 by @hannesm) - using a LRU cache using up to 256KB memory - out of order fragments are supported - maximum number of fragments is 16 - timeout between first and last fragment is 10s - overlapping fragments are dropped * IPv6: use correct timeout value after first NS message (mirage/mirage-tcpip#334 @djs55) * Use `Ipaddr.pp` instead of `Ipaddr.pp_hum` due to upstream interface changes (mirage/mirage-tcpip#385 @hannesm). ### v3.5.1 (2018-11-16) * socket stack (tcp/udp): catch exception in recv_from and accept (mirage/mirage-tcpip#376 @hannesm) * use mirage-random-test for testing (Stdlibrandom got removed from mirage-random>1.2.0, mirage/mirage-tcpip#377 @hannesm) ### v3.5.0 (2018-09-16) * Ipv4: require Mirage_random.C, used for generating IPv4 identifier instead of using OCaml's stdlib Random directly (mirage/mirage-tcpip#371 @hannesm) * Tcp: use entire 32 bits at random for the initial sequence number, thanks to Spencer Michaels and Jeff Dileo of NCC Group for reporting (mirage/mirage-tcpip#371 @hannesm) * adjust to mirage-protocols 1.4.0 and mirage-stack 1.3.0 changes (mirage/mirage-tcpip#371 @hannesm) Arp no longer contains the type alias ethif Ethif no longer contains the type alias netif Static_ipv4 no longer contains the type alias ethif and prefix Ipv6 no longer contains the type alias ethif and prefix Mirage_protocols_lwt.IPV4 no longer contains the type alias ethif Mirage_protocols_lwt.UDPV4 and TCPV4 no longer contain the 
type alias ip * remove unused types: 'a config, netif, and id from socket and direct stack (mirage/mirage-tcpip#371 @hannesm) * remove usage of Result, depending on OCaml >= 4.03.0 (mirage/mirage-tcpip#372 @hannesm) ### v3.4.2 (2018-06-15) Note the use of the new TCP keep-alive feature can cause excessive amounts of memory to be used in some circumstances, see mirage/mirage-tcpip#367 * Ensure a zero UDP checksum is sent as 0xffff, not 0x0000 (mirage/mirage-tcpip#359 @stedolan) * Avoid leaking a file descriptor in the socket stack if the connection fails (mirage/mirage-tcpip#363 @hannesm) * Avoid raising an exception with `Lwt.fail` when `write` fails in the socket stack (mirage/mirage-tcpip#363 @hannesm) * Ignore `EBADF` errors in `close` in the socket stack (mirage/mirage-tcpip#366 @hannesm) * Emit a warning when TCP keep-alives are used (mirage/mirage-tcpip#368 @djs55) ### v3.4.1 (2018-03-09) * expose tcp_socket_options in the socket stack, fixing downstream builds (mirage/mirage-tcpip#356 @yomimono) * add missing dependencies and constraints (mirage/mirage-tcpip#354 @yomimono, mirage/mirage-tcpip#353 @rgrinberg) * remove leftover ocamlbuild files (mirage/mirage-tcpip#353 @rgrinberg) ### v3.4.0 (2018-02-15) * Add support for TCP keepalives (mirage/mirage-tcpip#338 @djs55) * Fix TCP deadlock (mirage/mirage-tcpip#343 @mfp) * Update the CI to test OCaml 4.04, 4.05, 4.06 (mirage/mirage-tcpip#344 @yomimono) ### v3.3.1 (2017-11-07) * Add an example for user-space `ping`, and some socket ICMPv4 fixes (mirage/mirage-tcpip#336 @djs55) * Make tcpip safe-string-safe (and buildable by default on OCaml 4.06.0) (mirage/mirage-tcpip#341 @djs55) ### v3.3.0 (2017-08-08) * Test with current mirage-www master (mirage/mirage-tcpip#323 @yomimono) * Improve the Tcp.Wire API (mirage/mirage-tcpip#325 @samoht) * Add dependency from stack-unix to io-page-unix (@avsm) * Replace dependency on cstruct.lwt with cstruct-lwt (mirage/mirage-tcpip#322 @yomimono) * Update to lwt 3.0 
(mirage/mirage-tcpip#326 @samoht) * Replace oUnit with alcotest (mirage/mirage-tcpip#329 @samoht) * Fix stub linking on Xen (mirage/mirage-tcpip#332 @djs55) * Add support for ICMP sockets on Windows (mirage/mirage-tcpip#333 @djs55) ### v3.2.0 (2017-06-26) * port to jbuilder. Build time is now roughly 4-5x faster than the old oasis-based build system. * packs have been replaced by module aliases. ### v3.1.4 (2017-06-12) * avoid linking to cstruct.ppx in the compiled library and only use it at build time (mirage/mirage-tcpip#316 @djs55) * use improved packet size support in `mirage-vnetif>=0.4.0` to test the MTU fixes in mirage/mirage-tcpip#313. ### v3.1.3 (2017-05-23) * involve the IP layer's MTU in the TCP MSS calculation (hopefully correctly) (mirage/mirage-tcpip#313, by @yomimono) ### v3.1.2 (2017-05-14) * impose a maximum TCP MSS of 1460 to avoid sending over-large datagrams on 1500 MTU links (mirage/mirage-tcpip#309, by @hannesm) ### v3.1.1 (2017-05-14) * fix parsing 20-byte cstructs as ipv4 packets (mirage/mirage-tcpip#307, by @yomimono) * udp: payload length parse fix (mirage/mirage-tcpip#307, by @yomimono) * support lwt >= 2.7.0 (mirage/mirage-tcpip#308, by @djs55) ### v3.1.0 (2017-03-14) * implement MTU setting and querying in the Ethernet module (compatibility with mirage-protocols version 1.1.0), and use this value to inform TCP's MSS. (mirage/mirage-tcpip#288, by @djs55) * rename the ~payload argument of TCP/UDP marshallers to `~payload_len`, in an attempt to clarify that the payload will not be copied to the Cstruct.t returned by these functions (mirage/mirage-tcpip#301, by @talex5) * functorize ipv6 over a random implementation (mirage/mirage-tcpip#298, by @olleolleolle and @hannesm) * add tests for sending and receiving UDP packets over IPv6 (mirage/mirage-tcpip#300, by @mattgray) * avoid float in TCP RTO calculations. 
(mirage/mirage-tcpip#295, by @olleolleolle and @mattgray) * numerous bugfixes in header marshallers and unmarshallers (mirage/mirage-tcpip#301, by @talex5 and @yomimono) * replace polymorphic equality in `_packet.equals` functions (mirage/mirage-tcpip#302, by @yomimono) ### v3.0.0 (2017-02-23) * adapt to MirageOS 3 API changes (*many* PRs, from @hannesm, @samoht, and @yomimono): - replace error polyvars in many functions with result types - define and use error types - `connect` in various modules now returns the device directly or raises an exception - refer to mirage-protocols and mirage-stacks, rather than mirage-types * if no UDP source port is given to UDP.write, choose a random one (mirage/mirage-tcpip#272, by @hannesm) * remove `Ipv4.Routing.No_route_to_destination_address` exception; treat routing failures as normal packet loss in TCP (mirage/mirage-tcpip#269, by @yomimono) * Ipv6.connect takes a list of IPs (mirage/mirage-tcpip#268, by @yomimono) * remove exception "Refused" in TCP (mirage/mirage-tcpip#267, by @yomimono) * remove DHCP module. Users may be interested in the replacement charrua-core (mirage/mirage-tcpip#260, by @yomimono) * move Ipv4 to Static\_ipv4, which can be used by other IPv4 modules with their own configuration logic (mirage/mirage-tcpip#260, by @yomimono) * remove `mode` from STACKV4 record and configuration; Ipv4.connect now requires address parameters and the module exposes no methods for modifying them. 
(mirage/mirage-tcpip#260, by @yomimono) * remove unused `id` types no longer required by mirage-types (mirage/mirage-tcpip#255, by @yomimono) * overhaul how `random` is used and handled (mirage/mirage-tcpip#254 and others, by @hannesm) * fix redundant `memset` that zeroed out options in Tcp\_packet.Marshal.into\_cstruct (mirage/mirage-tcpip#250, by @balrajsingh) * add vnetif backend for triggering fast retransmit in iperf tests (mirage/mirage-tcpip#248, by @MagnusS) * fixes for incorrect timer values (mirage/mirage-tcpip#247, by @balrajsingh) * add vnetif backend that drops packets with no payload (mirage/mirage-tcpip#246, by @MagnusS) * fix a race when closing test pcap files (mirage/mirage-tcpip#246, by @MagnusS) ### v2.8.1 (2016-09-12) * Set the TCP congestion window correctly when going into fast-recovery mode. (mirage/mirage-tcpip#244, by @balrajsingh) * When TCP packet loss is discovered by timeout, allow transition into fast-recovery mode. (mirage/mirage-tcpip#244, by @balrajsingh) ### v2.8.0 (2016-04-04) * Provide an implementation for the ICMPV4 module type defined in mirage-types 2.8.0. Remove default ICMP handling from the IPv4 module, but preserve it in tcpip-stack-direct. (mirage/mirage-tcpip#195 by @yomimono) * Explicitly require the use of an OCaml compiler >= 4.02.3 . (mirage/mirage-tcpip#195 by @yomimono) * Explicitly depend on `result`. 
(mirage/mirage-tcpip#195 by @yomimono) ### v2.7.0 (2016-03-20) * Raise Invalid\_argument if given an invalid port number in listen_{tcp,udp}v4 (mirage/mirage-tcpip#173 by @matildah and mirage/mirage-tcpip#175 by @hannesm) * Improve TCP options marshalling/unmarshalling (mirage/mirage-tcpip#174 by @yomimono) * Add state tests and fixes for closure conditions (mirage/mirage-tcpip#177 mirage/mirage-tcpip#176 by @yomimono) * Remove bogus warning (mirage/mirage-tcpip#178 by @talex5) * Clean up IPv6 stack (mirage/mirage-tcpip#179 by @nojb) * RST checking from RFC5961 (mirage/mirage-tcpip#182 by @ppolv) * Transform EPIPE exceptions into `Eof (mirage/mirage-tcpip#183 by @djs55) * Improve error strings in IPv4 (mirage/mirage-tcpip#184 by @yomimono) * Replace use of cstruct.syntax with cstruct.ppx (mirage/mirage-tcpip#188 by @djs55) * Make the Unix subpackages optional, so the core builds on Win32 (mirage/mirage-tcpip#191 by @djs55) ### v2.6.1 (2015-09-15) * Add optional arguments for settings in ip v6 and v4 connects (mirage/mirage-tcpip#170, by @Drup) * Expose `Ipv4.Routing.No_route_to_destination_address` (mirage/mirage-tcpip#166, by @yomimono) ### v2.6.0 (2015-07-29) * ARP now handles ARP frames, not Ethernet frames with ARP payload (mirage/mirage-tcpip#164, by @hannesm) * Check length of received ethernet frame to avoid cstruct exceptions (mirage/mirage-tcpip#117, by @hannesm) * Pull arpv4 module out of ipv4. Also add unit-tests for the newly created ARP library (mirage/mirage-tcpip#155, by @yomimono) ### v2.5.1 (2015-07-07) * Fix regression introduced in 2.5.0 where packet loss could lead to the connection to become very slow (mirage/mirage-tcpip#157, MagnusS, @talex5, @yomimono and @balrajsingh) * Improve the tests: more logging, more tracing and compile to native code when available, etc (@MagnusS and @talex5) * Do not raise `Invalid_argument("Lwt.wakeup_result")` everytime a connection is closed. 
Also now pass the raised exceptions to `Lwt.async_exception_hook` instead of ignoring them transparently, so the user can decide to shutdown its application if something wrong happens (mirage/mirage-tcpip#153, mirage/mirage-tcpip#156, @yomomino and @talex5) * The `channel` library now lives in a separate repository and is released separately (mirage/mirage-tcpip#159, @samoht) ### v2.5.0 (2015-06-10) * The test runs now produce `.pcap` files (mirage/mirage-tcpip#141, by @MagnusS) * Strip trailing bytes from network packets (mirage/mirage-tcpip#145, by @talex5) * Add tests for uniform packet loss (mirage/mirage-tcpip#147, by @MagnusS) * fixed bug where in case of out of order packets the ack and window were set incorrectly (mirage/mirage-tcpip#140, mirage/mirage-tcpip#146) * Properly handle RST packets (mirage/mirage-tcpip#107, mirage/mirage-tcpip#148) * Add a `Log` module to control at runtime the debug statements which are displayed (mirage/mirage-tcpip#142) * Writing in a PCB which does not have the right state now returns an error instead of blocking (mirage/mirage-tcpip#150) ### v2.4.3 (2015-05-05) * Fix infinite loop in `Channel.read_line` when the line does not contain a CRLF sequence (mirage/mirage-tcpip#131) ### v2.4.2 (2015-04-29) * Fix a memory leak in `Channel` (mirage/mirage-tcpip#119, by @yomimono) * Add basic unit-test for channels (mirage/mirage-tcpip#119, by @yomimono) * Add alcotest testing templates * Modernize Travis CI scripts ### v2.4.1 (2015-04-21) * Merge between 2.4.0 and 2.3.1 ### v2.4.0 (2015-03-24) * ARP improvements (mirage/mirage-tcpip#118) ### v2.3.1 (2015-03-31) * Do not raise an assertion if an IP frame has extra trailing bytes (mirage/mirage-tcpip#221). ### v2.3.0 (2015-03-09) * Fix `STACKV4` for the `DEVICE` signature which has `connect` removed (in Mirage types 2.3+). 
### v2.2.3 (2015-03-09) * Add ICMPv6 error reporting functions (mirage/mirage-tcpip#101) * Add universal IP address converters (mirage/mirage-tcpip#108) * Add `error_message` functions for human-readable errors (mirage/mirage-tcpip#98) * Improve debug logging for ICMP Destination Unreachable packets. * Filter incoming frames by MAC address to stop sending unnecessary RSTs. (mirage/mirage-tcpip#114) * Unhook unused modules `Sliding_window` and `Profiler` from the build. (mirage/mirage-tcpip#112) * Add an explicit `connect` method to the signatures. (mirage/mirage-tcpip#100) ### v2.2.2 (2015-01-11) * Readded tracing and ARP fixes which got accidentally reverted in the IPv6 merge. (mirage/mirage-tcpip#96) ### v2.2.1 (2014-12-20) * Use `Bytes` instead of `String` to begin the `-safe-string` migration in OCaml 4.02.0 (mirage/mirage-tcpip#93). * Remove dependency on `uint` to avoid the need for a C stub (mirage/mirage-tcpip#92). ### v2.2.0 (2014-12-18) Add IPv6 support. This changeset minimises interface changes to the existing `STACKV4` interfaces to faciliate a progressive merge. The only visible interface changes are: * `IPV4.set_ipv4_*` functions have been renamed `IPV4.set_ip_*` because they are shared between IPV4 and IPV6. * `IPV4.get_ipv4` and `get_ipv4_netmask` now return a `list` of `Ipaddr.V4.t` (again because this is the common semantics with IPV6.) * Several types that had `v4` in their names (like `IPV4.ipv4addr`) have lost that particle. ### v2.1.1 (2014-12-12) * Improve console printing for the DHCP client to output line breaks properly on Xen consoles. ### v2.1.0 (2014-12-07) * Build Xen stubs separately, with `CFLAGS` from `mirage-xen` 2.1.0+. This allows us to use the red zone under x86_64 Unix again. * Adding tracing labels and counters, which introduces a new dependency on the `mirage-profile` package. ### v2.0.3 (2014-12-05) * Fixed race waiting for ARP response (mirage/mirage-tcpip#86). 
* Move the the code that configures IPv4 address, netmask and gateways after receiving a successful lease out of the `Dhcp_clientv4` module and into `Stackv4` (mirage/mirage-tcpip#87) ### v2.0.2 (2014-12-01) * Add IPv4 multicast to MAC address mapping in IPv4 output processing (mirage/mirage-tcpip#81 from Luke Dunstan). * Improve formatting of DHCP console logging, including printing out options (mirage/mirage-tcpip#83). * Build with -mno-red-zone on x86_64 to avoid stack corruption on Xen (mirage/mirage-tcpip#80). ### v2.0.1 (2014-11-04) * Fixed race condition in the signalling between the rx/tx threads under load. * Experimentally switch to immediate ACKs in TCPv4 by default instead of delayed ones. ### v2.0.0 (2014-11-02) * Moved 1s complement checksum C code here from mirage-platform. * Depend on `Console_unix` and `Console_xen` instead of `Console`. * [socket] Do not return an `Eof` when writing 0-length buffer (mirage/mirage-tcpip#76). * [socket] Accept callbacks now run in async threads instead of being serialised (mirage/mirage-tcpip#75). ### v1.1.6 (2014-07-20) * Quieten down the stack logging rate by not announcing IPv6 packet discards. * Raise exception `Bad_option` for unparseable or invalid TCPv4 options (mirage/mirage-tcpip#57). * Fix linking error with module `Tcp_checksum` by lifting it into top library (mirage/mirage-tcpip#60). * Add `opam` file to permit easier local pinning, and fix Travis to use this. ### v1.1.5 (2014-06-18) * Ensure that DHCP completes before the application is started, so that unikernels that establish outgoing connections can do so without a race. (fix from Mindy Preston in mirage/mirage-tcpip#53, followup in mirage/mirage-tcpip#55) * Add `echo`, `chargen` and `discard` services into the `examples/` directory. (from Mindy Preston in mirage/mirage-tcpip#52). ### v1.1.4 (2014-06-03) * [tcp] Fully process the last `ACK` in a 3-way handshake for server connections. 
This ensures that a `FIN` is correctly transmitted upon application-initiated connection close. (fix from Mindy Preston in mirage/mirage-tcpip#51). ### v1.1.3 (2014-03-01) * Expose IPV4 through the STACKV4 interface. ### v1.1.2 (2014-03-27) * Fix DHCP variable length option parsing for MTU responses, which in turns improves robustness on Amazon EC2 (fix from @yomimono via mirage/mirage-tcpip#48) ### v1.1.1 (2014-02-21) * Catch and ignore top-level socket exceptions (mirage/mirage-tcpip#219). * Set `SO_REUSEADDR` on listening sockets for Unix (mirage/mirage-tcpip#218). * Adapt the Stack interfaces to the v1.1.1 mirage-types interface (see mirage/mirage#226 for details). ### v1.1.0 (2014-02-03) * Rewrite of the library as a set of functors that parameterize the stack across the `V1_LWT` module types from Mirage 1.1.x. This removes the need to compile separate Xen and Unix versions of the stack. ### v0.9.5 (2013-12-08) * Build for either Xen or Unix, depending on the value of the `OS` envvar. * Shift to the `mirage-types` 0.5.0+ interfaces, which breaks the socket backend (temporarily). * Port the direct stack to the new interfaces. * Add Travis CI scripts. ### v0.9.4 (2013-08-09) * Use the `Ipaddr` external library and remove the Homebrew equivalents in `Nettypes`. ### v0.9.3 (2013-07-18) * Changes in module Manager: Removed some functions from the `.mli (plug/unplug) and added some modifications in the way the Manager interacts with the underlying module Netif. The Netif.create function does not take a callback anymore. ### v0.9.2 (2013-07-09) * Improve TCP state machine for connection teardown. * Limit fragment number to 8, and coalesce buffers if it goes higher. * Adapt to mirage-platform-0.9.2 API changes. ### v0.9.1 (2013-06-12) * Depend on mirage-platform-0.9.1 direct tuntap interfaces. * Version bump to catch up with mirage-platform. ### v0.5.2 (2013-02-08) * Encourage scatter-gather I/O all the time, rather than playing tricks with packet header buffers. 
This simplifies the output path considerably and cuts minor heap allocations down. * Install the packed `cmx` along with the `cmxa` to ensure that the compiler can do cross-module optimization (this is not a fatal error, but will impact performance if the `cmx` file is not present). ### v0.5.1 (2012-12-20) * Update socket stack to use Cstruct 0.6.0 API ### v0.5.0 (2012-12-20) * Update Cstruct API to 0.6.0 * [tcp] write now blocks if the write buffer and write window are full ### v0.4.1 (2012-12-14) * Add iperf self-test that creates two VIFs and transmits across them. This is a useful local test which stresses the bridge code using just one VM. * Add support for attaching existing devices when initialising the network manager, via an optional `attached` parameter. * Constrain TCP connect to be a `unit Lwt.t` instead of a polymorphic return value. * Expose IPv4 netmask function. * Reduce ARP verbosity to the console. * Fix TCP fast recovery to wait until all in-flight packets are acked, rather then exiting early. ### v0.4.0 (2012-12-11) * Require OCaml-4.00.0 or higher, and add relevant build fixes to deal with module packing. ### v0.3.1 (2012-12-10) * Fix the DHCP client marshalling for IPv4 addresses. * Expose the interface MAC address in the Manager signature. * Tweak TCP ISN calculation to be more friendly on a 32-bit host. * Add Manager.create ?devs to control the number of Netif devices constructed by default. * Add Ethif.set/disable_promiscuous to permit directly tapping a network interface. ### v0.3.0 (2012-09-04) * Initial public release.
A few weeks ago I was able to run the iperf test w/uniform packet loss 150 times locally without timeout (as mentioned here) with the master branch. I just repeated the experiment with current master (f31810c) and after a few attempts I've been unable to run more than max 8 tests in a row. Release 2.5.0 (455263d) also times out frequently.
With rev aab5709 the test runs fine. To rule out a bug in the newest version of the test I've also tried using the tests from master with aab5709 and got the same result (no timeouts). I also tried to double the timeout, in case it was caused by the reduced performance of the recently merged debug branch, but the test still times out - the connection seems to just stall when the test fails (see pcap output below).
This is the command I use - it runs 100 iperf tests and terminates if one of them fails.
I've also increased the data size from 10mb back to 25mb in lib_tests/lib_iperf.ml, as this was set lower to reduce timeouts in Travis. With 10mb the test is less likely to time out, but it is still unreliable (I was able to run 20 tests in a row with 10mb vs 8 with 25mb).
Here are the last packets in the pcap output from three failed tests.
This test sent 92 dup acks before stalling: