I honestly think the theory of the NIC messing up one of the offloaded tasks is a likely one, if I remember other strange errors that have happened in this context. Too bad that he doesn't have access to the machine to run more thorough tests.
My money's also on a NIC checksum/segmentation offloading bug. Even though it apparently isn't controllable via the exposed parameters, the card does have these features. "TCP/UDP checksum and segmentation offload (IPv4 and IPv6)" on the spec sheet https://www-ssl.intel.com/content/dam/doc/product-brief/8257...
In the past, to rule out NIC problems, I have temporarily swapped in a "dumb NIC" such as a 3C905 or an NE2000-compatible card. They're slow because they rely on the host CPU for much of the processing, but they're a good fallback when you want something known to work reliably.
Maybe I just haven't hit the right combination of great NIC and questionable CPU, but I've seen segmentation offloading result in weird/bad things on the wire, and when I turn it off I don't see any difference in CPU usage, so I don't see the point of ever having it on. Specifically for transmit offloading, on dual Xeon 2690s doing HTTPS serving via 2x 10 Gbps NICs.
Additionally, I've seen the effects of weird bugs with receive offloading on Linux routers: if packets get combined before forwarding, they can be dropped because of the MTU on the next-hop interface, so the "performance enhancement" actually decreases performance. And since it's a middlebox, I can't do anything except hope my packets don't arrive close enough in time to get aggregated.
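The drop scenario described above comes down to simple arithmetic. A rough sketch (a simplified model with assumed header sizes, not actual kernel GRO logic): two MSS-sized segments that each fit the egress MTU no longer fit once receive offload coalesces them into one packet.

```python
# Simplified model of why receive-side coalescing can hurt a router:
# two segments merged by offload exceed the next hop's MTU.
# Header sizes are assumptions (20-byte IP + 32-byte TCP with options).

ETH_MTU = 1500          # typical next-hop Ethernet MTU
IP_TCP_HEADERS = 52

def fits_next_hop(payload_len: int, mtu: int = ETH_MTU) -> bool:
    """True if an IP packet carrying payload_len bytes fits the egress MTU."""
    return IP_TCP_HEADERS + payload_len <= mtu

single_segment = 1448                 # one MSS-sized payload
merged = 2 * single_segment           # two segments coalesced on receive

print(fits_next_hop(single_segment))  # True: each fits individually
print(fits_next_hop(merged))          # False: merged packet exceeds the MTU
```

If the router can't re-segment on transmit, that oversized packet has nowhere to go.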
should this not be "Windows corrupting UDP Datagrams in some (possibly lower chance) cases"? The list of circumstances for the bug to occur (from the article):
1. UDP protocol. (Duh!)
2. Multicast sends. (Does not happen with unicast UDP.)
3. A process on the same machine must be joined to the same multicast group as being sent.
4. Windows' IP MTU size set smaller than the default 1500 (I tested with 1300).
5. Sending datagrams large enough to require fragmentation by the reduced MTU, but still small enough not to require fragmentation with a 1500-byte MTU.
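Conditions 4 and 5 define a specific size window, which can be computed directly. A sketch with hypothetical helper names (assuming a 20-byte IPv4 header and 8-byte UDP header, and no IP options):

```python
# Datagram-size window from conditions 4 and 5: requires fragmentation at
# the reduced MTU, but not at the default 1500. Helper names are mine.

IP_HEADER = 20
UDP_HEADER = 8

def max_unfragmented_payload(mtu: int) -> int:
    """Largest UDP payload that fits in a single IP packet at this MTU."""
    return mtu - IP_HEADER - UDP_HEADER

def in_bug_window(payload: int, reduced_mtu: int = 1300) -> bool:
    """Fragments at the reduced MTU, but not at the default 1500."""
    return (max_unfragmented_payload(reduced_mtu)
            < payload
            <= max_unfragmented_payload(1500))

print(max_unfragmented_payload(1300))  # 1272
print(max_unfragmented_payload(1500))  # 1472
print(in_bug_window(1400))             # True: in the window
print(in_bug_window(1200))             # False: no fragmentation either way
```

So with a 1300-byte MTU, payloads between 1273 and 1472 bytes would be the ones at risk.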
> should this not be "Windows corrupting UDP Datagrams in some (possibly lower chance) cases"?
I mean, I guess, but I kind of assumed as much, since if Windows were corrupting UDP at any significant frequency, surely it would have been noticed by now.
I don't know why you set the MTU to a lower value... What I do know is that I've set it to 65535; my machine now transmits 64 KB instead of just 1.5 KB (per packet)... Can you imagine what would happen if I set this to UInt32.MaxValue...?
It's generally a terrible idea to increase the UDP MTU. There's no reliable reassembly for fragmented UDP: if you miss a single fragment (out of, say, 1000), the whole datagram is simply discarded silently.
In fact, some versions of Linux didn't do UDP reassembly properly until about a year ago (it was a topic here on HN). Reordered fragments weren't dealt with properly, if I recall.
There are routers that drop UDP for any reason or no reason. And so on. So UDP is the red-headed stepchild of protocols, with little testing going on and lots of issues.
I recommend putting a protocol on top of any UDP transfer you code, and never increasing the MTU.
The UDP packet, and what goes out on the wire, are different things. IP-over-Ethernet for instance does not send out data in units larger than 1500 bytes; often smaller over WiFi. So UDP packets larger than 1500 bytes will be divided up, and reassembled on the receive end.
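To put numbers on that: a sketch of the fragmentation arithmetic (assuming a 20-byte IPv4 header with no options, so each Ethernet-sized fragment carries 1480 bytes of IP payload):

```python
# How many on-wire fragments a large UDP datagram becomes over Ethernet.
# Assumes a 20-byte IPv4 header; 1480 is conveniently a multiple of 8,
# as non-final fragment payloads must be.

import math

MTU = 1500
IP_HEADER = 20
FRAG_PAYLOAD = MTU - IP_HEADER           # 1480 bytes of IP payload per fragment
MAX_UDP_PAYLOAD = 65535 - IP_HEADER - 8  # 65507, the IPv4 UDP limit

def fragment_count(udp_payload: int) -> int:
    """Number of IP fragments for one UDP datagram sent over Ethernet."""
    ip_payload = 8 + udp_payload         # UDP header rides in the first fragment
    return math.ceil(ip_payload / FRAG_PAYLOAD)

print(fragment_count(MAX_UDP_PAYLOAD))   # 45: lose any one, lose them all
print(fragment_count(1400))              # 1: fits in a single frame
```

Which is exactly why a 65535-byte "MTU" on the sending side doesn't make fragmentation go away; it just hides it below the socket.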