Bypassing the kernel entirely is pretty normal in HPC applications. InfiniBand implementations typically memory-map the device's registers into user space so that applications can send and receive messages without a system call.
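For anyone who hasn't seen the pattern, here is a rough sketch of the idea. It is not real InfiniBand code: a page-sized temporary file stands in for the device's register window, and `DOORBELL_OFFSET`, `ring_doorbell`, and `open_fake_device` are made-up names for illustration. The point is only that once the mapping exists, "talking to the device" is an ordinary memory store rather than a `read()`/`write()` call per message.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = 4096   # one page, standing in for a device register window
DOORBELL_OFFSET = 0  # hypothetical register offset, not from any real NIC


def open_fake_device(path):
    """Create a page-sized file that stands in for a device's registers."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, REGION_SIZE)
    return fd


def ring_doorbell(regs, value):
    # A plain store into the mapping: no per-message system call.
    struct.pack_into("<I", regs, DOORBELL_OFFSET, value)


def read_doorbell(regs):
    # A plain load from the mapping.
    return struct.unpack_from("<I", regs, DOORBELL_OFFSET)[0]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        fd = open_fake_device(os.path.join(tmp, "fake_nic_regs"))
        try:
            # Setup is where the system calls happen (open, ftruncate, mmap).
            regs = mmap.mmap(fd, REGION_SIZE)
            try:
                ring_doorbell(regs, 42)
                return read_doorbell(regs)
            finally:
                regs.close()
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

With a real device the mapping would come from the driver rather than a temp file, but the shape is the same: pay for the kernel once at setup, then stay in user space on the fast path.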
The issue GP talks about comes from the cost of context-switching on a syscall (going into "kernel mode", performing the call, then going back into "application mode"). There's no context switch in a unikernel.
Are you assuming SR-IOV passthrough (which has its own performance profile)? Because normal virt -definitely- hits a context switch, if not two, when traffic goes from the unikernel's virtual NIC to the real NIC.
the OS isn't the bottleneck.
Curious, then: why are we seeing articles here all the time on bypassing the Linux kernel for low-latency networking?