Hey YC, I helped Cohere scale GPUs to a 10k+ total GPU count for hybrid training and inference. If you're curious about anything I wrote in the article, leave a comment here and I'll read and reply. This article was made for fun.
I had the opportunity to help Cohere work on scaling transformers in the last year. If you have any questions about the article, leave them below and I'll do my best to answer openly. :)
I'm the co-author of the GPU Virtual Machine (GVM) project and LibVF.IO. We just announced our enterprise product based on GVM, called GVM Server. I'd love to hear what you all think of the work we've done, and to get suggestions on where we can improve in the future!
I've been thinking this over for the past couple of days, and I think the words 'that they are aware' are really key here.
Ideally, if GPU virtualization were as widespread as support today for Intel VT-x and AMD-V (hardware-assisted CPU virtualization) and their IOMMU counterparts (VT-d and AMD-Vi), then software could make use of these functions without the user being aware of it. We're in a situation similar to that of CPU virtualization without hardware assistance in the days of the early Xenoservers project at Cambridge (what would later become the Xen hypervisor and the XenSource company). At that time there was no widespread support for virtualization assistance on most CPUs, so Xen used methods like ring de-privileging, placing the entire guest (userspace and kernel) in ring 3 while the hypervisor ran in ring 0, in order to virtualize any ordinary CPU model; my understanding is that these were known as PV guests (paravirtual guests).

Over time, however, CPU companies introduced support for features like VT-x and AMD-V across all of their CPU models, which enabled VM exits and context save/restore using shadow register state rather than ring de-privileging, while Intel added further 'virtualization enhancements' through feature suites like vPro (SGX2, for example) that were only available on certain models of CPU (Xeon devices, for example). Xen adopted VT-x and AMD-V as HVM guests (hardware-assisted virtualization) as they became more common on ubiquitous hardware, and at the same time commercial forks of Xen took advantage of those vPro features (like SGX2) for enterprise and high-security government use-cases.
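To make the hardware-assistance point concrete, here's a minimal sketch of my own (assuming Linux and the usual /proc/cpuinfo layout) that checks whether the host CPU advertises VT-x or AMD-V:

    # Sketch: detect hardware-assisted CPU virtualization on Linux by
    # reading the feature flags the kernel exposes in /proc/cpuinfo.
    def hw_virt_flags(path="/proc/cpuinfo"):
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    return {"vmx": "vmx" in flags,   # Intel VT-x
                            "svm": "svm" in flags}   # AMD-V
        return {}

    print(hw_virt_flags())  # e.g. {'vmx': True, 'svm': False}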
Since it's now practical to virtualize any GPU device (as was the case in the past with early Xen, which virtualized CPUs for various use-cases regardless of whether or not the hardware provided assistance mechanisms), it may be time to move to a new 'enterprise' vs. 'consumer' paradigm: new 'virtualization enhancements' (similar to vPro on Intel's Xeons, etc.) are developed for enterprise GPUs (for example, shadow page deduplication in VRAM, import/export of redundant objects between IO virtual address buffers, and IOMMU-protected balloon/deballoon), while basic hardware assistance mechanisms like SR-IOV and SIOV are enabled by default, across the board.
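For reference, SR-IOV is already exposed through a standard sysfs interface on Linux. Here's a hedged sketch of enabling virtual functions on a capable device (the PCI address below is a placeholder, and this needs root plus a driver with SR-IOV support):

    # Sketch: enable N virtual functions on an SR-IOV capable PCI device
    # through the standard Linux sysfs interface.
    from pathlib import Path

    def set_sriov_numvfs(bdf, numvfs):
        dev = Path("/sys/bus/pci/devices") / bdf
        total = int((dev / "sriov_totalvfs").read_text())
        if numvfs > total:
            raise ValueError(f"device supports at most {total} VFs")
        # The kernel rejects changing a nonzero VF count directly;
        # write 0 first, then the desired count.
        (dev / "sriov_numvfs").write_text("0")
        (dev / "sriov_numvfs").write_text(str(numvfs))

    # set_sriov_numvfs("0000:03:00.0", 4)  # placeholder PCI address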
That depends on your use-case. In general I would recommend Nvidia's GPUs for the best price/performance and GVM support. Intel's Xe architecture is improving, but the performance isn't quite there for a number of use-cases; some appear to work quite well, though, and I expect that to improve with time. The 2080 Ti works well with current software. If you are a developer and would like to help us improve device support, you can purchase a 3090 Ti (support in GVM for this device is under active development).
Love this project. I nerded out on GPU passthrough several years ago and learned a lot. GPU sharing would be a killer feature for some of my workflows.
After being a long-time Nvidia customer, I've recently switched to AMD GPUs for my Linux systems and have been very impressed with the amdgpu driver quality relative to Nvidia's proprietary driver. Do you have any thoughts on what the LibVF/GVM future looks like for AMD devices?
It is my understanding that both NVIDIA and AMD try to lock their vGPU functionality behind enterprise paid features and hardware. It is a big shame, really. If people could run vGPUs on Linux at near-native speed, the Windows-only market would shrink much more quickly.
Yeah, that's accurate. The precise driver implementation matters a lot. Having said that, there are some good 'best practices' that seem to make a difference. In my opinion, 'IOMMU Aware Mediated Device' could also bring some much-needed improvements here, since it would allow for more granular IOMMU allocations; perhaps this mode could help further support the 'App VMs' use-case using shared work queues without breaking IO virtual address translation. For the curious, there's a rough sketch of the mediated-device interface below.
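Mediated devices (mdev, the VFIO mechanism this kind of GPU sharing builds on) are created through sysfs. A rough sketch, with a placeholder parent device and type name standing in for whatever your vendor driver actually advertises:

    # Sketch: instantiate a mediated device (mdev) of a vendor-advertised
    # type; the vendor driver lists types under mdev_supported_types.
    import uuid
    from pathlib import Path

    def create_mdev(parent, mdev_type):
        types = Path("/sys/class/mdev_bus") / parent / "mdev_supported_types"
        dev_uuid = str(uuid.uuid4())
        # Writing a UUID to the type's 'create' node asks the vendor
        # driver to create a new mediated device instance.
        (types / mdev_type / "create").write_text(dev_uuid)
        return dev_uuid

    # create_mdev("0000:03:00.0", "nvidia-63")  # placeholder parent/type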