I read about this system in a white paper somewhere, but I can't remember where or any of the details, and I haven't been able to find it since.
The comment was about GPU virtualisation with Type 1 (bare metal) hypervisors. It noted that some system had taken a different path from native vGPU hardware: instead of needing a card that could itself provide vGPU or shared GPU, one installed an OS already able to support multiple desktop users (it suggested Windows Server 2016) as one VM, and then, via some kind of mediating driver or shim between the hypervisor and that server VM, the other VMs could redirect just their GPU calls to it, accelerating their GPU needs.
The difference is that in a normal shared-user context (Windows Server/RDS), the multi-user OS hosts and manages the users' sessions and processes. As described, in this design the Windows Server VM handles only two things: requests initiated by the hypervisor to set up effectively "null" user accounts as required (to leverage the multiuser graphical sharing inherent in Windows Server), and GPU calls relayed from the hypervisor on behalf of VMs, presented as graphics calls from those users. The results are presumably forwarded back to the originating VMs via the hypervisor, or perhaps even delivered to the VM operator via RDS - I'm not sure; that part wasn't described.
(None of the Windows Server VM's other capabilities are used at all. So its role sounds more like a GPU-sharing appliance that "converts" a single-user GPU into a multiuser GPU, via Windows' native multiuser GPU sharing; it isn't being used as an actual "server OS".)
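To illustrate what I mean by "redirect just their GPU calls", here's a toy sketch of the pattern as I understood it. Everything in it is invented for illustration (the Command struct, a queue standing in for a hypervisor channel); it's my guess at the shape of the design, not anything from the paper:

    // Toy sketch of API remoting: a guest-side stub serialises graphics
    // calls and a server-VM-side service replays them. Everything here
    // (the Command struct, the queue standing in for a hypervisor channel)
    // is invented for illustration - it is not any real product's protocol.
    #include <cstdint>
    #include <iostream>
    #include <queue>
    #include <string>
    #include <vector>

    // A serialised graphics call as it might cross the hypervisor channel.
    struct Command {
        uint32_t sessionId;            // which "null" user session on the server VM
        std::string opcode;            // e.g. "CreateBuffer", "Draw", "Present"
        std::vector<uint8_t> payload;  // serialised call arguments
    };

    // Stand-in for the shared ring buffer / VMBus-style channel.
    std::queue<Command> channel;

    // Guest side: what the mediating driver in each VM would do -
    // intercept the API call and forward it instead of touching hardware.
    void guestDraw(uint32_t session, uint32_t vertexCount) {
        channel.push(Command{session, "Draw",
                             {static_cast<uint8_t>(vertexCount & 0xFF),
                              static_cast<uint8_t>((vertexCount >> 8) & 0xFF)}});
    }

    // Server-VM side: drain the channel and execute each call against the
    // physical GPU under the matching user session (here just printed).
    // Results/framebuffers would be forwarded back to the originating VM.
    void serverServiceLoop() {
        while (!channel.empty()) {
            const Command cmd = channel.front();
            channel.pop();
            std::cout << "session " << cmd.sessionId << ": executing "
                      << cmd.opcode << " (" << cmd.payload.size()
                      << "-byte payload) on the shared GPU\n";
        }
    }

    int main() {
        guestDraw(1, 3);  // VM 1 draws a triangle
        guestDraw(2, 6);  // VM 2 draws two triangles
        serverServiceLoop();
    }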
The overall benefit was to provide what sounds like a "poor man's vGPU" leveraging existing GPU sharing mechanisms baked into Windows Server - a system that could take almost any decent consumer graphics card supported by Windows Server, and share/virtualise it among other VMs, without needing a specialist high-end card that had sGPU/vGPU built-in.
It also sounded like an approach less susceptible to prohibitive licensing/HW costs (AMD/Nvidia vGPU), EOL'd drivers (Nvidia GRID K1/K2), CPU core counts limited by the inclusion of GPU cores (Iris Pro), or a narrowed range of graphics APIs - and considerably more future-proof. So it sounded ideal for a small home VM server/home lab.
Does anyone know what this might refer to, or of a similar system? I think I came across it while looking into Xen/Citrix, but I can't find a specific reference there either.
Talk to the vendors (Microsoft, Citrix, your hypervisor vendor, the GPU vendors) to track down this elusive document and see what virtualized GPU architecture really looks like. Do your research, and if vGPU is too expensive for your test lab right now, wait until you have a use case and a budget.
I mention all of those vendors because vGPU is highly likely to be expensive and complicated.
And now a brief survey of multi-user graphics on Windows Server, from least to most use of graphics hardware.
Windows Advanced Rasterization Platform (WARP) is a software rasterizer that services Direct3D without any GPU at all; it's the fallback renderer when no hardware is present.
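For the curious, opting into WARP is just a flag at device-creation time. A minimal sketch (Windows-only; compile against the Windows SDK and link d3d11.lib):

    // Minimal sketch: create a Direct3D 11 device on WARP instead of a GPU.
    #include <d3d11.h>
    #include <cstdio>
    #pragma comment(lib, "d3d11.lib")

    int main() {
        ID3D11Device* device = nullptr;
        ID3D11DeviceContext* context = nullptr;
        D3D_FEATURE_LEVEL level{};

        // D3D_DRIVER_TYPE_WARP requests the software rasterizer explicitly;
        // the same call with D3D_DRIVER_TYPE_HARDWARE would need a real GPU.
        HRESULT hr = D3D11CreateDevice(
            nullptr, D3D_DRIVER_TYPE_WARP, nullptr, 0,
            nullptr, 0,  // default feature levels
            D3D11_SDK_VERSION, &device, &level, &context);

        if (SUCCEEDED(hr)) {
            std::printf("WARP device created, feature level 0x%x\n", level);
            context->Release();
            device->Release();
        } else {
            std::printf("D3D11CreateDevice failed: 0x%08lx\n",
                        static_cast<unsigned long>(hr));
        }
    }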
In the paravirtualization category there were Microsoft RemoteFX vGPU and VMware Virtual Shared Graphics Acceleration (vSGA). I say "were" because neither is being developed further; the vendors evidently got tired of maintaining an API shim.
Supposedly, paravirtualization on Windows will be branded GPU-PV, and Windows will expose partitioning as GPU-P. I can't find much documentation on either at the moment.
Graphics card vendors have their own sharing options, if you get a supported GPU and its drivers. Check the hypervisor-specific HCLs; XenServer, for example, is clear that vGPU works only on certain Nvidia Tesla models. In some cases there are separate per-user license fees for the technology.
And then finally there's Discrete Device Assignment (DDA): dedicating the physical hardware to a single VM. Expensive, and it vastly complicates security, HA, and live migration.
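A practical way to see which of these a VM actually ended up with is to enumerate the adapters it can see. Under DDA the passed-through card shows up by name; a VM with no GPU typically shows only "Microsoft Basic Render Driver", which is WARP. A quick sketch (Windows-only; link dxgi.lib):

    // Sketch: list the graphics adapters a VM can see via DXGI.
    // Under DDA the passed-through card appears by name; with no GPU you
    // typically see only "Microsoft Basic Render Driver" (the WARP adapter).
    #include <dxgi.h>
    #include <cstdio>
    #pragma comment(lib, "dxgi.lib")

    int main() {
        IDXGIFactory1* factory = nullptr;
        if (FAILED(CreateDXGIFactory1(__uuidof(IDXGIFactory1),
                                      reinterpret_cast<void**>(&factory))))
            return 1;

        IDXGIAdapter1* adapter = nullptr;
        for (UINT i = 0;
             factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
            DXGI_ADAPTER_DESC1 desc{};
            adapter->GetDesc1(&desc);
            std::wprintf(L"adapter %u: %ls (%llu MB dedicated VRAM)\n", i,
                         desc.Description,
                         static_cast<unsigned long long>(
                             desc.DedicatedVideoMemory / (1024 * 1024)));
            adapter->Release();
        }
        factory->Release();
    }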