---
features:
  - |
    The libvirt driver now supports booting instances that request virtual
    GPUs (vGPUs).
    To use this feature, operators must list the enabled vGPU types in the
    nova-compute configuration file with the
    ``[devices]/enabled_vgpu_types`` option. Instances can only use vGPU
    types that are enabled.
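    
    As a minimal sketch, the option could be set like this in the
    nova-compute configuration file. The type name ``nvidia-35`` is only an
    assumed example; use a type that your physical GPU actually reports::

      [devices]
      # Hypothetical type name, replace it with one supported by your GPU.
      enabled_vgpu_types = nvidia-35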
    
    To find out which vGPU types the physical GPU driver exposes to libvirt,
    operators can inspect sysfs::

      ls /sys/class/mdev_bus/<device>/mdev_supported_types
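    
    Each entry under that directory is itself a directory describing one
    type. As a hedged example based on the kernel's mediated device sysfs
    layout (``<type>`` is a placeholder for one of the listed entries, and
    the ``name`` attribute is only present if the vendor driver provides
    it), the human-readable name of a type and the number of devices of that
    type that can still be created can be read with::

      cat /sys/class/mdev_bus/<device>/mdev_supported_types/<type>/name
      cat /sys/class/mdev_bus/<device>/mdev_supported_types/<type>/available_instances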
    
    Operators can request a VGPU resource in a flavor by adding it to the
    flavor's extra specs::

      nova flavor-key <flavor-id> set resources:VGPU=1
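    
    If you use the unified ``openstack`` client instead of the ``nova``
    CLI, the equivalent command should be the following (``<flavor-id>`` is
    a placeholder, and the exact syntax may depend on your client
    version)::

      openstack flavor set --property resources:VGPU=1 <flavor-id>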
    
    That said, using vGPUs with Nova currently comes with some caveats.
    
    * For the moment, only a single vGPU type is supported per compute node,
      which means that libvirt creates vGPUs of that type only. Two compute
      nodes can be configured with different types, but there is not yet a
      way to specify in the flavor which type an instance should use.
    
    * Suspending a guest that has vGPUs doesn't work yet because of a libvirt
      limitation (it can't hot-unplug mediated devices from a guest). Until
      libvirt supports that, we recommend working around it with other
      instance actions, such as snapshotting or shelving the instance. If a
      user asks to suspend the instance, Nova gets an exception and sets the
      instance state back to ``ACTIVE``, and the suspend action is reported
      as ``Error`` in the ``os-instance-actions`` API.
    
    * Resizing an instance to a new flavor that has vGPU resources doesn't
      allocate those vGPUs to the instance (the resized instance is created
      without vGPU resources). We suggest working around this problem by
      rebuilding the instance once it has been resized, so that the vGPUs
      are allocated to it.
    
    * Migrating an instance to another host has the same problem as resize.
      If you migrate an instance, make sure to rebuild it afterwards.
    
    * When an instance that has vGPUs is rescued, the rescue image doesn't
      use the existing vGPUs. After unrescue, the instance again uses the
      vGPUs that were allocated to it. That said, because Nova looks at all
      the allocated vGPUs when trying to find unallocated ones, there could
      be a race condition if an instance is rescued at the moment a new
      instance asking for vGPUs is created, because both instances could end
      up using the same vGPUs. If you want to rescue an instance, make sure
      to disable the host until we fix that in Nova.
    
    * Mediated devices created by the libvirt driver are not persisted across
      a reboot. Consequently, guest startup would fail because the virtual
      device would no longer exist. To prevent that, when the compute
      service restarts, the libvirt driver now looks at all the guest XMLs
      to check whether they reference mediated devices, and if a mediated
      device no longer exists, Nova recreates it with the same UUID.
    
    * If you use NVIDIA GRID cards, be aware that a limitation in the NVIDIA
      driver prevents a guest from having more than one virtual GPU from the
      same physical card. A guest can have two or more virtual GPUs, but
      each vGPU must then be hosted by a separate physical card. Until that
      limitation is removed, please avoid creating flavors that ask for more
      than one vGPU.
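    
    The rebuild and host-disable workarounds mentioned in the caveats could
    look as follows with the unified ``openstack`` client. This is only a
    sketch: ``<server>``, ``<image>`` and ``<host>`` are placeholders, and
    the exact client syntax may differ between releases::

      # After a resize or migration, rebuild so that vGPUs get allocated.
      openstack server rebuild --image <image> <server>

      # Before rescuing, stop new instances from being scheduled to the host.
      openstack compute service set --disable <host> nova-compute
      openstack server rescue <server>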
    
    We are actively working to remove or work around those caveats, but
    please understand that, given all of the above, this feature is
    experimental for the moment.
