diff --git a/doc/source/reference/block-device-structs.rst b/doc/source/reference/block-device-structs.rst
new file mode 100644
index 000000000000..777816121441
--- /dev/null
+++ b/doc/source/reference/block-device-structs.rst
@@ -0,0 +1,224 @@
+==========================
+Driver BDM Data Structures
+==========================
+
+In addition to the :doc:`API BDM data format `
+there are also several internal data structures within Nova that map out how
+block devices are attached to instances. This document aims to outline the two
+general data structures and two additional specific data structures used by the
+libvirt virt driver.
+
+.. note::
+
+    This document is based on an email to the openstack-dev mailing list by
+    Matthew Booth, linked below, which was provided as a primer for developers
+    working on virt drivers and interacting with these data structures.
+
+    http://lists.openstack.org/pipermail/openstack-dev/2016-June/097529.html
+
+.. note::
+
+    References to local disks in the following document refer to any
+    disk directly managed by nova compute. If nova is configured to use RBD or
+    NFS for instance disks then these disks won't actually be local, but they
+    are still managed locally and referred to as local disks, as opposed to RBD
+    volumes provided by Cinder, which are not considered local.
+
+Generic BDM data structures
+===========================
+
+``BlockDeviceMapping``
+----------------------
+
+The 'top level' data structure is the ``BlockDeviceMapping`` (BDM) object. It
+is a ``NovaObject``, persisted in the DB. Current code creates a BDM object for
+every disk associated with an instance, whether it is a volume or not.
+
+The BDM object describes properties of each disk as specified by the user. It
+is initially created from a user request; for more details on the format of
+these requests, please see the :doc:`Block Device Mapping in Nova
+<../user/block-device-mapping>` document.
+
+The Compute API transforms and consolidates all BDMs to ensure that all disks,
+explicit or implicit, have a BDM, and then persists them. Look in
+``nova.objects.block_device`` for all BDM fields, but in essence they contain
+information like (source_type='image', destination_type='local',
+image_id='), or equivalents describing ephemeral disks, swap disks
+or volumes, and some associated data.
+
+.. note::
+
+    BDM objects are typically stored in variables called ``bdm`` with lists
+    in ``bdms``, although this is obviously not guaranteed (and unfortunately
+    not always true: ``bdm`` in ``libvirt.block_device`` is usually a
+    ``DriverBlockDevice`` object). This is a useful reading aid (except when
+    it's proactively confounding), as there is also something else typically
+    called ``block_device_mapping`` which is not a ``BlockDeviceMapping``
+    object.
+
+``block_device_info``
+---------------------
+
+Drivers do not directly use BDM objects. Instead, they are transformed into a
+different driver-specific representation. This representation is normally
+called ``block_device_info``, and is generated by
+``virt.driver.get_block_device_info()``. Its output is based on data in BDMs.
+``block_device_info`` is a dict containing:
+
+``root_device_name``
+    Hypervisor's notion of the root device's name
+``ephemerals``
+    A list of all ephemeral disks
+``block_device_mapping``
+    A list of all cinder volumes
+``swap``
+    A swap disk, or None if there is no swap disk
+
+The disks are represented in one of two ways, depending on the specific
+driver currently in use. There's the 'new' representation, used by the libvirt
+and vmwareAPI drivers, and the 'legacy' representation used by all other
+drivers. The legacy representation is a plain dict. It does not contain the
+same information as the new representation.
+
+The new representation involves subclasses of
+``nova.virt.block_device.DriverBlockDevice``. As well as containing different
+fields, the new representation, significantly, also retains a reference to the
+underlying BDM object. This means that by manipulating the
+``DriverBlockDevice`` object, the driver is able to persist data to the BDM
+object in the DB.
+
+.. note::
+
+    Common usage is to pull ``block_device_mapping`` out of this
+    dict into a variable called ``block_device_mapping``. This is not a
+    ``BlockDeviceMapping`` object, or list of them.
+
+.. note::
+
+    If ``block_device_info`` was passed to the driver by compute manager, it
+    was probably generated by ``_get_instance_block_device_info()``.
+    By default, this function filters out all cinder volumes from
+    ``block_device_mapping`` which don't currently have ``connection_info``.
+    In other contexts this filtering will not have happened, and
+    ``block_device_mapping`` will contain all volumes.
+
+.. note::
+
+    Unlike BDMs, ``block_device_info`` does not currently represent all
+    disks that an instance might have. Significantly, it will not contain any
+    representation of an image-backed local disk, i.e. the root disk of a
+    typical instance which isn't boot-from-volume. Other representations used
+    by the libvirt driver explicitly reconstruct this missing disk.
+
+libvirt driver specific BDM data structures
+===========================================
+
+``instance_disk_info``
+----------------------
+
+The virt driver API defines a method ``get_instance_disk_info``, which returns
+a JSON blob. The compute manager calls this and passes the data over RPC
+between calls without ever looking at it. This is driver-specific opaque data.
+It is also only used by the libvirt driver, despite being part of the API for
+all drivers. Other drivers do not return any data. The most interesting aspect
+of ``instance_disk_info`` is that it is generated from the libvirt XML, not
+from nova's state.
+
+.. note::
+
+    ``instance_disk_info`` is often named ``disk_info`` in code, which
+    is unfortunate as this clashes with the normal naming of the next
+    structure. Occasionally the two are used in the same block of code.
+
+.. note::
+
+    RBD disks (including non-volume disks) and cinder volumes
+    are not included in ``instance_disk_info``.
+
+``instance_disk_info`` is a list of dicts for some of an instance's disks. Each
+dict contains the following:
+
+``type``
+    libvirt's notion of the disk's type
+``path``
+    libvirt's notion of the disk's path
+``virt_disk_size``
+    The disk's virtual size in bytes (the size the guest OS sees)
+``backing_file``
+    libvirt's notion of the backing file path
+``disk_size``
+    The file size of ``path``, in bytes
+``over_committed_disk_size``
+    As-yet-unallocated disk size, in bytes
+
+``disk_info``
+-------------
+
+.. note::
+
+    As opposed to ``instance_disk_info``, which is frequently called
+    ``disk_info``.
+
+This data structure is actually described pretty well in the comment block at
+the top of ``nova.virt.libvirt.blockinfo``. It is internal to the libvirt
+driver. It contains:
+
+``disk_bus``
+    The default bus used by disks
+``cdrom_bus``
+    The default bus used by cdrom drives
+``mapping``
+    Defined below
+
+``mapping`` is a dict which maps disk names to a dict describing how that disk
+should be passed to libvirt. This mapping contains every disk connected to the
+instance, both local and volumes.
+
+First, a note on disk naming. Local disk names used by the libvirt driver are
+well defined. They are:
+
+``disk``
+    The root disk
+``disk.local``
+    The flavor-defined ephemeral disk
+``disk.ephX``
+    Where X is a zero-based index for BDM defined ephemeral disks
+``disk.swap``
+    The swap disk
+``disk.config``
+    The config disk
+
+These names are hardcoded, reliable, and used in lots of places.
+
+In ``disk_info``, volumes are keyed by device name, e.g. 'vda', 'vdb'.
+Different buses will be named differently, approximately according to legacy
+Linux device naming.
+
+Additionally, ``disk_info`` will contain a mapping for 'root', which is the
+root disk. This will duplicate one of the other entries, either 'disk' or a
+volume mapping.
+
+Each dict within the ``mapping`` dict contains three required fields (``bus``,
+``dev`` and ``type``) and two optional fields (``format`` and ``boot_index``):
+
+``bus``
+    The guest bus type ('ide', 'virtio', 'scsi', etc.)
+``dev``
+    The device name, e.g. 'vda', 'hdc', 'sdf', 'xvde'
+``type``
+    Type of device, e.g. 'disk', 'cdrom', 'floppy'
+``format``
+    Which format to apply to the device if applicable
+``boot_index``
+    Number designating the boot order of the device
+
+.. note::
+
+    ``BlockDeviceMapping`` and ``DriverBlockDevice`` store boot index
+    zero-based. However, libvirt's boot index is 1-based, so the value stored
+    here is 1-based.
+
+.. todo::
+
+    Add a section for the per-disk ``disk.info`` file within the instance
+    directory when using the libvirt driver.
diff --git a/doc/source/reference/index.rst b/doc/source/reference/index.rst
index 09596798650d..15ea46eb71fc 100644
--- a/doc/source/reference/index.rst
+++ b/doc/source/reference/index.rst
@@ -39,6 +39,8 @@ The following is a dive into some of the internals in nova.
   works in nova to isolate groups of hosts.
 * :doc:`/reference/attach-volume`: Describes the attach volume flow, using the
   libvirt virt driver as an example.
+* :doc:`/reference/block-device-structs`: Block Device Data Structures
+
 
 .. # NOTE(amotoki): toctree needs to be placed at the end of the secion to
    # keep the document structure in the PDF doc.
@@ -59,6 +61,7 @@ The following is a dive into some of the internals in nova.
    isolate-aggregates
    api-microversion-history
    attach-volume
+   block-device-structs
 
 Debugging
 =========
diff --git a/doc/source/user/block-device-mapping.rst b/doc/source/user/block-device-mapping.rst
index 4648b3585ba0..b43f01de8b46 100644
--- a/doc/source/user/block-device-mapping.rst
+++ b/doc/source/user/block-device-mapping.rst
@@ -48,6 +48,9 @@ When we talk about block device mapping, we usually refer to one of two things
   virt driver code). We will refer to this format as 'Driver BDMs' from now
   on.
 
+  For more details on this, please refer to the :doc:`Driver BDM Data
+  Structures <../reference/block-device-structs>` reference document.
+
 .. note::
 
    The maximum limit on the number of disk devices allowed to attach to
@@ -55,8 +58,8 @@ When we talk about block device mapping, we usually refer to one of two things
    :oslo.config:option:`compute.max_disk_devices_to_attach`.
 
 
-Data format and its history
-----------------------------
+API BDM data format and its history
+-----------------------------------
 
 In the early days of Nova, block device mapping general structure closely
 mirrored that of the EC2 API. During the Havana release of Nova, block device