
Affinity policy violated with parallel requests
===============================================

Problem
-------

Parallel server create requests for affinity or anti-affinity land on the same
host and servers go to the ``ACTIVE`` state even though the affinity or
anti-affinity policy was violated.
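For illustration, the race can be reproduced by sending two near-simultaneous
server create requests against the same anti-affinity server group. This is a
minimal sketch; the group name, image, flavor, and ``<group-uuid>`` value are
placeholders:

.. code-block:: console

   $ openstack server group create --policy anti-affinity demo-group
   $ # fire both requests at (roughly) the same time; without the
   $ # safeguards described below, both servers may land on the same host
   $ openstack server create --image cirros --flavor m1.tiny \
       --hint group=<group-uuid> server-1 &
   $ openstack server create --image cirros --flavor m1.tiny \
       --hint group=<group-uuid> server-2 &
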
Solution
--------

There are two ways to avoid anti-/affinity policy violations among multiple
server create requests.
Create multiple servers as a single request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the `multi-create API`_ with the ``min_count`` parameter set or the
`multi-create CLI`_ with the ``--min`` option set to the desired number of
servers.

This works because the whole batch of requests is visible to ``nova-scheduler``
at the same time as a group, so it can choose compute hosts that satisfy the
anti-/affinity constraint and place the servers on the same host or on
different hosts accordingly.

.. _multi-create API: https://docs.openstack.org/api-ref/compute/#create-multiple-servers
.. _multi-create CLI: https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create
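For example, a single multi-create request for two anti-affinity servers can
be made as follows (a sketch; the image, flavor, and ``<group-uuid>`` values
are placeholders):

.. code-block:: console

   $ # one request for two servers; the scheduler sees them as a batch
   $ # and can place them on different hosts in a single pass
   $ openstack server create --image cirros --flavor m1.tiny \
       --hint group=<group-uuid> --min 2 --max 2 test-server
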
Adjust Nova configuration settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When requests are made separately and the scheduler cannot consider the batch
of requests at the same time as a group, anti-/affinity races are handled by
what is called the "late affinity check" in ``nova-compute``. Once a server
lands on a compute host, if the request involves a server group,
``nova-compute`` contacts the API database (via ``nova-conductor``) to retrieve
the server group and then it checks whether the affinity policy has been
violated. If the policy has been violated, ``nova-compute`` initiates a
reschedule of the server create request. Note that this means the deployment
must have :oslo.config:option:`scheduler.max_attempts` set greater than ``1``
(the default is ``3``) to handle races.
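A minimal sketch of the relevant setting (``3`` is the default; any value
greater than ``1`` allows the late affinity check to trigger a reschedule):

.. code-block:: ini

   [scheduler]
   # maximum number of scheduling attempts per server create request;
   # must be greater than 1 so a policy violation can be rescheduled
   max_attempts = 3
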
An ideal configuration for multiple cells will minimize :ref:`upcalls <upcall>`
from the cells to the API database. This is how devstack, for example, is
configured in the CI gate. The cell conductors do not set
:oslo.config:option:`api_database.connection` and ``nova-compute`` sets
:oslo.config:option:`workarounds.disable_group_policy_check_upcall` to
``True``.
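A sketch of such an upcall-free compute configuration follows; note that with
this setting the late affinity check is skipped entirely, so affinity races
must be avoided by other means, such as multi-create requests:

.. code-block:: ini

   [workarounds]
   # skip the late affinity check upcall to the API database
   disable_group_policy_check_upcall = True
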
However, if a deployment needs to handle racing affinity requests, it needs to
configure cell conductors to have access to the API database, for example:

.. code-block:: ini

   [api_database]
   connection = mysql+pymysql://root:a@127.0.0.1/nova_api?charset=utf8
The deployment also needs to configure the ``nova-compute`` services not to
disable the group policy check upcall, either by leaving
:oslo.config:option:`workarounds.disable_group_policy_check_upcall` unset (the
default) or by setting it to ``False``, for example:

.. code-block:: ini

   [workarounds]
   disable_group_policy_check_upcall = False
With these settings, anti-/affinity policy should not be violated even when
parallel server create requests are racing.

Future work is needed to add anti-/affinity support to the placement service in
order to eliminate the need for the late affinity check in ``nova-compute``.