diff --git a/specs/backlog/standalone-networking.rst b/specs/backlog/standalone-networking.rst new file mode 100644 index 00000000..570e81cb --- /dev/null +++ b/specs/backlog/standalone-networking.rst @@ -0,0 +1,672 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +=================================== +Standalone Networking Configuration +=================================== + +https://bugs.launchpad.net/ironic/+bug/2113769 + +This document proposes a mechanism for configuring switch port attributes +during the management of baremetal nodes. The aim is to enable Ironic to +support this functionality without delegating networking tasks to Neutron. +This new mechanism, initiated by the existing network driver interface in the +Conductor, would delegate actual switch port configuration to a standalone +service. This service could be co-located with existing Ironic services or +managed independently, on a separate server, by an operator group dedicated +to network equipment. + +Problem description +=================== + +Operators seek to automate network configuration to enable node deployment +without manual pre-configuration of switch ports. Nodes connect to +switches via one or more network interfaces, each requiring specific +neighboring switch configurations for their intended use. These interfaces +may require access port or trunk port configurations. Additionally, link +aggregation might be required to enhance a node's networking capacity or +redundancy. Normally, the configuration operations must be completed manually +before the nodes can be deployed by Ironic. + +Managing nodes and networking equipment often falls to separate organizational +groups. This division of responsibility introduces security concerns regarding +who can modify switch configurations and to what extent. Since fine-grained +authorization varies across switch vendors, this service should offer basic +controls to address operator security concerns; thereby limiting what +operations could be attempted by Ironic. + +Proposed change +=============== + +The implementation proposed is to introduce a new standalone service to execute +the interactions with the network switch equipment. This new service includes +a JSON RPC front end API to be accessed by a network driver running within the +context of the Conductor. During the normal management lifecycle of baremetal +nodes, any ports defined on a node will be processed by the network driver as +is already the case in the current implementation. + +It is the responsibility of the driver to ensure that related switch +configurations are updated according to the ``switchport`` and +``portchannel`` information contained within the ``extra`` property of the +port. Existing network drivers can interface with the Neutron subsystem to +manage switch configurations, but the solution proposed in this design +eliminates the need to include other OpenStack components and instead provides +an end-to-end Ironic only solution. + +.. note:: The usage of the ``extra`` property is an interim approach to enable the new + functionality. The long term goal is to replace these properties with a new + set of API models and endpoints that are specific to the new functionality. + The intent is to explore the use of this new functionality using the + ``extra`` property as a means to validate the new functionality. Once the + usecases are validated and better understood we can promote the new contents + of the ``extra`` property to a new set of API models and endpoints. + +.. note:: The standalone service may need to consider the implications of + issuing commands concurrently to the same switch. This could lead to + unexpected results under certain circumstances. The standalone service + should be designed to handle this scenario gracefully. + +.. code-block:: + + ┌──────────────────────┐ + │ │ + │ Ironic API │ + │ │ + └──────────┬───────────┘ + │ + │ ┌────────────────────────┐ + ┌──────────▼───────────┐ │ │ + │ │ │ Ironic Standalone │ + │ Ironic Conductor │ │ Networking Service │ ┌───────────────┐ + │ ┌───────────┤ │ ┌────────────┤ │ │ + │ │ Network ┼─────► │ Switch ┼─────► TOR Switch │ + │ │ Driver │ │ │ Driver │ │ │ + └──────────└───────────┘ └───────────└────────────┘ └───────────────┘ + + +Alternatives +------------ +If we could assume that all network switches implemented some form of fine +grained access controls that could be applied to specific user credentials then +we could forgo building an access control mechanism and instead rely on the +access rights granted to the user credentials assigned to the Ironic service. +This would eliminate the need to implement this mechanism as a standalone +service and instead we could simply embed the functionality directly into the +Conductor by way of a custom network driver that could directly interact with +the network switches. + +.. code-block:: + + ┌──────────────────────┐ + │ │ + │ Ironic API │ + │ │ + └──────────┬───────────┘ + │ + │ + ┌──────────▼───────────┐ + │ │ + │ Ironic Conductor │ ┌───────────────────┐ + │ ┌───────────┤ │ │ + │ │ Network ┼────────►─ TOR Switch │ + │ │ Driver │ │ │ + └──────────└───────────┘ └───────────────────┘ + +Of course, the access control requirement is only one reason for having a +separate service and we would need to assess whether other requirements would +also push us towards a standalone service (e.g., would it be beneficial to +have the actual switch interactions performed on a separate service for +performance/scalability reasons? Assuming the driver operations are +synchronous to this other service presumably performance/scalability would +not be improved by having a separate service). + +Data model impact +----------------- +There are no planned changes to existing data models. Switch port +configuration details are to be stored in the ``extra`` property of existing +port objects; therefore, existing port related data model schemas can be used +without modification. + +State Machine Impact +-------------------- +None + +REST API impact +--------------- +There are no planned changes to existing API schemas or endpoints. Switch port +configuration details are stored in the ``extra`` property of a port or +portgroup; therefore, existing port and portgroup related API endpoints and +schemas can be used without modification. The ``extra`` property should be +populated with a dictionary conforming to this schema when related to a port +object. + +.. code-block:: + + { + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "Switchport Configuration", + "description": "Schema for defining switchport configurations based on + mode (trunk or access)", + "type": "object", + "properties": { + "switchport": { + "type": "object", + "properties": { + "mode": { + "type": "string", + "enum": ["trunk", "access"] + }, + "native_vlan": { + "type": "integer", + "minimum": 1, + "description": "The native VLAN ID for the switchport. If not + supplied then the switch global default VLAN ID is used." + }, + "allowed_vlans": { + "type": "array", + "items": { + "type": "integer", + "minimum": 1 + }, + "minItems": 1, + "description": "List of allowed VLANs for trunk mode. Only + applicable, and is mandatory, if mode=trunk." + } + }, + "required": [ + "mode", + ] + } + }, + "required": [ + "switchport", + ], + } + +For example, the following data is stored in the ``extra`` property of a port +to specify that its switch port must be configured as a trunk port having a +specific default VLAN and a set of allowed VLANs. + +.. code-block:: + + {'switchport': + {'mode': 'trunk', + 'native_vlan': 1, + 'allowed_vlans': [2, 3, 4] + } + } + +The following data is stored in the ``extra`` property of a port to specify +that its switch port must be configured as an access port having a specific +default VLAN. + +.. code-block:: + + {'switchport': + {'mode': 'access', + 'native_vlan': 1, + } + } + +If related to a portgroup object then a similar schema is supported but with +differences applicable to switch port channels only. + +.. code-block:: + + { + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "Switchport Configuration", + "description": "Schema for defining switch port channel configurations + based on mode (e.g., trunk or access) and aggregation mode (e.g., LACP + or static)", + "type": "object", + "properties": { + "portchannel": { + "type": "object", + "properties": { + "mode": { + "type": "string", + "enum": ["trunk", "access"] + }, + "native_vlan": { + "type": "integer", + "minimum": 1, + "description": "The native VLAN ID for the switchport. If not + supplied then the switch global default VLAN ID is used." + }, + "allowed_vlans": { + "type": "array", + "items": { + "type": "integer", + "minimum": 1 + }, + "minItems": 1, + "description": "List of allowed VLANs for trunk mode. Only + applicable, and is mandatory, if mode=trunk." + }, + "aggregation_mode": { + "type": "string", + "enum": ["lacp", "static"], + }, + }, + "required": [ + "mode", + "aggregation_mode", + ] + } + }, + "required": [ + "portchannel", + ], + } + +For example, the following data is stored in the ``extra`` property of a +portgroup to specify that a corresponding portchannel must be created and +managed on the switch. The portchannel should be configured as a trunk port +having a specific default VLAN and a set of allowed VLANs, and operate in the +LACP link aggregation mode. + +.. code-block:: + + {'portchannel': + {'mode': 'trunk', + 'native_vlan': 1, + 'allowed_vlans': [2, 3, 4], + 'aggregation_mode': 'lacp' + } + } + +The following data is stored in the ``extra`` property of a portgroup to +specify that a corresponding portchannel must be created and managed on the +switch. The portchannel must be configured as an access port having a specific +default VLAN, and operation in the static link aggregation mode. + +.. code-block:: + + {'portchannel': + {'mode': 'access', + 'native_vlan': 1, + 'aggregation_mode': 'static' + } + } + + +Client (CLI) impact +------------------- +None + +"openstack baremetal" CLI +~~~~~~~~~~~~~~~~~~~~~~~~~ +None + +"openstacksdk" +~~~~~~~~~~~~~~ +None + +RPC API impact +-------------- + +1. Conductor RPC API: + None + +2. Standalone Service API: + The RPC API used between the driver and the standalone networking service + can be defined as follows + ++------------------------------+---------------------------------------------+ +| Method | Signature | ++==============================+=============================================+ +| update_port | .. code-block:: | +| | | +| | {"switch_id": "xx:xx:xx:xx:xx:xx", | +| | "port_name": "xxx", | +| | "description": "xxx", | +| | "mode": "[access|trunk]", | +| | "default_vlan": n, | +| | "allowed_vlans": [x, y, z], | +| | "portchannel_name": "xxx"} | ++------------------------------+---------------------------------------------+ +| reset_port | .. code-block:: | +| | | +| | {"switch_id": "xx:xx:xx:xx:xx:xx", | +| | "port_name": "xxx"} | +| | | ++------------------------------+---------------------------------------------+ +| disable_port | .. code-block:: | +| | | +| | {"switch_id": "xx:xx:xx:xx:xx:xx", | +| | "port_name": "xxx"} | +| | | ++------------------------------+---------------------------------------------+ +| create_portchannel | .. code-block:: | +| | | +| | {"switch_ids": ["xx:xx:xx:xx:xx:xx",...] | +| | "portchannel_name": "xxx", | +| | "description": "xxx", | +| | "mode": "[access|trunk]", | +| | "default_vlan": {1 to 4094}, | +| | "allowed_vlans": [x, y, z], | +| | "aggregation_mode": "[lacp|static]"} | +| | | ++------------------------------+---------------------------------------------+ +| delete_portchannel | .. code-block:: | +| | | +| | {"switch_ids": ["xx:xx:xx:xx:xx:xx",...], | +| | "port_name": "xxx"} | +| | | ++------------------------------+---------------------------------------------+ + + +Driver API impact +----------------- +No changes to the existing driver interfaces. It is expected that this +functionality can be implemented within the existing definition of the +Networking Driver API. This means that the existing Conductor workflow does +need to be modified. The new networking driver must implement the defined +abstract interfaces of the base network interface in a way that ensures that +switch port configurations are updated correctly as the baremetal node +transitions through the full management life cycle state machine. + +Proper operation of the driver depends on the port being populated with LLDP +information in the form of the ``link_local_connection`` property. This +ensures that the driver can associate the configuration to the correct port on +the switch. + ++------------------------------+---------------------------------------------+ +| Interface | Actions | ++==============================+=============================================+ +| port_changed | Update switch port configuration to match | +| | if necessary. | ++------------------------------+---------------------------------------------+ +| portgroup_changed | Update switch port configuration to match | +| | if necessary. | ++------------------------------+---------------------------------------------+ +| vif_attach | N/A. VIF attachments not expected or | +| | supported by this driver. | ++------------------------------+---------------------------------------------+ +| vif_detach | N/A. VIF attachments not expected or | +| | supported by this driver. | ++------------------------------+---------------------------------------------+ +| vif_list | N/A. VIF attachments not expected or | +| | supported by this driver. | ++------------------------------+---------------------------------------------+ +| get_current_vif | N/A. VIF attachments not expected or | +| | supported by this driver. | ++------------------------------+---------------------------------------------+ +| add_provisioning_network | Configure each ports according to the | +| | provisioning network configured in the | +| | config file; otherwise, configure according | +| | to its defined switchport ``extra`` | +| | property. | ++------------------------------+---------------------------------------------+ +| remove_provisioning_network | Reset port back to switch port defaults | ++------------------------------+---------------------------------------------+ +| configure_tenant_networks | Configure each ports according to the | +| | ``switchport`` configuration defined in its | +| | ``extra`` property. | ++------------------------------+---------------------------------------------+ +| unconfigure_tenant_networks | Reset port back to switch port defaults | ++------------------------------+---------------------------------------------+ +| add_cleaning_network | Configure each ports according to the | +| | cleaning network configured in the | +| | config file; otherwise, configure according | +| | to its defined switchport ``extra`` | +| | property. | ++------------------------------+---------------------------------------------+ +| remove_cleaning_network | Reset port back to switch port defaults | ++------------------------------+---------------------------------------------+ +| validate_rescue | N/A | ++------------------------------+---------------------------------------------+ +| add_rescuing_network | Configure each ports according to the | +| | rescuing network configured in the | +| | config file; otherwise, configure according | +| | to its defined switchport ``extra`` | +| | property. | ++------------------------------+---------------------------------------------+ +| remove_rescuing_network | Reset port back to switch port defaults | ++------------------------------+---------------------------------------------+ +| validate_inspection | N/A | ++------------------------------+---------------------------------------------+ +| add_inspection_network | Configure each ports according to the | +| | inspection network configured in the | +| | config file; otherwise, configure according | +| | to its defined switchport ``extra`` | +| | property. | ++------------------------------+---------------------------------------------+ +| remove_inspection_network | Reset port back to switch port defaults | ++------------------------------+---------------------------------------------+ +| need_power_on | False | ++------------------------------+---------------------------------------------+ +| get_node_network_data | Build network data from configured ports | ++------------------------------+---------------------------------------------+ +| add_servicing_network | Configure each ports according to the | +| | servicing network configured in the | +| | config file; otherwise, configure according | +| | to its defined switchport ``extra`` | +| | property. | ++------------------------------+---------------------------------------------+ +| remove_servicing_network | Reset port back to switch port defaults | ++------------------------------+---------------------------------------------+ + +Nova driver impact +------------------ +None + +Ramdisk impact +-------------- +None + +Security impact +--------------- +* RPC API Authentication/Authorization: The intent of this design is to create + a standalone service that exists as part of the Ironic subsystem but that + could be run and managed separately by a group dedicated to managing + networking equipment. In this capacity the process must have sufficient + security controls such that only authorized users have access to its + configuration file and RPC API. A discussion is needed to settle on the + approach to be used. + +* Access control over switch resources: Ideally, an ACL mechanism native to the + switch operating system would be used to restrict access to switch resources + for the user entity assigned to the standalone networking service, but since + an objective of this design is to not assume that all switch vendors support + fine grained control over access to resources it may be beneficial to add + the ability to control access to resources directly from configuration data + stored in the configuration file for the standalone service. + + Resources to be controlled could include: + + 1. Allowed/Denied VLAN IDs + 2. Allowed/Denied port list + 3. Allowed port channel management + +Other end user impact +--------------------- +None + +Scalability impact +------------------ +Needing to perform switch operations as part of the management of baremetal +nodes will no doubt increase the time required to complete node operations. +Exactly how much additional time will depend entirely on the responsiveness of +the management software running on the networking switch. Having switch ports +configured before booting the node is a definite requirement; therefore, it is +likely not feasible to decouple the Conductor process from the configuration +of the switch by making it an asynchronous operation. + +This will have to be evaluated as the implementation progresses. + +Performance Impact +------------------ +See "Scalability impact" above. + +Other deployer impact +--------------------- +To define the switch port configuration details to be used for provider +networks the following config file attributes must be used. For each of the +network classes if no value is provided in the config file then the port's +``extra.switchport`` or ``extra.portchannel`` property will be used, if the +port or portgroup does not contain any such attribute then it will be ignored +by the driver. The following attributes can be added to the main +``ironic.conf`` file. + +.. code-block:: + + [standalone_networking] + enabled = <[true|false]> + provisioning_network = <[access|trunk]/vlan-id={1 to 4094}> + cleaning_network = <[access|trunk]/vlan-id={1 to 4094}> + servicing_network = <[access|trunk]/vlan-id={1 to 4094}> + # Optionally, for inspection if a separate network is used + # inspection_network = <[access|trunk]/vlan-id={1 to 4094}> + +Example: + +.. code-block:: + + [standalone_networking] + enabled = true + provisioning_network = access/vlan-id=10 + cleaning_network = access/vlan-id=11 + servicing_network = trunk/vlan-id=12 + # Optionally, for inspection if a separate network is used + # inspection_network = access/vlan-id=13 + +To propagate switch port configuration details to switches, the new networking +service must be provided with details needed to access and configure the +switches. This includes user credentials, network address, and any other +attributes required to access the switch. + +The following example is a hybrid based on the configuration requirements of +two Neutron ML2 mechanism drivers ([1], [2]) combined with some additional +access control information to limit access to switch resources. It contains +provisions for eventually allowing different types of driver implementations to +interact with switches. These attributes can be added to the config file +supplied to the standalone networking service. + +.. code-block:: ini + + [DEFAULT] + enabled_devices = + allowed_vlans = + denied_vlans = + allowed_ports = + denied_ports = + portchannels_allowed = [true/false] + + [] + driver = + switch_id = + host = + username = + password = + key_filename = + hostkey_verify = <[true|false]> + allowed_vlans = + denied_vlans = + allowed_ports = + denied_ports = + portchannels_allowed = [true/false, Overrides global value if supplied] + + +Example: + +.. code-block:: ini + + [DEFAULT] + enabled_devices = netconf-based-device.example.net,cli-based-device.example.net + allowed_vlans = 3,4,5 + denied_vlans = 6,7,8 + + [netconf-based-device.example.net] + driver = netconf-openconfig + switch_id = + host = + username = user + key_filename = /etc/ironic/ssh_keys/device_a_sshkey + hostkey_verify = false + + [cli-based-device.example.net] + driver = netmiko_cisco_ios + switch_id = + host = + username = user + password = secret + +Developer impact +---------------- +None + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + alegacy + +Other contributors: + n/a + +Work Items +---------- +* Define RPC API for standalone service + +* Split networking-generic-switch into a reusable library without any Neutron + entanglements. + +* Implement standalone service + +* Implement network driver to interface with standalone service + +* Implement switch driver (using refactored parts of the + networking-generic-switch package, see above) to interface with networking + equipment + +* Revisit the use of the ``extra`` properties and replace with new API models + and endpoints. + +Dependencies +============ +* The network-generic-switch library contains code to implement interactions + with network switches. Unfortunately, that code is entangled with Neutron + API dependencies. It is desirable to reuse as much of that library as + possible to implement part of this service, but the Neutron specific aspects + of its API and implementation must be removed. A possible path forward is to + refactor that library so that the Neutron specific parts remain in the actual + networking-generic-switch library while the switch specific parts get moved + into a new library that can be referenced from the networking-generic-switch + project and this new standalone service separately. + +Testing +======= +* Unit testing +* Possibly a Devstack setup with a stubbed-out driver to simulate switch + configurations. + +Upgrades and Backwards Compatibility +==================================== +None + +Documentation Impact +==================== +TBD + +References +========== +* [0] https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/mercury.html +* [1] https://etherpad.opendev.org/p/ironic-standalone-networking +* [2] https://docs.openstack.org/networking-baremetal/latest/index.html +* [3] https://docs.openstack.org/networking-generic-switch/latest/index.html