Propose spec for OVN BGP integration

The spec proposes changes in Neutron to leverage OVN BGP capabilities.

Related-Bug: #2111276

Change-Id: I8fef9b7e444b84448105bc60fc5551b0650aa214
Signed-off-by: Jakub Libosvar <jlibosva@redhat.com>
====================================
Core OVN BGP integration
====================================
https://bugs.launchpad.net/neutron/+bug/2111276
OVN 25.03 introduces BGP-related capabilities that provide parity with the
current ovn-bgp-agent underlay exposing method.
This spec introduces a design that uses these OVN capabilities, integrated
with Neutron, to replace the ovn-bgp-agent.
Problem Description
===================
The ovn-bgp-agent is an additional process running on compute and network
nodes. A significant amount of processing happens whenever new workloads are
created or moved around, because the agent has to watch for these changes and
rewire the node configuration as needed. Since OVN is well aware of the
locality of its resources, we can leave all of this processing to OVN, manage
only the underlying BGP OVN topology in Neutron, and still use a pure L3
spine-and-leaf topology for the dataplane traffic.
Acronyms used in this spec
==========================
- BGP: Border Gateway Protocol
- ECMP: Equal-Cost Multi-Path
- LRP: Logical Router Port
- LSP: Logical Switch Port
- LS: Logical Switch
- LR: Logical Router
- VRF: Virtual Routing and Forwarding
- FRR: Free Range Routing (https://github.com/FRRouting/frr)
Proposed Change
===============
This spec proposes to introduce a new Neutron service plugin that manages the
underlying BGP topology in OVN. Its main purpose is to make sure the OVN
resources related to BGP are correctly configured at all times by being a
reconciler over those resources. Additionally, it takes care of scaling the
compute nodes in and out, because every compute node needs its own bound
resources, such as a router and a logical switch with a localnet port.
No changes to the API or database models are needed. However, the Neutron OVN
DB sync scripts need to be modified so that they do not monitor the underlying
BGP resources. This was possibly already planned for Neutron in the spec at
[1]_, so that work needs to be revived. It can be achieved by setting an
explicit tag in the external_ids column of the BGP-managed resources that
Neutron must not touch. We also need to make sure, at the presentation layer,
that none of the underlying BGP resources are exposed to users through the
API. For example, a router list command must not return the BGP routers.
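For illustration, a BGP-managed router could be tagged like this; the key name
is a placeholder, not a decided convention:

.. code-block:: text

   $ ovn-nbctl set Logical_Router bgp-main-router \
         external_ids:bgp-spec-managed=true
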
Each compute node requires a running FRR instance that monitors the local VRF
and advertises the routes to the BGP peers. It is the installer's responsibility
to configure the FRR instance to use the correct BGP parameters and to connect
to the correct BGP peers.
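As a rough illustration only, the per-node FRR configuration could resemble
the snippet below; the VRF name, ASNs and peer addresses are placeholders
chosen by the installer:

.. code-block:: text

   ! minimal sketch of an frr.conf fragment, all values are placeholders
   router bgp 64999
    neighbor 100.64.0.1 remote-as 64998
    neighbor 100.65.0.1 remote-as 64998
    address-family ipv4 unicast
     import vrf bgp-vrf
    exit-address-family
   !
   router bgp 64999 vrf bgp-vrf
    address-family ipv4 unicast
     redistribute kernel
    exit-address-family
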
As it is easier to understand the topology visually than through description,
the following diagram shows the underlying BGP logical topology in OVN. For
better resolution, it is recommended to open the image in a new tab.
.. figure:: ../../images/ovn-bgp-topology.jpg
:target: ../../_images/ovn-bgp-topology.jpg
OVN BGP Logical Topology (click for full resolution)
BGP distributed logical router
------------------------------
A new router with the OVN BGP capabilities enabled is introduced; it is named
"BGP distributed router" in the diagram above and has the dynamic routing flag
enabled. The router is connected to the provider logical switch with a dummy
connection. This connection does not carry any traffic and serves only to
logically connect the logical switch and the BGP router, so that northd can
create entries in the Advertised_Route table in the Southbound DB for the IPs
that need to be advertised.
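A minimal sketch of how the service plugin could create this router and the
dummy connection with ovn-nbctl follows; the resource names and addresses are
illustrative, and the exact option key that enables dynamic routing is an
assumption about the OVN 25.03 schema:

.. code-block:: text

   # illustrative names; the dynamic-routing option key is assumed
   $ ovn-nbctl lr-add bgp-main-router
   $ ovn-nbctl set Logical_Router bgp-main-router options:dynamic-routing=true
   # dummy connection towards the Neutron provider LS (ls-public)
   $ ovn-nbctl lrp-add bgp-main-router lrp-bgp-main-router-to-ls-public \
         00:de:ad:00:00:01 192.168.111.2/24
   $ ovn-nbctl lsp-add ls-public lsp-ls-public-to-bgp-main-router
   $ ovn-nbctl lsp-set-type lsp-ls-public-to-bgp-main-router router
   $ ovn-nbctl lsp-set-addresses lsp-ls-public-to-bgp-main-router router
   $ ovn-nbctl lsp-set-options lsp-ls-public-to-bgp-main-router \
         router-port=lrp-bgp-main-router-to-ls-public
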
The router also connects to an LS with a localnet port. This LS is connected
to the provider bridge br-bgp, which needs to be configured on every chassis
since the traffic here is distributed and can happen on any node. This bridge
connects to the ls-public LS through the localnet port created by Neutron;
this is the LS that typically connects to the physical network in traditional
deployments. We need the localnet ports to avoid OVN sending the traffic over
the Geneve tunnel to the node hosting the logical router gateway.
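A sketch of the corresponding localnet wiring, with illustrative switch,
physical network and bridge-mapping names:

.. code-block:: text

   # illustrative names only
   $ ovn-nbctl ls-add ls-interconnect
   $ ovn-nbctl lsp-add ls-interconnect ln-interconnect
   $ ovn-nbctl lsp-set-type ln-interconnect localnet
   $ ovn-nbctl lsp-set-addresses ln-interconnect unknown
   $ ovn-nbctl lsp-set-options ln-interconnect network_name=bgpnet
   # on every chassis, map the physical network name to the br-bgp bridge
   # (in practice appended to any existing ovn-bridge-mappings value)
   $ ovs-vsctl set Open_vSwitch . \
         external_ids:ovn-bridge-mappings=bgpnet:br-bgp
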
The BGP distributed router is connected to the per-chassis logical routers
through peered LRPs that are bound to the corresponding chassis. The
per-chassis LRs are described in the next section. Because the BGP router is
distributed, we need to pick the right LRP so that the traffic is not
forwarded to a different chassis. For example, if there is egress traffic
coming from a tenant LSP on chassis A, the BGP distributed router needs to
route the traffic to the LRP on chassis A. For this we use a logical router
policy with an is_chassis_resident() match. An example of such a policy is
shown below:
.. code-block:: text
action : reroute
bfd_sessions : []
chain : []
external_ids : {}
jump_chain : []
match : "inport==\"lrp-bgp-main-router-to-ls-interconnect\" && is_chassis_resident(\"cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0\")"
nexthop : []
nexthops : ["169.254.0.1"]
options : {}
priority : 10
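A policy like the record above could be created with ovn-nbctl, for example;
the router name is inferred from the port names and is illustrative:

.. code-block:: text

   $ ovn-nbctl lr-policy-add bgp-main-router 10 \
         'inport == "lrp-bgp-main-router-to-ls-interconnect" && is_chassis_resident("cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0")' \
         reroute 169.254.0.1
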
The nexthop in this case is the address of the LRP on chassis A and for now
must be an IPv4 address, as OVN currently contains a bug that prevents the use
of IPv6 LLAs as nexthops, reported at [2]_. The policy is applied only on the
chassis named in is_chassis_resident() and hence the traffic always remains
local to that chassis. Because the policy is evaluated at a later stage of the
LR pipeline, we need to create a logical router static route in order to pass
the routing stage first. Hence the BGP distributed logical router needs to
contain two static routes: one to route ingress traffic to the provider
network, and one unused route that serves only to pass the routing stage in
the pipeline until the reroute policy is hit.
The first static route can look like this:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "192.168.111.0/24"
nexthop : "192.168.111.30"
options : {}
output_port : lrp-bgp-main-router-to-ls-interconnect
policy : []
route_table : ""
selection_fields : []
where ip_prefix is the provider network prefix, output_port is the LRP
connecting to the provider LS, and nexthop is the address of the LRP of the
Neutron router port that serves as a gateway.
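Such a route could be created, for instance, with ovn-nbctl (the router name
is illustrative):

.. code-block:: text

   $ ovn-nbctl lr-route-add bgp-main-router 192.168.111.0/24 192.168.111.30 \
         lrp-bgp-main-router-to-ls-interconnect
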
The second static route is unused and can look like this:
.. _fake-static-route:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "0.0.0.0/0"
nexthop : "192.168.111.30"
options : {}
output_port : []
policy : []
route_table : ""
selection_fields : []
The route needs to match all traffic and the nexthop doesn't matter because it
will be determined by the reroute policies based on the chassis locality. The
ingress logical router pipeline with the route implemented looks like this:
.. code-block:: text
... the other routes are here but none matches 0.0.0.0/0 ...
table=15(lr_in_ip_routing ), priority=4 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 192.168.111.30; reg5 = 192.168.111.30; eth.src = 00:de:ad:10:00:00; outport = "lrp-bgp-main-router-to-ls-interconnect"; flags.loopback = 1; reg9[9] = 1; next;)
table=15(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;)
table=16(lr_in_ip_routing_ecmp), priority=150 , match=(reg8[0..15] == 0), action=(next;)
table=16(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;)
table=17(lr_in_policy ), priority=10 , match=(inport=="lrp-bgp-main-router-to-ls-interconnect" && is_chassis_resident("cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0")), action=(reg0 = 169.254.0.1; reg5 = 169.254.0.2; eth.src = 00:de:ad:00:10:00; outport = "lrp-bgp-main-router-to-bgp-router-r0-compute-0"; flags.loopback = 1; reg8[0..15] = 0; reg9[9] = 1; next;)
To reach the stage where the reroute policy is hit, the traffic first has to
pass the lr_in_ip_routing stage, and that stage is implemented with the static
route. The traffic matches the 0.0.0.0/0 prefix in the first rule, and the
outport is then changed by the last rule with its reroute action. If the
static route were not present, the traffic would be dropped by the second rule
with the drop action.
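For completeness, the catch-all route above could be created without an output
port, for example (router name illustrative):

.. code-block:: text

   $ ovn-nbctl lr-route-add bgp-main-router 0.0.0.0/0 192.168.111.30
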
Per-chassis logical routers
---------------------------
A logical router is also created for, and bound to, each chassis. These
routers serve to learn ECMP routes from the BGP peers and to forward traffic
between the provider bridges and the BGP distributed router.
For cases where the compute nodes carry data plane and control plane traffic
over the same spine-and-leaf topology, OpenFlow rules need to be maintained on
the provider bridge that differentiate control plane traffic, which is
forwarded to the host, from dataplane traffic, which needs to go to the OVN
overlay. The following OpenFlow rules could be used to achieve this:
.. _openflow-rules:
.. code-block:: text
priority=10,ip,in_port=eth0,nw_dst=<host IPs> actions=NORMAL
priority=10,ipv6,in_port=eth0,ipv6_dst=<host IPv6s> actions=NORMAL
priority=10,arp actions=NORMAL
priority=10,icmp6,icmp_type=133 actions=NORMAL
priority=10,icmp6,icmp_type=134 actions=NORMAL
priority=10,icmp6,icmp_type=135 actions=NORMAL
priority=10,icmp6,icmp_type=136 actions=NORMAL
priority=10,ipv6,in_port=eth0,ipv6_dst=fe80::/64 actions=NORMAL
priority=8,in_port=eth0 actions=mod_dl_dst:<LRP MAC>,output:<patch_port_to_ovn>
These rules match traffic destined to the host and forward it to the host;
everything else is forwarded to the OVN overlay. The patch_port_to_ovn is a
patch port that ovn-controller creates based on the ovn-bridge-mappings
configuration.
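As a sketch, an installer or the OVN agent extension described later could
install such rules with ovs-ofctl; the bridge name, NIC name, addresses, MAC
and patch port below are placeholders, and the remaining matches follow the
same pattern:

.. code-block:: text

   $ ovs-ofctl add-flow br-phys-0 \
         "priority=10,ip,in_port=eth0,nw_dst=172.16.0.10,actions=NORMAL"
   $ ovs-ofctl add-flow br-phys-0 "priority=10,arp,actions=NORMAL"
   $ ovs-ofctl add-flow br-phys-0 \
         "priority=8,in_port=eth0,actions=mod_dl_dst:00:de:ad:00:10:01,output:patch-provnet-0"
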
The router itself needs to implement routes for traffic coming from the provider
network and for traffic coming from the OVN overlay. For ingress provider
network traffic, the routes can look as follows:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "192.168.111.0/24"
nexthop : "169.254.0.2"
options : {}
output_port : lrp-bgp-router-r0-compute-0-to-bgp-main-router
policy : []
route_table : ""
selection_fields : []
where ip_prefix matches the subnet of the provider network, nexthop is set to
the address of the LRP attached to the BGP distributed router, and output_port
is set to its peer LRP.
The egress traffic from the OVN overlay needs to be routed with ECMP to the
BGP network. This can be achieved with the following static routes, one per
BGP peer:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "0.0.0.0/0"
nexthop : "100.64.0.1"
options : {}
output_port : lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth0
policy : []
route_table : ""
selection_fields : []
bfd : []
external_ids : {}
ip_prefix : "0.0.0.0/0"
nexthop : "100.65.0.1"
options : {}
output_port : lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth1
policy : []
route_table : ""
selection_fields : []
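The service plugin could create these per-peer ECMP routes with, for example
(the router name is inferred from the port names and is illustrative):

.. code-block:: text

   $ ovn-nbctl --ecmp lr-route-add bgp-router-r0-compute-0 0.0.0.0/0 \
         100.64.0.1 lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth0
   $ ovn-nbctl --ecmp lr-route-add bgp-router-r0-compute-0 0.0.0.0/0 \
         100.65.0.1 lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth1
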
Traffic flow
============
This section describes the traffic flow from and to a LSP hosted on a chassis.
An example of the traffic from the external network to a VM with a Floating IP on chassis 1
-------------------------------------------------------------------------------------------
Because of the dummy connection between the ls-public LS and the BGP
distributed router, OVN creates an Advertised_Route entry for the Floating IP.
Since the associated logical port is bound to chassis 1, OVN populates the
local VRF on chassis 1 with the route to the Floating IP, and the local FRR
instance advertises the route to the BGP peers.
The fabric learns the route to the Floating IP from the BGP peers and forwards
the traffic to chassis 1, to either eth0 or eth1, because of the ECMP routes.
The traffic does not match any of the higher priority :ref:`OpenFlow rules
<openflow-rules>` on the provider bridge and matches the last rule. That rule
changes the destination MAC to the LRP MAC address of the per-chassis router
associated with the NIC, and the traffic is forwarded to OVN. The traffic
enters the per-chassis logical router, which has a Logical_Router_Static_Route
configured to forward the traffic to the distributed BGP router. The BGP
distributed router is configured to forward the traffic to the ls-inter-public
switch with a Logical_Router_Static_Route matching the destination IP against
the provider network subnet, and through the br-bgp provider bridge the
traffic gets to the ls-public logical switch. From here the traffic follows
the same path as without BGP and is NAT'ed by the Neutron router.
An example of the traffic from a VM with a Floating IP on chassis 1 to the external network
-------------------------------------------------------------------------------------------
The egress VM traffic is NAT'ed by the Neutron router and forwarded to the
provider network gateway, which is connected to the ls-public LS. Because of
the presence of the localnet ports, the traffic gets through the br-bgp bridge
to the distributed BGP router, where it matches the artificial
:ref:`Logical_Router_Static_Route <fake-static-route>` to pass the
lr_in_ip_routing stage in the pipeline and is then matched by the BGP router
policy based on chassis locality. The reroute action of the policy picks the
right LRP connected to the per-chassis router. There the traffic matches the
per-peer static routes and is forwarded with ECMP to the BGP networks.
Testing
=======
Existing tempest tests should provide good regression testing. We can reuse
the existing topology from the ovn-bgp-agent project that peers with VMs
simulating a BGP router.
Implementation
==============
The implementation is split into two parts. The first part creates the service
plugin that takes care of the BGP topology in OVN including the configuration of
static routes and router policies.
The second part is an OVN agent extension [3]_ that applies the per-chassis
host configuration. The OVN agent itself is orthogonal to the BGP service
plugin and can be replaced with any third-party tool that takes care of the
dynamic node configuration. The OVN agent extension is responsible for
steering traffic to OVN and for the per-chassis host configuration, such as
adding an ovn-bridge-mappings entry per BGP peer and implementing local
OpenFlow rules to differentiate control plane traffic from dataplane traffic.
It also monitors the patch ports on br-bgp and creates a direct connection
between the localnet ports to avoid any FDB learning on the bridge.
An example of such simple OpenFlow rules is shown below:
.. code-block:: text
priority=10,in_port=2, actions=output:3
priority=10,in_port=3, actions=output:2
where 2 is the patch port to the logical switch connected to the BGP distributed
router and 3 is the patch port connected to the Neutron public switch.
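A minimal sketch of how the extension could install these rules, assuming the
OpenFlow port numbers have already been resolved from the patch port names:

.. code-block:: text

   # hypothetical port numbers 2 and 3, as in the example above
   $ ovs-ofctl add-flow br-bgp "priority=10,in_port=2,actions=output:3"
   $ ovs-ofctl add-flow br-bgp "priority=10,in_port=3,actions=output:2"
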
Whether BGP is used with Neutron is determined by enabling the service plugin
and the OVN agent extension.
As written in the first paragraph of this spec, the BGP support in OVN was
introduced in 25.03. Therefore the BGP service plugin requires OVN 25.03 or
later.
Assignee(s)
-----------
* Jakub Libosvar <jlibosva@redhat.com>
Documentation
=============
A deployment guide will be written describing how to enable the service plugin
and what needs to be configured on the nodes, such as steering traffic to OVN
or configuring a BGP speaker that advertises the routes to its peers. An
example FRR configuration will be included so that operators have a reference
for the configuration.
.. [1] https://review.opendev.org/c/openstack/neutron-specs/+/891204
.. [2] https://issues.redhat.com/browse/FDP-1554
.. [3] https://opendev.org/openstack/neutron/src/commit/1e6381cbd25f8ab4fc9a3bcaa1ab7af1d605946e/doc/source/ovn/ovn_agent.rst