Propose spec for OVN BGP integration

The spec proposes changes in Neutron to leverage OVN BGP capabilities.

Related-Bug: #2111276

Change-Id: I8fef9b7e444b84448105bc60fc5551b0650aa214
Signed-off-by: Jakub Libosvar <jlibosva@redhat.com>
====================================
Core OVN BGP integration
====================================
https://bugs.launchpad.net/neutron/+bug/2111276
OVN 25.03 introduces BGP-related capabilities that provide parity with the
current ovn-bgp-agent underlay exposing method.
This spec introduces a design that uses these OVN capabilities, integrated
with Neutron, to replace the ovn-bgp-agent.
Problem Description
===================
The ovn-bgp-agent is an additional process running on compute and network
nodes. A significant amount of processing happens whenever new workloads are
created or moved around, because the agent has to watch for these changes and
rewire the node configuration as needed. Since OVN is well aware of the
locality of its resources, we can leave all of this processing to OVN, manage
only the underlying BGP OVN topology in Neutron, and still use a pure L3
spine-and-leaf topology for the dataplane traffic.
Acronyms used in this spec
==========================
- BGP: Border Gateway Protocol
- ECMP: Equal-Cost Multi-Path
- LRP: Logical Router Port
- LSP: Logical Switch Port
- LS: Logical Switch
- LR: Logical Router
- VRF: Virtual Routing and Forwarding
- FRR: Free Range Routing (https://github.com/FRRouting/frr)
Proposed Change
===============
This spec proposes to introduce a new Neutron service plugin that manages the
underlying BGP topology in OVN. Its main purpose is to make sure the OVN
resources related to BGP are correctly configured at all times by being a
reconciler over those resources. Additionally, it takes care of scaling the
compute nodes in and out, because every compute node needs its own bound
resources, such as a router and a logical switch with a localnet port.
No changes to the API or database models are needed. However, the Neutron OVN
DB sync scripts need to be modified so that they do not monitor the underlying
BGP resources. This was possibly already planned for Neutron in the spec at
[1]_, so that work needs to be revived. It can be achieved by setting an
explicit tag in the external_ids column of the BGP-managed resources that
Neutron must not touch. We also need to make sure, at the presentation layer,
that none of the underlying BGP resources are exposed to users through the
API. For example, a router list command must not return the BGP routers.
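For illustration, a BGP-managed router could be tagged like this; the key name
is a placeholder, not a decided convention:

.. code-block:: text

   $ ovn-nbctl set Logical_Router bgp-main-router \
         external_ids:bgp-spec-managed=true
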
Each compute node requires a running FRR instance that monitors the local VRF
and advertises the routes to the BGP peers. It is the installer's responsibility
to configure the FRR instance to use the correct BGP parameters and to connect
to the correct BGP peers.
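As a rough illustration only, the per-node FRR configuration could resemble
the snippet below; the VRF name, ASNs and peer addresses are placeholders
chosen by the installer:

.. code-block:: text

   ! minimal sketch of an frr.conf fragment, all values are placeholders
   router bgp 64999
    neighbor 100.64.0.1 remote-as 64998
    neighbor 100.65.0.1 remote-as 64998
    address-family ipv4 unicast
     import vrf bgp-vrf
    exit-address-family
   !
   router bgp 64999 vrf bgp-vrf
    address-family ipv4 unicast
     redistribute kernel
    exit-address-family
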
As it is easier to understand the topology visually than through description,
the following diagram shows the underlying BGP logical topology in OVN. For
better resolution, it is recommended to open the image in a new tab.
.. figure:: ../../images/ovn-bgp-topology.jpg
:target: ../../_images/ovn-bgp-topology.jpg
OVN BGP Logical Topology (click for full resolution)
BGP distributed logical router
------------------------------
A new router with the OVN BGP capabilities enabled is introduced; it is named
"BGP distributed router" in the diagram above and has the dynamic routing flag
enabled. The router is connected to the provider logical switch with a dummy
connection. This connection does not carry any traffic and serves only to
logically connect the logical switch and the BGP router, so that northd can
create entries in the Advertised_Route table in the Southbound DB for the IPs
that need to be advertised.
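A minimal sketch of how the service plugin could create this router and the
dummy connection with ovn-nbctl follows; the resource names and addresses are
illustrative, and the exact option key that enables dynamic routing is an
assumption about the OVN 25.03 schema:

.. code-block:: text

   # illustrative names; the dynamic-routing option key is assumed
   $ ovn-nbctl lr-add bgp-main-router
   $ ovn-nbctl set Logical_Router bgp-main-router options:dynamic-routing=true
   # dummy connection towards the Neutron provider LS (ls-public)
   $ ovn-nbctl lrp-add bgp-main-router lrp-bgp-main-router-to-ls-public \
         00:de:ad:00:00:01 192.168.111.2/24
   $ ovn-nbctl lsp-add ls-public lsp-ls-public-to-bgp-main-router
   $ ovn-nbctl lsp-set-type lsp-ls-public-to-bgp-main-router router
   $ ovn-nbctl lsp-set-addresses lsp-ls-public-to-bgp-main-router router
   $ ovn-nbctl lsp-set-options lsp-ls-public-to-bgp-main-router \
         router-port=lrp-bgp-main-router-to-ls-public
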
The router also connects to an LS with a localnet port. This LS is connected
to the provider bridge br-bgp, which needs to be configured on every chassis
since the traffic here is distributed and can happen on any node. This bridge
connects to the ls-public LS through the localnet port created by Neutron;
this is the LS that typically connects to the physical network in traditional
deployments. We need the localnet ports to avoid OVN sending the traffic over
the Geneve tunnel to the node hosting the logical router gateway.
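A sketch of the corresponding localnet wiring, with illustrative switch,
physical network and bridge-mapping names:

.. code-block:: text

   # illustrative names only
   $ ovn-nbctl ls-add ls-interconnect
   $ ovn-nbctl lsp-add ls-interconnect ln-interconnect
   $ ovn-nbctl lsp-set-type ln-interconnect localnet
   $ ovn-nbctl lsp-set-addresses ln-interconnect unknown
   $ ovn-nbctl lsp-set-options ln-interconnect network_name=bgpnet
   # on every chassis, map the physical network name to the br-bgp bridge
   # (in practice appended to any existing ovn-bridge-mappings value)
   $ ovs-vsctl set Open_vSwitch . \
         external_ids:ovn-bridge-mappings=bgpnet:br-bgp
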
The BGP distributed router is connected to the per-chassis logical routers
through peered LRPs that are bound to the corresponding chassis. The
per-chassis LRs are described in the next section. Because the BGP router is
distributed, we need to pick the right LRP so that the traffic is not
forwarded to a different chassis. For example, if there is egress traffic
coming from a tenant LSP on chassis A, the BGP distributed router needs to
route the traffic to the LRP on chassis A. For this we use a logical router
policy with an is_chassis_resident() match. An example of such a policy is
shown below:
.. code-block:: text
action : reroute
bfd_sessions : []
chain : []
external_ids : {}
jump_chain : []
match : "inport==\"lrp-bgp-main-router-to-ls-interconnect\" && is_chassis_resident(\"cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0\")"
nexthop : []
nexthops : ["169.254.0.1"]
options : {}
priority : 10
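A policy like the record above could be created with ovn-nbctl, for example;
the router name is inferred from the port names and is illustrative:

.. code-block:: text

   $ ovn-nbctl lr-policy-add bgp-main-router 10 \
         'inport == "lrp-bgp-main-router-to-ls-interconnect" && is_chassis_resident("cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0")' \
         reroute 169.254.0.1
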
The nexthop in this case is the address of the LRP on chassis A and for now
must be an IPv4 address, as OVN currently contains a bug that prevents the use
of IPv6 LLAs as nexthops, reported at [2]_. The policy is applied only on the
chassis named in is_chassis_resident() and hence the traffic always remains
local to that chassis. Because the policy is evaluated at a later stage of the
LR pipeline, we need to create a logical router static route in order to pass
the routing stage first. Hence the BGP distributed logical router needs to
contain two static routes: one to route ingress traffic to the provider
network, and one unused route that serves only to pass the routing stage in
the pipeline until the reroute policy is hit.
The first static route can look like this:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "192.168.111.0/24"
nexthop : "192.168.111.30"
options : {}
output_port : lrp-bgp-main-router-to-ls-interconnect
policy : []
route_table : ""
selection_fields : []
where ip_prefix is the provider network prefix, output_port is the LRP
connecting to the provider LS, and nexthop is the address of the LRP of the
Neutron router port that serves as a gateway.
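Such a route could be created, for instance, with ovn-nbctl (the router name
is illustrative):

.. code-block:: text

   $ ovn-nbctl lr-route-add bgp-main-router 192.168.111.0/24 192.168.111.30 \
         lrp-bgp-main-router-to-ls-interconnect
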
The second static route is unused and can look like this:
.. _fake-static-route:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "0.0.0.0/0"
nexthop : "192.168.111.30"
options : {}
output_port : []
policy : []
route_table : ""
selection_fields : []
The route needs to match all traffic and the nexthop doesn't matter because it
will be determined by the reroute policies based on the chassis locality. The
ingress logical router pipeline with the route implemented looks like this:
.. code-block:: text
... the other routes are here but none matches 0.0.0.0/0 ...
table=15(lr_in_ip_routing ), priority=4 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 192.168.111.30; reg5 = 192.168.111.30; eth.src = 00:de:ad:10:00:00; outport = "lrp-bgp-main-router-to-ls-interconnect"; flags.loopback = 1; reg9[9] = 1; next;)
table=15(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;)
table=16(lr_in_ip_routing_ecmp), priority=150 , match=(reg8[0..15] == 0), action=(next;)
table=16(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;)
table=17(lr_in_policy ), priority=10 , match=(inport=="lrp-bgp-main-router-to-ls-interconnect" && is_chassis_resident("cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0")), action=(reg0 = 169.254.0.1; reg5 = 169.254.0.2; eth.src = 00:de:ad:00:10:00; outport = "lrp-bgp-main-router-to-bgp-router-r0-compute-0"; flags.loopback = 1; reg8[0..15] = 0; reg9[9] = 1; next;)
To reach the stage where the reroute policy is hit, the traffic first has to
pass the lr_in_ip_routing stage, and that stage is implemented with the static
route. The traffic matches the 0.0.0.0/0 prefix in the first rule, and the
outport is then changed by the last rule with its reroute action. If the
static route were not present, the traffic would be dropped by the second rule
with the drop action.
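For completeness, the catch-all route above could be created without an output
port, for example (router name illustrative):

.. code-block:: text

   $ ovn-nbctl lr-route-add bgp-main-router 0.0.0.0/0 192.168.111.30
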
Per-chassis logical routers
---------------------------
A logical router is also created for, and bound to, each chassis. These
routers serve to learn ECMP routes from the BGP peers and to forward traffic
between the provider bridges and the BGP distributed router.
For cases where the compute nodes carry data plane and control plane traffic
over the same spine-and-leaf topology, OpenFlow rules need to be maintained on
the provider bridge that differentiate control plane traffic, which is
forwarded to the host, from dataplane traffic, which needs to go to the OVN
overlay. The following OpenFlow rules could be used to achieve this:
.. _openflow-rules:
.. code-block:: text
priority=10,ip,in_port=eth0,nw_dst=<host IPs> actions=NORMAL
priority=10,ipv6,in_port=eth0,ipv6_dst=<host IPv6s> actions=NORMAL
priority=10,arp actions=NORMAL
priority=10,icmp6,icmp_type=133 actions=NORMAL
priority=10,icmp6,icmp_type=134 actions=NORMAL
priority=10,icmp6,icmp_type=135 actions=NORMAL
priority=10,icmp6,icmp_type=136 actions=NORMAL
priority=10,ipv6,in_port=eth0,ipv6_dst=fe80::/64 actions=NORMAL
priority=8,in_port=eth0 actions=mod_dl_dst:<LRP MAC>,output:<patch_port_to_ovn>
These rules match traffic destined to the host and forward it to the host;
everything else is forwarded to the OVN overlay. The patch_port_to_ovn is a
patch port that ovn-controller creates based on the ovn-bridge-mappings
configuration.
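As a sketch, an installer or the OVN agent extension described later could
install such rules with ovs-ofctl; the bridge name, NIC name, addresses, MAC
and patch port below are placeholders, and the remaining matches follow the
same pattern:

.. code-block:: text

   $ ovs-ofctl add-flow br-phys-0 \
         "priority=10,ip,in_port=eth0,nw_dst=172.16.0.10,actions=NORMAL"
   $ ovs-ofctl add-flow br-phys-0 "priority=10,arp,actions=NORMAL"
   $ ovs-ofctl add-flow br-phys-0 \
         "priority=8,in_port=eth0,actions=mod_dl_dst:00:de:ad:00:10:01,output:patch-provnet-0"
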
The router itself needs to implement routes for traffic coming from the provider
network and for traffic coming from the OVN overlay. For ingress provider
network traffic, the routes can look as follows:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "192.168.111.0/24"
nexthop : "169.254.0.2"
options : {}
output_port : lrp-bgp-router-r0-compute-0-to-bgp-main-router
policy : []
route_table : ""
selection_fields : []
where ip_prefix matches the subnet of the provider network, nexthop is set to
the address of the LRP attached to the BGP distributed router, and output_port
is set to its peer LRP.
The egress traffic from the OVN overlay needs to be routed with ECMP to the
BGP network. This can be achieved with the following static routes, one per
BGP peer:
.. code-block:: text
bfd : []
external_ids : {}
ip_prefix : "0.0.0.0/0"
nexthop : "100.64.0.1"
options : {}
output_port : lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth0
policy : []
route_table : ""
selection_fields : []
bfd : []
external_ids : {}
ip_prefix : "0.0.0.0/0"
nexthop : "100.65.0.1"
options : {}
output_port : lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth1
policy : []
route_table : ""
selection_fields : []
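The service plugin could create these per-peer ECMP routes with, for example
(the router name is inferred from the port names and is illustrative):

.. code-block:: text

   $ ovn-nbctl --ecmp lr-route-add bgp-router-r0-compute-0 0.0.0.0/0 \
         100.64.0.1 lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth0
   $ ovn-nbctl --ecmp lr-route-add bgp-router-r0-compute-0 0.0.0.0/0 \
         100.65.0.1 lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth1
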
Traffic flow
============
This section describes the traffic flow from and to a LSP hosted on a chassis.
An example of the traffic from the external network to a VM with a Floating IP on chassis 1
-------------------------------------------------------------------------------------------
Because of the dummy connection between the ls-public LS and the BGP
distributed router, OVN creates an Advertised_Route entry for the Floating IP.
Since the associated logical port is bound to chassis 1, OVN populates the
local VRF on chassis 1 with the route to the Floating IP, and the local FRR
instance advertises the route to the BGP peers.
The fabric learns the route to the Floating IP from the BGP peers and forwards
the traffic to chassis 1, to either eth0 or eth1, because of the ECMP routes.
The traffic does not match any of the higher priority :ref:`OpenFlow rules
<openflow-rules>` on the provider bridge and matches the last rule. That rule
changes the destination MAC to the LRP MAC address of the per-chassis router
associated with the NIC, and the traffic is forwarded to OVN. The traffic
enters the per-chassis logical router, which has a Logical_Router_Static_Route
configured to forward the traffic to the distributed BGP router. The BGP
distributed router is configured to forward the traffic to the ls-inter-public
switch with a Logical_Router_Static_Route matching the destination IP against
the provider network subnet, and through the br-bgp provider bridge the
traffic gets to the ls-public logical switch. From here the traffic follows
the same path as without BGP and is NAT'ed by the Neutron router.
An example of the traffic from a VM with a Floating IP on chassis 1 to the external network
-------------------------------------------------------------------------------------------
The egress VM traffic is NAT'ed by the Neutron router and forwarded to the
provider network gateway, which is connected to the ls-public LS. Because of
the presence of the localnet ports, the traffic gets through the br-bgp bridge
to the distributed BGP router, where it matches the artificial
:ref:`Logical_Router_Static_Route <fake-static-route>` to pass the
lr_in_ip_routing stage in the pipeline and is then matched by the BGP router
policy based on chassis locality. The reroute action of the policy picks the
right LRP connected to the per-chassis router. There the traffic matches the
per-peer static routes and is forwarded with ECMP to the BGP networks.
Testing
=======
Existing tempest tests should provide good regression testing. We can reuse
the existing topology from the ovn-bgp-agent project that peers with VMs
simulating a BGP router.
Implementation
==============
The implementation is split into two parts. The first part creates the service
plugin that takes care of the BGP topology in OVN including the configuration of
static routes and router policies.
The second part is an OVN agent extension [3]_ that applies the per-chassis
host configuration. The OVN agent itself is orthogonal to the BGP service
plugin and can be replaced with any third-party tool that takes care of the
dynamic node configuration. The OVN agent extension is responsible for
steering traffic to OVN and for the per-chassis host configuration, such as
adding an ovn-bridge-mappings entry per BGP peer and implementing local
OpenFlow rules to differentiate control plane traffic from dataplane traffic.
It also monitors the patch ports on br-bgp and creates a direct connection
between the localnet ports to avoid any FDB learning on the bridge.
An example of such simple OpenFlow rules is shown below:
.. code-block:: text
priority=10,in_port=2, actions=output:3
priority=10,in_port=3, actions=output:2
where 2 is the patch port to the logical switch connected to the BGP distributed
router and 3 is the patch port connected to the Neutron public switch.
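A minimal sketch of how the extension could install these rules, assuming the
OpenFlow port numbers have already been resolved from the patch port names:

.. code-block:: text

   # hypothetical port numbers 2 and 3, as in the example above
   $ ovs-ofctl add-flow br-bgp "priority=10,in_port=2,actions=output:3"
   $ ovs-ofctl add-flow br-bgp "priority=10,in_port=3,actions=output:2"
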
Whether BGP is used with Neutron is determined by enabling the service plugin
and the OVN agent extension.
As written in the first paragraph of this spec, the BGP support in OVN was
introduced in 25.03. Therefore the BGP service plugin requires OVN 25.03 or
later.
Assignee(s)
-----------
* Jakub Libosvar <jlibosva@redhat.com>
Documentation
=============
A deployment guide will be written describing how to enable the service plugin
and what needs to be configured on the nodes, such as steering traffic to OVN
or configuring a BGP speaker that advertises the routes to its peers. An
example FRR configuration will be included so that operators have a reference
for the configuration.
.. [1] https://review.opendev.org/c/openstack/neutron-specs/+/891204
.. [2] https://issues.redhat.com/browse/FDP-1554
.. [3] https://opendev.org/openstack/neutron/src/commit/1e6381cbd25f8ab4fc9a3bcaa1ab7af1d605946e/doc/source/ovn/ovn_agent.rst