Propose spec for OVN BGP integration

The spec proposes changes in Neutron to leverage OVN BGP capabilities.

Related-Bug: #2111276
Change-Id: I8fef9b7e444b84448105bc60fc5551b0650aa214
Signed-off-by: Jakub Libosvar <jlibosva@redhat.com>

====================================
Core OVN BGP integration
====================================

https://bugs.launchpad.net/neutron/+bug/2111276

OVN 25.03 introduces BGP-related capabilities that provide parity with the
current ovn-bgp-agent underlay exposing method. This spec introduces a design
that uses these OVN capabilities, integrated with Neutron, to replace the
ovn-bgp-agent.


Problem Description
===================

The ovn-bgp-agent is another process running on the compute and network nodes.
A lot of processing happens when new workloads are created or moved around,
because the agent needs to watch for these changes and rewire the
configuration on the node as needed. As OVN is well aware of the locality of
its resources, we can leave all of that processing to OVN and manage only the
underlying BGP OVN topology in Neutron, while still using a pure L3
spine-and-leaf topology for the dataplane traffic.


Acronyms used in this spec
==========================

- BGP: Border Gateway Protocol
- ECMP: Equal-Cost Multi-Path
- LRP: Logical Router Port
- LSP: Logical Switch Port
- LS: Logical Switch
- LR: Logical Router
- VRF: Virtual Routing and Forwarding
- FRR: Free Range Routing (https://github.com/FRRouting/frr)


Proposed Change
===============

This spec proposes to introduce a new Neutron service plugin that manages the
underlying BGP topology in OVN. Its main purpose is to make sure the OVN
resources related to BGP are correctly configured at all times by acting as a
reconciler over those resources. Additionally, it takes care of scaling the
compute nodes in and out, because every compute node needs its own bound
resources, such as a router and a logical switch with a localnet port.

There is no need to make any changes to the API or database models. However,
the Neutron OVN DB sync scripts need to be modified to not monitor the
underlying BGP resources. This was possibly already planned for Neutron with
the spec at [1]_, so we need to revive that work. It can be achieved by
setting an explicit tag in the external_ids column of the BGP managed
resources that Neutron must not touch. Also, we need to make sure on the
presentation layer that none of the underlying BGP resources are exposed to
users through the API. For example, a router list command must not return the
BGP routers.
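
A minimal sketch of how such a tag could look is shown below; the
external_ids key name is illustrative only and will be defined during the
implementation:

.. code-block:: text

    # Tag a BGP managed router so the Neutron OVN DB sync and the API
    # layer skip it (the key name is a placeholder):
    ovn-nbctl set Logical_Router bgp-main-router \
        external_ids:ovn-bgp-managed=true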

Each compute node requires a running FRR instance that monitors the local VRF
and advertises the routes to the BGP peers. It is the installer's
responsibility to configure the FRR instance to use the correct BGP
parameters and to connect to the correct BGP peers.
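
A minimal FRR configuration sketch; the VRF name, AS numbers and peer
addresses are illustrative, the real values are deployment specific:

.. code-block:: text

    ! frr.conf fragment on a compute node (illustrative values)
    router bgp 64999 vrf bgp-vrf
     neighbor 100.64.0.1 remote-as 65000
     neighbor 100.65.0.1 remote-as 65000
     address-family ipv4 unicast
      ! advertise the routes OVN installs into the local VRF
      redistribute kernel
     exit-address-family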

As it is easier to understand the topology visually than through a
description, the following diagram shows the underlying BGP logical topology
in OVN. For better resolution, it is recommended to open the image in a new
tab.

.. figure:: ../../images/ovn-bgp-topology.jpg
   :target: ../../_images/ovn-bgp-topology.jpg

   OVN BGP Logical Topology (click for full resolution)


BGP distributed logical router
------------------------------

A new router with the OVN BGP capabilities enabled is introduced; it is named
"BGP distributed router" in the diagram above and has the dynamic routing
flag enabled. The router is connected to the provider logical switch with a
dummy connection. This connection is not used for any traffic and serves only
to logically connect the logical switch and the BGP router so that northd can
create entries in the Advertised_Route table in the Southbound DB for the IPs
that need to be advertised.
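
As a sketch, creating the router and enabling the flag could look as follows;
the exact option name comes from the OVN 25.03 dynamic routing feature and
should be verified against the ovn-nb(5) man page of the deployed version:

.. code-block:: text

    # Create the BGP distributed router and enable dynamic routing on it
    # (option name to be verified against the deployed OVN version):
    ovn-nbctl lr-add bgp-main-router
    ovn-nbctl set Logical_Router bgp-main-router \
        options:dynamic-routing=true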

The router also connects to a LS with a localnet port. This LS is connected
to the provider bridge br-bgp that needs to be configured on every chassis,
since the traffic here is distributed and can happen on any node. This bridge
connects to the ls-public LS through the localnet port created by Neutron.
This LS is what typically connects to the physical network in traditional
deployments. We need the localnet ports to avoid OVN sending traffic over the
Geneve tunnel to the node hosting the logical router gateway.

The BGP distributed router is connected to the per-chassis logical routers
through peered LRPs bound to the corresponding chassis. The per-chassis LRs
are described in the next section. Because the BGP router is distributed, we
need to pick the right LRP so that the traffic is not forwarded to a
different chassis. For example, if there is egress traffic coming from a
tenant LSP on chassis A, the BGP distributed router needs to route the
traffic to the LRP on chassis A. For this we will use a logical routing
policy with the is_chassis_resident match. An example of the logical routing
policy is shown below:

.. code-block:: text

    action              : reroute
    bfd_sessions        : []
    chain               : []
    external_ids        : {}
    jump_chain          : []
    match               : "inport==\"lrp-bgp-main-router-to-ls-interconnect\" && is_chassis_resident(\"cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0\")"
    nexthop             : []
    nexthops            : ["169.254.0.1"]
    options             : {}
    priority            : 10
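
Such a policy could be created with ovn-nbctl roughly as follows; the router
name is assumed to be bgp-main-router, matching the LRP names above:

.. code-block:: text

    # Reroute traffic entering from the interconnect LRP on compute-0
    # to the per-chassis LRP (nexthop 169.254.0.1):
    ovn-nbctl lr-policy-add bgp-main-router 10 \
        'inport == "lrp-bgp-main-router-to-ls-interconnect" && is_chassis_resident("cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0")' \
        reroute 169.254.0.1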

The nexthop in this case is the LRP on chassis A and for now must be an IPv4
address, as OVN currently contains a bug that prevents the use of IPv6 LLAs
as nexthops, reported at [2]_. The policy is applied only on the chassis
given in is_chassis_resident and hence the traffic will always remain local
to the chassis. Because the policy is at a later stage in the LR pipeline, we
need to create a logical router static route in order to pass the routing
stage. Hence the BGP distributed logical router needs to contain two static
routes: one to route ingress traffic to the provider network, and one unused
route that serves only to pass the routing stage in the pipeline until the
reroute policy is hit.

The first static route can look like this:

.. code-block:: text

    bfd                 : []
    external_ids        : {}
    ip_prefix           : "192.168.111.0/24"
    nexthop             : "192.168.111.30"
    options             : {}
    output_port         : lrp-bgp-main-router-to-ls-interconnect
    policy              : []
    route_table         : ""
    selection_fields    : []

where the ip_prefix is the provider network prefix and the output_port is the
LRP connecting to the provider LS. The nexthop is the LRP of the Neutron
router port that serves as a gateway.
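
A sketch of creating this route with ovn-nbctl, using the values from the
record above and assuming the router is named bgp-main-router as in the
previous examples:

.. code-block:: text

    # Route the provider prefix towards the Neutron gateway LRP via the
    # interconnect port:
    ovn-nbctl lr-route-add bgp-main-router 192.168.111.0/24 \
        192.168.111.30 lrp-bgp-main-router-to-ls-interconnect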

The second static route is unused and can look like this:

.. _fake-static-route:

.. code-block:: text

    bfd                 : []
    external_ids        : {}
    ip_prefix           : "0.0.0.0/0"
    nexthop             : "192.168.111.30"
    options             : {}
    output_port         : []
    policy              : []
    route_table         : ""
    selection_fields    : []

The route needs to match all traffic and the nexthop doesn't matter because
it will be determined by the reroute policies based on the chassis locality.
The ingress logical router pipeline with the route implemented looks like
this:

.. code-block:: text

    ... the other routes are here but none matches 0.0.0.0/0 ...
    table=15(lr_in_ip_routing ), priority=4 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 192.168.111.30; reg5 = 192.168.111.30; eth.src = 00:de:ad:10:00:00; outport = "lrp-bgp-main-router-to-ls-interconnect"; flags.loopback = 1; reg9[9] = 1; next;)
    table=15(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;)
    table=16(lr_in_ip_routing_ecmp), priority=150 , match=(reg8[0..15] == 0), action=(next;)
    table=16(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;)
    table=17(lr_in_policy ), priority=10 , match=(inport=="lrp-bgp-main-router-to-ls-interconnect" && is_chassis_resident("cr-lrp-bgp-main-router-to-bgp-router-r0-compute-0")), action=(reg0 = 169.254.0.1; reg5 = 169.254.0.2; eth.src = 00:de:ad:00:10:00; outport = "lrp-bgp-main-router-to-bgp-router-r0-compute-0"; flags.loopback = 1; reg8[0..15] = 0; reg9[9] = 1; next;)

As we need to get to the stage where the reroute policy is hit, we first need
to pass the lr_in_ip_routing stage, and this stage is implemented with a
static route. That means we match the 0.0.0.0/0 prefix using the first rule
and later change the output_port with the last rule through its reroute
action. If the static route were not present, the traffic would be dropped by
the second rule containing the drop action.


Per-chassis logical routers
---------------------------

There is also a logical router created and bound to each chassis. These
routers serve to learn ECMP routes from the BGP peers and to forward traffic
between the provider bridges and the BGP distributed router.
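
As a sketch, the chassis binding could be expressed with a gateway chassis
entry on the peered LRP, which is consistent with the cr-lrp port names in
the examples above; the exact wiring is an implementation detail:

.. code-block:: text

    # Create the per-chassis router and bind the BGP distributed
    # router's LRP facing compute-0 to that chassis:
    ovn-nbctl lr-add bgp-router-r0-compute-0
    ovn-nbctl lrp-set-gateway-chassis \
        lrp-bgp-main-router-to-bgp-router-r0-compute-0 compute-0 1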

For cases where the compute nodes share data plane and control plane traffic
over the same spine-and-leaf topology, there is a need to maintain openflow
rules on the provider bridge that differentiate the control plane traffic,
which is forwarded to the host, from the dataplane traffic, which needs to go
to the OVN overlay. The following openflow rules could be used to achieve
this:

.. _openflow-rules:

.. code-block:: text

    priority=10,ip,in_port=eth0,nw_dst=<host IPs> actions=NORMAL
    priority=10,ipv6,in_port=eth0,ipv6_dst=<host IPv6s> actions=NORMAL
    priority=10,arp actions=NORMAL
    priority=10,icmp6,icmp_type=133 actions=NORMAL
    priority=10,icmp6,icmp_type=134 actions=NORMAL
    priority=10,icmp6,icmp_type=135 actions=NORMAL
    priority=10,icmp6,icmp_type=136 actions=NORMAL
    priority=10,ipv6,in_port=eth0,ipv6_dst=fe80::/64 actions=NORMAL
    priority=8,in_port=eth0 actions=mod_dl_dst:<LRP MAC>,output:<patch_port_to_ovn>

Those rules match traffic that is destined to the host and forward it to the
host. Everything else is forwarded to the OVN overlay. The patch_port_to_ovn
is a patch port that ovn-controller creates based on the ovn-bridge-mappings
configuration.
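
Such rules could be installed with ovs-ofctl, for example; the bridge name,
addresses and port names are illustrative:

.. code-block:: text

    # Deliver host-bound traffic to the host:
    ovs-ofctl add-flow br-eth0 \
        "priority=10,ip,in_port=eth0,nw_dst=172.16.0.10 actions=NORMAL"
    # Send everything else to OVN, rewriting the destination MAC to the
    # per-chassis router's LRP MAC:
    ovs-ofctl add-flow br-eth0 \
        "priority=8,in_port=eth0 actions=mod_dl_dst:00:de:ad:00:10:01,output:patch-to-ovn"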

The router itself needs to implement routes for traffic coming from the
provider network and for traffic coming from the OVN overlay. For ingress
provider network traffic, the routes can look as follows:

.. code-block:: text

    bfd                 : []
    external_ids        : {}
    ip_prefix           : "192.168.111.0/24"
    nexthop             : "169.254.0.2"
    options             : {}
    output_port         : lrp-bgp-router-r0-compute-0-to-bgp-main-router
    policy              : []
    route_table         : ""
    selection_fields    : []

where ip_prefix matches the subnet of the provider network, the nexthop is
set to the address of the LRP attached to the BGP distributed router, and the
output_port is set to its peer LRP.

The egress traffic from the OVN overlay needs to be routed with ECMP to the
BGP network. This can be achieved with the following static routes for each
BGP peer:

.. code-block:: text

    bfd                 : []
    external_ids        : {}
    ip_prefix           : "0.0.0.0/0"
    nexthop             : "100.64.0.1"
    options             : {}
    output_port         : lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth0
    policy              : []
    route_table         : ""
    selection_fields    : []

    bfd                 : []
    external_ids        : {}
    ip_prefix           : "0.0.0.0/0"
    nexthop             : "100.65.0.1"
    options             : {}
    output_port         : lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth1
    policy              : []
    route_table         : ""
    selection_fields    : []
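
A sketch of creating these ECMP routes with ovn-nbctl, using the values from
the records above; the router name is inferred from the LRP names:

.. code-block:: text

    ovn-nbctl --ecmp lr-route-add bgp-router-r0-compute-0 0.0.0.0/0 \
        100.64.0.1 lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth0
    ovn-nbctl --ecmp lr-route-add bgp-router-r0-compute-0 0.0.0.0/0 \
        100.65.0.1 lrp-bgp-router-r0-compute-0-to-ls-r0-compute-0-eth1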


Traffic flow
============

This section describes the traffic flow from and to a LSP hosted on a
chassis.

An example of the traffic from the external network to a VM with a Floating IP on chassis 1
-------------------------------------------------------------------------------------------

Because of the dummy connection between the ls-public LS and the BGP
distributed router, OVN creates an Advertised_Route entry for the Floating
IP. Because the associated logical port is bound to chassis 1, OVN populates
the local VRF on chassis 1 with the route to the Floating IP and the local
FRR instance advertises the route to the BGP peers.

The fabric learns the route to the Floating IP from the BGP peers and
forwards the traffic to chassis 1 via either eth0 or eth1, because of the
ECMP routes.

The traffic does not match any of the higher priority :ref:`openflow rules
<openflow-rules>` on the provider bridge and matches the last rule. The rule
changes the destination MAC to the LRP MAC address of the per-chassis router
associated with the NIC, and the traffic is forwarded to OVN. The traffic
enters the per-chassis logical router, which has a
Logical_Router_Static_Route configured to forward the traffic to the
distributed BGP router. The BGP distributed router is configured to forward
the traffic to the ls-inter-public switch with a Logical_Router_Static_Route
matching the destination IP against the provider network subnet, and through
the br-bgp provider bridge the traffic gets to the ls-public logical switch.
From here the traffic follows the same path as without BGP and is NAT'ed by
the Neutron router.


An example of the traffic from a VM with a Floating IP on chassis 1 to the external network
-------------------------------------------------------------------------------------------

The egress VM traffic is NAT'ed by the Neutron router and forwarded to the
provider network gateway, which is connected to the ls-public LS. Because of
the presence of the localnet ports, the traffic gets through the br-bgp
bridge to the distributed BGP router, where it matches the artificial
:ref:`Logical_Router_Static_Route <fake-static-route>` to pass the
lr_in_ip_routing stage in the pipeline and is then matched by the BGP router
policy based on the chassis locality. The reroute action of the policy picks
the right LRP that is connected to the per-chassis router. There the traffic
matches the per-peer static routes and is forwarded with ECMP to the BGP
networks.


Testing
=======

Existing tempest tests should provide good regression testing. We can reuse
the existing topology from the ovn-bgp-agent project that peers with VMs
simulating a BGP router.


Implementation
==============

The implementation is split into two parts. The first part creates the
service plugin that takes care of the BGP topology in OVN, including the
configuration of static routes and router policies.

The second part is an OVN agent extension [3]_ that applies the per-chassis
host configuration. The OVN agent itself is orthogonal to the BGP service
plugin and can be replaced with any third-party tool that takes care of the
dynamic node configuration. The OVN agent extension is responsible for
steering the traffic to OVN and for the per-chassis host configuration, such
as adding ovn-bridge-mappings per BGP peer, and for implementing local
openflow rules to differentiate traffic between the control plane and the
dataplane. It also monitors patch ports on the br-bgp and creates a direct
connection between the localnet ports to avoid any FDB learning on the
bridge. An example of the simple openflow rules is shown below:

.. code-block:: text

    priority=10,in_port=2, actions=output:3
    priority=10,in_port=3, actions=output:2

where 2 is the patch port to the logical switch connected to the BGP
distributed router and 3 is the patch port connected to the Neutron public
switch.
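
As a sketch, the extension could extend ovn-bridge-mappings roughly like
this; the physnet and bridge names are illustrative:

.. code-block:: text

    # Map the BGP physnet to br-bgp in addition to the existing
    # provider mapping:
    ovs-vsctl set Open_vSwitch . \
        external_ids:ovn-bridge-mappings="physnet1:br-ex,bgp:br-bgp"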

Whether BGP is used with Neutron is determined by enabling the service plugin
and the OVN agent extension.
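
A sketch of the expected configuration; both the service plugin alias and the
extension name are placeholders to be defined during the implementation:

.. code-block:: text

    # neutron.conf (the alias is a placeholder)
    [DEFAULT]
    service_plugins = router,ovn_bgp

    # OVN agent configuration (the extension name is a placeholder)
    [agent]
    extensions = bgp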

As written in the first paragraph of this spec, the BGP support in OVN was
introduced in 25.03. Therefore the BGP service plugin requires OVN 25.03 or
later.


Assignee(s)
-----------

* Jakub Libosvar <jlibosva@redhat.com>


Documentation
=============

A deployment guide will be written describing how to enable the service
plugin and what needs to be configured on the nodes, such as steering traffic
to OVN or configuring a BGP speaker that advertises the routes to its peers.
An example FRR configuration will be included so that operators have a
reference for the configuration.

.. [1] https://review.opendev.org/c/openstack/neutron-specs/+/891204
.. [2] https://issues.redhat.com/browse/FDP-1554
.. [3] https://opendev.org/openstack/neutron/src/commit/1e6381cbd25f8ab4fc9a3bcaa1ab7af1d605946e/doc/source/ovn/ovn_agent.rst