L3 router support ECMP

This spec outlines the implementation plan of ECMP in Neutron.

Patch for this spec: https://review.opendev.org/#/c/743661

Related-Bug: #1880532
Change-Id: I67ebf642fbb130a7701792d66629dbab2d76181b

Committed by Rodolfo Alonso
Parent: a7b0484b54
Commit: 9cbcaa13e3

specs/wallaby/l3-router-support-ecmp.rst (new file, 410 lines)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================
L3 router support ECMP
======================

Blueprint:
https://blueprints.launchpad.net/neutron/+spec/support-for-ecmp

Launchpad Bug:
https://bugs.launchpad.net/neutron/+bug/1880532

ECMP (equal-cost multipath) is a routing technique that allows traffic to
reach the same destination over multiple links. Neutron does not need to
calculate the equal-cost paths itself; that work is left to the applications
that use the ECMP API. Neutron only receives the resulting parameters and
configures the routers. Because Linux already provides the "ip route" command
through the iproute2 utility, Neutron can implement ECMP simply by using
pyroute2 to add a route entry to the Neutron router namespace.

This feature is currently designed to support Octavia's multi-active scheme,
which allows a load balancer in Octavia to have multiple amphorae at the same
time. By configuring an ECMP route in the router, multiple amphorae can share
a virtual IP at the same time to serve workloads that require high
concurrency.

.. _P2:

.. note::

   Items marked with [`P2`_] refer to lower priority features
   to be designed / implemented only after initial release.

[`P2`_] Currently the equal-cost route is selected by a simple 5-tuple hash,
which means that if one <nexthop> becomes unreachable and is removed from the
ECMP routes, all connections are redistributed. To avoid this, we intend to
use consistent hashing instead of the original scheme. The consistent hashing
scheme is based on HMARK, which was added in iptables 1.4.15. See the iptables
changelog [1]_.

The rest of this spec describes how to implement ECMP in Neutron.


Problem Description
===================

Octavia has proposed an active-active load balancing design in [2]_.

Topology Description
--------------------

::

                                                        Tenant Backend
                            +----------------+              Network
                            |                |                 +
    Internet+-------------->+   router/gw    +----------------->
                            |                |    ECMP         |
                            +----------------+                 |
                                                               |
    Management                                                 |
     Network                                                   |
        +                                                      |
        |                                                      |         +----------+
        |               +-----------------------+              |         |  Tenant  |
        |          +----+                  +---------+         <---------+Service(1)|
        |          |MGMT| loadbalancer(1)  | VIP|Back|         |         |          |
        <----------+ IP |                  |    | IP +--------->         +----------+
        |          +---------------------------------+         |
        |                                          |           |         +----------+
        |                                          |           |         |  Tenant  |
        |                                          | ICMP      <---------+Service(2)|
        |                                          | DETECT    |         |          |
        |                                          |           |         +----------+
        |                                          |           |
        |               +-----------------------+  v           |         +----------+
        |          +----+                  +---------+         |         |  Tenant  |
        |          |MGMT| loadbalancer(2)  | VIP|Back|         <---------+Service(3)|
        <----------+ IP |                  |    | IP +--------->         |          |
        |          +---------------------------------+         |         +----------+
        |                                          |           |
        |                                          |           |
        |         +-------------+                  |           |            ● ● ●
        |         |Octavia Lbaas|                  |           |
        <---------+ Controller  |    ● ● ●         | ICMP      |
        |         +-------------+                  | DETECT    |         +----------+
        |                                          |           |         |  Tenant  |
        |                                          |           <---------+Service(M)|
        |                                          |           |         |          |
        |               +-----------------------+  v           |         +----------+
        |          +----+                  +---------+         |
        |          |MGMT| loadbalancer(n)  | VIP|Back|         |
        <----------+ IP |                  |    | IP +--------->
        |          +---------------------------------+         |
        +                                                      +

That design proposes the following scheme:

* Multiple load balancing servers in a vip-subnet share one virtual IP
  and one or more back-end pools to respond to clients' requests, and each
  load balancer has its own IP address.

* Clients send requests to the VIP, then the router distributes each
  request to a load balancing server that has the VIP configured on it.

* Finally, the load balancing server distributes the request to a back end.
  The load balancers and tenant service VMs can be in the same subnet or in
  different networks.

In this situation, Octavia needs the router to support ECMP in order to
distribute requests. Octavia sends a request to Neutron to create an ECMP
route, and the Neutron L3 agent then executes a command in the Neutron
router's namespace to create an ECMP entry, using the VIP as the destination
of the route entry and the load balancers' IPs as the nexthop IPs. Requests
destined for the VIP are then distributed across the load balancers.

The whole process implements two levels of load balancing: load balancing
between multiple load balancers, and load balancing between the back-end
real servers.

[`P2`_] Based on current public cloud operator implementations in production
environments, tenants usually only see IPs in the same network. To cover the
case where clients and the VIP share a broadcast domain, the router needs to
enable proxy ARP on the corresponding interface. (Users need to disable the
proxy ARP capability of the VMs used as nexthops themselves.)

User Workflow
-------------

Generally, users can use the ECMP function for their own purposes.
To put an ECMP entry into the router namespace, a user can set several
routes with the same destination by using the command::

    openstack router add route \
        --route destination=20.0.20.0/24,gateway=12.0.0.11 \
        --route destination=20.0.20.0/24,gateway=12.0.0.12 router-ecmp

And withdraw the ECMP entry with::

    openstack router remove route \
        --route destination=20.0.20.0/24,gateway=12.0.0.11 \
        --route destination=20.0.20.0/24,gateway=12.0.0.12 router-ecmp

For more information about router related OSC commands, please read [3]_.

An integrated sequence diagram of the Octavia use case follows:

::

    +------+    +--------+     +-------+   +--------+ +-------+ +------------+
    |client|    |Octavia |     |Neutron|   |LB Node | |qrouter| |service pool|
    +------+    +---+----+     +---+---+   +---+----+ +---+---+ +------+-----+
    |create LB      |              |           |          |            |
    +-------------> | create ecmp  |           |          |            |
    |service        +-------------->           |          |            |
    |               | LB server boot           |          |            |
    |               +--------------+---------->+          |            |
    |               |              |    set ecmp route    |            |
    |               |  ecmp done   +-----------+--------->+            |
    |               +<-------------|           |          |            |
    |               | LB server boot done      |          |            |
    |LB service done+<-------------+-----------+          |            |
    +<--------------+              |           |          |            |
    |               |              |           |          |            |
    |               |              |           |          |            |
    |sending request|              |           |          |            |
    +---------------------------------------------------->|            |
    |               |              |           |  pick a LB node       |
    |               |              |           +<---------|            |
    |               |              |           |  pick a service node  |
    |               |              |           +---------------------->+
    |               |              |           |          |response    |
    |               |              |           +<----------------------+
    |               |   response   |           |          |            |
    +<-----------------------------------------+          |            |
    |               |              |           |          |            |
    |               |              |           |          |            |
    v               v              +           v          v            v


Suppose a user has a set of services that require a multi-active
load-balancing scheme. The user sends a request to Octavia to create a
load balancer, specifying the topology as multi-active. The user also posts
a vip-subnet to Octavia so that it can assign an IP, or directly posts a
virtual port, which is defined by Octavia. The user then needs to submit
parameters such as pool, member and listener, but these are irrelevant to
Neutron; they are described in the Octavia documentation.

While Octavia is creating a load balancer, it also sends an `update_router`
request or an `add_extraroutes` request to Neutron, posting several `routes`
entries with the same `destination` parameter and the load balancers' IPs as
the `nexthop` parameter.

Neutron receives the request from Octavia and determines whether to add an
ECMP route by checking whether there are multiple routes with the same
destination address, making sure that the router distributes packets that
have the VIP as their destination.

These ECMP routes are removed when the user deletes the multi-active
load balancer, and they can be modified when a load balancing node is added
or removed.


Data flow
---------

* [`P2`_] (If on the same network, use proxy ARP.) A client requests the MAC
  address of the VIP and accesses the service based on this MAC address.
  The router responds with the gateway MAC address.

* The client's datagram is transmitted to the router first.

* The router gateway checks its ECMP routing entries, then forwards the
  client's packets to the load balancers.

* The load balancer accepts connections from clients, receives traffic, then
  distributes it to the back-end server pool.

* The reply traffic from the back-end server pool goes through the load
  balancers and then reaches the router (or goes directly back to intranet
  clients if they are on the same network). The router eventually forwards
  these packets back to the client.

Proposed Change
===============

Overview
--------

In Server Side
~~~~~~~~~~~~~~

* No changes are needed on the server side.

In Agent Side
~~~~~~~~~~~~~

Modify the logic that processes the router_update event in the L3 agent to
support adding ECMP routes to routers.
The `routes_updated` function in RouterInfo will behave as follows:

* When more than one route is found to have the same destination, the L3
  agent executes pyroute2 code that looks like

  ::

    ip.route('replace', dst='<destination_ip>',
             multipath=[{"gateway": "<nexthop1>"},
                        {"gateway": "<nexthop2>"}])

* The namespace then contains an ip route entry that looks like

  ::

    <vip> proto static
        nexthop via <nexthop_ip1> dev qr-xxxxxxxx-nn weight 1
        nexthop via <nexthop_ip2> dev qr-xxxxxxxx-nn weight 1

The router then picks one <nexthop_ip> and fills its MAC address into the
packet's destination MAC address when it forwards traffic to the
<destination_ip>.
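
The grouping step described above can be sketched in plain Python. This is
only an illustration, not the L3 agent's actual code: the helper name
`build_route_requests` and the shape of its return value are assumptions made
for this example. It groups the router's `routes` entries by destination and
produces keyword arguments in the style of the pyroute2 call shown above.

.. code-block:: python

    from collections import OrderedDict


    def build_route_requests(routes):
        """Group extra routes by destination (hypothetical helper).

        ``routes`` is a list of {"destination": ..., "nexthop": ...} dicts.
        Returns one kwargs dict per destination.
        """
        grouped = OrderedDict()
        for route in routes:
            nexthops = grouped.setdefault(route["destination"], [])
            if route["nexthop"] not in nexthops:  # ignore exact duplicates
                nexthops.append(route["nexthop"])

        route_requests = []
        for destination, nexthops in grouped.items():
            if len(nexthops) > 1:
                # Several nexthops for one destination: an ECMP route.
                route_requests.append({
                    "dst": destination,
                    "multipath": [{"gateway": hop} for hop in nexthops],
                })
            else:
                # A single nexthop stays an ordinary static route.
                route_requests.append(
                    {"dst": destination, "gateway": nexthops[0]})
        return route_requests


    routes = [
        {"destination": "192.168.1.6/32", "nexthop": "192.168.1.88"},
        {"destination": "192.168.1.6/32", "nexthop": "192.168.1.99"},
        {"destination": "10.0.0.0/24", "nexthop": "192.168.1.1"},
    ]
    route_requests = build_route_requests(routes)

Each resulting dict could then be passed to a call such as
`ip.route('replace', **kwargs)` inside the router namespace; the example data
reuses the addresses from the REST API section of this spec plus one made-up
single-nexthop route.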

[`P2`_] To keep connections alive when a load balancing node is removed, use
iptables rather than a plain ip route entry.

- Use `HMARK` to mark flows in the mangle table; the `fwmark` values are
  determined by the source address.
- Distribute flows to different routing tables according to their `fwmark`
  values.
- Maintain a mapping between the `fwmark` values and the tables.
- Give each table a default nexthop IP.
- Modify the mapping between `fwmark` values and tables when a `nexthop`
  becomes unreachable.
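
The effect of the mapping can be modelled with a small Python sketch. It does
not touch iptables or routing tables; the table IDs, the number of `fwmark`
buckets and all function names are assumptions chosen for illustration. The
point it shows is that when a nexthop fails, only the `fwmark` values that
pointed at its table are remapped, so flows on healthy nexthops keep their
path.

.. code-block:: python

    def build_tables(nexthops, first_table=100):
        """One routing table per nexthop; table IDs are placeholders."""
        return {first_table + i: hop for i, hop in enumerate(nexthops)}


    def build_mark_map(tables, buckets):
        """Spread fwmark values 0..buckets-1 over the tables round robin."""
        table_ids = sorted(tables)
        return {mark: table_ids[mark % len(table_ids)]
                for mark in range(buckets)}


    def remap_failed(mark_map, tables, failed_hop):
        """Move only the marks whose table routes via the failed nexthop."""
        healthy = sorted(t for t, hop in tables.items() if hop != failed_hop)
        if not healthy:
            raise ValueError("no healthy nexthop left")
        new_map = dict(mark_map)
        orphaned = [mark for mark in sorted(mark_map)
                    if tables[mark_map[mark]] == failed_hop]
        for index, mark in enumerate(orphaned):
            new_map[mark] = healthy[index % len(healthy)]
        return new_map


    tables = build_tables(["12.0.0.11", "12.0.0.12", "12.0.0.13"])
    mark_map = build_mark_map(tables, buckets=6)
    updated = remap_failed(mark_map, tables, "12.0.0.12")
    moved = [mark for mark in sorted(mark_map)
             if mark_map[mark] != updated[mark]]

With three nexthops and six buckets, only the two marks that used the failed
nexthop's table change; the other four keep their original table.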

[`P2`_] To let traffic from the same network pass through the router, the
L3 agent will also make the router use proxy ARP by running the command::

    sysctl -w net.ipv4.conf.<NIC_1>.proxy_arp_pvlan=1

* <NIC_1> is the name of the router interface to which the destination
  subnet is connected. For example, router `R1` is connected to a
  subnet `sub-1` whose CIDR is `10.10.10.0/24`, so the router's namespace
  contains a virtual network interface `qr-abcdefgh` that acts as the
  gateway for `sub-1`. When an ECMP route is added with a destination such
  as `10.10.10.5/32`, which is within the range of `sub-1`, the above
  command is executed with <NIC_1> set to `qr-abcdefgh`.

* To make the ARP proxy optional, add a config option to l3_agent.ini::

    [ECMP]

    router_interface_arp_proxy = True
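
The choice of <NIC_1> described above can be sketched with the standard
`ipaddress` module. The function names and the `interfaces` mapping (device
name to subnet CIDR) are hypothetical and exist only for this example; the
second interface is made up to show that non-matching subnets are skipped.

.. code-block:: python

    import ipaddress


    def find_proxy_arp_interface(destination, interfaces):
        """Return the device whose subnet contains the route destination."""
        dest_net = ipaddress.ip_network(destination)
        for device, cidr in interfaces.items():
            subnet = ipaddress.ip_network(cidr)
            # subnet_of() requires both networks to be the same IP version.
            if dest_net.version == subnet.version and \
                    dest_net.subnet_of(subnet):
                return device
        return None


    def proxy_arp_sysctl(device):
        """Build the sysctl setting used to enable proxy ARP on a device."""
        return "net.ipv4.conf.%s.proxy_arp_pvlan=1" % device


    interfaces = {
        "qr-abcdefgh": "10.10.10.0/24",   # sub-1 from the example above
        "qr-ijklmnop": "10.20.0.0/24",    # made-up second subnet
    }
    device = find_proxy_arp_interface("10.10.10.5/32", interfaces)
    setting = proxy_arp_sysctl(device)

For the `sub-1` example, `device` is `qr-abcdefgh`, and a destination outside
every connected subnet yields `None`, in which case no proxy ARP setting
would be applied.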


Data Model Impact
-----------------

None

REST API Impact
---------------


The following REST APIs will be affected::

    PUT /v2.0/routers/<router_id>/add_extraroutes

    PUT /v2.0/routers/<router_id>/remove_extraroutes

    PUT /v2.0/routers/<router_id>

The three APIs above are the current methods for adding and removing custom
routes. See the usage of `extraroutes` in [4]_. (The third API,
`PUT /v2.0/routers/<router_id>`, is not recommended for adding routes.)

Before the ECMP routing implementation, when the L3 agent received several
route entries with the same destination and different nexthops, it kept only
one of them, or replaced the existing route with a new one. After these
changes, the router will have an ECMP route instead. So you can add an
ECMP route entry like this:

::

    PUT /v2.0/routers/{router_id}/add_extraroutes

    { "router":
      { "routes":
        [ { "destination": "192.168.1.6/32",
            "nexthop": "192.168.1.88" },
          { "destination": "192.168.1.6/32",
            "nexthop": "192.168.1.99" }
          ...
        ]
      }
    }

Then you can find the ECMP route in the router's namespace:

::

    # ip route

    192.168.1.6/32 proto static
        nexthop via 192.168.1.88 dev qr-9adb238b-c2 weight 1
        nexthop via 192.168.1.99 dev qr-9adb238b-c2 weight 1
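
A client such as Octavia could assemble the `add_extraroutes` request from a
VIP and a list of load balancer IPs. The sketch below only builds the URL
path and JSON body in the shape shown above; the helper name and the
`ROUTER_ID` placeholder are assumptions, and sending the request
(authentication, endpoint discovery) is deliberately left out.

.. code-block:: python

    import json


    def build_add_extraroutes_request(router_id, destination, nexthops):
        """Return (path, body) for an ECMP add_extraroutes call."""
        path = "/v2.0/routers/%s/add_extraroutes" % router_id
        body = {
            "router": {
                "routes": [
                    {"destination": destination, "nexthop": hop}
                    for hop in nexthops
                ]
            }
        }
        return path, body


    path, body = build_add_extraroutes_request(
        "ROUTER_ID",  # placeholder, not a real router UUID
        "192.168.1.6/32",
        ["192.168.1.88", "192.168.1.99"],
    )
    payload = json.dumps(body, indent=2)

The same body, sent to the `remove_extraroutes` path instead, would describe
the routes to withdraw.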

To make this behavior change discoverable, a shim extension called
'ecmp_routes' will be added.

[`P2`_] To make the ARP proxy behavior discoverable, a shim extension called
'ecmp_arp' will be added; it will be removed dynamically when the related
option `router_interface_arp_proxy` in the config file is `False`.


Implementation
==============

Assignee(s)
-----------

* XiaoYu Zhu

Work Items
----------

* L3 Agent Update
* Tests
* Documentation


Testing
=======

Tempest Tests
-------------
* Tempest tests

Functional Tests
----------------
* New tests need to be written


Documentation Impact
====================

User Documentation
------------------
* User documentation
* API reference

Developer Documentation
-----------------------
* Needs devref documentation


References
==========

.. [1] http://netfilter.org/projects/iptables/files/changes-iptables-1.4.15.txt

.. [2] https://review.opendev.org/723864

.. [3] https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/router.html

.. [4] https://specs.openstack.org/openstack/neutron-specs/specs/train/improve-extraroute-api.html