Files
kernel/kernel-modules/intel-iavf/debian/deb_folder/patches/0003-Fix-the-invalid-check-in-iavf_remove.patch
Jiping Ma ba86fee885 kernel-modules: re-enable the OOT drivers
This commit re-enables ice, i40e, iavf, igb-uio, opae-fpga-driver and
iqvlinux OOT drivers.

In order to deal with a firmware incompatibility or if a problem is
found with the in-tree driver, we will continue to support one
version of each out-of-tree driver. The out-of-tree driver versions
that we plan to keep in StarlingX have been validated in the past.
We will be preserving i40e-2.20.12, iavf-4.5.3.2, and ice-1.9.11
as out-of-tree drivers. For the X700 series of NICs, i40e-2.20.12
is compatible with NVM (non-volatile memory, or firmware) version
9.20, and for the E810 series NICs, ice-1.9.11 is compatible with
NVM version 4.22.

The drivers igb-uio, opae-fpga-driver and iqvlinux are known to only
exist as OOT.

We encountered a few the version compatibility issues after upgrading
the kernel to 6.6.7 version. We adapted drivers' code to the kernel
6.6.7 by referring to the upstream commits.

We also change the driver location folder to "weak-updates" from
"updates" for ice, iavf and i40e so that the default drivers will be
the in-tree drivers.
    ice: 1.9.11
    i40e: 2.20.12
    iavf: 4.5.3.2

ice, i40e, iavf:
* commit b48b89f9c ("net: drop the weight argument from netif_napi_add")
  https://git.yoctoproject.org/linux-yocto/commit/?h=b48b89f9c

* commit 068c38ad ("net: Remove the obsolte u64_stats_fetch_*_irq()
  users (drivers).")
  https://git.yoctoproject.org/linux-yocto/commit/?h=068c38ad

* commit ba153552c ("ice: Remove redundant
  pci_enable_pcie_error_reporting()")
  https://git.yoctoproject.org/linux-yocto/commit/?h=ba153552c

* commit 680ee0456a ("net: invert the netdevice.h vs xdp.h dependency")
  https://git.yoctoproject.org/linux-yocto/commit/?h=680ee0456a

ice:
* commit ac73d4bf2 ("net: make drivers to use SET_NETDEV_DEVLINK_PORT
  to set devlink_port")
  https://git.yoctoproject.org/linux-yocto/commit/?h=ac73d4bf2

* commit 226bf98055 ("net: devlink: let the core report the driver name
  instead of the drivers")
  https://git.yoctoproject.org/linux-yocto/commit/?h=226bf98055

* commit fb8421a9 ("devlink: remove devlink features")
  https://git.yoctoproject.org/linux-yocto/commit/?h=fb8421a9

i40e:
* commit 3626a690b ("i40e: use mul_u64_u64_div_u64 for PTP frequency
  calculation")
  https://git.yoctoproject.org/linux-yocto/commit/?h=3626a690b

* commit ccd3bf985 ("i40e: convert .adjfreq to .adjfine")
  https://git.yoctoproject.org/linux-yocto/commit/?h=ccd3bf985

igb_uio:
Cherry pick upstream commit to fix the build error.
* commit 29b1c1e4 ("linux/igb_uio: fix build with kernel 5.18+")
  http://git.dpdk.org/dpdk-kmods/commit/?id=29b1c1e4

intel-opae-fpga:
* commit 1aaba11da9 ("driver core: class: remove module * from
  class_create()")
  https://git.yoctoproject.org/linux-yocto/commit/?h=1aaba11da9

iqvlinux:
* commit 79687789 ("PCI: Remove the deprecated "pci-dma-compat.h" API")
  https://git.yoctoproject.org/linux-yocto/commit/?h=79687789

Verification:
* ice, i40e, iavf:
  - installs from iso succeed on servers with ice(ntel E810-2C-QDA2
    Chapman beach) and i40e hw(Intel Ethernet Controller X710) for
    rt and std.
  - interfaces are up and pass packets for rt and std.
  - create vfs, ensure that they are picked up by the new iavf
    driver and that the interface can come up and pass packets
    on rt and std system.
  - Check dmesg to see DDP package is loaded successfully and
    the version is 1.3.30.0 for rt and std.
* Switch drivers between the OOT and in-tree drivers.
  - switch to the OOT drivers
    1. Add cmdline parameter multi-drivers-switch=cvl-4.0.1
    2. reboot
    can switch to the OOT drivers.
  - switch to the in-tree drivers
    1. Remove cmdline parameter multi-drivers-switch=cvl-4.0.1
    2. reboot
    can switch to the in-tree drivers.
* igb_uio, intel-opae-fpga, iqvlinux
    Did not do the tests, it looks need docker image to do the test.
    So need the test team help to do the full tests.

Story: 2011056
Task: 49672

Change-Id: I2c05dee6f35ed431e8a53d2680a3c7558f08abef
Signed-off-by: Jiping Ma <jiping.ma2@windirver.com>
(cherry picked from commit d4f8973274)
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
2024-07-10 23:17:25 +00:00

105 lines
4.4 KiB
Diff

From cb27bf3ae02dde94988f643e71f59d94943c8798 Mon Sep 17 00:00:00 2001
From: Jiping Ma <jiping.ma2@windriver.com>
Date: Tue, 4 Apr 2023 23:40:48 -0700
Subject: [PATCH 3/8] Fix the invalid check in iavf_remove()
If the netdev pointer is NULL, then iavf_remove() returns early to
ensure that it does not proceed with an already-freed netdev instance.
However, drvdata field of the iavf driver's pci_dev structure continues
to keep the former value of the netdev pointer, and this value can be
acquired from the pci_dev structure via pci_get_drvdata(). This causes a
kernel panic when a forced reboot/shutdown is in progress due to the
following sequence of events:
- The iavf_shutdown() callback is called by the kernel. This function
detaches the device, brings it down if it was running and frees
resources.
- Later, the associated PF driver's shutdown callback is called:
ice_shutdown(). That callback calls, among others, sriov_disable(),
which then indirectly calls iavf_remove() again.
- Kernel WARNING is reported because the work adminq_task->func is NULL
in cancel_work_sync(&adapter->adminq_task) during iavf_remove(), that
reason is the resource already had been freed in the first iavf_remove()
running stage.
"WARNING: CPU: 63 PID: 93678 at kernel/workqueue.c:3047
__flush_work.isra.0+0x6b/0x80"
The patch for iavf resolves this issue by checking the pci_dev
structure's is_busmaster field at the beginning of iavf_remove(). If the
PCI device had already been disabled by an earlier call to
iavf_shutdown() or iavf_remove(), via a call to pci_disable_device(),
then the is_busmaster field would be set to zero. Based on this logic,
if the is_busmaster field is set to zero, then the iavf_remove function
returns early. This in turn avoids the aforementioned kernel panic
caused by multiple calls to iavf_remove().
Reproducer:
1. Create container with VF on PF driven by ice.
2. Ensure that the VF is bound to iavf driver
3. Reboot -f
[ 341.561449] iavf 0000:51:05.2: Removing device
[ 341.730407] iavf 0000:51:05.1: Removing device
[ 341.924457] iavf 0000:51:05.0: Removing device
[ 347.130324] pci 0000:51:05.0: Removing from iommu group 161
[ 347.130367] ------------[ cut here ]------------
[ 347.130372] WARNING: CPU: 63 PID: 93678 at kernel/workqueue.c:3047 \
__flush_work.isra.0+0x6b/0x80
[ 347.130373] Modules linked in: ...
[ 347.130688] ...
[ 347.130958] CPU: 63 PID: 93678 Comm: reboot Kdump: loaded \
Tainted: G S O \
5.10.0-6-amd64 #1 Debian 5.10.162-1.stx.64
[ 347.130990] Hardware name: ...
[ 347.130995] RIP: 0010:__flush_work.isra.0+0x6b/0x80
...
[ 347.131076] Call Trace:
[ 347.131083] __cancel_work_timer+0xff/0x190
[ 347.131089] ? kernfs_put.part.0+0xd9/0x1a0
[ 347.131150] ? kmem_cache_free+0x3bd/0x410
[ 347.131158] iavf_remove+0x5e/0xe0 [iavf]
[ 347.131163] ? pci_device_remove+0x38/0xa0
[ 347.131167] ? __device_release_driver+0x17b/0x250
[ 347.131169] ? device_release_driver+0x24/0x30
[ 347.131172] ? pci_stop_bus_device+0x6c/0x90
[ 347.131174] ? pci_stop_and_remove_bus_device+0xe/0x20
[ 347.131179] ? pci_iov_remove_virtfn+0xc0/0x130
[ 347.131185] ? sriov_disable+0x34/0xe0
[ 347.131210] ? ice_free_vfs+0x77/0x350 [ice]
[ 347.131215] ? flow_indr_dev_unregister+0x243/0x250
[ 347.131226] ? ice_remove+0x3e5/0x430 [ice]
[ 347.131237] ? ice_shutdown+0x16/0x50 [ice]
[ 347.131240] ? pci_device_shutdown+0x31/0x60
[ 347.131243] ? device_shutdown+0x156/0x1b0
[ 347.131248] ? __do_sys_reboot.cold+0x2f/0x5b
[ 347.131251] ? vfs_writev+0xc5/0x160
[ 347.131254] ? get_max_files+0x20/0x20
[ 347.131258] ? sched_clock+0x5/0x10
[ 347.131264] ? get_vtime_delta+0xf/0xc0
[ 347.131267] ? vtime_user_exit+0x1c/0x70
[ 347.131272] ? do_syscall_64+0x30/0x40
[ 347.131276] ? entry_SYSCALL_64_after_hwframe+0x61/0xc6
Signed-off-by: Jiping Ma <Jiping.ma2@windriver.com>
---
src/iavf_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/iavf_main.c b/src/iavf_main.c
index cc5ca85..8f8c459 100644
--- a/src/iavf_main.c
+++ b/src/iavf_main.c
@@ -5757,6 +5757,9 @@ static void iavf_remove(struct pci_dev *pdev)
struct iavf_mac_filter *f, *ftmp;
struct iavf_hw *hw = &adapter->hw;
+ /* Don't proceed with remove if pci device is already disable */
+ if(pdev->is_busmaster == 0)
+ return;
/* Indicate we are in remove and not to run/schedule any driver tasks */
set_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section);
cancel_work_sync(&adapter->adminq_task);
--
2.42.0