
Update kernel source to 6.6.7 from linux-yocto upstream.
Update the "debian" folder source to 6.1.27-1~bpo11+1 from debian
upstream, because kernel 6.6.7 is now ported to our bullseye platform
and the newest "debian" folder from debian upstream for the bullseye
platform is for 6.1.
Add an optimization for the StarlingX debian kernel building
framework: we used to maintain the kernel with patches against the
"debian" folder, kept in the kernel/kernel-std(rt)/debian/deb_patches
dir. Most of these patches touch "changelog" (debian/changelog) and
"config" (debian/config/amd64/none/config), and the number of patches
in "deb_patches" grew rapidly.
From now on we put "changelog" and "config" in the dir
kernel/kernel-std(rt)/debian/source and use them to replace
debian/changelog and debian/config/amd64/none/config
after the upstream "debian" folder is extracted. This not only keeps
the "deb_patches" folder clean, but also avoids carrying a big patch
just to remove the "changelog" file from the upstream "debian" folder
before every kernel build.
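The replace step can be pictured with a small shell sketch (the paths
mirror the dirs named above, but the contents and the exact copy logic
are illustrative stand-ins, not the actual build framework code):

```shell
# Simulate the extracted upstream source tree and our "source" dir,
# then perform the replacement that makes the old deb_patches
# changelog/config patches unnecessary. Contents are made up for
# illustration.
set -e
PKG=$(mktemp -d)   # stands in for the extracted kernel source tree
SRC=$(mktemp -d)   # stands in for kernel/kernel-std/debian/source
mkdir -p "$PKG/debian/config/amd64/none"
echo "upstream changelog" > "$PKG/debian/changelog"
echo "upstream config"    > "$PKG/debian/config/amd64/none/config"
echo "stx changelog"      > "$SRC/changelog"
echo "stx config"         > "$SRC/config"
# Overwrite the upstream files after the "debian" folder is extracted;
# no patch in deb_patches is needed for this any more.
cp "$SRC/changelog" "$PKG/debian/changelog"
cp "$SRC/config"    "$PKG/debian/config/amd64/none/config"
cat "$PKG/debian/changelog"
```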
Below are the changes to "deb_patches"/"patches" for kernel-rt:
(We refer to patches by the serial number in their file name,
because so many patches are involved here.)
(1)about "deb_patches" folder:
(1.1)Thanks to the optimization above, all the 5.10 patches that touch
changelog and config can be abandoned; for 6.6 the changes are made
directly in the files under the "source" dir.
Patches for 5.10 that are abandoned because they are about config:
0003/0005/0006/0007/0008/0010/0011/0016/0018/0022/0026/0028
Patches for 5.10 that are abandoned because they are about changelog:
0001/0002/0007/0013/0020/0023/0024/0025/0027/0029/0030/0032/0033/0034
The "changelog" and "config" under "source" dir for 6.6 are verified
to be aligned with those for 5.10 build.
CONFIG_FANOTIFY is enabled in "config" as a new request.
(1.2)Patch 0017 for 5.10 is abandoned because the new commit
<Use parallel XZ for source tar generation> in the new version of the
"debian" folder does the same work.
Refer to: https://salsa.debian.org/kernel-team/linux/-/commit/
50b61a14e6dbc50b19dfe938c4679ecda50b83ee
(1.3)Below patches for 5.10 are ported to 6.6:
0004/0009/0015 (0009/0015 are merged into 0004) compose patch 0001
for 6.6;
0014 is ported to 0002 for 6.6;
0021 is ported to 0003 for 6.6;
0005/0019 (0005 is merged into 0019) compose patch 0004 for 6.6;
0012 is ported to 0005 for 6.6;
0031 is ported to 0006 for 6.6.
The new patches for 6.6 are:
New patches 0001-0006 are ported from "deb_patches" for 5.10;
New patches 0007-0010 are added for building kernel 6.6.7 with
6.1.27-1~bpo11+1 "debian" folder.
(2)about "patches" folder:
(2.1)Patches for 5.10 that are abandoned because they are already in
6.6.7 include:
0017-0027/0031-0038/0041-0056/0058-0071/0073-0081/0083
(2.2)Patch 0011 for 5.10 is abandoned because of the new upstream
commit <scsi: smartpqi: Expose SAS address for SATA drives> in 6.6.
Refer to: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=00598b056aa6d46c7a6819efa850ec9d0d690d76
The new upstream commit has done what 0011 does.
(2.3)Patch 0039 for 5.10 is abandoned because of the new upstream
commit <samples/bpf: replace broken overhead microbenchmark with
fib_table_lookup> in 6.6.
Refer to: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=58e975d014e1e31bd1586be7d2be6d61bd3e3ada
0039 isn't needed any more with the new commit merged.
(2.4)Patch 0030 for 5.10 is abandoned because the related code has
been changed in 6.6 and the issue was verified to disappear.
(2.5)Patch 0010 for 5.10 is abandoned; the issue will instead be
fixed by setting /config/target/iscsi/cpus_allowed_list to match the
kernel parameter "kthread_cpus". The new upstream commit
<scsi: target: Add iscsi/cpus_allowed_list in configfs>
adds iscsi/cpus_allowed_list in configfs, and the available CPU set
of the iSCSI connection RX/TX threads is allowed_cpus & online_cpus.
This achieves the same effect as patch 0010 so long as we set
cpus_allowed_list properly.
Refer to: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=d72d827f2f2636d8d72f0f3ebe5b661c9a24d343>
This issue will be addressed by later patches on the stx framework
part.
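As a sketch, the planned setting could be applied at runtime roughly
like this (the CPU list "0-1" is a made-up example standing in for the
"kthread_cpus" boot parameter value; the write is guarded because the
configfs node only exists once the iSCSI target is configured):

```shell
# Illustrative only: keep iSCSI RX/TX kthreads on the same CPUs as
# "kthread_cpus". The effective CPU set is allowed_cpus & online_cpus.
KTHREAD_CPUS="0-1"   # example value of the kthread_cpus boot parameter
CPUS_ALLOWED=/config/target/iscsi/cpus_allowed_list
if [ -w "$CPUS_ALLOWED" ]; then
        echo "$KTHREAD_CPUS" > "$CPUS_ALLOWED"
fi
```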
(2.6)Patch 0015-0016 are abandoned because the issue has been fixed
from the user space side by using the stable /dev/disk/by-path/...
symbolic links instead of names like /dev/sda that can change
(confirmed by M. Vefa Bicakci).
(2.7)Below patches for 5.10 are ported to 6.6:
0001-0009/0012/0028-0029/0040/0057/0082
(3)about kernel config:
(3.1) Enable CONFIG_GNSS for the ice driver.
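For reference, the two options called out above (CONFIG_FANOTIFY in
(1.1) and CONFIG_GNSS here) appear in the "config" file as ordinary
kernel config entries; the fragment below shows them as built-in,
though "=m" is equally possible where the option is tristate:

```
CONFIG_FANOTIFY=y
CONFIG_GNSS=y
```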
Test plan:
The out-of-tree (OOT) kernel modules for 6.6 aren't ready yet, so
many tests can't be done because the related test environments need
those OOT drivers. Listed below are the tests which have been done
with a test patch that temporarily removes the OOT drivers from the
ISO. There are also 2 patches as workarounds for 2 issues met when
installing the lab in the jenkins job.
PASS: Build linux/linux-rt OK.
PASS: Build ISO OK.
PASS: Install and boot up OK on an AIO-SX lab with std/rt kernel.
PASS: The 12 hours cyclictest result for rt kernel is:
samples: 259199998 avg: 1658 max: 5455
99.9999th percentile: 3725 overflows: 0
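As a plausibility check on the sample count above (assuming a 1 ms
measurement interval and one cyclictest thread per CPU; neither is
recorded in this commit message):

```shell
# 12 h at 1 kHz gives 43,200,000 samples per measurement thread; the
# reported total of 259199998 matches six such threads, minus the last
# in-flight samples. Interval and thread count are assumptions.
DURATION_S=$((12 * 3600))
PER_THREAD=$((DURATION_S * 1000))
THREADS=6
TOTAL=$((PER_THREAD * THREADS))
echo "$PER_THREAD $TOTAL"
```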
Story: 2011000
Task: 49365
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: I6601fd2d7be4fc314ef2bc03b46f851eabebe3ea
(cherry picked from commit 06f53ed8e2)
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
From d3a94bc5b2139aeb6f6d1f05c2bd47a8f9ad2650 Mon Sep 17 00:00:00 2001
From: Jim Somerville <jim.somerville@windriver.com>
Date: Fri, 14 Apr 2023 15:29:22 -0400
Subject: [PATCH] Port negative dentries limit feature from 3.10

This ports the Redhat feature forward from the 3.10 kernel version.

This feature allows one to specify a loose maximum of total memory
which is allowed to be used for negative dentries. This is done
via setting a sysctl variable which is used to calculate a
negative dentry limit for the system. Every 15 seconds a kworker
task will prune back the negative dentries that exceed the limit,
plus an extra 1% for hysteresis purposes.

Intent is that the feature code is kept as close to the 3.10 version
as possible.

Main differences from the 3.10 version of the code:
- count of dentries associated with a superblock is kept in a
  different location, requiring a procedure call to obtain
- superblocks are now kept by node id and memcg, requiring
  more calls into iterate_supers

Signed-off-by: Jim Somerville <jim.somerville@windriver.com>
[zp: Adapted the patch for context and code changes.]
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
[lz: Adapted the patch for upgrading kernel from 5.10 to 6.6.
The "struct ctl_table fs_table" in kernel/sysctl.c has been removed
in 6.6. So move the proc file negative-dentry-limit's register
table to fs/dcache.c as part of "struct ctl_table fs_dcache_sysctls",
where the related functions and variables are defined.
Then the related symbol exports for them aren't needed any more.
Replace "&zero_ul" with "SYSCTL_LONG_ZERO" according to:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=b1f2aff888af54a057c2c3c0d88a13ef5d37b52a.]
Signed-off-by: Li Zhou <li.zhou@windriver.com>
---
 fs/dcache.c | 185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 183 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 576ad162c..0fff744af 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -32,6 +32,7 @@
 #include <linux/bit_spinlock.h>
 #include <linux/rculist_bl.h>
 #include <linux/list_lru.h>
+#include <linux/memcontrol.h>
 #include "internal.h"
 #include "mount.h"
 
@@ -124,6 +125,65 @@ struct dentry_stat_t {
         long dummy;        /* Reserved for future use */
 };
 
+/*
+ * dcache_negative_dentry_limit_sysctl:
+ * This is sysctl parameter "negative-dentry-limit" which specifies a
+ * limit for the number of negative dentries allowed in a system as a
+ * multiple of one-thousandth of the total system memory. The default
+ * is 0 which means there is no limit and the valid range is 0-100.
+ * So up to 10% of the total system memory can be used.
+ *
+ * negative_dentry_limit:
+ * The actual number of negative dentries allowed which is computed after
+ * the user changes dcache_negative_dentry_limit_sysctl.
+ */
+static long negative_dentry_limit;
+int dcache_negative_dentry_limit_sysctl;
+
+/*
+ * There will be a periodic check to see if the negative dentry limit
+ * is exceeded. If so, the excess negative dentries will be removed.
+ */
+#define NEGATIVE_DENTRY_CHECK_PERIOD        (15 * HZ)        /* Check every 15s */
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_negative_dentry_work, prune_negative_dentry);
+
+/*
+ * Sysctl proc handler for dcache_negative_dentry_limit_sysctl.
+ */
+int proc_dcache_negative_dentry_limit(struct ctl_table *ctl, int write,
+                                      void *buffer, size_t *lenp,
+                                      loff_t *ppos)
+{
+        /* Rough estimate of # of dentries allocated per page */
+        const unsigned int nr_dentry_page = PAGE_SIZE / sizeof(struct dentry);
+        int old = dcache_negative_dentry_limit_sysctl;
+        int ret;
+
+        ret = proc_dointvec_minmax(ctl, write, buffer, lenp, ppos);
+
+        if (!write || ret || (dcache_negative_dentry_limit_sysctl == old))
+                return ret;
+
+        negative_dentry_limit = totalram_pages() * nr_dentry_page *
+                                dcache_negative_dentry_limit_sysctl / 1000;
+
+        /*
+         * The periodic dentry pruner only runs when the limit is non-zero.
+         * The sysctl handler is the only trigger mechanism that can be
+         * used to start/stop the prune work reliably, so we do that here
+         * after calculating the new limit.
+         */
+        if (dcache_negative_dentry_limit_sysctl && !old)
+                schedule_delayed_work(&prune_negative_dentry_work, 0);
+
+        if (!dcache_negative_dentry_limit_sysctl && old)
+                cancel_delayed_work_sync(&prune_negative_dentry_work);
+
+        pr_info("Negative dentry limits = %ld\n", negative_dentry_limit);
+        return 0;
+}
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_negative);
@@ -191,6 +251,15 @@ static struct ctl_table fs_dcache_sysctls[] = {
                 .mode                = 0444,
                 .proc_handler        = proc_nr_dentry,
         },
+        {
+                .procname        = "negative-dentry-limit",
+                .data                = &dcache_negative_dentry_limit_sysctl,
+                .maxlen                = sizeof(dcache_negative_dentry_limit_sysctl),
+                .mode                = 0644,
+                .proc_handler        = proc_dcache_negative_dentry_limit,
+                .extra1                = SYSCTL_LONG_ZERO,
+                .extra2                = SYSCTL_ONE_HUNDRED,
+        },
         { }
 };
 
@@ -1202,8 +1271,9 @@ void shrink_dentry_list(struct list_head *list)
         }
 }
 
-static enum lru_status dentry_lru_isolate(struct list_head *item,
-                struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+static enum lru_status _dentry_lru_isolate(struct list_head *item,
+                struct list_lru_one *lru, spinlock_t *lru_lock, void *arg,
+                bool negative_only)
 {
         struct list_head *freeable = arg;
         struct dentry   *dentry = container_of(item, struct dentry, d_lru);
@@ -1254,12 +1324,29 @@ static enum lru_status dentry_lru_isolate(struct list_head *item,
                 return LRU_ROTATE;
         }
 
+        if (negative_only && !d_is_negative(dentry)) {
+                spin_unlock(&dentry->d_lock);
+                return LRU_SKIP;
+        }
+
         d_lru_shrink_move(lru, dentry, freeable);
         spin_unlock(&dentry->d_lock);
 
         return LRU_REMOVED;
 }
 
+static enum lru_status dentry_lru_isolate(struct list_head *item,
+                struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+        return _dentry_lru_isolate(item, lru, lru_lock, arg, false);
+}
+
+static enum lru_status dentry_lru_isolate_negative(struct list_head *item,
+                struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+        return _dentry_lru_isolate(item, lru, lru_lock, arg, true);
+}
+
 /**
  * prune_dcache_sb - shrink the dcache
  * @sb: superblock
@@ -1283,6 +1370,20 @@ long prune_dcache_sb(struct super_block *sb, struct shrink_control *sc)
         return freed;
 }
 
+/**
+ * Does the same thing as prune_dcache_sb but only gets rid of negative dentries
+ */
+long prune_dcache_sb_negative(struct super_block *sb, struct shrink_control *sc)
+{
+        LIST_HEAD(dispose);
+        long freed;
+
+        freed = list_lru_shrink_walk(&sb->s_dentry_lru, sc,
+                                     dentry_lru_isolate_negative, &dispose);
+        shrink_dentry_list(&dispose);
+        return freed;
+}
+
 static enum lru_status dentry_lru_isolate_shrink(struct list_head *item,
                 struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
 {
@@ -1677,6 +1778,86 @@ static enum d_walk_ret umount_check(void *_data, struct dentry *dentry)
         return D_WALK_CONTINUE;
 }
 
+struct prune_negative_ctrl
+{
+        long prune_count;
+        int prune_percent;        /* Each unit = 0.01% */
+
+        struct shrink_control shrink_ctl;
+};
+
+/*
+ * Prune dentries from a super block.
+ */
+static void prune_negative_one_sb(struct super_block *sb, void *arg)
+{
+        struct prune_negative_ctrl *ctrl = arg;
+        unsigned long count = list_lru_count_one(&sb->s_dentry_lru, ctrl->shrink_ctl.nid, ctrl->shrink_ctl.memcg);
+        long scan = (count * ctrl->prune_percent) / 10000;
+        struct shrink_control shrink_ctl = ctrl->shrink_ctl;
+
+        if (scan) {
+                shrink_ctl.nr_to_scan = scan;
+                ctrl->prune_count += prune_dcache_sb_negative(sb, &shrink_ctl);
+        }
+}
+
+/*
+ * A workqueue function to prune negative dentry.
+ */
+static void prune_negative_dentry(struct work_struct *work)
+{
+        long count = get_nr_dentry_negative();
+        long limit = negative_dentry_limit;
+        struct prune_negative_ctrl ctrl;
+        unsigned long start;
+        struct mem_cgroup *memcg;
+        int nid;
+
+        if (!limit || count <= limit)
+                goto requeue_work;
+
+        /*
+         * Add an extra 1% as a minimum and to increase the chance
+         * that the after operation dentry count stays below the limit.
+         */
+        ctrl.prune_count = 0;
+        ctrl.prune_percent = ((count - limit) * 10000 / count) + 100;
+
+        ctrl.shrink_ctl.gfp_mask = GFP_KERNEL;
+        start = jiffies;
+
+
+        for_each_online_node(nid) {
+
+                ctrl.shrink_ctl.nid = nid;
+                memcg = mem_cgroup_iter(NULL, NULL, NULL);
+                do {
+                        ctrl.shrink_ctl.memcg = memcg;
+                        /*
+                         * iterate_supers() will take a read lock on the supers blocking
+                         * concurrent umount.
+                         */
+                        iterate_supers(prune_negative_one_sb, &ctrl);
+                } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+        }
+
+        /*
+         * Report negative dentry pruning stat.
+         */
+        pr_debug("%ld negative dentries freed in %d ms\n",
+                 ctrl.prune_count, jiffies_to_msecs(jiffies - start));
+
+requeue_work:
+        /*
+         * The requeuing will get cancelled if there is a concurrent
+         * cancel_delayed_work_sync() call from user sysctl operation.
+         * That call will wait until this work finishes and cancel it.
+         */
+        schedule_delayed_work(&prune_negative_dentry_work,
+                              NEGATIVE_DENTRY_CHECK_PERIOD);
+}
+
 static void do_one_tree(struct dentry *dentry)
 {
         shrink_dcache_parent(dentry);
-- 
2.17.1