On AIO-SX systems with a single platform core, when CPU usage is
moderately high (50% or more) and tasks such as collect or
"system application-apply" are executed, the system starts to report
probe failures. This happens because the underlying services (such as
ssh, docker, rsync, and cron) occupy the CPU for long periods of time
and leave no room for high-priority services, such as the kube-api
probes.
This change creates a new top-level cgroup slice (utils.slice) to hold
CPU-intensive utility processes that have little or no impact on system
timing response. utils.slice is configured with CPUShares=128.
The following services are moved into this slice, with the CPUShares
values shown: cron (128), docker (128), rsync (128), ssh (1024), and
systemd-udev (1024). In addition, the CPUShares of init.scope is
changed to 128.
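As a rough illustration of what these values mean, the sketch below
computes the fraction of CPU each sibling cgroup would receive if all
of them were busy at the same time, assuming cgroup v1 cpu.shares
semantics (CPU time is split among siblings in proportion to their
shares). The utils.slice children use the values listed above; the
top-level sibling set (system.slice at the default 1024, init.scope and
utils.slice at 128) is a hypothetical simplification, and a real system
has more top-level slices.

```python
from typing import Dict


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of CPU each sibling gets when every sibling is busy.

    Assumes cgroup v1 cpu.shares semantics: time is divided among
    sibling cgroups in proportion to their shares.
    """
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Children of utils.slice, with the CPUShares values from this change.
utils_children = {
    "cron": 128,
    "docker": 128,
    "rsync": 128,
    "ssh": 1024,
    "systemd-udev": 1024,
}

# Hypothetical top-level siblings; system.slice is assumed to keep the
# default of 1024 shares. The real set of top-level slices differs.
top_level = {
    "system.slice": 1024,
    "init.scope": 128,
    "utils.slice": 128,
}

if __name__ == "__main__":
    for name, frac in cpu_fractions(utils_children).items():
        print(f"utils.slice/{name}: {frac:.1%}")
    for name, frac in cpu_fractions(top_level).items():
        print(f"{name}: {frac:.1%}")
```

Under these assumptions, ssh and systemd-udev each get about 42% of
utils.slice when all five services are busy, and utils.slice as a whole
gets about 10% of the CPU against a busy system.slice. Shares only
matter under contention: when a sibling is idle, its unused time is
available to the others.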
Change [1] updates the collectd cpu plugin to account for this change.
Change [2] reverts this change on DC system controllers using the
dcmanager puppet manifest, since DC system controllers have a different
workload than AIO-SX and we want to avoid any performance degradation
there.
[1] https://review.opendev.org/c/starlingx/monitoring/+/947264
[2] https://review.opendev.org/c/starlingx/stx-puppet/+/948476
Test-Plan (AIO-SX):
PASS: build-pkgs -p base-config-files
PASS: build-pkgs -p docker-config
PASS: build-pkgs -p openssh-config
PASS: build-pkgs -p systemd-config
PASS: Build ISO with changes from [1]
PASS: install and bootstrap
PASS: Run tasks that stress both the platform cores (using the services
in utils.slice) and the application cores (using workload pods).
No probe failures are observed.
These tasks and workloads include:
- infinite loop running collectd
- infinite loop downloading huge images from docker registry
- infinite loop scaling pods up and down on application cores
- infinite loop scaling pods up and down on platform cores
- stress-ng pods on application cores
- flex-ran simulation app on application cores
Test-Plan (DC):
PASS: Fresh install; verify that the overrides created on the
system controller have default values, e.g.,
/etc/systemd/system/<service>.service.d/<service>-cpu-shares.conf and
/etc/systemd/system/init.scope.d/init.scope-cpu-shares.conf,
and that the services are restarted.
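A minimal sketch of how the DC check could be automated: it reads
CPUShares from the text of a drop-in file with Python's configparser
and treats an unset value or 1024 (systemd's documented default for
CPUShares) as "default". The exact contents written by the dcmanager
manifest are not shown here, so the DEFAULT_CPU_SHARES constant and the
sample drop-in strings are assumptions for illustration only.

```python
import configparser
from typing import Optional

# Assumption: "default" means CPUShares is unset or equal to 1024,
# which is systemd's documented default for CPUShares.
DEFAULT_CPU_SHARES = 1024


def read_cpu_shares(dropin_text: str, section: str) -> Optional[int]:
    """Return the CPUShares value from a systemd drop-in, or None if unset.

    section is "Service" for <service>.service.d drop-ins and "Scope"
    for init.scope.d drop-ins.
    """
    # strict=False tolerates repeated keys (the last one wins, as in
    # systemd); interpolation=None leaves '%' specifiers untouched.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # preserve the case of systemd key names
    parser.read_string(dropin_text)
    value = parser.get(section, "CPUShares", fallback="").strip()
    # An empty assignment ("CPUShares=") resets the setting in systemd.
    return int(value) if value else None


def uses_default_shares(dropin_text: str, section: str) -> bool:
    """True if the drop-in leaves CPUShares at its default."""
    shares = read_cpu_shares(dropin_text, section)
    return shares is None or shares == DEFAULT_CPU_SHARES


if __name__ == "__main__":
    # Hypothetical drop-in contents, for illustration only.
    samples = {
        "ssh.service.d/ssh-cpu-shares.conf":
            ("[Service]\nCPUShares=1024\n", "Service"),
        "init.scope.d/init.scope-cpu-shares.conf":
            ("[Scope]\nCPUShares=128\n", "Scope"),
    }
    for name, (text, section) in samples.items():
        status = "default" if uses_default_shares(text, section) else "NOT default"
        print(name, status)
```

configparser is only a convenience for simple drop-ins like these; it
does not implement every systemd parsing rule (for example, backslash
line continuations).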
Story: 2011377
Task: 51901
Related-Bug: 2084714
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/948476
Change-Id: I49082f2ff190dd05da55cedc399f128d5a26f16d
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>