Alyson Deives Pereira 5df711b1fb Move utility services to new top level cgroup utils.slice
On AIO-SX systems with 1 platform core, when CPU usage is moderately
high (50% or more) and tasks such as collect or
<system application-apply> are executed, the system starts to present
probe failures. This happens because the underlying services (such as
ssh, docker, rsync, and cron) occupy the CPU for long periods of time
and do not leave room for high-priority services, such as kube-api
probes.

This change creates a new top-level cgroup slice (utils.slice) to hold
CPU-intensive utility processes that have little or no impact on system
timing response. The utils.slice is configured with 128 CPUShares.
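To make the 128-share setting concrete, the sketch below computes the split that CPUShares produces among sibling cgroups when all of them are busy: each gets its weight divided by the sum of its siblings' weights (idle CPU time can still be used by any cgroup). The sibling set, system.slice and user.slice at systemd's default of 1024, is a hypothetical assumption; the real top-level slices on an AIO-SX node may differ.

```python
"""Sketch: proportional CPU split among sibling cgroups using CPUShares.

Hypothetical: the sibling slices and their weights below are assumptions
for illustration, not values read from a StarlingX node.
"""


def cpu_fraction(shares: dict[str, int]) -> dict[str, float]:
    """Return each sibling's share of CPU time when every sibling is busy.

    A cgroup's share of contended CPU time is its weight divided by the
    total weight of its runnable siblings.
    """
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}


# Hypothetical sibling set: utils.slice at 128, two slices at the default 1024.
TOP_LEVEL = {"utils.slice": 128, "system.slice": 1024, "user.slice": 1024}

if __name__ == "__main__":
    for name, frac in cpu_fraction(TOP_LEVEL).items():
        print(f"{name:14s} {frac:.1%}")
```

With these assumed weights, utils.slice receives 128/2176 (about 5.9%) of contended CPU time, while each default slice receives 1024/2176 (about 47.1%).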

The following services are moved into this slice with the following
CPUShares: cron (128), docker (128), rsync (128), ssh (1024), and
systemd-udev (1024). The init.scope CPUShares value is also changed to
128.
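A minimal sketch of how such a move can be expressed with systemd unit files follows. Slice= and CPUShares= are standard systemd resource-control directives; the file paths, the Description= text and the exact unit names (for example, whether the udev unit is systemd-udevd.service) are hypothetical assumptions, not the files shipped by base-config-files, docker-config, openssh-config or systemd-config.

```python
"""Sketch: render systemd unit text that moves services into utils.slice.

Hypothetical: paths, Description= text and unit names are illustrative
assumptions; only the CPUShares values come from the commit message.
"""

# CPUShares per service, as listed in the commit message.
SERVICE_SHARES = {
    "cron": 128,
    "docker": 128,
    "rsync": 128,
    "ssh": 1024,
    "systemd-udev": 1024,  # the real unit may be named systemd-udevd.service
}


def slice_unit(cpu_shares: int = 128) -> str:
    """utils.slice has no '-' in its name, so systemd places it directly
    under -.slice, i.e. at the top level next to system.slice."""
    return (
        "[Unit]\nDescription=Utility services\n\n"
        f"[Slice]\nCPUShares={cpu_shares}\n"
    )


def service_dropin(cpu_shares: int) -> str:
    """Drop-in that reparents a service into utils.slice with its own weight."""
    return f"[Service]\nSlice=utils.slice\nCPUShares={cpu_shares}\n"


def dropin_path(service: str) -> str:
    """Hypothetical path, modelled on the override pattern in the DC test plan."""
    return f"/etc/systemd/system/{service}.service.d/{service}-cpu-shares.conf"


if __name__ == "__main__":
    print("# /etc/systemd/system/utils.slice")
    print(slice_unit())
    for name, shares in SERVICE_SHARES.items():
        print(f"# {dropin_path(name)}")
        print(service_dropin(shares))
```

Because the weights are relative within the slice, when all five services are busy ssh and systemd-udev each get 1024/2432 (about 42%) of utils.slice's time and cron, docker and rsync each get 128/2432 (about 5%), while utils.slice as a whole stays weighted low against its siblings.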

Change [1] modifies the collectd cpu plugin to account for this change.
Change [2] reverts this change on DC system controllers using the
dcmanager puppet manifest, since DC systems have a different workload
than AIO-SX and we want to avoid any performance degradation.

[1] https://review.opendev.org/c/starlingx/monitoring/+/947264
[2] https://review.opendev.org/c/starlingx/stx-puppet/+/948476

Test-Plan (AIO-SX):
PASS: build-pkgs -p base-config-files
PASS: build-pkgs -p docker-config
PASS: build-pkgs -p openssh-config
PASS: build-pkgs -p systemd-config
PASS: Build ISO with changes from [1]
PASS: install and bootstrap
PASS: Configure tasks that stress both the platform cores (using the
      services from utils.slice) and the application cores (using
      workload pods). No probe failures are observed.
      These tasks/workloads include:
      - infinite loop running collectd
      - infinite loop downloading huge images from docker registry
      - infinite loop scaling pods up and down on application cores
      - infinite loop scaling pods up and down on platform cores
      - stress-ng pods on application cores
      - flex-ran simulation app on application cores

Test-Plan (DC):
PASS: Fresh install; verify that the overrides created on the system
      controller have default values and that the services are
      restarted, e.g.,
      /etc/systemd/system/<service>.service.d/<service>-cpu-shares.conf,
      /etc/systemd/system/init.scope.d/init.scope-cpu-shares.conf
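The DC verification step could be scripted along these lines. This is a sketch under stated assumptions: "default values" is taken to mean CPUShares=1024 (systemd's documented default for that setting), the drop-ins are parsed as INI text with Python's configparser, and the section names and the CHECKS entries are hypothetical examples rather than what the dcmanager manifest is known to write.

```python
"""Sketch: check that CPU-share overrides on a system controller hold defaults.

Hypothetical: the expected value (1024), section names and file list are
assumptions for illustration.
"""
import configparser
from pathlib import Path
from typing import Optional


def cpu_shares(text: str, section: str) -> Optional[int]:
    """Return CPUShares from a unit/drop-in body, or None if it is unset."""
    # systemd files are INI-like: keep key case, allow repeated keys,
    # and disable %-interpolation (systemd uses % specifiers itself).
    cp = configparser.ConfigParser(strict=False, interpolation=None)
    cp.optionxform = str
    cp.read_string(text)
    if not cp.has_option(section, "CPUShares"):
        return None
    return cp.getint(section, "CPUShares")


def has_default_shares(path: Path, section: str, expected: int = 1024) -> bool:
    """True if the drop-in exists and sets CPUShares to the expected value."""
    if not path.is_file():
        return False
    return cpu_shares(path.read_text(), section) == expected


# Hypothetical examples of files to check; scope units use a [Scope] section.
CHECKS = [
    (Path("/etc/systemd/system/cron.service.d/cron-cpu-shares.conf"), "Service"),
    (Path("/etc/systemd/system/init.scope.d/init.scope-cpu-shares.conf"), "Scope"),
]

if __name__ == "__main__":
    for path, section in CHECKS:
        print(path, "OK" if has_default_shares(path, section) else "MISMATCH")
```

Checking that the services were restarted is a separate step and is not covered by this sketch.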

Story: 2011377
Task: 51901
Related-Bug: 2084714

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/948476

Change-Id: I49082f2ff190dd05da55cedc399f128d5a26f16d
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
2025-05-14 10:04:10 -03:00