
On AIO-SX systems with a single platform core, probe failures can occur when CPU usage is already moderately high (50% or more) and tasks such as collect or <system application-apply> are executed. The underlying services (such as ssh, docker, rsync, and cron) occupy the CPU for long periods and leave no room for high-priority services, such as kube-api probes.

This change creates a new top-level cgroup slice (utils.slice) to hold CPU-intensive utility processes that have little or no impact on system timing response. The utils.slice is configured with 128 CPUShares. The following services are moved into this slice with the following CPUShares: cron (128), docker (128), rsync (128), ssh (1024), and systemd-udev (1024). The init.scope CPUShares value is also changed to 128.

Change [1] modifies the collectd cpu plugin to account for this change. Change [2] reverts this change on DC system-controller systems using the dcmanager puppet manifest, since DC has a different workload than AIO-SX and we want to avoid any performance degradation.

[1] https://review.opendev.org/c/starlingx/monitoring/+/947264
[2] https://review.opendev.org/c/starlingx/stx-puppet/+/948476

Test-Plan (AIO-SX):
PASS: build-pkgs -p base-config-files
PASS: build-pkgs -p docker-config
PASS: build-pkgs -p openssh-config
PASS: build-pkgs -p systemd-config
PASS: Build ISO with changes from [1]
PASS: install and bootstrap
PASS: Configure tasks that stress both the platform cores (using the services from utils.slice) and application cores (using workload pods). No probe failures are observed. Those tasks/workloads include:
  - infinite loop running collectd
  - infinite loop downloading huge images from docker registry
  - infinite loop scaling pods up and down on application cores
  - infinite loop scaling pods up and down on platform cores
  - stress-ng pods on application cores
  - flex-ran simulation app on application cores

Test-Plan (DC):
PASS: Fresh install; verify that the overrides created on systemcontroller have default values, e.g., /etc/systemd/system/<service>.service.d/<service>-cpu-shares.conf, /etc/systemd/system/init.scope.d/init.scope-cpu-shares.conf, and that the services are restarted

Story: 2011377
Task: 51901
Related-Bug: 2084714

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/948476
Change-Id: I49082f2ff190dd05da55cedc399f128d5a26f16d
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
12 lines
361 B
Plaintext
[Service]
# cgroup performance engineering
# - cron.service does not provide latency critical service
# - some cron jobs have significant sustained CPU and disk IO
# - set 1/8th default share
# - set lower IO priority (effective only with 'bfq' scheduler)
Slice=utils.slice
CPUShares=128
Nice=19
IOSchedulingClass=best-effort
IOSchedulingPriority=7
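
As a rough illustration of what the CPUShares values in the commit message mean, the sketch below computes how CPU time inside utils.slice would be divided if every listed service were CPU-bound at the same moment. It assumes the usual proportional rule for cgroup CPU shares (each runnable sibling receives its shares divided by the sum of the runnable siblings' shares). The function name `contended_fractions` is hypothetical and not part of this change. Shares only take effect under contention, and the sketch does not model how utils.slice itself competes with other top-level slices.

```python
# Sketch only: relative CPU time among services inside utils.slice when all
# of the "busy" services compete for the CPU at once.
# Share values are the ones listed in the commit message; the helper name
# and the simple proportional model are illustrative assumptions.

UTILS_SLICE_SHARES = {
    "cron": 128,
    "docker": 128,
    "rsync": 128,
    "ssh": 1024,
    "systemd-udev": 1024,
}


def contended_fractions(shares, busy=None):
    """Return each busy service's fraction of the slice's CPU time.

    shares: mapping of service name to its CPUShares value.
    busy:   names of services that are currently runnable; defaults to all.
    """
    busy = list(shares) if busy is None else list(busy)
    total = sum(shares[name] for name in busy)
    return {name: shares[name] / total for name in busy}


if __name__ == "__main__":
    # Print one line per service with its share of the slice under contention.
    for name, fraction in contended_fractions(UTILS_SLICE_SHARES).items():
        print(f"{name:<13} {fraction:6.1%}")
```

Under this model, with all five services busy, ssh and systemd-udev each receive 8/19 of the slice's CPU time (about 42%), while cron, docker, and rsync each receive 1/19 (about 5%). If only cron and docker are busy, they split the slice evenly.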