Run maintenance failsafe reboot using systemd-run

The current method used by Maintenance to launch its failsafe reboot
function causes systemd shutdown to stall due to recent changes in
cgroup containment behavior. Specifically, mtcAgent runs within the
'sm.service' cgroup, while mtcClient is now launched in its own
transient systemd-run cgroup.

When a grandchild process is created using a double-fork and execv,
it inherits the parent's cgroup unless explicitly detached. As a
result, the long-lived reboot sleeper remains part of the parent's
cgroup. During shutdown, systemd waits for all processes in a
service's cgroup to exit before completing the stop operation.
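
The inheritance described above can be observed directly on a Linux host. The following is a minimal sketch, assuming /proc is mounted; the variable names are illustrative and not part of this change. It compares the cgroup membership of the current shell with that of a newly forked and exec'd child shell.

```shell
#!/bin/bash
# Sketch: a child process starts in the same cgroup as its parent.
# Assumes a Linux host with /proc mounted; cgroup paths vary by system.

# cgroup membership of this shell
parent_cgroup=$(cat "/proc/$$/cgroup")

# cgroup membership of a freshly forked and exec'd child shell
child_cgroup=$(bash -c 'cat "/proc/$$/cgroup"')

if [ "$parent_cgroup" = "$child_cgroup" ]; then
    cgroup_inherited=yes
else
    cgroup_inherited=no
fi
echo "child inherited parent cgroup: $cgroup_inherited"
```

A process started through a transient systemd-run unit would be expected to report a different cgroup path from its launcher, which is the property this change relies on.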

In the case of a failsafe reboot, the grandchild sleeps for a period
before issuing a SysRq-triggered reboot. This delay often exceeds
systemd’s default 90-second shutdown timeout, causing unnecessary
delays during node reboot even though the system could otherwise
shut down cleanly in less time.

This update resolves the issue by switching the failsafe reboot logic
to use systemd-run. This ensures the reboot script runs in its own
isolated transient unit and cgroup, fully detached from the parent
service.
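
For illustration, the command line that the updated launcher hands to execv can be sketched as below. build_failsafe_cmd is a hypothetical helper that only prints the command and does not run it; the mtcClient caller name and the 120-second delay are example values. The paths and the unit-name pattern follow the defines and the snprintf format in this change.

```shell
#!/bin/bash
# Sketch: build (but do not run) the systemd-run command line used
# for the failsafe reboot. build_failsafe_cmd is a hypothetical helper.

SYSTEMD_RUN="/usr/bin/systemd-run"
REBOOT_SCRIPT="/usr/local/sbin/delayed_sysrq_reboot"

build_failsafe_cmd() {
    local caller="$1"   # calling process name, e.g. mtcAgent or mtcClient
    local delay="$2"    # seconds to wait before the SysRq reset
    # A per-caller unit name avoids collisions between transient units.
    printf '%s --unit=%s-delayed-failsafe-reboot %s %s\n' \
        "$SYSTEMD_RUN" "$caller" "$REBOOT_SCRIPT" "$delay"
}

failsafe_cmd=$(build_failsafe_cmd mtcClient 120)
echo "$failsafe_cmd"
```

Printing the command instead of executing it keeps the sketch safe to run on any host; actually running it requires root and a systemd-managed system.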

A new `delayed_sysrq_reboot.sh` script is introduced to implement
the reboot logic. With this change, failsafe reboots now work as
expected without stalling systemd shutdown, whether triggered from
mtcAgent or mtcClient.

Test Plan:

PASS: Verify build, install and unlock/lock/unlock of each node in
PASS: ... AIO SX (hw) and AIO DX (hw)
PASS: ... AIO DX with SX subcloud (dc-libvirt)
PASS: ... Standard 2+1+1 storage (vbox)

Unit testing of new delayed_sysrq_reboot.sh script

PASS: Verify reset after specified delay (success path)
PASS: Verify kernel sysrq auto enable feature (recovery path)
PASS: Verify sysrq reset is rejected (failure path) when
PASS: ... delay value is not specified
PASS: ... delay is out of range
PASS: ... called as non-root user
PASS: ... too few or many arguments
PASS: ... the /proc/sysrq-trigger file is not present (fit)

PASS: Verify file is owned as root:root and has root only permissions
PASS: Verify behavior if /proc/sysrq-trigger does not cause a reset
PASS: Verify execution logging
PASS: Verify shell check static analysis
PASS: Verify handling over /var/run/.node_reset flag file detection

Updated delayed failsafe sysrq function

PASS: Verify systemd-run command arguments
PASS: Verify sysrq reset occurs over unlock of local or remote system
      node after the specified delay.

General:

PASS: Verify no shutdown delay due to failsafe reboot launch
      over self unlock
PASS: Verify no unexpected shutdown kernel tracebacks
PASS: Verify kernel and console logging
PASS: Verify no coredumps or crashdumps during feature update testing
PASS: Verify mtcClient doesn't stall shutdown over 10 lock/unlocks of
      ... standby controller-1 (AIO DX hw)
PASS: ... worker (vbox)
PASS: ... storage (vbox)

Closes-Bug: 2111280
Change-Id: I86e0191548f8f13f61960a91e4e0bbe83134cca6
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>

Author: Eric MacDonald
Date: 2025-05-19 21:20:36 +00:00
parent 8850c25580
commit 9b2fb85c30
12 changed files with 141 additions and 75 deletions

@@ -170,6 +170,12 @@ void daemon_exit ( void );
#define MTCAGENT_LOG_FILE ((const char *)"/var/log/mtcAgent.log")
#define MTCCLIENT_LOG_FILE ((const char *)"/var/log/mtcClient.log")
+ /* common binaries */
+ #define SYSTEMD_RUN "/usr/bin/systemd-run"
+ /* maintenance scripts */
+ #define MTC_DELAYED_SYSRQ_REBOOT_SCRIPT "/usr/local/sbin/delayed_sysrq_reboot"
/* supported BMC communication protocols ; access method */
typedef enum
{

@@ -47,6 +47,9 @@ using namespace std;
#endif
#define __AREA__ "com"
+ /* allow import of this process name */
+ extern char *program_invocation_short_name;
/***************************************************************************
*
* Name : nodeUtil_latency_log
@@ -1005,87 +1008,61 @@ int double_fork ( void )
/***************************************************************************
*
- * Name : fork_sysreq_reboot
+ * Name : launch_failsafe_reboot
*
- * Purpose : Timed SYSREQ Reset service used as a backup mechanism
- * to force a self reset after a specified period of time.
+ * Purpose : Launches a systemd-run-based timed SYSRQ reset service
+ * used as a backup mechanism to force a self-reset after
+ * a specified delay period (in seconds) in the event that
+ * systemd shutdown hangs with no reset.
+ *
+ * Description: Uses double-fork to ensure child detachment. Also uses
+ * dynamic service unit naming based on the process name to
+ * allow coexistence across multiple invocations from other
+ * maintenance processes, namely mtcAgent/mtcClient.
+ *
+ * Parameter: int delay_in_secs - seconds to wait before sysrq reset
*
**************************************************************************/
- /* This is a common utility that forces a sysreq reboot */
- void fork_sysreq_reboot ( int delay_in_secs )
+ void launch_failsafe_reboot ( int delay_in_secs )
{
int parent = 0 ;
- /* Fork child to do a sysreq reboot. */
+ // Double fork in prep to run MTC_DELAYED_SYSRQ_REBOOT_SCRIPT as a
+ // detached grandchild using systemd-run command. The script does
+ // the SysRq reset.
if ( 0 > ( parent = double_fork()))
{
elog ("failed to fork fail-safe (backup) sysreq reboot\n");
return ;
}
- else if( 0 == parent ) /* we're the child */
+ else if( 0 == parent ) /* we're the grandchild */
{
- int sysrq_handler_fd;
- int sysrq_tigger_fd ;
- size_t temp ;
+ char delay_str [MAX_CHARS_IN_INT]; /* for the int to str conversion */
+ char unit_arg [MAX_FILENAME_LEN]; /* for the dynamic unit name */
setup_child ( false ) ;
+ // Convert the calling int parameter to a string so it can be passed to execv
+ snprintf(delay_str, sizeof(delay_str), "%d", delay_in_secs);
- dlog ("*** Failsafe Reset Thread ***\n");
+ // Create a dynamic unit name using the current program name.
+ // Do this so that if multiple maintenance processes use this
+ // API they don't have a unit name collision.
+ snprintf(unit_arg, sizeof(unit_arg),
+ "--unit=%s-delayed-failsafe-reboot",
+ program_invocation_short_name);
- /* Commented this out because blocking SIGTERM in systemd environment
- * causes any processes that spawn this sysreq will stall shutdown
- *
- * sigset_t mask , mask_orig ;
- * sigemptyset (&mask);
- * sigaddset (&mask, SIGTERM );
- * sigprocmask (SIG_BLOCK, &mask, &mask_orig );
- *
- */
- // Enable sysrq handling.
- sysrq_handler_fd = open( "/proc/sys/kernel/sysrq", O_RDWR | O_CLOEXEC );
- if( 0 > sysrq_handler_fd )
- {
- elog ( "failed sysrq_handler open\n");
- return ;
- }
- temp = write( sysrq_handler_fd, "1", 1 );
- close( sysrq_handler_fd );
- for ( int i = delay_in_secs ; i >= 0 ; --i )
- {
- sleep (1);
- {
- if ( 0 == (i % 5) )
- {
- dlog ( "sysrq reset in %d seconds\n", i );
- }
- }
- }
- // Trigger sysrq command.
- sysrq_tigger_fd = open( "/proc/sysrq-trigger", O_RDWR | O_CLOEXEC );
- if( 0 > sysrq_tigger_fd )
- {
- elog ( "failed sysrq_trigger open\n");
- return ;
- }
- temp = write( sysrq_tigger_fd, "b", 1 );
- close( sysrq_tigger_fd );
- dlog ( "sysreq rc:%ld\n", temp );
- UNUSED(temp);
- sleep (10);
- // Shouldn't get this far, else there was an error.
- exit(-1);
+ const char *cmd[] = {
+ SYSTEMD_RUN,
+ unit_arg,
+ MTC_DELAYED_SYSRQ_REBOOT_SCRIPT,
+ delay_str,
+ NULL
+ };
+ execv(cmd[0], (char * const *)cmd);
+ exit(EXIT_FAILURE);
}
- ilog ("Forked Fail-Safe (Backup) Reboot Action\n");
+ ilog ("failsafe reboot script launched ; reboot in %d seconds ; calling pid:%d",
+ delay_in_secs, parent);
}
/***************************************************************************

@@ -149,7 +149,7 @@ int load_filenames_in_dir ( const char * directory, std::list<string> & filelis
int double_fork ( void );
int double_fork_host_cmd ( string hostname , char * cmd_string, const char * cmd_oper );
int setup_child ( bool close_file_descriptors );
- void fork_sysreq_reboot ( int delay_in_secs );
+ void launch_failsafe_reboot ( int delay_in_secs );
void fork_graceful_reboot ( int delay_in_secs );
int get_node_health ( string hostname );

@@ -33,6 +33,7 @@ usr/local/bin/mtcClient
usr/local/bin/mtcalarmd
usr/local/bin/mtclogd
usr/local/bin/wipedisk
+ usr/local/sbin/delayed_sysrq_reboot
usr/sbin/crash-dump-manager
usr/sbin/dmemchk.sh
usr/sbin/fsync

@@ -135,6 +135,9 @@ override_dh_auto_install:
install -m 755 -d $(COLLECTDIR)
install -m 755 -p -D scripts/collect_bmc.sh $(COLLECTDIR)/collect_bmc
+ # general scripts
+ install -m 700 -p -D scripts/delayed_sysrq_reboot.sh $(LOCAL_SBINDIR)/delayed_sysrq_reboot
# syslog configuration
install -m 644 -p -D scripts/mtce.syslog $(SYSCONFDIR)/syslog-ng/conf.d/mtce.conf

@@ -1643,7 +1643,7 @@ int nodeLinkClass::lazy_graceful_fs_reboot ( struct nodeLinkClass::node * node_p
{
/* issue a lazy reboot to the mtcClient and as a backup launch a sysreq reset thread */
send_mtc_cmd ( node_ptr->hostname, MTC_CMD_LAZY_REBOOT, MGMNT_INTERFACE ) ;
- fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+ launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
/* loop until reboot */
for ( ; ; )
@@ -3114,7 +3114,7 @@ int nodeLinkClass::add_host ( node_inv_type & inv )
if ( delay > 0 )
{
mtcTimer_start ( node_ptr->mtcTimer, mtcTimer_handler, delay );
- ilog ("Host add delay is %d seconds", delay );
+ ilog ("%s Host add delay is %d seconds", node_ptr->hostname.c_str(), delay );
node_ptr->addStage = MTC_ADD__START_DELAY ;
}
else

@@ -174,7 +174,7 @@ int hbs_self_recovery ( unsigned int cmd )
/* Forced Self Reset Now */
else if ( cmd == STALL_SYSREQ_CMD )
{
- fork_sysreq_reboot ( 60 ) ;
+ launch_failsafe_reboot ( 60 ) ;
/* parent returns */
return (PASS);

@@ -443,7 +443,7 @@ void hostw_log_and_reboot()
/* start the process that will perform an ungraceful reboot, if
* the graceful reboot fails */
- fork_sysreq_reboot ( FORCE_REBOOT_DELAY );
+ launch_failsafe_reboot ( FORCE_REBOOT_DELAY );
/* start the graceful reboot process */
fork_graceful_reboot ( GRACEFUL_REBOOT_DELAY );

@@ -667,7 +667,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
stop_pmon();
ilog ("Reboot (%s)", iface_name_ptr);
daemon_log ( NODE_RESET_FILE, "reboot command" );
- fork_sysreq_reboot ( delay );
+ launch_failsafe_reboot ( delay );
rc = system("/usr/bin/systemctl reboot");
}
if ( msg.cmd == MTC_CMD_LAZY_REBOOT )
@@ -696,7 +696,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
ilog ("Lazy Reboot (%s) ; now", iface_name_ptr);
}
- fork_sysreq_reboot ( delay );
+ launch_failsafe_reboot ( delay );
rc = system("/usr/bin/systemctl reboot");
}
else if ( msg.cmd == MTC_CMD_RESET )
@@ -709,7 +709,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
stop_pmon();
ilog ("Reset 'reboot -f' (%s)", iface_name_ptr);
daemon_log ( NODE_RESET_FILE, "reset command" );
- fork_sysreq_reboot ( delay/2 );
+ launch_failsafe_reboot ( delay/2 );
rc = system("/usr/bin/systemctl reboot --force");
}
else if ( msg.cmd == MTC_CMD_WIPEDISK )
@@ -725,7 +725,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
* If something goes wrong we should reboot anyway
*/
stop_pmon();
- fork_sysreq_reboot ( delay/2 );
+ launch_failsafe_reboot ( delay/2 );
/* We fork the wipedisk command as it may take upwards of 30s
* If we hold this thread for that long pmon will kill mtcClient

@@ -1460,7 +1460,7 @@ void daemon_service_run ( void )
if ( daemon_is_file_present ( NODE_RESET_FILE ) )
{
wlog ("mtce reboot required");
- fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+ launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
for ( ; ; )
{
wlog ("issuing reboot");

@@ -4664,7 +4664,7 @@ int nodeLinkClass::reboot_handler ( struct nodeLinkClass::node * node_ptr )
node_ptr->resetProgStage = MTC_RESETPROG__WAIT ;
/* Launch a backup sysreq thread */
- fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+ launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
/* Tell SM we are unhealthy so that it shuts down all its services */
daemon_log ( SMGMT_UNHEALTHY_FILE, "Active Controller Reboot request" );

@@ -0,0 +1,79 @@
#!/bin/bash
##############################################################################
#
# Copyright (c) 2025 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
##############################################################################
#
# Name : delayed_sysrq_reboot.sh
#
# Purpose : Backup reboot mechanism that triggers a SYSRQ forced reset
# after a specified delay.
#
# Usage : Used as a backup to force a reset if systemd shutdown stalls for
# too long or fails to reboot.
#
# This script is typically launched by the mtcAgent and/or mtcClient
# via `systemd-run` as an isolated transient service.
#
# Argument : Accepts a single argument: the delay in seconds before the SYSRQ reset
#
# Usage : delayed_sysrq_reboot.sh <delay_seconds>
#
##############################################################################
LOGGER_TAG=$(basename "$0")
# log to both console and syslog
function ilog {
echo "$@"
logger -t "${LOGGER_TAG}" "$@"
}
# Check if an argument is provided
if [ $# -ne 1 ]; then
ilog "Usage: $0 <seconds_to_delay>"
exit 1
fi
DELAY="$1"
# Ensure it's a positive integer between 1 and 300 seconds
if ! [[ "$DELAY" =~ ^[0-9]+$ ]] || [ "$DELAY" -le 0 ] || [ "$DELAY" -gt 300 ]; then
ilog "Error: delay must be a positive integer between 1 and 300 seconds"
exit 1
fi
# Check if script is run as root (required for /proc/sysrq-trigger)
if [ "$EUID" -ne 0 ]; then
ilog "Error: script must be run as root"
exit 1
fi
# Check for sysrq file
if [ ! -w "/proc/sysrq-trigger" ]; then
ilog "Error: /proc/sysrq-trigger is not writable"
exit 1
fi
ilog "Delaying for $DELAY seconds before issuing SysRq reboot ..."
sleep "$DELAY"
# ensure sysrq is enabled (a value of 1 enables all SysRq functions)
if [ -f "/proc/sys/kernel/sysrq" ]; then
SYSRQ_STATE=$(cat /proc/sys/kernel/sysrq)
if [ "$SYSRQ_STATE" -eq 0 ]; then
ilog "SysRq is disabled; enabling"
echo 1 > /proc/sys/kernel/sysrq
fi
fi
ilog "Triggering forced reboot via /proc/sysrq-trigger"
echo b > /proc/sysrq-trigger
# Should not get here unless reboot fails
ilog "Warning: reboot trigger failed or was blocked"
exit 1
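
The delay range check near the top of the script can be exercised on its own without root or a reboot. The following is a minimal sketch that wraps the same regular expression and 1 to 300 second bounds in a function; is_valid_delay is a hypothetical name used only for this illustration and does not exist in the script.

```shell
#!/bin/bash
# Sketch: the delay validation from the script as a standalone function.
# is_valid_delay is a hypothetical name used only for this illustration.

is_valid_delay() {
    local delay="$1"
    # digits only, then a numeric range check of 1..300 seconds
    [[ "$delay" =~ ^[0-9]+$ ]] && [ "$delay" -ge 1 ] && [ "$delay" -le 300 ]
}

# Try a mix of in-range, out-of-range and malformed values.
for candidate in 60 300 0 301 abc -5 ""; do
    if is_valid_delay "$candidate"; then
        echo "'$candidate' accepted"
    else
        echo "'$candidate' rejected"
    fi
done
```

Checking the digits-only pattern first means the numeric comparisons never see non-numeric input such as "abc" or an empty string.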