Run maintenance failsafe reboot using systemd-run

The current method used by Maintenance to launch its failsafe reboot
function causes systemd shutdown to stall due to recent changes in
cgroup containment behavior. Specifically, mtcAgent runs within the
'sm.service' cgroup, while mtcClient is now launched in its own
transient systemd-run cgroup.

When a grandchild process is created using a double-fork and execv,
it inherits the parent's cgroup unless explicitly detached. As a
result, the long-lived reboot sleeper remains part of the parent's
cgroup. During shutdown, systemd waits for all processes in a
service's cgroup to exit before completing the stop operation.
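
The inheritance described above can be observed directly. The following is
a minimal sketch, assuming a Linux host where /proc/<pid>/cgroup is
readable; the helper name cgroup_inherited is hypothetical and is not part
of this change.

```shell
#!/bin/bash
# Hypothetical helper: report whether a forked child ends up in the same
# cgroup as this shell. Prints "same", "different" or "unavailable".
cgroup_inherited() {
    local parent_cg child_cg
    # /proc/<pid>/cgroup lists the cgroup membership of that process.
    if [ ! -r "/proc/$$/cgroup" ]; then
        echo "unavailable"
        return 0
    fi
    parent_cg=$(cat "/proc/$$/cgroup")
    # A plain fork/exec child: nothing here moves it to another cgroup.
    child_cg=$(sh -c 'cat /proc/self/cgroup')
    if [ "$parent_cg" = "$child_cg" ]; then
        echo "same"
    else
        echo "different"
    fi
}
```

Calling cgroup_inherited from a shell is expected to report "same", which
is the situation the double-forked reboot sleeper was in.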

In the case of a failsafe reboot, the grandchild sleeps for a period
before issuing a SysRq-triggered reboot. This delay often exceeds
systemd's default 90-second shutdown timeout, causing unnecessary
delays during node reboot even though the system could otherwise
shut down cleanly in less time.

This update resolves the issue by switching the failsafe reboot logic
to use systemd-run. This ensures the reboot script runs in its own
isolated transient unit and cgroup, fully detached from the parent
service.
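
As a rough illustration of the resulting command line, the sketch below
prints (but does not run) the systemd-run invocation for a given caller and
delay. The two paths and the unit name suffix are taken from this change;
the helper name failsafe_reboot_cmd is hypothetical.

```shell
#!/bin/bash
# Paths as defined by this change.
SYSTEMD_RUN="/usr/bin/systemd-run"
REBOOT_SCRIPT="/usr/local/sbin/delayed_sysrq_reboot"

# Hypothetical dry-run helper: print the command line that would launch
# the failsafe reboot for the given caller name and delay in seconds.
failsafe_reboot_cmd() {
    local caller="$1" delay="$2"
    # One unit name per calling process avoids unit name collisions.
    printf '%s --unit=%s-delayed-failsafe-reboot %s %s\n' \
        "$SYSTEMD_RUN" "$caller" "$REBOOT_SCRIPT" "$delay"
}
```

For example, `failsafe_reboot_cmd mtcClient 120` prints the command for
mtcClient. If that command were run as root, the resulting transient unit
could be inspected with
`systemctl status mtcClient-delayed-failsafe-reboot.service`.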

A new `delayed_sysrq_reboot.sh` script is introduced to implement
the reboot logic. With this change, failsafe reboots now work as
expected without stalling systemd shutdown, whether triggered from
mtcAgent or mtcClient.

Test Plan:

PASS: Verify build, install and unlock/lock/unlock of each node in
PASS: ... AIO SX (hw) and AIO DX (hw)
PASS: ... AIO DX with SX subcloud (dc-libvirt)
PASS: ... Standard 2+1+1 storage (vbox)

Unit testing of the new delayed_sysrq_reboot.sh script

PASS: Verify reset after specified delay (success path)
PASS: Verify kernel sysrq auto enable feature (recovery path)
PASS: Verify sysrq reset is rejected (failure path) when
PASS: ... delay value is not specified
PASS: ... delay is out of range
PASS: ... called as non-root user
PASS: ... too few or many arguments
PASS: ... the /proc/sysrq-trigger file is not present (fit)

PASS: Verify file is owned by root:root and has root-only permissions
PASS: Verify behavior if /proc/sysrq-trigger does not cause a reset
PASS: Verify execution logging
PASS: Verify shell check static analysis
PASS: Verify handling over /var/run/.node_reset flag file detection
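
The argument rejection cases listed above can be exercised without root or
a real reset by factoring the checks into a function. This is only a
sketch: the helper name validate_delay is hypothetical, and the digit-count
guard is an extra assumption added here so that oversized values cannot
slip past the integer comparison.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's argument checks.
# Returns 0 when exactly one argument is given and it is an integer
# in the range 1..300 seconds, otherwise returns 1.
validate_delay() {
    [ $# -eq 1 ] || return 1
    local delay="$1"
    # Reject empty strings and anything containing a non-digit.
    case "$delay" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Assumed extra guard: at most 3 digits, so the comparisons below
    # never see a value too large for the shell's integer test.
    [ "${#delay}" -le 3 ] || return 1
    [ "$delay" -ge 1 ] || return 1
    [ "$delay" -le 300 ] || return 1
    return 0
}
```

A caller would typically log a usage message and exit non-zero when
validate_delay fails, as the script itself does.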

Updated delayed failsafe sysrq function

PASS: Verify systemd-run command arguments
PASS: Verify sysrq reset occurs over unlock of local or remote system
      node after the specified delay.

General:

PASS: Verify no shutdown delay due to failsafe reboot launch
      over self unlock
PASS: Verify no unexpected shutdown kernel tracebacks
PASS: Verify kernel and console logging
PASS: Verify no coredumps or crashdumps during feature update testing
PASS: Verify mtcClient doesn't stall shutdown over 10 lock/unlocks of
      ... standby controller-1 (AIO DX hw)
PASS: ... worker (vbox)
PASS: ... storage (vbox)

Closes-Bug: 2111280
Change-Id: I86e0191548f8f13f61960a91e4e0bbe83134cca6
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Eric MacDonald
2025-05-19 21:20:36 +00:00
parent 8850c25580
commit 9b2fb85c30
12 changed files with 141 additions and 75 deletions

@@ -170,6 +170,12 @@ void daemon_exit ( void );
 #define MTCAGENT_LOG_FILE ((const char *)"/var/log/mtcAgent.log")
 #define MTCCLIENT_LOG_FILE ((const char *)"/var/log/mtcClient.log")

+/* common binaries */
+#define SYSTEMD_RUN "/usr/bin/systemd-run"
+
+/* maintenance scripts */
+#define MTC_DELAYED_SYSRQ_REBOOT_SCRIPT "/usr/local/sbin/delayed_sysrq_reboot"
+
 /* supported BMC communication protocols ; access method */
 typedef enum
 {

@@ -47,6 +47,9 @@ using namespace std;
 #endif
 #define __AREA__ "com"

+/* allow import of this process name */
+extern char *program_invocation_short_name;
+
 /***************************************************************************
  *
  * Name : nodeUtil_latency_log
@@ -1005,87 +1008,61 @@ int double_fork ( void )
 /***************************************************************************
  *
- * Name : fork_sysreq_reboot
+ * Name : launch_failsafe_reboot
  *
- * Purpose : Timed SYSREQ Reset service used as a backup mechanism
- * to force a self reset after a specified period of time.
+ * Purpose : Launches a systemd-run-based timed SYSRQ reset service
+ * used as a backup mechanism to force a self-reset after
+ * a specified delay period (in seconds) in the event that
+ * systemd shutdown hangs with no reset.
+ *
+ * Description: Uses double-fork to ensure child detachment. Also uses
+ * dynamic service unit naming based on the process name to
+ * allow coexistence across multiple invocations from other
+ * maintenance processes, namely mtcAgent/mtcClient.
+ *
+ * Parameter: int delay_in_secs - seconds to wait before sysrq reset
  *
  **************************************************************************/
-/* This is a common utility that forces a sysreq reboot */
-void fork_sysreq_reboot ( int delay_in_secs )
+void launch_failsafe_reboot ( int delay_in_secs )
 {
     int parent = 0 ;

-    /* Fork child to do a sysreq reboot. */
+    // Double fork in prep to run MTC_DELAYED_SYSRQ_REBOOT_SCRIPT as a
+    // detached grandchild using systemd-run command. The script does
+    // the SysRq reset.
     if ( 0 > ( parent = double_fork()))
     {
         elog ("failed to fork fail-safe (backup) sysreq reboot\n");
         return ;
     }
-    else if( 0 == parent ) /* we're the child */
+    else if( 0 == parent ) /* we're the grandchild */
     {
-        int sysrq_handler_fd;
-        int sysrq_tigger_fd ;
-        size_t temp ;
-
-        setup_child ( false ) ;
-
-        dlog ("*** Failsafe Reset Thread ***\n");
-
-        /* Commented this out because blocking SIGTERM in systemd environment
-         * causes any processes that spawn this sysreq will stall shutdown
-         *
-         * sigset_t mask , mask_orig ;
-         * sigemptyset (&mask);
-         * sigaddset (&mask, SIGTERM );
-         * sigprocmask (SIG_BLOCK, &mask, &mask_orig );
-         *
-         */
-
-        // Enable sysrq handling.
-        sysrq_handler_fd = open( "/proc/sys/kernel/sysrq", O_RDWR | O_CLOEXEC );
-        if( 0 > sysrq_handler_fd )
-        {
-            elog ( "failed sysrq_handler open\n");
-            return ;
-        }
-
-        temp = write( sysrq_handler_fd, "1", 1 );
-        close( sysrq_handler_fd );
-
-        for ( int i = delay_in_secs ; i >= 0 ; --i )
-        {
-            sleep (1);
-            {
-                if ( 0 == (i % 5) )
-                {
-                    dlog ( "sysrq reset in %d seconds\n", i );
-                }
-            }
-        }
-
-        // Trigger sysrq command.
-        sysrq_tigger_fd = open( "/proc/sysrq-trigger", O_RDWR | O_CLOEXEC );
-        if( 0 > sysrq_tigger_fd )
-        {
-            elog ( "failed sysrq_trigger open\n");
-            return ;
-        }
-
-        temp = write( sysrq_tigger_fd, "b", 1 );
-        close( sysrq_tigger_fd );
-        dlog ( "sysreq rc:%ld\n", temp );
-        UNUSED(temp);
-        sleep (10);
-
-        // Shouldn't get this far, else there was an error.
-        exit(-1);
-    }
-
-    ilog ("Forked Fail-Safe (Backup) Reboot Action\n");
+        char delay_str [MAX_CHARS_IN_INT]; /* for the int to str conversion */
+        char unit_arg [MAX_FILENAME_LEN]; /* for the dynamic unit name */
+
+        // Convert the calling int parameter to a string so it can be passed to execv
+        snprintf(delay_str, sizeof(delay_str), "%d", delay_in_secs);
+
+        // Create a dynamic unit name using the current program name.
+        // Do this so that if multiple maintenance processes use this
+        // API they don't have a unit name collision.
+        snprintf(unit_arg, sizeof(unit_arg),
+                 "--unit=%s-delayed-failsafe-reboot",
+                 program_invocation_short_name);
+
+        const char *cmd[] = {
+            SYSTEMD_RUN,
+            unit_arg,
+            MTC_DELAYED_SYSRQ_REBOOT_SCRIPT,
+            delay_str,
+            NULL
+        };
+        execv(cmd[0], (char * const *)cmd);
+        exit(EXIT_FAILURE);
+    }
+    ilog ("failsafe reboot script launched ; reboot in %d seconds ; calling pid:%d",
+          delay_in_secs, parent);
 }

 /***************************************************************************

@@ -149,7 +149,7 @@ int load_filenames_in_dir ( const char * directory, std::list<string> & filelis
 int double_fork ( void );
 int double_fork_host_cmd ( string hostname , char * cmd_string, const char * cmd_oper );
 int setup_child ( bool close_file_descriptors );
-void fork_sysreq_reboot ( int delay_in_secs );
+void launch_failsafe_reboot ( int delay_in_secs );
 void fork_graceful_reboot ( int delay_in_secs );
 int get_node_health ( string hostname );

@@ -33,6 +33,7 @@ usr/local/bin/mtcClient
 usr/local/bin/mtcalarmd
 usr/local/bin/mtclogd
 usr/local/bin/wipedisk
+usr/local/sbin/delayed_sysrq_reboot
 usr/sbin/crash-dump-manager
 usr/sbin/dmemchk.sh
 usr/sbin/fsync

@@ -135,6 +135,9 @@ override_dh_auto_install:
 	install -m 755 -d $(COLLECTDIR)
 	install -m 755 -p -D scripts/collect_bmc.sh $(COLLECTDIR)/collect_bmc

+	# general scripts
+	install -m 700 -p -D scripts/delayed_sysrq_reboot.sh $(LOCAL_SBINDIR)/delayed_sysrq_reboot
+
 	# syslog configuration
 	install -m 644 -p -D scripts/mtce.syslog $(SYSCONFDIR)/syslog-ng/conf.d/mtce.conf

@@ -1643,7 +1643,7 @@ int nodeLinkClass::lazy_graceful_fs_reboot ( struct nodeLinkClass::node * node_p
     {
         /* issue a lazy reboot to the mtcClient and as a backup launch a sysreq reset thresd */
         send_mtc_cmd ( node_ptr->hostname, MTC_CMD_LAZY_REBOOT, MGMNT_INTERFACE ) ;
-        fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+        launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );

         /* loop until reboot */
         for ( ; ; )
@@ -3114,7 +3114,7 @@ int nodeLinkClass::add_host ( node_inv_type & inv )
         if ( delay > 0 )
         {
             mtcTimer_start ( node_ptr->mtcTimer, mtcTimer_handler, delay );
-            ilog ("Host add delay is %d seconds", delay );
+            ilog ("%s Host add delay is %d seconds", node_ptr->hostname.c_str(), delay );
             node_ptr->addStage = MTC_ADD__START_DELAY ;
         }
         else

@@ -174,7 +174,7 @@ int hbs_self_recovery ( unsigned int cmd )
     /* Forced Self Reset Now */
     else if ( cmd == STALL_SYSREQ_CMD )
     {
-        fork_sysreq_reboot ( 60 ) ;
+        launch_failsafe_reboot ( 60 ) ;

         /* parent returns */
         return (PASS);

@@ -443,7 +443,7 @@ void hostw_log_and_reboot()
     /* start the process that will perform an ungraceful reboot, if
      * the graceful reboot fails */
-    fork_sysreq_reboot ( FORCE_REBOOT_DELAY );
+    launch_failsafe_reboot ( FORCE_REBOOT_DELAY );

     /* start the graceful reboot process */
     fork_graceful_reboot ( GRACEFUL_REBOOT_DELAY );

@@ -667,7 +667,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
             stop_pmon();
             ilog ("Reboot (%s)", iface_name_ptr);
             daemon_log ( NODE_RESET_FILE, "reboot command" );
-            fork_sysreq_reboot ( delay );
+            launch_failsafe_reboot ( delay );
             rc = system("/usr/bin/systemctl reboot");
         }
         if ( msg.cmd == MTC_CMD_LAZY_REBOOT )
@@ -696,7 +696,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
                 ilog ("Lazy Reboot (%s) ; now", iface_name_ptr);
             }
-            fork_sysreq_reboot ( delay );
+            launch_failsafe_reboot ( delay );
             rc = system("/usr/bin/systemctl reboot");
         }
         else if ( msg.cmd == MTC_CMD_RESET )
@@ -709,7 +709,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
             stop_pmon();
             ilog ("Reset 'reboot -f' (%s)", iface_name_ptr);
             daemon_log ( NODE_RESET_FILE, "reset command" );
-            fork_sysreq_reboot ( delay/2 );
+            launch_failsafe_reboot ( delay/2 );
             rc = system("/usr/bin/systemctl reboot --force");
         }
         else if ( msg.cmd == MTC_CMD_WIPEDISK )
@@ -725,7 +725,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
              * If something goes wrong we should reboot anyway
              */
             stop_pmon();
-            fork_sysreq_reboot ( delay/2 );
+            launch_failsafe_reboot ( delay/2 );

             /* We fork the wipedisk command as it may take upwards of 30s
              * If we hold this thread for that long pmon will kill mtcClient

@@ -1460,7 +1460,7 @@ void daemon_service_run ( void )
         if ( daemon_is_file_present ( NODE_RESET_FILE ) )
         {
             wlog ("mtce reboot required");
-            fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+            launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
             for ( ; ; )
             {
                 wlog ("issuing reboot");

@@ -4664,7 +4664,7 @@ int nodeLinkClass::reboot_handler ( struct nodeLinkClass::node * node_ptr )
             node_ptr->resetProgStage = MTC_RESETPROG__WAIT ;

             /* Launch a backup sysreq thread */
-            fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+            launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );

             /* Tell SM we are unhealthy so that it shuts down all its services */
             daemon_log ( SMGMT_UNHEALTHY_FILE, "Active Controller Reboot request" );

@@ -0,0 +1,79 @@
+#!/bin/bash
+##############################################################################
+#
+# Copyright (c) 2025 Wind River Systems, Inc.
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+##############################################################################
+#
+# Name       : delayed_sysrq_reboot.sh
+#
+# Purpose    : Backup reboot mechanism that triggers a SYSRQ forced reset
+#              after a specified delay.
+#
+# Description: Used as a backup to force a reset if systemd shutdown stalls
+#              for too long or fails to reboot.
+#
+#              This script is typically launched by the mtcAgent and/or
+#              mtcClient via `systemd-run` as an isolated transient service.
+#
+# Argument   : Accepts a single argument that specifies the delay, in
+#              seconds, before the SYSRQ reset is issued.
+#
+# Usage      : delayed_sysrq_reboot.sh <delay_seconds>
+#
+##############################################################################
+
+LOGGER_TAG=$(basename "$0")
+
+# log to both console and syslog
+function ilog {
+    echo "$@"
+    logger -t "${LOGGER_TAG}" "$@"
+}
+
+# Check that exactly one argument is provided
+if [ $# -ne 1 ]; then
+    ilog "Usage: $0 <seconds_to_delay>"
+    exit 1
+fi
+
+DELAY="$1"
+
+# Ensure it's a positive integer between 1 and 300 seconds
+if ! [[ "$DELAY" =~ ^[0-9]+$ ]] || [ "$DELAY" -le 0 ] || [ "$DELAY" -gt 300 ]; then
+    ilog "Error: delay must be a positive integer between 1 and 300 seconds"
+    exit 1
+fi
+
+# Check if script is run as root (required for /proc/sysrq-trigger)
+if [ "$EUID" -ne 0 ]; then
+    ilog "Error: script must be run as root"
+    exit 1
+fi
+
+# Check for sysrq file
+if [ ! -w "/proc/sysrq-trigger" ]; then
+    ilog "Error: /proc/sysrq-trigger is not writable"
+    exit 1
+fi
+
+ilog "Delaying for $DELAY seconds before issuing SysRq reboot ..."
+sleep "$DELAY"
+
+# ensure sysrq is enabled (writing 1 enables all SysRq functions)
+if [ -f "/proc/sys/kernel/sysrq" ]; then
+    SYSRQ_STATE=$(cat /proc/sys/kernel/sysrq)
+    if [ "$SYSRQ_STATE" -eq 0 ]; then
+        ilog "SysRq is disabled; enabling"
+        echo 1 > /proc/sys/kernel/sysrq
+    fi
+fi
+
+ilog "Triggering forced reboot via /proc/sysrq-trigger"
+echo b > /proc/sysrq-trigger
+
+# Should not get here unless reboot fails
+ilog "Warning: reboot trigger failed or was blocked"
+exit 1
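
The "sysrq auto enable" recovery path in the script above can be tried
safely by pointing the same logic at an ordinary file instead of
/proc/sys/kernel/sysrq. This is a sketch under that assumption only; the
helper name ensure_sysrq_enabled and its optional path argument are
hypothetical and do not exist in the script.

```shell
#!/bin/bash
# Hypothetical helper: enable SysRq only when it is fully disabled.
# The control file path is a parameter so a temp file can stand in for
# /proc/sys/kernel/sysrq during testing.
ensure_sysrq_enabled() {
    local ctl="${1:-/proc/sys/kernel/sysrq}"
    local state
    # Nothing to do (and nothing to write) if the control file is absent.
    [ -f "$ctl" ] || return 1
    state=$(cat "$ctl")
    if [ "$state" -eq 0 ]; then
        # Same action as the script: write 1 to enable SysRq.
        echo 1 > "$ctl"
    fi
    return 0
}
```

With a temp file containing 0, the call rewrites it to 1; any non-zero
value is left untouched, which matches the script's behavior of only
intervening when SysRq is disabled.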