Run maintenance failsafe reboot using systemd-run
The current method used by Maintenance to launch its failsafe reboot function causes systemd shutdown to stall due to recent changes in cgroup containment behavior.

Specifically, mtcAgent runs within the 'sm.service' cgroup, while mtcClient is now launched in its own transient systemd-run cgroup. When a grandchild process is created using a double-fork and execv, it inherits the parent's cgroup unless explicitly detached. As a result, the long-lived reboot sleeper remains part of the parent's cgroup.

During shutdown, systemd waits for all processes in a service's cgroup to exit before completing the stop operation. In the case of a failsafe reboot, the grandchild sleeps for a period before issuing a SysRq-triggered reboot. This delay often exceeds systemd's default 90-second shutdown timeout, causing unnecessary delays during node reboot even though the system could otherwise shut down cleanly in less time.

This update resolves the issue by switching the failsafe reboot logic to use systemd-run. This ensures the reboot script runs in its own isolated transient unit and cgroup, fully detached from the parent service. A new `delayed_sysrq_reboot.sh` script is introduced to implement the reboot logic.

With this change, failsafe reboots work as expected without stalling systemd shutdown, whether triggered from mtcAgent or mtcClient.

Test Plan:

PASS: Verify build, install and unlock/lock/unlock of each node in
PASS: ... AIO SX (hw) and AIO DX (hw)
PASS: ... AIO DX with SX subcloud (dc-libvirt)
PASS: ... Standard 2+1+1 storage (vbox)

Unit testing of the new delayed_sysrq_reboot.sh script:
PASS: Verify reset after specified delay (success path)
PASS: Verify kernel sysrq auto enable feature (recovery path)
PASS: Verify sysrq reset is rejected when (failure path)
PASS: ... delay value is not specified
PASS: ... delay is out of range
PASS: ... called as non-root user
PASS: ... too few or too many arguments
PASS: ... the /proc/sysrq-trigger file is not present (fit)
PASS: Verify file is owned as root:root and has root only permissions
PASS: Verify behavior if /proc/sysrq-trigger does not cause a reset
PASS: Verify execution logging
PASS: Verify shellcheck static analysis
PASS: Verify handling over /var/run/.node_reset flag file detection

Updated delayed failsafe sysrq function:
PASS: Verify systemd-run command arguments
PASS: Verify sysrq reset occurs over unlock of local or remote system node after the specified delay.

General:
PASS: Verify no shutdown delay due to failsafe reboot launch over self unlock
PASS: Verify no unexpected shutdown kernel tracebacks
PASS: Verify kernel and console logging
PASS: Verify no coredumps or crashdumps during feature update testing
PASS: Verify mtcClient doesn't stall shutdown over 10 lock/unlocks of
PASS: ... standby controller-1 (AIO DX hw)
PASS: ... worker (vbox)
PASS: ... storage (vbox)

Closes-Bug: 2111280
Change-Id: I86e0191548f8f13f61960a91e4e0bbe83134cca6
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
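The cgroup inheritance described in this commit message can be observed with a small test program. The following is a minimal sketch, not part of this change, assuming a Linux host with /proc mounted; `read_own_cgroup` and `child_inherits_cgroup` are hypothetical names. It forks a child, has the child report its own /proc/self/cgroup over a pipe, and compares that with the parent's view. A plain fork (and therefore a double fork followed by execv) keeps the child in the caller's cgroup. Only asking systemd to start the command in a new unit, as systemd-run does, moves it elsewhere.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Read this process's cgroup membership; empty if /proc is unavailable.
static std::string read_own_cgroup()
{
    std::ifstream in("/proc/self/cgroup");
    std::ostringstream contents;
    if (in)
        contents << in.rdbuf();
    return contents.str();
}

// Fork a child, let it report its own cgroup over a pipe, and return true
// if it matches the parent's (i.e. the child inherited the parent's cgroup).
bool child_inherits_cgroup()
{
    const std::string parent_cgroup = read_own_cgroup();

    int fds[2];
    if (pipe(fds) != 0)
        return false;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0)
    {
        // Child: send our cgroup text to the parent, then exit immediately.
        close(fds[0]);
        const std::string child_cgroup = read_own_cgroup();
        const char *p = child_cgroup.data();
        size_t left = child_cgroup.size();
        while (left > 0)
        {
            ssize_t n = write(fds[1], p, left);
            if (n <= 0)
                _exit(1);
            p += n;
            left -= static_cast<size_t>(n);
        }
        close(fds[1]);
        _exit(0);
    }

    // Parent: collect everything the child wrote, then reap it.
    close(fds[1]);
    std::string reported;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0)
        reported.append(buf, static_cast<size_t>(n));
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
           reported == parent_cgroup;
}
```

On a systemd host, the same file read from a command started with `systemd-run --unit=<name>` would typically list a path ending in `<name>.service` rather than the caller's cgroup.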
@@ -170,6 +170,12 @@ void daemon_exit ( void );
 #define MTCAGENT_LOG_FILE ((const char *)"/var/log/mtcAgent.log")
 #define MTCCLIENT_LOG_FILE ((const char *)"/var/log/mtcClient.log")

+/* common binaries */
+#define SYSTEMD_RUN "/usr/bin/systemd-run"
+
+/* maintenance scripts */
+#define MTC_DELAYED_SYSRQ_REBOOT_SCRIPT "/usr/local/sbin/delayed_sysrq_reboot"
+
 /* supported BMC communication protocols ; access method */
 typedef enum
 {
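The grandchild only calls `exit(EXIT_FAILURE)` when `execv` fails, while the parent still logs that the failsafe reboot script was launched. As a result, a missing or non-executable `SYSTEMD_RUN` or `MTC_DELAYED_SYSRQ_REBOOT_SCRIPT` would go unnoticed. One possible pre-flight check is sketched here using access(2). It assumes the caller runs as root and that a failed check should simply be logged. `is_executable` and `failsafe_tools_present` are hypothetical helpers, not part of mtce.

```cpp
#include <cassert>
#include <unistd.h>

// True if access(2) reports execute permission for path.
bool is_executable(const char *path)
{
    return path != nullptr && access(path, X_OK) == 0;
}

// Hypothetical pre-flight check for the failsafe launch: both systemd-run
// and the reboot script must be executable by the calling process.
bool failsafe_tools_present(const char *systemd_run, const char *script)
{
    return is_executable(systemd_run) && is_executable(script);
}
```

A caller could evaluate `failsafe_tools_present(SYSTEMD_RUN, MTC_DELAYED_SYSRQ_REBOOT_SCRIPT)` before `double_fork()` and log an error if it returns false. access(2) checks the real rather than the effective user ID, which makes no difference for a daemon running as root.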
@@ -47,6 +47,9 @@ using namespace std;
 #endif
 #define __AREA__ "com"

+/* allow import of this process name */
+extern char *program_invocation_short_name;
+
 /***************************************************************************
  *
  * Name : nodeUtil_latency_log
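The imported `program_invocation_short_name` feeds the `--unit=` argument, which gives mtcAgent and mtcClient distinct transient unit names. Below is a minimal sketch of the resulting argument vector. It assumes the two paths match the `SYSTEMD_RUN` and `MTC_DELAYED_SYSRQ_REBOOT_SCRIPT` defines. `build_failsafe_argv` is a hypothetical helper that uses std::string where the real code uses fixed-size char buffers and snprintf.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Paths copied from the SYSTEMD_RUN and MTC_DELAYED_SYSRQ_REBOOT_SCRIPT defines.
static const char *const kSystemdRun   = "/usr/bin/systemd-run";
static const char *const kRebootScript = "/usr/local/sbin/delayed_sysrq_reboot";

// Hypothetical helper: builds the systemd-run command line for a process
// name (e.g. program_invocation_short_name) and a delay in seconds.
std::vector<std::string> build_failsafe_argv(const std::string &prog, int delay_in_secs)
{
    return {
        kSystemdRun,
        "--unit=" + prog + "-delayed-failsafe-reboot", // unique per process
        kRebootScript,
        std::to_string(delay_in_secs),                 // script's only argument
    };
}
```

systemd-run appends the `.service` suffix when none is given, so the unit would appear as, for example, `mtcClient-delayed-failsafe-reboot.service`. One consequence of fixed per-process names: a second launch from the same process while that unit is still active would likely be refused, because the name is already in use.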
@@ -1005,87 +1008,61 @@ int double_fork ( void )

 /***************************************************************************
  *
- * Name : fork_sysreq_reboot
+ * Name : launch_failsafe_reboot
  *
- * Purpose : Timed SYSREQ Reset service used as a backup mechanism
- *           to force a self reset after a specified period of time.
+ * Purpose : Launches a systemd-run-based timed SYSRQ reset service
+ *           used as a backup mechanism to force a self-reset after
+ *           a specified delay period (in seconds) in the event that
+ *           systemd shutdown hangs with no reset.
+ *
+ * Description: Uses double-fork to ensure child detachment. Also uses
+ *              dynamic service unit naming based on the process name to
+ *              allow coexistence across multiple invocations from other
+ *              maintenance processes, namely mtcAgent/mtcClient.
+ *
+ * Parameter: int delay_in_secs - seconds to wait before sysrq reset
  *
  **************************************************************************/

-/* This is a common utility that forces a sysreq reboot */
-void fork_sysreq_reboot ( int delay_in_secs )
+void launch_failsafe_reboot ( int delay_in_secs )
 {
     int parent = 0 ;

-    /* Fork child to do a sysreq reboot. */
+    // Double fork in prep to run MTC_DELAYED_SYSRQ_REBOOT_SCRIPT as a
+    // detached grandchild using systemd-run command. The script does
+    // the SysRq reset.
     if ( 0 > ( parent = double_fork()))
     {
         elog ("failed to fork fail-safe (backup) sysreq reboot\n");
         return ;
     }
-    else if( 0 == parent ) /* we're the child */
+    else if( 0 == parent ) /* we're the grandchild */
     {
-        int sysrq_handler_fd;
-        int sysrq_tigger_fd ;
-        size_t temp ;
+        char delay_str [MAX_CHARS_IN_INT]; /* for the int to str conversion */
+        char unit_arg [MAX_FILENAME_LEN]; /* for the dynamic unit name */

         setup_child ( false ) ;
+        // Convert the calling int parameter to a string so it can be passed to execv
+        snprintf(delay_str, sizeof(delay_str), "%d", delay_in_secs);

-        dlog ("*** Failsafe Reset Thread ***\n");
+        // Create a dynamic unit name using the current program name.
+        // Do this so that if multiple maintenance processes use this
+        // API they don't have a unit name collision.
+        snprintf(unit_arg, sizeof(unit_arg),
+                 "--unit=%s-delayed-failsafe-reboot",
+                 program_invocation_short_name);

-        /* Commented this out because blocking SIGTERM in systemd environment
-         * causes any processes that spawn this sysreq will stall shutdown
-         *
-         * sigset_t mask , mask_orig ;
-         * sigemptyset (&mask);
-         * sigaddset (&mask, SIGTERM );
-         * sigprocmask (SIG_BLOCK, &mask, &mask_orig );
-         *
-         */
-
-        // Enable sysrq handling.
-        sysrq_handler_fd = open( "/proc/sys/kernel/sysrq", O_RDWR | O_CLOEXEC );
-        if( 0 > sysrq_handler_fd )
-        {
-            elog ( "failed sysrq_handler open\n");
-            return ;
-        }
-
-        temp = write( sysrq_handler_fd, "1", 1 );
-        close( sysrq_handler_fd );
-
-        for ( int i = delay_in_secs ; i >= 0 ; --i )
-        {
-            sleep (1);
-            {
-                if ( 0 == (i % 5) )
-                {
-                    dlog ( "sysrq reset in %d seconds\n", i );
-                }
-            }
-        }
-
-        // Trigger sysrq command.
-        sysrq_tigger_fd = open( "/proc/sysrq-trigger", O_RDWR | O_CLOEXEC );
-        if( 0 > sysrq_tigger_fd )
-        {
-            elog ( "failed sysrq_trigger open\n");
-            return ;
-        }
-
-        temp = write( sysrq_tigger_fd, "b", 1 );
-        close( sysrq_tigger_fd );
-
-        dlog ( "sysreq rc:%ld\n", temp );
-
-        UNUSED(temp);
-
-        sleep (10);
-
-        // Shouldn't get this far, else there was an error.
-        exit(-1);
+        const char *cmd[] = {
+            SYSTEMD_RUN,
+            unit_arg,
+            MTC_DELAYED_SYSRQ_REBOOT_SCRIPT,
+            delay_str,
+            NULL
+        };
+        execv(cmd[0], (char * const *)cmd);
+        exit(EXIT_FAILURE);
     }
-    ilog ("Forked Fail-Safe (Backup) Reboot Action\n");
+    ilog ("failsafe reboot script launched ; reboot in %d seconds ; calling pid:%d",
+          delay_in_secs, parent);
 }

 /***************************************************************************
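The launch itself follows the classic double-fork pattern. The sketch below is a generic, self-contained version under stated assumptions. It is not mtce's `double_fork()`, whose body is not part of this diff. The intermediate child calls setsid(), forks the grandchild and exits. The caller reaps the intermediate child so no zombie remains, and the grandchild execs the given argv. `spawn_detached` is a hypothetical name. The grandchild still inherits the caller's cgroup, which is why the real code execs `systemd-run` rather than the reboot script directly.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs args[0] (an absolute path) with args as argv in a detached
// grandchild. Returns the intermediate child's pid once it has been reaped,
// or -1 on failure. The grandchild is reparented to init (or the nearest
// subreaper) but is NOT moved out of the caller's cgroup.
pid_t spawn_detached(const std::vector<std::string> &args)
{
    if (args.empty())
        return -1;

    // Build argv before forking; the pointers stay valid in both children.
    std::vector<char *> argv;
    for (const std::string &a : args)
        argv.push_back(const_cast<char *>(a.c_str()));
    argv.push_back(nullptr);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0)
    {
        // Intermediate child: new session, fork the grandchild, then exit so
        // the grandchild is orphaned and can no longer be waited on by the caller.
        setsid();
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(EXIT_FAILURE);
        if (grandchild == 0)
        {
            execv(argv[0], argv.data());
            _exit(127); // reached only if execv() failed
        }
        _exit(EXIT_SUCCESS);
    }

    // Original process: reap the intermediate child so it does not linger
    // as a zombie, and report whether the grandchild was created.
    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != EXIT_SUCCESS)
        return -1;
    return child;
}
```

As in `launch_failsafe_reboot`, a positive return only means the grandchild was created. An execv failure in the grandchild (exit status 127 here) is not visible to the caller.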
@@ -149,7 +149,7 @@ int load_filenames_in_dir ( const char * directory, std::list<string> & filelis
 int double_fork ( void );
 int double_fork_host_cmd ( string hostname , char * cmd_string, const char * cmd_oper );
 int setup_child ( bool close_file_descriptors );
-void fork_sysreq_reboot ( int delay_in_secs );
+void launch_failsafe_reboot ( int delay_in_secs );
 void fork_graceful_reboot ( int delay_in_secs );

 int get_node_health ( string hostname );
@@ -33,6 +33,7 @@ usr/local/bin/mtcClient
 usr/local/bin/mtcalarmd
 usr/local/bin/mtclogd
 usr/local/bin/wipedisk
+usr/local/sbin/delayed_sysrq_reboot
 usr/sbin/crash-dump-manager
 usr/sbin/dmemchk.sh
 usr/sbin/fsync
@@ -135,6 +135,9 @@ override_dh_auto_install:
 	install -m 755 -d $(COLLECTDIR)
 	install -m 755 -p -D scripts/collect_bmc.sh $(COLLECTDIR)/collect_bmc

+	# general scripts
+	install -m 700 -p -D scripts/delayed_sysrq_reboot.sh $(LOCAL_SBINDIR)/delayed_sysrq_reboot
+
 	# syslog configuration
 	install -m 644 -p -D scripts/mtce.syslog $(SYSCONFDIR)/syslog-ng/conf.d/mtce.conf

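The rule above installs the script with mode 700, and the test plan verifies root:root ownership with root-only permissions. Below is a minimal stat(2)-based sketch of that check. `mode_is_owner_only` and `has_root_only_perms` are hypothetical helpers. Requiring the owner execute bit is an assumption about what "root only" should mean for an executable script.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <sys/types.h>

// True if no group or other permission bits are set (e.g. 0700 or 0500).
bool mode_is_owner_only(mode_t mode)
{
    return (mode & (S_IRWXG | S_IRWXO)) == 0;
}

// True if path is a regular file owned by root:root, executable by its
// owner, and granting no access to group or others.
bool has_root_only_perms(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;
    return S_ISREG(st.st_mode) && st.st_uid == 0 && st.st_gid == 0 &&
           (st.st_mode & S_IXUSR) != 0 && mode_is_owner_only(st.st_mode);
}
```

On an installed node, `has_root_only_perms("/usr/local/sbin/delayed_sysrq_reboot")` would be expected to return true given the `install -m 700` rule and a root-owned package install.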
@@ -1643,7 +1643,7 @@ int nodeLinkClass::lazy_graceful_fs_reboot ( struct nodeLinkClass::node * node_p
 {
     /* issue a lazy reboot to the mtcClient and as a backup launch a sysreq reset thresd */
     send_mtc_cmd ( node_ptr->hostname, MTC_CMD_LAZY_REBOOT, MGMNT_INTERFACE ) ;
-    fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+    launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );

     /* loop until reboot */
     for ( ; ; )
@@ -3114,7 +3114,7 @@ int nodeLinkClass::add_host ( node_inv_type & inv )
         if ( delay > 0 )
         {
             mtcTimer_start ( node_ptr->mtcTimer, mtcTimer_handler, delay );
-            ilog ("Host add delay is %d seconds", delay );
+            ilog ("%s Host add delay is %d seconds", node_ptr->hostname.c_str(), delay );
             node_ptr->addStage = MTC_ADD__START_DELAY ;
         }
         else
@@ -174,7 +174,7 @@ int hbs_self_recovery ( unsigned int cmd )
     /* Forced Self Reset Now */
     else if ( cmd == STALL_SYSREQ_CMD )
     {
-        fork_sysreq_reboot ( 60 ) ;
+        launch_failsafe_reboot ( 60 ) ;

         /* parent returns */
         return (PASS);
@@ -443,7 +443,7 @@ void hostw_log_and_reboot()

     /* start the process that will perform an ungraceful reboot, if
      * the graceful reboot fails */
-    fork_sysreq_reboot ( FORCE_REBOOT_DELAY );
+    launch_failsafe_reboot ( FORCE_REBOOT_DELAY );

     /* start the graceful reboot process */
     fork_graceful_reboot ( GRACEFUL_REBOOT_DELAY );
@@ -667,7 +667,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
             stop_pmon();
             ilog ("Reboot (%s)", iface_name_ptr);
             daemon_log ( NODE_RESET_FILE, "reboot command" );
-            fork_sysreq_reboot ( delay );
+            launch_failsafe_reboot ( delay );
             rc = system("/usr/bin/systemctl reboot");
         }
         if ( msg.cmd == MTC_CMD_LAZY_REBOOT )
@@ -696,7 +696,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
                 ilog ("Lazy Reboot (%s) ; now", iface_name_ptr);
             }

-            fork_sysreq_reboot ( delay );
+            launch_failsafe_reboot ( delay );
             rc = system("/usr/bin/systemctl reboot");
         }
         else if ( msg.cmd == MTC_CMD_RESET )
@@ -709,7 +709,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
             stop_pmon();
             ilog ("Reset 'reboot -f' (%s)", iface_name_ptr);
             daemon_log ( NODE_RESET_FILE, "reset command" );
-            fork_sysreq_reboot ( delay/2 );
+            launch_failsafe_reboot ( delay/2 );
             rc = system("/usr/bin/systemctl reboot --force");
         }
         else if ( msg.cmd == MTC_CMD_WIPEDISK )
@@ -725,7 +725,7 @@ int mtc_service_command ( mtc_socket_type * sock_ptr, int interface )
              * If something goes wrong we should reboot anyway
              */
             stop_pmon();
-            fork_sysreq_reboot ( delay/2 );
+            launch_failsafe_reboot ( delay/2 );

             /* We fork the wipedisk command as it may take upwards of 30s
              * If we hold this thread for that long pmon will kill mtcClient
@@ -1460,7 +1460,7 @@ void daemon_service_run ( void )
         if ( daemon_is_file_present ( NODE_RESET_FILE ) )
         {
             wlog ("mtce reboot required");
-            fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+            launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
             for ( ; ; )
             {
                 wlog ("issuing reboot");
@@ -4664,7 +4664,7 @@ int nodeLinkClass::reboot_handler ( struct nodeLinkClass::node * node_ptr )
             node_ptr->resetProgStage = MTC_RESETPROG__WAIT ;

             /* Launch a backup sysreq thread */
-            fork_sysreq_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );
+            launch_failsafe_reboot ( daemon_get_cfg_ptr()->failsafe_shutdown_delay );

             /* Tell SM we are unhealthy so that it shuts down all its services */
             daemon_log ( SMGMT_UNHEALTHY_FILE, "Active Controller Reboot request" );
mtce/src/scripts/delayed_sysrq_reboot.sh (executable file, 79 lines)
@@ -0,0 +1,79 @@
+#!/bin/bash
+##############################################################################
+#
+# Copyright (c) 2025 Wind River Systems, Inc.
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+##############################################################################
+#
+# Name     : delayed_sysrq_reboot.sh
+#
+# Purpose  : Backup reboot mechanism that triggers a SYSRQ forced reset
+#            after a specified delay.
+#
+# Usage    : Used as a backup to force a reset if systemd shutdown stalls for
+#            too long or fails to reboot.
+#
+#            This script is typically launched by the mtcAgent and/or mtcClient
+#            via `systemd-run` as an isolated transient service.
+#
+# Argument : A single argument: the delay, in seconds, before the SYSRQ reset.
+#
+# Usage    : delayed_sysrq_reboot.sh <delay_seconds>
+#
+##############################################################################
+
+LOGGER_TAG=$(basename "$0")
+
+# log to both console and syslog
+function ilog {
+    echo "$@"
+    logger -t "${LOGGER_TAG}" "$@"
+}
+
+# Check if an argument is provided
+if [ $# -ne 1 ]; then
+    ilog "Usage: $0 <seconds_to_delay>"
+    exit 1
+fi
+
+DELAY="$1"
+
+# Ensure it's a positive integer between 1 and 300 seconds
+if ! [[ "$DELAY" =~ ^[0-9]+$ ]] || [ "$DELAY" -le 0 ] || [ "$DELAY" -gt 300 ]; then
+    ilog "Error: delay must be a positive integer between 1 and 300 seconds"
+    exit 1
+fi
+
+# Check if script is run as root (required for /proc/sysrq-trigger)
+if [ "$EUID" -ne 0 ]; then
+    ilog "Error: script must be run as root"
+    exit 1
+fi
+
+# Check for sysrq file
+if [ ! -w "/proc/sysrq-trigger" ]; then
+    ilog "Error: /proc/sysrq-trigger is not writable"
+    exit 1
+fi
+
+ilog "Delaying for $DELAY seconds before issuing SysRq reboot ..."
+sleep "$DELAY"
+
+# ensure sysrq is enabled (1 = enable all SysRq functions)
+if [ -f "/proc/sys/kernel/sysrq" ]; then
+    SYSRQ_STATE=$(cat /proc/sys/kernel/sysrq)
+    if [ "$SYSRQ_STATE" -eq 0 ]; then
+        ilog "SysRq is disabled; enabling"
+        echo 1 > /proc/sys/kernel/sysrq
+    fi
+fi
+
+ilog "Triggering forced reboot via /proc/sysrq-trigger"
+echo b > /proc/sysrq-trigger
+
+# Should not get here unless reboot fails
+ilog "Warning: reboot trigger failed or was blocked"
+
+exit 1
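The script accepts only a decimal delay from 1 to 300 seconds and exits without resetting otherwise (it only logs an error). Several callers pass computed values such as `delay/2`, `FORCE_REBOOT_DELAY` or the configured `failsafe_shutdown_delay`. If one of those falls outside the range, the backup reset never happens. A caller-side check that mirrors the script's rules could be written as follows. `failsafe_delay_is_valid` and the two limit constants are assumptions copied from the script, not an existing mtce API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Limits copied from delayed_sysrq_reboot.sh.
constexpr long kMinFailsafeDelay = 1;
constexpr long kMaxFailsafeDelay = 300;

// Mirrors the script's checks: digits only (no sign or spaces), value in [1, 300].
bool failsafe_delay_is_valid(const std::string &arg)
{
    if (arg.empty())
        return false;
    long value = 0;
    for (unsigned char c : arg)
    {
        if (!std::isdigit(c))
            return false;
        // Saturate just above the limit so very long inputs cannot overflow.
        value = std::min(value * 10 + (c - '0'), kMaxFailsafeDelay + 1);
    }
    return value >= kMinFailsafeDelay && value <= kMaxFailsafeDelay;
}

// Convenience overload for the int delay the C++ callers actually hold.
bool failsafe_delay_is_valid(int delay_in_secs)
{
    return failsafe_delay_is_valid(std::to_string(delay_in_secs));
}
```

Leading zeros are accepted, as they are by the script's `^[0-9]+$` test. Negative ints are rejected because their string form starts with a minus sign.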