kernel/kernel-rt
Jiping Ma b541465cc3 kernel-rt: beware of __put_task_struct() calling context
Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from a non-preemptible context.

Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but because
PREEMPT_RT does not allow dynamic memory allocation from atomic
context, the code would become more complex: we would need to embed
the work_struct instance in the task_struct and initialize it whenever
a new task_struct is allocated.
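The deferral pattern can be sketched in userspace C. This is not the kernel patch: every name here (struct cb_head, defer_call(), run_deferred(), put_task(), in_atomic_ctx) is a hypothetical stand-in. cb_head plays roughly the role of struct rcu_head, defer_call() that of call_rcu(), and in_atomic_ctx that of !preemptible(). The sketch shows why embedding the callback node in the object lets the final put be postponed from atomic context without allocating memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head: a callback node embedded
 * in the object itself, so deferring needs no allocation. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified task_struct. */
struct task {
	int usage;          /* reference count */
	struct cb_head rcu; /* embedded deferral node */
};

static struct cb_head *deferred_list; /* pending callbacks */
static bool in_atomic_ctx;            /* simulates !preemptible() */
static int tasks_freed;               /* instrumentation for the demo */

/* Stand-in for __put_task_struct(): under PREEMPT_RT the real one may
 * take sleeping locks, so it must never run in atomic context. */
static void free_task(struct task *t)
{
	assert(!in_atomic_ctx);
	free(t);
	tasks_freed++;
}

static void free_task_cb(struct cb_head *head)
{
	free_task(container_of(head, struct task, rcu));
}

/* Stand-in for call_rcu(): only links the embedded node, never allocates. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = deferred_list;
	deferred_list = head;
}

/* Runs pending callbacks later, from a context where sleeping is allowed
 * (loosely what RCU does once a grace period has elapsed). */
static void run_deferred(void)
{
	while (deferred_list) {
		struct cb_head *head = deferred_list;
		deferred_list = head->next;
		head->func(head);
	}
}

/* Drops one reference; the final put is deferred when in atomic context. */
static void put_task(struct task *t)
{
	assert(t->usage > 0);
	if (--t->usage != 0)
		return;
	if (in_atomic_ctx)
		defer_call(&t->rcu, free_task_cb);
	else
		free_task(t);
}
```

Because the node lives inside struct task, defer_call() cannot fail for lack of memory. A workqueue variant would need the same trick with an embedded work_struct plus initialization at allocation time, which is the extra complexity described above.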

We hit the same panic five times: __put_task_struct() was called while
the task was holding a lock, which triggered the kernel BUG_ON(). The
call trace is shown below.

We also need to cherry-pick the following commits, because 5.10.18x
lacks the necessary context; for example, DEFINE_WAIT_OVERRIDE_MAP is
not defined there.

* commit 5f2962401c6e
  ("locking/lockdep: Exclude local_lock_t from IRQ inversions")
* commit 175b1a60e880
  ("locking/lockdep: Clean up check_redundant() a bit")
* commit bc2dd71b2836
  ("locking/lockdep: Add a skip() function to __bfs()")
* commit 0cce06ba859a
  ("debugobjects,locking: Annotate debug_object_fill_pool() wait type
   violation")

kernel BUG at kernel/locking/rtmutex.c:1331!
invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
......
Call Trace:
 rt_spin_lock_slowlock_locked+0xb2/0x2a0
 ? update_load_avg+0x80/0x690
 rt_spin_lock_slowlock+0x50/0x80
 ? update_load_avg+0x80/0x690
 rt_spin_lock+0x2a/0x30
 free_unref_page+0xc5/0x280
 __vunmap+0x17f/0x240
 put_task_stack+0xc6/0x130
 __put_task_struct+0x3d/0x180
 rt_mutex_adjust_prio_chain+0x365/0x7b0
 task_blocks_on_rt_mutex+0x1eb/0x370
 rt_spin_lock_slowlock_locked+0xb2/0x2a0
 rt_spin_lock_slowlock+0x50/0x80
 rt_spin_lock+0x2a/0x30
 free_unref_page_list+0x128/0x5e0
 release_pages+0x2b4/0x320
 tlb_flush_mmu+0x44/0x150
 tlb_finish_mmu+0x3c/0x70
 zap_page_range+0x12a/0x170
 ? find_vma+0x16/0x70
 do_madvise+0x99d/0xba0
 ? do_epoll_wait+0xa2/0xe0
 ? __x64_sys_madvise+0x26/0x30
 __x64_sys_madvise+0x26/0x30
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Verification:
- build-pkgs; build-iso; install and boot up on an aio-sx lab.
- Could not reproduce the issue during almost 24 hours of stress-ng testing:
  while true; do sudo stress-ng --sched rr --mmapfork 23 -t 20; done
  while true; do sudo stress-ng --sched fifo --mmapfork 23 -t 20; done

Closes-Bug: 2031597
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Change-Id: If022441d61492eaec88eede8603a6cb052af99d1
2023-08-17 05:47:43 -04:00