ironic-python-agent/releasenotes/notes/heartbeat-jitter-620bbcba591d2894.yaml
Dmitry Tantsur 2ab8364649 Add a jitter to heartbeat retries
Currently, if a heartbeat fails, we reschedule it after 5 seconds.
This is fine for the first retry, but it can cause a thundering herd
problem when a lot of nodes fail to heartbeat at once.

This change adds jitter to the minimum wait of 5 seconds. The jitter is
not applied for forced heartbeats: they still have a minimum wait of
exactly 5 seconds from the last heartbeat.
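
The interval rule described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the function name `retry_interval`, the `rng` parameter, and the `MAX_JITTER` bound are assumptions made for the example. Only the 5-second minimum and the no-jitter rule for forced heartbeats come from this change description.

```python
import random

MIN_WAIT = 5.0    # seconds; the minimum wait stated in the commit message
MAX_JITTER = 5.0  # seconds; hypothetical upper bound, not taken from the change


def retry_interval(forced=False, rng=random):
    """Return how long to wait before the next heartbeat attempt.

    Forced heartbeats wait exactly MIN_WAIT. Retries after an error wait
    MIN_WAIT plus a random amount, so that many nodes failing at the same
    moment do not all retry at the same moment.
    """
    if forced:
        return MIN_WAIT
    return MIN_WAIT + rng.uniform(0.0, MAX_JITTER)
```

Passing a seeded `random.Random` instance as `rng` keeps the jitter reproducible in tests, which fits a step-by-step testing style that does not depend on an exact sequence of calls.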

The code is reordered so that the interval calculation happens in one
place. As a bonus, the next interval is now logged correctly.

The unit tests have been rewritten to test the heartbeat process step by
step and not rely on the exact sequence of the calls.

Closes-Bug: #2038438
Change-Id: I4c4207b15fb3d48b55e340b7b3b54af833f92cb5
2023-12-13 17:34:24 +01:00

8 lines
258 B
YAML

---
fixes:
  - |
    Adds random jitter to retried heartbeats after Ironic returns an error.
    Previously, heartbeats would be retried after 5 seconds, potentially
    causing a thundering herd problem if many nodes fail to heartbeat at
    the same time.