Journal logs may be lost in case of slow shutdown during failover

Description

We can lose logs in the following situation:

Setup: Master A is primary, Master B is secondary. Current log is 0x4-0xfffffff.

1. A loss of network connectivity causes Master A to lose primacy. Master A begins shutting down its journal.
2. Master B becomes primary.
3. Master B writes a journal entry, which causes the log to be rotated. The rotation renames 0x4-0xfffffff to 0x4-0x9.
4. Master A finishes shutting down and tries to complete the current log. It sees that 0x4-0x9 already exists, so it deletes that file, intending to rename 0x4-0xfffffff to 0x4-0x9. After the deletion, it finds that 0x4-0xfffffff no longer exists because Master B already renamed it in step 3. The rename cannot happen, and the 0x4-0x9 log is lost.
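
The sequence above can be replayed as a minimal, single-threaded sketch on a local temporary directory. This is only an illustration of the delete-then-rename ordering, not the actual journal code: `complete_current_log` and `simulate_failover` are hypothetical names, and the real journal may live on a different storage system.

```python
import os
import tempfile

CURRENT_LOG = "0x4-0xfffffff"  # in-progress log name from the report
COMPLETED_LOG = "0x4-0x9"      # completed log name from the report


def complete_current_log(journal_dir):
    """Hypothetical stand-in for Master A's shutdown path (delete, then rename)."""
    src = os.path.join(journal_dir, CURRENT_LOG)
    dst = os.path.join(journal_dir, COMPLETED_LOG)
    if os.path.exists(dst):
        # Assumes dst is a stale leftover that is safe to delete.
        os.remove(dst)
    if os.path.exists(src):
        os.rename(src, dst)
    # If src was already renamed by the new primary, dst is now gone
    # and there is nothing left to replace it with.


def simulate_failover():
    """Replay steps 3 and 4 in order and return the files left in the journal."""
    with tempfile.TemporaryDirectory() as journal_dir:
        current = os.path.join(journal_dir, CURRENT_LOG)
        with open(current, "w") as f:
            f.write("journal entries\n")
        # Step 3: Master B rotates the log.
        os.rename(current, os.path.join(journal_dir, COMPLETED_LOG))
        # Step 4: Master A finishes shutting down and completes the log.
        complete_current_log(journal_dir)
        return sorted(os.listdir(journal_dir))
```

Calling `simulate_failover()` returns an empty list: the completed log was deleted and no file took its place.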

To fix this issue, the journal should not attempt to complete the current log during shutdown. A master that has lost primacy should leave log completion to the new primary.
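
A minimal sketch of the proposed behavior, assuming a POSIX-style local filesystem where an open file can be renamed. `JournalWriter` and `failover_with_fix` are hypothetical names for illustration and do not correspond to the real journal classes. The only point shown is that `close()` releases the file handle and performs no delete or rename.

```python
import os
import tempfile

CURRENT_LOG = "0x4-0xfffffff"  # in-progress log name from the report
COMPLETED_LOG = "0x4-0x9"      # completed log name from the report


class JournalWriter:
    """Hypothetical writer used only to illustrate the proposed shutdown."""

    def __init__(self, journal_dir):
        self.journal_dir = journal_dir
        self._stream = open(os.path.join(journal_dir, CURRENT_LOG), "a")

    def write(self, entry):
        self._stream.write(entry + "\n")
        self._stream.flush()

    def close(self):
        # Release the handle only. Completing (renaming) the current log
        # is left to whichever master is primary.
        self._stream.close()


def failover_with_fix():
    """Replay the failover with the completion step skipped on shutdown."""
    with tempfile.TemporaryDirectory() as journal_dir:
        writer_a = JournalWriter(journal_dir)
        writer_a.write("entry from Master A")
        # Master B becomes primary and rotates the log.
        os.rename(
            os.path.join(journal_dir, CURRENT_LOG),
            os.path.join(journal_dir, COMPLETED_LOG),
        )
        # Master A finishes its slow shutdown without touching the log files.
        writer_a.close()
        return sorted(os.listdir(journal_dir))
```

In this sketch `failover_with_fix()` returns `["0x4-0x9"]`, so the log rotated by Master B survives Master A's late shutdown.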

Environment

None

Status

Assignee

Andrew Audibert

Reporter

Andrew Audibert

Labels

None

Components

Fix versions

Affects versions

1.7.0

Priority

Blocker