dm raid: fix incorrect sync_ratio when degraded
authorJonathan Brassow <jbrassow@redhat.com>
Tue, 27 Feb 2018 20:58:59 +0000 (21:58 +0100)
committerMike Snitzer <snitzer@redhat.com>
Wed, 7 Mar 2018 01:23:57 +0000 (20:23 -0500)
Upstream commit 4102d9de6d375 ("dm raid: fix rs_get_progress()
synchronization state/ratio") in combination with commit 7c29744ecce
("dm raid: simplify rs_get_progress()") introduced a regression by
incorrectly reporting a sync_ratio of 0 for degraded raid sets.  This
caused lvm2 to fail to repair raid legs automatically.

Fix by identifying the degraded state by checking the MD_RECOVERY_INTR
flag and returning mddev->recovery_cp in case it is set.

MD sets recovery = [ MD_RECOVERY_RECOVER MD_RECOVERY_INTR
MD_RECOVERY_NEEDED ] when a RAID member fails.  It then shuts down any
sync thread that is running and leaves us with all MD_RECOVERY_* flags
cleared.  The bug occurs if a status is requested in the short time it
takes to shut down any sync thread and clear the flags, because we were
keying in on the MD_RECOVERY_NEEDED - understanding it to be the initial
phase of a “recover” sync thread.  However, this is an incorrect
interpretation if MD_RECOVERY_INTR is also set.

This also explains why the bug only happened when automatic repair was
enabled and not a normal ‘manual’ method.  It is impossible to react
quick enough to hit the problematic window without it being automated.

Fix passes automatic repair tests.

Fixes: 7c29744ecce ("dm raid: simplify rs_get_progress()")
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
drivers/md/dm-raid.c

index 7ef469e902c620126b95f69d899e528ae114b3bc..c1d1034ff7b75eb740d40920615c0ae10a308c43 100644 (file)
@@ -3408,9 +3408,10 @@ static sector_t rs_get_progress(struct raid_set *rs, unsigned long recovery,
                set_bit(RT_FLAG_RS_IN_SYNC, &rs->runtime_flags);
 
        } else {
-               if (test_bit(MD_RECOVERY_NEEDED, &recovery) ||
-                   test_bit(MD_RECOVERY_RESHAPE, &recovery) ||
-                   test_bit(MD_RECOVERY_RUNNING, &recovery))
+               if (!test_bit(MD_RECOVERY_INTR, &recovery) &&
+                   (test_bit(MD_RECOVERY_NEEDED, &recovery) ||
+                    test_bit(MD_RECOVERY_RESHAPE, &recovery) ||
+                    test_bit(MD_RECOVERY_RUNNING, &recovery)))
                        r = mddev->curr_resync_completed;
                else
                        r = mddev->recovery_cp;