blk-mq: Fix a race between bt_clear_tag() and bt_get()
author Bart Van Assche <bvanassche@acm.org>
Tue, 9 Dec 2014 15:58:35 +0000 (16:58 +0100)
committer Jens Axboe <axboe@fb.com>
Tue, 9 Dec 2014 16:07:16 +0000 (09:07 -0700)
Fixing the race between bt_clear_tag() and bt_get() requires the following two guarantees:
* Any thread that observes the effect of the test_and_set_bit() performed by
  __bt_get_word() also observes the preceding addition of 'current'
  to the appropriate wait list. This is guaranteed by the semantics
  of the spin_unlock() operation performed by prepare_to_wait().
  Hence the conversion of test_and_set_bit_lock() into
  test_and_set_bit().
* The wait lists are examined by bt_clear_tag() after the tag bit has
  been cleared. clear_bit_unlock() guarantees that any thread that
  observes that the bit has been cleared also observes the store
  operations preceding clear_bit_unlock(). However,
  clear_bit_unlock() does not prevent the wait lists from being
  examined before the tag bit is cleared. Hence clear_bit_unlock() is
  replaced by clear_bit() followed by a full memory barrier (smp_mb())
  before the wait lists are examined.
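A minimal user-space sketch of the ordering argument, using C11 atomics in place of the kernel primitives. Everything in it is a hypothetical model: tag_word stands in for one bitmap word, the nr_waiters counter stands in for the wait lists, and atomic_thread_fence(memory_order_seq_cst) models smp_mb() on the release side and, as a simplification, the full ordering that the waiter side gets from prepare_to_wait() plus the fully ordered test_and_set_bit(). Each side stores to its own location, issues a full fence, then loads the location the other side stores to (the store-buffering pattern), so at least one side must see the other: either the waiter finds the tag free, or the releaser sees the waiter and must issue a wakeup.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model: tag_word is one bitmap word, nr_waiters stands in
 * for the wait lists. Static atomics are zero-initialized.
 */
static atomic_ulong tag_word;
static atomic_int nr_waiters;

static unsigned long tag_mask(unsigned int bit)
{
    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << bit;
}

/*
 * Waiter: register on the "wait list" first (cf. prepare_to_wait()),
 * full fence, then retry the tag (cf. test_and_set_bit()).
 * Returns true if the tag was obtained, false if the caller would sleep.
 */
static bool waiter_try_get(unsigned int bit)
{
    unsigned long mask = tag_mask(bit);

    atomic_fetch_add_explicit(&nr_waiters, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (!(atomic_fetch_or_explicit(&tag_word, mask, memory_order_relaxed) & mask)) {
        /* Got the tag: leave the wait list again (cf. finish_wait()). */
        atomic_fetch_sub_explicit(&nr_waiters, 1, memory_order_relaxed);
        return true;
    }
    return false;
}

/*
 * Releaser: clear the bit (cf. clear_bit()), full fence (cf. smp_mb()),
 * then check for waiters. Returns true if a wakeup must be issued.
 */
static bool release_tag(unsigned int bit)
{
    atomic_fetch_and_explicit(&tag_word, ~tag_mask(bit), memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&nr_waiters, memory_order_relaxed) > 0;
}
```

Dropping either fence permits the outcome in which the waiter still sees the bit set while the releaser still sees no waiters, so no wakeup is issued; that is the lost-wakeup window the two guarantees above close.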

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>
block/blk-mq-tag.c

index 0f5e22a7971fb43b2a08272ff4aa920511eca90e..e47c4c75fd338995af72e23511e1b88ef8bf8bff 100644
@@ -158,7 +158,7 @@ restart:
                        return -1;
                }
                last_tag = tag + 1;
-       } while (test_and_set_bit_lock(tag, &bm->word));
+       } while (test_and_set_bit(tag, &bm->word));
 
        return tag;
 }
@@ -357,11 +357,10 @@ static void bt_clear_tag(struct blk_mq_bitmap_tags *bt, unsigned int tag)
        struct bt_wait_state *bs;
        int wait_cnt;
 
-       /*
-        * The unlock memory barrier need to order access to req in free
-        * path and clearing tag bit
-        */
-       clear_bit_unlock(TAG_TO_BIT(bt, tag), &bt->map[index].word);
+       clear_bit(TAG_TO_BIT(bt, tag), &bt->map[index].word);
+
+       /* Ensure that the wait list checks occur after clear_bit(). */
+       smp_mb();
 
        bs = bt_wake_ptr(bt);
        if (!bs)
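The first hunk changes the test-and-set at the end of the tag-claiming loop. A minimal sketch of that kind of loop, assuming a single bitmap word holding at most one unsigned long's worth of tags: scan from last_tag for a clear bit, wrapping around, and claim it with a fully ordered test-and-set (a default seq_cst atomic_fetch_or(), standing in for test_and_set_bit()); if another thread wins the race for that bit, keep scanning. claim_free_bit() and its parameters are hypothetical, and unlike the kernel loop this sketch gives up after one pass over the word.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical tag-claiming loop: look for a clear bit among the first
 * `depth` bits of *word, starting at last_tag and wrapping once, and
 * claim it with a fully ordered (seq_cst) test-and-set.
 * Returns the claimed tag, or -1 if every bit was busy.
 */
static int claim_free_bit(atomic_ulong *word, unsigned int depth,
                          unsigned int last_tag)
{
    assert(depth > 0 && depth <= WORD_BITS);

    unsigned int tag = last_tag % depth;
    for (unsigned int tried = 0; tried < depth; tried++) {
        unsigned long mask = 1UL << tag;

        /* Cheap relaxed peek first, like scanning for a zero bit. */
        if (!(atomic_load_explicit(word, memory_order_relaxed) & mask)) {
            /* Default atomic_fetch_or() is seq_cst, i.e. fully ordered. */
            if (!(atomic_fetch_or(word, mask) & mask))
                return (int)tag;
            /* Another thread claimed this bit first; keep scanning. */
        }
        tag = (tag + 1) % depth;
    }
    return -1; /* no free tag: the caller would go to sleep */
}
```

Using a fully ordered read-modify-write rather than an acquire-only one mirrors the switch from test_and_set_bit_lock() to test_and_set_bit() in the hunk above.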