forked from luck/tmp_suning_uos_patched
7eb71e0351
It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: <osd3 is idle, on the osd lru list> <con reset - osd3> con_fault_finish() osd_reset() <osdmap - osd3 down> ceph_osdc_handle_map() <takes map_sem> kick_requests() <takes request_mutex> reset_changed_osds() __reset_osd() __remove_osd() <releases request_mutex> <releases map_sem> <takes map_sem> <takes request_mutex> __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil <sage@redhat.com> Cc: stable@vger.kernel.org # 3.9+: |
||
---|---|---|
.. | ||
crush | ||
armor.c | ||
auth_none.c | ||
auth_none.h | ||
auth_x_protocol.h | ||
auth_x.c | ||
auth_x.h | ||
auth.c | ||
buffer.c | ||
ceph_common.c | ||
ceph_fs.c | ||
ceph_hash.c | ||
ceph_strings.c | ||
crypto.c | ||
crypto.h | ||
debugfs.c | ||
Kconfig | ||
Makefile | ||
messenger.c | ||
mon_client.c | ||
msgpool.c | ||
osd_client.c | ||
osdmap.c | ||
pagelist.c | ||
pagevec.c | ||
snapshot.c |