kernel_optimize_test/net/ceph
Ilya Dryomov 930c532869 libceph: apply new_state before new_up_client on incrementals
Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

2087    for (i = 0; i < pg->pg_temp.len; i++) {
2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
2089                    if (ceph_can_shift_osds(pi))
2090                            continue;
2091
2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed

Fixes: http://tracker.ceph.com/issues/14901

Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc: libceph: set 'exists' flag for newly up osd
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-07-22 15:17:40 +02:00
..
crush crush: add chooseleaf_stable tunable 2016-02-04 18:25:55 +01:00
armor.c
auth_none.c libceph: make authorizer destruction independent of ceph_auth_client 2016-04-25 20:54:13 +02:00
auth_none.h libceph: make authorizer destruction independent of ceph_auth_client 2016-04-25 20:54:13 +02:00
auth_x_protocol.h
auth_x.c libceph: make authorizer destruction independent of ceph_auth_client 2016-04-25 20:54:13 +02:00
auth_x.h libceph: make authorizer destruction independent of ceph_auth_client 2016-04-25 20:54:13 +02:00
auth.c libceph: make authorizer destruction independent of ceph_auth_client 2016-04-25 20:54:13 +02:00
buffer.c
ceph_common.c ceph: make logical calculation functions return bool 2016-05-26 01:15:39 +02:00
ceph_fs.c
ceph_hash.c
ceph_strings.c libceph, rbd: ceph_osd_linger_request, watch/notify v2 2016-05-26 01:15:02 +02:00
crypto.c libceph: Remove unnecessary ivsize variables 2016-01-27 20:36:25 +08:00
crypto.h
debugfs.c libceph: support for subscribing to "mdsmap.<id>" maps 2016-05-26 01:15:30 +02:00
Kconfig
Makefile
messenger.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
mon_client.c libceph: support for subscribing to "mdsmap.<id>" maps 2016-05-26 01:15:30 +02:00
msgpool.c
osd_client.c libceph: use %s instead of %pE in dout()s 2016-05-30 23:00:23 +02:00
osdmap.c libceph: apply new_state before new_up_client on incrementals 2016-07-22 15:17:40 +02:00
pagelist.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
pagevec.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
snapshot.c