Problem description
Running ceph -s to check the Ceph cluster status showed the following:
[user@ceph-harbor-2 ~]$ ceph -s
2018-06-13 10:45:35.671151 7f4868189700 0 -- :/3203908075 >> 192.9.200.83:6789/0 pipe(0x7f48640645c0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f486405cc30).fault
cluster da2f1d4a-51fb-437c-b097-3f563f473cb8
health HEALTH_ERR
8 pgs are stuck inactive for more than 300 seconds
8 pgs peering
8 pgs stuck inactive
8 pgs stuck unclean
1 requests are blocked > 32 sec
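When the same status has to be checked repeatedly, it can help to pull the health lines out programmatically. The sketch below is a minimal, hypothetical helper (the names parse_health_summary and blocked_request_count are made up for illustration). It assumes the plain-text ceph -s layout shown above, where each health issue is a line starting with a count, and it skips every other line (cluster id, health status, log noise).

```python
import re
from typing import Dict

# Assumed line shape, taken from the output above: "<count> <description>",
# e.g. "8 pgs peering" or "1 requests are blocked > 32 sec".
_HEALTH_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_health_summary(text: str) -> Dict[str, int]:
    """Map each '<count> <description>' health line to its count."""
    issues: Dict[str, int] = {}
    for line in text.splitlines():
        match = _HEALTH_LINE.match(line)
        if match:  # lines such as "cluster ..." or log timestamps do not match
            issues[match.group(2)] = int(match.group(1))
    return issues


def blocked_request_count(issues: Dict[str, int]) -> int:
    """Sum the counts of all 'requests are blocked' entries."""
    return sum(n for desc, n in issues.items() if "requests are blocked" in desc)


if __name__ == "__main__":
    sample = """health HEALTH_ERR
8 pgs peering
1 requests are blocked > 32 sec"""
    summary = parse_health_summary(sample)
    print(summary)
    print(blocked_request_count(summary))
```

A non-zero blocked count is the cue to move on to ceph health detail, which names the OSDs involved.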
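Resolution
The "requests are blocked > 32 sec" warning can appear while data is being migrated: if a client is accessing a data block and that data moves to another OSD before the access completes, the request is left blocked. Users are affected, because their I/O stalls until the request finishes.
Find the blocked requests: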
[user@ceph-harbor-2 ~]$ ceph health detail
1 ops are blocked > 524.288 sec on osd.5
1 osds have slow requests
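To locate the offending OSD automatically, the detail output can be scanned for "ops are blocked" lines. The following is a minimal sketch, assuming the line format shown above ("<n> ops are blocked > <seconds> sec on osd.<id>"); the function name find_blocked_osds is hypothetical. If an OSD appears on several lines, their op counts are added and the longest blocked duration is kept, and the worst OSD is listed first.

```python
import re
from typing import Dict, List, Tuple

# Assumed format, taken from the output above:
#   "<n> ops are blocked > <seconds> sec on osd.<id>"
_BLOCKED = re.compile(r"(\d+) ops are blocked > ([\d.]+) sec on osd\.(\d+)")


def find_blocked_osds(detail: str) -> List[Tuple[int, int, float]]:
    """Return (osd_id, blocked_ops, longest_seconds), longest-blocked OSD first."""
    per_osd: Dict[int, Tuple[int, float]] = {}
    for count, secs, osd in _BLOCKED.findall(detail):
        ops, longest = per_osd.get(int(osd), (0, 0.0))
        # Add up ops for this OSD and remember the longest blocked duration.
        per_osd[int(osd)] = (ops + int(count), max(longest, float(secs)))
    return sorted(
        ((osd, ops, longest) for osd, (ops, longest) in per_osd.items()),
        key=lambda item: item[2],
        reverse=True,
    )


if __name__ == "__main__":
    detail = "1 ops are blocked > 524.288 sec on osd.5\n1 osds have slow requests"
    print(find_blocked_osds(detail))
```

For the output above this yields a single entry, osd.5 with one op blocked for more than 524.288 seconds.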
This showed that osd.5 was the source of the problem; restarting osd.5 resolved it.
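The restart step itself depends on how the OSDs are managed, which the original note does not say. The sketch below assumes a systemd-based deployment in which each OSD runs as a unit named ceph-osd@<id>; sysvinit-era installs use a different command. The helper restart_osd is hypothetical and defaults to a dry run that only returns the command, so nothing is executed unless dry_run=False is passed on the host that carries the OSD.

```python
import subprocess
from typing import List


def restart_osd_command(osd_id: int) -> List[str]:
    # Assumption: systemd-managed OSDs with one unit per OSD, ceph-osd@<id>.
    return ["systemctl", "restart", f"ceph-osd@{osd_id}"]


def restart_osd(osd_id: int, dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the restart command for one OSD."""
    cmd = restart_osd_command(osd_id)
    if not dry_run:
        # Needs to run on the OSD's host with root privileges;
        # check=True raises CalledProcessError if systemctl fails.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(restart_osd(5)))
```

After the restart, running ceph health detail again should confirm whether the blocked-request warnings have cleared.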