I. Problem recap
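After every machine in the OpenStack cluster went through an unexpected power failure, the VMs whose disks are stored in the Ceph cluster would not boot. The Ceph cluster keeps two replicas of each object.
VMs that had been shut down before the outage were fine, and newly created VMs also worked. Only the VMs that were running when the power went out failed to boot.
Ceph cluster status: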
[root@controller ~]# ceph -s
cluster:
id: 6abf44d1-8ad2-4155-88db-8df0e79d576b
health: HEALTH_OK
services:
mon: 2 daemons, quorum controller,compute (age 18m)
mgr: controller(active, since 17m), standbys: compute
osd: 12 osds: 12 up (since 17m), 12 in (since 3h)
data:
pools: 3 pools, 768 pgs
objects: 3.17k objects, 12 GiB
usage: 36 GiB used, 87 TiB / 87 TiB avail
pgs: 768 active+clean
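II. Solution
The guest file systems have to be repaired. CentOS and Ubuntu use different file systems (XFS and ext4 respectively), so the repair procedure differs between them.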
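The repair tool follows from the file system type, not from the distribution. One way to sketch that choice is to ask lsblk for the partition's FSTYPE and map it to the tools used in this article (xfs_repair, e2fsck, ntfsfix). The function name is hypothetical and bash is assumed. The function only prints the command, so it can be reviewed before anything touches the disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the repair command for one partition,
# chosen from the file system type that lsblk reports for it.
repair_cmd_for() {
  local part=$1 fstype
  # -n: no header line, -d: only this device, -o FSTYPE: just that column
  fstype=$(lsblk -ndo FSTYPE "$part")
  case $fstype in
    xfs)            echo "xfs_repair $part" ;;
    ext2|ext3|ext4) echo "e2fsck -p $part" ;;
    ntfs)           echo "ntfsfix $part" ;;
    *)              echo "no repair rule for $part (fstype '$fstype')" >&2
                    return 1 ;;
  esac
}
```

For example, `repair_cmd_for /dev/rbd0p1` would print `xfs_repair /dev/rbd0p1` for the CentOS disk. Small boot or EFI partitions without an XFS, ext or NTFS file system fall into the last branch.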
Before repairing, confirm that the Ceph cluster is healthy, then shut down the VM to be repaired.
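Both preconditions can be checked by a script before anything is mapped. This is a minimal sketch with a hypothetical function name. It assumes the openstack CLI is configured on the same host as the ceph CLI, as on the controller here. The VM itself can be stopped beforehand with `openstack server stop <server>`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: Ceph reports HEALTH_OK and the server is SHUTOFF.
precheck() {
  local server=$1 health status
  health=$(ceph health)
  case $health in
    HEALTH_OK*) ;;
    *) echo "ceph is not healthy: $health" >&2; return 1 ;;
  esac
  # "-f value -c status" prints only the status column, e.g. SHUTOFF
  status=$(openstack server show "$server" -f value -c status)
  if [ "$status" != "SHUTOFF" ]; then
    echo "server $server is $status, expected SHUTOFF" >&2
    return 1
  fi
}
```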
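1. CentOS VMs
First map the VM's block device onto the host. The command that maps an RBD image to a local block device is: rbd map {image-name} --pool {pool-name}
# List the servers and their disk images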
[root@controller ~]# openstack server list
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820 | test2 | SHUTOFF | int-net=10.0.0.33 | ubuntu-20.04 | 2核4G |
| b8ee1532-cf05-41e5-93cc-3ea3de0c96c9 | test1 | SHUTOFF | int-net=10.0.0.114 | centos-7.6.1810 | 2核4G |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
[root@controller ~]# rbd ls vms
6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk
b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk
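The listing shows that each server's disk is an image in the vms pool, named after the instance UUID with a `_disk` suffix. A hypothetical helper can derive the image spec from a server name. The pool name is an assumption taken from this deployment: it is whatever images_rbd_pool is set to in nova.conf. For that reason it can be overridden through POOL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a Nova server name or ID to "pool/<uuid>_disk".
# Assumes the "<instance-uuid>_disk" naming seen in "rbd ls vms" above.
POOL=${POOL:-vms}

disk_spec_for() {
  local server=$1 id
  # "-f value -c id" prints only the server's UUID
  id=$(openstack server show "$server" -f value -c id) || return 1
  printf '%s/%s_disk\n' "$POOL" "$id"
}
```

With the server list above, `disk_spec_for test1` would give `vms/b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk`. rbd also accepts that pool/image form in place of `--pool`.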
# Disable the image features that the running kernel's RBD client does not support
[root@controller ~]# rbd feature disable b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk exclusive-lock object-map fast-diff deep-flatten --pool vms
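The command above assumes that exactly those four features are enabled. The sketch below disables only the features actually present, in an order that respects their dependencies: journaling and object-map need exclusive-lock, and fast-diff needs object-map. It assumes, as with the stock CentOS 7 kernel, that the kernel RBD client supports nothing beyond layering. It parses the plain-text `features:` line printed by `rbd info`. deep-flatten cannot be turned back on once disabled.

```shell
#!/usr/bin/env bash
# Sketch: disable every enabled feature the kernel client (assumed: layering-only)
# cannot handle, dependents first so each single-feature call is valid.
disable_krbd_unsupported() {
  local spec=$1 features f
  # rbd info prints a line such as "	features: layering, exclusive-lock, ..."
  features=$(rbd info "$spec" | sed -n 's/^[[:space:]]*features: //p')
  for f in journaling deep-flatten fast-diff object-map exclusive-lock; do
    case ", $features," in
      *", $f,"*) rbd feature disable "$spec" "$f" || return 1 ;;
    esac
  done
}
```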
# Map the RBD image
[root@controller ~]# rbd map b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk --pool vms
/dev/rbd0
# Show the current mappings
[root@controller ~]# rbd showmapped
id pool namespace image snap device
0 vms b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk - /dev/rbd0
[root@controller ~]# lsblk
......
rbd0 252:0 0 200G 0 disk
└─rbd0p1 252:1 0 200G 0 part
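An image left mapped on the host could later be written to both by the host and by the restarted VM. It is therefore worth guaranteeing the unmap even when the repair fails. The sketch below shows that pattern: `rbd map` prints the device it created, the given command runs with that device appended, and the device is unmapped afterwards whatever the outcome. The wrapper and the repair_p1 example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: map SPEC, run "CMD... DEVICE", always unmap, return CMD's status.
with_mapped_image() {
  local spec=$1 dev rc
  shift
  dev=$(rbd map "$spec") || return 1
  # if/else keeps this safe under "set -e" and preserves the exit status
  if "$@" "$dev"; then rc=0; else rc=$?; fi
  rbd unmap "$dev"
  return "$rc"
}

# Example (hypothetical): repair the first partition of the mapped device.
# repair_p1() { xfs_repair "${1}p1"; }
# with_mapped_image vms/b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk repair_p1
```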
# Repair the XFS file system (-L zeroes the log, which can discard the most recent metadata changes)
[root@controller ~]# xfs_repair -L /dev/rbd0p1
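xfs_repair refuses to run while the XFS log still holds unreplayed changes. It suggests mounting the file system to replay the log, and using -L only if mounting fails. A sketch of that more cautious order follows: try a plain repair, then mount and unmount so the kernel replays the log, and fall back to -L last. The temporary mount point is an assumption. nouuid is passed in case the guest's XFS UUID collides with another mounted XFS file system.

```shell
#!/usr/bin/env bash
# Sketch: escalate from a plain repair, to log replay via mount, to -L as a last resort.
repair_xfs() {
  local part=$1 mnt
  xfs_repair "$part" && return 0
  mnt=$(mktemp -d)                      # assumption: a throwaway mount point
  if mount -t xfs -o nouuid "$part" "$mnt"; then
    umount "$mnt"                       # a clean unmount leaves the log replayed
    rmdir "$mnt"
    xfs_repair "$part" && return 0
  else
    rmdir "$mnt"
  fi
  echo "log could not be replayed; zeroing it with -L" >&2
  xfs_repair -L "$part"
}
```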
# Unmap the image after the repair
[root@controller ~]# rbd unmap b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk --pool vms
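The features disabled for mapping stay disabled after the unmap. The sketch below shows one way to restore them, with the VM still shut off. It assumes exclusive-lock, object-map and fast-diff are the ones worth restoring. It enables them in dependency order and then rebuilds the object map, which stays flagged invalid until rebuilt. deep-flatten is left out because it cannot be re-enabled on an existing image. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: re-enable the features disabled before mapping (not deep-flatten, which
# cannot be turned back on), then rebuild the newly enabled object map.
restore_features() {
  local spec=$1 f
  for f in exclusive-lock object-map fast-diff; do
    rbd feature enable "$spec" "$f" || return 1
  done
  rbd object-map rebuild "$spec"
}
```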
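After the repair, start the VM again; it boots normally.
2. Ubuntu VMs
Likewise, first map the VM's block device onto the host.
# List the servers and their disk images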
[root@controller ~]# openstack server list
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820 | test2 | SHUTOFF | int-net=10.0.0.33 | ubuntu-20.04 | 2核4G |
| b8ee1532-cf05-41e5-93cc-3ea3de0c96c9 | test1 | SHUTOFF | int-net=10.0.0.114 | centos-7.6.1810 | 2核4G |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
[root@controller ~]# rbd ls vms
6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk
b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk
# Disable the image features that the running kernel's RBD client does not support
[root@controller ~]# rbd feature disable 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk exclusive-lock object-map fast-diff deep-flatten --pool vms
# Map the RBD image
[root@controller ~]# rbd map 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk --pool vms
/dev/rbd0
# Show the current mappings
[root@controller ~]# rbd showmapped
id pool namespace image snap device
0 vms 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk - /dev/rbd0
[root@controller ~]# lsblk
......
rbd0 252:0 0 200G 0 disk
├─rbd0p14 252:14 0 4M 0 part
├─rbd0p15 252:15 0 106M 0 part
└─rbd0p1 252:1 0 199.9G 0 part
# Repair the ext4 file system (the e2fsck shipped with the host OS is too old, so build a newer e2fsck first, as described below)
[root@controller ~]# cd e2fsprogs-1.46.2/e2fsck/
[root@controller e2fsck]# ./e2fsck -a /dev/rbd0p1   # -a is the same as -p (automatic repair); optionally add -fcv (force check, bad-block scan, verbose)
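Preen mode (-a/-p) stops on problems that need a human decision and reports that through its exit status. e2fsck(8) documents that status as a bit mask. The sketch below falls back to a forced -y run when preen mode leaves errors uncorrected. The ./e2fsck default mirrors the freshly built binary used above, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: preen first, fall back to "-fy" if errors were left uncorrected.
# e2fsck exit status bits (e2fsck(8)): 1 errors corrected, 2 corrected and reboot
# advised, 4 errors left uncorrected, 8 operational error, 16 usage error,
# 32 cancelled by user, 128 shared-library error.
E2FSCK=${E2FSCK:-./e2fsck}   # assumption: run from e2fsprogs-1.46.2/e2fsck/

repair_ext4() {
  local part=$1 rc=0
  "$E2FSCK" -p "$part" || rc=$?
  if (( rc & 4 )); then
    echo "preen mode left errors on $part; re-running with -fy" >&2
    rc=0
    "$E2FSCK" -fy "$part" || rc=$?
  fi
  # statuses below 4 mean nothing was left uncorrected
  (( rc < 4 ))
}
```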
# Unmap the image after the repair
[root@controller ~]# rbd unmap 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk --pool vms
Building the newer e2fsck (the latest release can be downloaded from https://sourceforge.net/projects/e2fsprogs/):
[root@controller ~]# wget https://nchc.dl.sourceforge.net/project/e2fsprogs/e2fsprogs/v1.46.2/e2fsprogs-1.46.2.tar.gz
[root@controller ~]# tar zxvf e2fsprogs-1.46.2.tar.gz
[root@controller ~]# cd e2fsprogs-1.46.2
[root@controller e2fsprogs-1.46.2]# ./configure
[root@controller e2fsprogs-1.46.2]# make -j 30
[root@controller e2fsprogs-1.46.2]# cd e2fsck/
[root@controller e2fsck]# ./e2fsck
Usage: ./e2fsck [-panyrcdfktvDFV] [-b superblock] [-B blocksize]
[-l|-L bad_blocks_file] [-C fd] [-j external_journal]
[-E extended-options] [-z undo_file] device
Emergency help:
-p Automatic repair (no questions)
-n Make no changes to the filesystem
-y Assume "yes" to all questions
-c Check for bad blocks and add them to the badblock list
-f Force checking even if filesystem is marked clean
-v Be verbose
-b superblock Use alternative superblock
-B blocksize Force blocksize when looking for superblock
-j external_journal Set location of the external journal
-l bad_blocks_file Add to badblocks list
-L bad_blocks_file Set badblocks list
-z undo_file Create an undo file
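The version can be checked before deciding to build. A likely reason the bundled e2fsck is "too old" is this: Ubuntu 20.04 formats ext4 with features such as metadata_csum, which e2fsprogs fully supports only from 1.43, while CentOS 7 ships the 1.42 series. Treat the 1.43 threshold as an assumption. The sketch below compares versions with GNU `sort -V`, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read an e2fsck binary's version and compare it with a threshold.
# e2fsck -V prints e.g. "e2fsck 1.42.9 (28-Dec-2013)" (on stderr).
e2fsck_version() {
  "${1:-e2fsck}" -V 2>&1 | awk 'NR==1 {print $2}'
}

# True if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example: version_ge "$(e2fsck_version)" 1.43 || echo "build a newer e2fsck"
```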
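3. Windows VMs
Map the RBD image in the same way, then repair the NTFS partition with the ntfsfix tool (on CentOS 7 the ntfs-3g and ntfsprogs packages come from the EPEL repository).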
[root@controller ~]# yum install ntfs-3g ntfsprogs
[root@controller ~]# ntfsfix /dev/rbd0p1
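A Windows disk may carry more than one NTFS partition, for example a small "System Reserved" partition next to the system volume. ntfsfix only fixes basic inconsistencies and resets the NTFS journal. It also schedules a consistency check for the next Windows boot, so a full check still happens inside the guest. The sketch below runs ntfsfix on every NTFS partition of the mapped device (assumed to be /dev/rbd0). The function name is hypothetical and bash is assumed.

```shell
#!/usr/bin/env bash
# Sketch: run ntfsfix on every partition of DEV that lsblk reports as ntfs.
fix_all_ntfs() {
  local dev=$1 name fstype rc=0
  # -r raw output, -n no header, -p full device paths
  while read -r name fstype; do
    [ "$fstype" = ntfs ] || continue
    ntfsfix "$name" || rc=1
  done < <(lsblk -rnpo NAME,FSTYPE "$dev")
  return "$rc"
}
```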