
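I. Problem Recap

After every machine in the OpenStack cluster lost power unexpectedly, the virtual machines stored on the Ceph cluster would not boot. The Ceph cluster keeps two replicas.
VMs that had been shut down before the outage were fine, and newly created VMs were fine too. Only the VMs that were running at the moment of the outage failed to boot.
Ceph cluster status: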

[root@controller ~]# ceph -s
  cluster:
    id:     6abf44d1-8ad2-4155-88db-8df0e79d576b
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum controller,compute (age 18m)
    mgr: controller(active, since 17m), standbys: compute
    osd: 12 osds: 12 up (since 17m), 12 in (since 3h)

  data:
    pools:   3 pools, 768 pgs
    objects: 3.17k objects, 12 GiB
    usage:   36 GiB used, 87 TiB / 87 TiB avail
    pgs:     768 active+clean

[Screenshots: CentOS VM, Ubuntu VM, Windows VM]

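II. Solution

The disk filesystems need to be repaired. CentOS and Ubuntu use different filesystems (XFS and ext4 here), so the repair procedure differs.
Before repairing, confirm that the Ceph cluster is healthy, then shut down the VM to be repaired.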
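The two preconditions above can be checked mechanically before any disk is touched. The following is a minimal sketch, not part of the original procedure: the function names `ready_for_repair` and `preflight` are hypothetical, and it assumes the `ceph` and `openstack` clients are configured on the host.

```shell
#!/bin/sh
# Preflight sketch: refuse to continue unless Ceph is healthy and the VM is off.

# Pure helper that decides from two status strings.
# $1 = output of `ceph health`
# $2 = instance status as reported by `openstack server show`
ready_for_repair() {
    case "$1" in
        HEALTH_OK*) ;;
        *) echo "ceph is not healthy: $1" >&2; return 1 ;;
    esac
    if [ "$2" != "SHUTOFF" ]; then
        echo "instance is not shut off: $2" >&2
        return 1
    fi
    return 0
}

# Wrapper that queries the live cluster. $1 = instance ID.
preflight() {
    health=$(ceph health) || return 1
    status=$(openstack server show "$1" -f value -c status) || return 1
    ready_for_repair "$health" "$status"
}
```

For example, `preflight b8ee1532-cf05-41e5-93cc-3ea3de0c96c9 && echo "safe to map"` would gate the steps that follow.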

1. CentOS VM

First, map the VM's block device onto the host. The command for mapping a block device into the operating system is: rbd map {image-name} --pool {pool-name}

# List the instances and the RBD images that back them
[root@controller ~]# openstack server list
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| ID                                   | Name  | Status  | Networks           | Image           | Flavor |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820 | test2 | SHUTOFF | int-net=10.0.0.33  | ubuntu-20.04    | 2核4G  |
| b8ee1532-cf05-41e5-93cc-3ea3de0c96c9 | test1 | SHUTOFF | int-net=10.0.0.114 | centos-7.6.1810 | 2核4G  |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
[root@controller ~]# rbd ls vms
6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk
b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk

# Disable the image features that the current kernel does not support
[root@controller ~]# rbd feature disable b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk exclusive-lock, object-map, fast-diff, deep-flatten --pool vms

# Map the RBD image
[root@controller ~]# rbd map b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk --pool vms
/dev/rbd0

# List the mapped devices
[root@controller ~]# rbd showmapped
id pool namespace image                                     snap device    
0  vms            b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk -    /dev/rbd0 
[root@controller ~]# lsblk
......
rbd0            252:0    0   200G  0 disk 
└─rbd0p1        252:1    0   200G  0 part 

# Repair the XFS filesystem (-L zeroes the log, so any metadata updates still in the log are discarded)
[root@controller ~]# xfs_repair -L /dev/rbd0p1

# Unmap the image once the repair is finished
[root@controller ~]# rbd unmap b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk --pool vms

Once the repair is done, start the VM again; it boots normally.
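The map, repair, unmap sequence above can be wrapped so that the image is unmapped even when the repair command fails. This is only a sketch under stated assumptions: `run`, `repair_rbd_image`, the `DRY_RUN` variable, and the placeholder device name `/dev/rbdX` are all hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: map an instance disk, run one repair command on a partition,
# and always unmap afterwards. Set DRY_RUN=1 to print commands instead
# of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = pool, $2 = image name, $3 = partition suffix (e.g. p1),
# remaining arguments = repair command and its options.
# The body runs in a subshell so the EXIT trap fires when it ends.
repair_rbd_image() (
    pool=$1; image=$2; part=$3; shift 3
    if [ "${DRY_RUN:-0}" = "1" ]; then
        dev=/dev/rbdX    # placeholder device name used only in dry runs
        run rbd map "$image" --pool "$pool"
    else
        # `rbd map` prints the device path it created, e.g. /dev/rbd0
        dev=$(rbd map "$image" --pool "$pool") || exit 1
    fi
    # Unmap on every exit path, including a failed repair.
    trap 'run rbd unmap "$dev"' EXIT
    run "$@" "${dev}${part}"
)
```

A call mirroring the transcript would be `repair_rbd_image vms b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk p1 xfs_repair -L`; running it first with `DRY_RUN=1` shows the three commands without touching the image.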

2. Ubuntu VM

As before, first map the VM's block device onto the host.
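Both procedures disable the same four image features before mapping. A small helper can work out which of them are actually enabled on a given image, so the disable command names only those and the list can be kept for reference. This is a sketch: `features_to_disable` and `KRBD_UNSUPPORTED` are hypothetical names, and the feature list is simply the one used in this article, not a statement about what any particular kernel supports.

```shell
#!/bin/sh
# Features this article disables before mapping with the kernel client.
KRBD_UNSUPPORTED="exclusive-lock object-map fast-diff deep-flatten"

# $1 = the "features:" line from `rbd info`, for example
#      "features: layering, exclusive-lock, object-map"
# Prints the enabled features that also appear in KRBD_UNSUPPORTED,
# separated by spaces (an empty line if there are none).
features_to_disable() {
    enabled=$(printf '%s\n' "$1" |
        sed -e 's/^[[:space:]]*features:[[:space:]]*//' -e 's/,/ /g')
    result=""
    for f in $enabled; do
        case " $KRBD_UNSUPPORTED " in
            *" $f "*) result="${result:+$result }$f" ;;
        esac
    done
    printf '%s\n' "$result"
}

# Possible use against a live image (not executed here):
#   line=$(rbd info "$image" --pool vms | grep '^[[:space:]]*features:')
#   rbd feature disable "$image" $(features_to_disable "$line") --pool vms
```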

# List the instances and the RBD images that back them
[root@controller ~]# openstack server list
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| ID                                   | Name  | Status  | Networks           | Image           | Flavor |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820 | test2 | SHUTOFF | int-net=10.0.0.33  | ubuntu-20.04    | 2核4G  |
| b8ee1532-cf05-41e5-93cc-3ea3de0c96c9 | test1 | SHUTOFF | int-net=10.0.0.114 | centos-7.6.1810 | 2核4G  |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
[root@controller ~]# rbd ls vms
6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk
b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk

# Disable the image features that the current kernel does not support
[root@controller ~]# rbd feature disable 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk exclusive-lock, object-map, fast-diff, deep-flatten --pool vms

# Map the RBD image
[root@controller ~]# rbd map 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk --pool vms
/dev/rbd0

# List the mapped devices
[root@controller ~]# rbd showmapped
id pool namespace image                                     snap device    
0  vms            6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk -    /dev/rbd0 
[root@controller ~]# lsblk
......
rbd0                 252:0    0   200G  0 disk 
├─rbd0p14            252:14   0     4M  0 part 
├─rbd0p15            252:15   0   106M  0 part 
└─rbd0p1             252:1    0 199.9G  0 part 

# Repair the ext4 filesystem (the e2fsck shipped with the system is too old, so build a newer e2fsck before repairing)
[root@controller ~]# cd e2fsprogs-1.46.2/e2fsck/
[root@controller e2fsck]# ./e2fsck -a /dev/rbd0p1  # -fcv

# Unmap the image once the repair is finished
[root@controller ~]# rbd unmap 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk --pool vms

Building the newer e2fsck. The latest e2fsprogs release can be downloaded from: https://sourceforge.net/projects/e2fsprogs/

[root@controller ~]# wget https://nchc.dl.sourceforge.net/project/e2fsprogs/e2fsprogs/v1.46.2/e2fsprogs-1.46.2.tar.gz
[root@controller ~]# tar zxvf e2fsprogs-1.46.2.tar.gz
[root@controller ~]# cd e2fsprogs-1.46.2
[root@controller e2fsprogs-1.46.2]# ./configure
[root@controller e2fsprogs-1.46.2]# make -j 30
[root@controller e2fsprogs-1.46.2]# cd e2fsck/
[root@controller e2fsck]# ./e2fsck
Usage: ./e2fsck [-panyrcdfktvDFV] [-b superblock] [-B blocksize]
                [-l|-L bad_blocks_file] [-C fd] [-j external_journal]
                [-E extended-options] [-z undo_file] device

Emergency help:
 -p                   Automatic repair (no questions)
 -n                   Make no changes to the filesystem
 -y                   Assume "yes" to all questions
 -c                   Check for bad blocks and add them to the badblock list
 -f                   Force checking even if filesystem is marked clean
 -v                   Be verbose
 -b superblock        Use alternative superblock
 -B blocksize         Force blocksize when looking for superblock
 -j external_journal  Set location of the external journal
 -l bad_blocks_file   Add to badblocks list
 -L bad_blocks_file   Set badblocks list
 -z undo_file         Create an undo file
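When e2fsck runs unattended with -a, its exit status is the main signal of what happened. The status is a bitmask documented in the e2fsck man page; the helper below, whose name `describe_e2fsck_status` is hypothetical, is a small sketch that turns it into text.

```shell
#!/bin/sh
# Translate an e2fsck exit status (a sum of the bit values listed in the
# e2fsck man page) into a human-readable summary.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    msg=""
    # Each line of the here-document is "bit:meaning".
    while IFS=: read -r bit text; do
        if [ $((status & bit)) -ne 0 ]; then
            msg="${msg:+$msg; }$text"
        fi
    done <<EOF
1:filesystem errors corrected
2:filesystem errors corrected, system should be rebooted
4:filesystem errors left uncorrected
8:operational error
16:usage or syntax error
32:e2fsck canceled by user request
128:shared library error
EOF
    echo "$msg"
}
```

It could be used right after the repair, for example `./e2fsck -a /dev/rbd0p1; describe_e2fsck_status $?`. A status that includes bit 4 means errors remain and the VM should not be started yet.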

3. Windows VM

Map the RBD image as before, then repair it with the ntfsfix tool.

[root@controller ~]# yum install ntfs-3g ntfsprogs
[root@controller ~]# ntfsfix /dev/rbd0p1
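The three cases differ only in which tool is run against the mapped partition, so the choice can be derived from the filesystem type. The sketch below maps a type string to the command used in this article; `repair_command_for` is a hypothetical name, and treating ext2 and ext3 like ext4 is an assumption that goes beyond what was tested here.

```shell
#!/bin/sh
# Map a filesystem type to the repair command used in this article.
# $1 = filesystem type, e.g. from `blkid -o value -s TYPE /dev/rbd0p1`
repair_command_for() {
    case "$1" in
        xfs)            echo "xfs_repair -L" ;;
        ext2|ext3|ext4) echo "e2fsck -a" ;;   # ext2/ext3 are an assumption
        ntfs)           echo "ntfsfix" ;;
        *)
            echo "unsupported filesystem: $1" >&2
            return 1
            ;;
    esac
}
```

For instance, `repair_command_for "$(blkid -o value -s TYPE /dev/rbd0p1)"` prints the command to run for that partition, and fails for a type the article does not cover.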

