联系:手机/微信(+86 13429648788) QQ(107644445)asm disk被加入vg恢复

标题:asm disk被加入vg恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户恢复请求:把oracle asm datagroup中的一个磁盘增加到vg中,现在磁盘组无法mount,数据库无法正常启动.远程登录现场进行分析发现情况如下:
操作系统层面分析
history操作记录
asm disk被加入vg恢复


这里比较明显把一个磁盘做成pv,并且加入到vg中,然后再分配199G给lv_home,系统层面分析lvm信息

 

--查看pv信息

[[email protected] ~]# pvdisplay

  --- Physical volume ---

  PV Name               /dev/sda2

  VG Name               VolGroup

  PV Size               277.98 GiB / not usable 3.00 MiB

  Allocatable           yes (but full)

  PE Size               4.00 MiB

  Total PE              71161

  Free PE               0

  Allocated PE          71161

  PV UUID               F6QO3f-065n-mwTW-Xbq2-Xx2y-c8HD-Tkr7V7

    

  --- Physical volume ---

  PV Name               /dev/sdg    <----新加入的磁盘

  VG Name               VolGroup

  PV Size               200.00 GiB / not usable 4.00 MiB

  Allocatable           yes

  PE Size               4.00 MiB

  Total PE              51199

  Free PE               255

  Allocated PE          50944

  PV UUID               i69vUG-nCIK-dtxL-FvpD-2WZd-bvLv-n7lwrb

 

[[email protected] ~]# lvdisplay

  --- Logical volume ---

  LV Path                /dev/VolGroup/lv_root

  LV Name                lv_root

  VG Name                VolGroup

  LV UUID                JUNnkN-m4zq-D0gh-h42b-cUM1-Wh1q-ZMtQE4

  LV Write Access        read/write

  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:47 +0800

  LV Status              available

  # open                 1

  LV Size                50.00 GiB

  Current LE             12800

  Segments               1

  Allocation             inherit

  Read ahead sectors     auto

  - currently set to     256

  Block device           253:0

    

  --- Logical volume ---

  LV Path                /dev/VolGroup/lv_home

  LV Name                lv_home

  VG Name                VolGroup

  LV UUID                eZTkLt-cNGX-371i-m8Bd-VdD9-q6Hz-wYDRIJ

  LV Write Access        read/write

  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:54 +0800

  LV Status              available

  # open                 1

  LV Size                422.97 GiB      <-----lv大小编程422G,应该是被扩了199G后结果

  Current LE             108281

  Segments               2

  Allocation             inherit

  Read ahead sectors     auto

  - currently set to     256

  Block device           253:2

    

  --- Logical volume ---

  LV Path                /dev/VolGroup/lv_swap

  LV Name                lv_swap

  VG Name                VolGroup

  LV UUID                54P9ok-VpwO-zM68-hvwY-9GBf-89yb-8xQAMn

  LV Write Access        read/write

  LV Creation host, time localhost.localdomain, 2017-07-19 20:09:23 +0800

  LV Status              available

  # open                 1

  LV Size                4.00 GiB

  Current LE             1024

  Segments               1

  Allocation             inherit

  Read ahead sectors     auto

  - currently set to     256

  Block device           253:1

 

 

[[email protected] ~]# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/mapper/VolGroup-lv_root

                       50G  3.9G   43G   9% /

tmpfs                  63G  509M   63G   1% /dev/shm

/dev/sda1             477M   44M  408M  10% /boot

/dev/mapper/VolGroup-lv_home

                      417G  226G  170G  58% /home  <----增加了199g空间,剩余只剩170G,证明增加空间之后最少使用了30G以上 

基于这样的情况,基本上可以确定sdg盘加入VolGroup中并且被分配给 lv_home中,而且还写入了数据(/home空闲空间只剩余170G,lv_home当时扩了199G).
asm层面分析
asm磁盘组无法mount,提示缺少一块磁盘

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */

NOTE: cache registered group DATA number=1 incarn=0xa1dbff16

NOTE: cache began mount (first) of group DATA number=1 incarn=0xa1dbff16

NOTE: Assigning number (1,2) to disk (/dev/asmdisk3)

NOTE: Assigning number (1,1) to disk (/dev/asmdisk2)

Sat Apr 25 13:04:58 2020

ERROR: no read quorum in group: required 1, found 0 disks

NOTE: cache dismounting (clean) group 1/0xA1DBFF16 (DATA)

NOTE: messaging CKPT to quiesce pins Unix process pid: 81552, image: [email protected] (TNS V1-V3)

NOTE: dbwr not being msg'd to dismount

NOTE: lgwr not being msg'd to dismount

NOTE: cache dismounted group 1/0xA1DBFF16 (DATA)

NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xa1dbff16

NOTE: cache deleting context for group DATA 1/0xa1dbff16

GMON dismounting group 1 at 19 for pid 30, osid 81552

NOTE: Disk DATA_0001 in mode 0x9 marked for de-assignment

NOTE: Disk DATA_0002 in mode 0x9 marked for de-assignment

ERROR: diskgroup DATA was not mounted

ORA-15032: not all alterations performed

ORA-15017: diskgroup "DATA" cannot be mounted

ORA-15040: diskgroup is incomplete

ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */

kfed分析磁盘信息
asm disk被加入vg恢复


报错比较明显asm disk磁盘头被lvm的信息取代(因为asm disk 被加入到vg中),根据前面的分析,该磁盘被写入数据很可能超过30G,使用kfed分析一个随意au,确认被破坏,证明开始判断基本正确

 

[email protected]:/home/oracle11g$kfed read /dev/asmdisk1 aun=10000

kfbh.endian:                         51 ; 0x000: 0x33

kfbh.hard:                           55 ; 0x001: 0x37

kfbh.type:                           32 ; 0x002: *** Unknown Enum ***

kfbh.datfmt:                         42 ; 0x003: 0x2a

kfbh.block.blk:              1329801248 ; 0x004: blk=1329801248

kfbh.block.obj:              1128615502 ; 0x008: file=347726

kfbh.check:                  1094999892 ; 0x00c: 0x41445f54

kfbh.fcn.base:                675103060 ; 0x010: 0x283d4154

kfbh.fcn.wrap:               1448232275 ; 0x014: 0x56524553

kfbh.spare1:                 1598374729 ; 0x018: 0x5f454349

kfbh.spare2:                 1162690894 ; 0x01c: 0x454d414e

7F7843EAD400 2A203733 4F432820 43454E4E 41445F54  [37 * (CONNECT_DA]

7F7843EAD410 283D4154 56524553 5F454349 454D414E  [TA=(SERVICE_NAME]

7F7843EAD420 6361723D 29626432 44494328 5250283D  [=rac2db)(CID=(PR]

7F7843EAD430 4152474F 3A443D4D 4341505C DFCF3153  [OGRAM=D:\PACS1..]

7F7843EAD440 B3BEB7BB 6369445C 65536D6F 72657672  [....\DicomServer]

7F7843EAD450 445C524D 6D6F6369 76726553 524D7265  [MR\DicomServerMR]

7F7843EAD460 6578652E 4F482829 573D5453 362D4E49  [.exe)(HOST=WIN-6]

7F7843EAD470 51414C38 54553645 28294A30 52455355  [8LAQE6UT0J)(USER]

7F7843EAD480 6D64413D 73696E69 74617274 2929726F  [=Administrator))]

7F7843EAD490 202A2029 44444128 53534552 5250283D  [) * (ADDRESS=(PR]

7F7843EAD4A0 434F544F 743D4C4F 28297063 54534F48  [OTOCOL=tcp)(HOST]

7F7843EAD4B0 2E30313D 2E303831 30332E31 4F502829  [=10.180.1.30)(PO]

7F7843EAD4C0 343D5452 37333539 2A202929 74736520  [RT=49537)) * est]

7F7843EAD4D0 696C6261 2A206873 63617220 20626432  [ablish * rac2db ]

7F7843EAD4E0 3231202A 0A343135 2D534E54 31353231  [* 12514.TNS-1251]

7F7843EAD4F0 54203A34 6C3A534E 65747369 2072656E  [4: TNS:listener ]

7F7843EAD500 73656F64 746F6E20 72756320 746E6572  [does not current]

7F7843EAD510 6B20796C 20776F6E 7320666F 69767265  [ly know of servi]

7F7843EAD520 72206563 65757165 64657473 206E6920  [ce requested in ]

7F7843EAD530 6E6E6F63 20746365 63736564 74706972  [connect descript]

………………

7F7843EAE300 6F636944 7265536D 4D726576 69445C52  [DicomServerMR\Di]

7F7843EAE310 536D6F63 65767265 2E524D72 29657865  [comServerMR.exe)]

7F7843EAE320 534F4828 49573D54 4F302D4E 314B304A  [(HOST=WIN-0OJ0K1]

7F7843EAE330 4955304E 55282954 3D524553 696D6441  [N0UIT)(USER=Admi]

7F7843EAE340 7473696E 6F746172 29292972 28202A20  [nistrator))) * (]

7F7843EAE350 52444441 3D535345 4F525028 4F434F54  [ADDRESS=(PROTOCO]

7F7843EAE360 63743D4C 48282970 3D54534F 312E3031  [L=tcp)(HOST=10.1]

7F7843EAE370 312E3038 2930332E 524F5028 35353D54  [80.1.30)(PORT=55]

7F7843EAE380 29383632 202A2029 61747365 73696C62  [268)) * establis]

7F7843EAE390 202A2068 32636172 2A206264 35323120  [h * rac2db * 125]

7F7843EAE3A0 540A3431 312D534E 34313532 4E54203A  [14.TNS-12514: TN]

7F7843EAE3B0 696C3A53 6E657473 64207265 2073656F  [S:listener does ]

7F7843EAE3C0 20746F6E 72727563 6C746E65 6E6B2079  [not currently kn]

7F7843EAE3D0 6F20776F 65732066 63697672 65722065  [ow of service re]

7F7843EAE3E0 73657571 20646574 63206E69 656E6E6F  [quested in conne]

7F7843EAE3F0 64207463 72637365 6F747069 34320A72  [ct descriptor.24]

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][32]

通过上述kfed可以看到第10000 au的位置被写入的是数据库异常之后listener.log的信息(该数据库安装在/home目录中),进一步证明覆盖,通过以下信息证明sdg就是asmdisk1

[[email protected] dev]# ls -l sdg

brw-rw---- 1 root disk 8,  96 Apr 25 00:05 sdg

[[email protected] dev]# ls -l asmdisk1

brw-rw---- 1 grid asmadmin 8,  96 Apr 25 00:05 asmdisk1

基于现在的情况,data磁盘组是由三块 200G的磁盘组成,第一块磁盘被意外加入vg,并且写入数据大于30G,无法从asm层面直接通过kfed修复磁盘组,然后直接mount,只能通过oracle asm磁盘数据块重组技术(asm disk header 彻底损坏恢复)实现没有覆盖数据的恢复.
asm disk被加入vg恢复


该客户运气还不错,通过仅剩的2019年12月份几天的不成功备份找出来所有的数据文件(无归档),然后强制拉库成功.通过碎片恢复的最新的数据文件数据结合2019年12月份备份,实现绝大部分业务数据恢复,最大限度减少客户损失.对于oracle rac数据库服务器磁盘操作需要谨慎.
如果不幸有类似oracle asm disk被破坏(格式化,dd部分,做成lv等),需要进行恢复支持,可以联系我们,做专业的恢复评估,最大限度,最快速度抢救数据,减少损失
Phone:13429648788    Q Q:107644445asm disk被加入vg恢复    E-Mail:[email protected]
恢复过部分asm被格式化案例:
又一例asm格式化文件系统恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统

相关文章: