Monday, September 28, 2015

Why did the df command throw an "Input/output error"?

Issue/Symptom  : The Oracle instance failed to start when the DBA tried to bring it up. A check of the file systems showed that the arch volumes were not mounted. After we tried to remount them, "df -h" reported the errors below:
[root@customer-pet-oracle-3d ~]#  df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p3      58G   48G  7.3G  87% /
/dev/cciss/c0d0p1     494M   18M  452M   4% /boot
tmpfs                  63G  232M   63G   1% /dev/shm
tmpfs                 4.0K     0  4.0K   0% /dev/vx
df: `/cinprds1_arch00': Input/output error
df: `/cinprd1_arch00': Input/output error
df: `/customerdcdp1_arch00': Input/output error
df: `/customerrptp1_arch00': Input/output error
example-prod-sea1utilnas-1a-pet:/vol/customerpet_data
                      450G  335G  116G  75% /filers/example-prod-sea1utilnas-1a-pet/customerpet_data
[root@customer-pet-oracle-3d ~]
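When several file systems fail at once, it helps to pull just the affected mount points out of the df output. The following is a minimal sketch, assuming the error lines have the same shape as in the transcript above; the function name failed_mounts is only illustrative.

```shell
#!/bin/sh
# Print the mount points for which df reports "Input/output error".
# Reads df's combined stdout/stderr on stdin.
# Assumption: error lines look like
#   df: `/cinprds1_arch00': Input/output error
failed_mounts() {
    sed -n "s/^df: [\`']\{0,1\}\([^':]*\)'\{0,1\}: Input\/output error\$/\1/p"
}

# Usage on a live host (stderr is merged so the error lines are captured):
#   df -h 2>&1 | failed_mounts
```

The resulting list can then be compared against fstab to find the volumes behind each mount point.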
OS Environment : RHEL 5.5
Software/Application :
DB : Oracle 11.2.0.4
VxVM : VRTSvxvm-5.1.100.000-SP1_RHEL5, Symantec License Manager vxlicrep utility version 3.02.51.010
VxFS : VRTSvxfs-5.1.100.000-SP1_GA_RHEL5
Customer Environment : ATT PET Oracle DB
Investigation :
1. Check the state of the SAN LUNs behind the arch volumes:
$sanlun lun show|grep -i minipet_arc

customer-pet-sea1bfiler-1a:  /vol/customer_MINIPET_ARCH/lun1                /dev/sdag        host1    FCP        500.1g (536952700928)   GOOD
customer-pet-sea1bfiler-1a:  /vol/customer_MINIPET_ARCH/lun0                /dev/sdah        host1    FCP        500.1g (536952700928)   GOOD
customer-pet-sea1bfiler-1a:  /vol/customer_MINIPET_ARCH/lun2                /dev/sdai        host1    FCP          250g (268435456000)   GOOD
customer-pet-sea1bfiler-1a:  /vol/customer_MINIPET_ARCH/lun3                /dev/sdaj        host1    FCP          250g (268435456000)   GOOD
2. Check the fstab entry for the affected file system. It was:
/dev/vx/dsk/minipet_arch_dg/cinprds_minipet_vol_arch00 /cinprds1_arch00 vxfs    _netdev 0 1
3. Check whether the netfs service is running:
$/etc/init.d/netfs status
4. As root, search the system logs for the disk group:
$ awk '/arch_dg/ {print $0}' /var/log/messages.*

Sep 25 19:26:08 customer-pet-oracle-3d kernel: vxfs: msgcnt 1 mesg 037: V-2-37: vx_metaioerr - vx_inode_iodone - /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00 file system meta data write error in dev/block 0/1104
Sep 25 19:26:08 customer-pet-oracle-3d vxvm:vxconfigd: V-5-1-7935 Disk group minipet_arch_dg: update failed: Disk group has no valid configuration copies
Sep 25 19:26:08 customer-pet-oracle-3d vxvm:vxconfigd: V-5-1-7934 Disk group minipet_arch_dg: Disabled by errors
[...]
Sep 25 19:30:01 customer-pet-oracle-3d kernel: VxVM vxio V-5-3-1285 voldmp_errbuf_sio_start: Failed to flush the error buffer ffff811130c6aa00 on device 0xc900130 to DMP<4>vxfs: msgcnt 5 mesg 039: V-2-39: vx_writesuper - /dev/vx/dsk/minipet_arch_dg/cinprds_minipet_vol_arch00 file system super-block write error
Sep 25 19:30:01 customer-pet-oracle-3d kernel: vxfs: msgcnt 6 mesg 037: V-2-37: vx_metaioerr - vx_dirbread - /dev/vx/dsk/minipet_arch_dg/cinprds_minipet_vol_arch00 file system meta data write error in dev/block 0/1104
[...]
Sep 25 19:40:01 customer-pet-oracle-3d kernel: vxfs: msgcnt 21 mesg 039: V-2-39: vx_writesuper - /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00 file system super-block write error
Sep 25 19:40:01 customer-pet-oracle-3d kernel: vxfs: msgcnt 22 mesg 008: V-2-8: vx_direrr: vx_readdir_int_1 - /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00 file system dir inode 5 dev/block 0/150297879 dirent inode 0 error 5
Sep 25 19:40:01 customer-pet-oracle-3d kernel: vxfs: msgcnt 23 mesg 039: V-2-39: vx_writesuper - /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00 file system super-block write error
[...]
Sep 26 04:18:13 customer-pet-oracle-3d kernel: vxfs: msgcnt 334 mesg 016: V-2-16: vx_ilisterr: vx_iread - /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00 file system error reading inode 3
Sep 26 04:18:13 customer-pet-oracle-3d kernel: vxfs: msgcnt 335 mesg 039: V-2-39: vx_writesuper - /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00 file system super-block write error
Sep 26 04:18:13 customer-pet-oracle-3d kernel: vxfs: msgcnt 336 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00 file system disabled
The messages above confirm that some blocks are corrupted on the disks in disk group "minipet_arch_dg".
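To see at a glance which volumes are affected and how often, the vxfs messages can be counted per volume. This is a rough sketch, assuming the log lines carry the volume path as a whitespace-separated field starting with /dev/vx/dsk/, as in the excerpt above; the function name vxfs_errors_by_volume is hypothetical.

```shell
#!/bin/sh
# Count vxfs messages per VxVM volume path.
# Reads syslog lines on stdin; prints "<count> <device>" sorted by device.
vxfs_errors_by_volume() {
    awk '/vxfs: msgcnt/ {
        # Take the first field that looks like a VxVM block device path.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\/dev\/vx\/dsk\//) { count[$i]++; break }
    }
    END { for (dev in count) print count[dev], dev }' | sort -k2
}

# Usage on a live host, as root:
#   cat /var/log/messages* | vxfs_errors_by_volume
```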

Permanent Solution :
1. Unmount the file system if it is mounted.
2. Run a full file system check with fsck:
$fsck -t vxfs -o full <device>
Example:
$/opt/VRTS/bin/fsck -o full -y /dev/vx/rdsk/minipet_arch_dg/rpt_minipet_vol_arch00

or 

$fsck.vxfs -o full /dev/vx/dsk/minipet_arch_dg/rpt_minipet_vol_arch00
3. Alternatively, reboot the system [make sure fsck is enabled for the file system in fstab, i.e. the sixth field is not 0].
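Before relying on the reboot option, it is worth confirming that the vxfs entries in fstab actually have fsck enabled. The sketch below lists vxfs mount points whose sixth field (fs_passno) is 0 or missing; the function name vxfs_without_fsck is only illustrative, and the file defaults to /etc/fstab.

```shell
#!/bin/sh
# List vxfs entries in an fstab-format file whose sixth field (fs_passno)
# is 0 or missing, i.e. file systems that are not checked by fsck at boot.
# Optional argument: path to the fstab file (default: /etc/fstab).
vxfs_without_fsck() {
    awk '$1 !~ /^#/ && $3 == "vxfs" && ($6 == "" || $6 == 0) { print $2 }' \
        "${1:-/etc/fstab}"
}

# Usage:
#   vxfs_without_fsck              # checks /etc/fstab
#   vxfs_without_fsck /tmp/fstab   # checks a copy
```

An empty result means every vxfs entry has a non-zero pass number, as the /cinprds1_arch00 entry shown earlier does.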

Root Cause Analysis :

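The error messages in the system log confirm that disk blocks were corrupted. vxiod was failing to write data, and vxconfigd reported that it could not update the disk group configuration ("Disk group has no valid configuration copies"), so the disk group was disabled and VxFS then disabled the file systems on it.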