UnixPedia : HPUX / LINUX / SOLARIS: System is rebooting abnormally due to block size issue on one of FS.

Thursday, November 22, 2018

System is rebooting abnormally due to a block size issue on one of the file systems.





A 1 KiB block size is used for a file system, although it was stated in a previous case, 01960120, that this should not be done.

From the previous case:
------------------------------------------------------------8<
- Possible mitigation activities
o As the issue may be a function of the file system block size, refrain from using a file system block size of 1 KiB.
------------------------------------------------------------8<

crash> sys
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.17.1.el7.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/664521057/crash/vmcore [PARTIAL DUMP]
CPUS: 20
DATE: Sat Nov 10 16:17:01 2018
UPTIME: 11 days, 10:24:29
LOAD AVERAGE: 8.78, 9.25, 8.80
TASKS: 1093
NODENAME: ITSUSRALSP05403
RELEASE: 3.10.0-693.17.1.el7.x86_64
VERSION: #1 SMP Sun Jan 14 10:36:03 EST 2018
MACHINE: x86_64 (2397 Mhz)
MEMORY: 96 GB
PANIC: "kernel BUG at fs/jbd2/journal.c:766!"

crash> mod -t
NAME TAINTS
redirfs OE
gsch OE
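The `OE` flags in the `mod -t` output are the kernel's per-module taint letters, which is why `redirfs` and `gsch` are treated as third-party modules. As a small illustration, the lookup below covers only the two letters seen in this dump, not the kernel's full taint table:

```python
# Per-module taint letters as printed by crash's "mod -t".
# Only the letters that appear in this dump are listed; this is not the full table.
MODULE_TAINT_FLAGS = {
    "O": "out-of-tree module (not built from the kernel source tree)",
    "E": "unsigned module loaded on a kernel that supports module signing",
}


def describe_taints(flags: str) -> list[str]:
    """Translate a taint string such as 'OE' into readable descriptions."""
    return [MODULE_TAINT_FLAGS.get(c, f"unknown flag {c!r}") for c in flags]


# Module names and flags copied from the "mod -t" output above.
for name, flags in [("redirfs", "OE"), ("gsch", "OE")]:
    print(name, "->", "; ".join(describe_taints(flags)))
```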

o Existing file system errors
crash> log | grep -i ext | grep -v gsch
[778861.340894] EXT4-fs (dm-12): error count since last fsck: 115
[778861.340898] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[778861.340901] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[865368.129522] EXT4-fs (dm-12): error count since last fsck: 115
[865368.129526] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[865368.129528] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[951874.916775] EXT4-fs (dm-12): error count since last fsck: 115
[951874.916780] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[951874.916782] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[987861.755674] RIP: 0010:[<ffffffffc014ad99>]  [<ffffffffc014ad99>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
[987861.758359] RIP  [<ffffffffc014ad99>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
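The "initial error" and "last error" values are Unix timestamps, while the bracketed prefixes are seconds since boot. A minimal sketch that converts the former to UTC and measures the spacing of the "error count since last fsck" summaries (all numbers are copied from the log above; displaying in UTC is an assumption, since the server's time zone is not stated):

```python
from datetime import datetime, timezone

# Unix timestamps from the "initial error" / "last error" lines for dm-12.
ERROR_TIMES = {"initial error": 1495053905, "last error": 1540908042}

# Seconds-since-boot offsets of the three "error count since last fsck" lines.
SUMMARY_OFFSETS = [778861.340894, 865368.129522, 951874.916775]


def to_utc(ts: int) -> str:
    """Render a Unix timestamp as a UTC date string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


for label, ts in ERROR_TIMES.items():
    print(f"{label}: {to_utc(ts)}")

# Gap between consecutive summary lines, in hours.
gaps_hours = [(b - a) / 3600 for a, b in zip(SUMMARY_OFFSETS, SUMMARY_OFFSETS[1:])]
print("hours between summaries:", [round(h, 2) for h in gaps_hours])
```

The summaries arrive roughly 24 hours apart, which fits ext4's once-a-day error report; the errors on dm-12 date back to May 2017 and were never cleared by an fsck.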

o Several messages relating to the third-party kernel module (gsch) and ext3 mounts
crash> log | grep -i ext | grep gsch_flt | awk '{for (i=2;i<=NF;i++){printf "%s ",$i ; if (i==NF) print ""}}' | sort | uniq -c | sort -rn
243 gsch_flt_add_mnt(/var/tmp @ Unknown[ef53(ext3)]) done: 0
243 gsch_flt_add_mnt(/ @ Unknown[ef53(ext3)]) done: 0
243 gsch_flt_add_mnt(/tmp @ Unknown[ef53(ext3)]) done: 0
121 gsch_flt_add_mnt(/boot @ Unknown[ef53(ext3)]) done: 0
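The `awk | sort | uniq -c | sort -rn` pipeline above drops the leading timestamp field and counts identical messages. A rough Python equivalent, shown on a few hypothetical lines in the same shape as the output (the timestamps are made up), would be:

```python
import re
from collections import Counter


def count_messages(log_lines):
    """Approximate: grep -i ext | grep gsch_flt | awk '{drop $1}' | sort | uniq -c | sort -rn"""
    counts = Counter()
    for line in log_lines:
        # Same two filters as the grep stages: case-insensitive "ext", literal "gsch_flt".
        if "ext" not in line.lower() or "gsch_flt" not in line:
            continue
        # Drop the leading "[seconds.micro]" timestamp and keep the message text.
        message = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line).strip()
        counts[message] += 1
    # Most frequent first, like "sort -rn".
    return counts.most_common()


# Hypothetical sample lines; only the message format follows the dump.
sample = [
    "[100.000001] gsch_flt_add_mnt(/tmp @ Unknown[ef53(ext3)]) done: 0",
    "[100.000002] gsch_flt_add_mnt(/ @ Unknown[ef53(ext3)]) done: 0",
    "[200.000001] gsch_flt_add_mnt(/tmp @ Unknown[ef53(ext3)]) done: 0",
    "[200.000002] EXT4-fs (dm-12): error count since last fsck: 115",
]
for message, n in count_messages(sample):
    print(f"{n:7d} {message}")
```

Unlike awk, this keeps the message's internal spacing as-is, and the regex also copes with space-padded timestamps such as `[    5.123]`.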

o A few Oracle processes were in the uninterruptible (UN) state, each having last run only milliseconds before the panic.
crash> ps -m | grep UN
[ 0 00:00:00.000] [UN] PID: 4073 TASK: ffff880431bdcf10 CPU: 14 COMMAND: "oracle_4073_mra"
[ 0 00:00:00.000] [UN] PID: 29624 TASK: ffff8804ee2c0000 CPU: 12 COMMAND: "ora_j000_mraq04"
[ 0 00:00:00.005] [UN] PID: 3805 TASK: ffff8806cd771fa0 CPU: 1 COMMAND: "oracle_3805_mra"
[ 0 00:00:00.017] [UN] PID: 43209 TASK: ffff8807d3e09fa0 CPU: 6 COMMAND: "oracle_43209_mr"

o Crashing process
crash> bt
PID: 2296 TASK: ffff88115d290fd0 CPU: 4 COMMAND: "jbd2/dm-12-8"
#0 [ffff88115cbcf930] machine_kexec at ffffffff8105c63b
#1 [ffff88115cbcf990] __crash_kexec at ffffffff81106922
#2 [ffff88115cbcfa60] crash_kexec at ffffffff81106a10
#3 [ffff88115cbcfa78] oops_end at ffffffff816b0aa8
#4 [ffff88115cbcfaa0] die at ffffffff8102e87b
#5 [ffff88115cbcfad0] do_trap at ffffffff816b01f0
#6 [ffff88115cbcfb20] do_invalid_op at ffffffff8102b174
#7 [ffff88115cbcfbd0] invalid_op at ffffffff816bd1ae
[exception RIP: jbd2_journal_next_log_block+121]
RIP: ffffffffc014ad99 RSP: ffff88115cbcfc88 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88115b417800 RCX: 0000000000000008
RDX: 0000000000038818 RSI: ffff88115cbcfd38 RDI: ffff88115b41782c
RBP: ffff88115cbcfca0 R8: ffff8804464fbbc8 R9: 0000000000000000
R10: 0000000000000001 R11: 0000040000000400 R12: ffff88115b417828
R13: ffff88115cbcfd38 R14: ffff88115b417800 R15: 000000000000000b
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88115cbcfc80] jbd2_journal_next_log_block at ffffffffc014ad40 [jbd2]
#9 [ffff88115cbcfca8] jbd2_journal_commit_transaction at ffffffffc01437c8 [jbd2]
#10 [ffff88115cbcfe48] kjournald2 at ffffffffc0149a79 [jbd2]
#11 [ffff88115cbcfec8] kthread at ffffffff810b270f
#12 [ffff88115cbcff50] ret_from_fork at ffffffff816b8798

crash> mount | awk 'NR == 1 || $0 ~ "vg_oraarch-lv_oraarch"'
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff881159887780 ffff88115b714000 ext3 /dev/mapper/vg_oraarch-lv_oraarch /u02/oraarch

o The 1 KiB block size again:
crash> super_block.s_blocksize ffff88115b714000
s_blocksize = 1024
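crash reads `s_blocksize` from the in-memory VFS `super_block`. On disk, an ext2/3/4 file system stores the same value in its own superblock as `s_log_block_size`, where block size = 1024 << s_log_block_size, so a stored 0 means 1 KiB. A minimal sketch that decodes it from the first 2048 bytes of a device or image, assuming the standard on-disk layout (superblock at byte 1024, `s_log_block_size` at offset 0x18, magic 0xEF53 at 0x38); on a live system, `tune2fs -l` reports the same value as "Block size":

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary ext2/3/4 superblock starts 1024 bytes into the device
EXT_MAGIC = 0xEF53        # s_magic; the same value gsch logs as "ef53(ext3)"


def ext_block_size(raw: bytes) -> int:
    """Return the block size recorded in an ext2/3/4 superblock.

    `raw` must hold at least the first 2048 bytes of the device or image.
    """
    sb = raw[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("need at least 2048 bytes")
    (magic,) = struct.unpack_from("<H", sb, 0x38)            # s_magic (little-endian u16)
    if magic != EXT_MAGIC:
        raise ValueError(f"not an ext2/3/4 superblock (magic {magic:#x})")
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)   # s_log_block_size (u32)
    return 1024 << log_block_size
```

For example, `open("/dev/mapper/vg_oraarch-lv_oraarch", "rb").read(2048)` passed to `ext_block_size` would be expected to return 1024 for the file system above (reading the device requires root).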

Is there a reason why the 1 KiB block size is still being used?

### Next Steps

o Explain why the 1 KiB block size is still in use, given the earlier recommendation to avoid such a small block size.
~~~~~

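Resolution:
  1. Increase the block size to the recommended 4 KiB. The ext3/ext4 block size is fixed when the file system is created, so this means backing up the data, recreating the file system with a 4 KiB block size, and restoring the data.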
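To check whether any other mounted ext file systems still use small blocks, a minimal sketch under these assumptions: a Linux host, `/proc/mounts` as the mount list, and `os.statvfs()` returning the ext block size in `f_bsize`. The 4096-byte threshold simply mirrors the 4 KiB recommendation, and the helper names are hypothetical:

```python
import os
import re

EXT_TYPES = {"ext2", "ext3", "ext4"}


def unescape_mount_field(field: str) -> str:
    """/proc/mounts encodes spaces and similar characters as octal escapes such as \\040."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def small_block_ext_mounts(mounts_text: str, min_block: int = 4096):
    """Return (device, mountpoint, fstype, block size) for ext2/3/4 mounts whose
    statvfs() block size is below min_block bytes."""
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] not in EXT_TYPES:
            continue
        device, mountpoint, fstype = (unescape_mount_field(f) for f in fields[:3])
        try:
            block_size = os.statvfs(mountpoint).f_bsize
        except OSError:
            continue  # mount point not accessible from here; skip it
        if block_size < min_block:
            results.append((device, mountpoint, fstype, block_size))
    return results


if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as mounts:
        for device, mountpoint, fstype, block_size in small_block_ext_mounts(mounts.read()):
            print(f"{device} on {mountpoint} ({fstype}): {block_size}-byte blocks")
```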

