The system rebooted abnormally because of a block size issue on one of its file systems.
A 1 KiB block size is used for that file system, and
a previous case, 01960120, already advised against using it.
------------------------------------------------------------8<
- possible mitigation activities
  o As the issue may be a function of the file system block size, refrain from using a file system block size of 1 KiB.
------------------------------------------------------------8<
crash> sys
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.17.1.el7.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/664521057/crash/vmcore [PARTIAL DUMP]
CPUS: 20
DATE: Sat Nov 10 16:17:01 2018
UPTIME: 11 days, 10:24:29
LOAD AVERAGE: 8.78, 9.25, 8.80
TASKS: 1093
NODENAME: ITSUSRALSP05403
RELEASE: 3.10.0-693.17.1.el7.x86_64
VERSION: #1 SMP Sun Jan 14 10:36:03 EST 2018
MACHINE: x86_64 (2397 Mhz)
MEMORY: 96 GB
PANIC: "kernel BUG at fs/jbd2/journal.c:766!"
crash> mod -t
NAME TAINTS
redirfs OE
gsch OE
o Existing file system errors
crash> log | grep -i ext | grep -v gsch
[778861.340894] EXT4-fs (dm-12): error count since last fsck: 115
[778861.340898] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[778861.340901] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[865368.129522] EXT4-fs (dm-12): error count since last fsck: 115
[865368.129526] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[865368.129528] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[951874.916775] EXT4-fs (dm-12): error count since last fsck: 115
[951874.916780] EXT4-fs (dm-12): initial error at time 1495053905: ext4_validate_block_bitmap:381
[951874.916782] EXT4-fs (dm-12): last error at time 1540908042: ext4_validate_block_bitmap:384
[987861.755674] RIP: 0010:[] [] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
[987861.758359] RIP [] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
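The EXT4-fs error lines above give their times as Unix epoch seconds, and the bracketed log stamps are seconds since boot. As a reading aid, here is a minimal Python sketch that converts both into readable form. The function names are illustrative only and are not part of the crash utility.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds: int) -> str:
    """Render Unix epoch seconds as a UTC timestamp string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def uptime_from_log_stamp(seconds_since_boot: float) -> str:
    """Render a kernel log stamp (seconds since boot) as 'D days, HH:MM:SS'."""
    total = int(seconds_since_boot)  # drop the fractional part
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days} days, {hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    print("initial error:", epoch_to_utc(1495053905))   # 2017-05-17 20:45:05
    print("last error:   ", epoch_to_utc(1540908042))   # 2018-10-30 14:00:42
    print("BUG logged at:", uptime_from_log_stamp(987861.755674))  # 11 days, 10:24:21
```

The first recorded error dates back to May 2017, so the file system carried unresolved errors for well over a year. The stamp on the RIP line agrees with the UPTIME of 11 days, 10:24:29 reported by sys to within a few seconds.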
o Several messages relate to the third-party kernel module (gsch) and ext file systems
crash> log | grep -i ext | grep gsch_flt | awk '{for (i=2;i<=NF;i++){printf "%s ",$i ; if (i==NF) print ""}}' | sort | uniq -c | sort -rn
243 gsch_flt_add_mnt(/var/tmp @ Unknown[ef53(ext3)]) done: 0
243 gsch_flt_add_mnt(/ @ Unknown[ef53(ext3)]) done: 0
243 gsch_flt_add_mnt(/tmp @ Unknown[ef53(ext3)]) done: 0
121 gsch_flt_add_mnt(/boot @ Unknown[ef53(ext3)]) done: 0
o Processes just started and were in an uninterruptible state.
crash> ps -m | grep UN
[ 0 00:00:00.000] [UN] PID: 4073 TASK: ffff880431bdcf10 CPU: 14 COMMAND: "oracle_4073_mra"
[ 0 00:00:00.000] [UN] PID: 29624 TASK: ffff8804ee2c0000 CPU: 12 COMMAND: "ora_j000_mraq04"
[ 0 00:00:00.005] [UN] PID: 3805 TASK: ffff8806cd771fa0 CPU: 1 COMMAND: "oracle_3805_mra"
[ 0 00:00:00.017] [UN] PID: 43209 TASK: ffff8807d3e09fa0 CPU: 6 COMMAND: "oracle_43209_mr"
o Crashing process
crash> bt
PID: 2296 TASK: ffff88115d290fd0 CPU: 4 COMMAND: "jbd2/dm-12-8"
#0 [ffff88115cbcf930] machine_kexec at ffffffff8105c63b
#1 [ffff88115cbcf990] __crash_kexec at ffffffff81106922
#2 [ffff88115cbcfa60] crash_kexec at ffffffff81106a10
#3 [ffff88115cbcfa78] oops_end at ffffffff816b0aa8
#4 [ffff88115cbcfaa0] die at ffffffff8102e87b
#5 [ffff88115cbcfad0] do_trap at ffffffff816b01f0
#6 [ffff88115cbcfb20] do_invalid_op at ffffffff8102b174
#7 [ffff88115cbcfbd0] invalid_op at ffffffff816bd1ae
[exception RIP: jbd2_journal_next_log_block+121]
RIP: ffffffffc014ad99 RSP: ffff88115cbcfc88 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88115b417800 RCX: 0000000000000008
RDX: 0000000000038818 RSI: ffff88115cbcfd38 RDI: ffff88115b41782c
RBP: ffff88115cbcfca0 R8: ffff8804464fbbc8 R9: 0000000000000000
R10: 0000000000000001 R11: 0000040000000400 R12: ffff88115b417828
R13: ffff88115cbcfd38 R14: ffff88115b417800 R15: 000000000000000b
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88115cbcfc80] jbd2_journal_next_log_block at ffffffffc014ad40 [jbd2]
#9 [ffff88115cbcfca8] jbd2_journal_commit_transaction at ffffffffc01437c8 [jbd2]
#10 [ffff88115cbcfe48] kjournald2 at ffffffffc0149a79 [jbd2]
#11 [ffff88115cbcfec8] kthread at ffffffff810b270f
#12 [ffff88115cbcff50] ret_from_fork at ffffffff816b8798
crash> mount | awk 'NR == 1 || $0 ~ "vg_oraarch-lv_oraarch"'
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff881159887780 ffff88115b714000 ext3 /dev/mapper/vg_oraarch-lv_oraarch /u02/oraarch
o 1 KiB blocksize again.
crash> super_block.s_blocksize ffff88115b714000
s_blocksize = 1024
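The in-memory s_blocksize value comes from the on-disk superblock, where ext2/3/4 store it as s_log_block_size, with the block size being 1024 shifted left by that value. The following Python sketch shows how that field and the 0xEF53 magic (the "ef53" seen in the gsch_flt messages) can be decoded. It assumes the standard ext superblock layout; the helper names and the synthetic test image are hypothetical and exist only for illustration.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the primary superblock starts 1024 bytes into the device
EXT_MAGIC = 0xEF53        # ext2/3/4 magic number


def ext_block_size(image: bytes) -> int:
    """Return the block size encoded in an ext2/3/4 superblock."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("image too short to hold a superblock")
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # s_log_block_size
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != EXT_MAGIC:
        raise ValueError(f"not an ext2/3/4 superblock (magic {magic:#06x})")
    return 1024 << log_block_size


def make_fake_superblock(log_block_size: int) -> bytes:
    """Build a synthetic 2 KiB image holding only the two fields read above."""
    image = bytearray(2048)
    struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 24, log_block_size)
    struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT_MAGIC)
    return bytes(image)


if __name__ == "__main__":
    print(ext_block_size(make_fake_superblock(0)))  # 1024, as on /u02/oraarch
    print(ext_block_size(make_fake_superblock(2)))  # 4096
```

On a live system the same value is reported on the "Block size" line of `tune2fs -l` for the device, which is the simpler check in practice.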
Is there a reason why the 1 KiB blocksize is still being used?
### Next Steps
o Explain why the 1 KiB block size is still in use when the previous case advised against such a small block size.
~~~~~
Resolution:
- Increase the file system block size from 1 KiB to the recommended 4 KiB.