LinuxQuestions.org
Visit Jeremy's Blog.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Server
User Name
Password
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Notices

Reply
 
LinkBack Search this Thread
Old 10-22-2009, 05:11 PM   #1
Sum1
Member
 
Registered: Jul 2007
Distribution: Slackware and Gentoo
Posts: 199

Rep: Reputation: 18
Kernel and Ext3 Filesystem Errors at the same time -- difficult to diagnose.


Thank you for reading.
Server is Slackware 12.2 using kernel 2.6.30.4 and working as a PDC Samba Server
4 gigs. of DDR2 800 RAM
Main HD with OS and user data is WDC 150 gig. Western Digital SATA II Velociraptor
Secondary HD holding archived backup is 640 gig. Western Digital SATA II.

We had storms in my area over the weekend and when I came in Monday morn. the server was down. The server is plugged into a nice APC UPS/surge protector so I wasn't too worried.

I started up the server and everything ran fine all day.
Monday evening after 5 pm when the employees left, I wanted to test whether I could recover data from my compressed backup archives.
This is when all the trouble began.

Everytime I tried to decompress my gzipped and/or bzipped archives, the process would fail. Example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@a1:/archive/2009Oct19# ls -lh
total 11G
-rw-r--r-- 1 root root 11G 2009-10-19 21:46 mw.tar.bz2

root@a1:/archive/2009Oct19# tar -xjf mw.tar.bz2
tar: Skipping to next header

bzip2: Data integrity error when decompressing.
Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Child returned status 2
tar: Exiting with failure status due to previous errors
root@a1:/archive/2009Oct19#
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I tried to decompress a few other archives from other day's backups and all of them failed to complete, as well. Ugh, all my daily backups are useless.

So I decided to use scp to copy our live user data to another linux box in order to have at least one safe copy of the company's data.

Then I tried to write zeros to the secondary drive and was surprised to see this operation fail too:

root@a1:/# dd if=/dev/zero of=/dev/sdb bs=4096

Message from syslogd@a1
a1 kernel: Oops: 0002 [#2] SMP

Message from syslogd@a1
a1 kernel: last sysfs file: /sys/devices/virtual/block/ram9/range

Message from syslogd@a1
a1 kernel: Process cp (pid: 4783, ti=e0634000 task=f699f660 task.ti=e0634000)

Message from syslogd@a1
a1 kernel: Stack:

Message from syslogd@a1
a1 kernel: 00000000 e0635cc0 c29dd880 f25a854c 001200d2 e0635cc0 c1079fee c29dd880

Message from syslogd@a1
a1 kernel: f25a854c 00000001 c1082a42 e0635b54 00000000 e0635c30 010e2bbe 00000000

Message from syslogd@a1
a1 kernel: Call Trace:

Message from syslogd@a1
a1 kernel: [<c10f4356>] ? journal_grab_journal_head+0xb/0x28

Message from syslogd@a1
a1 kernel: [<c10f0917>] ? journal_try_to_free_buffers+0x49/0x16a

Message from syslogd@a1

Message from syslogd@a1
a1a1 kernel: [<c1082a42>] ? shrink_page_list+0x3b4/0x4fekernel: [<c1079fee>] ? try_to_release_page+0x38/0x4c

Message from syslogd@a1
a1 kernel: [<c1081e4b>] ? isolate_pages_global+0x99/0x186

Message from syslogd@a1
a1 kernel: [<c1082d80>] ? shrink_list+0x1f4/0x417

Message from syslogd@a1
a1 kernel: [<c10e2aac>] ? ext3_mark_iloc_dirty+0x2a3/0x30c

Message from syslogd@a1

Message from syslogd@a1
a1a1 kernel: [<c10e3a2d>] ? ext3_get_blocks_handle+0x78d/0x7a9kernel: [<c10eddcf>] ? __ext3_journal_dirty_metadata+0x13/0x32

Message from syslogd@a1
a1 kernel: [<c140ae5a>] ? _spin_lock_irq+0x12/0x15

Message from syslogd@a1
a1 kernel: [<c10831b4>] ? shrink_zone+0x211/0x289

Message from syslogd@a1
a1 kernel: [<c10834e1>] ? try_to_free_pages+0x17c/0x292

Message from syslogd@a1
a1 kernel: [<c1081db2>] ? isolate_pages_global+0x0/0x186

Message from syslogd@a1
a1 kernel: [<c107ea66>] ? __alloc_pages_internal+0x1f0/0x34b

Message from syslogd@a1
a1 kernel: [<c107a054>] ? __page_cache_alloc+0xe/0x10

Message from syslogd@a1
a1 kernel: [<c107c16a>] ? grab_cache_page_write_begin+0x4d/0x87

Message from syslogd@a1
a1 kernel: [<c10e4965>] ? ext3_write_begin+0x67/0x19e

Message from syslogd@a1
a1 kernel: [<c107b0f3>] ? generic_file_buffered_write+0xe1/0x272

Message from syslogd@a1
a1 kernel: [<c107b68e>] ? __generic_file_aio_write_nolock+0x262/0x28c

Message from syslogd@a1
a1 kernel: [<c10afca3>] ? mnt_want_write+0x43/0x4b

Message from syslogd@a1
a1 kernel: [<c107bb4a>] ? generic_file_aio_read+0x492/0x4d9

Message from syslogd@a1
a1 kernel: [<c107bdde>] ? generic_file_aio_write+0x5b/0xb2

Message from syslogd@a1
a1 kernel: [<c10e14e1>] ? ext3_file_write+0x19/0x83

Message from syslogd@a1
a1 kernel: [<c109e0c1>] ? do_sync_write+0xbe/0x105

Message from syslogd@a1
a1 kernel: [<c10a0c02>] ? cp_new_stat64+0xc5/0xd7

Message from syslogd@a1
a1 kernel: [<c104fb30>] ? autoremove_wake_function+0x0/0x2b

Message from syslogd@a1
a1 kernel: [<c116ba89>] ? security_file_permission+0xc/0xd

Message from syslogd@a1
a1 kernel: [<c109e003>] ? do_sync_write+0x0/0x105

Message from syslogd@a1
a1 kernel: [<c109e76f>] ? vfs_write+0x73/0xa1

Message from syslogd@a1
a1 kernel: [<c109e831>] ? sys_write+0x3c/0x60

Message from syslogd@a1
a1 kernel: [<c1019d4d>] ? syscall_call+0x7/0xb

Message from syslogd@a1
a1 kernel: Code: 39 c8 75 10 8b 42 28 31 d2 0f c8 21 f8 39 f8 0f 94 c2 eb 02 31 d2 5b 89 d0 5e 5f c3 89 c1 eb 0b f3 90 0f a3 0a 19 c0 85 c0 75 f5 <f0> 0f ab 0a 19 c0 85 c0 75 ed c3 89 c2 b8 17 00 00 00 eb dd 89

Message from syslogd@a1
a1 kernel: EIP: [<c10f431b>] bit_spin_lock+0xf/0x1a SS:ESP 0068:e0635ab0

Message from syslogd@a1
a1 kernel: CR2: 0000000000000000

AND THE SERVER WAS FROZEN REQUIRING POWER OFF/ON.
A few days later after successfully writing zeros to the first 10 megs of /dev/sdb , I figured I'd try reinstall the ext3 filesystem.
More failure:

oot@a1:/# mkfs -t ext3 /dev/sdb1
mke2fs 1.41.3 (12-Oct-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
39075840 inodes, 156282320 blocks
7814116 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
4770 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000
Writing inode tables: 189/4770
Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: Oops: 0000 [#1] SMP

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/host1/target1:0:0/1:0:0:0/block/sdb/size

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: Process kswapd0 (pid: 357, ti=f71fc000 task=f71b60a0 task.ti=f71fc000)

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: Stack:

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: 00000024 f1c07f4c c1081282 f71fdeac f1c07e9c 00000000 c1081683 0000000e

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: 00000018 ffffffff 00000000 00000000 00000000 c2a210e0 c2a210c0 f1c05bf8

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: Call Trace:

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c107bfc0>] ? find_get_pages+0x50/0x7b

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c1081282>] ? pagevec_lookup+0x17/0x1d

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c1081683>] ? __invalidate_mapping_pages+0xdb/0xf0

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c101a329>] ? common_interrupt+0x29/0x30

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c140ade3>] ? _spin_lock+0x5/0x7

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c108169f>] ? invalidate_mapping_pages+0x7/0x9

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c10ac3ab>] ? shrink_icache_memory+0xc6/0x1a2

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c1083308>] ? shrink_slab+0xdc/0x139

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c108395f>] ? kswapd+0x2fe/0x3f5

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c1081db2>] ? isolate_pages_global+0x0/0x186

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c104fb30>] ? autoremove_wake_function+0x0/0x2b

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c10379ed>] ? complete+0x28/0x36

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c1083661>] ? kswapd+0x0/0x3f5

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c104fa6d>] ? kthread+0x42/0x69

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c104fa2b>] ? kthread+0x0/0x69

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: [<c101a6e7>] ? kernel_thread_helper+0x7/0x10

Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: Code: 5f c3 8b 10 83 c8 ff f6 c2 01 0f 44 c2 c3 53 89 c1 8b 59 48 f6 c3 01 74 04 f3 90 eb f4 8b 41 40 8b 51 44 39 59 48 75 e9 5b c3 53 <8b> 50 04 8d 48 04 85 d2 74 13 8d 5a 01 89 d0 f0 0f b1 19 39 d0
Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: EIP: [<c1079f8e>] page_cache_get_speculative+0x1/0x29 SS:ESP 0068:f71fde58
Message from syslogd@a1 at Wed Oct 21 22:53:49 2009 ...
a1 kernel: CR2: 0000000002000004
190/4770
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At this point, I'm getting numb. Can't figure out what this is all about.

/var/log/syslog timestamps show the kernel definitely not happy:

Oct 21 22:53:49 a1 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/host1/target1:0:0/1:0:0:0/block/sdb/size
Oct 21 22:53:49 a1 kernel: Modules linked in:
Oct 21 22:53:49 a1 kernel:
Oct 21 22:53:49 a1 kernel: Pid: 357, comm: kswapd0 Not tainted (2.6.30.4 #2) GA-MA78GM-S2H
Oct 21 22:53:49 a1 kernel: EIP: 0060:[<c1079f8e>] EFLAGS: 00010213 CPU: 0
Oct 21 22:53:49 a1 kernel: EIP is at page_cache_get_speculative+0x1/0x29
Oct 21 22:53:49 a1 kernel: EAX: 02000000 EBX: 02000000 ECX: c2a210c4 EDX: 02000000
Oct 21 22:53:49 a1 kernel: ESI: 00000002 EDI: 00000002 EBP: f71fdeac ESP: f71fde58
Oct 21 22:53:49 a1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Oct 21 22:53:49 a1 kernel: Process kswapd0 (pid: 357, ti=f71fc000 task=f71b60a0 task.ti=f71fc000)
Oct 21 22:53:49 a1 kernel: Stack:
Oct 21 22:53:49 a1 kernel: 02000000 c107bfc0 f1c07f50 0000000e 00000000 00000003 00000000 f71fdea4
Oct 21 22:53:49 a1 kernel: 00000024 f1c07f4c c1081282 f71fdeac f1c07e9c 00000000 c1081683 0000000e
Oct 21 22:53:49 a1 kernel: 00000018 ffffffff 00000000 00000000 00000000 c2a210e0 c2a210c0 f1c05bf8
Oct 21 22:53:49 a1 kernel: Call Trace:
Oct 21 22:53:49 a1 kernel: [<c107bfc0>] ? find_get_pages+0x50/0x7b
Oct 21 22:53:49 a1 kernel: [<c1081282>] ? pagevec_lookup+0x17/0x1d
Oct 21 22:53:49 a1 kernel: [<c1081683>] ? __invalidate_mapping_pages+0xdb/0xf0
Oct 21 22:53:49 a1 kernel: [<c101a329>] ? common_interrupt+0x29/0x30
Oct 21 22:53:49 a1 kernel: [<c140ade3>] ? _spin_lock+0x5/0x7
Oct 21 22:53:49 a1 kernel: [<c108169f>] ? invalidate_mapping_pages+0x7/0x9
Oct 21 22:53:49 a1 kernel: [<c10ac3ab>] ? shrink_icache_memory+0xc6/0x1a2
Oct 21 22:53:49 a1 kernel: [<c1083308>] ? shrink_slab+0xdc/0x139
Oct 21 22:53:49 a1 kernel: [<c108395f>] ? kswapd+0x2fe/0x3f5
Oct 21 22:53:49 a1 kernel: [<c1081db2>] ? isolate_pages_global+0x0/0x186
Oct 21 22:53:49 a1 kernel: [<c104fb30>] ? autoremove_wake_function+0x0/0x2b
Oct 21 22:53:49 a1 kernel: [<c10379ed>] ? complete+0x28/0x36
Oct 21 22:53:49 a1 kernel: [<c1083661>] ? kswapd+0x0/0x3f5
Oct 21 22:53:49 a1 kernel: [<c104fa6d>] ? kthread+0x42/0x69
Oct 21 22:53:49 a1 kernel: [<c104fa2b>] ? kthread+0x0/0x69
Oct 21 22:53:49 a1 kernel: [<c101a6e7>] ? kernel_thread_helper+0x7/0x10
Oct 21 22:53:49 a1 kernel: Code: 5f c3 8b 10 83 c8 ff f6 c2 01 0f 44 c2 c3 53 89 c1 8b 59 48 f6 c3 01 74 04 f3 90 eb f4 8b 41 40 8b 51$
Oct 21 22:53:49 a1 kernel: EIP: [<c1079f8e>] page_cache_get_speculative+0x1/0x29 SS:ESP 0068:f71fde58
Oct 21 22:53:49 a1 kernel: CR2: 0000000002000004
Oct 21 22:53:49 a1 kernel: ---[ end trace bfac5149c665b21a ]---
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And to make a trifecta, I wondered if at this point I could even copy the live data-set on the main drive to a different directory on the same drive. No! --

root@a1:/home/foo# cp -r /mw /home/foo/2009Oct21/

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: Oops: 0002 [#2] SMP

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: last sysfs file: /sys/devices/virtual/block/ram9/range

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: Process cp (pid: 4783, ti=e0634000 task=f699f660 task.ti=e0634000)

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: Stack:

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: 00000000 e0635cc0 c29dd880 f25a854c 001200d2 e0635cc0 c1079fee c29dd880

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: f25a854c 00000001 c1082a42 e0635b54 00000000 e0635c30 010e2bbe 00000000

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: Call Trace:

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10f4356>] ? journal_grab_journal_head+0xb/0x28

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10f0917>] ? journal_try_to_free_buffers+0x49/0x16a

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1a1 kernel: [<c1082a42>] ? shrink_page_list+0x3b4/0x4fekernel: [<c1079fee>] ? try_to_release_page+0x38/0x4c


Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c1081e4b>] ? isolate_pages_global+0x99/0x186

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c1082d80>] ? shrink_list+0x1f4/0x417

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10e2aac>] ? ext3_mark_iloc_dirty+0x2a3/0x30c

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1a1 kernel: [<c10e3a2d>] ? ext3_get_blocks_handle+0x78d/0x7a9kernel: [<c10eddcf>] ? __ext3_journal_dirty_metadata+0x13/0x32


Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c140ae5a>] ? _spin_lock_irq+0x12/0x15

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10831b4>] ? shrink_zone+0x211/0x289

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10834e1>] ? try_to_free_pages+0x17c/0x292

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c1081db2>] ? isolate_pages_global+0x0/0x186

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c107ea66>] ? __alloc_pages_internal+0x1f0/0x34b

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c107a054>] ? __page_cache_alloc+0xe/0x10

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c107c16a>] ? grab_cache_page_write_begin+0x4d/0x87

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10e4965>] ? ext3_write_begin+0x67/0x19e

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c107b0f3>] ? generic_file_buffered_write+0xe1/0x272

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c107b68e>] ? __generic_file_aio_write_nolock+0x262/0x28c

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10afca3>] ? mnt_want_write+0x43/0x4b

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c107bb4a>] ? generic_file_aio_read+0x492/0x4d9

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c107bdde>] ? generic_file_aio_write+0x5b/0xb2

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10e14e1>] ? ext3_file_write+0x19/0x83

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c109e0c1>] ? do_sync_write+0xbe/0x105

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c10a0c02>] ? cp_new_stat64+0xc5/0xd7

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c104fb30>] ? autoremove_wake_function+0x0/0x2b

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c116ba89>] ? security_file_permission+0xc/0xd

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c109e003>] ? do_sync_write+0x0/0x105

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c109e76f>] ? vfs_write+0x73/0xa1

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c109e831>] ? sys_write+0x3c/0x60

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: [<c1019d4d>] ? syscall_call+0x7/0xb

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: Code: 39 c8 75 10 8b 42 28 31 d2 0f c8 21 f8 39 f8 0f 94 c2 eb 02 31 d2 5b 89 d0 5e 5f c3 89 c1 eb 0b f3 90 0f a3 0a 19 c0 85 c0 75 f5 <f0> 0f ab 0a 19 c0 85 c0 75 ed c3 89 c2 b8 17 00 00 00 eb dd 89

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: EIP: [<c10f431b>] bit_spin_lock+0xf/0x1a SS:ESP 0068:e0635ab0

Message from syslogd@a1 at Wed Oct 21 23:02:21 2009 ...
a1 kernel: CR2: 0000000000000000
Killed
root@a1:/home/foo#
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now it's Thurs. Oct. 22 and the server has been behaving strangely. Several times a day users will suddenly lose authentication with the samba server and not be able to access shares and data stores used by an accounting application.

What is the best course of action?
Do I simply need to do a fresh re-install?
Does this appear to be a hardware issue?
filesystem issue?
kernel issue?

I've been running a samba server on Slackware boxes since version 8 or 9 and I've never seen anything like this.
Any insights or guidance very much appreciated.
 
Old 10-24-2009, 02:40 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 10,460

Rep: Reputation: 623Reputation: 623Reputation: 623Reputation: 623Reputation: 623Reputation: 623
I don't see any evidence of a fsck or a memory check.
Did the UPS gracefully shutdown the server or did it fade out and (effectively) pull the rug out from under the server(s) ?. The oops on mkfs and problems whilst subsequently running would worry me - a lot.

I'd want to run tests for days to eliminate hardware - got a spare server or two ?.
 
Old 10-24-2009, 09:56 AM   #3
Sum1
Member
 
Registered: Jul 2007
Distribution: Slackware and Gentoo
Posts: 199

Original Poster
Rep: Reputation: 18
Smile

Quote:
Originally Posted by syg00 View Post
I don't see any evidence of a fsck or a memory check.
Did the UPS gracefully shutdown the server or did it fade out and (effectively) pull the rug out from under the server(s) ?.
That's what I was wondering, but I may never know at this point. I don't have linux-based monitoring software that works with my UPS.

Quote:
Originally Posted by syg00 View Post
The oops on mkfs and problems whilst subsequently running would worry me - a lot. I'd want to run tests for days to eliminate hardware - got a spare server or two ?.
I should have posted about that but I thought I was already scaring too many away with the length of the post. Since I was unable to write zeros or build ext3 filesystem on /dev/sdb storage drive, I disconnected it and commented "out" the device mapping in /etc/fstab.

After disconnecting the storage drive, I rebooted the server using the Slack 12.2 Install Disc 1, and ran fsck /dev/sda2 on / . There were many fixes performed on superblocks, inodes, blocks, "multiply-shared" this and that --- too much to write down under time constraints. Yesterday (Friday), after all employees left, I did the same fsck 5 times in a row and it finishes in 1 or 2 seconds stating "Clean" and listed the number blocks and sectors on /dev/sda2 / (root filesystem).

As I write today (Saturday morn.), I'm ssh'ed into the server and thus far completing a successful tar -cvjf /home/foo/2009Oct24/samba_serv.tar.bz2 samba_dir/ .

It would appear everything is running normally again, but I have no idea why. Most would say, "It's obviously a problem with the storage drive." But I'm still left wondering why the fsck on the / (root partition on main drive) demonstrated a need to fix many errors there too. And why all the kernel errors too? I may never know.

But thanks Syg00, I really appreciate you taking the time to review the problem. If there are any other tests you think may be useful, please let me know.
Best regards.
 
Old 10-24-2009, 05:33 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 10,460

Rep: Reputation: 623Reputation: 623Reputation: 623Reputation: 623Reputation: 623Reputation: 623
Quote:
Originally Posted by Sum1 View Post
I don't have linux-based monitoring software that works with my UPS.
I'd presume that meant a hard crash - especially if there was any work (I/O) going on.
Quote:
... ran fsck /dev/sda2 on / . There were many fixes performed on superblocks, inodes, blocks, "multiply-shared" this and that ...
That spells a broken filesystem or two. Unfortunately you'll never know how broken. Lots can be recovered by fsck, other things like cross linked chains are merely truncated, and the evidence tossed away (if you tell fsck to proceed). I don't even back the data up after such an event, because you can't have any confidence that all the data inode chains are valid.

Rebuild from backup would be my normal answer, but after a period of days that may be untenable.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
loading the ext3 filesystem module into the kernel kaplan71 Linux - Software 1 02-16-2009 04:23 PM
ext3 filesystem over iscsi errors infinitiguy Linux - Server 2 02-20-2008 08:29 AM
LXer: Linux ext3 Filesystem" Optimize Directories / File Access Time LXer Syndicated Linux News 0 02-01-2008 09:10 AM
Using the ext3 filesystem in 2.4 kernel powah Linux - General 2 04-21-2005 11:55 PM
New kernel and ext3 filesystem Thymox Linux - Software 3 12-03-2001 08:43 AM


All times are GMT -5. The time now is 06:02 AM.

Main Menu
 
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: @linuxquestions
Open Source Consulting | Domain Registration