Linux - General (LinuxQuestions.org): This Linux forum is for general Linux questions and discussion.

08-19-2007, 10:31 AM
#1
LQ Newbie
Registered: Aug 2007
Location: Devon, UK
Distribution: Xubuntu, Crux
Posts: 18
File corruption in kernels 2.6.18 thru 2.6.22.2
I have been upgrading my kernel from 2.6.14 to get at the new wireless stuff, but have hit a big problem.
When copying files, I get random (?) corruption. This seems to affect other file I/O too: I can't compile either, because the assembler gets stray characters in the source and falls over!
A typical example:
cp ../Changelog-2.6.18 .
diff Changelog-2.6.18 ../Changelog-2.6.18
16377c16377
< LD .top_vmlinux1
---
> LD .tmp_vmlinux1
23954c23954
< Date: Mon Jul 1% 04:45:11 2006 -0700
---
> Date: Mon Jul 10 04:45:11 2006 -0700
24955c24955
< This is generally useful, but partacularly helps see if it is the same sector
---
> This is generally useful, but particularly helps see if it is the same sector
31879c31879
< [MMC] sdhci: version bump cdhci
---
> [MMC] sdhci: version bump sdhci
42955c42955
< Replace `he temp makefile hacks with proper CONFIG entries, which are also
---
> Replace the temp makefile hacks with proper CONFIG entries, which are also
49050c49050
< and this task is(already holding:
---
> and this task is already holding:
[output clipped]
Everything is OK using kernel 2.6.17.14, but 2.6.18 and beyond all have this problem. It may take me months to sort this out on my own. Any suggestions? I can't seem to find anyone else with this problem...
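One way to characterise damage like the diff above is to list every differing byte together with the XOR of the two values: an XOR with a single bit set (0x01, 0x02, 0x04, ...) means one flipped bit rather than an arbitrary substitution. A minimal sketch, assuming a POSIX shell and `cmp -l`, which prints each differing byte's offset in decimal and both byte values in octal; `bitdiff` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# bitdiff (hypothetical helper): for every byte that differs between two
# files, print the offset, both byte values and their XOR in hex.
bitdiff() {
    # cmp -l prints "<offset> <byte in $1> <byte in $2>": the offset is
    # decimal (1-based), the byte values are octal without a leading zero.
    cmp -l "$1" "$2" | while read -r off a b; do
        # Prefix a 0 so that shell arithmetic reads the values as octal.
        x=$(( 0$a ^ 0$b ))
        printf '%d: 0x%02x vs 0x%02x xor 0x%02x\n' \
            "$off" "$(( 0$a ))" "$(( 0$b ))" "$x"
    done
}

# Example: bitdiff Changelog-2.6.18 ../Changelog-2.6.18
```

For instance, `.top_vmlinux1` in the copy versus `.tmp_vmlinux1` in the original is 0x6f versus 0x6d, so that difference would be reported with xor 0x02, a single flipped bit.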

08-20-2007, 07:30 AM
#2
Senior Member
Registered: Oct 2003
Posts: 2,280
What filesystem are you using? IIRC there were some corruption problems with certain kernels and the XFS filesystem.

08-20-2007, 11:39 AM
#3
LQ Newbie
Registered: Aug 2007
Location: Devon, UK
Distribution: Xubuntu, Crux
Posts: 18
Original Poster
I get corruption with ext2, VFAT and NTFS copies, so I don't think that's a factor. I am puzzled that no one else seems to have hit this problem, but I can't find anything wrong with my kernel config. I have even built an LFS-6.3-rc1 system to rule out anything peculiar to my current LFS setup, and I still get the same file corruption.

08-20-2007, 12:05 PM
#4
Member
Registered: Dec 2006
Distribution: Slackware 11
Posts: 144
2.6.18 had a nasty bug related to ext3 and filesystem corruption. It was fixed sometime around 2.6.21, IIRC.
Run memtest86 on your RAM. That could very well be the source of your problem.

08-21-2007, 04:32 AM
#5
Member
Registered: Aug 2007
Location: Switzerland
Distribution: Gentoo
Posts: 566
Are you sure it's not the hard disk? If memtest86 is happy and you already have smartmontools installed (<http://gentoo-wiki.com/HOWTO_Monitor_your_hard_disk(s)_with_smartmontools>), check whether SMART reports any problems with the hard disk.
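With smartmontools installed, `smartctl -A /dev/hda` (the device name comes from the OP's dmesg output later in the thread) prints the drive's attribute table. The sketch below filters that table down to a few counters that usually point at media or transfer trouble. The choice of attributes is an assumption made here, and the filter assumes the usual smartctl layout with the attribute name in column 2 and the raw value in column 10; `smart_suspect` is a hypothetical helper. UDMA_CRC_Error_Count is included because it counts errors on the link between drive and controller rather than on the platters.

```shell
#!/bin/sh
# smart_suspect (hypothetical helper): read `smartctl -A` output on stdin
# and print any of the selected error counters whose raw value is non-zero.
smart_suspect() {
    awk '
        # Column 2 is ATTRIBUTE_NAME, column 10 is RAW_VALUE.
        $2 == "Reallocated_Sector_Ct"  ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable"  ||
        $2 == "UDMA_CRC_Error_Count" {
            if ($10 + 0 > 0) print $2, $10
        }
    '
}

# Example (as root): smartctl -A /dev/hda | smart_suspect
```

No output means none of these counters is above zero, which is not the same as proving the disk healthy.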

08-21-2007, 01:37 PM
#6
LQ Newbie
Registered: Aug 2007
Location: Devon, UK
Distribution: Xubuntu, Crux
Posts: 18
Original Poster
I have run memtest86 with no problems, and I'll give the hard disk HOWTO a try. But since I can run 2.6.17.14, and previously 2.6.14, with no problems (and also Windows XP, which I can dual-boot into), I don't think my machine is faulty. Or perhaps something has been tweaked in 2.6.18 that my machine doesn't like?
The only option I can see is to look into the glibc source, since "cp" calls the "read" and "write" functions to do the actual copying, and hope to find a clue there.
Hoping for inspiration!

08-21-2007, 03:13 PM
#7
Member
Registered: Aug 2007
Location: Switzerland
Distribution: Gentoo
Posts: 566
I have to admit that your problem is very interesting. Thinking again about what I wrote about SMART, I suspect it probably won't solve anything (I had forgotten that Windows works fine, although it surely sits on its own partition, which would not be affected), but it's definitely worth a try.
And it becomes even more interesting now that I remember that in the '90s I built my first PC on my own and installed Windows 98 (the installation worked fine), and when I tried to copy files after rebooting, the target files were slightly different from the source, exactly what you are experiencing. BUT: if I rebooted the PC after copying a file, the target file was back in perfect shape. Or perhaps the first file I copied after the reboot was fine, but not the next ones. Sorry, it was a long time ago, but I'm sure that only about half of it went wrong.
Unluckily I didn't manage to solve the problem, and the only solution was to replace the motherboard. But in Linux the analysis could be easier. Did you try to copy something using "dd"? Or using a file manager like Krusader? Anything that does not use cp? Or something like "cat sourcefile > targetfile"?
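The suggestion to try `dd`, `cat` and `cp` side by side can be scripted so that every copy is checked the same way. A minimal sketch, assuming a POSIX shell; `compare_copy_methods` is a hypothetical helper, and the copies are deliberately left in the target directory so that a bad one can be inspected afterwards.

```shell
#!/bin/sh
# compare_copy_methods (hypothetical helper): SRC DIR
# Copy SRC into DIR with cp, dd and cat, then check each copy against SRC
# byte for byte. Returns the number of copies that differ (0 = all match).
compare_copy_methods() {
    src=$1 dir=$2
    cp "$src" "$dir/copy.cp"
    dd if="$src" of="$dir/copy.dd" bs=64k 2>/dev/null
    cat "$src" > "$dir/copy.cat"
    bad=0
    for m in cp dd cat; do
        if cmp -s "$src" "$dir/copy.$m"; then
            echo "$m: OK"
        else
            echo "$m: DIFFERS"
            bad=$((bad + 1))
        fi
    done
    return "$bad"
}

# Example: compare_copy_methods ../Changelog-2.6.18 /tmp
```

If all three methods produce damaged copies at a similar rate, that points away from `cp` itself and towards the kernel or hardware path the three share.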

08-22-2007, 07:39 AM
#8
Member
Registered: Dec 2006
Distribution: Slackware 11
Posts: 144
Again, the ext3 corruption bug introduced in 2.6.18 may not be the cause, but here's a link with some more background:
http://lwn.net/Articles/215113/

08-22-2007, 09:42 AM
#9
Member
Registered: Aug 2007
Location: Switzerland
Distribution: Gentoo
Posts: 566
Reading Jeenam's link made me think that another good test might be to read and copy something from/to a filesystem mounted for synchronous access (e.g. with "mount -vo sync /dev/yourdevice /mnt/yourmountpoint"), so that written data goes straight to disk instead of sitting in the cache.
If things work fine there, you could then try reading from a normal filesystem and writing to the synchronous one, and then the opposite.
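A related check is to force the written data out to disk and then compare after the page cache has been emptied, so that the comparison reads back what is actually on the disk. This sketch assumes GNU or BusyBox `dd` (for `conv=fsync`) and Linux's `/proc/sys/vm/drop_caches` (present since 2.6.16 and writable only by root); when the cache cannot be dropped, `cmp` may still be served from RAM. `sync_copy_check` is a hypothetical helper name.

```shell
#!/bin/sh
# sync_copy_check (hypothetical helper): SRC DST
# Copy SRC to DST, flush it to disk, try to drop the page cache, then
# compare the two files with cmp (exit status 0 means identical).
sync_copy_check() {
    src=$1 dst=$2
    # conv=fsync makes dd fsync() the output file before it exits.
    dd if="$src" of="$dst" bs=64k conv=fsync 2>/dev/null || return 2
    sync
    # As root, drop clean cached pages so cmp re-reads both files from disk;
    # otherwise this step is skipped and cmp may read cached data.
    if [ -w /proc/sys/vm/drop_caches ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null
    fi
    cmp "$src" "$dst"
}

# Example (as root): sync_copy_check ../Changelog-2.6.18 /mnt/test/Changelog
```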

08-22-2007, 10:07 AM
#10
Senior Member
Registered: Sep 2003
Posts: 3,171
When you have a problem that no one else seems to have, you need to ask yourself two questions, in this order: (1) what is peculiar about my configuration that affects no one else, and (2) what could be wrong with my system?
Now, presuming that you don't have a kernel that you have modified yourself (making it peculiar to you), you need to take a look at your system.
In your situation I would start by looking at the hard drive, the hard drive controller, and the power supply. I wouldn't look first at RAM because I would think that a problem affecting RAM would be showing up in other things as well, not merely kernel compiles and copies.
If you have a copy of SpinRite, I would definitely be running that on the HD. I also would be monitoring the +5 and +12 voltages to the HD, looking for a borderline condition or evidence of regulation failure.
I would be turning on all logging of HD I/O, running any appropriate manufacturer's diagnostic tools on the HD, and doing test cases of files read/written looking for some evidence that I could pin down.
I have (rarely) seen problems like yours. It usually turns out to be the HD, the controller, or the PS. The last time I saw the problem, it turned out to be a defective Adaptec 29160 controller card.
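For the "test cases of files read/written" idea, a loop that copies one random file many times and keeps any copy that comes out different gives a rough error rate and preserves the evidence. A minimal sketch; the helper name `stress_copy`, the use of /dev/urandom for test data and the argument order are assumptions, not something from the thread.

```shell
#!/bin/sh
# stress_copy (hypothetical helper): DIR ROUNDS SIZE_MB
# Create a random reference file of SIZE_MB MiB in DIR, copy it ROUNDS times
# and compare every copy; bad copies are kept, good ones are deleted.
stress_copy() {
    dir=$1 rounds=$2 mb=$3
    # 16 blocks of 64 KiB per MiB; small blocks avoid short reads.
    dd if=/dev/urandom of="$dir/ref" bs=65536 count=$((mb * 16)) \
        2>/dev/null || return 2
    bad=0 i=1
    while [ "$i" -le "$rounds" ]; do
        cp "$dir/ref" "$dir/copy.$i"
        if cmp -s "$dir/ref" "$dir/copy.$i"; then
            rm -f "$dir/copy.$i"
        else
            echo "round $i: copy differs" >&2
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad of $rounds copies differ"
    [ "$bad" -eq 0 ]
}

# Example: stress_copy /mnt/test 100 16
```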

08-23-2007, 01:49 PM
#11
LQ Newbie
Registered: Aug 2007
Location: Devon, UK
Distribution: Xubuntu, Crux
Posts: 18
Original Poster
Thanks for all the feedback, I will reply again once I have had a chance to try all your suggestions.

08-27-2007, 03:26 PM
#12
LQ Newbie
Registered: Aug 2007
Location: Devon, UK
Distribution: Xubuntu, Crux
Posts: 18
Original Poster
Update: I've tried reading/writing with "dd" and mounting my filesystems "sync", but I still get file corruption with all kernels after 2.6.17.14. I've run the manufacturer's HD diagnostic utility with no errors, and run memtest86+ with no errors. I can see no pattern at all in the file corruption, so I think I'll have to stick with 2.6.17 until I can afford a new machine!

08-28-2007, 08:15 AM
#13
Member
Registered: Aug 2007
Location: Switzerland
Distribution: Gentoo
Posts: 566
Wait! Don't give up! You have to go on beating your head against the wall until something happens - to your head or to the wall.
One thing I remember now is that the data transfer speed between HD and controller can cause random errors if it is set too high. If the drivers for the HD controller differ between the two kernel versions, the new kernel might, for whatever reason, set a higher transfer speed than the old one did.
So a very quick check is to first boot the machine with the older kernel and look with...
Code:
dmesg | grep -i udma
...which speed is finally used by the controller to speak to the harddisks. On my machine it looks like this:
Quote:
localhost ~ # dmesg | grep -i udma
hda: ATAPI 24X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache, UDMA(33)
ata1: SATA max UDMA/133 cmd 0xf8cbed00 ctl 0x00000000 bmdma 0x00000000 irq 21
ata2: SATA max UDMA/133 cmd 0xf8cbed80 ctl 0x00000000 bmdma 0x00000000 irq 21
ata3: SATA max UDMA/133 cmd 0xf8cbee00 ctl 0x00000000 bmdma 0x00000000 irq 21
ata4: SATA max UDMA/133 cmd 0xf8cbee80 ctl 0x00000000 bmdma 0x00000000 irq 21
ata1.00: ATA-7: FUJITSU MHT2080BH, 0000104A, max UDMA/100
ata1.00: configured for UDMA/100
ata3.00: ATA-7: FUJITSU MHT2080BH, 0000104A, max UDMA/100
ata3.00: configured for UDMA/100
As you can see, the first line shows that my DVD writer is set to UDMA33, the next four show that the controller can handle speeds up to UDMA133, and the last four show that my two hard disks are actually driven at UDMA100; this is what counts.
Do the same with your new kernel and compare the values. If the ones used by the new kernel are higher, we'll have to see how to set them to a lower value.
By the way, I don't know whether you're using serial ATA hard disks as in my case or older parallel ATA ones, nor whether you're using UltraDMA or PIO mode. If you're on PATA, the string to search for is different; I don't remember what dmesg shows with PATA HDs. It might even be that your old kernel uses PIO while the new one uses UDMA.
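The comparison suggested here can be done by saving the boot messages under each kernel, for example with `dmesg > dmesg-$(uname -r).txt`, and diffing only the transfer-mode lines. A minimal sketch; the grep patterns are guesses meant to catch both the libata lines quoted above and the older IDE driver's `UDMA(...)`, `pio` and `selected mode` lines, and a leading `[timestamp]` is stripped in case printk timestamps are enabled. `udma_lines` and `udma_diff` are hypothetical helper names.

```shell
#!/bin/sh
# udma_lines (hypothetical helper): print the transfer-mode related lines of
# a saved dmesg log, timestamps stripped and sorted so that probe-order
# changes between kernels do not show up as differences.
udma_lines() {
    sed 's/^\[[^]]*] *//' "$1" |
        grep -i -e 'udma' -e 'pio' -e 'selected mode' |
        sort
}

# udma_diff (hypothetical helper): OLD_DMESG NEW_DMESG
# diff the mode lines of two logs; exit status 0 means no difference.
udma_diff() {
    tmp=$(mktemp -d) || return 2
    udma_lines "$1" > "$tmp/old"
    udma_lines "$2" > "$tmp/new"
    diff "$tmp/old" "$tmp/new"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Example: udma_diff dmesg-2.6.17.14.txt dmesg-2.6.22.5.txt
```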

08-30-2007, 12:34 PM
#14
LQ Newbie
Registered: Aug 2007
Location: Devon, UK
Distribution: Xubuntu, Crux
Posts: 18
Original Poster
If I bang my head against the wall many more times it will break! I have compared the IDE settings across kernel versions and found no differences.
kernel 2.6.17.14 gives:
Quote:
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SIS5513: IDE controller at PCI slot 0000:00:02.5
ACPI: PCI Interrupt 0000:00:02.5[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
SIS5513: chipset revision 0
SIS5513: not 100% native mode: will probe irqs later
SIS5513: SiS 962/963 MuTIOL IDE UDMA133 controller
ide0: BM-DMA at 0x1000-0x1007, BIOS settings: hda: DMA, hdb: pio
ide1: BM-DMA at 0x1008-0x100f, BIOS settings: hdc: DMA, hdd: pio
Probing IDE interface ide0...
hda: FUJITSU MHT2060AT, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: SONY CD-RW/DVD-ROM CRX830E, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 117210240 sectors (60011 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 < hda5 hda6 hda7 hda8 hda9 >
hdc: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
kernel 2.6.22.5 gives:
Quote:
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SIS5513: IDE controller at PCI slot 0000:00:02.5
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:02.5[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
SIS5513: chipset revision 0
SIS5513: not 100% native mode: will probe irqs later
SIS5513: SiS 962/963 MuTIOL IDE UDMA133 controller
ide0: BM-DMA at 0x1000-0x1007, BIOS settings: hda: DMA, hdb: pio
ide1: BM-DMA at 0x1008-0x100f, BIOS settings: hdc: DMA, hdd: pio
Probing IDE interface ide0...
hda: FUJITSU MHT2060AT, ATA DISK drive
hda: selected mode 0x45
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: SONY CD-RW/DVD-ROM CRX830E, ATAPI CD/DVD-ROM drive
hdc: selected mode 0x42
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 117210240 sectors (60011 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 < hda5 hda6 hda7 hda8 hda9 >
hdc: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
So I am still totally puzzled. Surely it is a kernel bug (perhaps one affecting only my machine)? I have tried to cut out all irrelevant kernel config options, but the error remains...
Last edited by NeilR; 08-30-2007 at 12:36 PM.
Reason: Get rid of smileys!
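The two lines that appear only in the 2.6.22.5 log, `hda: selected mode 0x45` and `hdc: selected mode 0x42`, are numeric transfer-mode codes. Assuming the kernel's usual XFER_* numbering (PIO modes from 0x08, single-word DMA from 0x10, multi-word DMA from 0x20, UDMA from 0x40), a small decoder shows what they mean; `xfer_mode_name` is a hypothetical helper and only the standard modes are covered.

```shell
#!/bin/sh
# xfer_mode_name (hypothetical helper): decode an IDE "selected mode 0xNN"
# value, assuming the kernel's XFER_* numbering:
#   PIO 0-4 = 0x08-0x0c, SW DMA 0-2 = 0x10-0x12,
#   MW DMA 0-2 = 0x20-0x22, UDMA 0-7 = 0x40-0x47
xfer_mode_name() {
    m=$(( $1 ))    # shell arithmetic accepts hex such as 0x45
    if   [ "$m" -ge $((0x40)) ] && [ "$m" -le $((0x47)) ]; then
        echo "UDMA$(( m - 0x40 ))"
    elif [ "$m" -ge $((0x20)) ] && [ "$m" -le $((0x22)) ]; then
        echo "MWDMA$(( m - 0x20 ))"
    elif [ "$m" -ge $((0x10)) ] && [ "$m" -le $((0x12)) ]; then
        echo "SWDMA$(( m - 0x10 ))"
    elif [ "$m" -ge $((0x08)) ] && [ "$m" -le $((0x0c)) ]; then
        echo "PIO$(( m - 0x08 ))"
    else
        echo "unknown"
    fi
}

# Example: xfer_mode_name 0x45   # UDMA5
```

If that numbering applies here, 0x45 is UDMA mode 5 (UDMA/100) and 0x42 is UDMA mode 2 (UDMA/33), which agrees with the `UDMA(100)` and `UDMA(33)` lines that both kernels print.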

08-31-2007, 06:07 AM
#15
Member
Registered: Aug 2007
Location: Switzerland
Distribution: Gentoo
Posts: 566
Bah, I temporarily give up and go wash my dishes. If you have some more time to waste, you could also try comparing what hdparm reports. In my case:
Code:
localhost ~ # hdparm /dev/sdb
/dev/sdb:
IO_support = 0 (default 16-bit)
readonly = 0 (off)
readahead = 256 (on)
geometry = 9729/255/63, sectors = 156301488, start = 0
In your case the output will be different and more detailed, as you're not using SATA (sdX) but PATA (hdX) hard disks.
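Comparing hdparm (or any other) output across kernels is easier if each run is saved under the running kernel's version. A minimal sketch; `snapshot` is a hypothetical helper, and `hdparm /dev/hda` in the example assumes the OP's disk name from the dmesg output above (running it needs root).

```shell
#!/bin/sh
# snapshot (hypothetical helper): DIR NAME COMMAND [ARGS...]
# Run COMMAND, save its stdout and stderr as DIR/NAME-<kernel version>.txt
# so that runs under different kernels can be diffed, and print the path.
snapshot() {
    dir=$1 name=$2
    shift 2
    out="$dir/$name-$(uname -r).txt"
    "$@" > "$out" 2>&1
    echo "$out"
}

# Example, once under each kernel:
#   snapshot ~/kdiag hdparm hdparm /dev/hda
# then: diff ~/kdiag/hdparm-2.6.17.14.txt ~/kdiag/hdparm-2.6.22.5.txt
```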

All times are GMT -5.