LinuxQuestions.org


PFudd 05-19-2005 05:33 PM

CDROM data corruption?
 
I was installing Fedora Core 4 rc3 and the installation died part way through with a generic error message (your hard drive is full, or something else went wrong). Looking on the various consoles (Ctrl-Alt-F1 through F5) didn't tell me anything; typing 'df -h' at the shell prompt (Alt-F2) showed I had lots of room. The log files in /root showed nothing beyond a 'cpio read' error.

I checked the media and it reported it was ok.

I then copied all the files from the cds to the hard drive with a plan to run the 'install from hard drive' option.

Of course, after doing this I find that 'install from hard drive' wants iso images, not the files from those images. Rather than take the time to copy the images to the hard drive, I move the hard drive to my other computer, copy all the files into a directory and make it accessible via http, and then move the hard drive back to the first computer.

The 'install from http' option dies part way through, but at a different place. I check the rpm file it died on (rpm --verify -p the_random_rpm_file.rpm) and sure enough, it fails its checksum.

At this point I could just copy the file again from the cdrom to the web server (their md5sums are different), but I wanted to find out what kind of corruption occurred.

So I wrote a perl program to md5sum every 512-byte block and tell me a) which blocks are different between the good and bad files, and b) where the bad blocks came from.

The results were interesting. A four-block section had been deleted from the middle of the bad file, the data after it slid down to fill that hole, and a four-block section was duplicated at the end of the shifted region to make up the missing length.

For those that like numbers, blocks 1024-1027 were deleted, blocks 1216-1219 were duplicated, and the stuff in between was just moved sideways by four blocks. Four 512-byte blocks add up to 2048 bytes. Isn't 2048 the size of a cdrom sector?
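
As a rough illustration (not the files from the failed install), the same pattern can be rebuilt on a synthetic file and located with cmp. The file names, the 2048-block file size, and the zero-padded block contents are all made up for this sketch. The block numbers follow one reading of the description above, in which blocks 1216-1219 end up both four blocks early and in their usual place.

```shell
#!/bin/sh
# Sketch only: synthetic data, hypothetical file names.
work=$(mktemp -d)
good="$work/good.bin"
bad="$work/bad.bin"

# Build 2048 blocks of 512 bytes; each block holds its own number,
# zero-padded to 512 characters, so every block is unique.
i=0
while [ "$i" -lt 2048 ]; do
  printf '%0512d' "$i"
  i=$((i + 1))
done > "$good"

# Rebuild the damage: drop blocks 1024-1027, slide the following data
# back by four blocks, then repeat blocks 1216-1219 to keep the length.
{
  dd if="$good" bs=512 count=1024 2>/dev/null            # blocks 0-1023, intact
  dd if="$good" bs=512 skip=1028 count=192 2>/dev/null   # blocks 1028-1219, four blocks early
  dd if="$good" bs=512 skip=1216 count=4 2>/dev/null     # blocks 1216-1219 a second time
  dd if="$good" bs=512 skip=1220 2>/dev/null             # remainder, intact
} > "$bad"

size_good=$(wc -c < "$good" | tr -d ' ')
size_bad=$(wc -c < "$bad" | tr -d ' ')

# cmp -l lists every differing byte (1-based offset first);
# convert each offset to a 0-based 512-byte block number.
blocks=$(cmp -l "$good" "$bad" | awk '{ print int(($1 - 1) / 512) }' | sort -n | uniq)
first=$(echo "$blocks" | head -n 1)
last=$(echo "$blocks" | tail -n 1)
count=$(echo "$blocks" | wc -l | tr -d ' ')

echo "sizes: $size_good vs $size_bad"
echo "blocks $first-$last differ ($count blocks)"
# prints: blocks 1024-1215 differ (192 blocks)

rm -rf "$work"
```

The two files come out the same size, which is why a size check alone would not catch this kind of damage. cmp only says where the files differ; it takes a per-block checksum comparison to show where the misplaced blocks came from.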

My question is: is this a sign that I should throw out
a) the cdrom drive,
b) the memory, or
c) the motherboard?

I'm leaning towards it being a cdrom problem.

What do you think?

Here's the perl program, if you're interested:

Code:

#!/usr/bin/perl -w
use strict;
use Digest::MD5 qw(md5 md5_hex);

if (@ARGV<2) {
  die("Usage: $0 file1 file2\n".
      "  This will show differences between two binary files.\n");
}

my $FILE1=$ARGV[0];
if ( ! -e $FILE1 ) { die("$FILE1 doesn't exist!\n");}
if ( ! -f $FILE1 ) { die("$FILE1 isn't a file!\n");}
if ( ! -r $FILE1 ) { die("$FILE1 isn't readable!\n");}

my $FILE2=$ARGV[1];
if ( ! -e $FILE2 ) { die("$FILE2 doesn't exist!\n");}
if ( ! -f $FILE2 ) { die("$FILE2 isn't a file!\n");}
if ( ! -r $FILE2 ) { die("$FILE2 isn't readable!\n");}

my $size1=-s $FILE1;
my $size2=-s $FILE2;
if ($size1==0) {die("$FILE1 is empty");}
if ($size2==0) {die("$FILE2 is empty");}

my %sums;
my $count=0;
my $pos1=0;
my $pos2=0;
open(IN1,"<$FILE1") or die("$FILE1: $!");
open(IN2,"<$FILE2") or die("$FILE2: $!");
binmode(IN1);
binmode(IN2);
while (1) {
  my $block1="";
  my $block2="";
  $pos1=tell(IN1)/512;
  $pos2=tell(IN2)/512;
  my $status1=read(IN1,$block1,512);
  my $status2=read(IN2,$block2,512);
  if (! defined ($status1)) {die("$FILE1: $!\n");}
  if (! defined ($status2)) {die("$FILE2: $!\n");}
  if ($status1==0 or $status2==0) {last;}
  my $sm1=md5_hex($block1);
  my $sm2=md5_hex($block2);
  if ($sm1 ne $sm2) {
    $sums{$sm1}.=" FILE1($pos1)";
    $sums{$sm2}.=" FILE2($pos2)";
  } else {
    $sums{$sm1}.=" BOTH($pos1)";
  }
}
close(IN2);
close(IN1);

foreach (sort values %sums) {
  if (m/ FILE/) {print "Problem at $_\n";}
}


Electro 05-21-2005 03:27 AM

Software designated as rcX are unreliable or experimental versions. Why are you showing us this? I suggest either download the ISO image again or burn at a much slower rate like 1X. I also run md5sum on the ISO image before writing it to a disc.

PFudd 05-21-2005 03:04 PM

Data corruption.
 
>Software designated as rcX are unreliable or experimental versions. Why are you showing us this?

What kind of question is that?

Because rc3 is going to be the released version if they don't find any more bugs.
Because someone has to check for bugs.
Because I want to see what's going to be in the released version.
Because the only software being tested at this point is Anaconda and the linux kernel, both of which have been thoroughly tested.

> I suggest either download the ISO image again or burn at a much slower rate like 1X.

The ISO is fine. It passes the media test perfectly, as reported in the original message. The install was failing on random rpm files each time, which points to hardware failure, not burning failure or software failure.

> I also run md5sum on the ISO image before writing it to a disc.

This was done as well, and passed perfectly, although this is redundant if the media test passes, since the media test checksum is generated and put into the iso when the iso is created, back at redhat.
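
For reference, checking an image against a checksum list before burning looks roughly like the sketch below. The file name disc1.iso and the list name MD5SUM are placeholders, and the list is generated locally here only so the example runs on its own; normally it would come from the download site.

```shell
#!/bin/sh
# Sketch only: a small stand-in file takes the place of a real ISO image.
work2=$(mktemp -d)
printf 'pretend this is an ISO image\n' > "$work2/disc1.iso"

# Normally the checksum list is published alongside the image;
# here we create one ourselves so the example is self-contained.
( cd "$work2" && md5sum disc1.iso > MD5SUM )

# md5sum -c re-hashes each listed file and exits non-zero on any mismatch.
if ( cd "$work2" && md5sum -c MD5SUM >/dev/null 2>&1 ); then
  verdict="ok"
else
  verdict="corrupt"
fi
echo "before damage: $verdict"

# Append one byte to simulate a bad transfer, then check again.
printf 'X' >> "$work2/disc1.iso"
if ( cd "$work2" && md5sum -c MD5SUM >/dev/null 2>&1 ); then
  verdict_after="ok"
else
  verdict_after="corrupt"
fi
echo "after damage: $verdict_after"

rm -rf "$work2"
```

This only proves the image on the hard drive is intact. It says nothing about what a flaky drive hands back when the burned disc is read later.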

It turns out that the cdrom drive was the problem. It caused data corruption by:
1) Skipping a sector and
2) Not detecting the error or
3) Not reporting the error

This is the kind of corruption that cdparanoia tries to catch and correct when reading audio cds.

So, why am I showing you this?

Because I wanted to see if anyone else had an expert opinion (not yet, apparently),
because I wanted to show off the kind of errors that cdrom drives can produce (subtle ones),
because I wanted to point out what the Fedora installer does when a cdrom drive skips a sector (a generic error),
and because I wrote a short program (that I felt like sharing) that happened to discover and highlight (in flashing neon) exactly what went wrong.

Also, if I misplace that short program, I'll be able to find it here instead of having to rewrite it.

Oh, and the right command for checking rpm signatures is 'rpm -K somefile.rpm'; using --verify is for after they're installed, although it will spot corruption in the rpm file as a side effect if you use -p.

The experience has made me smarter; I don't think it's helped you at all.

PFudd 05-21-2005 03:23 PM

One more thing
 
I forgot to mention that I ran memtest86 for 12 hours and found no errors.

I also managed to install successfully using http.

Since I only got errors when reading from cdrom, and only when using this one particular cdrom drive, I'm pretty sure that the motherboard and ram are ok.

A couple of years ago, I had the same kind of failure (generic Anaconda error message), but it was in the same place (same RPM) every time, and yet the cdroms passed the media test and installing the rpm by hand worked. It turned out to be bad memory, but it was not detectable until memtest86 had run for at least an hour. I've kept that stick of memory in my bag ever since, just in case.

Electro 05-21-2005 04:40 PM

An md5sum should be run after a huge file transfer to make sure the file is correct at the destination. If you use BitTorrent, it does this instantly. Redhat does not control the versions of Fedora. Fedora is controlled by Redhat fans.

Writing to a CD at 1X speed is four times better than writing at 4X, and forty times better than writing at 40X. This is because the circular velocity at high RPM warps the disc, and this warping adds distortion to the burnt pits. The vibrations from the motor add to the distortion too.

If you have a crappy CD-ROM drive that is your own fault. Spend more money on better hardware.

BTW, You should post it to the Fedora mailing list instead of here.

PFudd 05-21-2005 05:19 PM

Whoa, nice attitude, dude.

The software is not release quality, so I'm not going to put it on a production machine. So, yes, crappy hardware.

Funny you should mention BitTorrent, that is indeed how it was transferred.

And while it is true that this was a Fedora install, this exact same problem would have occurred with Slackware, Mandrake, BlueCheese, PurpleGoat, or for that matter Windows 2038.

The question is: is it crappy ram, a crappy motherboard, or a crappy cdrom drive?

The speed of the burn was irrelevant; all sectors are there, in the right order, with big enough pits, or else the media check wouldn't have passed every time.

The fact the bad rpm changes each time indicates that it's probably not a ram problem.

The fact that installation was possible once the cdrom was taken out of the picture indicates that neither the motherboard nor the ram was the problem.

The correct answer is a crappy cdrom drive.

You gave poor answers, showing that you didn't think it through.

You've been here since 2002 and posting twice a day on average? Wow. You must be burnt out.

doox00 05-21-2005 07:07 PM

I was having trouble installing Mandrake 10.1 until I reburned the ISOs using Alcohol 120%. For some reason it would not work with Nero.

PFudd 05-21-2005 11:41 PM

Nero vs Alcohol 120%
 
That's interesting; did they burn at the same speed in the same drive?

There are two times when I have problems: when using cdr's that have been sitting on the shelf for a few years and when using the burner in my dell laptop.

Cdr's have a maximum lifespan, after which they're no good. See http://www.cdrfaq.org/faq07.html#S7-5 and http://www.clir.org/pubs/reports/pub121/sec4.html (a US government report). Manufacturers say that cdr's have a life expectancy of 5 to 10 years before recording; I can't see why they'd last any longer after recording.

And for some reason, cdrs burned on my dell laptop can only be read on that laptop. When that happened to floppy drives, it was because the drive was out of alignment; I guess something equivalent is happening here.

doox00 05-21-2005 11:46 PM

It was the same drive for me. I burned twice with Nero (same speed, 10x) and had problems. I burned a third time with Alcohol 120% and had no issues. They were on the same CDs as well; I am using CD-RWs.

PFudd 05-22-2005 02:53 AM

cdrom issues
 
Neat. Well, now you know what works with your computer and drive. I can't explain it, but if it works, do it.


All times are GMT -5.