Linux - Server. This forum is for the discussion of Linux software used in a server-related context.
Distribution: Fedora, CentOS, and would like to get back to Gentoo
Posts: 332
Rep:
Constant .tar.bz2 data corruption
Hi Group,
About a month ago I noticed several problems with my backups on my Samba server. There are two hard drives in the server: /dev/sda holds the live data accessed by the users, and /dev/sdb holds all the foo.tar.bz2 daily backups. /dev/sdb is mounted as "/archive" in /etc/fstab.
Both drives use the ext3 filesystem and are mounted with the options: defaults,noatime,data=writeback
1. I've fixed the root filesystem on /dev/sda and it's running fine.
2. I've written zeros to /dev/sdb using dd and freshly created the ext3 filesystem.
3. I have a bash script that runs daily as a cron job and creates a .tar.bz2 of all the user data on the Samba server. It seems to execute fine and compresses about 20 GB into a 12 GB file.
4. PROBLEM: Every test of every bzipped archive fails. When I try to decompress and unarchive the data, there are error messages about corrupted data and a suggestion that it might be recovered using bzip2recover. Relying on bzip2recover for all my user data is not a good long-term plan.
So for now, I'm simply making nightly copies of the live user data to /dev/sdb, but I definitely miss the space savings afforded by bzip2 compression.
Is there anything I can try to further test why the compressed data gets corrupted?
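One quick check worth running is to verify the archive's integrity immediately after creation and then again later; if it passes 'bzip2 -t' right after the cron job but fails the next day, the bytes are going bad on disk rather than being written corrupt. A minimal sketch, using throwaway temp data instead of the real /foo and /archive:

```shell
#!/bin/sh
# Create a throwaway data set, archive it, and verify it two ways.
set -e
work=$(mktemp -d)
mkdir "$work/data"
dd if=/dev/urandom of="$work/data/blob" bs=1M count=4 2>/dev/null

# Build the archive the same way the nightly job does.
tar -cjf "$work/foo.tar.bz2" -C "$work" data

# 1. bzip2's own integrity check (CRC of the compressed stream).
bzip2 -t "$work/foo.tar.bz2" && echo "bzip2 stream OK"

# 2. Checksum now; re-running the same command later shows whether
#    the bytes on disk changed after the fact.
md5sum "$work/foo.tar.bz2"
```

Comparing the stored checksum against a later run distinguishes "written corrupt" from "rotted on disk", which narrows the suspects considerably.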
Nothing in your post is enough to identify the problem, though you might want to look at a more robust filesystem than ext3.
Do the kernel or system logs mention anything while the script runs? Can you post the script here in CODE tags so we can look it over for a possible glitch? Matter of fact, has the script ever worked as intended, or have you been able to run it on a different system?
Original Poster
Quote:
Originally Posted by MS3FGX
Though you might want to look at a more robust filesystem than ext3.
What would you suggest?
Quote:
Do the kernel or system logs mention anything while the script runs?
No.
Quote:
Can you post the script here in CODE tags so we can look it over for a possible glitch?
Here it is, very simple stuff:
#!/bin/bash
set -e  # abort if any step fails instead of continuing silently

echo "1. Make a date-stamped storage directory."
cd /archive
dirname=$(date +"%Y%b%d")
mkdir -p "$dirname"
cd "$dirname"
#
echo "2. Build archive and place into storage directory."
tar -cjf foo.tar.bz2 /foo
#
echo "3. Completed."
Quote:
Matter of fact, has the script ever worked as intended, or have you been able to run it on a different system?
The script starts, runs, and completes with no errors every night and produces the desired .tar.bz2 file. The only problem is that once I unpack the .tar.bz2 file, it fails halfway through with errors.
Thanks for your help, MS3.
I'm ready to follow up on any suggestions you may have.
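One small thing worth adding to the cron job is an explicit check of tar's exit status, since output from cron is easy to miss. A sketch, with a temp directory standing in for the real paths:

```shell
#!/bin/sh
# Capture tar's exit status explicitly rather than trusting silence.
work=$(mktemp -d)
echo "sample" > "$work/file"

tar -cjf "$work/backup.tar.bz2" -C "$work" file
status=$?
if [ "$status" -ne 0 ]; then
    echo "tar failed with status $status" >&2
else
    echo "tar exited cleanly"
fi
```

In the real script the failure branch could mail root or write to syslog, so a bad night doesn't go unnoticed.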
you might want to look at a more robust filesystem than ext3.
If you search LQ you'll find ten times more problems with, for example, Reiser than with ext. That leaves me wondering what your definition of "robust" would be here?
Quote:
Originally Posted by Sum1
About a month ago
What changed in the system at that time software and configuration-wise?
Quote:
Originally Posted by Sum1
I've fixed the root filesystem on /dev/sda and it's running fine.
What exactly happened to require fixing the filesystem?
Quote:
Originally Posted by Sum1
I have a bash scipt that runs daily as a cron job and it creates a .tar.bz2
Which user does it run as?
Quote:
Originally Posted by Sum1
But when I test extraction of data (tar -xf foo.tar), I receive the following error
Can you at least list the contents with 'tar -vtf foo.tar'? And of a compressed tarball? If it fails to complete listing the contents, at what point (file or directory) does it fail, and could you verbosely list the contents of that directory here? (Please use BB code tags.) Does it happen with small tarballs as well? And does gzip work for you, like chrism01 suggested? Did you ever find traces of memory corruption in other processes? Does the machine have enough RAM and swap? You list mount flags. What happens if you remount with only "defaults" and test again? Else, how about using 'rsync' between the two disks in the meantime? Please note there are thirteen questions here. You may or may not be able to answer them all, but being as verbose as possible is good: the more information, the better.
Can you at least list contents with 'tar -vtf foo.tar'? And of a compressed tarball?
I'm logged in via ssh to the server as I write.
Creating a new foo.tar, and will try 'tar -vtf'.
Results: Two attempts and two failures.
"tar: Exiting with failure status due to previous errors"
Interestingly, the failure occurred in the exact same place in the directory tree both times. I can look into that further -- move this sub-directory to a different partition and try the process again, etc.
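Since the failure lands in the same spot each time, one way to corner it is to archive each top-level directory separately and test each piece, so a failure points at the subtree that triggers it. A sketch using a throwaway tree in place of /foo:

```shell
#!/bin/sh
# Bisect the data set: one archive per top-level directory.
set -e
foo=$(mktemp -d)     # stand-in for /foo
out=$(mktemp -d)     # where the per-directory archives land
mkdir "$foo/a" "$foo/b"
echo one > "$foo/a/x"
echo two > "$foo/b/y"

for d in "$foo"/*/; do
    name=$(basename "$d")
    if tar -cjf "$out/$name.tar.bz2" -C "$foo" "$name" \
       && bzip2 -t "$out/$name.tar.bz2"; then
        echo "$name: OK"
    else
        echo "$name: FAILED"
    fi
done
```

Whichever sub-archive fails can then be split further until the offending files are isolated.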
Quote:
Does it happen with small tarballs as well?
No. I've recently tried a few .tar files about 1 GB in size (using data from within /foo) and those were successfully tar'ed and extracted without problems.
Quote:
And does, like chrism01 suggested, gzip work for you?
Will test .tar first, and then move on to gzip and bzip.
Quote:
Did you ever find traces of memory corruption in other processes?
I'm honestly not sure how to look for or determine this.
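One cheap software-level probe (not a substitute for a real memory tester like memtest86+): bzip2 is deterministic, so compressing the same input twice must produce byte-identical output. If two runs differ, flaky RAM or disk is a strong suspect. A sketch with random sample data:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
dd if=/dev/urandom of="$work/sample" bs=1M count=8 2>/dev/null

# Compress the identical input twice.
bzip2 -c "$work/sample" > "$work/run1.bz2"
bzip2 -c "$work/sample" > "$work/run2.bz2"

# A deterministic compressor must produce matching output.
if cmp -s "$work/run1.bz2" "$work/run2.bz2"; then
    echo "runs identical - no corruption observed"
else
    echo "runs differ - suspect hardware" >&2
fi
```

Repeating this with larger samples, or in a loop overnight, raises the odds of catching an intermittent fault.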
Quote:
Does the machine have enough RAM and swap?
I believe so -- 4 GB of DDR2-800 RAM, though only 3 GB of it is recognized since this box runs 32-bit Slackware 12.2. I have 2 GB of swap and it seemingly never gets used -- no matter what process is running, 'top' always reports 0k used for swap. The server has no more than 30 users at any given time.
Quote:
You list mount flags. What happens if you remount with only "defaults" and test again?
I will plan an evening to give this a try. Server access is relied upon 7 days a week from 7 am - 7 pm.
Quote:
Else how about meanwhile using 'rsync' between the two disks?
I've heard about this and maybe it's time to try it. If I can make separate daily "syncs" equivalent to these .tar files, then I'll gladly opt for it. I need to RTFM along these lines.
There's definitely a problem with either the filesystem or the drive. If you're getting that kind of error with bzip2, you'll get it with any other compression program.
Is the kernel logging any filesystem or scsi/ide errors? I would expect an IO error to derail the tar as it was writing, but it's worth checking.
Next, I would rule out a bug in ext3. Use ext2fs on the drive receiving the tarball and see if the problem persists. If it does, I'd junk the drive.
This is probably a dumb question, but I assume you're running a 2.6 kernel with large file support? If not, or if you have an old glibc that doesn't support LFS, then bzip2 will receive a SIGXFSZ once it writes 2 GB of output, which would cause it to fail similarly to what you described. LFS has been around for a long time, so I doubt that's the problem, but it can happen.
Original Poster
Quote:
Is the kernel logging any filesystem or scsi/ide errors? I would expect an IO error to derail the tar as it was writing, but it's worth checking.
Mr. Goose :-)
Thanks too for your help.
I'm not sure whether I'm looking in the right logs, or whether I have the right logging enabled, but I've checked through /var/log/messages, dmesg, and syslog, and I can't find any error messages relating to IO activity.
Quote:
Next, I would rule out a bug in ext3. Use ext2fs on the drive receiving the tarball and see if the problem persists. If it does, I'd junk the drive.
I like the thinking, and I'm beginning to suspect the drive itself, since I wiped it with zeros and created the partition and ext3 filesystem only a month ago.
I'll blend your suggestion with UnSpawn's:
remount with ext3 defaults and test;
rebuild with ext2 and test;
install a different hard drive altogether and test.
Quote:
I assume you're running a 2.6 kernel with large file support? If not, or if you have an old glibc that doesn't support LFS, then bzip2 will receive a SIGXFSZ once it writes 2 GB of output, which would cause it to fail similarly to what you described.
Currently using kernel 2.6.30.4.
I checked my kernel config and it does show "Support for large block devices and files" built into the kernel.
Thanks again for your help.
I've got quite a bit of testing to do.
Auch. Unfortunately the thread doesn't show you determining and fixing what was wrong.
Quote:
Originally Posted by Sum1
"tar: Exiting with failure status due to previous errors"
Sometimes noting the error value ('tar --do-Something; echo $?') might help.
Quote:
Originally Posted by Sum1
Interesting note here, the failure occurred in the exact same place in the directory system both times. I can look at that further -- move this sub-directory to a different partition and try process again, etc.
Let us know.
Quote:
Originally Posted by Sum1
No, I've recently tried a few .tar files about 1 Gig. in size (using data from within /foo) and those were successfully tar'ed and extracted without problems.
Could you try running tar through 'split' to come up with chunked archives?
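In case it helps, piping tar through 'split' might look like the sketch below; the chunk size is shrunk for the demo, and the parts are reassembled with 'cat' before testing:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
mkdir "$work/data"
dd if=/dev/urandom of="$work/data/blob" bs=64k count=4 2>/dev/null

# Stream the archive into fixed-size chunks instead of one big file.
tar -cjf - -C "$work" data | split -b 100k - "$work/foo.tar.bz2.part-"

# Reassemble (the glob sorts the parts in order) and verify the stream.
cat "$work"/foo.tar.bz2.part-* > "$work/foo.tar.bz2"
bzip2 -t "$work/foo.tar.bz2" && echo "chunked archive OK"
```

If only certain chunks ever go bad, that would point at a size- or location-dependent fault rather than a tar/bzip2 bug.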
Quote:
Originally Posted by Sum1
Definitely not sure how to look for, or determine this.
Unexplainable crashes, applications failing?
Quote:
Originally Posted by Sum1
I've heard about this and maybe it's time to try it. If I can make separate daily "syncs" equivalent to these .tar files, then I'll gladly opt for it. I need to RTFM along these lines.
...and search LQ. We've definitely got some threads on rsync. It isn't hard to use.
Quote:
Originally Posted by GooseYArd
This is probably a dumb question, but I assume you're running a 2.6 kernel with large file support? If not, or if you have an old glibc that doesn't support LFS, then bzip2 will receive a SIGXFSZ once it writes 2 GB of output, which would cause it to fail similarly to what you described. LFS has been around for a long time, so I doubt that's the problem, but it can happen.
I always thought LFS was a kernel 2.4 thing? BTW, there is a 16 GB file-size limit if ext3 uses a 1 KB block size, but the default is 4 KB anyway...
Original Poster
Quote:
Originally Posted by unSpawn
Auch. Unfortunately the thread doesn't show you determining and fixing what was wrong.
Quote:
Let us know how it's going, OK?
I believe I can mark this thread "Solved."
1. Test Results
After many repeated tests using both hard drives in the server, I can report the following: regardless of tar, tar + gzip compression, or tar + bzip2 compression, there are always two or three corrupted areas of data that produce fatal errors when trying to recover/unpack the contents of the archive. Depending on which of the three archiving methods is employed, the errors appear in different places in the data set.
2. Conclusion - (best efforts of deduction)
I must have committed an error while using tune2fs back in July 2009.
In another thread, I reported:
Quote:
I started with this ext3 setup in /etc/fstab:
/dev/sda2 / ext3 defaults 1 1
I changed /etc/fstab to:
/dev/sda2 / ext3 defaults,noatime,data=writeback 1 1
And then executed command on root partition:
tune2fs -o journal_data_writeback /dev/sda2
Works.
No data loss.
No problems.
In doing so, I may have made an error when entering a tune2fs command. Or possibly I did not properly unmount the partition/filesystem before executing the tune2fs commands. I may have remounted the partition in another terminal and forgotten about it while executing commands in a different terminal. I'll never know for sure, but it seems like the only logical answer.
It seems highly unlikely that both my server hard drives are failing. I have 30 users reading and writing to them no less than 12 hours a day, and I have not received any comments or complaints about lost, inaccessible, or corrupted files... nothing at all. Believe me, they are not shy, and would gladly let me know of such occurrences. <grin>
I feel fortunate it's not a whole lot worse - 99.9% of the data is not corrupt. I've been backing up the data nightly with 'cp -p -r /foo /archive/date-stamped-directory/foo', and then I run a bash script I made to diff and compare the copied files and directories in multiple ways.
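For what it's worth, one way such a compare script can work is to checksum every file on both sides and diff the sorted lists; this is only a sketch with temp directories, not the script actually used here:

```shell
#!/bin/sh
set -e
live=$(mktemp -d)   # stand-in for /foo
copy=$(mktemp -d)   # stand-in for the nightly copy
sums=$(mktemp -d)   # keep checksum lists out of the compared trees

mkdir "$live/sub"
echo alpha > "$live/a"
echo beta  > "$live/sub/b"
cp -p -r "$live/." "$copy/"

# Checksum every file on both sides and compare the sorted lists;
# any changed, missing, or extra file shows up in the diff.
( cd "$live" && find . -type f -exec md5sum {} + | sort ) > "$sums/live.sum"
( cd "$copy" && find . -type f -exec md5sum {} + | sort ) > "$sums/copy.sum"
diff "$sums/live.sum" "$sums/copy.sum" && echo "copy verified"
```

Checksums catch silent content changes that a plain 'diff -r' of timestamps or sizes might miss.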
Eventually, I'll have to delete all partitions and create new ones with cleanly configured ext3 or ext4 file systems.
UnSpawn, I truly appreciate the solid guidance and prompts to help me work through it in a logical way.