LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Slackware (http://www.linuxquestions.org/questions/slackware-14/)
-   -   Slack 10.0 GNU tar --multi-volume on LTO-3 doesn't verify for me (http://www.linuxquestions.org/questions/slackware-14/slack-10-0-gnu-tar-multi-volume-on-lto-3-doesnt-verify-for-me-598466/)

petcherd 11-09-2007 07:20 PM

Slack 10.0 GNU tar --multi-volume on LTO-3 doesn't verify for me
 
I'm currently running an inherited Slackware 10.0 box, with my tar package updated to "tar-1.16-i486-1_slack10.0". Once I get things working, I hope to rebuild my whole server with Slackware 12 and start keeping all of my packages up to date. I'd like to use GNU tar rather than Amanda, Bacula, or some commercial package because my archiving dataset is much larger than a single 800 GB tape and because tar is more broadly readable if I ever do some radical upgrade beyond Slackware (unlikely though that sounds).

I've got a Dell PowerVault 124T, which is a 16-slot autoloader wrapped around an IBM Ultrium-TD3 LTO-3 tape drive. Dell has nearly nothing to offer me with regard to documentation for Linux, especially non-RH or non-SuSE Linux. I'm running Dirvish to capture daily backups of several network servers onto a huge local hard drive array. (I've used Dirvish for three years now, and it's like Apple's Time Machine, only without the pretty User Interface.) I would like to take a certain day's backup of all the servers' data and write it to tape.

I've hacked through a pile of little challenges:
  • tar's verify feature doesn't work with streaming tape drives
  • reading past an inline .tar file's EOF marker required the --sparse parameter
  • building a custom set of source folders
  • finding the tape drive autoloader when one of my other SCSI devices is not permanently mounted
  • writing a second script to rewind a tape before loading the next one
But this one last issue has me stumped! When I go back to verify the job, it stops after reading about 400 GB. According to my verify log, the last file it read is an 11GB bz2 tarball. I've logged stderr and stdout to files on both the tar --create and the tar --compare operation. Here's the error message I get when I do the verify:
Code:

tar: Skipping to next header
tar: Error exit delayed from previous errors

I'm going to try changing the order of my source volumes to see whether the problem happens at the end of tape number 1 or during the verification of that 11 GB file. Perhaps somebody else has walked this road before me, though....

My Question:
Has anyone out there done multi-volume tape archives with GNU tar? Were you able to verify them?

Here's my 2tape script:
Code:

#!/bin/bash

# Create tape backups of a particular day's Dirvish
PATH=/bin:/usr/bin:/usr/local/bin:/usr/local/bin/mtx-1.2.18rel

# Verify that there is a parameter on the command line
if [ $# -ne 1 ] ; then
  echo "Syntax: $0 [date]"
  echo "  where [date] is in the form YYYYMMDD"
  exit 1
fi

# Find the tape autoloader's /dev/sg number
let drv=6
while [ $drv -gt 0 ] ; do
        mtx -f /dev/sg$drv inquiry > /tmp/dev.sg$drv.inq
#      echo "device "$drv
        if grep -q "PV-124T" /tmp/dev.sg$drv.inq ; then
                break
        fi
        rm /tmp/dev.sg$drv.inq
        let drv=$drv-1
done
echo "Tape changer is /dev/sg"$drv
echo "Housekeeping..."
# Flush status files
: > /tmp/tarc.err
: > /tmp/tarc.out
: > /tmp/tard.err
: > /tmp/tard.out
: > /tmp/tapeindex.txt

# Build index of files to archive
# (Comment-out lines for servers you'd like to skip during testing)
#
echo "/backup/ServerA/"$1 >>/tmp/tapeindex.txt
echo "/backup/ServerB/"$1 >>/tmp/tapeindex.txt
echo "/backup/ServerC/"$1 >>/tmp/tapeindex.txt
echo "/backup/ServerD/"$1 >>/tmp/tapeindex.txt
#echo "/backup/ServerE/"$1 >>/tmp/tapeindex.txt
echo "/backup/ServerF/"$1 >>/tmp/tapeindex.txt
#echo "/backup/ServerG/"$1 >>/tmp/tapeindex.txt
#echo "/backup/ServerH/"$1 >>/tmp/tapeindex.txt
#echo "/backup/ServerI/"$1 >>/tmp/tapeindex.txt
echo "/slowbackup/ServerJ/"$1 >>/tmp/tapeindex.txt
#echo "/slowbackup/ServerK/"$1 >>/tmp/tapeindex.txt
#echo "/slowbackup/ServerL/"$1 >>/tmp/tapeindex.txt
#echo "/slowbackup/ServerM/"$1 >>/tmp/tapeindex.txt
#echo "/slowbackup/ServerN/"$1 >>/tmp/tapeindex.txt

# Load the first tape because tar is too dumb to get it done
echo "Loading the first tape"
mtx -f /dev/sg$drv first
echo "Rewinding the first tape"
mt -f /dev/nst0 rewind

#Write the tape
echo
echo "Archiving these files to tape:"
cat /tmp/tapeindex.txt
echo
date
echo "Tar... with no inline verify"

if tar --create --one-file-system --atime-preserve --totals --sparse -vv \
      --multi-volume --new-volume-script "nexttape $drv" \
      --file /dev/nst0 -T /tmp/tapeindex.txt --label="$1" \
      >> /tmp/tarc.out 2>> /tmp/tarc.err ; then

  echo "tar done - success"
  echo "Writing EOF markers"
  mt -f /dev/nst0 weof 2

  echo "Rewinding last tape"
  mt -f /dev/nst0 rewind

  echo "Unloading last tape"
  mtx -f /dev/sg$drv unload

else
    cat /tmp/tarc* | mail -s "Tape archive failed" root
    exit 1
fi

echo "Now verifying backups"
echo "loading first tape"
mtx -f /dev/sg$drv first

date
echo "Verifying with tar...."
echo
if tar --compare --directory / --file /dev/nst0 --sparse -vv \
      --multi-volume --new-volume-script "nexttape $drv" \
      >> /tmp/tard.out 2>> /tmp/tard.err ; then
        echo "Verified OK"
else
        echo "Errors in verify"
fi
echo "Rewinding last tape"
mt -f /dev/nst0 rewind

echo "Unloading tape"
mtx -f /dev/sg$drv unload

date

and here is my "nexttape" script:
Code:

#!/bin/sh
# verify that there is a parameter on the command line
if [ $# -ne 1 ] ; then
  echo
  echo "Syntax: $0 x"
  echo "  where x is the N in the autoloader's /dev/sgN device."
  echo
  exit 1
fi

# Flush status file
: > /tmp/nexttape.txt

/usr/local/sbin/mt -f /dev/nst0 rewind >> /tmp/nexttape.txt 2>>/tmp/nexttape.txt
/usr/local/sbin/mtx -f /dev/sg$1 next >> /tmp/nexttape.txt 2>>/tmp/nexttape.txt

if grep -q "Loading media from Storage Element" /tmp/nexttape.txt ; then
        echo "successful tape advance"
        src=`grep "Unload" /tmp/nexttape.txt | sed -e 's/\./ /' | awk '{print $7}'`
        dst=`grep "Loading" /tmp/nexttape.txt | awk '{print $6}'`
        echo "From " $src
        echo "To " $dst
else
        echo "tape advance fails"
        src=`grep "Unload" /tmp/nexttape.txt | sed -e 's/\./ /' | awk '{print $7}'`
        cat /tmp/nexttape.txt | mail -s "Tape $src advance fails" root
        echo "Unloaded from " $src
fi


choogendyk 11-10-2007 09:33 AM

Lot of stuff there. So, I'll start by just commenting on the following:

Quote:

Originally Posted by petcherd (Post 2953859)
I'd like to use GNU tar rather than Amanda or Bacula or some commercial package because my archiving dataset is much larger than a single 800 GB tape and because tar is more broadly readable if I ever do some radical upgrade beyond SlackWare (unlikely, though it sounds.)

Amanda will use GNU tar and will span tapes. And it actually puts a first file on the tape that lists the Linux commands required to read the whole tape. So, if you do something "radical", you can read the tape with just basic Linux tools. That's one significant benefit of Amanda using native backup tools.
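To make that "basic tools" claim concrete, here's a rough simulation (my own sketch, not from Amanda's docs): the idea is that each tape file starts with one header block of plain text, followed by the dump image, so plain dd plus tar can recover everything. An ordinary file stands in for the tape here, and the 32k block size and header text are assumptions for illustration.

```shell
workdir=$(mktemp -d)
echo "hello" > "$workdir/data.txt"
tar -cf "$workdir/image.tar" -C "$workdir" data.txt

# Mimic the on-tape layout: one 32k header block, then the dump image.
dd if=/dev/zero bs=32k count=1 of="$workdir/tapefile" 2>/dev/null
printf 'FAKE HEADER: human-readable restore instructions would live here' \
    | dd of="$workdir/tapefile" conv=notrunc 2>/dev/null
cat "$workdir/image.tar" >> "$workdir/tapefile"

# Recovery with basic tools only: skip the header block, pipe the rest to tar.
dd if="$workdir/tapefile" bs=32k skip=1 2>/dev/null | tar -tf -
```

On a real tape you'd rewind with mt, dd the first block to read the instructions, and then pipe the rest through tar the same way.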

I wrote my own backup scripts for years, and then adopted Amanda just under a year ago. While I use it to back up all my servers in the normal Amanda fashion, I also have some situations that somewhat parallel yours. I have labs that do their own backups to a drive attached to my server (somewhat like your Dirvish setup, but using different software). I do periodic Amanda archive runs to put their disk backups onto tape. So, while Amanda runs every night to do my regular backups, I will schedule an archive run whenever the lab tells me they want one.

I'm not necessarily advocating that you make this change, just pointing out that it would do what you want.

I'll digest your scripts and see if I have anything to add on that later.

petcherd 12-13-2007 11:49 AM

I did some more debugging since my original post and found that the verify phase was consistently failing during a 10 GB *.tar.bz2 file.

The host for my initial testing was a little weird: a 3ware 7500 disk controller presents a pile of PATA drives to the OS as if they were a pile of SCSI disks, but it requires kernel support. Slackware 8 has been patched by previous admins, but I don't know that all components were updated in sync with each other, and it's running too many other critical tasks for me to take it down and rebuild.

I've moved the tape drive onto a newer server with freshly installed Slackware 12, more CPU, less uncommon hardware, a big SCSI RAID array, and no other responsibilities. I still find that the verify phase fails during a 10 GB *.tar.bz2 file.

I'm going to check the log to see if any other file verified earlier in the process is larger than 10 GB.... Yup! There is no other file greater than 300 MB in the archive. Does anyone know if GNU tar is hitting a maximum here?
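That log check can be scripted rather than eyeballed. A sketch (my own, with an inline sample standing in for /tmp/tard.out): in GNU tar's long `-vv` listing, field 3 is the size in bytes, so awk can flag anything over a threshold.

```shell
# Sample of a `tar -vv` style listing; in this thread it would be /tmp/tard.out.
cat > /tmp/sample-listing.txt <<'EOF'
-rw-r--r-- root/root 104857600 2007-12-01 12:00 small.bz2
-rw-r--r-- root/root 11811160064 2007-12-01 12:05 big.tar.bz2
EOF
# Print size and name of any entry whose size (field 3) exceeds 4 GiB.
awk '$3 > 4294967296 {print $3, $NF}' /tmp/sample-listing.txt
```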

theoffset 12-15-2007 02:54 AM

I found this deeply buried in the "info tar" page:

Code:

...

  The `--verify' (`-W') option will not work in conjunction with the
`--multi-volume' (`-M') option or the `--append' (`-r'), `--update'
(`-u') and `--delete' operations.  *Note Operations::, for more
information on these operations.

...


(Emphasis mine.)


As I understand it, tar shouldn't even be able to --verify those tapes.

Good luck with that, anyway.

EDIT:

Never mind. I just saw that you're verifying with --compare.

You either have a bad tape/backup, or you've had bad luck and hit a tar bug, because GNU tar doesn't have a file-size limit (unless you're telling it to behave like some older Unixes).

petcherd 12-17-2007 08:33 AM

Yes, I saw that in the documentation, too. To cope with this deficiency, I had to write a "nexttape.sh" script that would rewind the current tape before loading the next one. After the whole --create phase was done, I went back and loaded the first tape to do a --diff.

I've looked closer and constructed some smaller test datasets. I tried tar-ing a few 2 GB files, followed by the 10 GB file. I tried tar-ing a 77 GB binary file that wasn't a *.tar archive and saw tar get completely overwhelmed and confused. I removed the --multi-volume and --new-volume-script parameters.

In summary, tar can verify a 3 GB file on tape, but not a 10 GB file. I suspect that 4 GB is the dividing line, but I'm not going to probe any further.
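A minimal harness for probing that dividing line might look like this (my own sketch, not petcherd's script). It writes the archive to an ordinary file so it can run without the tape drive, and uses a deliberately small size; on the real hardware the --file argument would be /dev/nst0 and size_mb would be pushed past 4096.

```shell
workdir=$(mktemp -d)
size_mb=16                          # raise past 4096 to probe the failure
dd if=/dev/zero of="$workdir/testfile" bs=1M count=$size_mb 2>/dev/null
tar --create --file "$workdir/test.tar" -C "$workdir" testfile
if tar --compare --file "$workdir/test.tar" -C "$workdir" ; then
    result="verify OK at ${size_mb} MB"
else
    result="verify FAILED at ${size_mb} MB"
fi
echo "$result"
rm -rf "$workdir"
```

Bisecting on size_mb would pin down the threshold without burning a full tape per attempt.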

Today I'm experimenting with star: http://cdrecord.berlios.de/old/private/star.html

I'll let you know how it goes.

-dP

petcherd 12-18-2007 02:30 PM

star handles my big honkin' files without a problem.

petcherd 01-02-2008 12:36 PM

I moved the whole shebang over to a new Slackware 12 server, and ran into difficulty compiling and installing star. <SIGH>

In the meantime, someone suggested I look at the changelogs for GNU tar: there was a known bug in verifying large sparse files. My files are fully packed with compressed binary data, so I had dismissed this bug-fix as irrelevant to my situation. A little more pressure from my GNU tar-using friend, and I decided to test the latest v1.19 anyway to see if it performed better than v1.16.1....

I uncompressed the tgz, removed my old tar package from Slackware 12, and compiled the new version. My quick test with 34 files @ 2 GB each plus 1 file @ 46 GB wrote to tape and verified without errors.

Now I'm testing a 1.4 TB dataset that includes these 35 files and the one 10 GB file that caused my original discomfort. The first tape hasn't finished filling yet, but I'm getting more and more confident that it will work.

gnashley 01-02-2008 02:06 PM

tar is up to version 1.19 now, so I'd look into using the latest version for the problems you are having.
Wow, I thought star was dead. I have the last stable version on my site. I hadn't noticed before that it is by the cdrecord guy, Joerg Schilling - that means it may be worse than dead, even if it works really well. I've grabbed the latest alpha version, though, and will see if I can get it to compile on slack-12 - but no promises. Schilling and Torvalds have been butting heads for at least 10 years, and if star won't compile he would surely blame it on the kernel...

petcherd 01-03-2008 08:42 AM

Thanx, it is version 1.19 that I'm trying now.

I had noticed a philosophical conflict between Joerg and the authors of GNU tar (over POSIX compliance) and the authors of GNU make (over some other matter beyond my understanding), but I didn't know that he had any issues with Linus T. I'm sure whatever matter there might be between them is far beyond my understanding.

My test yesterday failed due to an unrelated matter, and I'm trying again.
<SIGH>

Alien_Hominid 01-03-2008 09:11 AM

The argument was because of cdrecord.

gnashley 01-03-2008 11:29 AM

Because of cdrecord and the whole scsi emulation layer debate and problem. The kernel guys finally 'split the sheets' with him and went their own way with the ATA drivers. Also, debian forked cdrecord into cdrkit because of license incompatibilities.
Anyway, I'm not sure what trouble you were having with star compiling on Slack-12. I've been able to compile it easily enough on Slack-12; perhaps you were running configure? You should *not* do that. I can supply you with a package if you like, but it would be better if you compile your own, since that creates a binary optimized for your CPU. If you have src2pkg installed, you can use my script to easily put it all together.

petcherd 01-04-2008 08:27 AM

I'm sure my problem compiling star can be traced to my own inexperience and failure to notice some necessary detail in the documentation.

I tried some other stuff yesterday that completely polluted my system. Now I'm trying to rebuild it with RAID-5 instead of LVM to join my hard drive storage into a single volume. It could be a while before I get all of my act together.

gnashley 01-04-2008 09:55 AM

I guess you don't have src2pkg installed, but I think my src2pkg script for building star might help you anyway, so I've included it below. The build is straightforward enough. Do not run the configure script in the sources; just type make and have a cuppa while it compiles. There is no install rule, so you can just copy the binary to /usr/bin or create a package manually if you want. The src2pkg script lists the pertinent docs and man-pages which you might want to include. Or grab the package for src2pkg and use it with this script by placing the script and the tarball in the same dir and running 'src2pkg -X'.
You can get src2pkg here:
http://distro.ibiblio.org/pub/linux/...i486-1_K26.tgz

Code:

#!/bin/bash
## src2pkg script for:        star
## src2pkg Copyright 2005-2007 Gilbert Ashley <amigo@ibilio.org>

SOURCE_NAME='star-1.5a87.tar.gz'
NAME='star'
VERSION='1.5a87'
ARCH='i486'
BUILD='1'
PRE_FIX='usr'
# Any extra options go here
# EXTRA_CONFIGS=''
# STD_FLAGS='-O2 -march=i486 -mtune=i686'

DOCLIST='star/README star/README.largefiles star/README.pattern
star/README.otherbugs star/README.crash star/README.pax
star/README.posix-2001 star/README.ACL'

# Get the functions and configs
. /usr/libexec/src2pkg/FUNCTIONS ;

# do_all_processes can substitute these 16 steps:

pre_process
find_source
make_dirs
unpack_source
fix_source_perms

#configure_source
#compile_source

cd $SRC_DIR && make
# fake_install

mkdir -p $PKG_DIR/usr/bin
# the * in the path expands to the name/type of your CPU
cp $SRC_DIR/star/OBJ/*/star $PKG_DIR/usr/bin

fix_pkg_perms
strip_bins
create_docs

mkdir -p $PKG_DIR/usr/man/man1 $PKG_DIR/usr/man/man4
cp $SRC_DIR/star/star.1 $PKG_DIR/usr/man/man1
cp $SRC_DIR/star/star.4 $PKG_DIR/usr/man/man4

compress_man_pages
make_description
make_doinst
make_package
post_process


aikempshall 01-07-2008 06:34 AM

I had the same problem verifying large files with tar 1.16.1. Google kempshall+tar for the story. The fix is in 1.19.

