Create TAR archives to two different LTO-5 drives at once
I need to be able to write a tar archive to two SAS LTO-5 tape drives simultaneously. I work for a post-production house where we have to create redundant tape backups from large image sequences. We are typically backing up 1-3TB of data a day, which is nearly impossible to do using only one LTO drive and swapping tapes.
My first thought was to send the tar archive to STDOUT and then pipe that data into the tee command to write multiple files at once. This works great when writing to a hard drive, but I haven't been able to figure out the best way to write to the tape drives via tee.
Here is what I've tried:
tar -cvO /sequenceDirectory | tee >(dd of=/dev/nst0) >(dd of=/dev/nst1)
I was hoping to not have to create a temp .tar file on the harddrive before writing to the tape. Please let me know if anyone has a solution to this. Also, is there a way to verify that all the data in the directory was successfully written to the tape?
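For reference, a working form of that tee pipeline might look like the sketch below (bash process substitution is assumed; ordinary files stand in for the two tape drives here so it can be tried safely — on the real system the two dd targets would be /dev/nst0 and /dev/nst1):

```shell
# Sketch: duplicate one tar stream to two destinations with tee.
# Plain files stand in for the tape drives (/dev/nst0, /dev/nst1).
set -e
mkdir -p /tmp/seqdemo
printf 'frame data' > /tmp/seqdemo/frame0001.dpx
tar -cf - -C /tmp seqdemo \
  | tee >(dd of=/tmp/copy1.tar bs=256k 2>/dev/null) \
  | dd of=/tmp/copy2.tar bs=256k 2>/dev/null
sleep 1   # bash does not wait for process substitutions to finish
cmp /tmp/copy1.tar /tmp/copy2.tar   # both copies are byte-identical
```

Note the `sleep` (or an equivalent wait): the `>(dd ...)` process runs asynchronously, so the first copy may still be flushing when the pipeline returns.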
Thanks for your reply. The biggest issue with writing to one drive first and then copying that to the other is the time it takes to get that data over to the second drive. Both tape drives would be tied up twice as long; basically, I could achieve the same thing with one tape drive. I do, however, see the benefit of being able to deliver the source drive back to the production company sooner going tape to tape.
Another reason I was thinking about creating the temp tar first would be for write speed reasons. I'm lucky to see 20-25MB/sec going from the source eSATA drive directly to tape. This is because we are writing thousands of small files (around 7MB each) instead of fewer larger files. This doesn't allow the tape head to spin up to max speed before having to wind down. Copying larger files I see speeds up to 100MB/sec or more. I am thinking that creating the larger temporary .tar archive onto our RAID and then copying to both tape drives from that temp file at once will be the fastest way to get the job done.
I briefly looked into the Bacula software that you mentioned, but at first it appeared that it was designed to do routine backups on a regular basis. The data we are backing up will be coming in from multiple sources. My concern would be having to reconfigure the backup software every time we get a new batch of footage in. Any insight you can provide on how these applications work is greatly appreciated.
When writing tens of thousands of small files to tape, think about a temporary staging area. As you already mentioned, larger files will speed up performance.
I am currently doing speed tests creating tar files first and then going to both tape drives at once. I will post up my findings once the tests are completed.
Okay, so I created a tar archive of one sequence of images. This archive is 39.2GB in size. I am still not seeing speeds anywhere near 100MB/s. Currently I can only get write speeds around 17MB/s (roughly 1GB/min) using the following command:
pv -B 4m -tbr /archive.tar | tee >/dev/nst0
I have tried several different buffer sizes ranging from 512 bytes to 4GB in the pv parameters. This has not affected the write speeds at all. Running the same command but writing to the same hard drive that I'm reading from, I get an average of 120MB/s.
Does anyone have any suggestions as to what the bottleneck might be?
The buffer size that pv uses does not affect the buffering in the tee command, which is probably using the default 4K block size. You would need to run the output through something like dd to reconstruct the larger blocks:
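A minimal sketch of that reblocking step (a plain file stands in for /dev/nst0, and cat stands in for the pv feed, so it runs anywhere):

```shell
# Sketch: rebuild large fixed-size blocks with dd before the device.
# iflag=fullblock makes dd keep reading until each 1M block is full,
# rather than passing short pipe reads straight through.
head -c 5000000 /dev/zero > /tmp/demo.tar            # stand-in archive
cat /tmp/demo.tar | dd bs=1M iflag=fullblock of=/tmp/out 2>/dev/null
cmp /tmp/demo.tar /tmp/out                           # data is unchanged
```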
Keeping a fast tape drive streaming at full speed can be quite a challenge. Back in the days of slower disks and processors, even with a slower tape drive (DDS2) I found it necessary to write a circular buffering program in order to keep the pipeline flowing and the tape drive streaming. The script to set up that pipeline and handle all the places that errors could occur was rather horribly complex.
Is there any alternative to using the dd command? It spits a bunch of gibberish out in my terminal window and my throughput goes down to 182Kb/s using it.
Getting rid of the junk from dd is simple, just redirect its stderr to /dev/null, or better, to some file you can examine later should there be some error. But that isn't going to help your throughput problem. The only thing I can guess at this point is that there might be some issue in tee w.r.t. blocking I/O and block sizes larger than the size of a FIFO buffer. I fear I've about reached the limit of what I can suggest. If I were fighting a problem like this on my own system, I'd no doubt be writing code by now.
I did just think of one thing, though. Since you've now got the archive on a file, there's no reason you couldn't start up two independent processes to read from the file and write to a tape drive. That way there would be no need to use tee and pipelines.
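That two-independent-readers idea could be sketched like this (plain files stand in for the two tape devices):

```shell
# Sketch: two independent dd processes each read the archive file
# and write to their own "drive" in parallel; no tee involved.
head -c 1000000 /dev/zero > /tmp/archive.tar         # stand-in archive
dd if=/tmp/archive.tar bs=256k of=/tmp/tape0 2>/dev/null &
dd if=/tmp/archive.tar bs=256k of=/tmp/tape1 2>/dev/null &
wait    # block until both writers have finished
```

Since each reader paces itself independently, a slow drive no longer stalls the other one the way a shared tee pipeline can.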
Thanks for the input. I had a suspicion that tee might be the culprit. I will try bypassing it tomorrow when I get to the office. I should be able to whip out a python script that reads in blocks of the file and sends them out to be written to the tape drive.
You mentioned that there were quite a few points of error when scripting this sort of thing in an earlier post. If you know of any gotchas off the top of your head, I'm all ears. Thanks again for all your help.
Nothing specific w.r.t. errors, just that when you have a complex pipeline set up you need to be sure that no error in any stage will slip by unnoticed. The script that performed my backups wrote its output to the tape as well as keeping an online copy, included the aforementioned buffering to keep the tape streaming, calculated an MD5 sum of the data stream as it was being generated, had options for compression, maintained an index of where things were on the tape, kept track of tape usage, ... . Making sure the script didn't try to continue in the face of errors in any of that got pretty complex.
Makes me glad I'm not using tape for backup any more. 'Course the scripts I've got to work around issues with the backup tool I do use now are even worse, but that's progress for you.
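The MD5-of-the-stream idea mentioned above can be sketched as follows (bash; a plain file stands in for the tape device):

```shell
# Sketch: tee a copy of the stream into md5sum while dd writes it
# out, so the checksum is computed without a second read pass.
head -c 300000 /dev/urandom > /tmp/src.bin           # stand-in data
cat /tmp/src.bin \
  | tee >(md5sum | awk '{print $1}' > /tmp/stream.md5) \
  | dd of=/tmp/tapefile bs=256k 2>/dev/null
sleep 1   # bash does not wait for process substitutions to finish
```

After the run, /tmp/stream.md5 holds the checksum of exactly the bytes that went to the device, which can later be compared against a read-back of the tape.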
Glad to report that bypassing tee did the trick! I am able to keep a consistent speed of around 90MB/s now, which is an amazing increase. I did a test tarring directly to the tape, and that went at about 65MB/s. I will not be the only person running these backups, so I might create a GUI to help others create the .tar archives and get them onto and off of the tape. If so, I will make the code available to help others with the same issues.
Just as a recap:
- Create your tar archive first so that maximum speeds can be achieved.
- Use pv and dd together to get your fastest write speeds and monitor the output. The LTO-5 drive seemed to like a block size between 384k and 1024k. Send dd's stderr to /dev/null to prevent a bunch of stuff being written to the terminal window.
pv -B 1024k -tbr your_archive.tar | dd bs=1024K iflag=fullblock of=/dev/nst0 2> /dev/null
- Pull the data off of the tape by using the mt command to position the head at the appropriate spot (see the mt manpage for the right subcommand), then use dd to pull the data off.
mt -f /dev/nst0 bsf 1
dd bs=1024K if=/dev/nst0 of=/your_dir/your_archive.tar 2> /dev/null
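Tying the recap together, a round-trip check can be sketched like this (a plain file stands in for /dev/nst0; the mt positioning step is omitted since it only applies to a real tape):

```shell
# Sketch: write the archive, read it back, and confirm the bytes match.
head -c 2000000 /dev/zero > /tmp/orig.tar                 # stand-in archive
dd if=/tmp/orig.tar of=/tmp/tape bs=1M 2>/dev/null        # "write to tape"
dd if=/tmp/tape of=/tmp/restored.tar bs=1M 2>/dev/null    # "read it back"
cmp /tmp/orig.tar /tmp/restored.tar                       # copies match
```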
I also work in a post-production environment, and I have to create LTO-5 tapes for one of my teams.
Like yourself, I would like to create two LTO tapes simultaneously with the same contents (one for each of our archive sites).
I read the previous posts carefully, and they were very interesting.
In the end, it seems you found the correct settings, but I didn't understand whether you managed to create both LTOs simultaneously or one by one.
I tried different things with pv and dd, and I did manage to write files to one LTO at 186MB/s with your command line, but I didn't understand how to run it on both LTO drives.