
WojtekO 04-04-2009 03:40 PM

Using tapes for long term data storage. What software to use? Not Amanda.
So the company I work for recently purchased a Dell TL2000 tape box.
They then decided to use Amanda to interface with that machine.

The system admin at the time installed it and quit soon afterwards. I was then hired (not into his position; I'm IT support) and, besides my regular duties, was put in charge of making Amanda work for us.

After a few weeks of trial and error and of asking on IRC and the mailing lists, I came to the conclusion that Amanda is not for us, because we are not planning to do 'regular backups' in Amanda's sense of the term.
We only need to archive files for long term storage. Tapes will never be overwritten, just accumulated and new ones purchased.

We plan on dumping about 200 GB per week.
With Amanda, the problem is that it cannot append 'sessions' to the same tape; it writes to a new tape on every run, which wastes tapes for our purpose. We could tell it to keep the files on its 'holding disk' until 800 GB has accumulated and only then write them to tape. The problem with that scenario is that we'd need to buy a few extra hard drives to build a reliable RAID array for the holding disk. 3 projects at a time = 2.4 TB of RAID space required.

All this just because Amanda cannot write multiple sessions to the same tape.
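
The holding-disk figure above is simple arithmetic, spelled out here as a quick check. The 800 GB threshold and the three parallel projects come from the post; treating 1 TB as 1000 GB is an assumption, and the result is usable space only, before any RAID redundancy overhead.

```python
# Figures taken from the post; decimal units (1 TB = 1000 GB) are assumed.
HOLDING_THRESHOLD_GB = 800   # data accumulated per project before writing a tape
PROJECTS_IN_PARALLEL = 3     # "3 projects at a time"

holding_gb = HOLDING_THRESHOLD_GB * PROJECTS_IN_PARALLEL
holding_tb = holding_gb / 1000

print(f"{holding_gb} GB = {holding_tb} TB of usable holding space")
```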

So my question is: what software could accomplish what we need?
- Be able to write multiple times to the same tape until it's full (!)
- Change tapes when the current one becomes full
- Keep a database (plain-text or MySQL) of which files were written to which tape
- Skip file if it's already in the database (optional)
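
To make the catalog part of that list concrete, here is a minimal sketch in Python. It is not taken from any existing backup tool: the table name `archived`, the function names, and the use of SQLite (instead of the plain-text or MySQL options mentioned above) are all assumptions for illustration, and the 800 GB capacity is simply the threshold quoted earlier. It covers only the bookkeeping (which file went to which tape, skipping known files, deciding when a tape is full); actually writing to the tape drive and changing tapes is left out.

```python
import sqlite3

TAPE_CAPACITY = 800 * 10**9  # assumed capacity in bytes (800 GB, as quoted above)


def open_catalog(path=":memory:"):
    """Open (or create) the catalog; the table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archived ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " tape TEXT NOT NULL)"
    )
    return conn


def not_yet_archived(conn, files):
    """Return the (path, size) pairs whose path is not in the catalog yet."""
    new = []
    for path, size in files:
        row = conn.execute(
            "SELECT 1 FROM archived WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            new.append((path, size))
    return new


def bytes_on_tape(conn, tape):
    """Total size of everything the catalog says is on the given tape."""
    row = conn.execute(
        "SELECT COALESCE(SUM(size), 0) FROM archived WHERE tape = ?", (tape,)
    ).fetchone()
    return row[0]


def fits_on_tape(conn, tape, files, capacity=TAPE_CAPACITY):
    """True if the batch still fits; otherwise it is time to change tapes."""
    batch = sum(size for _path, size in files)
    return bytes_on_tape(conn, tape) + batch <= capacity


def record(conn, tape, files):
    """Record a batch as written; call only after the tape write succeeded."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO archived (path, size, tape) VALUES (?, ?, ?)",
            [(path, size, tape) for path, size in files],
        )
```

A weekly run would then call `not_yet_archived()` on the candidate files, check `fits_on_tape()` for the current tape label, write the batch with whatever tape tool is chosen, and finish with `record()`.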

That's it basically. No nonsense :)

Any input greatly appreciated.

choogendyk 04-05-2009 12:50 PM

On the Amanda Users list, you indicated:

"Every week, we'll be dumping about 100Gb of files categorized in
folders, Amanda will then backup those files to tape, and then the
files will be deleted from that folder. Process repeats every week."

Could you explain a bit more about the process that does that initial dump? It seems that's a significant part of the larger picture.

reptiler 04-05-2009 12:53 PM

I never played with Amanda, but an alternative worth considering might be Bacula.

WojtekO 04-06-2009 02:11 PM


Originally Posted by choogendyk (Post 3499214)
On the Amanda Users list, you indicated

Could you explain a bit more about the process that does that initial dump? It seems that's a significant part of the larger picture.

The source will have the following folder structure:
Project1, Project2 -> 2007,2008,2009 -> Jan,Feb,[...],Dec -> 1,2,[..],31

That is, each project folder contains year folders, each year contains all its months, and each month contains all its days.

Files would be dumped into the appropriate folder where they belong.

amdump Project1 would back up *every new file* under Project1.
Once amdump completed, we would delete all files in the source to save disk space (but keep the folder structure).

During the course of the week we'd fill up the source with files and run amdump on the weekends. And repeat every week.
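
The "delete the files but keep the folder structure" step could look roughly like the sketch below. The function name is made up for illustration, and it is assumed that the script is only run after the archive run has been confirmed successful, since the deletion cannot be undone.

```python
import os


def delete_files_keep_folders(root):
    """Remove every file below root but leave all directories in place.

    Returns the number of files removed. Assumed to be called only after
    the tape write for this tree has been verified.
    """
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
            removed += 1
    return removed
```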

That's how I planned to do the long-term archiving.

choogendyk 04-06-2009 07:07 PM

Where do the files come from? And how is it that they are put in a folder and land in a particular folder? Not trying to be dense, just trying to understand the process behind the scenes that is being backed up.

WojtekO 04-07-2009 03:10 AM

Let's call them daily reports which are generated by some scripts. Say 10-20 GB per day.
Every day they're put in that day's folder /files/YYYY/MM/DD/ on the main server.

Now, this process has been taking place every day for the last 2 years and there never was a tape archiving solution.
The result was that every time the main server started to get full, a chunk of those reports was 'temporarily' moved to some secondary servers only to be forgotten there, creating a huge mess I now have to clean up :)

Those files on the secondary servers would have to be manually moved back to the main box, into the date folder where they belong, so the tape software could write them to tape.
(I didn't want to archive them from their current location, as that would mean multiple Amanda DLEs and searching through multiple DLEs afterwards whenever one of those files was needed. That's why I wanted to centralize everything back on the main server, to have 1 DLE with a logical folder structure underneath.)

The goal is basically to a) clean all the secondary servers of those reports and b) automate the archiving of future files to prevent this from happening again.

The way I saw it, after moving the files from a secondary server to main, I'd launch the amdump process, which would write what's in main's DLE, then wait until it finished and delete the files from main.
Repeat this process until all secondary servers are clean, and then set up some kind of script to automate a weekly archiving of future reports older than 3 months.
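
For the "older than 3 months" selection, a rough sketch of what such a weekly script might start from is shown below. It assumes the numeric /files/YYYY/MM/DD/ layout described above and approximates 3 months as 90 days; the function name and the `max_age_days` parameter are invented for the example. It only lists the day folders that qualify and does not write or delete anything.

```python
from datetime import date, timedelta
from pathlib import Path


def day_folders_older_than(root, today, max_age_days=90):
    """List YYYY/MM/DD folders under root dated before today - max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    pattern = "[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]"
    selected = []
    for day_dir in sorted(Path(root).glob(pattern)):
        if not day_dir.is_dir():
            continue
        year, month, day = (int(part) for part in day_dir.parts[-3:])
        try:
            folder_date = date(year, month, day)
        except ValueError:
            continue  # not a real calendar date, e.g. 2009/02/30
        if folder_date < cutoff:
            selected.append(day_dir)
    return selected
```

Run from a weekly cron job with `date.today()`, the returned folders would be the ones to hand to the archiving tool.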

3am here, does the above make sense? :)
