Using tapes for long term data storage. What software to use? Not Amanda.
So the company I work for recently purchased a Dell TL2000 tape box.
They then decided to use Amanda to interface with that machine.
The sysadmin at the time installed it and quit soon afterwards. Then I got hired (not for his position, I'm IT support) and, besides my regular duties, was put in charge of making Amanda work for us.
After a few weeks of trial and error, chatting on IRC and on mailing lists, I came to the conclusion that Amanda is not for us, as we are not planning to do 'regular backups' as per Amanda's definition.
We only need to archive files for long term storage. Tapes will never be overwritten, just accumulated and new ones purchased.
We plan on dumping about 200 GB per week.
With Amanda, the problem is that it cannot append 'sessions' to the same tape: it writes to a new tape on every run (a waste of tapes for our purpose). We could tell it to hold the files in its 'holding disk' until it accumulates 800 GB and only then write them. The problem with that scenario is that we'd need to buy a few extra hard drives to build a reliable RAID array for that holding disk. 3 projects at a time x 800 GB each = 2.4 TB of RAID'ed space required.
All this just because Amanda cannot write multiple sessions to the same tape.
So my question is, what would be good software to accomplish what we need:
- Be able to write multiple times to the same tape until it's full (!)
- Change tapes when the current one becomes full
- Keep a database (plain-text or mysql) of which files were written to which tape
- Skip file if it's already in the database (optional)
That's it basically. No nonsense :)
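To make the last two requirements concrete, here is a rough Python sketch of a plain-text catalog that records which file went to which tape and skips files already listed. It is not tied to any backup product: the file name `tape_catalog.tsv`, the tab-separated layout, and the function names are all made up for illustration.

```python
import csv
import os

CATALOG = "tape_catalog.tsv"  # hypothetical catalog file: one "path<TAB>tape" line per file


def load_catalog(path=CATALOG):
    """Return {file_path: tape_label} for everything already archived."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="") as fh:
        return {row[0]: row[1]
                for row in csv.reader(fh, delimiter="\t")
                if len(row) >= 2}


def select_new_files(candidates, catalog):
    """Keep only the files that are not in the catalog yet."""
    return [f for f in candidates if f not in catalog]


def record_written(files, tape_label, path=CATALOG):
    """Append one catalog line per file once it has been written to tape."""
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for f in files:
            writer.writerow([f, tape_label])
```

The idea would be to call `select_new_files` before each run and `record_written` only after the tape write has succeeded, so the catalog never lists a file that is not actually on tape.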
Any input greatly appreciated.
On the Amanda Users list, you indicated
I've never played with Amanda, but an alternative worth considering might be Bacula.
Project1, Project2 -> 2007,2008,2009 -> Jan,Feb,[...],Dec -> 1,2,[..],31
A folder with years containing all months containing all days
Files would be dumped into the appropriate folder where they belong.
amdump Project1 would back up *every new file* under Project1.
Once amdump completed, we would delete all files in the source to save disk space (but keep the folder structure).
During the course of the week we'd fill up the source with files, then run amdump on the weekend. And repeat every week.
That's how I planned to do the long term backups.
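The "delete the files but keep the folder structure" step could look roughly like the Python sketch below. The function name is hypothetical, and the sketch assumes it is only run after the dump has been verified, since it deletes without asking.

```python
import os


def purge_files_keep_dirs(root):
    """Delete every file under root, leaving all directories in place.

    Returns the number of files removed. Assumes the files have
    already been written to tape and verified.
    """
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
            removed += 1
    return removed
```

Calling it on the project's top folder would empty every year/month/day folder while leaving the tree ready for next week's files.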
Where do the files come from? And how is it that they are put in a folder and land in a particular folder? Not trying to be dense, just trying to understand the process behind the scenes that is being backed up.
Let's call them daily reports which are generated by some scripts. Say 10-20 GB per day.
Every day they're put in that day's folder /files/YYYY/MM/DD/ on the main server.
Now, this process has been taking place every day for the last 2 years and there was never a tape archiving solution.
The result was that every time the main server started to get full, a chunk of those reports was 'temporarily' moved to some secondary servers only to be forgotten there, creating a huge mess I now have to clean up :)
Those files on the secondary servers would have to be manually moved back to the main box in the appropriate date folder where they belong so the tape software could write them to tape.
(I didn't want to archive them from their current location as that would mean multiple Amanda DLEs, and searching through multiple DLEs afterwards whenever one of those files was needed. That's why I wanted to centralize everything back on the main server to have 1 DLE with a logical folder structure underneath.)
The goal is basically to a) clean all the secondary servers of those reports and b) automate the archiving of future files to prevent this from happening again.
The way I saw it, after moving the files from a secondary server to main, I'd launch the amdump process, which would write what's in main's DLE, then wait till it finished and delete the files from main.
Repeat this process until all the secondary servers are clean, and then set up some kind of script to automate a weekly archiving of future reports older than 3 months.
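As a rough idea of what that weekly script could start from, this Python sketch picks out the /files/YYYY/MM/DD/ day folders older than a cutoff. It treats "3 months" as 90 days, which is an assumption, and the function name is made up; what gets done with the selected folders is left to whatever tape software is used.

```python
import glob
import os
from datetime import date, timedelta


def day_folders_older_than(root, days=90, today=None):
    """Return root/YYYY/MM/DD folders dated more than `days` days ago."""
    cutoff = (today or date.today()) - timedelta(days=days)
    selected = []
    for path in sorted(glob.glob(os.path.join(root, "*", "*", "*"))):
        if not os.path.isdir(path):
            continue
        year, month, day = path.split(os.sep)[-3:]
        try:
            folder_date = date(int(year), int(month), int(day))
        except ValueError:
            continue  # not a YYYY/MM/DD folder, ignore it
        if folder_date < cutoff:
            selected.append(path)
    return selected
```

Run weekly with `root` set to `/files`, it would return the list of day folders that are due for archiving.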
3am here, does the above make sense? :)