LinuxQuestions.org

trist007 06-23-2010 08:45 PM

I need some storage suggestions...
 
I have a 500 GB system hard disk that I will mount at / and use as my main partition. It also has a smaller partition for swap. Then I have four 1 TB hard drives. I have a website that is linked to a database via PHP. So far I have about 400 GB of content on the site, which grows daily, though slowly. Growth will probably slow down in a year or so, so I don't feel a need to have more than 2 TB in the long run. Also, I can only fit 5 hard drives in my computer, so I do not want to add any more. Okay, so far I can think of two options:

Option 1:

Set up two 1 TB hard drives on LVM. Then I can use one or both of the other 1 TB hard drives for backup.

I have a question here. If I do this, I will probably set up a cron job to back up weekly, maybe monthly. It's not an active site. The thing is, from testing, a backup tgz file only compresses about 15% with the type of content I use, which is mostly binary. Also, is there any difference between a tar.gz file and a tgz file in terms of compression? So if I made a backup right now, the file would be about 370 GB. Anyhow, this setup would definitely require space. However, I see a flaw. Because the backup would be so big, each new backup would just overwrite the previous one. What if my production data gets blown away and the backup.tgz file is corrupt? As I understand it, the bigger the tgz file is, the higher the chance that it will be corrupt. I won't be transferring the tgz file; I will just back up to one of those 1 TB hard drives. However, I've seen times when just the process of creating a tgz file ends in the dreaded 'unexpected EOF reached.' I've tried using 'tar -tvv filename.tgz', which is supposed to test the integrity of the file, but whenever I run this, the process always seems to freeze. Also, I've heard about bzip2recover. Anybody have any experience with this?
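
One way to check a backup.tgz before trusting it is to read every member from start to finish, which forces the whole gzip stream to be decompressed. The sketch below does that with Python's standard tarfile module; the function name archive_is_readable and the chunk size are illustrative choices, not anything from the thread. Two side notes: tar only reads a named archive when given the -f option (for example 'tar -tvf filename.tgz'), and without it tar reads from its default device, which on many systems is standard input, so that may explain the apparent freeze. Also, bzip2recover works on bzip2 (.bz2) files, so it would not help with a gzip-compressed archive.

```python
import tarfile
import zlib


def archive_is_readable(path, chunk_size=1024 * 1024):
    """Return True if every regular file in a .tgz can be read to the end.

    Hypothetical helper for illustration: it reads and discards the data,
    so it needs no scratch space, only time.
    """
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue  # directories, links, etc. carry no data to read
                stream = archive.extractfile(member)
                while stream.read(chunk_size):
                    pass  # discard the data; we only care that reading works
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        # Truncated or damaged archives typically surface as one of these.
        return False
    return True


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "OK" if archive_is_readable(name) else "DAMAGED")
```

A True result only says the archive is structurally intact; it does not compare the contents against the live data. Running a check like this right after each backup, and before deleting the previous backup, would at least avoid replacing a good archive with a broken one.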

So yes, this is one viable option. I don't lose any space but I risk being left with a backup tgz file that could be corrupt.

Option 2:

Set up two 1 TB hard drives as a mirrored RAID. Then set up the other two 1 TB hard drives as a second mirrored RAID. Then merge the two mirrors into a striped RAID. I think they call this RAID 10 because it's a combination of the two. I like this, but at the same time I lose 2 TB of space. Also, does anybody know the procedure if I lose one hard disk? What programs do I use to add a brand new 1 TB hard drive to this RAID 10 array?

Another big flaw is that if both hard drives in the same mirrored pair fail, I lose everything. So I'm not liking this idea.

What do you guys think?

I'm thinking Option 1, keeping one backup on one 1 TB hard drive and a second copy on the other 1 TB hard drive. That way I increase my chances. However, once my storage goes above 1 TB, which it will, I can no longer keep two backups. I would then merge the other two 1 TB hard drives into another LVM volume and just make one backup that hopefully won't be corrupt.

Anyhow, I'm excited to hear what you guys think. The data is very valuable to me so I want to do as much as I can to minimize the risk of loss.

Wim Sturkenboom 06-23-2010 11:13 PM

No advice regarding your options as I don't know what I would use in your scenario.

If data is very valuable, I would not use onsite backup only. Use a couple of external HDs and store offsite as well (bank, at work if the location differs from where the machine is); cycle them.

Further, consider incremental backups instead of full backups all the time; this can save you considerable space, especially if the content does not change significantly. E.g. a full backup once a month, and in the other weeks of the month a backup of only the new/changed files.
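
As a rough illustration of the incremental idea, the sketch below archives only files whose modification time is newer than the last full backup. It is a minimal example using Python's standard library; the function names, and the choice of modification time as the change test, are assumptions made for illustration. It does not record deleted files, and the archive should be written outside the directory being backed up.

```python
import os
import tarfile


def files_changed_since(root, since_epoch):
    """Yield paths under root modified after since_epoch (seconds since 1970)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since_epoch:
                yield path


def write_incremental(root, since_epoch, archive_path):
    """Write the changed files to a gzip-compressed tar; return how many were added."""
    count = 0
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files_changed_since(root, since_epoch):
            # Store paths relative to root so the archive restores cleanly.
            archive.add(path, arcname=os.path.relpath(path, root))
            count += 1
    return count
```

A weekly cron job could call write_incremental with the timestamp of the last monthly full backup, so restoring needs only the full archive plus the most recent incremental.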

chrism01 06-24-2010 03:30 AM

Yeah, incrementals might be an idea.
There's no difference between .tar.gz and .tgz; it's just a naming convention.
Note that gzip has a compression-level flag

-n where n is a number from 1 to 9

higher number = better compression but slower to create (obviously).
http://linux.die.net/man/1/gzip

Also, off-site backups as recommended above.
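
To get a feel for how much the level matters on different kinds of content, a small experiment like the one below can help. It is only a sketch: the sample data is made up, with random bytes standing in for binary content that is already compressed (images, video, archives), and compressed_size is a hypothetical helper name.

```python
import gzip
import os


def compressed_size(data, level):
    """Return the size in bytes of data after gzip compression at level 1-9."""
    return len(gzip.compress(data, compresslevel=level))


if __name__ == "__main__":
    # Repetitive text compresses well; random bytes do not compress at all.
    text_like = b"INSERT INTO files VALUES (1, 'example');\n" * 20000
    binary_like = os.urandom(len(text_like))
    print("original size:", len(text_like))
    for level in (1, 6, 9):
        print(
            "level", level,
            "text:", compressed_size(text_like, level),
            "binary:", compressed_size(binary_like, level),
        )
```

If the site's content behaves like the random sample, a higher level mostly costs CPU time without saving much space, which would be consistent with the roughly 15% compression reported earlier in the thread.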

alli_yas 06-24-2010 07:18 AM

The path you follow here depends on what your end goal is and whether you have a bit of money to spend in addition to what you already have.

To me, what the previous two posters have mentioned is paramount - you need to ship your backups off site somewhere, either by purchasing a secondary storage device for yourself or by buying some hosting space somewhere to store your backups.

In terms of the Option 1 shortcoming of a backup being corrupted: if you're using an Oracle database, this is easy to counter, since Oracle uses redo logs and RMAN, which allow you to recover to a specific point in time - I'm not sure if MySQL or other relational databases have a similar feature?

In terms of Option 2, I'd say instead of doing a RAID 1/0 with your 4 disks, what about a RAID 5? That way you only lose 1 disk's worth of space and thus have more data space available. The difference would be that you cannot lose more than one disk without losing all your data.
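
The space and failure trade-off between the two layouts can be worked out with simple arithmetic. The sketch below assumes equal-sized disks, RAID 10 built from two-disk mirrors, and single-parity RAID 5; the function names are illustrative only.

```python
def usable_tb(level, disks, disk_tb):
    """Usable capacity in TB for an array of equal-sized disks."""
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return (disks // 2) * disk_tb  # half the disks hold mirror copies
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb  # one disk's worth of space holds parity
    raise ValueError("unknown level: " + level)


def fatal_second_failures(level, disks):
    """After one disk has failed, how many of the remaining disks
    would destroy the array if they failed next."""
    if level == "raid10":
        return 1  # only the failed disk's mirror partner is fatal
    if level == "raid5":
        return disks - 1  # any second failure is fatal
    raise ValueError("unknown level: " + level)


if __name__ == "__main__":
    for level in ("raid10", "raid5"):
        print(level, usable_tb(level, 4, 1), "TB usable,",
              fatal_second_failures(level, 4), "of 3 remaining disks fatal")
```

With four 1 TB drives this gives 2 TB usable for RAID 10 and 3 TB for RAID 5. Once one disk is down, RAID 10 is lost only if the mirror partner also fails, while RAID 5 is lost on any second failure.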


In my opinion you should go for a combination of Option 1 and Option 2. Use all 4 drives for your database in a RAID 5 configuration. Ship your backups off site and store, say, up to 7 days' worth to allow recovery over a wider window.

If you have cash to spend you may want to consider a small storage solution (HP MSA2000 or EMC Clariion AX or similar) which will cost a bit but be much more robust to preserve your data.

trist007 06-24-2010 11:18 AM

Found this article on Raid 5.

http://www.zdnet.com/blog/storage/wh...ng-in-2009/162

Thinking about going with LVM; that way I can always add another hard disk. Then I will make incremental backups of modified or new files, plus a full monthly backup to two places: one of those internal 1 TB hard disks, which are separate from the main LVM, and a hosted backup. Thanks for all the info, guys.

junglepunk 06-24-2010 11:42 AM

It's important to note that RAID is NOT backup. It's fine if you lose a disk but if the controller goes you usually can't recover the array. Then you're screwed.

trist007 06-25-2010 08:52 AM

bump

jefro 06-25-2010 04:17 PM

Take a chance on ZFS.

