LinuxQuestions.org
Old 06-23-2010, 08:45 PM   #1
trist007
Senior Member
 
Registered: May 2008
Distribution: Slackware
Posts: 1,052

Rep: Reputation: 70
I need some storage suggestions...


I have a 500 GB system hard disk that I will mount at / and use as my main partition. It also has a smaller partition for swap. Then I have four 1 TB hard drives. I have a website that is linked to a database via PHP. So far I have about 400 GB of content on the site, which grows daily, but slowly. Growth will probably slow down in a year or so, so I don't feel a need to have more than 2 TB in the long run. Also, I can only fit 5 hard drives in my computer, so I do not want to add any more. Okay, so far I can think of two options:

Option 1:

Set up two of the 1 TB hard drives as an LVM volume. Then I can use one or both of the other 1 TB hard drives for backup.

I have a question here. If I do this, I will probably set up a cron job to back up weekly, maybe monthly. It's not an active site. The thing is, from testing, a backup tgz file only compresses about 15% with the type of content I use, which is mostly binary. Also, is there any difference between a tar.gz file and a tgz file in terms of compression? So if I made a backup right now, the file would be about 370 GB. Anyhow, this setup would definitely require space. However, I see a flaw. Because the backup would be so big, each new backup would just overwrite the previous one. What if my production data gets blown away and the backup.tgz file is corrupt? I understand the bigger the tgz file is, the higher the chance that it will be corrupt. I won't be transferring the tgz file; I will just back up to one of those 1 TB hard drives. However, I've seen times when just the process of creating a tgz file ends in the dreaded 'unexpected EOF reached.' I've tried using 'tar -tvv filename.tgz', which is supposed to test the integrity of the file, but whenever I run this, the process always seems to freeze. Also, I've heard about bzip2recover. Does anybody have any experience with this?

So yes, this is one viable option. I don't lose any space but I risk being left with a backup tgz file that could be corrupt.
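For what it's worth, GNU tar normally expects the archive name after -f (for example 'tar -tzf filename.tgz'); without it tar reads from its default device or standard input, which may be why the listing seemed to freeze. As a rough alternative, here is a minimal Python sketch of an integrity check that reads every member of the archive to the end. The function name archive_is_readable is made up for illustration, and it only detects truncation or undecodable data, not wrong-but-valid content.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return True if every member of a gzip-compressed tar can be read to the end.

    Hypothetical helper: it catches truncated or undecodable archives only.
    """
    try:
        with tarfile.open(path, mode="r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Read in 1 MiB chunks so a huge member never sits in memory.
                while stream.read(1024 * 1024):
                    pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
```

Running a check like this right after each backup, and only then deleting the previous archive, would avoid the case where a bad new backup overwrites a good old one.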

Option 2:

Set up two of the 1 TB hard drives as a mirrored RAID. Then set up the other two 1 TB hard drives as a second mirrored RAID. Then merge the two mirrors into a striped RAID. I think they call this RAID 10 because it's a combination of the two. I like this, but at the same time I lose 2 TB of space. Also, does anybody know the procedure if I lose one hard disk? What programs do I use to insert a brand new 1 TB hard drive into this RAID 10 array?

Another big flaw is that if two hard drives in the same mirrored pair fail, I lose everything. So I'm not liking this idea.

What do you guys think?

I'm thinking Option 1, with one backup on one 1 TB hard drive and a second backup on the other 1 TB hard drive. That way I increase my chances. However, once my storage goes above 1 TB, which it will, I can no longer do double backups. I would then merge the other two 1 TB hard drives into another LVM volume and just make one backup that hopefully won't be corrupt.

Anyhow, I'm excited to hear what you guys think. The data is very valuable to me so I want to do as much as I can to minimize the risk of loss.
 
Old 06-23-2010, 11:13 PM   #2
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
No advice regarding your options as I don't know what I would use in your scenario.

If data is very valuable, I would not use onsite backup only. Use a couple of external HDs and store offsite as well (bank, at work if the location differs from where the machine is); cycle them.

Further, consider incremental backups instead of full backups all the time; this can save you considerable space, especially if the content does not change significantly. E.g. a full backup once a month, and in the other weeks of the month a backup of only the new/changed files.
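A minimal Python sketch of the "new/changed files only" idea, assuming modification time is a good enough signal and that the time of the last full backup is stored somewhere. The function names and paths are illustrative only, and the archive should be written outside the tree being backed up. Note that this does not record deleted files; GNU tar's --listed-incremental option is the more complete tool for that.

```python
import os
import tarfile


def changed_since(root, cutoff):
    """Yield regular files under root modified after cutoff (epoch seconds)."""
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and os.path.getmtime(path) > cutoff:
                yield path


def write_incremental(root, cutoff, archive_path):
    """Archive only the changed files and return their paths relative to root."""
    archived = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in changed_since(root, cutoff):
            relative = os.path.relpath(path, root)
            archive.add(path, arcname=relative)
            archived.append(relative)
    return archived
```

A weekly cron job could call write_incremental with the timestamp of the last monthly full backup as the cutoff.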
 
Old 06-24-2010, 03:30 AM   #3
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,362

Rep: Reputation: 2751
Yeah, incrementals might be an idea.
There's no difference between .tar.gz and .tgz; it's just a naming convention.
Note that gzip has a compression-level flag

-n where n is a number from 1 to 9

A higher number means better compression but slower creation (obviously).
http://linux.die.net/man/1/gzip
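To get a feel for how much the level matters, here is a small Python sketch using the standard gzip module. The sample data is invented for illustration: a repetitive text-like buffer, and random bytes standing in for already-compressed binary content such as media files.

```python
import gzip
import os


def compressed_sizes(data):
    """Return {level: compressed size in bytes} for gzip levels 1, 6 and 9."""
    return {level: len(gzip.compress(data, compresslevel=level))
            for level in (1, 6, 9)}


# Illustrative inputs only, not real site content.
text_like = b"GET /index.php HTTP/1.1\n" * 40000
binary_like = os.urandom(len(text_like))

for label, data in (("repetitive", text_like), ("random", binary_like)):
    print(label, len(data), compressed_sizes(data))
```

Random data barely shrinks at any level, so for mostly binary content a higher level is likely to cost time without saving much space.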

Also, off-site backups as recommended above.
 
Old 06-24-2010, 07:18 AM   #4
alli_yas
Member
 
Registered: Apr 2010
Location: Johannesburg
Distribution: Fedora 14, RHEL 5.5, CentOS 5.5, Ubuntu 10.04
Posts: 559

Rep: Reputation: 92
The path you follow here depends on what your end goal is and whether you have a bit of money to spend in addition to what you already have.

To me what the previous two posters have mentioned is paramount - you need to ship your backups off site somewhere - either by purchasing a secondary storage device for yourself; or buying some hosting space somewhere to store your backups.

As for the Option 1 shortcoming of a backup being corrupted: if you're using an Oracle database, this is easy to counter, since Oracle uses redo logs and RMAN, which allow you to recover to a specific point in time. I'm not sure whether MySQL or other relational databases have similar features.

In terms of Option 2, instead of doing a RAID 1+0 with your 4 disks, what about RAID 5? That way you only lose one disk's worth of space and thus have more data space available. The catch is that you cannot lose more than one disk without losing all your data.
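The space trade-off can be worked out with simple arithmetic. A minimal Python sketch, assuming equal-sized disks and covering only the two levels discussed in this thread (raid_summary is a made-up helper name):

```python
def raid_summary(level, disks, disk_tb):
    """Return (usable_tb, failures_always_survived) for RAID 10 or RAID 5.

    Hypothetical helper; assumes all disks are the same size.
    """
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Half the disks hold mirror copies. A second failure is survivable
        # only if it hits the other mirrored pair, so one is the guarantee.
        return (disks // 2) * disk_tb, 1
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_tb, 1
    raise ValueError("unsupported level: " + level)


for level in ("raid10", "raid5"):
    print(level, raid_summary(level, disks=4, disk_tb=1))
```

With four 1 TB disks this gives 2 TB usable for RAID 10 and 3 TB for RAID 5, with both guaranteed to survive a single disk failure.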


In my opinion you should go for a combination of Option 1 and Option 2. Use all 4 drives for your database in a RAID 5 configuration. Ship your backups off site and keep, say, up to 7 days' worth to allow recovery over a wider window.
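Keeping a fixed number of recent backups can be sketched in a few lines of Python. This assumes one archive per backup in a single directory and uses file modification time to decide age; the function name and the .tgz suffix are illustrative.

```python
import os


def prune_backups(directory, keep=7, suffix=".tgz"):
    """Delete all but the `keep` newest archives; return the deleted names."""
    archives = sorted(
        (name for name in os.listdir(directory) if name.endswith(suffix)),
        key=lambda name: os.path.getmtime(os.path.join(directory, name)),
        reverse=True,  # newest first
    )
    doomed = archives[keep:]
    for name in doomed:
        os.remove(os.path.join(directory, name))
    return doomed
```

It would make sense to prune only after the newest archive has been verified, so a bad backup never pushes out a good one.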

If you have cash to spend you may want to consider a small storage solution (HP MSA2000 or EMC Clariion AX or similar) which will cost a bit but be much more robust to preserve your data.
 
Old 06-24-2010, 11:18 AM   #5
trist007
Senior Member
 
Registered: May 2008
Distribution: Slackware
Posts: 1,052

Original Poster
Rep: Reputation: 70
Found this article on Raid 5.

http://www.zdnet.com/blog/storage/wh...ng-in-2009/162

I'm thinking about going with LVM; that way I can always add another hard disk. Then I will make incremental backups of modified or new files, plus a full monthly backup both to one of the internal 1 TB hard disks (which are separate from the main LVM volume) and to a hosted backup. Thanks for all the info, guys.
 
Old 06-24-2010, 11:42 AM   #6
junglepunk
Member
 
Registered: Jun 2010
Posts: 41

Rep: Reputation: 15
It's important to note that RAID is NOT backup. It's fine if you lose a disk but if the controller goes you usually can't recover the array. Then you're screwed.
 
Old 06-25-2010, 08:52 AM   #7
trist007
Senior Member
 
Registered: May 2008
Distribution: Slackware
Posts: 1,052

Original Poster
Rep: Reputation: 70
bump
 
Old 06-25-2010, 04:17 PM   #8
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,998

Rep: Reputation: 3628
Take a chance on ZFS.
 
  

