Old 12-05-2011, 04:25 PM   #1
jeenam
Member
 
Registered: Dec 2006
Distribution: Slackware 11
Posts: 144

Rep: Reputation: 15
Archiving massive amounts of log data (think petabytes)


Looking for some advice on setting up storage to handle very large log files. We are attempting to archive as much data as possible.

With the current storage configuration, the procedure is as follows:

1) Application server logs to local storage.

2) A cron job on the application server compresses logs older than X days and copies the compressed logs to the secondary archive server.

3) The archive server prunes logs daily when it detects storage < XX% (a rough sketch of steps 2 and 3 follows below).
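
Roughly, steps 2 and 3 look like the sketch below (the paths, retention values, gzip step, and NFS-mounted archive volume are placeholders, not our actual setup):

Code:
#!/usr/bin/env python3
"""Rough sketch of steps 2 and 3 above, meant to be run from cron.

Assumed details (not part of the actual setup): directory paths, gzip
compression, an archive volume mounted locally (e.g. over NFS), and the
reading of "storage < XX%" as "free space below XX%".
"""
import gzip
import shutil
import time
from pathlib import Path

LOG_DIR = Path("/var/log/app")           # hypothetical local log directory
ARCHIVE_DIR = Path("/mnt/archive/app")   # hypothetical mount from the archive server
COMPRESS_AFTER_DAYS = 7                  # the "X days" in step 2; placeholder value
MIN_FREE_PERCENT = 10                    # the "XX%" in step 3; placeholder value


def compress_and_ship():
    """Step 2: gzip logs older than X days and copy them to the archive volume."""
    cutoff = time.time() - COMPRESS_AFTER_DAYS * 86400
    for log in LOG_DIR.glob("*.log"):
        if log.stat().st_mtime > cutoff:
            continue
        gz_path = log.parent / (log.name + ".gz")
        with open(log, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        shutil.copy2(gz_path, ARCHIVE_DIR / gz_path.name)
        log.unlink()  # keep only the compressed copy locally


def prune_archive():
    """Step 3: delete the oldest archived logs while free space is below the threshold."""
    def free_percent():
        usage = shutil.disk_usage(ARCHIVE_DIR)
        return usage.free / usage.total * 100

    archived = sorted(ARCHIVE_DIR.glob("*.gz"), key=lambda p: p.stat().st_mtime)
    while archived and free_percent() < MIN_FREE_PERCENT:
        archived.pop(0).unlink()


if __name__ == "__main__":
    # In practice step 2 runs on the application server and step 3 on the
    # archive server; both are shown in one file here for brevity.
    compress_and_ship()
    prune_archive()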


The above procedure has worked up to this point. Unfortunately, the amount of data being logged is increasing rapidly and we now barely have enough space to store 45 days' worth of logs. Storage is being consumed at a rate of ~400GB/day, and we expect this number to double (800GB/day) by this time next year.


Questions/Notes:

1) Is archiving to cloud storage feasible from a financial perspective?

2) Ideally we would like to keep logs for a minimum of 1 year.

3) Who manufactures chassis with large numbers of drive bays? I would prefer to build my own solution rather than pay a vendor such as HP/NetApp/EMC for one of their solutions.


If anyone out there has worked with storing/archiving massive amounts of data please chime in. TIA.
 
Old 12-05-2011, 05:41 PM   #2
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,982

Rep: Reputation: 3625
Somebody is charging you for cloud storage, so it's hard to say whether it comes out cheaper. It can be and is done, but often more for security/disaster-recovery reasons (think keeping data offsite after Katrina). Your local taxes, energy costs, and the cost to transmit that much data are all part of the comparison.

Sun used to sell/rent shipping-container datacenters for this. You just ordered a container and it came full of energy-efficient SPARC servers and drives, with all the AC and such built in.


I'd start by reviewing this whole logging setup. Is the data worth that much money? Can you compress it by some means locally and keep it stored compressed at all times?

A log has to have some useful reason to be kept, and be worth that effort.


Not sure you could use any sort of tape.

Last edited by jefro; 12-05-2011 at 05:43 PM.
 
Old 12-05-2011, 06:10 PM   #3
jeenam
Member
 
Registered: Dec 2006
Distribution: Slackware 11
Posts: 144

Original Poster
Rep: Reputation: 15
jefro - The logs are already stored compressed, so we are generating 400GB+ of compressed logs per day. I realize cloud storage involves a monthly cost per GB stored, so it will get pricey over the long term.

On a side note, it's insane how expensive storage has become due to the shortage of drives caused by flooding in Thailand.
 
Old 12-07-2011, 12:06 PM   #4
travisdh1
Member
 
Registered: Sep 2008
Distribution: Fedora
Posts: 129

Rep: Reputation: 22
Have you considered building one of the backblaze storage boxes? BackBlaze Storage Pod v2

The cost of these will have gone up quite a bit due to the increased hard drive prices, which will be around until at least the middle of next year. They can handle up to 137 TB per pod. According to my quick-n-dirty math, at 800GB of log files per day you're going to need 292 terabytes of storage for one year, so you'd get almost a year out of two of those pods.
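
Spelling that math out (using only the numbers already in this thread, and counting 1 TB as 1000 GB):

Code:
# Quick check of the capacity math above: 800 GB/day of compressed logs,
# 137 TB usable per Backblaze pod, one year of retention.
DAILY_GB = 800
POD_TB = 137
DAYS = 365

yearly_tb = DAILY_GB * DAYS / 1000               # 292 TB for one year of logs
pods_needed = yearly_tb / POD_TB                 # about 2.13 pods
days_on_two_pods = 2 * POD_TB * 1000 / DAILY_GB  # about 342 days, i.e. "almost 1 year"

print(f"{yearly_tb:.0f} TB/year, {pods_needed:.2f} pods, {days_on_two_pods:.0f} days on two pods")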

I know hard drive prices have gone up substantially (twice as much in a lot of cases), but this setup is still one of the cheapest systems that has some drive redundancy (and as we all know, a RAID setup IS NOT A BACKUP, right?). I don't want to think about how much moving that much data across a 'cloud' system would cost; generally they charge for total storage used along with upload/download charges. Let's just say expensive.

I'd start by taking a very hard look at what is being logged and how much information those logs really have to contain. At the scale you're talking about, changing the logging level across the board will make a large difference in the space requirements. You might also want to consider a combined near-term/long-term retention system that automatically moves older files to tape while still giving you access to the files on the tapes.
 
Old 12-07-2011, 02:30 PM   #5
jthill
Member
 
Registered: Mar 2010
Distribution: Arch
Posts: 211

Rep: Reputation: 67
Seconding travisdh1's tape suggestion. A quick LTO Google shows $23 per 800GB tape these days; that's archival-quality storage, and if you're writing that much every day, a heavy-duty drive will amortize pretty darn quickly. See the Wikipedia article.
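
To put a rough number on the media side (using only the $23 per 800GB figure above; drive and library hardware are left out since no prices are given in the thread):

Code:
# Rough LTO media cost at roughly one 800 GB tape per day.
TAPE_PRICE_USD = 23
TAPE_CAPACITY_GB = 800
DAILY_GB = 800

tapes_per_year = 365 * DAILY_GB / TAPE_CAPACITY_GB        # 365 tapes
media_cost_per_year = tapes_per_year * TAPE_PRICE_USD     # about $8,395
cost_per_tb = TAPE_PRICE_USD / (TAPE_CAPACITY_GB / 1000)  # about $28.75 per TB of media

print(f"{tapes_per_year:.0f} tapes/year, ${media_cost_per_year:,.0f}/year, ${cost_per_tb:.2f}/TB")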
 
1 member found this post helpful.
Old 12-07-2011, 09:10 PM   #6
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,982

Rep: Reputation: 3625
Might still get some quotes for a tape backup.

I'd still look at trying a different compression. Try PAQ8PX -7, or at least try PeaZip on a few logs and see.
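
As a cheap first data point before installing PAQ8PX or PeaZip, something like this compares the standard compressors on a sample log (a sketch using only the Python standard library; the sample path is a placeholder):

Code:
# Compare gzip, bzip2 and xz ratios on one sample log file.
import bz2
import gzip
import lzma
from pathlib import Path

sample = Path("/var/log/app/sample.log").read_bytes()   # hypothetical sample log

for name, compress in [
    ("gzip -9",  lambda d: gzip.compress(d, compresslevel=9)),
    ("bzip2 -9", lambda d: bz2.compress(d, compresslevel=9)),
    ("xz -6",    lambda d: lzma.compress(d, preset=6)),
]:
    ratio = len(compress(sample)) / len(sample)
    print(f"{name:10s} {ratio:.1%} of original size")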
 
Old 12-08-2011, 09:54 AM   #7
travisdh1
Member
 
Registered: Sep 2008
Distribution: Fedora
Posts: 129

Rep: Reputation: 22
Linear Tape File System

That's more like what I had in mind, actually. Granted, an LTO tape library is a little expensive, but it might make sense depending on your situation. (Just be ready to wait a few minutes when looking at files that end up on different tapes, yuck.) The other bonus is that you may be able to dual-purpose it for backups as well, or you may already have something like this running backups!
 
Old 12-08-2011, 02:21 PM   #8
jeenam
Member
 
Registered: Dec 2006
Distribution: Slackware 11
Posts: 144

Original Poster
Rep: Reputation: 15
Backblaze - we're considering them for when we build out another storage server. Given the cost of drives right now, that will not happen until the 3rd quarter of next year. We're a smaller startup, so springing for the insane cost of an autoloader/library is not a path I'm willing to consider. This storage will be used for short-term log retrieval (less than one year). As of right now we do not need to permanently archive the data, so no need for tape anyway.

Right now we're going to stick with a two-tiered storage system where logs older than 45 days are moved to another server. Admittedly, I was dreaming of completely nuking the existing environment and rebuilding from scratch, but the existing environment serves its purpose for now. Plus, there are other projects on the horizon, so this will get put on the back burner for a few months.

Thanks for the input guys.
 
Old 12-13-2011, 09:01 PM   #9
kabars_edge
Member
 
Registered: Apr 2006
Location: Silver Spring, MD
Distribution: Debian
Posts: 40

Rep: Reputation: 8
Have you thought of a more efficient storage mechanism for your data? I know it's an outrageously expensive product, but have you ever thought of using Splunk and doing a distributed deployment? That way you can just add servers and/or storage as you go, and it will be easy to tie the new resources into the old ones. Just a thought.
 
  

