LinuxQuestions.org
Old 12-15-2009, 03:43 AM   #1
txalin
LQ Newbie
 
Registered: Dec 2009
Posts: 2

Rep: Reputation: 0
Opening big log files for sorting.


Hi all,

I'm a newbie on these forums, so first of all I want to introduce myself. I'm a 29-year-old Spanish UNIX/Linux administrator with 7 years of experience administering UNIX machines and security devices.

My (latest) problem is related to a script that I'm trying to put into production, but I'm hitting a serious blocker. The goal of the script is to read a big Oracle XML log, sort it by a date field, and write the result to another file. The Oracle log is about 36 GB per day, and the sort is done on the fly.

The sort itself seems to be working fine, but I've run into one problem: my process, written in bash, starts at 12 PM, and sometimes it stops sorting at around 12 AM, more or less. The process is still active (ps -ef | grep sort), but the destination file doesn't receive anything, so I think that maybe, for an unknown reason, my process closes the original Oracle log after about 12 hours.

The way I open this log is with file descriptors, like this:

exec 0< logfile.log
while [ "$start" -eq 1 ]
do
    read -r line
    # ... process "$line" here ...
done

I think that maybe file descriptors have a low priority in my kernel and mine gets closed after a while, but I don't know how to verify this or which kernel parameter to modify. Does anyone know how to do this, or can you suggest a better way of keeping files open for reading?
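I guess I could check which files the process still holds open with something like the following (the script name is just an example of how I would find the PID), but I'm not sure this is the right way to verify it:

pid=$(pgrep -f sortlog.sh)            # "sortlog.sh" is a placeholder for my script's name
ls -l /proc/"$pid"/fd                 # every descriptor the process currently holds open
lsof -p "$pid" | grep logfile.log     # alternative check, if lsof is installed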

Thanks in advance and regards.
 
Old 12-15-2009, 04:01 AM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
If you are going to work with big files, use tools like awk to parse them, not bash's while loop. Also, since you have such a big file, is it not possible to use a database in your environment?
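For example, assuming each record is on its own line and the date sits in a <timestamp> tag (the tag name and output file are only guesses, adjust them to your real format), you can let awk pull out the sort key and let sort and cut do the rest instead of a bash read loop:

awk -F'</?timestamp>' '{ print $2 "\t" $0 }' logfile.log \
    | sort -t$'\t' -k1,1 \
    | cut -f2- > sorted.log

awk prepends the date as a key, sort orders on that key only, and cut strips the key off again.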
 
Old 12-15-2009, 04:09 AM   #3
AleLinuxBSD
Member
 
Registered: May 2006
Location: Italy
Distribution: Ubuntu, ArchLinux, Debian, SL, OpenBSD
Posts: 274

Rep: Reputation: 42
Perhaps a safer way would be to write a simple program in a programming language instead of using a bash script.
That way you have better control: you can specify a buffer that is flushed to disk when it fills up, so the impact on your system's memory stays low.
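Just to illustrate the idea in shell terms (a real program in C or Perl would do the same with far less overhead, and the 10,000-line threshold is only an example), the pattern is roughly:

buf=()
while IFS= read -r line
do
    buf+=("$line")
    if [ "${#buf[@]}" -ge 10000 ]; then
        printf '%s\n' "${buf[@]}" >> out.log    # flush the full buffer to disk
        buf=()
    fi
done < logfile.log
[ "${#buf[@]}" -gt 0 ] && printf '%s\n' "${buf[@]}" >> out.log    # flush whatever is left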

How many resources are available on your system after 24 hours of running that script?
(You can run top at the start and at the "end", or use more sophisticated tools to analyze the situation on your server.)
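For example, something simple like this, left running in the background, records a snapshot every 10 minutes (the file name and interval are only examples):

while true
do
    date >> usage.log
    top -b -n 1 | head -n 20 >> usage.log   # snapshot of load, memory and the busiest processes
    sleep 600                               # every 10 minutes
done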

Quote:
Originally Posted by ghostdog74 View Post
is it not possible to use a database in your environment?
Nice idea.
Perhaps this could help the OP:
Oracle log files : An introduction

Last edited by AleLinuxBSD; 12-15-2009 at 04:20 AM.
 
Old 12-15-2009, 04:33 AM   #4
txalin
LQ Newbie
 
Registered: Dec 2009
Posts: 2

Original Poster
Rep: Reputation: 0
Unfortunately, I cannot use a database on my system.

Regarding available resources, it takes about 20% of one CPU (8 cores are available on this machine), so it doesn't have a big impact on my system (in fact, 90% is idle right now).

Regarding writing a program in another language, which one do you prefer for this task? I could do it in C or Perl, but I don't know which one is better for this.

And thanks for your help
 
Old 12-15-2009, 04:40 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by txalin View Post
Regarding writing a program in another language, which one do you prefer for this task? I could do it in C or Perl, but I don't know which one is better for this.
And thanks for your help
If you only know C and Perl, go for Perl for the speed of development, plus the advantage of being able to use Perl modules (including those for Oracle, and especially for XML parsing). You can also extend Perl with C if you need raw speed. Otherwise, you can code it in pure C (for the speed factor), but you have to be prepared to spend some time on that.
 
Old 12-15-2009, 05:01 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4123
Maybe you should consider a merge sort. Dynamically split the input into (disk) files and merge sort them. Should be less memory intensive.
File::Sort on CPAN looks like one candidate.
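Roughly the same idea with plain coreutils, in case Perl isn't settled yet (the '|' delimiter and the date in field 3 are assumptions about the log format; GNU sort will also do an external merge sort on its own if you give it a temp dir with enough space via -T):

split -l 1000000 logfile.log chunk_                        # cut the 36 GB log into ~1M-line pieces on disk
for f in chunk_*; do
    sort -t'|' -k3,3 "$f" > "$f.sorted" && rm "$f"         # sort each piece on the date field
done
sort -m -t'|' -k3,3 chunk_*.sorted > sorted.log            # merge the already-sorted pieces
rm chunk_*.sorted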
 
  

