Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
I'm a newbie on this forum, so first of all I want to introduce myself. I'm a 29-year-old Spanish UNIX/Linux administrator with 7 years of experience administering UNIX machines and security devices.
My (latest) problem is related to a script that I'm trying to put into production, but I've run into a serious blocker. The goal of the script is to read one big Oracle XML log, sort it by a date field, and write the result to another file. The Oracle log is about 36 GB per day, and the sort is done on the fly.
Right now the sort seems to be working fine, but I've hit one problem: my process, written in bash, starts at 12 pm, and sometimes it stops sorting at about 12 am, more or less. The process is still active (ps -ef | grep sort), but the destination file doesn't receive anything, so I think that maybe, for some unknown reason, my process closes the original Oracle log after 12 hours.
The way I open this log is with file descriptors, like this:
exec 0< logfile.log
while IFS= read -r line
do
    # ... process "$line" here ...
done
I think that maybe file descriptors have a low priority in my kernel and mine was closed after a while, but I don't know how to verify this or which kernel parameter to modify. Does anyone know how to do this, or can suggest a better way of keeping files open for reading?
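One quick way to check whether the process really still holds the log open is to look under /proc. A sketch (the PID below is hypothetical; substitute the PID of the sorting bash process):

```shell
# Hypothetical PID of the sorting process; substitute your own.
PID=12345
ls -l /proc/"$PID"/fd    # every open fd, each a symlink to the file it points to
lsof -p "$PID"           # same information via lsof, if installed
```

If logfile.log no longer appears among the fds, the script itself closed it; if it shows as "(deleted)", something rotated or removed the log underneath you.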
If you are going to work with big files, use tools like awk to parse them, not bash's while-read loop. Also, since you have such a big file, is it not possible to use a database in your environment?
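A sketch of that awk-based approach, under the assumption (not stated in the thread) that each log record sits on one line and the timestamp is inside a <ts>...</ts> element: prefix each line with its date as a sort key, let sort order the lines, then strip the key.

```shell
# Assumed format: one record per line, date inside <ts>...</ts>.
# 1) awk prepends the date plus a tab as a sort key,
# 2) sort orders the lines on that key,
# 3) cut removes the key again.
awk -F'<ts>|</ts>' '{ print $2 "\t" $0 }' logfile.log \
    | sort -k1,1 \
    | cut -f2- > sorted.log
```

This streams through the file instead of keeping state in a bash loop, and GNU sort spills to temporary files on its own when the input doesn't fit in memory.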
Perhaps a safer way would be to write a simple program in a real programming language instead of a bash script.
That gives you better control: you can allocate a buffer that you flush to disk when it is full, so the impact on your system's memory is low.
How many resources are available on your system after 24 hours of running that script?
(You can run top at the start and at the "end", or use more sophisticated tools to analyze the situation on your server.)
Quote:
Originally Posted by ghostdog74
is it not possible to use a database in your environment?
Unfortunately I cannot use a database on my system.
Regarding available resources: it takes about 20% of one CPU (8 cores available on this machine), so it doesn't have a big impact on my system (in fact, 90% is idle right now).
Regarding writing a program in another language: which one do you prefer for this task? I can do it in C or Perl, but I don't know which one is better for this.
And thanks for your help.
If you only know C and Perl, go for Perl for the speed of development, plus the advantage of being able to use Perl modules (even those for Oracle), especially for XML parsing. You can also extend Perl with C if you need raw performance. Otherwise, you can code it in pure C (for the speed factor), but be prepared to spend some time on that.
Maybe you should consider a merge sort: dynamically split the input into (disk) files and merge-sort them. It should be less memory intensive.
File::Sort on CPAN looks like one candidate.
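The split-then-merge idea can be done with coreutils alone. A sketch, assuming one record per line with the sort key in the first field (filenames are illustrative):

```shell
# External merge sort with coreutils:
split -l 1000000 logfile.log part_           # cut the log into line-count chunks
for c in part_??; do
    sort -k1,1 "$c" -o "$c.sorted"           # sort each chunk independently
done
sort -m -k1,1 part_??.sorted > sorted.log    # -m merges already-sorted inputs
rm -f part_?? part_??.sorted                 # clean up the intermediate chunks
```

Note that GNU sort already does this internally, spilling to temporary files when the input exceeds memory, so plain `sort -T /bigdisk -S 1G` may be enough on its own.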