LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

04-23-2013, 10:49 AM #1
epols
LQ Newbie
 
Registered: Sep 2008
Posts: 3

Rep: Reputation: 0
Question: Summing up counts returned by several bash commands to divide the load


I have a script that runs the following to count every line matching a regex:

zcat /path/to/logs/today/* | grep '%[A-Z0-9_]\+-' | grep -v 'Primary ID' | wc -l

It basically looks through every file in the today directory, finds every line that matches that regex (excluding lines containing 'Primary ID'), and returns the total number of those lines for a report I run.

This works fine on a directory that contains fewer than 15000 individual files, but any time I run it on a directory with more than 15K files, I get "Argument list too long", regardless of the piping that occurs after the zcat.

So the only way I can think of to accomplish this is to run it in stages, for example:

Stage 1: ls | wc -l (this returns the total count of files in the directory, e.g. 45000)
Stage 2: Divide by 3 = 3 sets of 15000 files
Stage 3: Run the command on the first 15000 files (listed alphabetically) and return that count: zcat /path/to/logs/today/* | grep '%[A-Z0-9_]\+-' | grep -v 'Primary ID' | wc -l
Stage 4: Run the command on the second set of 15K files and return a count
Stage 5: Run the command on the third set of 15K files and return a count
Stage 6: Sum the counts returned by all three sets

Can anyone suggest a way to achieve this? The above command is run as part of a script that uses variables for the directory.
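A minimal sketch of the staged plan above, assuming bash (for arrays and array slicing). Instead of counting with ls and dividing by hand, it loads the file names into an array, runs the original pipeline on fixed-size slices of that array, and adds up the per-slice counts. The function name count_in_batches and the default batch size of 5000 are illustrative choices, not anything from the thread.

```bash
#!/usr/bin/env bash
# Staged counting sketch: split the file list into batches, run the
# original pipeline on each batch, and sum the per-batch counts.
# count_in_batches and the default batch size are hypothetical choices.
shopt -s nullglob   # an empty directory yields an empty array, not a literal '*'

count_in_batches() {
  local dir=$1 batch_size=${2:-5000}
  # The glob is expanded inside bash; an array assignment is not an exec,
  # so it is not subject to the kernel's argument-size limit.
  local -a files=( "$dir"/* )
  local total=0 i n
  for (( i = 0; i < ${#files[@]}; i += batch_size )); do
    # Only this zcat call receives file names, at most batch_size of them.
    n=$(zcat -- "${files[@]:i:batch_size}" |
          grep '%[A-Z0-9_]\+-' | grep -v 'Primary ID' | wc -l)
    (( total += n ))
  done
  echo "$total"
}
```

For example, count_in_batches /path/to/logs/today 15000 corresponds to the three 15000-file stages for a 45000-file directory; a smaller batch size simply means more zcat calls.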
 
04-23-2013, 02:20 PM #2
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
You can try:
Code:
for i in /path/to/logs/today/* ; do
  zcat "$i"
done | grep '%[A-Z0-9_]\+-' | grep -v 'Primary ID' | wc -l
The major difference is that the "for i in" loop (and its wildcard list) is handled inside bash; the expanded list is never passed as an argument list to an exec'd program, as it is in "zcat /path/to/logs/today/*". So it doesn't hit the same restriction (the kernel's limit on the memory used for an exec's arguments and environment, ARG_MAX).
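To see why roughly 15000 names is where the plain glob breaks, a rough diagnostic like the following (a sketch; the function name arg_report is hypothetical) compares the bytes taken up by the expanded file names with the limit reported by getconf ARG_MAX. It is only approximate: the kernel also counts the environment and per-argument pointer overhead against that limit.

```bash
#!/usr/bin/env bash
# Rough diagnostic sketch (arg_report is a hypothetical name): compare the
# size of the expanded glob with the exec argument limit. Approximate only,
# since the environment and pointer overhead also count toward the limit.
shopt -s nullglob

arg_report() {
  local dir=$1
  # Expanded inside bash, so building this array never involves exec.
  local -a files=( "$dir"/* )
  # printf is a bash builtin, so passing every name to it is also safe.
  # Each name is counted with a trailing NUL byte, as exec stores it.
  local bytes
  bytes=$(printf '%s\0' "${files[@]}" | wc -c)
  printf 'files=%d bytes=%d arg_max=%d\n' \
    "${#files[@]}" "$bytes" "$(getconf ARG_MAX)"
}
```

Running arg_report /path/to/logs/today prints one line with the file count, the byte total and the limit; as bytes approaches arg_max, a plain zcat /path/to/logs/today/* starts failing with "Argument list too long".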

Alternatively, you could use find to process the files one at a time:

Code:
find /path/to/logs/today -type f -exec zcat {} ';' | grep '%[A-Z0-9_]\+-' | grep -v 'Primary ID' | wc -l
This works because find performs the "readdir" and the matching itself (-type f selects regular files, so zcat is never handed the directory itself), then executes zcat on each file it finds.
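A variant of the find command, sketched as a bash function (the name count_with_find is hypothetical). With -exec ... {} + instead of -exec ... {} ';', find packs as many file names into each zcat call as the argument limit allows, so it runs a handful of zcat processes rather than one per file. -type f keeps directories away from zcat; note that, unlike the glob, find also descends into subdirectories.

```bash
#!/usr/bin/env bash
# Sketch: the find pipeline with "-exec ... {} +", which batches file names
# into as few zcat invocations as the argument limit permits.
# count_with_find is a hypothetical name; find recurses into subdirectories.
count_with_find() {
  local dir=$1
  find "$dir" -type f -exec zcat -- {} + |
    grep '%[A-Z0-9_]\+-' | grep -v 'Primary ID' | wc -l
}
```

In the script this would be called as count_with_find /path/to/logs/today; add -maxdepth 1 (GNU find) if subdirectories should be skipped to match the original glob.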

In both cases, one zcat process runs per file. Otherwise you have to do something awkward, like reading 1000 file names at a time into a list and then running zcat on each list of 1000, and you still have the problem of building those lists with something like find.
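The "build a list of names, then run zcat on it" approach is what xargs automates. A sketch, assuming bash and an xargs that supports -0 (GNU and BSD xargs both do); count_with_xargs is a hypothetical name. printf is a bash builtin, so handing it every file name does not go through exec and is not subject to the argument limit; xargs then starts zcat with as many names as fit on each command line.

```bash
#!/usr/bin/env bash
# Sketch: let xargs do the batching. NUL separators keep file names with
# spaces or newlines intact. count_with_xargs is a hypothetical name.
shopt -s nullglob

count_with_xargs() {
  local dir=$1
  local -a files=( "$dir"/* )
  # Empty directory: skip zcat entirely rather than pass it an empty name.
  (( ${#files[@]} )) || { echo 0; return; }
  printf '%s\0' "${files[@]}" |
    xargs -0 zcat -- |
    grep '%[A-Z0-9_]\+-' | grep -v 'Primary ID' | wc -l
}
```

To mirror fixed-size lists like the 1000-name ones described above, add -n 1000 to the xargs call.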

Last edited by jpollard; 04-23-2013 at 02:27 PM.
 
1 member found this post helpful.
04-24-2013, 07:58 AM #3
epols
LQ Newbie
 
Registered: Sep 2008
Posts: 3

Original Poster
Rep: Reputation: 0
You, Sir, are awesome. Thank you so much for taking the time to think about the problem and come up with a great solution. I ended up going with your second solution as it'll fit into my original script much more easily. I owe you a beer!
 
  



Tags
bash, scripting








All times are GMT -5.
