LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
Old 08-02-2012, 02:20 PM   #1
iconig
LQ Newbie
 
Registered: May 2012
Posts: 28

Rep: Reputation: Disabled
how many files can awk handle at a time?


Hi,
I would like to know how many files awk can handle at a time. I read somewhere that awk handles about 10 files at once. I doubt that is correct, because processing a large number of files would take a very long time if it were.
Say I want to run this program on as many as 100 files:
Code:
awk '{ $1 = 1 - $1; print }' b.1 b.2 b.3 ........... b.100
The problems I see with this are, first, having to list each file manually, and second, as I said above, awk may not accept this many files.

Secondly, how can I use awk in a loop in bash with the code given above?

Third, can I pass different outputs into awk in a loop? How should that look?
Help needed, thanks in advance
 
Old 08-02-2012, 03:19 PM   #2
kakaka
Member
 
Registered: Sep 2003
Posts: 382

Rep: Reputation: 86
It would help us to help you, if you would please be a little more specific about what you are trying to do.

If you say "awk", you could be talking about awk in general, which could include "awk", "nawk", "gawk", "pgawk", and possibly others.

Under Linux, awk would most commonly be "gawk". If that's what you're actually using, I don't know of any stated limit on the number of files as far as the language is concerned. There can be a limit on the number of open files in a given Linux system, both for the OS as a whole and per process, and a particular build of gawk might impose its own limit.

In general, it's probably a good idea to keep as few files simultaneously open as possible. So if you plan to use more than a few files in a single program, iterating through a list of files, using one at a time, and closing each file when you're done with it can be worthwhile.

For your situation, I expect you won't have more than 100 files open at the same time, so I tried the following program to make sure gawk could easily handle more than 100 simultaneously open files. I put the program in a file named fl.gawk:

Code:
END {

    for ( file_num = 1 ;  file_num <= 200 ;  file_num++  )
    {
        file_name = "b."  file_num ;
        print file_num  >  file_name ;
    }

    for ( file_num = 1 ;  file_num <= 200 ;  file_num++  )
    {
        file_name = "b."  file_num ;
        close( file_name ) ;
    }
}
In principle, the program only closes the files after they are all open, so at the end of the first loop there should be at least 200 open files. I then executed the program this way:

Code:
gawk -f fl.gawk < /dev/null
with no errors. This works for me with the binaries I have installed; that doesn't guarantee it will work for you. But AFAIK, the limit on the number of open files a program can have under Linux is typically at least in the hundreds, if not thousands.
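Those per-process and system-wide limits can be checked directly from the shell; a quick sketch (the exact numbers vary from system to system):

```shell
# Per-process open-file limit (the soft limit; often 1024 on Linux):
ulimit -n

# System-wide maximum number of open files across all processes:
cat /proc/sys/fs/file-max
```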

I executed, from a bash shell, your program, against the first 100 of the 200 files, like this:

Code:
gawk '{ $1 = 1 - $1; print }' b.{1..100..1}
which gave me this output:

Code:
0
-1
-2
[...]
-98
-99
From your question, I'm not entirely sure how you want to use a loop in bash, nor exactly how you are thinking of passing different output into awk. But your program can be executed this way, using a bash loop:

Code:
for file_name in b.{1..100..1}; do gawk '{ $1 = 1 - $1; print }' "$file_name"; done
Executing that way, the output is the same, but you don't have to manually list all the file names.
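As an alternative to a shell loop (this is a standard awk idiom, not something specific to this thread), a single awk invocation can tell the files apart itself: FNR restarts at 1 for each input file, and FILENAME holds the current file's name. A sketch:

```shell
# One awk process reads all the files; FNR == 1 marks each file boundary.
awk 'FNR == 1 { print "--- " FILENAME " ---" }
     { $1 = 1 - $1; print }' b.{1..100}
```

This also avoids starting 100 separate awk processes, which matters if the loop runs often.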

HTH.
 
1 members found this post helpful.
Old 08-02-2012, 09:08 PM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,503

Rep: Reputation: 1893
One possible restriction to consider: dealing with hundreds of files has never caused me a problem; however, placing large portions of data from those files into something like an array for later use has kicked me in the butt.
Quote:
Secondly, how can I use awk in a loop in bash say with this given code?
The same as calling any other command from within a bash loop; awk is not special in this case.
Quote:
third: can I pass different outputs into awk in a loop? how should that look?
Perhaps check out the '-v' option, which lets you create a variable and assign it a value from outside your awk code.
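A minimal sketch of that, using a made-up variable name `s`: the shell loop supplies a different value on each pass, and `-v` hands it to awk before the program runs.

```shell
# Pass a different value into awk on each iteration with -v.
for scale in 1 2 3; do
    echo "10" | awk -v s="$scale" '{ print $1 * s }'
done
# Prints 10, 20, 30, one number per line.
```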
 
Old 08-03-2012, 12:11 AM   #4
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 627

Rep: Reputation: 137
Since the OP talks of putting it in a loop, I expect that the files have to be processed sequentially.

Usually typing awk is sufficient, since it links to the correct implementation (nawk, gawk, etc.).

As far as I know, the limit of roughly 10 arguments applies only to DOS batch files (%0 through %9) and doesn't apply to awk run on a Linux box.

One way to get what you need is:
First, move all the files to a freshly created directory.
Second, navigate to that directory.
Third, run code like this:
Quote:
for i in *
do
    awk '{your code goes here ..}' "$i"
done
For fewer files, you can do something like:

awk '{your code goes here ..}' file1 file2 ...
OK
 
1 members found this post helpful.
Old 08-03-2012, 11:59 AM   #5
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,434

Rep: Reputation: 599
The limit on the number of args that can be passed to awk or any other command is just the kernel's ARG_MAX byte limit. Linux kernels typically allow at least 128 kilobytes, so it's seldom an issue in practice, but if you're concerned about wider portability, the POSIX-guaranteed minimum is only 4 KB.
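Worth noting (a standard workaround, not from this thread): if a file list ever does blow past ARG_MAX, xargs will split it into batches, each under the limit, and run awk once per batch:

```shell
# xargs runs awk as many times as needed, keeping each argument
# list below ARG_MAX; -0 makes odd characters in file names safe.
printf '%s\0' b.* | xargs -0 awk '{ $1 = 1 - $1; print }'
```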
 
  

