Linux - Newbie
This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-to's, this is the place!
I have a script that I would like to make recursive, searching down through all subfolders. The way it is written below, it looks in the 10th column of one file and returns the number of occurrences in that one file. I need it to run on a bunch of individual files that are each located in their own directory.
Code:
#!/usr/bin/awk -f
awk -F "," '$10 == 42 {
    if (FILENAME != last && last != "") {
        print last, count
        count = 0
    }
    count++
    last = FILENAME
}
END { print last, count }' fileABC.txt | sort -k2nr > column10_fileABC.out
awk is a text processing language, and it isn't really designed for the complex processing of files internally.
File handling like this is usually done at the shell level, using find or a loop of some kind.
Code:
find . -type f -name '*.txt' -exec awkscript '{}' \;
Unless you need it to output some kind of total for all the files together or similar? If so, give us some more details.
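For what it's worth, here's a minimal sketch of what the shell-level version could look like. The column-10/42 logic is taken from your script; the function name, report format, and output file are my assumptions:

```shell
#!/bin/bash
# Sketch: for every file matching "$pattern" under "$topdir", count the
# lines whose 10th csv column equals 42, and print one "file count"
# line per file, highest count first. Files with zero matches are skipped.
count42() {
    local topdir=$1 pattern=${2:-'*.txt'} file n
    find "$topdir" -type f -name "$pattern" -print0 |
    while IFS= read -r -d '' file; do
        # let awk do the per-file counting; c+0 prints 0 if no matches
        n=$(awk -F ',' '$10 == 42 { c++ } END { print c + 0 }' "$file")
        [ "$n" -gt 0 ] && printf '%s %d\n' "$file" "$n"
    done | sort -k2,2nr
}
```

Usage would be something like `count42 /home/tabby/dir_170_246 'file*.txt' > column10_report.out` (quote the pattern so the shell doesn't expand it before find sees it).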
Edit: Looking again, the script you have posted is incorrect. The shebang designates it as an awk script, but the contents are actually a shell script that contains an awk command, plus a separate sort and final file redirection. Please explain in more detail exactly what this is supposed to be doing.
Last edited by David the H.; 11-07-2012 at 03:37 PM.
Reason: As stated
Each file contains 17 columns of CSV data. I need to investigate column 10 of each file and see how many times 42 (the answer to the universe) occurs.
Using wildcards I can get the above script to work if it is run inside any one of the directories, like inside dir_171, but I need it to run at the dir_170_246 level on all the csv files that are inside all those subdirectories. Then I need it to spit out a report something like:
I'm barely literate (and fairly dangerous) in awk and bash. But the last time I tried a perl script I deleted the files in the remote home directory of every user on the server. I'm staying away from perl.
I set it up so that you could override the default startdir and output file on the command line. It's only been slightly tested though, since I don't have the data you do.
I ran the script located at the same level as the dir_170, dir_171, dir_172, etc.... using the following command:
Code:
sh davids_awk_script.bash /home/tabby/dir_170_246
thinking that dir_170_246 is the defaultdir that I'm supposed to enter, and I think that's correct because it didn't complain about "no such file or directory".
I'm not sure what the defaultfile should be set to, because in the script it is declared to be "file*.txt". I'm guessing from your comment that I could have entered file*.txt as a second argument to the command, like this:
Code:
sh davids_awk_script.bash /home/tabby/dir_170_246 file*.txt
but I didn't do that; I just gave it the directory as input.
When I run the script, here is what it prints to the screen:
Code:
davids_awk_script.bash: line 16: syntax error near unexpected token `<'
davids_awk_script.bash: line 16: `done < <( find ./ -type f -name "file*.txt" -print0 ) | -k2nr >"$outfile'
note that the beginning tick is slanted and the ending tick is not - I don't know if that helps you or not
so I tried playing around a bit with what I thought might make sense (in my very limited awk bash knowledge) but couldn't get it to go.
sh davids_awk_script.bash /home/tabby/dir_170_246 file*.txt
This forces /bin/sh as the interpreting program, which then attempts to process the file in POSIX portability mode. If your default sh shell isn't bash, and it doesn't appear to be, then it won't understand the bash-specific features I used.
(/bin/sh originally referred to the Bourne shell. These days it's usually a link to another shell like bash, dash, or ksh, but when invoked that way, the shell will run in a POSIX/Bourne compatibility mode.)
If a script has a #!shebang defined on the first line, then it already has everything it needs to run correctly on its own. Just chmod it to make it executable and run it directly.
(Either the location of the script needs to be in your PATH variable, or the full path to the file, absolute or relative, needs to be specified.)
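To make that concrete, here's a tiny self-contained demo; the temp file stands in for davids_awk_script.bash, and the paths are only illustrative:

```shell
#!/bin/bash
# Demo: a script with a shebang picks its own interpreter when it is
# run directly, regardless of which shell you are sitting in.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
echo "running under bash $BASH_VERSION"
EOF
chmod +x "$script"    # make it executable once
"$script"             # run by path; the kernel honors the #! line
rm -f "$script"
```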
As for "defaults", I just threw that in because I thought it might be useful, using the substitution pattern "${var:-alternative}". So I set the first script argument to be the top directory, and the second to be the output file. If you don't supply an argument, or if the argument is null, then it will use the default instead.
Actually, you should probably also include a test or two to ensure that the locations actually exist before using them.
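A sketch of both ideas together; the function and variable names (resolve_args, startdir, outfile) are illustrative, not the ones in the actual script:

```shell
#!/bin/bash
# Sketch of the "${var:-alternative}" default pattern plus an
# existence check before the value is used.
resolve_args() {
    local startdir=${1:-/home/tabby/dir_170_246}  # $1, or the default
    local outfile=${2:-column10.out}              # $2, or the default
    if [ ! -d "$startdir" ]; then                 # fail early if missing
        echo "error: no such directory: $startdir" >&2
        return 1
    fi
    printf '%s %s\n' "$startdir" "$outfile"
}
```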
and from there you can change the first word to "find2perl". That gives as output a Perl script that acts like the find command; that's what gives us the recursion.
In that script we need to add some payload to act on each matching file. So remove the "use strict" line at the beginning and change the definition of the "wanted" function to this:
Code:
sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);
    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_));
    return unless (defined($mode));
    return unless -f _;
    return unless (/\.txt$/);
    $count = 0;
    open(F, "<$_") or die("open :$name: $!");
    while (<F>) {
        my @field = split(/,/);
        next unless (defined($field[9]));
        next unless ($field[9] =~ /^\d+$/);
        $count++ if (42 == $field[9]);  # fields counted from 0
    }
    close(F);
    printf("%s %d\n", $name, $count) if ($count);
}
When we split the input line by commas in Perl, the first field is numbered 0, so what awk calls $10 is $field[9] in Perl.
I understand almost all of your post very well, and I got it to run, thank you soooo much!
I'm wondering why you guess I'm not running a bash shell? As far as I can tell, it is. I have a .bashrc file. Could there be something that the sysadmin has done that makes it less than a bash shell? Or is it just that I'm such a newbie that I'm messing something up in how I'm doing things?
I have a few other scripts that I think are bash scripts, but they don't have anything on the first line, and they do what they are supposed to. Why? Should I put the "shebang" (being a girl, I'm not sure I like that name, but maybe ) on the first line?
Oh, and how do I know if I should use the bash shebang or have #!/usr/bin/awk -f on the first line?
Basically any 'scripting' lang file (ie non-binary) has the option to either
1. specify the tool (bash, sh, awk etc) externally, as you have originally done
OR
2. use the very 1st line inside the script to specify the tool to use (bash, sh, awk etc)
(I think it's called "shebang" because it's just how you try to pronounce hash + exclamation mark (aka 'bang'): hash-bang => shebang.)
If you don't specify the shebang, then just doing ./myscript will cause it to be run by the current shell defined in your env, e.g. bash, which may not be what you want.
It's also self-documenting: if you specify the shebang, both you and anyone who comes along later will know what interpreter should be used.
This is very important in prod envs, as using the wrong shell from the env may cause it to do unexpected things.
Note that file extensions are optional in *nix; the OS doesn't use them.
Also, most 'shell' files tend to have a .sh extension for human info, even though they may be designed for different shells, e.g. sh, bash, ksh.
Another good reason to have a shebang.
Also, you may have more than one version of a tool on the system, in different locations, e.g. /usr/bin/perl and /opt/usr/bin/perl; the shebang makes explicit which one runs.
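One common way to deal with that is an env shebang, which resolves the interpreter through $PATH rather than a hardcoded location. Here's a small self-contained demo using a made-up "mytool" in place of perl:

```shell
#!/bin/bash
# Demo: "#!/usr/bin/env tool" finds "tool" via $PATH, so the same
# script works whether the tool lives in /usr/bin or /opt/usr/bin.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "mytool found via PATH"\n' > "$bindir/mytool"
chmod +x "$bindir/mytool"

script=$(mktemp)
printf '#!/usr/bin/env mytool\n' > "$script"   # env looks mytool up in PATH
chmod +x "$script"

PATH="$bindir:$PATH" "$script"                 # runs the fake interpreter
rm -rf "$bindir" "$script"
```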