Linux - Newbie
This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-to's, this is the place!
I have a script that I would like to make recursive, searching down through all subfolders. The way it is written below, it looks in the 10th column of one file and returns the number of occurrences in that one file. I need it to run on a bunch of individual files that are each located in their own directory.
Code:
#!/usr/bin/awk -f
awk -F "," '$10 == 42 {
    if (FILENAME != last && last != "") {
        print last, count
        count = 0
    }
    count++
    last = FILENAME
}
END { print last, count }' fileABC.txt | sort -k2nr > column10_fileABC.out
awk is a text processing language, and it isn't really designed for the complex processing of files internally.
File handling like this is usually done at the shell level, using find or a loop of some kind.
Code:
find . -type f -name '*.txt' -exec awkscript '{}' \;
Unless you need it to output some kind of total for all the files together or similar? If so, give us some more details.
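For what it's worth, here's a minimal sketch of what the shell-level version could look like. The column-10/42 logic is taken from your script; the function name, report format, and output file are my assumptions:

```shell
#!/bin/bash
# Sketch: for every file matching "$pattern" under "$topdir", count the
# lines whose 10th csv column equals 42, and print one "file count"
# line per file, highest count first. Files with zero matches are skipped.
count42() {
    local topdir=$1 pattern=${2:-'*.txt'} file n
    find "$topdir" -type f -name "$pattern" -print0 |
    while IFS= read -r -d '' file; do
        # let awk do the per-file counting; c+0 prints 0 if no matches
        n=$(awk -F ',' '$10 == 42 { c++ } END { print c + 0 }' "$file")
        [ "$n" -gt 0 ] && printf '%s %d\n' "$file" "$n"
    done | sort -k2,2nr
}
```

Usage would be something like `count42 /home/tabby/dir_170_246 'file*.txt' > column10_report.out` (quote the pattern so the shell doesn't expand it before find sees it).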
Edit: Looking again, the script you have posted is incorrect. The shebang designates it as an awk script, but the contents are actually a shell script that contains an awk command, plus a separate sort and final file redirection. Please explain in more detail exactly what this is supposed to be doing.
Last edited by David the H.; 11-07-2012 at 03:37 PM.
Reason: As stated
Each file contains 17 columns of CSV data. I need to investigate column 10 of each file and see how many times 42 (the answer to the universe) occurs.
Using wildcards I can get the above script to work if it is run inside any one of the directories, like inside dir_171, but I need it to run at the dir_170_246 level on all the csv files that are inside all those subdirectories. Then I need it to spit out a report something like:
I'm barely literate (and fairly dangerous) in awk and bash. But the last time I tried a perl script I deleted the files in the remote home directory of every user on the server. I'm staying away from perl.
I set it up so that you could override the default startdir and output file on the command line. It's only been slightly tested though, since I don't have the data you do.
I ran the script located at the same level as the dir_170, dir_171, dir_172, etc.... using the following command:
Code:
sh davids_awk_script.bash /home/tabby/dir_170_246
thinking that dir_170_246 is the defaultdir that I'm supposed to enter, and I think that's correct because it didn't complain about "no such file or directory".
I'm not sure what the defaultfile should be set to, because in the script it is declared to be "file*.txt". I'm guessing from your comment that I could have entered file*.txt as a second argument to the command, like this:
Code:
sh davids_awk_script.bash /home/tabby/dir_170_246 file*.txt
but I didn't do that; I just gave it the directory as input.
When I run the script, here is what it prints to the screen:
Code:
davids_awk_script.bash: line 16: syntax error near unexpected token `<'
davids_awk_script.bash: line 16: `done < <( find ./ -type f -name "file*.txt" -print0 ) | -k2nr >"$outfile'
note that the beginning tick is slanted and the ending tick is not - I don't know if that helps you or not
so I tried playing around a bit with what I thought might make sense (in my very limited awk bash knowledge) but couldn't get it to go.
sh davids_awk_script.bash /home/tabby/dir_170_246 file*.txt
This forces /bin/sh as the interpreting program, which then attempts to process the file in POSIX portability mode. If your default sh shell isn't bash, and it doesn't appear to be, then it won't understand the bash-specific features I used.
(/bin/sh originally referred to the Bourne shell. These days it's usually a link to another shell like bash, dash, or ksh, but when invoked that way, the shell will run in a POSIX/Bourne compatibility mode.)
If a script has a #!shebang defined on the first line, then it already has everything it needs to run correctly on its own. Just chmod it to make it executable and run it directly.
(Either the location of the script needs to be in your PATH variable, or the full path to the file, absolute or relative, needs to be specified.)
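To make that concrete, here's a tiny self-contained demo; the temp file stands in for davids_awk_script.bash, and the paths are only illustrative:

```shell
#!/bin/bash
# Demo: a script with a shebang picks its own interpreter when it is
# run directly, regardless of which shell you are sitting in.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
echo "running under bash $BASH_VERSION"
EOF
chmod +x "$script"    # make it executable once
"$script"             # run by path; the kernel honors the #! line
rm -f "$script"
```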
As for "defaults", I just threw that in because I thought it might be useful, using the substitution pattern "${var:-alternative}". So I set the first script argument to be the top directory, and the second to be the output file. If you don't supply an argument, or if the argument is null, then it will use the default instead.
Actually, you should probably also include a test or two to ensure that the locations actually exist before using them.
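A sketch of both ideas together; the function and variable names (resolve_args, startdir, outfile) are illustrative, not the ones in the actual script:

```shell
#!/bin/bash
# Sketch of the "${var:-alternative}" default pattern plus an
# existence check before the value is used.
resolve_args() {
    local startdir=${1:-/home/tabby/dir_170_246}  # $1, or the default
    local outfile=${2:-column10.out}              # $2, or the default
    if [ ! -d "$startdir" ]; then                 # fail early if missing
        echo "error: no such directory: $startdir" >&2
        return 1
    fi
    printf '%s %s\n' "$startdir" "$outfile"
}
```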
and from there you can change the first word to "find2perl". That gives as output a Perl script that acts like the find command; that's what gives us the recursion.
In that script we need to add some payload to act on each matching file. So remove the "use strict" line at the beginning and change the definition of the "wanted" function to this:
Code:
sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);
    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_));
    return unless (defined($mode));
    return unless -f _;
    return unless (/\.txt$/);
    $count = 0;
    open(F, "<$_") or die("open :$name: $!");
    while (<F>) {
        my @field = split(/,/);
        next unless (defined($field[9]));
        next unless ($field[9] =~ /^\d+$/);
        $count++ if (42 == $field[9]);  # fields counted from 0
    }
    close(F);
    printf("%s %d\n", $name, $count) if ($count);
}
When we split the input line by commas in Perl, the first field is numbered 0, so what awk calls $10 is $field[9] in Perl.
I understand almost all of your post very well, and I got it to run, thank you soooo much!
I'm wondering why you guess I'm not running a bash shell? As far as I can tell, it is. I have a .bashrc file. Could there be something that the sysadmin has done that makes it less than a bash shell? Or is it just that I'm such a newbie that I'm messing something up in how I'm doing things?
I have a few other scripts that I think are bash scripts, but they don't have anything on the first line, and they do what they are supposed to. Why? Should I put the "shebang" (being a girl, I'm not sure I like that name, but maybe ) on the first line?
Oh, and how do I know if I should use the bash shebang or have #!/usr/bin/awk -f on the first line?
Basically any 'scripting' lang file (ie non-binary) has the option to either
1. specify the tool (bash, sh, awk etc) externally, as you have originally done
OR
2. use the very 1st line inside the script to specify the tool to use (bash, sh, awk etc)
(I think it's called "shebang" because it's just how you try to pronounce hash + exclamation mark (aka 'bang'): hash-bang => shebang.)
If you don't specify the shebang, then just doing ./myscript will cause it to be run by the current shell defined in your env, e.g. bash, which may not be what you want.
It's also self-documenting: if you specify the shebang, both you and anyone who comes along later will know what interpreter should be used.
This is very important in prod envs, as using the wrong shell from the env may cause it to do unexpected things.
Note that file extensions are optional in *nix; the OS doesn't use them.
Also, most 'shell' files tend to have a .sh extension for human info, even though they may be designed for different shells, e.g. sh, bash, ksh.
Another good reason to have a shebang.
Also, you may have more than one version of a tool on the system, in different locations, e.g. /usr/bin/perl and /opt/usr/bin/perl; the shebang makes explicit which one runs.
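One common way to deal with that is an env shebang, which resolves the interpreter through $PATH rather than a hardcoded location. Here's a small self-contained demo using a made-up "mytool" in place of perl:

```shell
#!/bin/bash
# Demo: "#!/usr/bin/env tool" finds "tool" via $PATH, so the same
# script works whether the tool lives in /usr/bin or /opt/usr/bin.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "mytool found via PATH"\n' > "$bindir/mytool"
chmod +x "$bindir/mytool"

script=$(mktemp)
printf '#!/usr/bin/env mytool\n' > "$script"   # env looks mytool up in PATH
chmod +x "$script"

PATH="$bindir:$PATH" "$script"                 # runs the fake interpreter
rm -rf "$bindir" "$script"
```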