LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 11-07-2012, 11:11 AM   #1
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Rep: Reputation: Disabled
help to make recursive


Hi guys,

I have a script that I would like to make recursive, searching down through all subfolders. The way it is written below, it looks in the 10th column of one file and returns the number of occurrences in that one file. I need it to run on a bunch of individual files that are each located in their own directory.

Code:
#!/usr/bin/awk -f

awk -F "," '$10 == 42 { if (FILENAME != last && last != "")
                          {
                          print last, count
                          count = 0
                          }
                          count++
                          last = FILENAME
                      }
END { print last, count }' fileABC.txt | sort -k2nr > column10_fileABC.out


thanks soooo much for your help

Tabitha
 
Old 11-07-2012, 03:30 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
awk is a text-processing language; it isn't really designed to do file management, such as walking a directory tree, internally.

File handling like this is usually done at the shell level, using find or a loop of some kind.

Code:
find . -type f -name '*.txt' -exec awkscript '{}' \;
Unless you need it to output some kind of total for all the files together or similar? If so, give us some more details.


Edit: Looking again, the script you have posted is incorrect. The shebang designates it as an awk script, but the contents are actually a shell script that contains an awk command, plus a separate sort and final file redirection. Please explain in more detail exactly what this is supposed to be doing.
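
To make the distinction concrete, here is a minimal sketch of the shell-script form, assuming the goal is simply a per-file count of lines whose 10th field is 42. The function name count_col10 is made up for the example. (The other form would be a file starting with #!/usr/bin/awk -f that contains only the awk program itself, with no awk command, quotes, sort, or redirection.)

```shell
#!/bin/bash
# Shell script that wraps an awk command, so the shebang names a shell.
# count_col10 is a hypothetical helper name; column 10 and the value 42
# come from the script posted above.
count_col10() {
    # "count + 0" forces a numeric 0 when no line matches,
    # instead of printing an empty string.
    awk -F ',' '$10 == 42 { count++ } END { print FILENAME, count + 0 }' "$1"
}

# Example call: count_col10 fileABC.txt
```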

Last edited by David the H.; 11-07-2012 at 03:37 PM. Reason: As stated
 
Old 11-07-2012, 09:00 PM   #3
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
Hi,

I have a file and directory structure that looks like this:

Code:
dir_170_246
    dir_170
        file_AAA.txt
        file_BBB.txt
        file_CCC.txt
    dir_171
        file_DDD.txt
        file_EEE.txt
        file_FFF.txt
        file_GGG.txt
        file_HHH.txt
        file_III.txt
        file_JJJ.txt
    dir_172
        file_KKK.txt
        file_LLL.txt
    dir_173
        file_MMM.txt
        file_NNN.txt
        file_OOO.txt
        file_PPP.txt
etc...
Each file contains 17 columns of CSV data. I need to investigate column 10 of each file and see how many times 42 (the answer to the universe) occurs.

Using wildcards I can get the above script to work if it is run inside any one of the directories, like inside dir_171, but I need it to run at the dir_170_246 level on all the csv files that are inside all those subdirectories. (This request is a follow-on to http://www.linuxquestions.org/questi...rn-4175425979/)

Then I need it to spit out a report something like:

Code:
dir_170/file_AAA.txt 8
dir_170/file_BBB.txt 5
dir_170/file_CCC.txt 0
dir_171/file_DDD.txt 14
dir_171/file_EEE.txt 1
dir_171/file_FFF.txt 0
dir_171/file_GGG.txt 0
dir_171/file_HHH.txt 0
dir_171/file_III.txt 31
dir_171/file_JJJ.txt 0
dir_172/file_KKK.txt 0
dir_172/file_LLL.txt 8
dir_173/file_MMM.txt 1
dir_173/file_NNN.txt 79
dir_173/file_OOO.txt 0
dir_173/file_PPP.txt 42

etc...
now, I know which files I should spend more time looking at

thanks so much for your help!

Tabby

Last edited by atjurhs; 11-08-2012 at 01:49 PM. Reason: typo correction
 
Old 11-07-2012, 09:46 PM   #4
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244
find2perl will make you a perl script with recursion - with a stub where you need to add something similar to your awk function.
 
Old 11-07-2012, 09:52 PM   #5
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
I'm barely literate (and fairly dangerous) in awk and bash. But the last time I tried a perl script I deleted the files in the remote home directory of every user on the server. I'm staying away from perl.

Tabitha
 
Old 11-08-2012, 09:40 AM   #6
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
I googled the find2perl command and its usage is beyond me

Tabitha
 
Old 11-08-2012, 03:33 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
I think then that what you really want is something like this:


Code:
#!/bin/bash

topdir=${1:-defaultdir}
outfile=${2:-defaultfile}

awksearch(){
	awk -F ',' '$10 == 42 { count++ } END { print FILENAME, count + 0 }' "$1"
}

cd "$topdir"

while IFS='' read -r -d '' fname ; do

	awksearch "$fname"

done < <( find ./ -type f -name "file*.txt" -print0 ) | sort -k2nr >"$outfile"

exit 0
I set it up so that you could override the default startdir and output file on the command line. It's only been slightly tested though, since I don't have the data you do.
 
Old 11-09-2012, 01:11 PM   #8
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
Hi David,

thanks sooo much for your help!

I ran the script located at the same level as the dir_170, dir_171, dir_172, etc.... using the following command:

Code:
sh davids_awk_script.bash /home/tabby/dir_170_246
thinking that dir_170_246 is the defaultdir that I'm supposed to enter, and I think that's correct because it didn't complain about no such file or directory

I'm not sure what the defaultfile should be set to because in line it is declared to be "file*.txt" I'm guessing from your comment that I could have entered file*.txt as a second argument to the command, like this:

Code:
sh davids_awk_script.bash /home/tabby/dir_170_246 file*.txt
but I didn't do that, I just gave it the directory as an input

when I run the script, here is what it prints to the screen:

Code:
davids_awk_script.bash: line 16: syntax error near unexpected token `<'
davids_awk_script.bash: line 16: `done < <( find ./ -type f -name "file*.txt" -print0 ) | -k2nr >"$outfile'

note that the beginning tick is slanted and the ending tick is not - I don't know if that helps you or not
so I tried playing around a bit with what I thought might make sense (in my very limited awk bash knowledge) but couldn't get it to go.

Last edited by atjurhs; 11-09-2012 at 01:14 PM.
 
Old 11-09-2012, 05:12 PM   #9
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244
I've been detained by the working week and sleep deprivation from making more comments but I'm hoping to get on it in the morning.
 
Old 11-10-2012, 12:26 PM   #10
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Don't run the script this way:

Code:
sh davids_awk_script.bash /home/tabby/dir_170_246 file*.txt
This forces /bin/sh as the interpreting program, which then attempts to process the file in POSIX portability mode. If your default sh shell isn't bash, and it doesn't appear to be, then it won't understand the bash-specific features I used.

(/bin/sh originally referred to the original Bourne shell. These days it's usually a link to another shell like bash, dash, or ksh, but when invoked that way the shell will run the script in a POSIX/Bourne compatibility mode.)
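
For comparison, here is a sketch of a variant that avoids the bash-only parts of my script (the process substitution `<( )` and `read -d ''`), so it should also run under a plain POSIX sh such as dash. The function name report_col10 is made up for this example, and it assumes the file names contain no spaces, since sort keys on the second whitespace-separated field.

```shell
#!/bin/sh
# report_col10 (hypothetical name): for every file*.txt below the given
# directory, count the lines whose 10th comma-separated field is 42,
# then list the files sorted by that count, highest first.
report_col10() {
    # Run in a subshell so the cd does not affect the caller.
    (
        cd "$1" || exit 1
        # find runs awk once per file; no bash features are needed.
        find . -type f -name 'file*.txt' -exec \
            awk -F ',' '$10 == 42 { count++ } END { print FILENAME, count + 0 }' {} \; |
            sort -k2nr
    )
}

# Example call: report_col10 /home/tabby/dir_170_246 > report.out
```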

If a script has a #! shebang defined on the first line, then it already has everything it needs to run correctly on its own. Just chmod it to make it executable and run it directly.

Code:
/path/to/davids_awk_script.bash /home/tabby/dir_170_246 file*.txt
(Either the location of the script needs to be in your PATH variable, or the full path to the file, absolute or relative, needs to be specified.)

As for "defaults", I just threw that in because I thought it might be useful, using the substitution pattern "${var:-alternative}". So I set the first script argument to be the top directory, and the second to be the output file. If you don't supply an argument, or if the argument is null, then the script will use the default instead.
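
A tiny sketch of how that substitution behaves; show_default and the word "fallback" are made up purely for illustration.

```shell
#!/bin/bash
# show_default (hypothetical helper) prints its first argument,
# or the word "fallback" when the argument is missing or empty.
show_default() {
    printf '%s\n' "${1:-fallback}"
}

show_default /home/tabby/dir_170_246   # argument given: printed as-is
show_default                           # no argument: prints "fallback"
show_default ""                        # empty argument: also prints "fallback"
```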

Actually, you should probably also include a test or two to ensure that the locations actually exist before using them.
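
Something along these lines might do as a starting point. It is only a sketch: check_locations is a made-up helper name and the error messages are just examples.

```shell
#!/bin/bash
# check_locations is a hypothetical helper, not part of the script above.
# It succeeds only if the top directory exists and the directory that
# would hold the output file exists and is writable.
check_locations() {
    topdir=$1
    outfile=$2

    if [ ! -d "$topdir" ]; then
        echo "error: '$topdir' is not a directory" >&2
        return 1
    fi

    outdir=$(dirname -- "$outfile")
    if [ ! -d "$outdir" ] || [ ! -w "$outdir" ]; then
        echo "error: cannot write to '$outdir'" >&2
        return 1
    fi
}

# Possible use near the top of the script:
# check_locations "$topdir" "$outfile" || exit 1
```

If the output file is given as a relative path, keep in mind that the script does cd "$topdir" first, so the check would have to run after the cd to test the right place.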
 
Old 11-10-2012, 02:24 PM   #11
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244
I assume you know all about
Code:
find . -type f -name \*.txt
and from there you can change the first word to "find2perl". That gives as output a perl script that acts like the find command - that's going to give us the recursion.

In that script we need to add some payload to act on each matching file. So remove the "use strict" at the beginning and change the definition of the "wanted" function to this.

Code:
sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);
    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_));
    return unless (defined($mode));
    return unless -f _;
    return unless (/\.txt$/);
        $count=0;
        open(F, "<$_") or die("open :$name: $!");
        while(<F>) {
            my @field=split(/,/);
            next unless (defined($field[9]));
            next unless ($field[9] =~ /^\d+$/);
            $count++ if (42 == $field[9]); # fields counted from 0
        }
        close(F);
        printf("%s %d\n", $name, $count) if ($count);
}
When we split the input line by commas the first field is called 0 and the 10th in awk terms is called 9th in perl.
 
Old 11-13-2012, 09:15 AM   #12
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
Good morning David,

I understand almost all of your post very well, and I got it to run, thank you soooo much!

I'm wondering why you guess I'm not running a bash shell? As far as I can tell it is. I have a .bashrc file. Could there be something that the Sys Admin guy has done that makes it less than a bash shell? or is it just that I'm such a newbie that I'm messing something up in how I'm doing things
 
Old 11-14-2012, 07:55 PM   #13
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
Try
Code:
echo $SHELL

#which will likely return
/bin/bash
However, there are many shells available in /bin and you can invoke eg 'sh' simply by specifying it, as you have.

Instead of using
Code:
sh davids_awk_script.bash
just
Code:
./davids_awk_script.bash
assuming it has execute perms.
The shebang line (#!/bin/bash) tells it what shell to use; must be very first line.

You may find these useful
http://rute.2038bug.com/index.html.gz
http://tldp.org/LDP/Bash-Beginners-G...tml/index.html
http://www.tldp.org/LDP/abs/html/
 
Old 11-15-2012, 09:15 AM   #14
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
thanks Chris!

ok so I have another question then:

I have a few other scripts that I think are bash scripts but they don't have anything on the first line, and they do what they are supposed to, why? should I put the "shebang" (being a girl, I'm not sure I like that name, but maybe) on the first line?

oh, and how do I know if I should use the shebang or have on the first line #!/usr/bin/awk -f

thanks soooo much, Tabby
 
Old 11-15-2012, 07:31 PM   #15
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
Basically any 'scripting' lang file (ie non-binary) has the option to either

1. specify the tool (bash, sh, awk etc) externally, as you have originally done
OR
2. use the very 1st line inside the script to specify the tool to use (bash, sh, awk etc)
(I think it's called shebang because it's just how you try to pronounce hash-exclamation mark (aka 'bang') ) =>hash-bang => shebang

If you don't specify the shebang, then just doing ./myscript leaves the choice to whatever launches it: typically it gets run by your current shell (eg bash) or by /bin/sh, which may not be what you want.
It's also self-documenting ie if you specify the shebang, both you and anyone who comes later will know what should be used.
This is very important in prod envs, as using the wrong shell from the env may cause it to do unexpected things....

Note that file extensions are optional in *nix, the OS doesn't use them.
Also, most 'shell' files tend to have .sh extension for human info, even though they may be designed for different shells eg sh, bash, ksh.
Another good reason to have a shebang.
Also, you may have more than one version of a tool on the system in different locations eg /usr/bin/perl, /opt/usr/bin/perl
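
A quick way to see which tool a script asks for is to look at its first line. A small sketch (shebang_of is a made-up helper name, not a standard command):

```shell
#!/bin/bash
# shebang_of (hypothetical helper): print the interpreter named on the
# first line of a script, or "none" if the file does not start with #!
shebang_of() {
    first_line=$(head -n 1 -- "$1")
    case $first_line in
        '#!'*) printf '%s\n' "${first_line#??}" ;;  # strip the leading "#!"
        *)     printf 'none\n' ;;
    esac
}

# Example call: shebang_of ./davids_awk_script.bash
```

For a script whose first line is #!/bin/bash this prints /bin/bash; for your scripts with nothing on the first line it prints none.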

HTH
 
  


All times are GMT -5. The time now is 10:08 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration