LinuxQuestions.org

jonnybinthemix 07-23-2014 09:47 AM

Combining multiple AWK commands
 
Hey Guys,

I was wondering whether there's a way to combine multiple AWK commands into one and, if there is, whether I need to change the format?

I'm currently achieving what I want by doing the following:

Code:

diff -b $RLS $LLS | awk '{print $2}' | awk '$1=$1' | awk '{print "BAT_"$0".pgp" }' | while read i; do
I've tried just putting all the commands after one AWK but it displays an error. Is there a special syntax? Or have I done it right this way?

Thanks
Jon

schneidz 07-23-2014 09:49 AM

Without the full context, it seems like this would do the same:
Code:

diff -b $RLS $LLS | awk '{print "BAT_" $2 ".pgp"}' | while read i; do

jonnybinthemix 07-23-2014 10:00 AM

Ah nice, that works :)

Makes sense... $2 straight into print..

Thanks, appreciate your help :D

But, if I wanted to combine multiple commands next time, can I just stack them up? Comma separated or something?

grail 07-23-2014 12:07 PM

I am not sure I followed the point of the original logic, specifically the need for the awk in the middle?
The first awk will only ever return a contiguous group of characters (or nothing, assuming fewer than 2 fields), so the second awk, which would typically be used to remove any additional whitespace, would have nothing to do.

As for grouping statements, it would really depend on just how much work each individual awk is doing and what the output is from each to the next.
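On the general question of stacking commands: a single awk program can hold several statements inside one { } block, separated by semicolons or newlines rather than commas. A minimal sketch, assuming diff-style input; the function names `sample_diff` and `add_wrapping` and the filenames are made up for illustration. It also suggests the middle awk was not entirely idle: as a pattern, '$1=$1' is false for an empty line, so it dropped the blank lines that the first awk produced from diff's header lines.

```shell
# Hypothetical stand-in for the output of: diff -b $RLS $LLS
sample_diff() {
    printf '%s\n' '1,2d0' '< 123456.JPEG' '< 234567.JPEG'
}

# One awk program in place of the three chained ones.
# NF >= 2 skips lines with no second field (diff's "1,2d0" and "---" lines).
# The two statements in the action are separated by a semicolon.
add_wrapping() {
    awk 'NF >= 2 { name = $2; print "BAT_" name ".pgp" }'
}

sample_diff | add_wrapping
```

Without the NF >= 2 guard, the header line would come out as a bare BAT_.pgp.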

jonnybinthemix 07-24-2014 03:25 AM

Hi Grail,

Thanks for your message.

The logic behind the original commands was: use diff to check the difference between the two variables (each contains a list of filenames), then awk to give me just the filename (omitting the < or > at the beginning). I then noticed there was random white space, so I added the second awk command to make sure there's no white space, and the third awk command to add BAT_ to the beginning of each filename and .pgp to the end.

The reason for the above is that I'm downloading encrypted image files from an SFTP server via a script, and want to download only new images. However, the existing images that have already been downloaded have been decrypted and had the 'BAT_' removed, and as they're decrypted the '.pgp' has also gone. So, to get two lists and do a comparison, I must first take a list of what is on the server, strip the BAT_ and .pgp off, compare the two lists, take the differences, add the BAT_ and .pgp to the filenames again, and then tell a loop to download all the files.

I've added the first part of the code to help explain my meaning.. (It all works without hitch, but I'm really happy to listen to other ways of doing it and if I'm not doing it the best way, I'd love to learn).

Code:

/usr/bin/expect <<! > $FTPLIST
        spawn sftp -o$PORT $USER@$HOST
        expect "password:"
        send "$PASS\r"
        expect "sftp>"
        send "cd output\r"
        expect "sftp>"
        send "ls -1 *.JPEG.pgp\r"
        send "bye\r"
        expect eof
!
grep 'BAT_' $FTPLIST | cut -c5- | sed 's/\(.*\)\..*/\1/' > $RLS

ls -1 > $LLS

diff -b $RLS $LLS | awk '{print "BAT_"$2".pgp" }' | while read i; do

The loop then goes on to download $i within another expect session.
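One thing worth checking in the last line: diff marks lines found only in the first file ($RLS, the server list) with < and lines found only in the second ($LLS) with >, and it also emits header lines such as 2,3c2. A hedged sketch that keeps only the < lines, using temporary files and made-up filenames in place of the real lists:

```shell
# Temporary stand-ins for $RLS (server list) and $LLS (local list).
rls=$(mktemp)
lls=$(mktemp)
printf '%s\n' 123456.JPEG 234567.JPEG 345678.JPEG > "$rls"
printf '%s\n' 123456.JPEG 999999.JPEG > "$lls"

# Keep only names that are on the server but not local ("<" lines),
# then put the BAT_ prefix and .pgp suffix back.
missing=$(diff -b "$rls" "$lls" | awk '$1 == "<" { print "BAT_" $2 ".pgp" }')
printf '%s\n' "$missing"

rm -f "$rls" "$lls"
```

With this sample data the local-only name 999999.JPEG is left out of the download list.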

schneidz 07-24-2014 07:24 AM

Quote:

Originally Posted by jonnybinthemix (Post 5208775)
... (It all works without hitch, but I'm really happy to listen to other ways of doing it and if I'm not doing it the best way, I'd love to learn).

The obvious best way would be to use ssh with keys so passwords aren't necessary?

jonnybinthemix 07-24-2014 07:46 AM

Yes that would be nice, unfortunately I don't have this as an option.

pan64 07-24-2014 08:04 AM

you can also simplify the grep|cut|sed chain.
Code:

(not tested, because there is no sample input)
awk '/BAT_/ { a=substr($0, 5); b=split(a, "."); print b[0] } ' $FTPLIST > $RLS

diff can handle stdin, so:
ls -1 | diff -b $RLS - | awk '{print "BAT_"$2".pgp" }' | while ...
should work too
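
To see the stdin form in isolation: diff treats a - operand as standard input. A small sketch, with a temporary file and made-up names standing in for $RLS and for the ls output. The $1 == "<" guard is an addition here, not part of the suggestion above; it skips diff's header lines.

```shell
# Temporary stand-in for $RLS; the names are made up.
rls=$(mktemp)
printf '%s\n' 123456.JPEG 234567.JPEG > "$rls"

# printf plays the role of `ls -1`; the "-" operand makes diff read stdin.
new_files=$(printf '%s\n' 123456.JPEG |
    diff -b "$rls" - |
    awk '$1 == "<" { print "BAT_" $2 ".pgp" }')
printf '%s\n' "$new_files"

rm -f "$rls"
```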


jonnybinthemix 07-24-2014 09:29 AM

Hi Pan64,

Thanks for your response.. What you've suggested looks interesting.

Sorry to be a pain, but if you've time and it's not too complex, would you be able to explain your chain? The section within the {} looks new to me and I'd love to understand it as opposed to just using it :)

Thanks
Jon

grail 07-24-2014 10:45 AM

Actually I think pan64 has made a small mistake, but I understand where he was going. The mistake is that split returns the number of items after the split, whereas the array to fill ('b') belongs in the second argument.

So the re-write would be:
Code:

awk '/BAT_/ { a=substr($0, 5); split(a, b, "."); print b[1] }' $FTPLIST > $RLS
As a break down:

1. /BAT_/ :- Search for lines containing the string 'BAT_'

2. a=substr($0, 5) :- Assign to the variable 'a' everything stored in the record starting from the fifth character, ie. remove 'BAT_' ... which assumes we find only files starting with this string

3. split(a, b, ".") :- Split the data stored in variable 'a' using period ('.') as the separator and store each piece in the array 'b'

4. print b[1] :- Print the data stored in the first element of the array 'b' (awk arrays are indexed from 1 and not 0 {most of the time})
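
The four steps can be watched one at a time by printing the intermediate values. This is only a sketch; the sample line is made up to resemble the listing:

```shell
# Made-up sample line in the shape of the sftp listing.
steps=$(printf '%s\n' 'BAT_123456.JPEG.pgp' | awk '/BAT_/ {
    a = substr($0, 5)       # drop the first four characters, "BAT_"
    n = split(a, b, ".")    # fills b[1], b[2], b[3]; n is the piece count
    print "a=" a
    print "n=" n
    print "b[1]=" b[1]
}')
printf '%s\n' "$steps"
```

For this one line it prints a=123456.JPEG.pgp, then n=3, then b[1]=123456.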

If you really wanted to, I believe you could perform the whole task in awk or bash and even at the point of not having to remove and re-add portions ... should be a nice challenge :)

pan64 07-25-2014 12:12 AM

Thanks grail, I was thinking of the split of perl or python.
2. a=substr($0, 5) is more or less the same as your cut -c5- command
3. and 4. split the data using . and printing the first part - that works like the sed you gave.
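
A side-by-side run makes the "more or less" concrete. In this sketch the `listing` function and its contents are made up to stand in for $FTPLIST. The two chains agree on stripping BAT_, but the sed removes only the last extension, while b[1] keeps only the text before the first dot:

```shell
# Made-up listing standing in for the contents of $FTPLIST.
listing() {
    printf '%s\n' 'not a file line' 'BAT_123456.JPEG.pgp' 'BAT_234567.JPEG.pgp'
}

# Original chain: filter, cut off "BAT_", strip the last extension.
via_pipeline=$(listing | grep 'BAT_' | cut -c5- | sed 's/\(.*\)\..*/\1/')

# Single awk: filter, cut off "BAT_", keep the text before the first dot.
via_awk=$(listing | awk '/BAT_/ { a = substr($0, 5); split(a, b, "."); print b[1] }')

printf 'pipeline:\n%s\n' "$via_pipeline"
printf 'awk:\n%s\n' "$via_awk"
```

For names with two extensions, the pipeline yields 123456.JPEG while the awk version yields 123456.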

jonnybinthemix 07-25-2014 04:29 AM

Hey Guys,

Thanks for the responses.. I've been playing around with the above and it works nice.

However, the command print b[1] of course prints the first section of the array, which in this instance is just the filename.

How do I print multiple sections of the array? For example if the array (when split) has 3 parts.. how would I print parts 1 & 2 and omit just part three?

For example the filenames in the $FTPLIST variable are looking like:

BAT_123456.JPEG.pgp
BAT_234567.JPEG.pgp
BAT_345678.JPEG.pgp

So I can use; awk ' /BAT_/' to display only the above files within that variable (works fine)

then a=substr($0, 5) to print the filename from the 5th character... and getting rid of the BAT_ (works fine)

then split (a, b, ".") to create an array named b, containing each section of the filename with "." separation (works fine, because if I change print b[2] it corresponds and prints JPEG)

then print b[1] which prints the first part of the array, which in this instance would be; 123456 234567 345678 (works)

So, I think I've understood it all okay... as it makes sense.

But, if I wanted to print 123456.JPEG 234567.JPEG 345678.JPEG how would I print both parts together?

I've tried print b[1]; print b[2] - which just prints both parts separately.

I've tried print b[1,2] which doesn't work. I've also tried print b[1-2] which doesn't work.

Any ideas?

Thanks, Jon

pan64 07-25-2014 04:44 AM

probably print b[1]" "b[2] will do that, but I'm not really sure I understand it well

jonnybinthemix 07-25-2014 04:58 AM

aha.. thanks :)

I played around with it and this works perfectly:

Code:

awk '/BAT_/ { a=substr($0, 5); split(a, b, "."); print b[1]"." b[2]}' $FTPLIST
I needed the filename 123456.JPEG, but adding the "."b[2] did that without a hitch :)
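
If the number of parts is not always three, another option is to remove just the last part instead of joining the first ones by hand. This is a different technique from the split approach above: it uses awk's sub() with a regular expression. The function name `strip_wrapping` and the sample names are made up:

```shell
# Remove the "BAT_" prefix and the final ".something" suffix,
# however many dots the name has in between.
strip_wrapping() {
    awk '/BAT_/ {
        a = substr($0, 5)       # remove the leading "BAT_"
        sub(/\.[^.]*$/, "", a)  # remove the last "." and what follows it
        print a
    }'
}

printf '%s\n' 'BAT_123456.JPEG.pgp' 'BAT_345678.tar.gz.pgp' | strip_wrapping
```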

grail 07-25-2014 06:50 AM

As usual, always more than one way to skin things :)
Code:

awk 'match($0,/BAT_(.*)[.]pgp/,a){print a[1]}' $FTPLIST
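
Note that the three-argument form of match() is a gawk extension, so the line above may not work in other awk implementations. A portable sketch of the same idea uses RSTART and RLENGTH, which match() sets in any POSIX awk. The function name `extract_name` and the sample lines are made up:

```shell
extract_name() {
    awk 'match($0, /BAT_.*[.]pgp/) {
        # RSTART and RLENGTH describe the matched text. Skip the
        # 4-character "BAT_" prefix and leave off the 4-character
        # ".pgp" suffix.
        print substr($0, RSTART + 4, RLENGTH - 8)
    }'
}

printf '%s\n' 'BAT_123456.JPEG.pgp' 'unrelated line' | extract_name
```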


All times are GMT -5.