sed or awk help

sharky · 02-28-2010, 11:01 PM

How can I take the following example from a text file

/this/is/the/dir P1
/this/is/the/dir P2
/this/is/the/dir P3
/this/is/another/dir P1
/this/is/another/dir P3

and generate the following using sed or awk (or any scripting language)?

/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3

I'm trying to generate a report showing what projects are using which tools and I have 350 projects and over 1000 tools to parse through. Any help would be greatly appreciated.

pixellany · 03-01-2010, 12:30 AM

First, please confirm that you need to key on the actual content of the first field, i.e. that you don't always know in advance what will be there.

Here's a stab at how this might go (pseudocode)

Code:

while reading the file, one line at a time:
   read the first field into a variable F1, and copy it into a variable TMP
   for this line and each following line whose first field matches F1:
      append the second field to TMP
   end inner loop
   write TMP to the output file
end outer loop
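
The pseudocode above could be sketched in awk roughly as follows. This is only a sketch, and it assumes that all lines for the same directory are adjacent in the file, as in the sample data; the variable names (input, grouped, key, line) are made up for illustration.

```shell
# Sample input from the original question (lines for one directory are adjacent).
input='/this/is/the/dir P1
/this/is/the/dir P2
/this/is/the/dir P3
/this/is/another/dir P1
/this/is/another/dir P3'

grouped=$(printf '%s\n' "$input" | awk '
    $1 != key {                    # first field changed: a new group starts
        if (NR > 1) print line     # flush the previous group, if any
        key  = $1
        line = $1
    }
    { line = line " " $2 }         # append the second field to the current group
    END { if (NR > 0) print line } # flush the last group
')

printf '%s\n' "$grouped"
```

If the input is not grouped, sorting it first (or using an array-based approach) would be needed, since this version only compares each line with the one before it.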

murugesan · 03-01-2010, 02:03 AM

Example is given here:
http://murugesan.webnode.com/technic...r-redirection/

ghostdog74 · 03-01-2010, 03:03 AM

Code:

# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i,a[i] }' file
/this/is/the/dir  P1 P2 P3
/this/is/another/dir  P1 P3

ghostdog74 · 03-01-2010, 03:04 AM

Quote:

Originally Posted by murugesan

Example is given here:
http://murugesan.webnode.com/technic...r-redirection/

lots of redundant steps in that script.

colucix · 03-01-2010, 03:43 AM

Nice, ghostdog! I'd only remove the comma from the print statement to avoid double spaces:

Code:

# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i a[i] }' file
/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3
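
A small illustration of why the comma matters: in print, a comma inserts the output field separator (a single space by default), while juxtaposition is plain concatenation. Because each stored value already starts with a space, the comma yields two spaces. The path /some/dir and the variable names here are hypothetical.

```shell
# The stored value begins with a space, just like the elements of array "a".
with_comma=$(awk 'BEGIN { v = " P1 P2"; print "/some/dir", v }')
without_comma=$(awk 'BEGIN { v = " P1 P2"; print "/some/dir" v }')

# Brackets make the spacing visible.
printf '[%s]\n[%s]\n' "$with_comma" "$without_comma"
```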

grail · 03-01-2010, 06:18 AM

ghostdog ... i bow to you as the awk god ... are you able to point me to examples or tutorials
that have some of the funky stuff you come up with?

Please ignore me ... as a fool I have only just looked at all the references in your signature

pixellany · 03-01-2010, 06:42 AM

Quote:

Originally Posted by grail

ghostdog ... i bow to you as the awk god ... are you able to point me to examples or tutorials
that have some of the funky stuff you come up with?

Please ignore me ... as a fool I have only just looked at all the references in your signature

I can't speak for the resident "AWK-meister", but a lot of programmers come up with "funky stuff" by good old trial and error.

ghostdog74 · 03-01-2010, 08:05 AM

Quote:

Originally Posted by grail

ghostdog ... i bow to you as the awk god ... are you able to point me to examples or tutorials
that have some of the funky stuff you come up with?

just read the link in my sig. it points to the Gawk manual. Also go to awk.info and have a look

sharky · 03-01-2010, 10:36 AM

Quote:

Originally Posted by pixellany

I can't speak for the resident "AWK-meister", but a lot of programmers come up with "funky stuff" by good old trial and error.

For me it's mostly error.

colucix · 03-01-2010, 11:35 AM

Quote:

Originally Posted by sharky

For me it's mostly error.

I'd call it... experience. What about your issue? Did the code suggested by ghostdog74 work for you? Can you show us what you've tried so far?

sharky · 03-01-2010, 12:02 PM

Quote:

Originally Posted by colucix

Nice, ghostdog! I'd only remove the comma from the print statement to avoid double spaces:

Code:

# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i a[i] }' file
/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3

It's hard to say for certain because I'm dealing with such a large amount of data, but this appears to work like a charm.

Truly an amazing one liner. Unfortunately I don't have a clue how it works.

colucix · 03-01-2010, 12:15 PM

Quote:

Originally Posted by sharky

Truly an amazing one liner. Unfortunately I don't have a clue how it works.

Maybe the following will help a little, but I strongly suggest reading a good reference manual (the official gawk manual being the best, in my opinion). The statement

Code:

a[$1]=a[$1]" "$2

assigns values to array "a". Index in arrays can be any string, so here we can use the first field $1 as the index. The value assigned is the current value of that element of "a", followed by a blank space, followed by the content of the second field (simple string concatenation).

In other words, the first field of each input line is an index of the array, and all the second fields that share that first field are concatenated together as its value.

In the END block, the whole array is scanned and each index is printed together with the value of its element.
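
For anyone who wants to see the same logic spelled out, here is a commented, multi-line sketch. One thing worth knowing: for (i in a) visits the indices in an unspecified order, so the lines may not come out in the order the directories first appeared. The sketch below keeps a second array to preserve that order; the names order, n, k, input and report are assumptions added for illustration, not part of the original one-liner.

```shell
# Sample input from the original question.
input='/this/is/the/dir P1
/this/is/the/dir P2
/this/is/the/dir P3
/this/is/another/dir P1
/this/is/another/dir P3'

report=$(printf '%s\n' "$input" | awk '
    {
        if (!($1 in a))          # first time this directory appears
            order[++n] = $1      # remember the order of first appearance
        a[$1] = a[$1] " " $2     # old value, then a space, then the second field
    }
    END {
        for (k = 1; k <= n; k++) # walk the directories in first-seen order
            print order[k] a[order[k]]
    }
')

printf '%s\n' "$report"
```

Unlike a line-by-line comparison, this works even when the lines for one directory are scattered through the file, at the cost of holding every directory in memory until the end.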

sharky · 03-02-2010, 05:17 PM

Quote:

Originally Posted by colucix

Maybe the following will help a little, but I strongly suggest reading a good reference manual (the official gawk manual being the best, in my opinion). The statement

Code:

a[$1]=a[$1]" "$2

assigns values to array "a". Index in arrays can be any string, so here we can use the first field $1 as the index. The value assigned is the current value of that element of "a", followed by a blank space, followed by the content of the second field (simple string concatenation).

In other words, the first field of each input line is an index of the array, and all the second fields that share that first field are concatenated together as its value.

In the END block, the whole array is scanned and each index is printed together with the value of its element.

This is what blows me away, "Index in arrays can be any string". That is handy. I would probably know that if I read the freakin manual.