LinuxQuestions.org
Old 02-28-2010, 11:01 PM   #1
sharky
Member
 
Registered: Oct 2002
Posts: 569

Rep: Reputation: 84
sed or awk help


How can I take the following example from a text file

/this/is/the/dir P1
/this/is/the/dir P2
/this/is/the/dir P3
/this/is/another/dir P1
/this/is/another/dir P3

and generate the following using sed or awk (or any scripting language)?

/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3

I'm trying to generate a report showing what projects are using which tools and I have 350 projects and over 1000 tools to parse through. Any help would be greatly appreciated.
 
Old 03-01-2010, 12:30 AM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
First, please confirm that you need to key on the actual content of the first field---ie that you don't know always what will be there.

Here's a stab at how this might go (pseudocode)

Code:
while reading the file, one line at a time:
   read the first field into a variable F1, and into a variable TMP
   continue reading as long as the first field matches F1
      remove the first field
      append the second field to TMP
   end inner loop
   write TMP to the output file
end outer loop
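
A minimal awk sketch of the loop above, for anyone who wants something runnable. It assumes lines sharing a first field are adjacent in the file (as in the sample); the function name `merge_adjacent` and the variables `key` and `line` are made up for illustration and play the roles of F1 and TMP.

```shell
# Sketch only: assumes lines with the same first field are adjacent.
merge_adjacent() {
  awk '
    NR == 1 || $1 != key {        # first line, or the first field changed
      if (NR > 1) print line      # flush the group that just ended
      key = $1                    # remember the new first field (F1)
      line = $1                   # start a new output line (TMP)
    }
    { line = line " " $2 }        # append the second field of every line
    END { if (NR > 0) print line }  # flush the last group
  ' "$@"
}

printf '%s\n' '/this/is/the/dir P1' '/this/is/the/dir P2' \
              '/this/is/another/dir P1' | merge_adjacent
# prints:
# /this/is/the/dir P1 P2
# /this/is/another/dir P1
```

If the input is not grouped, running it through `sort` first should make this approach work, at the cost of losing the original order.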
 
Old 03-01-2010, 02:03 AM   #3
murugesan
Member
 
Registered: May 2003
Location: Bangalore ,Karnataka, India, Asia, Earth, Solar system, milky way galaxy, black hole
Distribution: murugesan openssl
Posts: 181

Rep: Reputation: 29
Example is given here:
http://murugesan.webnode.com/technic...r-redirection/
 
Old 03-01-2010, 03:03 AM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i,a[i] }' file
/this/is/the/dir  P1 P2 P3
/this/is/another/dir  P1 P3
 
Old 03-01-2010, 03:04 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by murugesan View Post
lots of redundant steps in that script.
 
Old 03-01-2010, 03:43 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Nice, ghostdog! I'd only remove the comma from the print statement to avoid double spaces:
Code:
# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i a[i] }' file
/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3
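
One caveat worth keeping in mind: awk does not define the order in which `for (i in a)` visits the keys, so the output lines may not come out in the order the directories first appeared. A possible variant that keeps first-seen order is sketched below; the second array `order`, the counter `n`, and the function name `merge_in_order` are illustrative names, not something from the posts above.

```shell
# Sketch: same accumulation as the one-liner, plus an array that
# records the order in which each first field was first seen.
merge_in_order() {
  awk '
    !($1 in a) { order[++n] = $1 }   # new key: remember its position
    { a[$1] = a[$1] " " $2 }         # append second field under that key
    END {
      for (j = 1; j <= n; j++)       # walk keys in first-seen order
        print order[j] a[order[j]]
    }
  ' "$@"
}

printf '%s\n' '/this/is/the/dir P1' '/this/is/the/dir P2' '/this/is/the/dir P3' \
              '/this/is/another/dir P1' '/this/is/another/dir P3' | merge_in_order
# prints:
# /this/is/the/dir P1 P2 P3
# /this/is/another/dir P1 P3
```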
 
Old 03-01-2010, 06:18 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
ghostdog ... I bow to you as the awk god ... are you able to point me to examples or tutorials
that have some of the funky stuff you come up with?

Please ignore me ... as a fool I have only just looked at all the references in your signature

Last edited by grail; 03-01-2010 at 06:25 AM. Reason: blind idiot
 
Old 03-01-2010, 06:42 AM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by grail View Post
ghostdog ... I bow to you as the awk god ... are you able to point me to examples or tutorials
that have some of the funky stuff you come up with?

Please ignore me ... as a fool I have only just looked at all the references in your signature
I can't speak for the resident "AWK-meister", but a lot of programmers come up with "funky stuff" by good old trial and error.
 
Old 03-01-2010, 08:05 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by grail View Post
ghostdog ... I bow to you as the awk god ... are you able to point me to examples or tutorials
that have some of the funky stuff you come up with?
Just read the link in my sig; it points to the Gawk manual. Also go to awk.info and have a look.
 
Old 03-01-2010, 10:36 AM   #10
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by pixellany View Post
I can't speak for the resident "AWK-meister", but a lot of programmers come up with "funky stuff" by good old trial and error.
For me it's mostly error.
 
Old 03-01-2010, 11:35 AM   #11
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by sharky View Post
For me it's mostly error.
I'd call it... experience. What about your issue? Did the code suggested by ghostdog74 work for you? Can you show us what you've tried so far?
 
Old 03-01-2010, 12:02 PM   #12
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by colucix View Post
Nice, ghostdog! I'd only remove the comma from the print statement to avoid double spaces:
Code:
# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i a[i] }' file
/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3
It's hard to say for certain because I'm dealing with such a large amount of data, but this appears to work like a charm.

Truly an amazing one liner. Unfortunately I don't have a clue how it works.
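
With that much data, one rough way to gain some confidence is to compare counts rather than eyeball the report. The sketch below assumes every input line has exactly two fields; `check_merge` and its two file arguments are hypothetical names. It checks that the report has one line per distinct first field and that the number of project entries in the report equals the number of input lines.

```shell
# Sketch: sanity-check a merged report against the original file.
# $1 = original two-column file, $2 = merged report (hypothetical paths).
check_merge() {
  keys=$(awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }' "$1")  # distinct first fields
  pairs=$(awk 'NF { n++ } END { print n + 0 }' "$1")                # non-blank input lines
  lines=$(awk 'NF { n++ } END { print n + 0 }' "$2")                # non-blank report lines
  tokens=$(awk 'NF { n += NF - 1 } END { print n + 0 }' "$2")       # entries after each key
  echo "keys=$keys report_lines=$lines input_lines=$pairs report_entries=$tokens"
  [ "$keys" -eq "$lines" ] && [ "$pairs" -eq "$tokens" ]
}
```

Something like `check_merge file report.txt && echo "counts agree"` would then flag a mismatch through the exit status. Matching counts do not prove the report is right, but a mismatch shows something went wrong.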
 
Old 03-01-2010, 12:15 PM   #13
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by sharky View Post
Truly an amazing one liner. Unfortunately I don't have a clue how it works.
Maybe the following will help a little, but I strongly suggest reading a good reference manual (the official gawk manual being the best, in my opinion). The statement
Code:
a[$1]=a[$1]" "$2
assigns values to array "a". Index in arrays can be any string, so that here we can use the first field $1 as index. The value is: the current value of the corresponding element of "a", followed by a blank space, followed by the content of the second field (simple string concatenation).

In other words the first field of each line of the input file is an index of the array, whereas the corresponding second fields are the values concatenated together.

In the END block the whole array is scanned, and each index is printed together with the value of its array element.
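
To watch the accumulation described above happen, here is a small sketch that prints the affected array element after each input line. The function name `trace_merge` is made up for illustration; the assignment itself is the one from the one-liner.

```shell
# Sketch: show how a[$1] grows as each line is read.
trace_merge() {
  awk '
    {
      a[$1] = a[$1] " " $2     # append field 2 under the key in field 1
      printf "line %d: a[\"%s\"] = \"%s\"\n", NR, $1, a[$1]
    }
  ' "$@"
}

printf '%s\n' '/this/is/the/dir P1' '/this/is/the/dir P2' \
              '/this/is/another/dir P1' | trace_merge
# prints:
# line 1: a["/this/is/the/dir"] = " P1"
# line 2: a["/this/is/the/dir"] = " P1 P2"
# line 3: a["/this/is/another/dir"] = " P1"
```

Note the leading blank in each value: that is why printing `i a[i]` without the comma already gives a single space between the directory and the first project.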
 
Old 03-02-2010, 05:17 PM   #14
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by colucix View Post
Maybe the following will help a little, but I strongly suggest reading a good reference manual (the official gawk manual being the best, in my opinion). The statement
Code:
a[$1]=a[$1]" "$2
assigns values to array "a". Index in arrays can be any string, so that here we can use the first field $1 as index. The value is: the current value of the corresponding element of "a", followed by a blank space, followed by the content of the second field (simple string concatenation).

In other words the first field of each line of the input file is an index of the array, whereas the corresponding second fields are the values concatenated together.

In the END block the whole array is scanned, and each index is printed together with the value of its array element.
This is what blows me away, "Index in arrays can be any string". That is handy. I would probably know that if I read the freakin manual.
 
  

