LinuxQuestions.org
03-01-2010, 12:01 AM   #1
sharky
Member
Registered: Oct 2002
Posts: 396
Reputation: 37
sed or awk help


How can I take the following example from a text file

/this/is/the/dir P1
/this/is/the/dir P2
/this/is/the/dir P3
/this/is/another/dir P1
/this/is/another/dir P3

and generate the following using sed or awk (or any scripting language)?

/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3

I'm trying to generate a report showing what projects are using which tools and I have 350 projects and over 1000 tools to parse through. Any help would be greatly appreciated.
 
03-01-2010, 01:30 AM   #2
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802
Reputation: 728
First, please confirm that you need to key on the actual content of the first field, i.e. that you don't always know what will be there.

Here's a stab at how this might go (pseudocode):

Code:
while reading the file, one line at a time:
   read the first field into a variable F1, and into a variable TMP
   continue reading as long as the first field matches F1
      remove the first field
      append the second field to TMP
   end inner loop
   write TMP to the output file
end outer loop
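A minimal awk sketch of the pseudocode above, assuming lines that share a first field are already adjacent (for example after piping the file through `sort`); if they are not, a directory that reappears later in the file would start a second group. `group_adjacent` is a hypothetical function name, and blank lines are simply skipped.

```shell
# Streaming version of the pseudocode: one group is built at a time.
# Assumption: lines with the same first field are adjacent (pre-sorted).
# group_adjacent is a hypothetical name chosen for this example.
group_adjacent() {
    awk '
        # Skip blank lines.
        NF == 0 { next }
        # A new first field starts a new group: flush the previous one.
        $1 != cur {
            if (cur != "") print line
            cur = $1
            line = $1
        }
        # Append the second field to the current group.
        { line = line " " $2 }
        # Flush the last group once the input is exhausted.
        END { if (cur != "") print line }
    ' "$@"
}
```

Usage, under the same assumption: `sort file | group_adjacent`. Only the current group is held in memory, which is the main appeal of this streaming approach for large files.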
 
03-01-2010, 03:03 AM   #3
murugesan
Member
Registered: May 2003
Posts: 149
Reputation: 28
An example is given here:
http://murugesan.webnode.com/technic...r-redirection/
 
03-01-2010, 04:03 AM   #4
ghostdog74
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5
Reputation: 241
Code:
# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i,a[i] }' file
/this/is/the/dir  P1 P2 P3
/this/is/another/dir  P1 P3
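One caveat with this one-liner: `for (i in a)` visits array indices in an unspecified order, so the directories are not guaranteed to come out in the order they appear in the file. A minimal sketch of an order-preserving variant, assuming first-seen order is what the report needs; `group_in_order` and the helper array `order` are hypothetical names.

```shell
# Like the one-liner, but remembers the order in which each first field
# was first seen. group_in_order and order[] are hypothetical names.
group_in_order() {
    awk '
        NF == 0 { next }
        # First time we meet this directory: record its position.
        !($1 in a) { order[++n] = $1 }
        # Append the project (second field) to that directory entry.
        { a[$1] = a[$1] " " $2 }
        # Walk directories in first-seen order; a[] values already start
        # with a space, so index and value are simply concatenated.
        END { for (k = 1; k <= n; k++) print order[k] a[order[k]] }
    ' "$@"
}
```

The extra `order` array costs one entry per distinct directory, which should be negligible next to the concatenated values themselves.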
 
03-01-2010, 04:04 AM   #5
ghostdog74
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5
Reputation: 241
Quote:
Originally Posted by murugesan
Lots of redundant steps in that script.
 
03-01-2010, 04:43 AM   #6
colucix
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509
Reputation: 1957
Nice, ghostdog! I'd only remove the comma from the print statement to avoid double spaces:
Code:
# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i a[i] }' file
/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3
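Why the comma doubled the space: in `print i, a[i]` the comma makes awk emit the output field separator `OFS` (a single space by default), and every `a[i]` already begins with the space added by `" "$2`. A minimal sketch of another way around it, which only puts a separator between projects so the comma (and any `OFS` you choose) can stay; `group_ofs` is a hypothetical name.

```shell
# Variant that never stores a leading separator, so print can keep its
# comma and OFS controls every separator. group_ofs is hypothetical.
group_ofs() {
    awk '
        NF == 0 { next }
        {
            # First project for this directory: store it as is.
            # Later projects: add OFS and then the project.
            if ($1 in a) a[$1] = a[$1] OFS $2
            else         a[$1] = $2
        }
        # The comma makes print emit OFS between index and value.
        END { for (i in a) print i, a[i] }
    ' "$@"
}
```

As with the original one-liner, the output order is unspecified, so pipe it through `sort` if a stable report is needed.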
 
03-01-2010, 07:18 AM   #7
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,627
Reputation: 1947
ghostdog ... I bow to you as the awk god ... are you able to point me to examples or tutorials that have some of the funky stuff you come up with?

Please ignore me ... like a fool, I have only just looked at all the references in your signature.

Last edited by grail; 03-01-2010 at 07:25 AM. Reason: blind idiot
 
03-01-2010, 07:42 AM   #8
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802
Reputation: 728
Quote:
Originally Posted by grail
ghostdog ... I bow to you as the awk god ... are you able to point me to examples or tutorials that have some of the funky stuff you come up with?
I can't speak for the resident "AWK-meister", but a lot of programmers come up with "funky stuff" by good old trial and error.
 
03-01-2010, 09:05 AM   #9
ghostdog74
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5
Reputation: 241
Quote:
Originally Posted by grail
ghostdog ... I bow to you as the awk god ... are you able to point me to examples or tutorials that have some of the funky stuff you come up with?
Just read the link in my sig; it points to the gawk manual. Also have a look at awk.info.
 
03-01-2010, 11:36 AM   #10
sharky
Member
Registered: Oct 2002
Posts: 396
Original Poster
Reputation: 37
Quote:
Originally Posted by pixellany
I can't speak for the resident "AWK-meister", but a lot of programmers come up with "funky stuff" by good old trial and error.
For me it's mostly error.
 
03-01-2010, 12:35 PM   #11
colucix
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509
Reputation: 1957
Quote:
Originally Posted by sharky
For me it's mostly error.
I'd call it... experience. What about your issue? Did the code suggested by ghostdog74 work for you? Can you show us what you've tried so far?
 
03-01-2010, 01:02 PM   #12
sharky
Member
Registered: Oct 2002
Posts: 396
Original Poster
Reputation: 37
Quote:
Originally Posted by colucix
Nice, ghostdog! I'd only remove the comma from the print statement to avoid double spaces:
Code:
# awk '{a[$1]=a[$1]" "$2}END{for(i in a)print i a[i] }' file
/this/is/the/dir P1 P2 P3
/this/is/another/dir P1 P3
It's hard to say for certain because I'm dealing with such a large amount of data, but this appears to work like a charm.

Truly an amazing one-liner. Unfortunately I don't have a clue how it works.
 
03-01-2010, 01:15 PM   #13
colucix
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509
Reputation: 1957
Quote:
Originally Posted by sharky
Truly an amazing one-liner. Unfortunately I don't have a clue how it works.
Maybe the following will help a little, but I strongly suggest reading a good reference manual (the official gawk manual being the best, in my opinion). The statement
Code:
a[$1]=a[$1]" "$2
assigns values to the array "a". Index in arrays can be any string, so that here we can use the first field $1 as index. The value is the current value of the corresponding element of "a", followed by a blank space, followed by the content of the second field (simple string concatenation).

In other words, the first field of each input line is used as an index into the array, while the corresponding second fields are concatenated together as that element's value.

In the END block, the whole array is scanned and each index is printed together with the value of its element.
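To watch the concatenation happen, a minimal tracing sketch: it runs the same assignment and, after every input line, prints the line number, the index used and the value now stored under it. `trace_concat` is a hypothetical name for this illustration only.

```shell
# Trace how a[$1] grows line by line. trace_concat is hypothetical.
trace_concat() {
    awk '
        {
            # Same statement as the one-liner.
            a[$1] = a[$1] " " $2
            # Show the index and the value stored under it so far.
            printf "line %d: a[\"%s\"] = \"%s\"\n", NR, $1, a[$1]
        }
    ' "$@"
}
```

On the sample data from the first post, line 3 of the trace reads `line 3: a["/this/is/the/dir"] = " P1 P2 P3"`. The leading space in each value is also why `print i, a[i]` produced a double space.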
 
03-02-2010, 06:17 PM   #14
sharky
Member
Registered: Oct 2002
Posts: 396
Original Poster
Reputation: 37
Quote:
Originally Posted by colucix
Index in arrays can be any string, so that here we can use the first field $1 as index.
This is what blows me away: "Index in arrays can be any string". That is handy. I probably would have known that if I'd read the freakin' manual.
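Because indices are arbitrary strings, a key can also combine several fields: in a subscript such as `seen[$1, $2]`, awk joins the fields with `SUBSEP`. A minimal sketch, assuming the report might also want a per-directory project count and that the input may contain repeated `dir project` lines (neither is stated in the thread); `count_projects` and the arrays `seen` and `count` are hypothetical names.

```shell
# Count distinct projects per directory, ignoring repeated pairs.
# count_projects, seen[] and count[] are hypothetical names.
count_projects() {
    awk '
        NF == 0 { next }
        # seen[$1, $2]++ yields the old count (0 the first time), so the
        # pattern is true only on the first occurrence of each pair.
        !seen[$1, $2]++ { count[$1]++ }
        END { for (d in count) print d, count[d] }
    ' "$@"
}
```

The same `!seen[...]++` pattern could be placed in front of the original one-liner to keep duplicate lines from listing a project twice.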
All times are GMT -5.