LinuxQuestions.org
03-02-2005, 12:56 AM   #1
farmerjoe
Member
 
Registered: Oct 2004
Location: Texas
Distribution: Ubuntu - Home, RHEL4 - Server
Posts: 96

Rep: Reputation: 15
Sed/Awk command help needed.


OK, here is my question.
I have a block of HTML like this, and I want to grab the folder names (folder, folder2, and folder3) with a sed or awk command. Does anyone here have any idea how to do that? Keep in mind that the whole thing is on one line (that's what's making it difficult for me).


<br></span><a class="header3" href="/blah/blah2/folder/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah3/blah4/folder2/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah5/blah6/folder3/index.html">Name</a><span class="text11"></span><br>



I would appreciate any help anyone can offer! Let me know if I need to provide more details.

Thanks!
-farmerjoe
 
03-02-2005, 03:38 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
Hi,

This should work:

awk 'BEGIN { RS="href=\"" } { print $1 }' infile | sed -n 's/\(.*\)">.*/\1/p'

This uses both sed and awk. There are probably other ways to do it, but this one came to mind first.

The problem is split in two. First, using awk, the whole input line is split up into parts that are easier to work with:

awk 'BEGIN { RS="href=\"" } { print $1 }' infile

In (GNU) awk the record separator is not limited to a single character, so the complete string href=" is set as the record separator (RS), and only the first field of each record is printed. Using your example input you will end up with 4 lines:

<br></span><a
/blah/blah2/folder/index.html">Name</a><span
/blah3/blah4/folder2/index.html">Name</a><span
/blah5/blah6/folder3/index.html">Name</a><span
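
To see the record splitting on its own, here is a minimal sketch. The shortened sample line and the variable names line and records are made up for illustration, and it assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves that behaviour unspecified). NR is printed as well so each record is easy to spot:

```shell
# Hypothetical, shortened sample: everything on one line, as in the first post.
line='<br><a class="header3" href="/a/b/folder/index.html">Name</a><br><a class="header3" href="/c/d/folder2/index.html">Name</a><br>'

# Every occurrence of href=" ends one record and starts the next.
# NR is the record number, $1 is the first whitespace-delimited field.
records=$(printf '%s\n' "$line" | awk 'BEGIN { RS="href=\"" } { print NR ": " $1 }')

printf '%s\n' "$records"
```

Record 1 is the junk in front of the first link. Every later record starts with the path from one href.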

Secondly, using sed, the part you are looking for is extracted and printed:

sed -n 's/\(.*\)">.*/\1/p'

The -n option suppresses sed's default output and the p flag prints only the lines where the substitution succeeded, so only hits are printed. In this case the first line (<br></span><a) will not be printed.

sed -n 's/\(.*\)">.*/\1/p'
The search string looks for 2 things:
1) 'anything' (.*)
2) "> followed by 'anything' that comes after it (.*)

The \( and \) are special: everything matched between them can be referred to as \1 in the replacement string.
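
A small sketch of the capture group on a single line, separate from the awk step. The variable names piece, path and nomatch are only there for illustration:

```shell
# One line as it comes out of the awk step.
piece='/blah/blah2/folder/index.html">Name</a><span'

# \(.*\) captures everything in front of the "> and \1 puts back only that part.
path=$(printf '%s\n' "$piece" | sed -n 's/\(.*\)">.*/\1/p')
printf '%s\n' "$path"

# A line without "> does not match, so with -n and p nothing is printed.
nomatch=$(printf '%s\n' '<br></span><a' | sed -n 's/\(.*\)">.*/\1/p')
```

Here path holds /blah/blah2/folder/index.html and nomatch stays empty.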

Put all this together and you'll end up with:

/blah/blah2/folder/index.html
/blah3/blah4/folder2/index.html
/blah5/blah6/folder3/index.html
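
If only the folder names are wanted rather than the full paths, the sed expression can be narrowed. This is a sketch and not part of the command above: it assumes every link ends in /index.html, as in the sample, and it recreates the sample line in a temporary file (the variable names infile and folders are made up for illustration):

```shell
# Recreate the one-line sample from the first post in a temporary file.
infile=$(mktemp)
cat > "$infile" <<'EOF'
<br></span><a class="header3" href="/blah/blah2/folder/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah3/blah4/folder2/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah5/blah6/folder3/index.html">Name</a><span class="text11"></span><br>
EOF

# Same awk step as before. The sed step now captures only the path
# component directly in front of /index.html ([^/]* cannot cross a slash).
# | is used as the s delimiter so the slashes need no escaping.
folders=$(awk 'BEGIN { RS="href=\"" } { print $1 }' "$infile" |
  sed -n 's|.*/\([^/]*\)/index\.html">.*|\1|p')

printf '%s\n' "$folders"
rm -f "$infile"
```

With the sample input this prints folder, folder2 and folder3, one per line.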

Hope this helps.

Last edited by druuna; 03-02-2005 at 03:40 AM.
 
03-02-2005, 10:58 AM   #3
farmerjoe
Original Poster
Ahhhh. Very helpful. Not only did you solve my original problem, but you showed me the answer in a very informative way! Haven't you come to my rescue before?

Thanks again!
-farmerjoe
 
03-02-2005, 11:13 AM   #4
druuna
Hi again,

Quote:
Haven't you come to my rescue before?
Yep, a few days back (Sed Help Needed..Cutting out peice of file).
The sed part should look familiar.

Quote:
Thanks again!
You're welcome!
 
  


All times are GMT -5.