03-02-2005, 12:56 AM #1
Member
Registered: Oct 2004
Location: Texas
Distribution: Ubuntu - Home, RHEL4 - Server
Posts: 96
Sed/Awk command help needed.
Ok, here is my question.
I have a block of HTML like this, and I want to grab the folder names (labeled folder, folder2, and folder3 in the sample below) with a sed or awk command. Does anyone here have any idea how to do that? Keep in mind that the whole thing is on one line (that's what is making it difficult for me).
<br></span><a class="header3" href="/blah/blah2/folder/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah3/blah4/folder2/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah5/blah6/folder3/index.html">Name</a><span class="text11"></span><br>
I would appreciate any help anyone can offer! Let me know if I need to provide more details.
Thanks!
-farmerjoe
03-02-2005, 03:38 AM #2
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,690
Hi,
This should work:
awk 'BEGIN { RS="href=\"" } { print $1 }' infile | sed -n 's/\(.*\)">.*/\1/p'
I used sed and awk for this; there are probably other ways, but this came to mind first.
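One of those other ways, as a sketch: if your grep supports the -o option (GNU and BSD grep do; it is not part of POSIX), it prints each href="..." match on its own line, so the single long input line stops being a problem. The html variable and the extract_hrefs function name are illustrative, not from the thread.

```shell
# The one-line sample from the question.
html='<br></span><a class="header3" href="/blah/blah2/folder/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah3/blah4/folder2/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah5/blah6/folder3/index.html">Name</a><span class="text11"></span><br>'

# Hypothetical helper: reads HTML on stdin, prints one href path per line.
# Assumes grep -o is available (GNU/BSD extension, not POSIX).
extract_hrefs() {
    # -o prints every match on a separate line; [^"]* stops at the closing quote.
    grep -o 'href="[^"]*"' |
        # Strip the leading href=" and the trailing quote, leaving only the path.
        sed 's/^href="//; s/"$//'
}

printf '%s\n' "$html" | extract_hrefs
```

With the sample input this prints the same three paths as the awk/sed pipeline.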
The problem is split in two. First, awk splits the whole input line into parts that are easier to work with:
awk 'BEGIN { RS="href=\"" } { print $1 }' infile
In GNU awk (gawk) the record separator is not limited to one character, so the complete string href=" is set as the record separator (RS), and only the first field of each record ($1, which by default ends at the first whitespace) is printed. Using your example input you will end up with 4 lines:
<br></span><a
/blah/blah2/folder/index.html">Name</a><span
/blah3/blah4/folder2/index.html">Name</a><span
/blah5/blah6/folder3/index.html">Name</a><span
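A caveat on this first step: POSIX leaves a multi-character RS unspecified (gawk and mawk treat it as a regular expression; other awks may only use its first character). A multi-character FS, however, is an extended regular expression in any POSIX awk, so a more portable sketch splits the line into fields on href=" instead of records and pulls the path out of each field, doing both steps in one awk call. The split_on_href name is hypothetical.

```shell
# The one-line sample from the question.
html='<br></span><a class="header3" href="/blah/blah2/folder/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah3/blah4/folder2/index.html">Name</a><span class="text11"></span><br><a class="header3" href="/blah5/blah6/folder3/index.html">Name</a><span class="text11"></span><br>'

# Hypothetical helper: split on href=" as a field separator (portable),
# then cut each field at the next double quote.
split_on_href() {
    awk 'BEGIN { FS = "href=\"" }   # multi-character FS is an ERE in POSIX awk
    {
        # Field 1 is whatever precedes the first href=", so start at field 2.
        for (i = 2; i <= NF; i++) {
            # The path runs up to the next double quote.
            split($i, part, "\"")
            print part[1]
        }
    }'
}

printf '%s\n' "$html" | split_on_href
```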
Secondly, using sed, the part you are looking for is extracted and printed:
sed -n 's/\(.*\)">.*/\1/p'
The -n option and the p flag make sure that only lines where the substitution succeeded are printed. In this case the first line (<br></span><a) contains no "> and is not printed.
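To see what -n and p change, a quick sketch that feeds the four intermediate lines shown above to the same sed expression, once without them and once with them:

```shell
# The four intermediate lines produced by the awk step.
lines='<br></span><a
/blah/blah2/folder/index.html">Name</a><span
/blah3/blah4/folder2/index.html">Name</a><span
/blah5/blah6/folder3/index.html">Name</a><span'

# Without -n, sed prints every line, substituted or not (4 lines of output).
printf '%s\n' "$lines" | sed 's/\(.*\)">.*/\1/'

# With -n plus the p flag, only lines where the substitution happened are printed (3 lines).
printf '%s\n' "$lines" | sed -n 's/\(.*\)">.*/\1/p'
```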
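The search pattern looks for 2 things:
1) anything (.*)
2) the string "> followed by anything after it (">.*)
Because .* is greedy, the first part matches everything up to the last "> on the line; each line here contains only one ">, so that is exactly the path.
The \( and \) are special: whatever the pattern between them matches can be referred to as \1 in the replacement string.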
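A small illustration of the \1 back-reference and of greediness, on made-up input strings (not from the question). With two "> sequences on one line, the pattern from the post captures up to the last one; a bracket expression such as [^"]* cannot cross a double quote and stops at the first one. The helper names capture_greedy and capture_first are hypothetical.

```shell
# Hypothetical helpers; both read lines on stdin.
capture_greedy() { sed -n 's/\(.*\)">.*/\1/p'; }      # the pattern from the post
capture_first()  { sed -n 's/^\([^"]*\)">.*/\1/p'; }  # stops at the first double quote

# One "> on the line: \1 is everything before it.
printf '%s\n' '/a/b/index.html">Name</a>' | capture_greedy
# prints: /a/b/index.html

# Two "> on the line: greedy .* makes \1 run up to the LAST ">.
printf '%s\n' '/x/index.html">A</a><a href="/y/index.html">B' | capture_greedy
# prints: /x/index.html">A</a><a href="/y/index.html

# [^"]* cannot cross a double quote, so \1 ends at the FIRST one.
printf '%s\n' '/x/index.html">A</a><a href="/y/index.html">B' | capture_first
# prints: /x/index.html
```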
All this put together you'll end up with:
/blah/blah2/folder/index.html
/blah3/blah4/folder2/index.html
/blah5/blah6/folder3/index.html
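Those are full paths, while the question asks for the folder names. One extra sed step can keep only the directory directly above the file name; this sketch assumes every path ends in /<folder>/<file>, and folder_name is a hypothetical helper name.

```shell
# The three paths produced by the pipeline.
paths='/blah/blah2/folder/index.html
/blah3/blah4/folder2/index.html
/blah5/blah6/folder3/index.html'

# Hypothetical helper: keep the path component just before the last slash.
# | is used as the s delimiter so the slashes need no escaping.
folder_name() {
    sed -n 's|.*/\([^/]*\)/[^/]*$|\1|p'
}

printf '%s\n' "$paths" | folder_name
# prints:
# folder
# folder2
# folder3
```

Appended to the original command it would read: awk 'BEGIN { RS="href=\"" } { print $1 }' infile | sed -n 's/\(.*\)">.*/\1/p' | sed -n 's|.*/\([^/]*\)/[^/]*$|\1|p'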
Hope this helps.
Last edited by druuna; 03-02-2005 at 03:40 AM.
03-02-2005, 10:58 AM #3
Member
Registered: Oct 2004
Location: Texas
Distribution: Ubuntu - Home, RHEL4 - Server
Posts: 96
Original Poster
Ahhhh. Very helpful. Not only did you solve my original problem, but you also showed me the answer in a very informative way! Haven't you come to my rescue before?
Thanks again!
-farmerjoe
03-02-2005, 11:13 AM #4
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,690
Hi again,
Quote:
Haven't you come to my rescue before?
Yep, a few days back (Sed Help Needed..Cutting out peice of file).
The sed part should look familiar.
You're welcome!
All times are GMT -5.