LinuxQuestions.org

xmrkite 10-10-2007 05:31 PM

Use sed to find and replace a url
 
Hello. Sed is a tough learn.

I need to take several files, each with a bunch of URLs in them, and strip out parts of each URL.

The HTML in the files reads something like this:
Code:

<a href='http://www.yahoo.com/here-is-testpage-this-is-the-page.aspx'>
<a href='http://www.yahoo.com/here-is-goodpage-this-is-the-page.aspx'>
<a href='http://www.yahoo.com/here-is-badpage-this-is-the-page.aspx'>

I need to end up with just
Code:

testpage
goodpage
badpage

So I need to get rid of the
Code:

<a href='http://www.yahoo.com/
and the
Code:

here-is-
and then the
Code:

-this-is-the-page.aspx'>
Currently, I open the files in gedit and use find and replace: I search for "here-is-" and replace it with nothing, which deletes it.

There must be a way to do this with sed. I want to write a few scripts to do it automatically so that I don't have to do it by hand (there are a lot of files to process).
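The gedit routine described above (find a fixed string, replace it with nothing) maps directly onto sed: one `s` command per string, chained with `-e`. This is a minimal sketch, assuming every link has exactly the shape shown in the post; the file name `links.html` and the function name `strip_links` are hypothetical.

```shell
#!/bin/sh
# Work in a scratch directory so the sample file doesn't clutter anything.
cd "$(mktemp -d)" || exit 1

# Sample input: the three links from the post.
printf '%s\n' \
  "<a href='http://www.yahoo.com/here-is-testpage-this-is-the-page.aspx'>" \
  "<a href='http://www.yahoo.com/here-is-goodpage-this-is-the-page.aspx'>" \
  "<a href='http://www.yahoo.com/here-is-badpage-this-is-the-page.aspx'>" \
  > links.html

# strip_links (hypothetical name): one s command per fixed string, each
# replaced with nothing, like three gedit find/replace passes.
# '@' is the delimiter, so the slashes in the URL need no escaping;
# '\.' matches a literal dot rather than "any character".
strip_links() {
  sed -e "s@<a href='http://www\.yahoo\.com/@@" \
      -e 's@here-is-@@' \
      -e "s@-this-is-the-page\.aspx'>@@" \
      "$@"
}

strip_links links.html   # prints testpage, goodpage, badpage
```

Three fixed-string deletions are easy to read, but they stop working as soon as any piece of the URL differs from what is hard-coded; a single pattern with a capture group is more flexible.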

Tinkster 10-10-2007 06:22 PM

Something like this?

Code:

$ cat test.html                                           
<a href='http://www.yahoo.com/here-is-testpage-this-is-the-page.aspx'>
<a href='http://www.yahoo.com/here-is-goodpage-this-is-the-page.aspx'>
<a href='http://www.yahoo.com/here-is-badpage-this-is-the-page.aspx'>
$ sed -r "s@^.+here-is-(.+)-this-is-the-page\.aspx'>@\1@" test.html
testpage
goodpage
badpage
$
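To see why the one-liner works, the same pattern can be split into commented pieces. This is a sketch rather than Tinkster's own code: the variable names and the `extract_name` function are hypothetical, `-E` is used in place of GNU sed's equivalent `-r`, and the dot in `.aspx` is written as `\.` so it only matches a literal dot.

```shell
#!/bin/sh
# Tinkster's pattern, assembled from commented pieces.
prefix='^.+here-is-'                 # start of line (^), one or more of any char (.+), then "here-is-"
name='(.+)'                          # group 1: the page name we want to keep
suffix="-this-is-the-page\.aspx'>"   # literal tail; \. is a real dot

# s@PATTERN@\1@ replaces the whole match with just group 1.
# -E turns on extended regex, so ( ) and + need no backslashes.
extract_name() {
  sed -E "s@${prefix}${name}${suffix}@\1@"
}

printf '%s\n' "<a href='http://www.yahoo.com/here-is-testpage-this-is-the-page.aspx'>" | extract_name
# prints: testpage
```

The key idea is the capture group: `( )` remembers whatever it matched, and `\1` in the replacement puts that text back while everything else in the match is thrown away.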

If this does what you want, just do
Code:

find . -type f -name \*.html -exec sed -r -i "s@^.+here-is-(.+)-this-is-the-page\.aspx'>@\1@" {} \;
and it will "fix" all *.html files in the current directory and
all its subdirectories. Note that -i edits the files in place, so keep a backup.
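Because `-i` overwrites the files, it can be worth previewing the changes first and keeping a copy of each original. A minimal sketch, assuming the files end in `.html`; the sample tree `site/page.html` is made up for the demo. With `-n`, sed prints nothing by default and the `p` flag prints only lines where the substitution happened; `-i.bak` keeps each original as `FILE.bak` (the attached-suffix form is accepted by both GNU and BSD sed).

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample tree: one HTML file in a subdirectory.
mkdir -p site
printf '%s\n' \
  "<p>intro</p>" \
  "<a href='http://www.yahoo.com/here-is-testpage-this-is-the-page.aspx'>" \
  > site/page.html

sed_script="s@^.+here-is-(.+)-this-is-the-page\.aspx'>@\1@"

# 1) Dry run: only lines that the substitution changed are printed.
#    The files themselves are not modified.
find . -type f -name '*.html' -exec sed -E -n "${sed_script}p" {} +

# 2) Real run: edit in place, keeping the original as page.html.bak.
#    '{} +' passes many files to a single sed call instead of one per file.
find . -type f -name '*.html' -exec sed -E -i.bak "$sed_script" {} +
```

The `.bak` copies end in `.html.bak`, so they do not match `-name '*.html'` and are left alone if the command is run again.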


Cheers,
Tink

xmrkite 10-10-2007 06:56 PM

You are the man! That worked exactly how I needed it. If only I understood why.
-Thanks

syg00 10-10-2007 07:14 PM

It probably ain't "sed" that's the tough learn; it's regex.
There are plenty of threads here recommending tutorials, but it's still a tough slog when you're starting out.
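One regex detail that often trips people up: sed replaces only the part of the line that matched, and `.+` matches as much as it can. A small illustration, assuming the real files might have link text after the `'>` (the sample line is made up); the tighter variant uses a negated bracket expression `[^']+` plus a trailing `.*`.

```shell
#!/bin/sh
# Hypothetical line: same link, but with link text and </a> after it.
line="<a href='http://www.yahoo.com/here-is-testpage-this-is-the-page.aspx'>Test page</a>"

# Pattern from the thread: the text after "'>" is outside the match,
# so it survives. Prints: testpageTest page</a>
printf '%s\n' "$line" | sed -E "s@^.+here-is-(.+)-this-is-the-page\.aspx'>@\1@"

# Tighter variant: [^'] cannot cross the closing quote of the href,
# and the trailing .* consumes the rest of the line. Prints: testpage
printf '%s\n' "$line" | sed -E "s@^.*here-is-([^']+)-this-is-the-page\.aspx'>.*@\1@"
```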

Tinkster 10-10-2007 07:20 PM

Quote:

Originally Posted by xmrkite (Post 2920248)
You are the man! That worked exactly how I needed it. If only I understood why.
-Thanks

Thanks for the praise ;}

Which bit is giving trouble? Happy to elaborate :}



Cheers,
Tink


All times are GMT -5.