I need a little search & replace help.
Here's how far I have gotten.
1. Started with a sitemap file like this:
Code:
<?xml version="1.0" encoding="utf-8"?>
<!--Created by Devintelligence.com Sitemap Generator-->
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://example.com</loc>
<lastmod>2009-04-04</lastmod>
<changefreq>daily</changefreq>
<priority>0.9</priority>
</url>
2. Got rid of all lines except those with URLs:
Code:
$ grep '^\s*<loc>' example.com.SiteMap.xml > example.com.UrlList.txt
The result looks like this:
Code:
<loc>http://example.com</loc>
<loc>http://example.com/forums/43.aspx</loc>
<loc>http://example.com/blogs/300.aspx</loc>
3. Now I need to strip the whitespace at the beginning of each line, keep just the URL between the opening and closing <loc> tags, and write that to a new file.
I'm not sure of the next step.
I've been reading about awk, sed, etc., and I'm just confused.
My final result should be a file with lines like this:
Code:
http://example.com
http://example.com/forums/43.aspx
http://example.com/blogs/300.aspx
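
One possible way to do step 3 is with sed, sketched below. It assumes each line holds exactly one <loc>...</loc> pair, as in the sample above. The output name example.com.Urls.txt is just a placeholder, and the here-document only recreates a small sample input (with leading spaces added for illustration) so the snippet can be run on its own.

```shell
# Recreate a small sample of example.com.UrlList.txt (indentation added for illustration)
cat > example.com.UrlList.txt <<'EOF'
    <loc>http://example.com</loc>
    <loc>http://example.com/forums/43.aspx</loc>
    <loc>http://example.com/blogs/300.aspx</loc>
EOF

# First expression: delete leading whitespace and the opening <loc> tag.
# Second expression: delete the closing </loc> tag and any trailing whitespace.
# example.com.Urls.txt is a hypothetical output file name.
sed -e 's/^[[:space:]]*<loc>//' \
    -e 's/<\/loc>[[:space:]]*$//' \
    example.com.UrlList.txt > example.com.Urls.txt

# Show the result
cat example.com.Urls.txt
```

The two substitutions are kept separate so each one is easy to read. If that works, steps 2 and 3 could probably be merged by running sed -n 's/^[[:space:]]*<loc>\(.*\)<\/loc>[[:space:]]*$/\1/p' directly on example.com.SiteMap.xml, which prints only the lines where the substitution matched.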