Sed/awk help with regular expressions needed
Hi guys,
I was given a rather large file (about 35,000 lines) and asked to create an .SQL file so I could import it into a Postgres database. I've already managed to do it, but I would like some input on how to make it easier the next time I have to do it. My problem is that the file contains large amounts of text with markup in it. For example, a typical small row would look something like this:
Code:
Some text goes here, then <a href="http://www.something.com">here</a> is a link. Here is some <b>more</b> text.
With the markup stripped, it should look like this:
Code:
Some text goes here, then here is a link. Here is some more text.
I would like to automate this, however. What I would like to do is something like this: awk < infile.txt > outfile.txt
Obviously this would take the input file, strip out the HTML tags, then write the result to outfile.txt. I've tried a few things, but I can't get my head around regular expressions on the command line. Any pointers on how to do this?
|
Code:
awk '{gsub(/<[^>]*>/,"")}1' infile.txt > outfile.txt
|
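For anyone puzzling over the regular expression in that one-liner, here is a minimal sketch that runs it on the sample row from the question, next to what should be an equivalent sed command. The variable names (row, awk_out, sed_out) are made up for illustration and are not part of the original answer.

```shell
# Sample row from the question, held in a shell variable (the name is hypothetical).
row='Some text goes here, then <a href="http://www.something.com">here</a> is a link. Here is some <b>more</b> text.'

# <[^>]*> matches a "<", then any run of characters that are not ">",
# then the closing ">". gsub() deletes every such match on the line.
# The trailing 1 is an always-true pattern, so awk prints each line.
awk_out=$(printf '%s\n' "$row" | awk '{ gsub(/<[^>]*>/, "") } 1')

# The same substitution with sed; the g flag replaces all matches on a line.
sed_out=$(printf '%s\n' "$row" | sed 's/<[^>]*>//g')

printf '%s\n' "$awk_out"
printf '%s\n' "$sed_out"
```

Both commands work one line at a time, so a tag that is split across two lines, or an attribute value that contains a literal ">", would not be removed cleanly. HTML entities such as &amp; are also left untouched.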
Awesome...I will give it a go tomorrow.
|
If you have lynx:
Code:
lynx>outfile.txt --force-html --dump -nolist infile.txt
Or, with html2text:
Code:
html2text>outfile.txt infile.txt
|