LinuxQuestions.org - Sed/awk help with regular expressions needed

Hi guys,

I was given a rather large file (about 35,000 lines) and asked to create a .sql file from it so I could import it into a Postgres database. I've already managed to do it, but I would like some input on how to make it easier the next time I have to do it.

My problem is that the file contains a lot of text with HTML markup in it. For example, a typical small row looks something like this:

Code:

Some text goes here, then <a href="http://www.something.com">here</a> is a link. Here is some <b>more</b> text.

I have to remove all the markup and turn it into something like this:

Code:

Some text goes here, then here is a link. Here is some more text.

What I did was paste all this text into GEdit, then use a regular expression plugin to remove all links and markup. The rest is easy from here on.

I would like to automate this, however. What I would like to do is something like this:

awk < infile.txt > outfile.txt

Obviously this would take the input file, strip out the HTML tags, then write the result to outfile.txt. I've tried a few things, but I can't get my head around regular expressions on the command line.

Any pointers on how to do this?
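
For what it's worth, here is a minimal sketch of the kind of filter described above. It is only a starting point and rests on a few assumptions: every tag opens and closes on the same line, the text itself contains no literal < or > characters, and HTML entities such as &amp; can be left alone. The function names strip_tags and strip_tags_awk are made up for this example.

```shell
#!/bin/sh
# Sketch only: delete anything that looks like <...> within a single line.
# Assumes tags never span lines and the text has no literal < or >.

# sed version: "<", then any run of characters that are not ">", then ">".
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# awk version of the same idea, using gsub() on the whole line.
strip_tags_awk() {
    awk '{ gsub(/<[^>]*>/, ""); print }'
}

# Try it on the sample row from the post.
sample='Some text goes here, then <a href="http://www.something.com">here</a> is a link. Here is some <b>more</b> text.'
printf '%s\n' "$sample" | strip_tags
# prints: Some text goes here, then here is a link. Here is some more text.
```

With either function defined, the command would take the shape asked for above, for example strip_tags < infile.txt > outfile.txt. The pattern uses [^>]* rather than .* so that each match stops at the first closing bracket instead of swallowing everything up to the last one on the line.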