It's a simple problem.
I wrote an HTMLFilter class in Java a while ago.
If you have a JVM at home, you can try it out.
If you don't, it's easy to rewrite it in some other language like C++ or Perl.
Code:
public class HTMLFilter
{
    // Convenience overload for input held in a StringBuffer.
    public String filter(StringBuffer input)
    {
        return privateHelpMethod(input.toString());
    }

    // Returns the input with everything between '<' and '>' removed.
    public String filter(String input)
    {
        return privateHelpMethod(input);
    }

    private String privateHelpMethod(String input)
    {
        StringBuffer clean = new StringBuffer();
        boolean add = true; // false while we are inside a tag
        for (int i = 0; i < input.length(); i++)
        {
            char c = input.charAt(i);
            if (c == '<')
                add = false;      // a tag starts: stop copying
            else if (c == '>')
                add = true;       // the tag ends: resume copying
            else if (add)
                clean.append(c);  // ordinary text outside any tag
        }
        return clean.toString();
    }
}
If you have some HTML code like
<html><head>
<title>Uptime
www.thegate.nu</title>
</head>
<body text="#FFFFFF" bgcolor="#000000">
<p align="center"><font size="4" face="System"> 18:06:38 up 28 days, 2:09, 0 users, load average: 0.00, 0.01, 0.00
</font></p>
</body>
</html>
then after running it through the HTMLFilter class the output will look roughly like this (the blank lines left behind by the removed tags are omitted here)
Uptime
www.thegate.nu 18:40:30 up 28 days, 2:43, 0 users, load average: 0.00, 0.01, 0.00
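If you want to try it quickly, here is a minimal sketch of how it could be called. The class name HTMLFilterDemo and the collapseWhitespace helper are my own additions for illustration and are not part of the HTMLFilter class; stripTags is just a condensed copy of the same loop so the example compiles on its own. The sample string is a shortened version of the page above.

```java
public class HTMLFilterDemo {

    // Condensed copy of the HTMLFilter loop: drop everything between '<' and '>'.
    public static String stripTags(String input) {
        StringBuilder clean = new StringBuilder();
        boolean insideTag = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            } else if (!insideTag) {
                clean.append(c);
            }
        }
        return clean.toString();
    }

    // Hypothetical helper (not in the original class): squeeze each run of
    // whitespace, including the newlines left between tags, into one space.
    public static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head>\n<title>Uptime\nwww.thegate.nu</title>\n</head>\n"
                + "<body text=\"#FFFFFF\" bgcolor=\"#000000\">\n"
                + "<p align=\"center\"><font size=\"4\" face=\"System\"> 18:06:38 up 28 days</font></p>\n"
                + "</body>\n</html>";

        String text = stripTags(html);
        // Prints: Uptime www.thegate.nu 18:06:38 up 28 days
        System.out.println(collapseWhitespace(text));
    }
}
```

Keep in mind that the loop only looks at '<' and '>', so a '>' inside an attribute value, or the contents of script and style blocks, will not be handled the way a real HTML parser would handle them. For simple pages like the uptime one it is good enough.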