With regular expressions, "*" means "the previous character repeated 0 or more times." A "." means "any character," so ".*" means "any character repeated 0 or more times," which is the same thing as "*" in the shell itself. But "<*>" means "0 or more '<' followed by '>'," so both ">" and "<<<<<<<<<<<<<<>" are valid matches, while "<br>" as a whole is not (only its final ">" matches). I recommend you switch "<*>" to "<[^<]+>", and you also need sed instead of grep to make the substitution.
Here is what "<[^<]+>" means:
- "<": match a "<"
- "[...]": pick a character from the list
- "[^...]": pick a character besides one in the list
- "[^<]": pick a character that isn't '<'
- "+": match the preceding 1 or more times
- "[^<]+": 1 or more characters that aren't '<'
- ">": match a ">"
- "<[^<]+>": "<" followed by 1 or more non-'<' followed by ">"
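If you want to see the difference between the two patterns for yourself, here is a small sketch. It assumes your grep supports "-o" (print only the matched text), which GNU and BSD grep do but plain POSIX grep does not; the sample line is made up.

```shell
# Hypothetical sample line containing one tag.
sample='text <br> more'

# "<*>" is zero or more "<" followed by ">", so only the bare ">" matches.
old_match=$(printf '%s\n' "$sample" | grep -E -o '<*>')

# "<[^<]+>" is "<", one or more non-"<" characters, then ">": the whole tag.
new_match=$(printf '%s\n' "$sample" | grep -E -o '<[^<]+>')

echo "old pattern matched: $old_match"
echo "new pattern matched: $new_match"
```

The first line of output should show just ">" and the second "<br>".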
Here is how to use it:
Code:
sed -r "s/<[^<]+>//g" "$file" > tempFile
Here is what the sed line means:
- "s/.../.../": match the first part and replace with the second
- "s/...//": delete matching portions
- "g": apply the substitution to every match on the line, not just the first
- "s/<[^<]+>//g": delete all tags
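To try the substitution without touching a real file, something like the following should work. The sample line is invented for illustration, and I use "-E" here, which GNU and BSD sed both accept for extended regular expressions ("-r" is the GNU spelling of the same option).

```shell
# Hypothetical sample line; any HTML-like text would do.
html='<p>Hello, <b>world</b>!</p>'

# With "g", every tag on the line is deleted.
stripped=$(printf '%s\n' "$html" | sed -E 's/<[^<]+>//g')
echo "$stripped"

# Without "g", only the first tag on the line is deleted.
first_only=$(printf '%s\n' "$html" | sed -E 's/<[^<]+>//')
echo "$first_only"
```

The first echo should print "Hello, world!" and the second should still contain every tag except the leading "<p>".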
ta0kira
PS: The "/" used with sed above can be replaced with any other character if "/" is actually a part of your pattern. Example:
Code:
find ~ | sed "s@/home/`whoami`/@-> @"
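For comparison, here is a sketch of the same kind of substitution written both ways on a fixed string. The path and the user name "alice" are hypothetical and only there so the result is predictable.

```shell
# Hypothetical path used only for illustration.
path='/home/alice/docs/notes.txt'

# With "@" as the delimiter, the slashes in the pattern need no escaping.
with_at=$(printf '%s\n' "$path" | sed 's@/home/alice/@-> @')

# With "/" as the delimiter, every slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/home\/alice\//-> /')

echo "$with_at"
echo "$with_slash"
```

Both commands should print "-> docs/notes.txt"; the "@" version is just easier to read.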