I have several hundred .html files that have a mistake that I would like to fix. Can someone please help me with the code to do the change in all the files at once? I would hate to spend hours doing the change manually. Thank you.
So, as an example, how would you change the following in all the files in the same directory?
I guess the slashes are throwing sed off, because I get an error. How do I do this? Thanks again.
If you use forward slashes as the delimiters in sed and slashes also appear in the pattern, you need to escape each of them with a backslash. Alternatively, you can use a large range of other delimiters (just choose one that does not appear in the pattern), e.g.:
sed -i 's/\(<h1>\)\(<a[^>]*>\)\([^<]*\)\(<\/a>\)\(<\/h1>\)/\1\3\5/' "$file"
sed -i 's!\(<h1>\)\(<a[^>]*>\)\([^<]*\)\(</a>\)\(</h1>\)!\1\3\5!' "$file"
In the second command, exclamation marks are used as the delimiters, so the backslashes in </a> and </h1> are not needed. Each \(...\) defines a sub-pattern (a group), which is then replayed in the replacement by \1 (first sub-pattern), \3 (third sub-pattern), etc.
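If you want to check either command before letting it loose on real files, you can pipe a sample line through sed without -i. This is only a sketch: the sample heading and URL below are made up for illustration.

```shell
# Hypothetical sample line; the href and heading text are invented for this test.
line='<h1><a href="http://example.com/page.html">My Title</a></h1>'

# Forward-slash delimiters: the slashes in </a> and </h1> must be escaped.
with_slashes=$(printf '%s\n' "$line" |
    sed 's/\(<h1>\)\(<a[^>]*>\)\([^<]*\)\(<\/a>\)\(<\/h1>\)/\1\3\5/')

# Exclamation-mark delimiters: the slashes need no escaping.
with_bangs=$(printf '%s\n' "$line" |
    sed 's!\(<h1>\)\(<a[^>]*>\)\([^<]*\)\(</a>\)\(</h1>\)!\1\3\5!')

printf '%s\n' "$with_slashes"   # <h1>My Title</h1>
printf '%s\n' "$with_bangs"     # <h1>My Title</h1>
```

Both forms should give the same result; the second is just easier on the eyes.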
I know what you mean - I often refer to these patterns as an explosion in a punctuation factory. They are called regular expressions, and consist of literal characters and metacharacters. The pattern means:
! Delimiter beginning search pattern
\(<h1>\) Group 1: literal <h1>
\(<a[^>]*>\) Group 2: <a then 0 or more characters that are not >'s, then >
\([^<]*\) Group 3: Any number of characters that are not <'s
\(</a>\) Group 4: literal </a>
\(</h1>\) Group 5: literal </h1>
! End of search pattern, beginning of replace pattern
\1 Replay group 1
\3 Replay group 3
\5 Replay group 5
! End of replace pattern
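To see what each group actually captures, one trick is to replay all five groups with labels around them instead of only \1, \3 and \5. A small sketch, again using an invented sample line:

```shell
# Hypothetical sample line, made up for illustration.
line='<h1><a href="page.html">My Title</a></h1>'

# Same search pattern as above, but the replacement prints every group in brackets.
groups=$(printf '%s\n' "$line" |
    sed 's!\(<h1>\)\(<a[^>]*>\)\([^<]*\)\(</a>\)\(</h1>\)!1=[\1] 2=[\2] 3=[\3] 4=[\4] 5=[\5]!')

printf '%s\n' "$groups"
# 1=[<h1>] 2=[<a href="page.html">] 3=[My Title] 4=[</a>] 5=[</h1>]
```

Dropping groups 2 and 4 from the replacement is what removes the link while keeping the heading text.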
The reason you got the error message is that you used single quotes ('...') rather than backticks (`...`) around the ls command in the first line, which is very easy to do. Backticks are on the key to the left of the 1 on a US/UK keyboard, but I would tend to use $(ls *.html): the $(...) form does the same thing as the backticks, but is a lot easier to read.
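The difference between the quoting styles can be sketched as below. Assumptions: GNU sed (as in the commands above, where -i takes no argument), and a throwaway directory with two invented files, so nothing real is touched. The last loop, which iterates over the *.html glob directly instead of over the output of ls, is a variation of my own rather than what was suggested above; it also copes with file names that contain spaces.

```shell
# Throwaway directory with two made-up files (names and contents are hypothetical).
workdir=$(mktemp -d)
printf '%s\n' '<h1><a href="a.html">First</a></h1>' > "$workdir/one.html"
printf '%s\n' '<h1><a href="b.html">Second</a></h1>' > "$workdir/two words.html"

# Single quotes: ls is never run, so the loop sees one literal string.
for file in 'ls *.html'; do
    literal=$file
done

# Backticks and $(...) both run ls and substitute its output.
from_backticks=`ls "$workdir"/*.html`
from_dollar=$(ls "$workdir"/*.html)

# Variation: loop over the glob itself and quote "$file",
# so a name like "two words.html" stays in one piece.
for file in "$workdir"/*.html; do
    sed -i 's!\(<h1>\)\(<a[^>]*>\)\([^<]*\)\(</a>\)\(</h1>\)!\1\3\5!' "$file"
done

result_one=$(cat "$workdir/one.html")
result_two=$(cat "$workdir/two words.html")
printf '%s\n' "$literal" "$result_one" "$result_two"

rm -rf "$workdir"
```

With the single quotes, sed would be asked to edit a file literally called "ls *.html", which is presumably where the error came from.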
Edited to add: a slight warning about the documentation on regular expressions - there are different forms. For instance, the link above tells you that parentheses - ( ) - enclose a group. However, in the form sed uses by default, escaped parentheses - \( \) - do the grouping.
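As a rough illustration of the two forms: both GNU and BSD sed accept -E to switch to extended regular expressions, where plain parentheses do the grouping. The sample line is invented, and this is a sketch rather than a recommendation of one form over the other.

```shell
# Hypothetical sample line, made up for illustration.
line='<h1><a href="page.html">My Title</a></h1>'

# Default (basic) syntax: groups are written \( ... \).
basic=$(printf '%s\n' "$line" |
    sed 's!\(<h1>\)\(<a[^>]*>\)\([^<]*\)\(</a>\)\(</h1>\)!\1\3\5!')

# Extended syntax with -E: groups are written ( ... ).
extended=$(printf '%s\n' "$line" |
    sed -E 's!(<h1>)(<a[^>]*>)([^<]*)(</a>)(</h1>)!\1\3\5!')

printf '%s\n' "$basic"      # <h1>My Title</h1>
printf '%s\n' "$extended"   # <h1>My Title</h1>
```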
Last edited by Robhogg; 04-01-2009 at 01:53 PM.
Reason: compliance with English 1.0