LinuxQuestions.org


hfawzy 07-10-2006 08:09 AM

remove some text from a file
 
I want to replace all instances of
Quote:

\textit{Some text here}
simply with
Quote:

Some text here
in a file.
In other words, I want to remove every "\textit{" and its matching "}" from that file.
How can I do this?
Thank you.

spirit receiver 07-10-2006 08:33 AM

If each instance is in a single line and there are no nested curly brackets, the following should work:
Code:

sed 's/\\textit{\([^}]*\)}/\1/g'
Otherwise, I'd suggest using Perl, or removing only the "\textit" part and leaving the braces in place; they shouldn't do any harm in TeX.
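
To illustrate the nested-brace caveat and the fallback suggested above, here is a minimal sketch. The helper names and the test string are made up for the example; only the first sed expression comes from this post.

```shell
# Hypothetical helper names; the first expression is the one-liner from this post.
strip_textit() { sed 's/\\textit{\([^}]*\)}/\1/g'; }

# Fallback: delete only the \textit control word and keep the braces.
strip_textit_keep_braces() { sed 's/\\textit//g'; }

# Made-up input with a nested command inside \textit{...}.
nested='\textit{a \textbf{b} c}'

printf '%s\n' "$nested" | strip_textit              # a \textbf{b c}    (wrong brace removed)
printf '%s\n' "$nested" | strip_textit_keep_braces  # {a \textbf{b} c}  (braces stay balanced)
```

The fallback leaves an extra pair of braces behind, which TeX treats as a plain group.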

jschiwal 07-10-2006 08:36 AM

You could use sed.
sed 's/\\textit{\([^}]*\)}/\1/g' oldfile >newfile
This will remove the \textit from your file if it isn't split up between two or more lines.

>cat sample
This is a test. \textit{This is a sample line of Latex}. This is more on the line.
This is a second line.
This is the third line.
This \textit{italic text is
divided on two lines.}
This is a line without italic text.


sed -e 's/\\textit{\([^}]*\)}/\1/g' -e '/\\textit{\([^}]*\)$/{ N;' -e '/}/s/\\textit{\([^}]*\)}/\1/}' sample
This is a test. This is a sample line of Latex. This is more on the line.
This is a second line.
This is the third line.
This italic text is
divided on two lines.
This is a line without italic text.

The above example handles \textit{ .* } split across two lines. For three or more lines you will need branching, which is easier to write as a sed program file than as a chain of '-e' options on a one-liner:
sed -f sedprogram.sed sample >newsample

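The above example isn't perfect: it pairs \textit{ with the first } it finds on the two joined lines, so if the real closing } is missing or further down, it either leaves the text unchanged or strips a brace that belongs to something else. Also, for sed programs it is important to check that the pattern to be replaced still works when it is on the last line.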
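
For the three-or-more-lines case, a sed program with branching might look like the sketch below. The script contents, the temporary file and the helper name are assumptions for illustration; the thread does not show what sedprogram.sed contained. It is still subject to the nested-brace limitation.

```shell
# Hypothetical sed program, written to a temporary file so it can be used with -f.
sedprogram=$(mktemp)
cat > "$sedprogram" <<'EOF'
:a
# If a \textit{ is still open at the end of the pattern space ...
/\\textit{[^}]*$/{
# ... and this is not the last line, append the next line and try again.
$!{
N
ba
}
}
# Every \textit{...} in the pattern space is now closed (or we hit end of file).
s/\\textit{\([^}]*\)}/\1/g
EOF

# Hypothetical wrapper: reads the named files, or stdin if none are given.
strip_textit_multiline() { sed -f "$sedprogram" "$@"; }

printf '%s\n' 'This \textit{italic text is' 'spread over' 'three lines.} Done.' |
  strip_textit_multiline
```

The `$!` guard covers the last-line case: an unclosed \textit{ at end of file is left as it is, although everything after it will have been pulled into the pattern space first.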

hfawzy 07-10-2006 11:04 AM

Thanks for the replies. The sed command worked great.
To get used to sed and know how to use it next time, I would like to understand the command you gave:
Quote:

sed 's/\\textit{\([^}]*\)}/\1/g' oldfile >newfile
As I'm not familiar with regular expressions, I would like to understand this part: \([^}]*\)
Anyone willing to explain ?
Thank you.

spirit receiver 07-10-2006 01:04 PM

I'll begin with the "center" of that regular expression. The brackets [] specify a set of characters that the expression is supposed to match. But ^ as the first character in [] negates its content, i.e. we want to match all characters except those that are contained in [].
Thus [^}] matches all characters except a closing curly brace. We don't just want a single non-bracket character, but arbitrarily many, that's why [^}] is followed by *.
So far, \\textit{[^}]*} will match \textit followed by some text in curly braces. We need to store the text between the braces for later reference, which is why it is enclosed in \( .. \). In the replacement, \1 then inserts whatever the first \( .. \) group matched.
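
A small experiment may make the role of [^}]* and \1 clearer. The test string and variable names are made up; the second command swaps in .* only to show what the negated set protects against.

```shell
# Made-up input with two \textit{...} groups on one line.
line='\textit{one} and \textit{two}'

# [^}]* stops at the first closing brace, so each group is handled separately.
negated=$(printf '%s\n' "$line" | sed 's/\\textit{\([^}]*\)}/\1/g')

# .* is greedy and runs on to the LAST closing brace on the line.
greedy=$(printf '%s\n' "$line" | sed 's/\\textit{\(.*\)}/\1/g')

printf '%s\n' "$negated"   # one and two
printf '%s\n' "$greedy"    # one} and \textit{two
```

In both commands \1 is replaced by whatever the \( .. \) group captured; only the extent of the capture differs.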

hfawzy 07-10-2006 02:40 PM

Thank you for taking the time to explain, spirit receiver.
I really appreciate that.

jschiwal 07-10-2006 02:47 PM

FYI, you might want to read the sed manual. Also, there is a "man 7 regex" page which might help a little with understanding regular expressions, but a Google search would return something more readable.

Also, if you edit in vim, you can do the same thing with the command:
:s/\\textit{\([^}]*\)}/\1/

This will perform the replacement on the current line. For the entire document, use :%s/\\textit{\([^}]*\)}/\1/g
Using [^}]* rather than .* keeps the match from running on to the last } on the line.
This won't work when your italic text is split across two or more lines.

sundialsvcs 07-10-2006 09:22 PM

Also check out 'awk'.


All times are GMT -5.