Linux - General
This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.
I've been doing some tedious text processing, and I suddenly wondered whether there was a way to automate it. I honestly don't know if that is possible, but... who knows!
I would really appreciate it if someone could tell me whether there is a way to do that.
My text processing, done manually, consists of transforming a list of
In generic terms, it is a replacement of <a href= * img with just img, followed by a deletion of all </a> tags. I can only do the last part with my knowledge.
Do you think there would be a fast way to do that automatically or semi-automatically? Who knows, maybe someone will tell me that it is possible.
sed -e s/'<a href="(url)">'//g -e s/'<\/a>'//g FILE
Where FILE is the name of the file that contains the original text.
Or you could
echo LINE |sed -e s/'<a href="(url)">'//g -e s/'<\/a>'//g
Where LINE is the line that contains the original text.
Parsing it out:
sed = Execute sed command
-e = use following script
s/pattern/replacement/g = search for pattern and replace it with replacement; g means to do it globally (rather than just at the first occurrence on each line). You can see the pattern in what I wrote above. The replacement is blank, so it simply deletes the pattern and replaces it with nothing.
-e = use following script (a second one)
s/pattern/replacement/g = search and replace globally - this time for the second pattern. Note the "\/a" here. The "\" escapes the special meaning of "/" so sed knows to literally look for "/a" rather than thinking it is a directive to sed. (The "/", as you can see, is what sed uses to separate the s command, the pattern, the replacement and the g flag.)
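To see the two expressions at work, here is a minimal sketch on a made-up line. The URL http://example.com/page and the image name pic.jpg are assumptions for illustration only; substitute whatever "(url)" really is in your text.

```shell
# Sample line (hypothetical URL and image name, for illustration only)
line='<a href="http://example.com/page"><img src="pic.jpg"></a>'

# The first -e deletes the literal opening tag; the second -e deletes every </a>.
# Each "/" inside the URL is escaped as "\/" because "/" is the s command delimiter.
result=$(printf '%s\n' "$line" | sed -e 's/<a href="http:\/\/example.com\/page">//g' -e 's/<\/a>//g')

printf '%s\n' "$result"   # prints <img src="pic.jpg">
```

Because the first pattern is literal, it only matches that one URL, so it will not help when every link points somewhere different.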
P.S. French distro should be called "Le Nix"
Last edited by MensaWater; 04-12-2007 at 04:36 PM.
Thanks a lot, Jlightner
I didn't know that sed existed, what a great tool! I'm grateful to you.
I didn't manage to make your script work in a single line, certainly because the (url) was never the same, and I must have misunderstood regular expressions. And yet I read the help.
But once I separated the script in two, it worked.
Then I realized that it didn't write the output to a file, and since I had to run the script in two steps, I needed a version that writes to a file!
Finally, in case it can help other people, here are my results and how I made it work:
(original file is test.txt)
These two lines only print to the console window, which is useless here since each command works on the original file and leaves one part of the markup unfixed.
sed -e :a -e 's/<a[^>]*>//g;/</N;//ba' test.txt
sed -e s/'<\/a>'//g test.txt
sed -e :a -e 's/<a[^>]*>//g;/</N;//ba' test.txt > test2.txt
sed -e s/'<\/a>'//g test2.txt > test3.txt
And here, test3.txt is the result that I want. Thanks again!
I would love to make it work in a single line, but pasting the two sed parts one after the other (as you had written on your side) doesn't work for me >_<
I fear that may be asking too much, but would you have any idea why that is so?
Oh, I also tried to handle bbcode, in which it is [ url ] [ /url ] instead, but it didn't work either. Would you know what difference that should make?
(Thanks a lot if you reply to this, and thanks anyway already!)
I'm not a sed expert by any means so I'm not sure what you're trying to do by adding the ":a" - I see it deals with labels but I'm a little too lazy to delve into it at the moment.
A pipe is a special kind of redirection that joins two commands: the "stdout" (standard output) of whatever is on the left side of the pipe becomes the "stdin" (standard input) of whatever is on the right side of the pipe. So where sed would normally read from a file named on the command line, it will instead read the output of the first command.
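As a rough sketch of that idea, the two steps can be joined with a pipe so that no intermediate file is needed, or handed to a single sed with two -e expressions. The sample input and the simpler pattern '<a [^>]*>' are my assumptions, not the exact commands posted earlier; the pattern assumes each opening tag sits entirely on one line.

```shell
# Hypothetical sample input; in practice this would come from a file such as test.txt
input='<a href="http://example.com/1"><img src="one.jpg"></a>
<a href="http://example.com/2"><img src="two.jpg"></a>'

# Two sed processes joined by a pipe: the first removes opening <a ...> tags,
# the second reads that output on stdin and removes the closing </a> tags
piped=$(printf '%s\n' "$input" | sed -e 's/<a [^>]*>//g' | sed -e 's/<\/a>//g')

# The same work done by one sed process with two -e expressions
single=$(printf '%s\n' "$input" | sed -e 's/<a [^>]*>//g' -e 's/<\/a>//g')

printf '%s\n' "$piped"
printf '%s\n' "$single"
```

With a real file, the same expressions can read test.txt directly and redirect the result, for example: sed -e 's/<a [^>]*>//g' -e 's/<\/a>//g' test.txt > test3.txt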
I'm not exactly sure what you're saying in your last question. Are you saying you couldn't get sed to eliminate those things?
Ooh, I never thought of using the | like that!
I used it, for instance, for ps -A | grep ... without seeing that the same principle could lead to further uses.
Thanks a lot once again
About my last question, more simply: I'd like to eliminate external hyperlinks in html but also in bbcode, that is, the <a href=""> ... </a> and the [ url] ... [ /url]
I also have a few lists written in bbcode, and for those I can't write the sed line that removes the bbcode hyperlinks properly. I guess there must be a rule (like the backslash that must be put before a slash) that I have missed.
The backslash, \, is to "escape" the special meaning of the character that follows it. You have to put quotes around the expression so that the shell doesn't change its meaning before sed sees it.
Any time you see non-alphanumeric characters there's a possibility you need to escape or quote them (or both). So the [ would likely need to become \[ and the ] would likely need to become \].
Don't forget to quote your expression. 'expression'
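Here is a hedged sketch of how that might look for BBCode. The sample line is made up, and I'm assuming the tags are written without inner spaces, as [url=...] and [/url]. Only the opening bracket is escaped: outside a bracket expression a closing ] is an ordinary character, so it is left as-is here.

```shell
# Hypothetical BBCode line, for illustration only
line='[url=http://example.com/page][img]pic.jpg[/img][/url]'

# \[ matches a literal "[" ; [^]]* matches everything up to the next "]"
# The first -e removes [url] or [url=...]; the second -e removes [/url]
result=$(printf '%s\n' "$line" | sed -e 's/\[url[^]]*]//g' -e 's/\[\/url]//g')

printf '%s\n' "$result"   # prints [img]pic.jpg[/img]
```

The whole expression sits in single quotes so the shell passes the brackets and backslashes to sed untouched.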
Well, I think I tried that and it didn't work well; maybe I forgot one special character along the way in my attempts.
But my main need was to mass-manage html files, so the current sed script is just perfect, thank you, Jlightner!
And for phpbb, I have found a solution: in some forums there is a "break links" button when you post a text, which keeps the image links but deletes the hyperlinks, so that will do the job for my few phpbb lists.