LinuxQuestions.org
Old 04-12-2007, 03:53 PM   #1
Sabinou
Member
 
Registered: Jan 2006
Location: France
Distribution: Debian Wheezy, Webmin + Virtualmin (remote dedi)
Posts: 214

Rep: Reputation: 30
Text transformation [Solved]


Hello there

I've been doing some tedious text processing by hand, and I suddenly wondered whether there is a way to automate it. I honestly don't know if that is possible, but... who knows!

I would really appreciate it if someone could tell me whether there is a way to do that.

My text processing, done manually, consists of transforming a list of
Code:
<a href="(url)"><img src="(url)"></a> <a href="(url)"><img src="(url)"></a> <a href="(url)"><img src="(url)"></a><br> <a href="(url)"><img src="(url)"></a> <a href="(url)"><img src="(url)"></a> <a href="(url)"><img src="(url)"></a><br>...
into a version without hyperlinks:
Code:
<img src="(url)"> <img src="(url)"> <img src="(url)"><br><img src="(url)"> <img src="(url)"> <img src="(url)"><br>...
In other words, if wildcards were accepted, I would replace every <a href= * img with just img, and then delete all the </a> tags. With my current knowledge I can only do the last part.

Do you think there would be a fast way to do that automatically, or partially automatically? Who knows, maybe someone will tell me that it is possible.

Last edited by Sabinou; 04-15-2007 at 12:02 PM.
 
Old 04-12-2007, 04:34 PM   #2
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

Rep: Reputation: 1669
Sure - this can be done with sed:

sed -e s/'<a href="(url)">'//g -e s/'<\/a>'//g FILE

Where FILE is the name of the file that contains the original text.

Or you could
echo LINE |sed -e s/'<a href="(url)">'//g -e s/'<\/a>'//g

Where LINE is the line that contains the original text.

Parsing it out:
sed = Execute sed command

-e = use following script

s/pattern/replacement/g = search for pattern and replace it with replacement; g means to do it globally (rather than just at the first occurrence on each line). You can see the pattern in what I wrote above. The replacement is blank, so it simply deletes whatever matches the pattern.

-e = use following script (a second one)
s/pattern/replacement/g = search and replace globally, this time for the second pattern. Note the "\/a" here. The "\" escapes the special meaning of "/" so sed knows to literally look for "/a" rather than thinking it is a directive to sed. (The "/", as you can see, is what sed uses to separate the s command, the pattern, the replacement and the g flag.)
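
As a small illustration of the points above (a sketch, not part of the original reply; the sample line and the variable names are made up): the first command escapes the slash as described, the second relies on sed accepting another delimiter after s so that no backslash is needed, and the third drops the g to show what "globally" changes.

```shell
# Made-up sample line standing in for the real file contents.
line='<a href="(url)"><img src="(url)"></a> <a href="(url)"><img src="(url)"></a>'

# Escaped slash, as in the command above: deletes every </a>.
escaped=$(printf '%s\n' "$line" | sed -e 's/<\/a>//g')

# Same substitution with | as the delimiter, so the / needs no backslash.
alt_delim=$(printf '%s\n' "$line" | sed -e 's|</a>||g')

# Without the trailing g, only the first </a> on the line is removed.
first_only=$(printf '%s\n' "$line" | sed -e 's|</a>||')

printf '%s\n' "$escaped" "$alt_delim" "$first_only"
```

The first two results should be identical; the third should still end in </a>.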

P.S. French distro should be called "Le Nix"

Last edited by MensaWater; 04-12-2007 at 04:36 PM.
 
Old 04-14-2007, 10:57 AM   #3
Sabinou
Original Poster
Thanks a lot, Jlightner
I didn't know that sed existed, what a great tool! I'm grateful to you.

I didn't manage to make your script work in a single line, certainly because the (url) was never the same, and I must have misunderstood regular expressions, even though I read the help.
But once I separated the script in two, it worked.
And then I realized that it didn't write its output to a file, and since I had to run the script in two steps, I needed a version that writes to a file!

Finally, in case it can help other people, here are my results and how I made it work:
(original file is test.txt)
These two lines only print to the console, which is useless here, since each command works on the original file and leaves one part of the markup unfixed.
sed -e :a -e 's/<a[^>]*>//g;/</N;//ba' test.txt
sed -e s/'<\/a>'//g test.txt

sed -e :a -e 's/<a[^>]*>//g;/</N;//ba' test.txt > test2.txt
sed -e s/'<\/a>'//g test2.txt > test3.txt

And here, test3.txt is the result that I want. Thanks again.

I would love to make it work in a single line, but pasting the two sed parts one after the other (as you had written on your side) doesn't work for me >_<
I fear that would be asking too much, but would you have any idea why that is so?

Oh, I also tried to handle bbcode, which uses [ url ] [ /url ] instead, but that didn't work either. Would you know what difference it should have made?

(thanks a lot if you reply to this, thanks anyway already!)
 
Old 04-14-2007, 01:27 PM   #4
MensaWater
Any time you have output that you need to address a second time you should think of "piping" which is done with the "|" sign.

Your two lines:
Code:
sed -e :a -e 's/<a[^>]*>//g;/</N;//ba' test.txt > test2.txt
sed -e s/'<\/a>'//g test2.txt > test3.txt
Could become one line as follows:
Code:
sed -e :a -e 's/<a[^>]*>//g;/</N;//ba' test.txt |sed -e s/'<\/a>'//g > test3.txt
I'm not a sed expert by any means so I'm not sure what you're trying to do by adding the ":a" - I see it deals with labels but I'm a little too lazy to delve into it at the moment.
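
For what it's worth, here is one possible reading of the looping script, followed by a single-invocation sketch. In the loop, :a defines a label, /</N appends the next input line whenever the pattern space still contains a <, and //ba (an empty regex reuses the last one) jumps back to a. In a file like this one the <img> tags always leave a < behind, so the loop keeps pulling in lines until N runs out of input and sed stops there (GNU sed prints the pattern space first); an expression added after the loop would then never be reached, which may be why joining the two parts with a second -e did nothing. Assuming no <a ...> tag is split across two lines, the loop is not needed at all. The sample content and variable names below are made up.

```shell
# Made-up two-line sample in the shape described in post #1.
sample='<a href="http://example.com/1"><img src="http://example.com/1.png"></a> <a href="http://example.com/2"><img src="http://example.com/2.png"></a><br>
<a href="http://example.com/3"><img src="http://example.com/3.png"></a><br>'

# One sed process, two expressions, no loop:
#   <a [^>]*>  matches an opening <a ...> tag whatever its URL is
#   <\/a>      matches the closing tag
cleaned=$(printf '%s\n' "$sample" | sed -e 's/<a [^>]*>//g' -e 's/<\/a>//g')

printf '%s\n' "$cleaned"
```

With a real file, the same two expressions would be followed by the file name and a redirection, in the same way as the commands above.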

A pipe is a special kind of redirection: the "stdout" (standard output) of whatever is on the left side of the pipe becomes the "stdin" (standard input) of whatever is on the right side of the pipe. So where sed would normally read from a file named on its command line, it will instead read the output of the first command.
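
To make the equivalence concrete, here is a simplified sketch (the loop is left out, the one-line sample is made up, and the temporary directory is only there so that nothing real gets overwritten). It runs the two steps once through an intermediate file and once through a pipe, and the two results should match.

```shell
# Work in a throwaway directory; the file names mirror the ones in the thread.
tmpdir=$(mktemp -d)
printf '%s\n' '<a href="(url)"><img src="(url)"></a><br>' > "$tmpdir/test.txt"

# Two steps with an intermediate file.
sed -e 's/<a[^>]*>//g' "$tmpdir/test.txt" > "$tmpdir/test2.txt"
sed -e 's/<\/a>//g' "$tmpdir/test2.txt" > "$tmpdir/test3.txt"
via_files=$(cat "$tmpdir/test3.txt")

# Same two steps joined by a pipe: the second sed has no file argument,
# so it reads the first sed's output on stdin.
via_pipe=$(sed -e 's/<a[^>]*>//g' "$tmpdir/test.txt" | sed -e 's/<\/a>//g')

printf '%s\n' "$via_files" "$via_pipe"
rm -r "$tmpdir"
```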

I'm not exactly sure what you're saying in your last question. Are you saying you couldn't get sed to eliminate those things?
 
Old 04-15-2007, 02:01 AM   #5
Sabinou
Original Poster
Hoo, I never thought of using the | like that!
I used it, for instance, for ps -A | grep ... without seeing that the same principle could lead to further uses.
Thanks a lot once again.

About my last question, put more simply: I want to eliminate external hyperlinks in html but also in bbcode, that is, the <a href=""> ... </a> and the [ url] ... [ /url].

I also have a few lists written in bbcode, and for those I can't manage to write the sed line that removes the bbcode hyperlinks properly. I guess there must be a rule (like the backslash that must be put before a slash) that I have missed.
 
Old 04-15-2007, 07:36 AM   #6
MensaWater
The backslash, \, is to "escape" the special meaning of the character that follows it. You also have to put quotes around the expression so that the shell doesn't change its meaning before sed sees it.

Any time you see non-alphanumeric characters there's a possibility you need to escape or quote them (or both). So the [ would likely need to become \[ and the ] would likely need to become \].

Don't forget to quote your expression: 'expression'
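
A minimal sketch of what that could look like for BBCode, assuming the tags are written without inner spaces ([url], [url=...], [/url]) and never span two lines. The sample line and variable names are made up. The opening [ is escaped because an unescaped [ starts a bracket expression; the closing ] has no special meaning outside a bracket expression, so it is left as-is here.

```shell
# Made-up BBCode line; real lists may differ.
bb='[url=http://example.com/1][img]http://example.com/1.png[/img][/url] [url]http://example.com/2[/url]'

#   \[url[^]]*]  a literal [url, then any characters that are not ], then ]
#   \[\/url]     the literal closing tag [/url]
no_links=$(printf '%s\n' "$bb" | sed -e 's/\[url[^]]*]//g' -e 's/\[\/url]//g')

printf '%s\n' "$no_links"
```

The [img] tags are untouched because neither pattern matches them.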
 
Old 04-15-2007, 12:01 PM   #7
Sabinou
Original Poster
Well, I think I tried that and it didn't work well; maybe I forgot one special character somewhere in my attempts.

But my main need was to mass-manage html files, so the current sed script is just perfect, thank you, Jlightner!

And for phpbb, I have found a solution: in some forums, when you post a text, there is a "break links" button that leaves the image links but deletes the hyperlinks, so that will do the job for my few phpbb lists.
 
  


All times are GMT -5. The time now is 04:13 AM.
