LinuxQuestions.org
Old 02-23-2013, 05:17 PM   #1
MackSix
LQ Newbie
 
Registered: Feb 2013
Posts: 6

Rep: Reputation: Disabled
Can sed do this?


I have been learning sed and have a situation where I have multiple html pages with a href on each page that points to another page on the site. What I would like to do is move the href from one part of the page up to another point on the page.

The sequence would be to read the href on each page, then locate the point on a line higher up to insert it and then delete it from the original location.

I understand how to do this on each page if I know the href, but it is different for each page.

Thanks!
 
Old 02-23-2013, 06:06 PM   #2
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244
What you might manage in sed would be better done in something else.
http://stackoverflow.com/questions/5...e-html-why-not
http://search.cpan.org/~cjm/HTML-Tre...b/HTML/Tree.pm

Get your page broken down into a tree and move elements about in it before displaying as HTML.
This example code breaks the page up and lists what's there.
Code:
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

my $filename = "index.html";

# Parse the page into a tree of HTML::Element nodes.
my $tree = HTML::TreeBuilder->new();
$tree->parse_file($filename);
# Then do something with the tree, using HTML::Element

# extract_links() returns a list of [$link, $element, $attr, $tag] entries.
for (@{ $tree->extract_links() }) {
    my ($link, $element, $attr, $tag) = @$_;
    print
        "Hey, there's a $tag that links to ",
        $link, ", in its $attr attribute, at ",
        $element->address(), ".\n";
}

# Finally, free the tree:
$tree->delete;
 
1 members found this post helpful.
Old 02-23-2013, 06:37 PM   #3
rigor
Member
 
Registered: Sep 2011
Posts: 271

Rep: Reputation: Disabled
Hi MackSix!

Although what you're describing sounds rather unusual, you are describing a type of file editing procedure, and sed certainly can edit a file.

Using labels, and multiple "spaces", such as "hold" and "pattern" space, you can get somewhat "fancy" with sed. So it might well be possible. But since sed was intended as a "stream editor", I tend to avoid getting fancy with it, as getting fancy with it seems a bit antithetical to the concept of editing a stream of data, as a stream.
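
As a small illustration of the hold space mentioned above, here is the classic one-liner that prints a file's lines in reverse order. It is only a sketch of how sed can emit lines in a different order than it read them; the function name `reverse_lines` is made up for this example.

```shell
#!/bin/sh
# reverse_lines: hypothetical helper name, not part of sed itself.
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space (lines seen so far, newest first) into the hold space
#   $p   on the last line, print the accumulated result
reverse_lines() {
    sed -n '1!G;h;$p' "$@"
}
```

For example, `printf 'a\nb\nc\n' | reverse_lines` prints the three lines last to first.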

You might want to process the file twice with sed: a first pass to locate the line you wish to move and delete it, and a second pass to insert it at the new location. If you are open to it, I would suggest that awk might be better suited to such a task.
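
A rough sketch of the two-pass idea, under some loud assumptions: the line to move and the marker line each match exactly one line, the patterns are basic regular expressions containing no "/", and the helper name `move_line_up` and the sample patterns are invented for illustration. The first pass saves the line in a scratch file; the second reads it back in after the marker and deletes it from its old position.

```shell
#!/bin/sh
# move_line_up FILE LINE_PATTERN MARKER_PATTERN   (hypothetical helper)
# Prints FILE to stdout with the matching line moved to just after the marker.
move_line_up() {
    file=$1; line_pat=$2; marker_pat=$3
    tmp=$(mktemp) || return 1

    # Pass 1: save the line to be moved in a scratch file.
    sed -n "/$line_pat/p" "$file" > "$tmp"

    # Pass 2: read the saved line back in after the marker line,
    # and delete it from its original position.
    sed -e "/$marker_pat/r $tmp" -e "/$line_pat/d" "$file"

    rm -f "$tmp"
}
```

Something like `move_line_up page.html 'href="xyz-' '<!-- NAV -->' > page.new` would then write the rearranged page to a new file.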

Also, it depends on how you need to locate the line you wish to move. You've mentioned it's not the same every time. awk tends to have the capabilities of sed, plus programming language functionality. If you effectively need to write a program to make decisions to locate lines which are different, not completely recognizable by a simple pattern, awk might be a better choice.

Otherwise, if you prefer to use sed since you are already in the process of learning it, and you say you already know how to do this for each page, what are you asking us, exactly?

Is there no common pattern, or any sort of common elements to the href values on the different pages? Is there no common ID used for the corresponding anchor ( a element ) on each page?

I don't know how familiar you are with HTML, but if you see something like this on each page:

Code:
<a  id="my-anchor-name"  href="http://my-site-name.com/page-name.html">
that is, if the href you wish to find on each page is associated with an a element that has the same value in its id attribute, you might search for it by using a pattern to find the id.
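
A minimal sketch of searching by id, assuming the id and href sit on the same line, both are double-quoted, and id comes before href as in the example above. The helper name `href_for_id` is hypothetical, and the id is used as a regular expression, so it should not contain special characters.

```shell
#!/bin/sh
# href_for_id ID FILE   (hypothetical helper)
# Prints the href value of the <a> element whose id attribute is ID.
href_for_id() {
    # Match <a ... id="ID" ... href="URL" and keep only the captured URL.
    sed -n "s/.*<a[^>]*id=\"$1\"[^>]*href=\"\([^\"]*\)\".*/\1/p" "$2"
}
```

Called as `href_for_id my-anchor-name page.html`, this prints only the URL from the matching anchor, which can then be fed to whatever does the moving.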

Is that clear? If that doesn't help, please let us know exactly what you are asking.

Last edited by rigor; 02-23-2013 at 06:39 PM.
 
Old 02-23-2013, 07:06 PM   #4
MackSix
LQ Newbie
 
Registered: Feb 2013
Posts: 6

Original Poster
Rep: Reputation: Disabled
The URL in the href does have a common identifying sequence of chars in it, so I can identify it by those chars. It would probably be easier for me to write a Java program to do what I want, but I might try and see if I can figure out how to do it with sed if you think it is possible. I don't want the solution, just wondering if it is capable before banging my head against it to no avail. Thanks.
 
Old 02-23-2013, 07:36 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,080

Rep: Reputation: 2610
I have seen some significantly unnatural acts performed merely to "prove" sed could do it.
You may well wind up with a severely disfigured head for no good reason ....

Pick the tool best suited to your skills and needs.
 
Old 02-23-2013, 08:12 PM   #6
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,021

Rep: Reputation: 1769
Quote:
Originally Posted by MackSix View Post
just wondering if it is capable before banging my head against it to no avail. Thanks.
Yes it can do it, as sed has been demonstrated to be Turing complete. So has Ook! That does not mean that either one is a suitable language for any given task.
 
Old 02-24-2013, 08:10 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,837

Rep: Reputation: 1984
On the other hand, sed is quite awkward to use when doing multi-line work like this. As the "stream editor" it's designed to process the file in a single direction only, so moving things "up" in the file is particularly difficult.

(awk, BTW, is similarly limited to one-way processing, but at least it has a decent variable and flow-control system to make things easier.)

When doing multi-line work, you're likely to get more mileage in the long run using ed or ex, which is the scripting-capable set-up of vim (equivalent to "vim -es"). Since these are full text editors, the whole file gets loaded into a buffer first, allowing arbitrary access to line numbers.

How to use ed:
http://wiki.bash-hackers.org/howto/edit-ed
http://snap.nlc.dcccd.edu/learn/nlc/ed.html
(also read the info page)
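
A minimal sketch of the ed approach, assuming ed is installed and that each pattern matches exactly one line and contains no "/". The helper name `ed_move_line` is made up for this example. Because ed holds the whole file in a buffer, its `m` command can move a line to after any other address, including one higher up in the file.

```shell
#!/bin/sh
# ed_move_line FILE LINE_PATTERN MARKER_PATTERN   (hypothetical helper)
# Edits FILE in place, moving the matching line to just after the marker line.
ed_move_line() {
    # /LINE/m/MARKER/  move the addressed line to after the marker line
    # w                write the buffer back to the file
    # q                quit
    printf '%s\n' "/$2/m/$3/" w q | ed -s "$1"
}
```

For instance, `ed_move_line page.html 'href="xyz-' '<!-- NAV -->'` would rewrite page.html directly, so it is worth trying on a copy first.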

Using ex
http://docstore.mik.ua/orelly/unix/vi/ch05_01.htm

But finally, as linosaurusroot pointed out, xml and html, being free-form and nested formats, are quite unsuited to parsing by line-and-regex-based tools like these. It's much better to use something with a dedicated parser, like perl or xmlstarlet instead, whenever possible.

Last edited by David the H.; 02-24-2013 at 08:17 PM. Reason: forgot the links
 
Old 02-24-2013, 11:58 PM   #8
MackSix
LQ Newbie
 
Registered: Feb 2013
Posts: 6

Original Poster
Rep: Reputation: Disabled
I probably won't try it with sed now. I lost interest in sed and am playing with xfce 4.10 on Cygwin at the moment...
 
Old 02-25-2013, 05:23 PM   #9
rigor
Member
 
Registered: Sep 2011
Posts: 271

Rep: Reputation: Disabled
Quote:
Originally Posted by David the H. View Post
On the other hand, sed is quite awkward to use when doing multi-line work like this. As the "stream editor" it's designed to process the file in a single direction only, so moving things "up" in the file is particularly difficult.

(awk, BTW, is similarly limited to one-way processing, but at least it has a decent variable and flow-control system to make things easier.)

When doing multi-line work, you're likely to get more mileage in the long run using ed or ex, which is the scripting-capable set-up of vim (equivalent to "vim -es"). Since these are full text editors, the whole file gets loaded into a buffer first, allowing arbitrary access to line numbers.

How to use ed:
http://wiki.bash-hackers.org/howto/edit-ed
http://snap.nlc.dcccd.edu/learn/nlc/ed.html
(also read the info page)

Using ex
http://docstore.mik.ua/orelly/unix/vi/ch05_01.htm

But finally, as linosaurusroot pointed out, xml and html, being free-form and nested formats, are quite unsuited to parsing by line-and-regex-based tools like these. It's much better to use something with a dedicated parser, like perl or xmlstarlet instead, whenever possible.
Since this is a Linux forum, I would expect that we should usually be talking about Linux, not some other environment. So when we say awk, I would tend to think we are usually more specifically talking about gawk, rather than some older baseline "dialect" of awk, without the extensions that gawk provides. In fact, although there can be differences between various distros, on the distro I'm using, when I run the awk command and just request the version, I'm informed that the awk command is actually gawk.

With gawk, although the default is to scan through a file, or files, in one direction, gawk is by no means limited to that. As long as there is sufficient memory and CPU speed available, the lines of a file can be quite easily loaded into an array, and processed as desired, as many times as needed, in any direction. I've also found it rather easy to extend that approach, using concepts like a "State Machine", to handle nested structures. The free-form aspects of XML or HTML can be fairly easily addressed with repetition factors for "white space" patterns.
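
A short sketch of the load-into-an-array approach, written in plain awk so it should also run under gawk. It assumes the marker line exists and comes before the line to be moved; the helper name `awk_move_line` and the sample patterns are hypothetical. The patterns are treated as extended regular expressions.

```shell
#!/bin/sh
# awk_move_line FILE LINE_PATTERN MARKER_PATTERN   (hypothetical helper)
# Prints FILE to stdout with the first matching line moved to just after
# the first line matching the marker.
awk_move_line() {
    awk -v line_pat="$2" -v marker_pat="$3" '
        # Load every line into an array, noting which line is to be moved.
        { lines[NR] = $0 }
        moved == 0 && $0 ~ line_pat { moved = NR }

        END {
            for (i = 1; i <= NR; i++) {
                if (i == moved) continue        # skip the original position
                print lines[i]
                # Emit the saved line once, right after the marker line.
                if (moved && !placed && lines[i] ~ marker_pat) {
                    print lines[moved]
                    placed = 1
                }
            }
        }
    ' "$1"
}
```

Since the whole file is in the `lines` array by the time the END block runs, the same structure could walk the lines backwards or several times if a task needed it.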

People who have written perl modules to handle HTML and XML have made them available for others to use. So if you have a significant task to accomplish involving XML or HTML, if you know perl, and you are already familiar with those modules, it may be easier to make use of the existing modules rather than write your own gawk code. But were you to examine the existing perl modules, you may discover that patterns are used in them, much as I've described here WRT gawk.

However, since the OP mentions the possibility of using Java, and one or more forms of the Java platform supply XML and HTML processing capabilities, as well as pattern matching capabilities rather similar to perl's, Java could well be the best choice.

Finally, from the way the OP talks, I get the impression that even attempting to use sed may be just an "intellectual exercise".
 
Old 02-25-2013, 05:54 PM   #10
MackSix
LQ Newbie
 
Registered: Feb 2013
Posts: 6

Original Poster
Rep: Reputation: Disabled
"Intellectual exercise" is most fitting.
 
  


All times are GMT -5.
