re: writing html on linux and windows

ergo_sum · 11-07-2003, 12:35 PM

Hello All:

My question is this. I translated a historical novel a while back while on an MS OS and hand-coded everything in HTML.
Well, a lot of the links are coded for a Microsoft OS. Now I'm on Linux. Is there a tool out there that will extract all hyperlinks?

This is a large tome, and a tool that extracts HTML would be heaven-sent.

Thanks,

ergo_sum

david_ross · 11-07-2003, 01:17 PM

HTML is a universal language and should be the same on both OSes. If you want to strip all the HTML, then try running the pages through a lynx dump:
lynx -dump page.html

ergo_sum · 11-07-2003, 01:38 PM

Yes, the language is the same but the hyperlinks aren't. One OS uses / to separate directories and the other uses \. That's why I'm looking for an application that extracts hyperlinks so I can go ahead and change them.

ergo_sum

david_ross · 11-07-2003, 02:13 PM

Just write a script or something to do it for you - something like this should work:

Code:

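#!/usr/bin/perl
# Reads HTML on STDIN and prints it to STDOUT with every backslash inside
# an href="..." or src="..." value changed to a forward slash.
# Usage: perl thisscript.pl < old.html > new.html
use strict;
use warnings;

while (my $line = <STDIN>) {
    # [^"]* stops at the closing quote, so several attributes on one
    # line are handled one at a time instead of being swallowed together.
    $line =~ s/\b(href|src)="([^"]*)"/swapslashes($1, $2)/egi;
    print $line;
}

# Rebuild one attribute with the backslashes in its value swapped.
sub swapslashes {
    my ($attr, $value) = @_;
    $value =~ s/\\/\//g;
    return "$attr=\"$value\"";
}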

stickman · 11-07-2003, 02:27 PM

If you know that \ is only used in the directory trees and not anywhere else:
perl -pi -e 's^\\^/^g' *
perl -pi -e 's^\\^/^g' filename

Be sure to make a backup before you do a mass search and replace.

wh33t · 11-07-2003, 03:32 PM

Come on guys... Just simply write a tool that does it for you? Simply write a program that searches through a text file and replaces the /'s for you... I mean, really? The guy said he hand coded it all in html... anyone who hand codes an entire historical novel doesn't know how to compile their own parsing program. And even if he does, let's not assume he does. Let's actually try to help him.

I'm confused why your links would not work for linux.. Are you using link locations such as "img\imagefile.jpg" or "c:\www\img\image.jpg"? The second is not needed and you can use the first type of linking. I believe it's referred to as relative locations. It's relative (related) to the root of the website. Does this help at all?

As for finding a way to replace all of your tags... it depends on what you have written. It's possible to use a find and replace function inside most editors, such as Kwrite or something, but that also depends on what GUI you're running on your machine. Wanna post some specifics of your problem?

ergo_sum · 11-07-2003, 06:18 PM

Well, thanks, but you all aren't thinking things through, I think.
It's not just / vs \, but it's the directory structure itself. This was written a while ago, and C:\ Enriquillo\Enriquillo etc\ doesn't translate well to /user/local/apache2.

And wh33t, thank you for not only allowing for my newbieness but also discerning the possibility of my complete inability to write a tool for replacing one character w/ another in a body of text.
But I think I should do this the right way, and not depend on first links vs. relative links. The text in question is a static thing but shouldn't be considered to be a static thing.
So, thanks.

Now, what do I do?

ergo_sum

wh33t · 11-07-2003, 06:30 PM

Well, you see then, your problem is that you used absolute links, meaning that you pointed to the absolute location of the files you were linking to. This is not necessary. Next time, just link relatively to your documents. So what you need to do then is load up your favourite text editor (it can be in Windows or Linux) and simply find the parts you need to replace. So if you need to replace <a href="C:\ Enriquillo\Enriquillo\file.html"> with <a href="file.html"> then use a "find and replace" function in the text editor.

(Keep in mind you do not need to use /user/local/apache2. In fact, I think if you tried to replace your links with that, it would not work. You have to use relative linking.)

::Find and Replace Functions::
I know TextPad (not Notepad or WordPad, it's a separate program) for Windows does this. And so does Kwrite, which is a free text editing program that comes with the default installation of KDE, which is a GUI for Linux and could be what you're running. Let me know if this helps at all.

ergo_sum · 11-07-2003, 06:40 PM

Cool!

Thanks, and yes, it certainly does help.

I'm a newbie but also completely weaned from MS. So it'll probably be either OpenOffice or Star Office.

ergo_sum

david_ross · 11-08-2003, 10:34 AM

Personally I would still use a global search and replace to aid your efforts - and wh33t, I wasn't suggesting he write something all by himself - if you look at the example I gave it should work quite well. If all your links are absolute now - i.e. they start with "C:\Enriquillo\Enriquillo\" - then you only need to run another find and replace first to delete "C:\Enriquillo\Enriquillo\".

This way you can have all links relative to their own location and not to the root of any webserver - this is useful if you want to make the pages available in a downloadable archive for offline viewing.

ergo_sum - if you are unsure whether my script above is able to help then feel free to post or e-mail me one of your pages and I'll check and write up a simple instruction set for you. I certainly don't think I would want to edit a whole novel by hand.

wh33t · 11-08-2003, 12:44 PM

Alright, sorry mate. I just understand how frustrating it is to come into these forums and ask a simple question and definitely get more confused by the "answers" people give you. I think he should be well on his way now.