LinuxQuestions.org

JZL240I-U 03-12-2004 07:00 AM

Bookmarks: How to extract labels and addresses?
 
Argh, I can't get this solved.

Here are the facts: At work I have to use WinNT with IE5.5 -- but I was able to collect a lot of useful bookmarks pertaining to Linux.

They are structured as a handle / label plus a URL (e.g. the label "LinuxQuestions.org Forums - where Linux newbies come for help" is connected to the URL http://linuxquestions.org/questions/index.php). I want those pairs in a file for easy access, e.g. to post when somebody needs information I could provide.

I tried the help system, I tried editing, I tried IE itself -- no luck.

Is there a tool in Linux to make a human-readable table out of the htm(l) file µ$ exports from its IE? It should produce a list like

label1 URL1
label2 URL2
label3 URL3
.. ..

anybody please help ... :o

Peacedog 03-12-2004 08:06 AM

IE has an export tool which will produce a *.htm file. With a minimal amount of editing you could use the exported file in this manner. Hope that helps.
Good luck.

JZL240I-U 03-12-2004 08:16 AM

"I tried the help system, I tried editing, I tried IE itself -- no luck"

So you see I did that already. I can expand the hyperlink in Winword, which then looks like this:

{Hyperlink "http://www.linuxquestions.org"}

but I can't get at the text in the ""'s since everything between the {} is treated as a unit. When I discard the hyperlink only the label is kept ... grrr. :mad:

Any Linux way?

slakmagik 03-12-2004 08:34 AM

This is hideous. Ugly, dumb, wrong and I haven't even tried to pretty it up. But as long as the fields don't vary, it may at least get a reasonable result. If they do vary we're screwed. They do in Mozilla, which requires $11,$2 and still doesn't work on some records.

sed s/[\<\>\=]/\ /g bookmarks.html | awk -F"\"" '{ print $9,$2'\n' }' | sed 's/\/A//g'

Turns this sort of thing

<DT><A HREF="http://www.allcommands.com/linux%20commands%20list.html" ADD_DATE="1078450543" LAST_VISIT="1078450660" LAST_CHARSET="ISO-8859-1" ID="rdf:#$YUF9K3">Linux Commands List</A>

into this

Linux Commands List http://www.allcommands.com/linux%20commands%20list.html

(That's from mozilla - I fired up the Celeron and Cygwin later to test IE.)
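The field-counting approach above breaks as soon as the number of attributes changes. A minimal sketch of an alternative, assuming each bookmark has the form <A HREF="...">label</A> on a single line, is to match the tag itself instead of counting quote-separated fields. The function name extract_bookmarks is hypothetical, and the sketch does not decode HTML entities in labels.

```shell
#!/bin/sh
# Hypothetical helper: print "label URL" for every <A HREF="...">label</A>,
# no matter how many other attributes the tag carries.
extract_bookmarks() {
    awk '
    {
        line = $0
        # Loop in case a line holds more than one link.
        while (match(line, /<[Aa] [^>]*[Hh][Rr][Ee][Ff]="[^"]*"[^>]*>[^<]*<\/[Aa]>/)) {
            tag = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)

            # URL: the text between HREF=" and the next double quote.
            match(tag, /[Hh][Rr][Ee][Ff]="[^"]*"/)
            url = substr(tag, RSTART + 6, RLENGTH - 7)

            # Label: the text between the opening tag and the closing </A>.
            label = tag
            sub(/^<[^>]*>/, "", label)
            sub(/<\/[Aa]>$/, "", label)

            print label, url
        }
    }' "$@"
}
```

It could be called as extract_bookmarks bookmarks.html > links.txt. Because nothing is substituted globally, characters such as = inside a URL are left alone.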

JZL240I-U 03-12-2004 08:43 AM

Thanks digiot, I'll try that as soon as I get home and let you know on monday. Have a nice weekend.

slakmagik 03-12-2004 08:58 AM

Thanks. You too. :)

XavierP 03-12-2004 09:35 AM

Why not just install Mozilla on the NT box, import all the IE bookmarks and then copy the xml file? That is, unless you don't have install rights on the NT box - in which case you may be able to get away with installing Firefox and then importing the bookmarks.

slakmagik 03-12-2004 10:18 AM

I thought he wanted to parse them. If he just wants the bookmarks, he doesn't have to install anything - he can export from IE, copy the exported html file to a floppy and just import them directly into mozilla, AFAIK. Would be easier than messing around with sed and such. :) Though if he can install mozilla on his NT box, he should. ;)

JZL240I-U 03-15-2004 04:24 AM

Quote:

Originally posted by XavierP
... That is, unless you don't have install rights on the NT box - in which case you may be able to get away with installing Firefox and then importing the bookmarks.

I don't have the rights to install anything :mad: ... but I can use email or floppy to export my bookmarks.


Quote:

Originally posted by digiot
I thought he wanted to parse them. ...

That's exactly right :).


Quote:

Originally posted by digiot
... he can export from IE, copy the exported html file to a floppy and just import them directly into mozilla, AFAIK.

Yes I know, and I do on a regular basis, thanks.


Quote:

Originally posted by digiot
Would be easier than messing around with sed and such.

:D Those parameters for sed look like a crazy printer driver or the keyboard / graphics values gone haywire :D, thanks for supplying them. Here is an example of what they produced:

1039513802

GnuCash - Accounting Software for Linux
http://www.gnucash.org/
GnuCash - LinuxWiki.org - Linux Wiki und Freie Software
http://www.linuxwiki.de/GnuCash
Heise News-Ticker HBCI-Internetbanking für Linux
http://www.heise.de/newsticker/data/dz-16.09.01-000/
matrica-moneyplex
http://www.matrica.de/
Online Banking with Konqueror
http://home.in.tum.de/~strutyns/banking.php
OpenHBCI - LinuxWiki.org - Linux Wiki und Freie Software
http://www.linuxwiki.de/OpenHBCI
.... and so on.

So the headings / names of the subdirectories are rendered as numbers ... which is of little consequence if one knows what to look for.
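The numbers are most likely the ADD_DATE values of the folder lines, which have no HREF to pick up. A rough sketch that prints the folder name instead, assuming folder lines look like <DT><H3 ... ADD_DATE="...">Name</H3> and that each line holds at most one link. The function name list_with_folders and the [Name] heading style are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print folder names as [Name] headings and the
# bookmarks below them as "label URL" lines.
list_with_folders() {
    awk '
    # Folder line: keep the text between <H3 ...> and </H3>.
    match($0, /<[Hh]3[^>]*>[^<]*<\/[Hh]3>/) {
        name = substr($0, RSTART, RLENGTH)
        sub(/^<[^>]*>/, "", name)
        sub(/<\/[Hh]3>$/, "", name)
        print ""
        print "[" name "]"
        next
    }
    # Link line: assumes at most one link per line.
    match($0, /[Hh][Rr][Ee][Ff]="[^"]*"[^>]*>[^<]*<\/[Aa]>/) {
        link = substr($0, RSTART, RLENGTH)
        url = link
        sub(/^[^"]*"/, "", url)      # drop everything up to the opening quote
        sub(/".*$/, "", url)         # drop the closing quote and the rest
        label = link
        sub(/^[^>]*>/, "", label)    # drop the remainder of the opening tag
        sub(/<\/[Aa]>$/, "", label)  # drop the closing tag
        print label, url
    }' "$@"
}
```

Called as list_with_folders bookmarks.htm, this would put a heading line above each group of bookmarks, with a blank line in front of it.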

A little more problematic is the fact that some (about half) of the paragraph marks are Microsoft's "mirrored P" (at the end of a header-URL pair), while others are just little frames or boxes (between header and URL). On this site they don't make any difference, but they can't be edited with Microsoft's editors ... My guess is that they are not a <CRLF> sequence but only <LF>, but I can't verify that right now. Can that be changed by sed? Somewhere in the formatting string of the print command (\n)?

Thanks in any event to both of you.
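On the line-ending question: awk ends every printed record with a plain LF by default, which would fit the guess above. A small sketch, assuming the goal is a file where every line ends in CRLF so the Windows editors can handle it. The function name to_crlf is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: give every line a CRLF ending, whether the input
# line ended in LF or already in CRLF.
to_crlf() {
    # Strip an existing CR first so lines never end up with CR CR LF.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}
```

For example: to_crlf links.txt > links-dos.txt. Running od -c on a file shows which line endings it actually contains.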


All times are GMT -5.