LinuxQuestions.org
Old 03-12-2004, 08:00 AM   #1
JZL240I-U
Senior Member
 
Registered: Apr 2003
Location: Germany
Distribution: openSuSE 42.1_64+Tumbleweed-KDE, Mint 17.3
Posts: 3,957

Rep: Reputation: Disabled
Bookmarks: How to extract labels and addresses?


Argh, I can't get this solved.

Here are the facts: At work I have to use WinNT with IE5.5 -- but I was able to collect a lot of useful bookmarks pertaining to Linux.

They are structured as a handle / label plus a URL (e.g. "LinuxQuestions.org Forums - where Linux newbies come for help" is connected to the URL http://linuxquestions.org/questions/index.php). I want those pairs in a file for easy access, e.g. to post when somebody needs information I could provide.

I tried the help system, I tried editing, I tried IE itself -- no luck.

Is there a tool in Linux to make a human-readable table out of the .htm(l) file that µ$ IE exports? It should produce a list like

label1 URL1
label2 URL2
label3 URL3
.. ..

Anybody, please help ...
 
Old 03-12-2004, 09:06 AM   #2
Peacedog
LQ Guru
 
Registered: Sep 2003
Location: Danville, VA
Distribution: Slackware, Windows, FreeBSD, OpenBSD, Mac OS X
Posts: 5,296

Rep: Reputation: 168
IE has an export tool which will produce a *.htm file. With a minimal amount of editing you could use the exported file in this manner. Hope that helps.
Good luck.
 
Old 03-12-2004, 09:16 AM   #3
JZL240I-U
Senior Member
 
Registered: Apr 2003
Location: Germany
Distribution: openSuSE 42.1_64+Tumbleweed-KDE, Mint 17.3
Posts: 3,957

Original Poster
Rep: Reputation: Disabled
"I tried the help system, I tried editing, I tried IE itself -- no luck"

So you see I did that already. I can expand the Hyperlink in Winword, that looks like so:

{Hyperlink "http://www.linuxquestions.org"}

but I can't get at the text inside the quotes, since everything between the {} is treated as a unit. When I discard the hyperlink, only the label is kept ... grrr.

Any Linux way?
 
Old 03-12-2004, 09:34 AM   #4
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
This is hideous. Ugly, dumb, wrong and I haven't even tried to pretty it up. But as long as the fields don't vary, it may at least get a reasonable result. If they do vary we're screwed. They do vary in Mozilla, which requires $11,$2 and still doesn't work on some records.

sed 's/[<>=]/ /g' bookmarks.html | awk -F'"' '{ print $9, $2 }' | sed 's/\/A//g'

Turns this sort of thing

<DT><A HREF="http://www.allcommands.com/linux%20commands%20list.html" ADD_DATE="1078450543" LAST_VISIT="1078450660" LAST_CHARSET="ISO-8859-1" ID="rdf:#$YUF9K3">Linux Commands List</A>

into this

Linux Commands List http://www.allcommands.com/linux%20commands%20list.html

(That's from Mozilla - I fired up the Celeron and Cygwin later to test IE.)
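Since the field position depends on how many attributes each browser writes, a variant that matches the anchor tag itself may be less fragile. This is only a sketch: it assumes one <A HREF="...">label</A> per line with upper-case tag names, as in the sample above, and extract_bookmarks is just a name made up for illustration.

```shell
# Sketch: print "label URL" for every bookmark line, ignoring whatever
# attributes (ADD_DATE, LAST_VISIT, ID, ...) sit between the URL and the label.
# Assumes one upper-case <A HREF="...">label</A> per line.
extract_bookmarks() {
    # \1 captures the URL, \2 the label; -n plus the p flag prints only
    # lines where the substitution matched, so non-bookmark lines are dropped.
    sed -n 's/.*<A HREF="\([^"]*\)"[^>]*>\(.*\)<\/A>.*/\2 \1/p' "$@"
}

# Usage (bookmarks.html is the file exported from the browser):
# extract_bookmarks bookmarks.html > bookmarks.txt
```

Because the trailing .* swallows everything after </A>, a stray carriage return at the end of a Windows line is dropped as well.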

Last edited by slakmagik; 03-12-2004 at 09:36 AM.
 
Old 03-12-2004, 09:43 AM   #5
JZL240I-U
Senior Member
 
Registered: Apr 2003
Location: Germany
Distribution: openSuSE 42.1_64+Tumbleweed-KDE, Mint 17.3
Posts: 3,957

Original Poster
Rep: Reputation: Disabled
Thanks digiot, I'll try that as soon as I get home and let you know on Monday. Have a nice weekend.
 
Old 03-12-2004, 09:58 AM   #6
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Thanks. You too.
 
Old 03-12-2004, 10:35 AM   #7
XavierP
Moderator
 
Registered: Nov 2002
Location: Kent, England
Distribution: Debian Testing
Posts: 19,192
Blog Entries: 4

Rep: Reputation: 469
Why not just install Mozilla on the NT box, import all the IE bookmarks and then copy the xml file? That is, unless you don't have install rights on the NT box - in which case you may be able to get away with installing Firefox and then importing the bookmarks.
 
Old 03-12-2004, 11:18 AM   #8
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
I thought he wanted to parse them. If he just wants the bookmarks, he doesn't have to install anything - he can export from IE, copy the exported HTML file to a floppy and just import it directly into Mozilla, AFAIK. Would be easier than messing around with sed and such. Though if he can install Mozilla on his NT box, he should.
 
Old 03-15-2004, 05:24 AM   #9
JZL240I-U
Senior Member
 
Registered: Apr 2003
Location: Germany
Distribution: openSuSE 42.1_64+Tumbleweed-KDE, Mint 17.3
Posts: 3,957

Original Poster
Rep: Reputation: Disabled
Quote:
Originally posted by XavierP
... That is, unless you don't have install rights on the NT box - in which case you may be able to get away with installing Firefox and then importing the bookmarks.
I don't have the rights to install anything ... but I can use email or floppy to export my bookmarks.


Quote:
Originally posted by digiot
I thought he wanted to parse them. ...
That's exactly right.


Quote:
Originally posted by digiot
... he can export from IE, copy the exported html file to a floppy and just import them directly into mozilla, AFAIK.
Yes I know, and I do on a regular basis, thanks.


Quote:
Originally posted by digiot
Would be easier than messing around with sed and such.
Those parameters for sed look like a crazy printer driver or the keyboard / graphics values gone haywire, thanks for supplying them. Here is an example of what they produced:

1039513802

GnuCash - Accounting Software for Linux
http://www.gnucash.org/
GnuCash - LinuxWiki.org - Linux Wiki und Freie Software
http://www.linuxwiki.de/GnuCash
Heise News-Ticker HBCI-Internetbanking für Linux
http://www.heise.de/newsticker/data/dz-16.09.01-000/
matrica-moneyplex
http://www.matrica.de/
Online Banking with Konqueror
http://home.in.tum.de/~strutyns/banking.php
OpenHBCI - LinuxWiki.org - Linux Wiki und Freie Software
http://www.linuxwiki.de/OpenHBCI
.... and so on.

So the headings / names of the subdirectories are rendered as numbers ... which is of little consequence if one knows what to look for.
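The number is probably the folder's ADD_DATE value: a folder line has no URL, so the awk command prints whatever happens to sit in the second quoted field. A sketch that prints the folder name instead, assuming folder lines look like <DT><H3 ...>Name</H3> (the usual layout of exported bookmark files, not checked against this particular export); extract_with_folders is a made-up name.

```shell
# Sketch: print folder headings as "[Name]" and bookmarks as "label URL".
# Assumes upper-case <H3 ...>Name</H3> folder lines and one
# <A HREF="...">label</A> bookmark per line.
extract_with_folders() {
    sed -n \
        -e 's/.*<H3[^>]*>\(.*\)<\/H3>.*/[\1]/p' \
        -e 's/.*<A HREF="\([^"]*\)"[^>]*>\(.*\)<\/A>.*/\2 \1/p' "$@"
}

# Usage:
# extract_with_folders bookmarks.html > bookmarks.txt
```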

A little more problematic is the fact that about half of the paragraph marks are Microsoft's "mirrored P" (at the end of a header-URL pair), while the others are just little frames or boxes (between header and URL). On this site that doesn't make any difference, but the boxes can't be edited with Microsoft's editors ... My guess is that they are not a <CRLF> sequence but only <LF>, though I can't verify that right now. Can that be changed by sed? Somewhere in the format string of the print command (\n)?
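If the guess about bare <LF> is right, the line endings can be normalised in one extra pipeline step. A minimal sketch using awk rather than sed, since \r handling in sed differs between versions; to_crlf is a name invented here, and where it is installed unix2dos should do the same job.

```shell
# Sketch: make every line end in CR LF, whether it arrived with LF or CR LF.
to_crlf() {
    # Strip an existing trailing CR first so CR LF lines are not doubled,
    # then write the line back with an explicit CR LF.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}

# Usage:
# to_crlf bookmarks.txt > bookmarks-windows.txt
```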

Thanks in any event to both of you.

Last edited by JZL240I-U; 03-15-2004 at 05:27 AM.
 
  


All times are GMT -5.
