LinuxQuestions.org
Old 06-10-2009, 02:29 PM   #1
telecom_is_me
Member
 
Registered: Jun 2008
Location: Upstate NY
Distribution: Fedora on the desk / Gentoo in the Racks
Posts: 36

Rep: Reputation: 15
Is There a Way in Perl to Locate the X,Y Coordinates of Links on a Web Page


I'm trying to parse an HTML page and output a list of its links with their x,y coordinates. I'm already using getLinks from the DOM object in PHP, as described here: http://www.phpro.org/examples/Get-Links-With-DOM.html ... it works wonders; I trim the list and return only the text description of each link.

From everything I can find and everything I've tried, I can get the x,y coordinates using JavaScript on the client, but this won't be running on the client, so that's no good... and I really don't like JavaScript.

Does anyone know how I could grab the x,y coordinates of these links, maybe in Perl? Any help would be appreciated. Again, I'm trying to keep this server side.

Thanks in advance for any help.
 
Old 06-10-2009, 03:44 PM   #2
Su-Shee
Member
 
Registered: Sep 2007
Location: Berlin
Distribution: Slackware
Posts: 509

Rep: Reputation: 41
This is only possible with JavaScript, because JavaScript "knows" the actual state of the rendered HTML/CSS, the actual window size, and the other things you need to calculate XY coordinates within a web page.

It is possible to run a virtual browser from within Perl, though AFAIK JavaScript is still necessary.

I forgot what the project was called, but it was made to measure users' click-and-drag behavior in a browser.

Ha! Here it is: http://seleniumhq.org/

And a short reminder: on the server side there are no XY coordinates _yet_. They exist only after the browser has rendered the page, and they all depend on window size, font size, and the type of layout of the web page - not to mention the possibility that user X filters the links...
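
For what it's worth, here is a minimal sketch of the client-side approach described above: once a browser has laid out the page, each link's position can be read with `getBoundingClientRect()`. The function name `collectLinkCoordinates` and the field names it returns are made up for illustration. `doc` is assumed to be an already rendered `document`, and the numbers will differ between window sizes, fonts and browsers.

```javascript
// Hypothetical helper: list every link with its position in the rendered page.
// `doc` is assumed to be a browser `document` that has already been laid out;
// `win` supplies the current scroll offsets (defaults to no scrolling).
function collectLinkCoordinates(doc, win = { scrollX: 0, scrollY: 0 }) {
  return Array.from(doc.querySelectorAll('a[href]')).map((link) => {
    // Rectangle relative to the viewport, as computed by the layout engine.
    const rect = link.getBoundingClientRect();
    return {
      text: link.textContent.trim(),
      href: link.getAttribute('href'),
      // Add the scroll offsets to get page coordinates instead of viewport ones.
      x: Math.round(rect.left + win.scrollX),
      y: Math.round(rect.top + win.scrollY),
      width: Math.round(rect.width),
      height: Math.round(rect.height),
    };
  });
}
```

In a browser console this would be called as `collectLinkCoordinates(document, window)`. Whether the same script can be run inside a server-side rendering tool depends on that tool.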

If you just need the links, at least 10 Perl modules will deliver them to you without ever calculating a single coordinate.

Check CPAN for HTML parsers.

Last edited by Su-Shee; 06-10-2009 at 03:49 PM.
 
Old 06-10-2009, 03:51 PM   #3
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 451
Just searching for

Perl parse DOM html

on yahoo.com yields http://search.cpan.org/~sprout/HTML-...ib/HTML/DOM.pm .

The module is capable of returning links, if I understand correctly. Is that what you need?
 
Old 06-11-2009, 09:54 AM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,388
Blog Entries: 2

Rep: Reputation: 900
Only the browser, with its built-in JavaScript interpreter, knows how the HTML is rendered, and browsers render a page differently depending on many things. Perhaps you can tell us what you really want to do, or why you think you want to do this, and alternative solutions can be suggested.
--- rod.
 
Old 06-11-2009, 04:39 PM   #5
telecom_is_me
Member
 
Registered: Jun 2008
Location: Upstate NY
Distribution: Fedora on the desk / Gentoo in the Racks
Posts: 36

Original Poster
Rep: Reputation: 15
Actually, I don't need the client browser to generate the page view. I'm using CutyCapt (http://cutycapt.sourceforge.net/), which emulates a Safari browser environment through the use of WebKit. It requires that the server have X running, so in essence it is viewing the page being requested. Additionally, it lets me set a fixed browser size such as 1024x768.

Perhaps I can get the JavaScript to do its thing within the WebKit environment...
 
Old 06-12-2009, 03:35 AM   #6
Su-Shee
Member
 
Registered: Sep 2007
Location: Berlin
Distribution: Slackware
Posts: 509

Rep: Reputation: 41
What exactly are you trying to accomplish?

There is no such thing as a fixed XY-coordinate of some link within a website.

This depends entirely on how the browser renders the page - and even if a page has a fixed layout with fixed pixel widths and heights, I can still change the font size in my browser, which will move any text - including links - to different coordinates.

Not to mention that many people just don't use 1024x768 as a resolution (me for example), don't open their browser in full screen mode (me for example) and don't use webkit-based browsers (all Firefox, Opera and IE users, for example.)

You would have to grab all these values from within a user's browser environment (which I would consider _highly_ intrusive into my privacy), and it would require that the user _actually_ has JavaScript enabled and doesn't filter stuff or use NoScript...

And I haven't even started yet with userContent.css manipulations...

If you just want to grab links, use _anything_ capable of handling the DOM - there are at least 10 Perl modules on CPAN that do that, as Sergei suggested.

Even some Regexp would do.
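
To illustrate the regexp route, here is a rough sketch of pulling hrefs and link text out of raw HTML. The function name `extractLinks` is hypothetical, and the pattern assumes quoted `href` attributes. It is not a real HTML parser and will miss or mangle unusual markup, so a proper DOM module is the safer choice for anything serious.

```javascript
// Hypothetical helper: pull href and link text out of raw HTML with a regular
// expression. This is a rough sketch, not a real HTML parser: it assumes
// quoted href attributes and will miss or mangle unusual markup.
function extractLinks(html) {
  const anchor = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a>/gi;
  const links = [];
  for (const match of html.matchAll(anchor)) {
    links.push({
      href: match[2],
      // Strip nested tags and collapse whitespace in the link text.
      text: match[3].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim(),
    });
  }
  return links;
}
```

Note that this yields only the link targets and their text. No coordinates come out of it, because nothing has been rendered.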
 
Old 06-12-2009, 10:23 AM   #7
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,388
Blog Entries: 2

Rep: Reputation: 900
Okay, I think I can read between the lines. You want to use CutyCapt to grab web pages, save them to image files, and then turn the links into image maps associated with the bitmap images. My recommendation is to dig into the source for CutyCapt and add your requirement as a feature. It sounds like something a lot of users could make use of, and that is precisely the definitive place to acquire the information. Since the tool can store web pages as SVG, and SVG is an XML-formatted text file, you may be able to extract XY coordinate information from that format.
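
As a sketch of the image-map step, assuming the link rectangles have already been obtained somehow from the same rendering that produced the screenshot: the function name `buildImageMap` and the input fields (`href`, `text`, `x`, `y`, `width`, `height`, all in pixels) are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: turn link rectangles into an HTML image map for a
// screenshot. Each entry is assumed to have href, text, x, y, width and height
// in pixels, measured in the same rendering that produced the image.
function buildImageMap(mapName, links) {
  const escapeAttr = (value) =>
    String(value)
      .replace(/&/g, '&amp;')
      .replace(/"/g, '&quot;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;');

  const areas = links.map((link) => {
    // A rect area takes left, top, right, bottom.
    const coords = [link.x, link.y, link.x + link.width, link.y + link.height];
    return `  <area shape="rect" coords="${coords.join(',')}" ` +
      `href="${escapeAttr(link.href)}" alt="${escapeAttr(link.text)}">`;
  });

  return [`<map name="${escapeAttr(mapName)}">`, ...areas, '</map>'].join('\n');
}
```

The resulting `<map>` would be attached to the screenshot with `<img usemap="#name">`. The coordinates only line up if the image was captured at the same size as the rendering they were measured in.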
--- rod.

Last edited by theNbomr; 06-12-2009 at 10:25 AM.
 
  



Tags
html, perl

