LinuxQuestions.org
Old 05-02-2015, 10:09 PM   #1
ocpaul20
LQ Newbie
 
Registered: Jan 2012
Posts: 24

Rep: Reputation: Disabled
Python urllib finding the real URI of the start page


[SOLVED] - there is no way to determine this if the remote server does not send the index file name (which I was looking for) back.
===========================

I have a URL and I want to find out the file name of the initial page it is loading. Eg: index.php, index.htm, index.html, default.htm, or any of the other possible start pages available.

Here is my code, but it only gives back the same URL I started with. Is there any way to find out the default start page name for this site, please? (By trying various options I have found that it is index.php, so I want it to return http://synapse.ararat.cz/files/contrib/index.php.)

Code:
import urllib2  # Python 2 only; Python 3 moved this to urllib.request

starturl = "http://synapse.ararat.cz/files/contrib/"
req = urllib2.Request(starturl)
res = urllib2.urlopen(req)
finalurl = res.geturl()  # URL of the response, after any redirects were followed
print finalurl  # http://synapse.ararat.cz/files/contrib/
Thanks for any help.

Last edited by ocpaul20; 05-03-2015 at 09:57 PM. Reason: resolved - as far as it is able to be solved.
 
Old 05-03-2015, 10:48 AM   #2
SoftSprocket
Member
 
Registered: Nov 2014
Posts: 399

Rep: Reputation: Disabled
You could check the headers (res.info()) for a Location header, but unless there is a redirect there probably won't be one.
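A minimal sketch of that header check, written for Python 3 (urllib.request is the Python 3 home of urllib2). The helper names location_hints and inspect are made up for illustration. Besides Location it also looks at Content-Location, a standard HTTP header that some servers send, though nothing in this thread suggests this particular server does. Note that urlopen follows redirects on its own, so a redirect normally shows up as a changed geturl() rather than as a Location header on the final response.

```python
import urllib.request


def location_hints(headers):
    """Collect headers that could reveal a different URL for the resource.

    `headers` is anything with a .get(name) method, such as res.headers.
    """
    hints = {}
    for name in ("Location", "Content-Location"):
        value = headers.get(name)
        if value:
            hints[name] = value
    return hints


def inspect(url):
    """Return the final URL and any location-style headers for `url`."""
    with urllib.request.urlopen(url, timeout=10) as res:
        return res.geturl(), location_hints(res.headers)
```

Calling inspect("http://synapse.ararat.cz/files/contrib/") would return the final URL plus a dict of hints; an empty dict means the server gave nothing away.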
 
Old 05-03-2015, 11:33 AM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,240

Rep: Reputation: 5322
Well, tell me how you found that it's index.php, and I'll tell you how to do that in Python.

As far as I can tell, though, that information just isn't programmatically available for that URL in particular. It's not in any part of the HTTP response headers, which I've checked using both Chromium Developer Tools and urllib itself.

Web servers do URL rewriting server-side. They take "files/contrib/" and map it internally to "files/contrib/index.php", then they just send the response back. All the client (Chromium, urllib2, etc.) sees is that it's making a request to "files/contrib/" and getting a response back.
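To see this from the client side without touching the real site, here is a small Python 3 sketch that serves a temporary directory with the standard library's http.server and requests the directory URL. This is a stand-in, not the server in question: the function name serve_directory_once, the index.html file and its contents are all made up for the demonstration. The server answers the directory request with the index file's contents, yet the URL and headers the client gets back never mention the file name.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, *args):
        pass  # keep the demo output clean


def serve_directory_once():
    """Serve a temp dir containing index.html and fetch its directory URL."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "index.html").write_text("<p>start page</p>")
        handler = functools.partial(QuietHandler, directory=root)
        server = http.server.HTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            url = "http://127.0.0.1:%d/" % server.server_address[1]
            # Empty ProxyHandler: ignore any proxy set in the environment.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            with opener.open(url, timeout=10) as res:
                headers = dict(res.headers.items())
                return url, res.geturl(), headers, res.read()
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    requested, final, headers, body = serve_directory_once()
    print(requested == final)   # the URL is unchanged
    print(sorted(headers))      # no header names the index file
    print(body)                 # but the body is index.html's content
```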

Last edited by dugan; 05-03-2015 at 11:45 AM.
 
Old 05-03-2015, 09:54 PM   #4
ocpaul20
LQ Newbie
 
Registered: Jan 2012
Posts: 24

Original Poster
Rep: Reputation: Disabled
I found the index.php by trial and error - trying index.html, index.htm, index.php, default.html, etc.

If this information is not available in the headers or from the HTTP server at the remote end, then maybe I have to check whether the URL has no file name on the end and, if so, try a few likely candidates programmatically. I know that many content management systems do not put a file name on the end because they use page IDs instead, which look bad as URLs. Anyway...
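A rough sketch of that trial-and-error idea in Python 3, in case it helps anyone landing here. The function names and the candidate list are my own choices, not anything the server defines. It treats a candidate as the start page only when its body is byte-for-byte identical to what the bare directory URL returns, which is a guess rather than proof: a dynamic page that changes between requests (timestamps, session tokens) will not match, and two different files could in principle return the same content.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlsplit

# Likely start page names, most likely first (an assumption, extend as needed).
CANDIDATES = ("index.php", "index.html", "index.htm",
              "default.html", "default.htm")


def needs_guess(url):
    """True when the URL has no file name on the end of its path."""
    path = urlsplit(url).path
    return path == "" or path.endswith("/")


def fetch_body(url):
    """Return the response body, or None if the server reports an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as res:
            return res.read()
    except urllib.error.HTTPError:
        return None


def guess_index_page(base_url, fetch=fetch_body, candidates=CANDIDATES):
    """Return the first candidate URL whose body equals the directory's body."""
    base_body = fetch(base_url)
    if base_body is None:
        return None
    for name in candidates:
        url = urljoin(base_url, name)
        if fetch(url) == base_body:
            return url
    return None
```

Usage would be something like: if needs_guess(starturl), call guess_index_page(starturl) and fall back to starturl when it returns None. The fetch parameter is there only so the lookup can be swapped out, for example to add caching or to try it without network access.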

Thanks for the help and info guys.
 
  



Tags
python, url


