LinuxQuestions.org
Old 09-14-2005, 12:32 PM   #1
indian
Member
 
Registered: Aug 2004
Posts: 137

Rep: Reputation: 15
Regular Expressions in Python


Hi,

I am looking to split a complete URL like www.google.com/index.html into main URL www.google.com and remaining url /index.html.

How can I do this in Python?

Thanks
 
Old 09-14-2005, 01:07 PM   #2
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Code:
import urlparse
url = 'http://www.google.com/index.html'
spliturl = urlparse.urlparse(url)
print spliturl
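
For readers on Python 3, where the `urlparse` module became `urllib.parse`, a minimal sketch of the same call might look like the following. The result also exposes named attributes, so the two pieces asked about are `netloc` and `path`. The variable names `main_url` and `remaining` are only illustrative.

```python
from urllib.parse import urlparse  # Python 3 home of the urlparse function

url = 'http://www.google.com/index.html'
spliturl = urlparse(url)

main_url = spliturl.netloc   # the host part of the URL
remaining = spliturl.path    # everything after the host, starting at '/'
print(main_url, remaining)
```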
 
Old 09-14-2005, 01:27 PM   #3
shanenin
Member
 
Registered: Aug 2003
Location: Rochester, MN, U.S.A
Distribution: Gentoo
Posts: 987

Rep: Reputation: 30
I was just playing with it a little, and it seems to split some URLs strangely:
Code:
>>> urlparse.urlparse('http://www.linuxquestions.org/questions/showthread.php?s=&threadid=363336')
('http', 'www.linuxquestions.org', '/questions/showthread.php', '', 's=&threadid=363336', '')
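
The six fields in that tuple are, in order: scheme, netloc, path, params, query and fragment, so the two empty strings just mean this URL has no params and no fragment. A small sketch, assuming Python 3's `urllib.parse`, that reads the fields by name and breaks the query string apart:

```python
from urllib.parse import urlparse, parse_qs

parts = urlparse('http://www.linuxquestions.org/questions/showthread.php?s=&threadid=363336')

# Named access to the fields of the result
print(parts.scheme)   # http
print(parts.netloc)   # www.linuxquestions.org
print(parts.path)     # /questions/showthread.php
print(parts.query)    # s=&threadid=363336

# keep_blank_values=True keeps the empty 's' parameter instead of dropping it
query = parse_qs(parts.query, keep_blank_values=True)
print(query)
```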
 
Old 09-14-2005, 01:45 PM   #4
indian
Member
 
Registered: Aug 2004
Posts: 137

Original Poster
Rep: Reputation: 15
How does urlparse work? I mean, if I pass it www.google.com/index.html, then it gives some blank values.
 
Old 09-14-2005, 02:03 PM   #5
shanenin
Member
 
Registered: Aug 2003
Location: Rochester, MN, U.S.A
Distribution: Gentoo
Posts: 987

Rep: Reputation: 30
This function seems a little cleaner; it just splits the URL into two parts, as you need:
Code:
def parse_url(url):
    # known domain suffixes; this list is far from complete
    extensions = ('.com', '.net', '.uk', '.biz', '.gov', '.org')
    for ext in extensions:
        if url.find(ext) != -1:
            new_url = url.replace(ext, ext + "!@#$%", 1)  # add a unique delimiter after the first match only
            return new_url.split("!@#$%", 1)              # split at the newly created delimiter
    return [url, '']  # no known suffix found
I am sure there are some flaws in this method I missed :-)

Last edited by shanenin; 09-14-2005 at 03:40 PM.
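
Since the thread title mentions regular expressions, here is a hedged sketch of the same two-way split with the `re` module. It avoids the suffix list, which can misfire because a suffix such as '.com' also matches inside a host name like www.company.org. The pattern and the name `split_with_regex` are assumptions for illustration; they only cover simple URLs, and an optional leading scheme is dropped from the result.

```python
import re

# Optional scheme, then everything up to the first '/', then the rest (if any)
URL_RE = re.compile(r'^(?:[A-Za-z][A-Za-z0-9+.-]*://)?([^/]+)(/.*)?$')

def split_with_regex(url):
    """Return [host, rest] for a simple URL, or None if it does not match."""
    match = URL_RE.match(url)
    if match is None:
        return None
    host = match.group(1)
    rest = match.group(2) or ''   # no path at all gives an empty string
    return [host, rest]

print(split_with_regex('www.google.com/index.html'))
print(split_with_regex('www.company.org/index.html'))
```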
 
Old 09-14-2005, 02:10 PM   #6
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Quote:
Originally posted by indian
How does urlparse work? I mean, if I pass it www.google.com/index.html, then it gives some blank values.
Yes, that's because it expects something like "http://", "ldap://" or "ftp://" at the start of the string.
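
One possible workaround, sketched here for Python 3's `urllib.parse`, is to prefix the string with '//' when it has no scheme, because the parser only treats text as the network location when it is introduced by '//'. The helper name `split_host_path` is hypothetical, and the '://' check is a simplification that assumes the sequence does not appear later in the URL.

```python
from urllib.parse import urlsplit

def split_host_path(url):
    """Return (host, path) for a URL with or without a scheme.

    split_host_path is a hypothetical helper, not a standard library function.
    """
    if '://' not in url:
        # A leading '//' marks what follows as the network location
        url = '//' + url
    parts = urlsplit(url)
    return parts.netloc, parts.path

print(split_host_path('www.google.com/index.html'))
print(split_host_path('http://www.google.com/index.html'))
```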
 
Old 09-14-2005, 10:40 PM   #7
indian
Member
 
Registered: Aug 2004
Posts: 137

Original Poster
Rep: Reputation: 15
Thanks shanenin, it is working.

Another thing I am not able to do is get the file name. For example, given the URL www.google.com/docs/index.html, I want to break it into www.google.com/docs/ and index.html.

I can't figure out how to use delimiters to get the file name.
 
Old 09-14-2005, 11:00 PM   #8
shanenin
Member
 
Registered: Aug 2003
Location: Rochester, MN, U.S.A
Distribution: Gentoo
Posts: 987

Rep: Reputation: 30
I am not sure I am fully following you, but you could use the split method again like this, choosing '/' as the delimiter:
Code:
>>> "http://www.google.com/docs/index.html".split('/')
['http:', '', 'www.google.com', 'docs', 'index.html']
Code:
url = "http://www.google.com/docs/index.html"
split_url = url.split('/')
filename = split_url[-1]  # element -1 is the last one in the list
print(filename)
or as a function
Code:
def url_file(url):
    split_url = url.split('/')
    return split_url[-1]

Last edited by shanenin; 09-15-2005 at 12:01 AM.
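
To get both pieces asked for in the question (www.google.com/docs/ and index.html) in one step, `str.rpartition` splits at the last '/' only. A minimal sketch; `split_dir_file` is a hypothetical name, and any query string or fragment would end up in the file part. A URL with no path, such as http://www.google.com, would be split at the scheme's slashes, so it needs separate handling.

```python
def split_dir_file(url):
    """Split a URL at its last '/' into (directory part, file name)."""
    head, sep, tail = url.rpartition('/')
    # Keep the trailing '/' on the directory part, as in the question
    return head + sep, tail

print(split_dir_file('www.google.com/docs/index.html'))
```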
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration