LinuxQuestions.org - Regular Expressions in Python

indian

09-14-2005 12:32 PM

Regular Expressions in Python

Hi,

I want to split a complete URL like www.google.com/index.html into the main URL, www.google.com, and the remaining path, /index.html.

How can I do this in Python?

Thanks

Hko

09-14-2005 01:07 PM

Code:

import urlparse

url = 'http://www.google.com/index.html'

spliturl = urlparse.urlparse(url)

print spliturl

shanenin

09-14-2005 01:27 PM

I was just playing with it a little, and it seems to split some URLs strangely:

Code:

>>> urlparse.urlparse('http://www.linuxquestions.org/questions/showthread.php?s=&threadid=363336')

('http', 'www.linuxquestions.org', '/questions/showthread.php', '', 's=&threadid=363336', '')
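For reference, the six fields in that tuple are scheme, netloc, path, params, query and fragment, so the empty strings are just the unused params and fragment slots. A minimal sketch, assuming Python 3, where the module lives at urllib.parse rather than urlparse (the code in this thread is Python 2):

```python
from urllib.parse import urlparse, parse_qs

url = 'http://www.linuxquestions.org/questions/showthread.php?s=&threadid=363336'
parts = urlparse(url)

# The result still behaves like the 6-tuple shown above,
# but each field can also be read by name.
print(parts.scheme)    # http
print(parts.netloc)    # www.linuxquestions.org
print(parts.path)      # /questions/showthread.php
print(parts.query)     # s=&threadid=363336

# parse_qs turns the query string into a dict; blank values such as
# "s=" are dropped unless keep_blank_values=True is passed.
query = parse_qs(parts.query)
print(query)           # {'threadid': ['363336']}
```

Seen this way the split is not so strange: everything after the "?" is the query, and the path stops just before it.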

indian

09-14-2005 01:45 PM

How does urlparse work? If I pass it www.google.com/index.html, it returns some blank values.

shanenin

09-14-2005 02:03 PM

This function seems a little cleaner; it just splits the URL into the two parts you need:

Code:

def parse_url(url):

    extensions = ('.com', '.net', '.uk', '.biz', '.gov', '.org')

    for ext in extensions:

        if url.find(ext) != -1:

            new_url = url.replace(ext, ext + "!@#$%", 1)  # add a unique delimiter after the first match only

            split_url = new_url.split("!@#$%")  # split at the newly created delimiter

            return split_url

I am sure there are some flaws in this method I missed :-)
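One such flaw is that the extension list can match inside the host name: "www.commerce.org" contains ".com". A possible alternative, assuming the input has no "http://" prefix (as in the original question), is to split at the first "/" instead. The name split_at_first_slash is hypothetical, chosen just for this sketch.

```python
def split_at_first_slash(url):
    """Return (host, path) for a scheme-less URL; illustrative only."""
    # partition splits at the first "/" and returns (before, sep, after).
    host, sep, rest = url.partition('/')
    # Put the slash back on the path; with no slash, sep and rest are ''.
    return host, sep + rest

print(split_at_first_slash('www.google.com/index.html'))  # ('www.google.com', '/index.html')
print(split_at_first_slash('www.commerce.org/shop'))      # ('www.commerce.org', '/shop')
print(split_at_first_slash('www.google.com'))             # ('www.google.com', '')
```

This needs no list of known extensions, but it would give the wrong answer for a string that starts with "http://", since the first "/" is then part of the scheme separator.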

Hko

09-14-2005 02:10 PM

Quote:

Originally posted by indian
How does urlparse work? If I pass it www.google.com/index.html, it returns some blank values.

Yes, that's because it expects something like "http://", "ldap://", or "ftp://" at the start of the string.
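One way around this, sketched here under the assumption of Python 3's urllib.parse: prefixing the string with "//" tells the parser that a host name follows, so no scheme is needed. The helper name split_schemeless is made up for illustration.

```python
from urllib.parse import urlparse

def split_schemeless(url):
    """Split a URL into (host, rest); hypothetical helper for illustration."""
    # Without "://" the parser cannot tell where the host ends,
    # so mark the start of the host part with "//".
    if '://' not in url:
        url = '//' + url
    parts = urlparse(url)
    rest = parts.path
    if parts.query:
        rest += '?' + parts.query
    return parts.netloc, rest

# Without the prefix the host field is empty and everything lands in the path.
print(repr(urlparse('www.google.com/index.html').netloc))  # ''
print(split_schemeless('www.google.com/index.html'))       # ('www.google.com', '/index.html')
```

The same helper also accepts a full "http://..." URL, because it leaves the string alone when a scheme separator is already present.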

indian

09-14-2005 10:40 PM

Thanks shanenin, it is working :)

Another thing I am not able to do is get the file name. Given a URL like www.google.com/docs/index.html, I want to break it into www.google.com/docs/ and index.html.

I can't think of how to use delimiters to get the file name :)

shanenin

09-14-2005 11:00 PM

I am not sure I fully follow you, but you could use the split method again like this, with '/' as the delimiter:

Code:

>>> "http://www.google.com/docs/index.html".split('/')

['http:', '', 'www.google.com', 'docs', 'index.html']

Code:

url = "http://www.google.com/docs/index.html"

split_url = url.split('/')

file = split_url[-1]  # element -1 is the last one in the list

print file

Or as a function:

Code:

def url_file(url):

    split_url = url.split('/')

    return split_url[-1]
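To get both halves asked for above (www.google.com/docs/ and index.html) in one step, splitting at the last "/" is enough. A small sketch; split_dir_file is a made-up name, and it does not treat a query string specially, so anything after a "?" stays attached to the file name.

```python
def split_dir_file(url):
    """Return (everything up to and including the last '/', file name)."""
    # rpartition splits at the last "/" and returns (before, sep, after).
    head, sep, tail = url.rpartition('/')
    return head + sep, tail

print(split_dir_file('www.google.com/docs/index.html'))
# ('www.google.com/docs/', 'index.html')
print(split_dir_file('http://www.google.com/docs/index.html'))
# ('http://www.google.com/docs/', 'index.html')
```

Note that a URL with no path at all, such as http://www.google.com, would come back with the host name in the file-name slot, so this only suits URLs that are known to end in a file.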

All times are GMT -5.