LinuxQuestions.org
09-21-2005, 07:44 AM   #1
indian
Member
 
Registered: Aug 2004
Posts: 137

What is wrong with this Python code?

Hi, I got some code from the ASPN site for downloading an HTML file, together with its images, from the web. But while compiling it is giving errors. Can anyone compile this code and tell me what is wrong here?

I have changed the links appropriately and have also created the directories.



Code:
#!/usr/bin/python

import urllib2, re
from os import path

# global re
imre = None
gifre = None

def download(url,fname):
    try:
        print "Downloading "+url+" ... ",
        furl = urllib2.urlopen(url)
        f = file(fname,'wb')
        f.write(furl.read())
        f.close()
        print "OK"
        return 1
    except:
        print "Failed"
        return 0

def gifsub(matchobj):
    return gifre.findall(matchobj.group(0))[0]

# Main procedure
def grab(wurl, outdir, wfile, wgif, lgif, cachedir = 'cache',
         tmpfile = 'tmp.htm'):
    global imre, gifre
    imre = re.compile(wgif)
    gifre = re.compile(lgif)
    # path to temporary file
    tmpf = path.join(cachedir,tmpfile)
    print "Retrieving page..."
    download(wurl, tmpf)
    f = file(tmpf,'r')
    s = f.read()
    f.close()
    all = imre.findall(s)
    res = []
    res2 = []
    # Fill up result list
    for i in all:
        if i not in res:
            res.append(i)
            res2.append(gifre.findall(i)[0])
    result = zip(res, res2)

    # Replace web links with local links
    ns = re.sub(wgif,
                gifsub, s)
    f = file(path.join(outdir,wfile),'wb')
    f.write(ns)
    f.close()

    # Download images
    for i in result:
        if not path.exists(path.join(outdir,i[1])):
            download(i[0], path.join(outdir,i[1]))

    print "Done."

if __name__ == '__main__':
    # Document URL
    wurl = 'http://www.somesiteaddress.net/page.html'
    # Path to the local directory to save the document
    outdir = '~/downloads/somesite'
    # Filename for saved page in the local directory 
    wfile = 'index.html'
    # Patterns for images:
    # - process all gif images from <http://img.anothersiteaddress.net/images>
    #   i.e. <http://img.anothersiteaddress.net/images/image.gif>
    wgif = 'http://img\.anothersiteaddress\.net/images/[^+]*?\.gif'
    # - replace the original image URL with the simple filename
    #   i.e. <http://img.anothersiteaddress.net/images/image.gif>
    #   will be <image.gif>
    lgif = '[_a-zA-Z0-9]+\.gif'

    # Directory for storing temporary files
    cachedir = '~/downloads/temp'
    # Temporary filename
    tmpfile = 'temp.htm'

    # Call the main procedure
    grab(wurl, outdir, wfile, wgif, lgif, cachedir, tmpfile)
 
09-21-2005, 10:50 AM   #2
jonaskoelker
Senior Member
 
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524

Read "Smart Questions", especially the part about being specific: why haven't you told us what error you're getting?

--Jonas
 
09-22-2005, 04:27 AM   #3
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

For me it did "compile" without errors. But at runtime, Python complained that ~/downloads/temp and ~/downloads/somesite didn't exist.
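One likely reason for that complaint, offered as an assumption since the thread does not confirm it: `~` is expanded by the shell, not by Python, so file() and os.path treat '~/downloads/temp' as a relative path that starts with a directory literally named '~'. A minimal sketch of the difference, using os.path.expanduser (written for Python 3; the helper name expand is made up for illustration):

```python
import os

def expand(path):
    """Return path with a leading '~' replaced by the user's home directory."""
    return os.path.expanduser(path)

raw = '~/downloads/temp'
print(raw, '->', expand(raw))
# Without expansion the path is relative to the current working directory.
print('absolute before:', os.path.isabs(raw))
print('absolute after: ', os.path.isabs(expand(raw)))
```

Paths that do not start with '~' pass through expand() unchanged, so it is safe to apply to every configured directory.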
 
09-22-2005, 04:31 AM   #4
davholla
Member
 
Registered: Jun 2003
Location: London
Distribution: Mandriva 2008 Spring
Posts: 652

a) post the exact error message
b) check that these directories do exist.
 
09-22-2005, 01:33 PM   #5
indian
Member
 
Registered: Aug 2004
Posts: 137

Original Poster

I am sorry for not posting the errors.

I get the following errors regardless of which URL I use. I have created all the directories, but I still get the same error!

Code:
Retrieving page...
Downloading http://www.google.com/index.html ...  Failed
Traceback (most recent call last):
  File "test.py", line 85, in ?
    grab(wurl, outdir, wfile, wgif, lgif, cachedir, tmpfile)
  File "test.py", line 36, in grab
    f = file(tmpf,'r')
IOError: [Errno 2] No such file or directory: '~/downloads/temp/temp.htm'
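The bare `except:` in download() prints only "Failed", so the traceback above shows a later symptom rather than the first error. Below is a hedged sketch of a variant that reports the underlying exception. It is written for Python 3 (urllib.request instead of urllib2), and the opener parameter and the stand-in opener in the demo are additions for illustration, not part of the ASPN recipe:

```python
import io
import urllib.request

def download(url, fname, opener=urllib.request.urlopen):
    """Fetch url into fname; return True on success, False on failure."""
    print("Downloading " + url + " ... ", end="")
    try:
        data = opener(url).read()
        with open(fname, 'wb') as f:
            f.write(data)
    except (OSError, ValueError) as exc:
        # URLError is a subclass of OSError; ValueError covers malformed URLs.
        print("Failed: %s" % exc)
        return False
    print("OK")
    return True

# Demo without network access: a stand-in opener that always "succeeds",
# combined with an output path that still contains an unexpanded '~'.
fake_opener = lambda url: io.BytesIO(b"<html></html>")
ok = download("http://example.invalid/page.html",
              "~/lq-demo-missing-dir/temp.htm", fake_opener)
print("download returned", ok)
```

In the demo the fetch itself succeeds and the failure comes from opening the output file, which is one way the original script could print "Failed" for every URL.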
 
09-22-2005, 02:20 PM   #6
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

IMHO that's a rather clear error message. I'd say the file ~/downloads/temp/temp.htm does not exist on your system...

Last edited by Hko; 09-22-2005 at 02:22 PM.
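
If the unexpanded `~` is indeed the cause (an assumption; the thread ends without a confirmed fix), one way to avoid the IOError is to expand the directory names and create them before calling grab(). A minimal Python 3 sketch; prepare_dir is a hypothetical helper name:

```python
import os

def prepare_dir(path):
    """Expand '~', create the directory if it is missing, return the full path."""
    full = os.path.abspath(os.path.expanduser(path))
    os.makedirs(full, exist_ok=True)  # no error if it already exists
    return full

# Intended use with the script's settings (left commented out so that
# running this sketch does not create directories in your home):
# outdir = prepare_dir('~/downloads/somesite')
# cachedir = prepare_dir('~/downloads/temp')
# grab(wurl, outdir, wfile, wgif, lgif, cachedir, tmpfile)
```

Because prepare_dir returns the expanded path, passing its result to grab() means every later file() or open() call sees a real absolute path.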
 
  


All times are GMT -5.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration