LinuxQuestions.org

indian 09-21-2005 07:44 AM

What is wrong with this python code ?
 
Hi, I got some code from ASPN's site for getting an HTML file with its images from the web. But while compiling, it is giving errors. Can anyone compile this code and tell me what is wrong here?

I have changed the links appropriately :) and have also created the directories.



Code:

#!/usr/bin/python

import urllib2, re
from os import path

# global re
imre = None
gifre = None

def download(url,fname):
    try:
        print "Downloading "+url+" ... ",
        furl = urllib2.urlopen(url)
        f = file(fname,'wb')
        f.write(furl.read())
        f.close()
        print "OK"
        return 1
    except:
        print "Failed"
        return 0

def gifsub(matchobj):
    return gifre.findall(matchobj.group(0))[0]

# Main procedure
def grab(wurl, outdir, wfile, wgif, lgif, cachedir = 'cache',
        tmpfile = 'tmp.htm'):
    global imre, gifre
    imre = re.compile(wgif)
    gifre = re.compile(lgif)
    # path to temporary file
    tmpf = path.join(cachedir,tmpfile)
    print "Retrieving page..."
    download(wurl, tmpf)
    f = file(tmpf,'r')
    s = f.read()
    f.close()
    all = imre.findall(s)
    res = []
    res2 = []
    # Fill up result list
    for i in all:
        if i not in res:
            res.append(i)
            res2.append(gifre.findall(i)[0])
    result = zip(res, res2)

    # Replace web links with local links
    ns = re.sub(wgif,
                gifsub, s)
    f = file(path.join(outdir,wfile),'wb')
    f.write(ns)
    f.close()

    # Download images
    for i in result:
        if not path.exists(path.join(outdir,i[1])):
            download(i[0], path.join(outdir,i[1]))

    print "Done."

if __name__ == '__main__':
    # Document URL
    wurl = 'http://www.somesiteaddress.net/page.html'
    # Path to the local directory to save the document
    outdir = '~/downloads/somesite'
    # Filename for saved page in the local directory
    wfile = 'index.html'
    # Patterns for images:
    # - process all gif images from <http://img.anothersiteaddress.net/images>
    #  i.e. <http://img.anothersiteaddress.net/images/image.gif>
    wgif = 'http://img\.anothersiteaddress\.net/images/[^+]*?\.gif'
    # - replace the original image URL with the simple filename
    #  i.e. <http://img.anothersiteaddress.net/images/image.gif>
    #  will be <image.gif>
    lgif = '[_a-zA-Z0-9]+\.gif'

    # Directory for storing temporary files
    cachedir = '~/downloads/temp'
    # Temporary filename
    tmpfile = 'temp.htm'

    # Call the main procedure
    grab(wurl, outdir, wfile, wgif, lgif, cachedir, tmpfile)


jonaskoelker 09-21-2005 10:50 AM

Read ``Smart Questions'', especially the part about being specific. Why haven't you told us what error you're getting?

--Jonas

Hko 09-22-2005 04:27 AM

For me it did "compile" without errors. But at runtime, python did complain saying that ~/downloads/temp and ~/downloads/somesite didn't exist.
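
One possible cause, offered as an assumption rather than something confirmed in this thread: Python does not expand a leading `~` the way a shell does, so '~/downloads/temp' is looked up as a directory literally named `~` under the current working directory. A minimal sketch using `os.path.expanduser` from the standard library (the helper name `resolve_dir` is hypothetical, not part of the original script):

```python
import os

def resolve_dir(raw_path):
    """Expand a leading '~' to the home directory and return an absolute path."""
    return os.path.abspath(os.path.expanduser(raw_path))

# Paths taken from the script above
cachedir = resolve_dir('~/downloads/temp')
outdir = resolve_dir('~/downloads/somesite')
print(cachedir)
print(outdir)

# The unexpanded string is only a relative path; the expanded one is absolute
print(os.path.isabs('~/downloads/temp'))
print(os.path.isabs(cachedir))
```

If that is the problem, passing the expanded `cachedir` and `outdir` to `grab()` instead of the raw strings should make the script look in the directories that were actually created.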

davholla 09-22-2005 04:31 AM

a) post the exact error message
b) check that these directories do exist.

indian 09-22-2005 01:33 PM

I am sorry for not posting the errors. :(

I am getting the following errors no matter what URL I put in. I have created all the directories, but I still get the same error!

Code:


Retrieving page...
Downloading http://www.google.com/index.html ...  Failed
Traceback (most recent call last):
  File "test.py", line 85, in ?
    grab(wurl, outdir, wfile, wgif, lgif, cachedir, tmpfile)
  File "test.py", line 36, in grab
    f = file(tmpf,'r')
IOError: [Errno 2] No such file or directory: '~/downloads/temp/temp.htm'


Hko 09-22-2005 02:20 PM

IMHO that's a rather clear error message. I'd say the file ~/downloads/temp/temp.htm does not exist on your system...
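
The "Failed" line printed before the traceback suggests that download() itself hit an exception, which the bare `except:` then hid, so temp.htm was never written. A rough sketch of reporting the real error instead, written for Python 3 (the script above is Python 2); `save_bytes` is a hypothetical stand-in for the file-writing half of download(), and the temporary directory is only there to make the example repeatable:

```python
import os
import tempfile

def save_bytes(data, fname):
    """Write data to fname; report the actual error instead of hiding it."""
    try:
        with open(fname, 'wb') as f:
            f.write(data)
        return True
    except (IOError, OSError) as err:
        # Printing the exception shows why the write failed
        print("Failed: %s" % err)
        return False

workdir = tempfile.mkdtemp()

# A literal '~' directory does not exist under workdir, so this write fails
# and the message names the missing path.
bad = os.path.join(workdir, '~', 'downloads', 'temp', 'temp.htm')
print(save_bytes(b'<html></html>', bad))

# Writing into a directory that does exist succeeds.
good = os.path.join(workdir, 'temp.htm')
print(save_bytes(b'<html></html>', good))
```

With the exception message printed, it becomes clear whether the failure comes from the network request or from opening the output file.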

