Hi, I got some code from the ASPN site for fetching an HTML file with its images from the web, but it gives errors when compiling. Can anyone compile this code and tell me what is wrong?
I have changed the links appropriately and have also created the directories.
Code:
#!/usr/bin/python
import urllib2, re
from os import path

# Compiled patterns, shared between grab() and gifsub()
imre = None
gifre = None

def download(url, fname):
    # Fetch url and save it as fname; return 1 on success, 0 on failure
    try:
        print "Downloading "+url+" ... ",
        furl = urllib2.urlopen(url)
        f = file(fname,'wb')
        f.write(furl.read())
        f.close()
        print "OK"
        return 1
    except Exception, e:
        # Show the reason instead of hiding it
        print "Failed:", e
        return 0

def gifsub(matchobj):
    # Map a matched image URL to its local filename
    return gifre.findall(matchobj.group(0))[0]

# Main procedure
def grab(wurl, outdir, wfile, wgif, lgif, cachedir = 'cache',
         tmpfile = 'tmp.htm'):
    global imre, gifre
    imre = re.compile(wgif)
    gifre = re.compile(lgif)
    # path to temporary file
    tmpf = path.join(cachedir,tmpfile)
    print "Retrieving page..."
    # Stop here if the page could not be saved; there is nothing to read
    if not download(wurl, tmpf):
        return
    f = file(tmpf,'r')
    s = f.read()
    f.close()
    urls = imre.findall(s)
    res = []
    res2 = []
    # Fill up result list (each image URL once, with its local filename)
    for i in urls:
        if i not in res:
            res.append(i)
            res2.append(gifre.findall(i)[0])
    result = zip(res, res2)
    # Replace web links with local links
    ns = re.sub(wgif,
                gifsub, s)
    f = file(path.join(outdir,wfile),'wb')
    f.write(ns)
    f.close()
    # Download images
    for i in result:
        if not path.exists(path.join(outdir,i[1])):
            download(i[0], path.join(outdir,i[1]))
    print "Done."

if __name__ == '__main__':
    # Document URL
    wurl = 'http://www.somesiteaddress.net/page.html'
    # Path to the local directory to save the document
    # (Python does not expand '~' by itself, so expanduser is needed)
    outdir = path.expanduser('~/downloads/somesite')
    # Filename for saved page in the local directory
    wfile = 'index.html'
    # Patterns for images:
    # - process all gif images from <http://img.anothersiteaddress.net/images>
    #   i.e. <http://img.anothersiteaddress.net/images/image.gif>
    wgif = r'http://img\.anothersiteaddress\.net/images/[^+]*?\.gif'
    # - replace the original image URL with the simple filename
    #   i.e. <http://img.anothersiteaddress.net/images/image.gif>
    #   will be <image.gif>
    lgif = r'[_a-zA-Z0-9]+\.gif'
    # Directory for storing temporary files
    cachedir = path.expanduser('~/downloads/temp')
    # Temporary filename
    tmpfile = 'temp.htm'
    # Call the main procedure
    grab(wurl, outdir, wfile, wgif, lgif, cachedir, tmpfile)
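
The link-rewriting step in grab() can be tried on its own, without any network access. What follows is only a minimal sketch, not the recipe's own code: the function name localize_images and the sample HTML are made up for illustration, and it is written to run on both Python 2 and Python 3. It collects each distinct image URL matched by wgif, derives the local filename with lgif, and rewrites the page to use the local names. The last lines show os.path.expanduser, which is what turns a leading ~ into the home directory; opening a file does not do that by itself.

```python
import os.path
import re


def localize_images(html, wgif, lgif):
    """Return (new_html, pairs), where pairs is a list of (url, local_name)."""
    url_re = re.compile(wgif)
    name_re = re.compile(lgif)

    def local_name(url):
        # First lgif match inside the URL, as in the script's gifsub()
        return name_re.findall(url)[0]

    pairs = []
    for url in url_re.findall(html):
        pair = (url, local_name(url))
        if pair not in pairs:  # keep each image once, in page order
            pairs.append(pair)

    # Replace every matched URL with its local filename
    new_html = url_re.sub(lambda m: local_name(m.group(0)), html)
    return new_html, pairs


# Patterns copied from the script; the sample page is hypothetical
WGIF = r'http://img\.anothersiteaddress\.net/images/[^+]*?\.gif'
LGIF = r'[_a-zA-Z0-9]+\.gif'
SAMPLE = ('<img src="http://img.anothersiteaddress.net/images/logo.gif">'
          '<img src="http://img.anothersiteaddress.net/images/a_1.gif">'
          '<img src="http://img.anothersiteaddress.net/images/logo.gif">')

new_html, pairs = localize_images(SAMPLE, WGIF, LGIF)
print(new_html)
print(pairs)

# '~' is only expanded when asked for explicitly
outdir = os.path.expanduser('~/downloads/somesite')
print(outdir)
```

If the printed page still contains the full image URLs, the wgif pattern is not matching the page, and in that case the script would also find no images to download.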