How do I download a file with a cutoff (let's say 5MB)?
I'm trying to cut down on unnecessary downloading in some scripts I have, and in most cases it will actually suffice to download only a chunk of each file, for example the first 5MB. I've looked through all the wget options and the only remotely useful option I found was -Q quota (--quota=quota):
Specify download quota for automatic retrievals. The value can be specified in bytes (default), kilobytes (with k suffix), or megabytes (with m suffix). Note that quota will never affect downloading a single file. So if you specify wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz, all of the ls-lR.gz will be downloaded. The same goes even when several URLs are specified on the command-line. However, quota is respected when retrieving either recursively, or from an input file. Thus you may safely type wget -Q2m -i sites: download will be aborted when the quota is exceeded.
Unfortunately, that option doesn't do what I thought it would (it doesn't work on single files). Googling the problem didn't seem to produce any helpful results. Can anyone help? |
I don't understand why you want to deliberately cut off a D/L at 5 megs, and since wget doesn't work that way there must be a reasonable explanation for it. Anyway. You could encapsulate wget in a script which tails the wget log or stats the D/L file until it reaches the threshold and then kills the wget session.
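For illustration, the watch-and-kill wrapper suggested above might look roughly like this in Python. This is only a sketch under assumptions: the function name run_with_size_cap and its parameters are made up for this example, and it trims the file afterwards because the downloader can overshoot the threshold between two size checks.

```python
import os
import subprocess
import time


def run_with_size_cap(cmd, path, limit, poll=1.0):
    """Run cmd; stop it once the file at path reaches limit bytes.

    Returns True if the process was stopped because of the cap.
    (Hypothetical helper, not part of wget or any library.)
    """
    proc = subprocess.Popen(cmd)
    killed = False
    try:
        while proc.poll() is None:
            # Check the size of the partially downloaded file.
            if os.path.exists(path) and os.path.getsize(path) >= limit:
                proc.terminate()
                killed = True
                break
            time.sleep(poll)
    finally:
        # If we leave the loop abnormally, do not leave the child running.
        if proc.poll() is None and not killed:
            proc.kill()
        proc.wait()
    # The downloader may overshoot between polls, so trim the excess.
    if os.path.exists(path) and os.path.getsize(path) > limit:
        os.truncate(path, limit)
    return killed
```

With wget it could be called along the lines of run_with_size_cap(["wget", "-q", "-O", "sample.bin", url], "sample.bin", 5 * 1024 * 1024), where url and sample.bin are placeholders.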
|
After looking at curl and not finding an option either, I ended up spawning wget in a child process, with the parent process checking the file size every second. Once the file reaches the limit, the parent kills the child and wget with it, which is exactly what you proposed. This, however, is a dirty hack and I don't like it. An option in wget would be so great (or another utility). I mean, they already have the --quota option, why not extend it for single files as well? :-/ |
Quote:
Without going into specifics
Cat's curiosity 'n such? :-]
Quote:
This, however, is a dirty hack and I don't like it. An option in wget would be so great (or another utility). I mean, they already have the --quota option, why not extend it for single files as well?
Like I said before, wget not supporting that must have a reasonable explanation, and I suppose it is about how the protocols deal with serving single files: you know, like how you can ask a server to continue (-c) a download at position n but not tell it to stop at position n. Second thing is you're confusing your very specific needs with needs the general public would accept as useful; the only use that comes to mind is, say, when you're testing D/Ls. If you're adamant it's a good option, why not ask the wget maintainers to implement it? I mean, there's nothing stopping you from asking them... |
Heh, I knew there was an easier way!
Courtesy of the guy on wget's mailing list:
curl -r/--range <range>
(HTTP/FTP) Retrieve a byte range (i.e. a partial document) from an HTTP/1.1 or FTP server. Ranges can be specified in a number of ways.
0-499 specifies the first 500 bytes
500-999 specifies the second 500 bytes
-500 specifies the last 500 bytes
9500- specifies the bytes from offset 9500 and forward
0-0,-1 specifies the first and last byte only(*)(H)
500-700,600-799 specifies 300 bytes from offset 500(H)
100-199,500-599 specifies two separate 100 bytes ranges(*)(H) |
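For anyone scripting this rather than calling curl, the same byte-range idea can be sketched with Python's standard library. This is a rough sketch, not the curl implementation: download_prefix and its parameters are names invented for this example. It sends a Range header for the first limit bytes and also stops reading locally at limit, on the assumption that some servers ignore the header and send the whole file.

```python
import urllib.request


def download_prefix(url, dest, limit=5 * 1024 * 1024, chunk=64 * 1024):
    """Save at most limit bytes of url to dest; return the bytes written.

    (Hypothetical helper written for this example.)
    """
    # Byte ranges are inclusive, so the first `limit` bytes are 0..limit-1.
    req = urllib.request.Request(url, headers={"Range": f"bytes=0-{limit - 1}"})
    written = 0
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        # Enforce the cap locally too, in case the Range header is ignored.
        while written < limit:
            data = resp.read(min(chunk, limit - written))
            if not data:
                break
            out.write(data)
            written += len(data)
    return written
```

The rough curl equivalent for the first 5MB would be a range of 0-5242879, since both ends of the range are inclusive.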
Neat. Thanks for sharing.
|
I had the same problem, thanks for the curl tip.
According to the man page, wget should have worked using the quota option -Q and an -i input file of URLs containing a short dummy URL followed by the desired URL; however, -Q is not cutting off the download as described in the man page. I am using wget to capture an audio stream in a cron job. |