LinuxQuestions.org (/questions/)
-   Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)
-   -   How do I download a file with cutoff (let's say 5MB)? (https://www.linuxquestions.org/questions/linux-general-1/how-do-i-download-a-file-with-cutoff-lets-say-5mb-522088/)

archon810 01-23-2007 12:50 PM

How do I download a file with cutoff (let's say 5MB)?
 
I'm trying to cut down on unnecessary downloading in some scripts I have, and in most cases it will actually suffice to download just a chunk of each file, for example the first 5MB. I've looked through all the wget options, and the only remotely useful one I found was -Q quota (--quota=quota):

Specify download quota for automatic retrievals. The value can be specified in bytes (default), kilobytes (with k suffix), or megabytes (with m suffix). Note that quota will never affect downloading a single file. So if you specify wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz, all of the ls-lR.gz will be downloaded. The same goes even when several URLs are specified on the command-line. However, quota is respected when retrieving either recursively, or from an input file. Thus you may safely type wget -Q2m -i sites; the download will be aborted when the quota is exceeded.


Unfortunately, that option doesn't do what I thought it would (it doesn't work on single files).

Googling the problem didn't seem to produce any helpful results. Can anyone help?

unSpawn 01-23-2007 06:08 PM

I don't understand why you want to deliberately cut off a D/L at 5 megs, but since wget doesn't work that way I'll assume there's a reasonable explanation for it. Anyway: you could wrap wget in a script that tails the wget log or stats the D/L'ed file until it reaches the threshold, then kills the wget session.
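
A minimal sketch of that wrapper, assuming GNU stat and a 5MB threshold (the URL and filenames are hypothetical placeholders, not from the thread):

Code:

#!/bin/bash
# Run wget in the background, poll the output file's size once a
# second, and kill wget once it crosses the threshold.
URL="http://example.com/big-video.avi"    # placeholder
OUTFILE="clip.avi"                        # placeholder
LIMIT=$((5 * 1024 * 1024))                # 5MB in bytes

wget -q -O "$OUTFILE" "$URL" &
WGET_PID=$!

while kill -0 "$WGET_PID" 2>/dev/null; do
    # stat -c %s prints the file size in bytes (GNU coreutils)
    SIZE=$(stat -c %s "$OUTFILE" 2>/dev/null || echo 0)
    if [ "$SIZE" -ge "$LIMIT" ]; then
        kill "$WGET_PID"
        break
    fi
    sleep 1
done

Note the file can end up slightly over the limit, since the size check only runs once a second.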

archon810 01-23-2007 11:26 PM

Quote:

Originally Posted by unSpawn
I don't understand why you want to deliberately cut off a D/L at 5 megs, but since wget doesn't work that way I'll assume there's a reasonable explanation for it. Anyway: you could wrap wget in a script that tails the wget log or stats the D/L'ed file until it reaches the threshold, then kills the wget session.

Well, the purpose of the cutoff has to do with how much bandwidth our company ends up consuming compared to what we could consume if we only downloaded partials. Without going into specifics, I need to process/convert/do stuff to the first 5-15 seconds of video clips, so the first 5MB pretty much covers it. Instead of downloading 100MB files, I'll be downloading 5MB.

After looking at curl and not finding an option there either, I ended up spawning wget in a child process, with the parent checking the file size every second. Once the file reaches the limit, the parent kills the child, and wget with it, which is exactly what you proposed. This, however, is a dirty hack and I don't like it. An option in wget would be so great (or another utility). I mean, they already have the --quota option, why not extend it to single files as well? :-/

unSpawn 01-24-2007 05:11 AM

Quote:

Originally Posted by archon810
Without going into specifics

Cat's curiosity 'n such? :-]

Quote:

Originally Posted by archon810
is a dirty hack and I don't like it. An option in wget would be so great (or another utility). I mean, they already have the --quota option, why not extend it to single files as well?

Like I said before, wget not supporting that must have a reasonable explanation, and I suppose it's about how the protocols deal with serving single files: you can ask a server to -c (continue) a download at position n, but you can't tell it to stop at position n. Second, you're confusing your very specific needs with needs the general public would accept as useful; the only use that comes to mind is testing D/Ls. If you're adamant it's a good option, why not ask the wget maintainers to implement it? I mean, there's nothing stopping you from asking them...
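
A quick sketch of that asymmetry as wget exposes it (the URL below is a hypothetical placeholder): -c can resume from an arbitrary offset, but there is no flag for an end offset.

Code:

# wget -c resumes a partial download; over HTTP/1.1 it sends an
# open-ended range header, roughly "Range: bytes=<bytes-on-disk>-".
# There is no wget flag to send a closed range like "bytes=0-5242879".
wget -c http://example.com/big-video.avi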

archon810 01-24-2007 11:21 PM

Quote:

Originally Posted by unSpawn
Without going into specifics

Cat's curiosity 'n such? :-]

is a dirty hack and I don't like it. An option in wget would be so great (or another utility). I mean, they already have the --quota option, why not extend it to single files as well?

Like I said before, wget not supporting that must have a reasonable explanation, and I suppose it's about how the protocols deal with serving single files: you can ask a server to -c (continue) a download at position n, but you can't tell it to stop at position n. Second, you're confusing your very specific needs with needs the general public would accept as useful; the only use that comes to mind is testing D/Ls. If you're adamant it's a good option, why not ask the wget maintainers to implement it? I mean, there's nothing stopping you from asking them...

Hehe, I will try to do that. And I think wget would implement this internally instead of using server commands: it would monitor the buffer, or just the already-downloaded byte count, then issue a stop and maybe trim/clean up. Let's see what happens.

archon810 02-08-2007 04:35 PM

Heh, I knew there was an easier way!

Courtesy of a guy on the wget mailing list.

curl
-r/--range <range>
(HTTP/FTP) Retrieve a byte range (i.e. a partial document) from an HTTP/1.1 or FTP server. Ranges can be specified in a number of ways.

0-499            specifies the first 500 bytes
500-999          specifies the second 500 bytes
-500             specifies the last 500 bytes
9500-            specifies the bytes from offset 9500 and forward
0-0,-1           specifies the first and last byte only (*)(H)
500-700,600-799  specifies 300 bytes from offset 500 (H)
100-199,500-599  specifies two separate 100-byte ranges (*)(H)

unSpawn 02-08-2007 05:10 PM

Neat. Thanks for sharing.

_K. 04-23-2007 10:04 AM

I had the same problem; thanks for the curl tip.

According to the man page, wget should have worked using the quota option -Q and an -i input file of URLs containing a short dummy URL followed by the desired URL; however, -Q does not cut off the download as the man page describes.

I am using wget to capture an audio stream in a cron job.
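
For reference, the -Q/-i combination described above would look something like this (both URLs are hypothetical placeholders); per the report here, the target download still completes in full:

Code:

# urls.txt lists a small dummy URL first, then the real target,
# the idea being that the 5m quota aborts the second download
# partway through.
printf '%s\n' \
    http://example.com/tiny-dummy.txt \
    http://example.com/audio-stream.mp3 > urls.txt
wget -Q5m -i urls.txt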

archon810 04-23-2007 11:36 PM

Quote:

Originally Posted by _K.
I had the same problem, thanks for the curl tip.

According to the man page, wget should have worked using the quota option -Q and an -i input file of URLs containing a short dummy URL followed by the desired URL; however, -Q does not cut off the download as the man page describes.

I am using wget to capture an audio stream in a cron job.

Glad somebody else found it helpful.

