Quote:
Originally Posted by Velotrol
then find a way to know the filesize (should show on screen), get size and organize links.txt according to the size.
The size of a file is sometimes given in the HTTP header (the Content-Length field), so you only need to fetch the header to find the file size. curl has an option to download only the header, which is very fast:
Quote:
-I, --head
(HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature the command HEAD
which this uses to get nothing but the header of a document.
A typical command to view a file's header might be:
Code:
$ curl -s -S -v -I http://popcon.debian.org/by_inst.gz
> HEAD /by_inst.gz HTTP/1.1
> User-Agent: curl/7.26.0
> Host: popcon.debian.org
> Accept: */*
>
* additional stuff not fine transfer.c:1037: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Date: Mon, 26 Nov 2012 22:23:29 GMT
< Server: Apache
< Last-Modified: Mon, 26 Nov 2012 17:12:01 GMT
< ETag: "29e671b-1f4dac-4cf69081880ea"
< Accept-Ranges: bytes
< Content-Length: 2051500
< Content-Type: application/x-gzip
where the file size in bytes is the value on the Content-Length line (2051500 here).
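If you only want the number rather than the whole header, something like the following should work. Treat it as a rough sketch: content_length is a helper name I made up for illustration, and the header text is canned so the example runs without touching the network. In real use you would feed it the output of curl -s -S -I for your URL.

```shell
#!/bin/bash
# content_length (hypothetical helper): read HTTP headers on stdin and
# print the Content-Length value, if there is one.
# The header name is matched case-insensitively, and the trailing
# carriage return that HTTP puts on each header line is stripped.
# If several values are present, the last one wins.
content_length() {
    awk 'tolower($1) == "content-length:" { sub(/\r$/, ""); n = $2 }
         END { if (n != "") print n }'
}

# Canned headers standing in for:  curl -s -S -I "$url"
printf 'HTTP/1.1 200 OK\r\nServer: Apache\r\nContent-Length: 2051500\r\nContent-Type: application/x-gzip\r\n\r\n' |
    content_length
```

When the header has no Content-Length line, the function prints nothing, so the caller can decide what default to use.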
So, using curl, I can write a small script that takes a list of URLs, obtains their file sizes, and sorts them from smallest to largest. If the file size for a URL cannot be found, it defaults here to 10M (10000000 bytes). Nothing polished, but it does help address your problem.
Code:
#!/bin/bash
# A list of test urls
cat >/tmp/list.in <<EOF
http://popcon.debian.org/by_inst.gz
http://www.nasa.gov/templateimages/redesign/modules/header/header_logo.gif
http://2.bp.blogspot.com/-VL13sAi0cNQ/ULL0my3VCpI/AAAAAAAAGeA/5sb0AMI5c78/s1600/brittany_a.jpg
http://feedproxy.google.com/~r/linuxquestions/latest/~3/lUORnB0GYgI/showthread.php
EOF
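# Start with an empty output file
: >/tmp/list.out
# For each URL, fetch only the header and pull out Content-Length
while IFS= read -r urlx ; do
    # Match the header name case-insensitively (HTTP/2 servers send it in lower case)
    numx="$(curl -s -S -I "$urlx" |
        awk 'tolower($1) == "content-length:" { sub(/\r$/,"") ; print $2 }')"
    # Fall back to 10000000 when no size was reported
    printf '%s\t%s\n' "${numx:-10000000}" "$urlx" >>/tmp/list.out
done < /tmp/list.in
# Sort numerically by size and align the columns
sort -n /tmp/list.out | column -t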
# OUTPUT
3710      http://www.nasa.gov/templateimages/redesign/modules/header/header_logo.gif
156494    http://2.bp.blogspot.com/-VL13sAi0cNQ/ULL0my3VCpI/AAAAAAAAGeA/5sb0AMI5c78/s1600/brittany_a.jpg
2051500   http://popcon.debian.org/by_inst.gz
10000000  http://feedproxy.google.com/~r/linuxquestions/latest/~3/lUORnB0GYgI/showthread.php
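Since the goal was to organize links.txt by size, you probably want just the URLs back, in sorted order, without the size column. A minimal sketch of that last step follows. urls_by_size is a name I made up, and the input here is canned "size<TAB>url" lines in the same format as /tmp/list.out.

```shell
#!/bin/bash
# urls_by_size (hypothetical helper): read "size<TAB>url" lines on stdin
# and print only the URLs, smallest file first.
urls_by_size() {
    sort -n -k1,1 | cut -f2-
}

# Canned input using two of the sizes from the output above
printf '%s\t%s\n' \
    2051500 'http://popcon.debian.org/by_inst.gz' \
    3710 'http://www.nasa.gov/templateimages/redesign/modules/header/header_logo.gif' |
    urls_by_size
```

With the real data, something like urls_by_size < /tmp/list.out > links.txt should give you the reordered list. Write to a different file name first if you want to keep the original around.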