LinuxQuestions.org

hbar 06-17-2009 10:39 PM

wget produces corrupt files??
 
Can someone try this and see if the same thing happens? Download one of the .txt.gz archive files on this page with a browser: https://www.redhat.com/archives/amd64-list/ and you should be able to open it. Now download the same file using wget, and it can't be uncompressed because it is not a gzip file! What is happening!?

billymayday 06-17-2009 11:20 PM

Looks like someone forgot to gzip them. It looks like a mail file in raw form to me. Try renaming it to .txt and you'll see what I mean when you open it in an editor.

mbostwick 06-17-2009 11:31 PM

I agree. I am able to open it with nano and see it as plain text.
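
A quick way to confirm what the replies above describe (a .gz name on a file that is really plain text) is to look for the gzip magic bytes 1f 8b at the start of the file. This is only a sketch: the function name is_gzip is made up for illustration, and it assumes head, od and tr are available, as they are on most Linux systems.

```shell
# Succeed if the file begins with the gzip magic bytes 1f 8b,
# regardless of what its extension claims.
# is_gzip is a hypothetical helper name, not part of any tool.
is_gzip() {
    # Read the first two bytes and print them as bare hex digits.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

It could be used as `is_gzip 2008-January.txt.gz && echo "really gzip" || echo "not gzip"`.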

hbar 06-18-2009 09:23 AM

The smaller files seem to be plain text, but I can reproduce the problem with the larger ones. Often the connection is interrupted and the download resumes automatically, and this results in a bad file. Is there any way around this? Downloading the file in Firefox works fine, but I need to use wget (or something similarly non-interactive).

Code:

$ wget https://www.redhat.com/archives/amd64-list/2008-January.txt.gz
--2009-06-18 10:20:52--  https://www.redhat.com/archives/amd64-list/2008-January.txt.gz
Resolving www.redhat.com... 69.192.64.112
Connecting to www.redhat.com|69.192.64.112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5848 (5.7K) [application/x-gzip]
Saving to: `2008-January.txt.gz'

99% [=====================================> ] 5,847      --.-K/s  in 0s     

2009-06-18 10:20:53 (272 MB/s) - Connection closed at byte 5847. Retrying.

--2009-06-18 10:20:54--  (try: 2)  https://www.redhat.com/archives/amd64-list/2008-January.txt.gz
Connecting to www.redhat.com|69.192.64.112|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 26780 (26K), 20933 (20K) remaining [application/x-gzip]
Saving to: `2008-January.txt.gz'

100%[++++++++==============================>] 26,780      --.-K/s  in 0.1s   

2009-06-18 10:20:55 (139 KB/s) - `2008-January.txt.gz' saved [26780/26780]
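
In the log above, the first response advertised 5848 bytes and closed at byte 5847, while the retry advertised 26780 bytes in total with 20933 remaining. The retry therefore resumed at 26780 - 20933 = 5847, so the saved file is the first 5847 bytes of one response followed by the tail of another. A minimal sketch of that arithmetic, plus an integrity check that could be run after each download. The function names are hypothetical, and gzip is assumed to be installed.

```shell
# Offset at which the download resumed: total length minus bytes remaining.
# resume_offset and verify_gz are hypothetical helper names.
resume_offset() {
    echo $(( $1 - $2 ))
}

# Succeed only if the file passes gzip's built-in integrity test (gzip -t).
verify_gz() {
    gzip -t "$1" 2>/dev/null
}

resume_offset 26780 20933   # prints 5847
```

In a script, `verify_gz 2008-January.txt.gz || echo "bad download"` would flag a stitched file like the one above without trying to unpack it.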


hbar 06-18-2009 04:37 PM

Ah, I've got it. The server was doing something funny when it detected wget. After forging the User-Agent and Referer headers, it works fine. I wish it didn't have to come to this, but they asked for it....
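
The exact command is not given in this post. A minimal sketch of what such a call might look like, using wget's --user-agent and --referer options. The header values and the function name fetch_as_browser are assumptions chosen for illustration, not the ones actually used.

```shell
# Download a URL with wget while sending browser-like headers.
# fetch_as_browser is a hypothetical helper; the default User-Agent
# and Referer values below are assumed, not taken from the thread.
fetch_as_browser() {
    url=$1
    ua=${2:-"Mozilla/5.0 (X11; Linux x86_64)"}
    ref=${3:-"https://www.redhat.com/archives/amd64-list/"}
    wget --user-agent="$ua" --referer="$ref" "$url"
}
```

It would be called as `fetch_as_browser https://www.redhat.com/archives/amd64-list/2008-January.txt.gz`, optionally followed by a different User-Agent and Referer.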

billymayday 06-18-2009 04:42 PM

What did they ask for? People not to copy all of their archives?


All times are GMT -5.