Old 05-20-2006, 12:21 AM   #1
northy_ie
LQ Newbie
 
Registered: Nov 2005
Location: Dublin, Ireland
Distribution: Fedora Core 5
Posts: 8

Rep: Reputation: 0
Stopping wget from following 302 codes?


Hi there

I'm currently having some trouble getting wget to do what I want.

My system is Fedora Core 5 with wget 1.10.2-3.2.1.
Wget should try downloading a file from a server, and the return code tells me whether the file existed or not.
The part I'm having problems with is that the 404 redirection used on the server messes things up for me.


This is the command I run:
[root@client1 db]# wget -nc -R -A"*.db" http://shadow/db/db6.db
--05:05:22-- http://shadow/db/db6.db
=> `db6.db'
Resolving shadow... 192.168.0.5
Connecting to shadow|192.168.0.5|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://shadow/404.html [following]
File `404.html' already there; not retrieving.

As you can see, there is a 302 redirect to an HTML document, and wget follows it even though I restricted it to .db files with the -A "*.db" option, thereby giving me a positive return.
Is there any way I can get wget to stop following the 302 code?

Thanks for any hints on what I could do.
 
Old 05-20-2006, 06:28 AM   #2
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30
Even if wget didn't follow the redirection path, the outcome would still be the same, i.e. you can't retrieve the document.

The server will not serve the document as you have specified it.

The server seems to be redirecting you to its 404 (Content Not Found) page, thus saying that it can't find such a document in its tree.

Thus I would say that the 404.html file was a negative return, no?

Why the server didn't just send back an HTTP 404 status to start with, I don't know; it seems a bit irregular, but the important fact is that it doesn't seem to give you a 200 OK to indicate success.

cheers

Robert.
 
Old 05-20-2006, 08:13 PM   #3
northy_ie
LQ Newbie
 
Registered: Nov 2005
Location: Dublin, Ireland
Distribution: Fedora Core 5
Posts: 8

Original Poster
Rep: Reputation: 0
hey there
You see, I don't want to retrieve it if it's not there.

I made a script that tries wgetting files from a webserver; after each successful wget, it adds +1 to a counting file and tries to get the next file.
If a file does not exist, it should not add the +1 and should just quit.

The problem is that wget -q returns a success code, thereby adding the +1 to the counting file and continuing to the next item. Therefore I'd like wget to just try to download what it's been told and return a failure if anything goes wrong (object moved, 404, etc.).

If it helps, I can post the script to see if there might be another workaround for this.
 
Old 05-21-2006, 04:44 AM   #4
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30
My bash is not the strongest, but I'll have a go, and I'm sure someone else will be able to offer a suggestion.

A couple of questions:

If the page/object does exist but is in a different place on the server, you want to treat that as a failure?!

Does your code handle a normal 404 Content Not Found OK?
If so, how is it translating what wget returns?

Actually, I'm curious now. Please post your script.
 
Old 05-30-2006, 03:02 PM   #5
northy_ie
LQ Newbie
 
Registered: Nov 2005
Location: Dublin, Ireland
Distribution: Fedora Core 5
Posts: 8

Original Poster
Rep: Reputation: 0
Sorry for the late reply, work just went crazy

It's a bit better now, so I have more time for trying things

This is the code:
Code:
#!/bin/bash
# read the current counter value
read NUMBER < ./counter
cd db
# try to fetch the next db file; on success bump the counter and start over
if wget -q -nc -R -A "*.db" http://shadow/db/db$NUMBER.db
then
        let "NUMBER += 1"               # increment
        echo "$NUMBER" > ../counter
        cd ..
        sleep 5
        echo sleeping..
        ./oots.sh                       # run this script again for the next number
fi
#EOF
I might be completely wrong with this, so I'll try to explain what I want to do:

There is a number in the file "counter", currently it's 5.
The script (which is going to be scheduled in cron) is going to put that number into the variable $NUMBER.
It then should try downloading the DB with that number from the URL given in the wget statement.
If the download is successful, it should add +1 to the file "counter" and restart the process.
If the download is not successful (the file does not exist), it should just quit.
 
Old 05-30-2006, 03:05 PM   #6
northy_ie
LQ Newbie
 
Registered: Nov 2005
Location: Dublin, Ireland
Distribution: Fedora Core 5
Posts: 8

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by slzckboy
If the page/object does exist but is in a different place on the server, you want to treat that as a failure?!
Exactly!

Quote:
Originally Posted by slzckboy
Does your code handle a normal 404 Content Not Found OK?
If so, how is it translating what wget returns?
No, I'm not translating what wget returns; I just hoped it would work like that, because my bash skills are pretty limited.
 
Old 05-30-2006, 06:36 PM   #7
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30
Looking at the man page, the Recursive accept/reject options relate to the FTP options of wget.

I don't believe you can disable redirection from the command line.
Redirection is a standard HTTP feature, so I guess wget is just concerned with retrieving your requested entity from the server, no matter where the server is serving it from.


One way round this would be to run the wget command with -S -o log, i.e.

wget linuxquestions.org -S -o log
Code:
from log file..

HTTP request sent, awaiting response...
 1 HTTP/1.0 301 Moved Permanently
 2 Date: Tue, 30 May 2006 22:20:15 GMT
 ...........
Now if I form the URL properly with the www prefix, i.e. wget www.linuxquestions.org, you get:

Code:
from log file..

HTTP request sent, awaiting response...
 1 HTTP/1.0 200 OK
 2 Date: Tue, 30 May 2006 22:15:39 GMT
......
Thus you can parse the subsequent log file with a script, focusing on the line beginning with HTTP/1.0, to find out whether the first HTTP/1.0 status was a 20x, a 30x, or whatever. Anything other than a 200 on the first go would be a failure.
It is to my shame that after 2 years of using Linux my bash skills are nearly non-existent. I can read basic bash and make sense of it, but I will leave it to you or someone else to figure out how to parse the response. I imagine it would be trivial.
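Something along these lines might be a starting point, though -- untested, and it assumes the log format shown above:
Code:
# untested sketch: pull the status code out of the first response line in the log
code=`grep -m 1 HTTP/1.0 log | cut -d ' ' -f 4`

if [ "${code}" = "200" ]; then
        echo "got it on the first try"
else
        echo "redirected or failed (${code})"
fi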
Hope this helps.
 
Old 05-31-2006, 07:55 PM   #8
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30
The following code is not really robust. It assumes that a file called counter is present in the working directory and that it has already been initialised to some sane integer value.
Create counter and add the value 0 to it.
Then run script linuxquestions.org at the command line, where script is the name of the file you saved the script as.

linuxquestions.org will cause a redirection because of the missing www prefix, and thus the value in counter will not be incremented.
If you do www.linuxquestions.org, then the value will be incremented.
For your purpose, change the script so that it checks for a .db file, i.e. change



Code:
suffix=${website: -4}
...
if [ ${suffix} != ".org" ];then
....

 to 

suffix=${website: -3}
...
if [ ${suffix} != ".db" ];then
.....
Here is the full code.
Code:
#!/bin/sh

argcount=1

if [ $# -ne $argcount ];then
        echo "Please state the url for wget to connect to"
        exit
fi

website=$1
suffix=${website: -4}           # last four characters of the url

if [ ${suffix} != ".org" ];then
        echo "${website} has incorrect suffix :${suffix}"
        exit
fi

echo "Connecting to website :${website}"

/usr/bin/wget ${website} -S -o log

if [ -e ./log ];then
        if [ -e ./counter ];then
                # status code from the first HTTP/1.0 line in the log
                number=`cat ./log | grep -m 1 HTTP/1.0 | cut -d ' ' -f 4`
                echo "HTTP/1.0 response:$number"

                if [ ${number} -eq 200 ];then
                        read count <./counter
                        echo ${count}
                        let count+=1            # clean 200 OK -- bump the counter
                        echo ${count} > ./counter
                fi
        fi
else
        echo "Problem: either wget didn't build the log file, or the prerequisite log and counter files are missing"
fi
Only a clean first-time HTTP 200 OK return code will cause counter to be incremented.
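
For your .db case, the same idea wired into your counter loop might look something like this -- an untested sketch that assumes ./counter already holds a sane number and that the log format matches the example above:
Code:
#!/bin/sh
# untested sketch: fetch db$NUMBER.db files until the server stops
# replying with a clean 200 OK

read NUMBER < ./counter

while true
do
        /usr/bin/wget -nc http://shadow/db/db${NUMBER}.db -S -o log

        # same extraction as above: status code from the first response line
        number=`cat ./log | grep -m 1 HTTP/1.0 | cut -d ' ' -f 4`
        echo "db${NUMBER}.db -> HTTP/1.0 response: ${number}"

        if [ "${number}" != "200" ];then
                break                   # redirected or missing -- stop here
        fi

        let NUMBER+=1                   # got the file, move on to the next one
        echo ${NUMBER} > ./counter
        sleep 5
done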

cheers.
 
Old 09-03-2007, 02:13 AM   #9
dovregubben
LQ Newbie
 
Registered: Sep 2007
Location: Tacoma, WA
Distribution: Debian
Posts: 1

Rep: Reputation: 0
Better late than never

I realize this thread is a bit old, but I found it while looking for a way to do exactly what the original poster was doing, sorting out http responses without actually fetching anything. It didn't take me long to rule out wget. That's just too much parsing for something really simple. I decided to use curl.

Either the curl man page is cryptic (surprise!) or I just don't know how to read, but I couldn't figure out how to suppress the header from being dumped to stdout, so I just came up with an easy way to parse out just the HTTP server response:

Code:
hc=`curl -Isw '\t%{http_code}' $url | cut -sf 2`
And because I hate people giving me the answer without giving me an explanation, here's what it does:

hc=`...`--------to put the output into a variable
curl------------rtfm (I know, I hate it when people say this, but I think the project maintainers can explain it better than I can)
I---------------to retrieve the http header only
s---------------silent mode (suppresses progress bar)
w---------------upon successful termination, write out the following format string
\t--------------tab character (default delimiter for cut command)
%{http_code}----the only output from curl that we're interested in
$url------------the variable I'm storing my url in
|---------------pipe (duh)
cut-------------rtfm (see above)
s---------------skip any line that doesn't contain the delimiter (tab)
f 2-------------output the second field of each line

The result is that your variable will now contain the server response code.

I've often found that when parsing data from a *nix utility is my headache, finding a way to format the output -- or an alternative utility to accomplish the same -- is the easy (lazy) way out. In the event that the output contains tab characters, it's quite easy to change the delimiter for the cut command to something else using the -d option.
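
If anyone wants to bolt this onto the counter script from earlier in the thread, something like the following might work -- untested, with the URL pattern and counter file simply lifted from the earlier posts:
Code:
#!/bin/bash
# untested sketch: use the curl status check above to drive the
# db$NUMBER.db counter loop from earlier in the thread

read NUMBER < ./counter

while true
do
        url="http://shadow/db/db${NUMBER}.db"

        # HEAD request only; curl does not follow redirects by default
        hc=`curl -Isw '\t%{http_code}' "$url" | cut -sf 2`

        if [ "$hc" != "200" ]; then
                break                   # redirect, 404, etc. -- stop here
        fi

        wget -q -nc "$url"              # it really exists, so fetch it
        let "NUMBER += 1"
        echo "$NUMBER" > ./counter
        sleep 5
done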

Hope this helps someone who has had the same plight.

Kurt

Last edited by dovregubben; 09-03-2007 at 02:22 AM. Reason: Forgot to mention something....
 
  

