LinuxQuestions.org


bradvan 11-07-2012 06:21 AM

wget wildcard problem
 
I have to change my script, which downloads virus definition updates, from an ftp to an http connection, so I am trying to use wget. The number in the file name changes each day, so I am attempting to use wildcards, but I keep getting back "no match". I've opened a web browser and confirmed that the file (avvdat-6888.zip) exists in that directory. I'd appreciate it if someone could look at my syntax and make some suggestions.
Code:

wget --accept avvdat-????.zip http://server.somewhere/path/dat/
I've tried enclosing avvdat-????.zip in single quotes, but that doesn't work either.

Thanks!

Habitual 11-07-2012 06:28 AM

Post the file name in its entirety. Or what is the usual format of "????" in "avvdat-????.zip"?
Does it change?

markush 11-07-2012 06:51 AM

Read the manpage for wget! Wildcards don't work with the http-protocol. The --accept option is listed in the manpage under "FTP-options".

You would need to find out the filenames before you run the wget command; then you could use, for example,
Code:

# Fetch each known file name in turn; the URL is quoted so odd characters survive
for file in filename1 filename2 filename3; do
    wget "http://server/somewhere/path/data/$file"
done

Markus

bradvan 11-07-2012 06:58 AM

Really? My man page lists "-A acclist --accept acclist" under "Recursive Accept/Reject Options." I have also tried adding --recursive to the command. The file names will be like avvdat-6888.zip. The part that changes every day is the numeric part, so I was using "????". There is an ini file I can download first and parse to get the correct name for the day, but I was hoping to avoid that step. :)
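
For reference, a minimal sketch of what the recursive form could look like. It only helps if the server returns an HTML index page for that directory that links to the files; without such a page wget has nothing to follow, and the accept list never sees a candidate. The function name fetch_dat_recursive and the WGET variable are made up for this example.

```shell
# Sketch only: fetch_dat_recursive and WGET are made-up names for this example.
# WGET can be overridden (e.g. WGET=echo) to preview the command without
# touching the network.
WGET=${WGET:-wget}

fetch_dat_recursive() {
    # $1: URL of the directory; it must serve an HTML index linking to the files
    # -r -l1  recurse, but only one level deep
    # -np     never ascend to the parent directory
    # -nd     save into the current directory, no host/path hierarchy
    # -A      keep only files matching the pattern (quoted to stop shell globbing)
    "$WGET" -r -l1 -np -nd -A 'avvdat-????.zip' "$1"
}
```

It would be called as fetch_dat_recursive http://server.somewhere/path/dat/ and running it with WGET=echo just prints the arguments instead of downloading anything.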

markush 11-07-2012 07:05 AM

It should be relatively easy to parse the inifile first. I know for sure that wget can't use wildcards with http; I must admit the manpage is somewhat confusing on this point.

Maybe you could post an example of the inifile; then it would be easier to help you.

Markus

bradvan 11-07-2012 07:21 AM

OK, crap. :( I was hoping to keep it simple. The relevant section of the ini file has:
Code:

[AVV-ZIP]
DATVersion=6888
FileName-avvdat-6888.zip
FilePath=/current/VSCANDAT1000/DAT/0000/
FileSize=11033442
MD5=1d4sfbb345a342e4c3

I see the DATVersion line repeated throughout the ini file, and it is identical each time. So, I think a wget on the avvdat.ini file followed by
Code:

VRS=$(grep 'DATVersion=' avvdat.ini | head -1 | cut -d= -f2)
will get me the number and then I just need to
Code:

wget http://server.somewhere/current/VSCANDAT1000/DAT/0000/avvdat-${VRS}.zip
I think that should work. :)

bradvan 11-07-2012 07:34 AM

Rats. It looks like something is happening with the variable substitution. When I issue the command, it prints:
Code:

--2012-11-07 08:23:30-- http://server.somewhere/current/VSCANDAT1000/DAT/0000/avvdat-6888%0D.zip
...
ERROR 404: Bad Request

It looks like I am getting an extra '%0D' in there somehow. I switched from cut to sed:
Code:

VRS=$(grep 'DATVersion=' avvdat.ini | head -1 | sed -e 's/^.*=\([0-9]\{4,\}\).*/\1/')
and that seems to have solved the problem. Thanks so much for your help guys! :)
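
A note on the likely cause: %0D is the URL-encoded form of a carriage return, which suggests the ini file has DOS-style (CRLF) line endings. cut keeps everything after the "=", including that trailing character, whereas the sed expression captures only the digits. A minimal sketch of stripping it explicitly with tr; sample.ini is a made-up stand-in for avvdat.ini, built from the values quoted in this thread.

```shell
# Illustration only: sample.ini is a made-up stand-in for avvdat.ini,
# written with CRLF line endings to reproduce the problem.
printf '[AVV-ZIP]\r\nDATVersion=6888\r\nFileName=avvdat-6888.zip\r\n' > sample.ini

# cut keeps everything after '=', including the trailing carriage return
RAW=$(grep 'DATVersion=' sample.ini | head -1 | cut -d= -f2)

# tr -d '\r' drops the carriage return before it can reach the URL
VRS=$(grep 'DATVersion=' sample.ini | head -1 | cut -d= -f2 | tr -d '\r')

# The raw value is one character longer than the cleaned one
printf 'raw length: %s, cleaned length: %s\n' "${#RAW}" "${#VRS}"
echo "avvdat-${VRS}.zip"
```

Running the ini file through tr -d '\r' once, right after downloading it, would have the same effect for every later lookup.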

markush 11-07-2012 07:35 AM

Quote:

Originally Posted by bradvan (Post 4824085)
Code:

[AVV-ZIP]
DATVersion=6888
FileName-avvdat-6888.zip
FilePath=/current/VSCANDAT1000/DAT/0000/
FileSize=11033442
MD5=1d4sfbb345a342e4c3


It's even simpler. I think the line should have been
Code:

FileName=avvdat-6888.zip
You can simply source the inifile and then use the variables in the wget command.
Code:

source ini-file
wget http://the.server/$FilePath/$FileName

Markus

bradvan 11-07-2012 09:05 AM

Thanks for the follow-up. Yes, the "-" was a typo. The DATVersion=6888 is consistent throughout the ini file, but FileName is not. Also, I get lots of error messages with "source avvdat.ini". Thanks again for the suggestions and prodding in the correct direction. :)
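
The errors from source are probably because the shell tries to run section headers such as [AVV-ZIP] as commands, and any carriage returns at the ends of lines would end up inside the values as well. Since FileName differs between sections, a section-aware lookup is one way around both problems. The following is only a sketch: ini_get is a hypothetical helper, and it assumes plain key=value lines under [SECTION] headers as in the excerpt above.

```shell
# Sketch only: ini_get is a hypothetical helper, not part of wget or any
# other tool mentioned in this thread.
# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        # Drop a trailing carriage return so CRLF files behave like LF files
        { sub(/\r$/, "") }
        # On a [header] line, remember whether it is the wanted section
        /^\[.*\]$/ { in_section = ($0 == ("[" section "]")); next }
        # Inside the wanted section, print the value of the first matching key
        in_section {
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == key) {
                print substr($0, eq + 1)
                exit
            }
        }
    ' "$1"
}

# Possible use (not run here, since it needs the real server):
#   FilePath=$(ini_get avvdat.ini AVV-ZIP FilePath)
#   FileName=$(ini_get avvdat.ini AVV-ZIP FileName)
#   wget "http://server.somewhere${FilePath}${FileName}"
```

FilePath in the excerpt already begins and ends with a slash, so the pieces are joined without adding another one.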


All times are GMT -5.