LinuxQuestions.org


tensigh 01-17-2013 08:47 PM

wget - using --user-agent option still results in 403/forbidden error
 
I'm hoping someone can tell me what I might be doing wrong.

A number of websites have directories or graphics I want to download. I can almost always access them through a browser or by directly downloading the file using wget:

wget http://www.somesite.com/dir1/dir2/pic1.jpg

But if I try to use -r or -m, I get the dreaded "403/Forbidden" error, despite being able to open the file in my browser.

I've tried many combinations of the -U option: -U firefox, -U Mozilla, -U "Mozilla, platform, blah blah", and they NEVER work.

Is there something else I can do? Most of the time when I Google this issue, the solutions stop with forging a user agent. That never seems to work for me.

What am I doing wrong?
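
For reference, the kind of recursive command being described is presumably something along these lines (the host and path are the placeholders from this post, and the User-Agent string is just one example of a forged browser string):
Code:

# hypothetical example of a recursive fetch with a forged browser User-Agent
wget -r -l1 -np -U "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0" http://www.somesite.com/dir1/dir2/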

mina86 01-18-2013 06:32 AM

What's the site in question?

Habitual 01-18-2013 09:06 AM

Can you browse to http://www.somesite.com/dir1/dir2/pic1.jpg traditionally?

tensigh 01-19-2013 12:41 AM

I can see the page/graphics through a browser.
 
This is the company I work for, and like I said in my original post, I can get the link through a browser:

http://www.barclayvouchers.co.jp/images/index/mainvisual.jpg

Naturally, if I wget the graphic directly it also works, but that defeats the purpose of using wget (having to address each filename specifically).

But that isn't the question - this happens to me on a LOT of websites, not only my own company's. MOST of the time I get a 403/Forbidden error.

So the original question remains: am I doing something wrong with wget? I've tried using -m and -U with all types of descriptions after -U and they never work; I always end up at 403/Forbidden.

Habitual 01-19-2013 08:53 AM

Quote:

I can almost always access them through a browser or by directly downloading the file using wget
wrt "almost always"...what do the logs say about these events specifically?

You should be able to find 4 events in the logs, 2 for the browser (1 each Success and Fail) and 2 for wget (1 each Success and Fail)

Again, all 4 details should be in the logs.

I'm assuming the 403 error is from Apache, so ...
What are the owner:group perms on /path/to/dir1/dir2/pic1.jpg?
Are there any .htaccess files (or comparable httpd.conf inclusions)? (a sketch of these checks follows this post)
Have you tried the dreaded -r or -m options from another host?
wget version? terminal >
Code:

wget --version | head -1
lsb_release -drc

output please. Thanks.

Please let us know.
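
For reference, the server-side checks asked about above might look something like this (all paths are placeholders taken from the post, this assumes shell access to the web server, and the access-log location varies by distribution):
Code:

# ownership and permissions of the requested file
ls -l /path/to/dir1/dir2/pic1.jpg
# any .htaccess files in the directory that could add access rules
ls -la /path/to/dir1/dir2/ | grep -i htaccess
# requests for that file in the Apache access log, with their status codes
grep 'pic1.jpg' /var/log/apache2/access.log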

NyteOwl 01-19-2013 01:33 PM

It is likely the .htaccess file is set up to prevent document/image "leeching" by direct download.
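
For context, an .htaccess file can key on more than the User-Agent string. Rules that match only the User-Agent can be bypassed with -U, but referrer-based "anti-leech" rules (like the example later in this thread) cannot. A minimal sketch of User-Agent blocking, using Apache 2.2-style directives with made-up agent patterns:
Code:

# deny requests whose User-Agent matches common download tools
SetEnvIfNoCase User-Agent "wget" block_agent
SetEnvIfNoCase User-Agent "curl" block_agent
Order Allow,Deny
Allow from all
Deny from env=block_agent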

tensigh 01-19-2013 04:14 PM

.htaccess file
 
@NyteOwl:

I guess that's what I'm asking; will an .htaccess file block wget:

- even if the user agent is forged, and
- even if the files are accessible through a web browser?

Every time I get a 403 error, both of the above conditions are met.

tensigh 01-19-2013 04:25 PM

Quote:

Originally Posted by Habitual (Post 4873469)
wrt "almost always"...what do the logs say about these events specifically?

It happens with a number of sites and from various machines, so that's why I said "almost always". One time I was able to actually download an entire site when all I wanted was some PDFs, but other than that I get 403/Forbidden errors. :)

Quote:

Originally Posted by Habitual (Post 4873469)
You should be able to find 4 events in the logs, 2 for the browser (1 each Success and Fail) and 2 for wget (1 each Success and Fail)

Why would I find a fail entry in the browser logs, and why would I find a success in the wget logs? It works in a browser and always gets a 403/Forbidden using wget.

Quote:

Originally Posted by Habitual (Post 4873469)
I'm assuming the 403 error is from Apache, so ...
What are the owner:group perms on /path/to/dir1/dir2/pic1.jpg?
Are there any .htaccess files (or comparable httpd.conf inclusions)?

It is from Apache.
The most recent problem I've had is on my own company's site, but I've also had it at other sites, where I don't have access to the .htaccess files. Obviously the .htaccess files are preventing wget from downloading; I was trying to figure out whether that's a limitation of wget or whether I wasn't using it correctly.

Quote:

Originally Posted by Habitual (Post 4873469)
Have you tried the dreaded -r or -m options from another host?

I have tried from another host, same results.
And I don't dread the -r and -m options, I dread the 403/Forbidden error. :)

Quote:

Originally Posted by Habitual (Post 4873469)
wget version? terminal >
Code:

wget --version | head -1
lsb_release -drc

output please. Thanks.

I can do that once I'm back at work. At home I only have access to wget on *cough* Windows *cough*.

I guess the limitation is that wget is going to be stopped by an .htaccess file, regardless of changing the user-agent.
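
A quick way to narrow down where the 403 actually comes from during a recursive run is to request, with the server's response headers shown, both the file and the directory URL itself, since the directory page is one of the things -r typically has to fetch (host and path are the placeholders from the first post):
Code:

# direct file request - reported in this thread to work
wget -S --spider http://www.somesite.com/dir1/dir2/pic1.jpg
# directory request - a 403 here usually means index/listing access is denied,
# which stops recursion regardless of the User-Agent string
wget -S --spider http://www.somesite.com/dir1/dir2/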

Habitual 01-19-2013 06:28 PM

For the site(s) that you have control over, the logs will have references to the 403 "error" and to the GET request from the browser session.
Code:

grep -i mozilla /path/to/httpd.log
or similar.

There is no "browser log" or "wget log".
Both are clients asking apache (the server) for the file, and hence all requests should be logged in the apache log file.

Now, as for wget options, this works for me here:
Code:

wget --random-wait -r -p -e robots=off -U mozilla http://www.barclayvouchers.co.jp/images/index/mainvisual.jpg
wrt: "403/Forbidden", this can happen 2 ways that I know of, the /path/to/dir1/dir2/pic1.jpg has permissions that the apache software/daemon doesn't have access to, OR, the robots.txt and/or .htaccess prevents it.

I hope this helps.

tensigh 01-20-2013 12:13 AM

You answered my question, thanks.
 
Quote:

Originally Posted by Habitual (Post 4873781)
There is no "browser log" or "wget log".
Both are clients asking apache (the server) for the file, and hence all requests should be logged in the apache log file.

That's why I was confused; I was talking about the client side and you were talking about the server side.

Quote:

Originally Posted by Habitual (Post 4873781)
Now, as for wget options, this works for me here:
Code:

wget --random-wait -r -p -e robots=off -U mozilla http://www.barclayvouchers.co.jp/images/index/mainvisual.jpg

Yeah, that works for me too, but the original problem remains. I can't just point at the directory and get the files; I have to specify each file, which defeats the purpose of using wget.


Quote:

Originally Posted by Habitual (Post 4873781)
wrt: "403/Forbidden", this can happen 2 ways that I know of, the /path/to/dir1/dir2/pic1.jpg has permissions that the apache software/daemon doesn't have access to, OR, the robots.txt and/or .htaccess prevents it.

Okay, that's pretty much what I was looking for: the fact that wget can be stopped by an .htaccess file. I don't think it's file permissions, since the file is available through a browser or by specifying it directly via wget. Either way, that's my answer: wget is limited.

Quote:

Originally Posted by Habitual (Post 4873781)
I hope this helps.

It did. Thanks for your answers.

Habitual 01-20-2013 09:56 AM

Quote:

I have to specify each file, which defeats the purpose of using wget.
I beg to differ:
Code:

Wget - The non-interactive network downloader.
That implies you know what you are asking the server for, hence non-interactive.

The googletubes are chock-full of people who have (probably, for years) wondered why wget doesn't seem to grab arbitrary directories.

Quote:

I don't think it's file permissions, since the file is available through a browser or by specifying it directly via wget.
You'd be correct on the perms...Good Eye!

Quote:

Either way, that's my answer: wget is limited.
More like the system admin has done a good job of keeping things tight.

tensigh 01-20-2013 02:09 PM

Good points
 
Quote:

Originally Posted by Habitual (Post 4874115)
I beg to differ:

Sorry, let me clarify; it defeats my purpose for using wget (like I said earlier, if I have to access said files through a browser, using wget does not save steps).

Quote:

Originally Posted by Habitual (Post 4874115)
The googletubes are chock-full of people who have (probably, for years) wondered why wget doesn't seem to grab arbitrary directories.

In this case (and others when I use it) I know what the directories are; I was just hoping to save time over right-click/Save As by using wget.


Quote:

Originally Posted by Habitual (Post 4874115)
More like the system admin has done a good job of keeping things tight.

True dat.

mina86 01-21-2013 04:35 AM

Quote:

Originally Posted by tensigh (Post 4873317)
But that isn't the question - this happens to me on a LOT of websites, not only my own company's. MOST of the time I get a 403/Forbidden error.

Name one.

If the problem is referrer, try
Code:

wget -e robots=off --referer=http://example.com/ \
    -U 'Opera/9.80 (X11; Linux x86_64) Presto/2.12.388 Version/12.12' \
    http://example.com/file

Quote:

Originally Posted by tensigh (Post 4873888)
Yeah, that works for me too, but the original problem remains. I can't just point at the directory and get the files; I have to specify each file, which defeats the purpose of using wget.

If you use the -r option, you can very likely point it at an index file and all linked content will be downloaded.
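
A sketch of that approach (the host and directory are placeholders; this only works if the URL returns an index page or some other HTML that links to the files):
Code:

# follow links from the index one level down, keep only images,
# stay below the starting directory, and don't recreate the directory tree
wget -r -l1 -np -nd -A jpg,jpeg,png -e robots=off -U Mozilla http://www.somesite.com/dir1/dir2/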

tensigh 01-21-2013 06:35 AM

One step ahead of you.
 
Quote:

Originally Posted by mina86 (Post 4874569)
Name one.

Already did. But as I said earlier, it happens on multiple sites.


Quote:

Originally Posted by mina86 (Post 4874569)
If the problem is referrer, try
Code:

wget -e robots=off --referer=http://example.com/ \
    -U 'Opera/9.80 (X11; Linux x86_64) Presto/2.12.388 Version/12.12' \
    http://example.com/file


That's what I was trying to find out: was the problem the referrer, or something else?
Thanks for the tip, tho'; I'll try it.

Quote:

Originally Posted by mina86 (Post 4874569)
If you use the -r option, you can very likely point it at an index file and all linked content will be downloaded.

"Very likely" - based on actual experience, or the way it's supposed to work? As I mentioned earlier, I've tried both -r and -m (with -r I usually add -l1 as well) and still get the error.

I did find that by loading a whole boatload of options, such as wget -e robots=off -m -r -l3 -np -nd -U Mozilla (site), AND by modifying the .wgetrc file, it will download the entire site, even if you're just hunting for one or two directories. Kind of silly to restrict a directory via an .htaccess file to prevent people from doing mass downloads when downloading the entire site works. :)
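
For what it's worth, the .wgetrc side of that setup is presumably something along these lines (the values below are guesses for illustration, not the poster's actual file):
Code:

# ~/.wgetrc -- guessed settings matching the options mentioned above
robots = off
user_agent = Mozilla/5.0 (X11; Linux x86_64)
reclevel = 3
wait = 1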

Thanks for the tips.

NyteOwl 01-21-2013 12:48 PM

Quote:

Originally Posted by tensigh (Post 4873719)
@NyteOwl:

I guess that's what I'm asking; will an .htaccess file block wget:

- even if the user agent is forged, and
- even if the files are accessible through a web browser?

Every time I get a 403 error, both of the above conditions are met.

It's not a matter of blocking wget per se; it's a matter of how the files are accessed. If they are requested as part of a web page served by the site itself (i.e. the request carries a referrer from the site), they can be displayed. You can also prevent them from being served when they are not accessed that way.

For example:

Code:

# return 403 (F) for image requests whose Referer is set but does not
# come from domain.tld (anti-leeching / hotlink protection)
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://.*$
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://domain.tld/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://domain.tld$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.domain.tld/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.domain.tld$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp|tif|tiff)$ - [F,NC,L]

Any request that doesn't come through domain.tld will return a 403 error.

