LinuxQuestions.org
01-15-2009, 01:13 PM   #1
jlarsen
Member
 
Registered: Jan 2005
Location: Dallas, TX
Distribution: Slackware 13.0
Posts: 76

Rep: Reputation: 15
Apache - stop files from being accessible from web browser


I have a web-based application that allows a user to create reports, and one option is to make a PDF file for downloading. So far the only way I have found to make the file accessible for display is to put it in a folder under the DocumentRoot, such as /var/www/htdocs/pdfdirectory/ (DocumentRoot = /var/www/htdocs).

Problem is that if someone knew the filename they could see the document by just pointing a browser to:
http://www.example.com/pdfdirectory/somefilename.pdf

I'm not by any means an expert in this area. What is a better way to do this?

Should the document be created in another directory somewhere and Apache configured differently so it can see it?

Also, is there a way to know when the PDF file has been successfully downloaded so it could be deleted immediately?

Thanks in advance for any advice.
 
01-15-2009, 01:22 PM   #2
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,630

Rep: Reputation: 2573
Not sure what you're going for. You say you want them to display it, but don't want them pointing a browser to it....how are they going to display it, then??

Since you want to delete the file after creation and/or display, why not put a nugget of code out there, to compress the PDF, and email it to whatever address the user says? After the email is sent, the file can be deleted, and those are conditions that are easy to check for.
 
01-15-2009, 01:51 PM   #3
jlarsen
Member
 
Registered: Jan 2005
Location: Dallas, TX
Distribution: Slackware 13.0
Posts: 76

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by TB0ne
Not sure what you're going for. You say you want them to display it, but don't want them pointing a browser to it....how are they going to display it, then??
The user who generates the report through the system is authorized to see the data on the report, so no problem there. What I'm concerned about is a different person being able to point a browser to the PDF file which is still on the server and access information that they should not be seeing.

There must be a better solution than deleting the files at login/logoff or with a cron job.

Thanks for the email idea, but I would prefer not to go that way if possible.
 
01-15-2009, 02:09 PM   #4
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,630

Rep: Reputation: 2573
Ah, I see where you're going. Can you password-protect/encrypt the PDF file? Generate a unique password for the user and show it to them when the PDF is generated. Then when they go to view the document (even if they download it to their local drive), they'll have to know the password. Without it, the web link is useless, no matter who has it. Not perfect, since anything you present via the web can be saved/printed/etc., but secure enough.

As far as removing the file, how about a nugget of code that runs after the report is generated and sets up an at job, keeping the report on disk for some length of time and then deleting it? If the user knows the report will only be available for an hour or two, they won't generate it until they're ready to read it. After generating it, they'll either read it online or download it.
 
01-19-2009, 11:23 AM   #5
jlarsen
Member
 
Registered: Jan 2005
Location: Dallas, TX
Distribution: Slackware 13.0
Posts: 76

Original Poster
Rep: Reputation: 15
Thanks, something similar to the at command will do the trick. I just wish at would accept a full command line as an argument instead of having to put the command in another file. It would be easier to call from within the application without having to install an extra file on every server. Don't suppose you know a trick for that? I didn't see anything in the man page.
Ex. (this does not work with at - gives error):
some code to add 5 minutes to current time and format it correctly (say it ends up being 1554), then execute
at 1554 "rm /path/to/SOME_REPORT.pdf"

If not I guess the application could just create a file containing the command above and then run with something like
at 1554 delete_pdf_script
However, that leaves the file delete_pdf_script behind. After 500 people run different reports it could get messy.
 
01-19-2009, 12:11 PM   #6
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,630

Rep: Reputation: 2573
Yeah, AT is kind of a pain. However, you can make the last line of the at script delete the script itself, so it will 'self destruct' after running. That does work, and I've used it before in my environment.
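
On the "full command line" question, one thing that may help: at reads the job's commands from standard input when no -f file is given, so the delete command can be piped in and no script file is left behind. Below is a minimal Python sketch of that idea. The function names build_delete_job and schedule_delete are made up for illustration, and it assumes at is installed, atd is running, and the web server's user is allowed to submit jobs (see at.allow / at.deny).

```python
import shlex
import subprocess


def build_delete_job(pdf_path, minutes=5):
    """Return (argv, stdin_text) for an at job that removes pdf_path later."""
    # at reads the job's commands from standard input when -f is not used.
    argv = ["at", "now", "+", str(minutes), "minutes"]
    # Quote the path so spaces or shell metacharacters cannot alter the command.
    stdin_text = "rm -f -- {}\n".format(shlex.quote(pdf_path))
    return argv, stdin_text


def schedule_delete(pdf_path, minutes=5):
    """Queue the deletion; needs at installed and the atd daemon running."""
    argv, stdin_text = build_delete_job(pdf_path, minutes)
    subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Calling schedule_delete("/path/to/SOME_REPORT.pdf") right after the report is written would queue its removal five minutes later, with nothing extra to install on each server.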
 
01-19-2009, 05:46 PM   #7
servat78
Member
 
Registered: Jan 2009
Posts: 100

Rep: Reputation: 17
You could create a CGI script that acts as the PDF link. Instead of being a PDF file, it authenticates the request (by login, by asking for a download password, by checking the requester's IP address, ...). If access is granted, the CGI sends a valid HTTP header identifying the data that follows as a PDF file and then sends the real PDF as binary data in pieces.

The web browser on the other end of the connection reads the HTTP header. If it has a PDF-reading plugin (Acrobat), the file opens within the browser; otherwise a dialog pops up offering to open or download it.

This way your PDF can be anywhere on the filesystem. The CGI only needs permission to read it, and the file does not have to be inside the web root. The outside world doesn't have to know anything about the true location of the PDF on the site.
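
A rough Python sketch of such a gatekeeper script follows. Everything specific in it is an assumption for illustration: the /var/reports directory, the file and token query parameters, and the single shared secret, which only stands in for whatever real authentication (login session, per-download password, IP check) the application uses.

```python
import hmac
import os
import sys
from urllib.parse import parse_qs

# Hypothetical values: a directory outside DocumentRoot and a shared secret.
REPORT_DIR = "/var/reports"
SECRET = "change-me"


def send_report(environ, out, report_dir=REPORT_DIR, secret=SECRET):
    """Write a CGI response to out: the PDF if the token matches, else an error."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    token = query.get("token", [""])[0]
    # basename() drops any directory part, so "../" cannot escape report_dir.
    name = os.path.basename(query.get("file", [""])[0])
    path = os.path.join(report_dir, name)

    # Constant-time comparison; a placeholder for the real access check.
    if not hmac.compare_digest(token.encode(), secret.encode()):
        out.write(b"Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nForbidden\n")
        return 403
    if not name or not os.path.isfile(path):
        out.write(b"Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nNot found\n")
        return 404

    header = (
        "Content-Type: application/pdf\r\n"
        "Content-Length: {}\r\n"
        'Content-Disposition: inline; filename="report.pdf"\r\n\r\n'
    ).format(os.path.getsize(path))
    out.write(header.encode("ascii"))
    # Send the file as binary data in pieces rather than loading it whole.
    with open(path, "rb") as pdf:
        for chunk in iter(lambda: pdf.read(64 * 1024), b""):
            out.write(chunk)
    return 200
```

Run as a CGI script, it would be called as send_report(os.environ, sys.stdout.buffer). Since the script knows when it has finished writing the file, it could also remove the PDF at that point, though that only shows the data was sent, not that the browser saved it.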

Debian

Last edited by servat78; 02-19-2009 at 11:09 AM.
 
01-20-2009, 07:38 PM   #8
jlinkels
Senior Member
 
Registered: Oct 2003
Location: Bonaire
Distribution: Debian Lenny/Squeeze/Wheezy/Sid
Posts: 4,102

Rep: Reputation: 494
Well, generating graphs and PDF files is always a pita. I did it like this: whenever any user enters the page where these files are stored, files older than 5 minutes are deleted. This is the PHP function I use for that:

PHP Code:
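function del_tmp_files() {
    // Directory that holds the generated files.
    $tmpdir = "{$_SERVER['DOCUMENT_ROOT']}/tmp";
    // Remove files named mr_* whose status last changed more than 5 minutes ago.
    $delcmd = "find $tmpdir -cmin +5 -name mr_\*  -exec rm {} \;";
    echo exec($delcmd);
}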

All files I create get a random filename (recognizable by the first few characters). In the header which directs the download I take care of suggesting a regular file name for the user:

PHP Code:
header('Content-Type: application/csv');
header("Content-Length: " .(string)(filesize($fname)) );
header('Content-Disposition: attachment; filename="summary_ext_bc.csv"'); 
So anyone browsing around has to know the name of the random file, and that name never stays the same for long. Web visitors are not able to browse the www-root directory; that is a setting in Apache, I think. And the filename suggested for the download is NOT the random filename on the server.

I admit it is not watertight, but it helps...
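
For anyone not on PHP, roughly the same approach can be sketched in Python without shelling out to find. This is only a sketch: the mr_ prefix is taken from the function above, while new_report_path, delete_old_files and the 16-byte random name are illustrative choices, and the st_ctime comparison assumes Linux, where it is the inode change time that find -cmin also tests.

```python
import os
import secrets
import time

PREFIX = "mr_"  # same recognizable prefix as in the post


def new_report_path(tmpdir, suffix=".pdf"):
    """Return a path with an unguessable name: mr_ + 32 hex characters + suffix."""
    return os.path.join(tmpdir, PREFIX + secrets.token_hex(16) + suffix)


def delete_old_files(tmpdir, max_age_seconds=300, now=None):
    """Delete PREFIX* files in tmpdir older than max_age_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(tmpdir):
        if not (entry.is_file() and entry.name.startswith(PREFIX)):
            continue
        # st_ctime is the inode change time on Linux, which is what find -cmin tests.
        if now - entry.stat().st_ctime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Calling delete_old_files() at the top of the page that serves the reports gives the same "clean up on every visit" behaviour, and new_report_path() supplies the random on-disk name.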

jlinkels
 
01-20-2009, 09:13 PM   #9
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,630

Rep: Reputation: 2573
That's a sweet function, and a great idea/implementation. Thanks for sharing it.
 
01-20-2009, 10:14 PM   #10
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Quote:
Originally Posted by jlinkels
Web visitors are not able to browse the www-root directory, that is a setting in Apache I think.
At the very least, be sure to use an Options -Indexes directive in the <Directory /www-root> container. (If Options None is set, directory indexes are already turned off.)
 
  

