LinuxQuestions.org
Old 01-17-2013, 08:01 PM   #1
sneakyimp
Senior Member
 
Registered: Dec 2004
Posts: 1,001

Rep: Reputation: 67
How to replicate a folder and its contents instantly across N servers?


I have been asked to make sure that N web servers (Amazon EC2 instances: maybe two, maybe three, maybe four...maybe N) all maintain the exact same contents for a particular folder. Let's call the folder /home/my_folder. Any additions, subtractions, or changes to the files and directories in this folder will be performed on one master machine and must be propagated *immediately* (or ASAP) to N slaves.

I have considered using NFS to create a shared directory on some machine and just have the slaves mount it, but I worry about performance when the N web servers are responding to HTTP requests that reference these shared files. Would every HTTP request result in a file system action to check the modification date of the file?

Alternatively, I am considering having the N slaves all mount this NFS share and use lsyncd (running locally on each slave) to watch the NFS share for changes. When a change is detected, the slave machine will copy the changes to a copy of the shared folder on its local file system for serving HTTP requests. Can lsyncd watch a share mounted via NFS? Is there a possibility that lsyncd might trigger twice when a large file gets uploaded via FTP? The page I linked says:
Quote:
Lsyncd watches a local directory trees event monitor interface (inotify or fsevents). It aggregates and combines events for a few seconds and then spawns one (or more) process(es) to synchronize the changes.
I really hope someone might help me to understand how I might address this request.
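
For illustration, the master-to-slaves push described in this post can be pictured as one rsync run per slave. This is only a minimal sketch, assuming passwordless SSH from the master to every slave; the host names, the `deploy` user, and the helper `build_rsync_commands` are hypothetical and do not come from the thread. It builds the command lines without running them, and something (lsyncd, cron, a deploy hook) would still have to trigger it when files change.

```python
import shlex


def build_rsync_commands(source_dir, slaves, user="deploy"):
    """Return one rsync argv list per slave host.

    -a preserves permissions and timestamps, -z compresses in transit,
    and --delete removes files on the slave that are gone on the master.
    """
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    src = source_dir.rstrip("/") + "/"
    commands = []
    for host in slaves:
        dest = f"{user}@{host}:{src}"
        commands.append(["rsync", "-az", "--delete", src, dest])
    return commands


if __name__ == "__main__":
    # Hypothetical host names; substitute the real slave addresses.
    for argv in build_rsync_commands("/home/my_folder", ["slave1", "slave2"]):
        print(shlex.join(argv))
```

Each returned list could be handed to `subprocess.run`. Note that `--delete` is what propagates "subtractions", so it is worth a dry run with rsync's `-n` flag first.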
 
Old 01-17-2013, 08:56 PM   #2
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,790

Rep: Reputation: 650
Quote:
Would every HTTP request result in a file system action to check the modification date of the file?
.. why? It should just serve the file. If you're thinking about caching you can control that from the server side.
Personally, I'd just go with the nfs mounted shared directory.
 
Old 01-17-2013, 09:02 PM   #3
sneakyimp
Senior Member
 
Registered: Dec 2004
Posts: 1,001

Original Poster
Rep: Reputation: 67
EDIT: Thanks for your response!

Quote:
Originally Posted by kbp View Post
.. why? It should just serve the file.
I guess I'm wondering how, in practice, Apache interacts with the file system when serving files. Would an image be kept in RAM and served from there? If someone changed the file, how would Apache know to refresh the contents of RAM from the stored file? It seems to me that Apache would need to check the last modification date of the file every time it's requested in order to know whether it needed to refresh RAM. In a situation where you have N servers accessing a single shared volume, this sounds like a lot of traffic hammering on one shared drive. Throw in some network congestion and latency and you have a real problem on your hands.
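
The pattern being asked about here (keep contents in RAM, check the modification time on every request, reload only when it changes) can be sketched as below. This is not a description of how Apache works internally; it is a hypothetical `MtimeCache` class that makes the cost visible: one `stat()` per request even when the contents come from memory.

```python
import os
import tempfile


class MtimeCache:
    """Keep file contents in memory and reload when the mtime changes."""

    def __init__(self):
        self._entries = {}    # path -> (mtime_ns, contents)
        self.stat_calls = 0   # metadata checks made so far
        self.disk_reads = 0   # times the file body was actually read

    def read(self, path):
        # One stat() per request: this is the per-request file system
        # action the question is about.
        self.stat_calls += 1
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        with open(path, "rb") as fh:
            data = fh.read()
        self.disk_reads += 1
        self._entries[path] = (mtime, data)
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logo.txt")
        with open(path, "wb") as fh:
            fh.write(b"version 1")
        cache = MtimeCache()
        cache.read(path)  # first request: stat plus a read from disk
        cache.read(path)  # second request: stat only, served from memory
        with open(path, "wb") as fh:
            fh.write(b"version 2")
        # Force a distinct mtime so the demo does not depend on clock resolution.
        os.utime(path, ns=(2_000_000_000, 2_000_000_000))
        print(cache.read(path))
        print(cache.stat_calls, cache.disk_reads)
```

On an NFS mount, whether each of those `stat()` calls actually reaches the server depends on the client's attribute caching, which is a mount-time setting rather than anything the web server decides.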

Quote:
Originally Posted by kbp View Post
If you're thinking about caching you can control that from the server side.
What do you mean by "server side"? Is this an Apache configuration that will cache files? Do the files get cached on the local file system? If so, *how* do we know when to refresh them if the original file contents change? Again, I'm thinking that every slave machine would be checking the contents of the NFS share to make sure it had the latest version. I could be wrong.

Quote:
Originally Posted by kbp View Post
Personally, I'd just go with the nfs mounted shared directory.
Suppose you had a dozen slaves under extremely heavy traffic. Might your nfs mounted shared directory become a serious bottleneck?
 
Old 01-17-2013, 09:22 PM   #4
phobozad
LQ Newbie
 
Registered: Jan 2005
Distribution: Fedora Core 3
Posts: 19

Rep: Reputation: 1
AFS http://www.openafs.org/
 
Old 01-17-2013, 09:22 PM   #5
sharadchhetri
Member
 
Registered: Aug 2008
Location: INDIA
Distribution: Redhat,Debian,Suse,Windows
Posts: 179

Rep: Reputation: 23
For EC2, check GlusterFS.

http://www.gluster.org/
 
Old 01-17-2013, 09:27 PM   #6
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,790

Rep: Reputation: 650
I'd expect an NFS server should happily serve 12 clients, even if they were busy. Your best bet is to load test the site when configured with each option, see whether it performs acceptably and how much headroom you have.
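
As a rough starting point for that kind of comparison, the sketch below times repeated stat-and-read passes over a directory, so the same function can be pointed at a local copy and at the NFS mount. It is an assumption-laden micro-benchmark, not a real load test: it measures file system access only, not HTTP, and repeated reads are likely to be served from the operating system's cache. The function name `measure_reads` and the demo file names are made up for illustration.

```python
import os
import tempfile
import time


def measure_reads(directory, repeats=100):
    """Stat and read every regular file in `directory`, `repeats` times.

    Returns (operations, seconds).
    """
    paths = [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if os.path.isfile(os.path.join(directory, name))
    ]
    operations = 0
    start = time.perf_counter()
    for _ in range(repeats):
        for path in paths:
            os.stat(path)
            with open(path, "rb") as fh:
                fh.read()
            operations += 1
    elapsed = time.perf_counter() - start
    return operations, elapsed


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory; replace with real paths.
    with tempfile.TemporaryDirectory() as demo_dir:
        for i in range(20):
            with open(os.path.join(demo_dir, f"asset{i}.css"), "wb") as fh:
                fh.write(b"x" * 1024)
        ops, secs = measure_reads(demo_dir, repeats=50)
        print(f"{ops} reads in {secs:.3f}s")
```

Running it once against the local directory and once against the mounted share gives two numbers to compare; an HTTP-level test against the web servers themselves would still be needed to judge headroom.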
 
Old 01-17-2013, 10:08 PM   #7
sneakyimp
Senior Member
 
Registered: Dec 2004
Posts: 1,001

Original Poster
Rep: Reputation: 67
Quote:
Originally Posted by phobozad View Post
Thanks for the link. It does look powerful, but that's a great deal of information to digest. I'm in a pretty big hurry unfortunately and it has lots of prerequisites.
 
Old 01-17-2013, 10:12 PM   #8
sneakyimp
Senior Member
 
Registered: Dec 2004
Posts: 1,001

Original Poster
Rep: Reputation: 67
Quote:
Originally Posted by sharadchhetri View Post
For EC2, check GlusterFS.

http://www.gluster.org/
This looks very interesting. Any particular reason you think this is especially appropriate for EC2?
 
Old 01-17-2013, 10:22 PM   #9
sneakyimp
Senior Member
 
Registered: Dec 2004
Posts: 1,001

Original Poster
Rep: Reputation: 67
Quote:
Originally Posted by kbp View Post
I'd expect an NFS server should happily serve 12 clients, even if they were busy. Your best bet is to load test the site when configured with each option, see whether it performs acceptably and how much headroom you have.
I am, of course, hoping to set up memcached and some caching, and I would like to be able to handle about 2,000 page requests per second with my cluster. In a worst-case scenario, if I assume 20 assets (CSS, images, JavaScript) per page, all of which are on this shared drive (an unlikely scenario), then that would suggest 2,000 * 20 = 40,000 file system checks per second (to determine if a cache refresh is needed). Does that still sound "happy"?
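
The worst-case estimate above can be written out so the inputs are easy to vary. A small sketch, assuming every asset request costs exactly one file system check and that traffic is spread evenly; the `servers=12` value is borrowed from the "dozen slaves" hypothetical earlier in the thread, and `checks_per_second` is an illustrative name.

```python
def checks_per_second(page_requests_per_sec, assets_per_page, servers=1):
    """Worst case: every asset request costs one file system check.

    Returns (total seen by the shared volume, share issued by each server).
    """
    total = page_requests_per_sec * assets_per_page
    per_server = total / servers
    return total, per_server


if __name__ == "__main__":
    total, per_server = checks_per_second(2000, 20, servers=12)
    print(f"total checks/s hitting the shared volume: {total}")
    print(f"checks/s issued by each of 12 servers: {per_server:.0f}")
```

The total is the figure that matters for the shared server, since adding web servers spreads the HTTP load but does not reduce the number of checks arriving at the one shared volume.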

I doubt I will have the time & leeway with my employer to test all the options. I hope you are right as I expect I will be attempting this.

Any additional advice would be much appreciated.
 
Old 01-18-2013, 04:51 PM   #10
sharadchhetri
Member
 
Registered: Aug 2008
Location: INDIA
Distribution: Redhat,Debian,Suse,Windows
Posts: 179

Rep: Reputation: 23
I have implemented GlusterFS on EC2 and have had no problems at all; it is working fine.
Note: start with a two-node GlusterFS setup, check the load, and test it for a while. If all looks good, you can use it.
 
  

