LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   How to replicate a folder and its contents instantly across N servers? (https://www.linuxquestions.org/questions/linux-newbie-8/how-to-replicate-a-folder-and-its-contents-instantly-across-n-servers-4175446098/)

sneakyimp 01-17-2013 07:01 PM

How to replicate a folder and its contents instantly across N servers?
 
I have been asked to make sure that N web servers (Amazon EC2 instances: maybe two, maybe three, maybe four...maybe N) all maintain the exact same contents for a particular folder. Let's call the folder /home/my_folder. Any additions, subtractions, or changes to the files and directories in this folder will be performed on one master machine and must be propagated *immediately* (or ASAP) to N slaves.

I have considered using NFS to create a shared directory on some machine and just have the slaves mount it, but I worry about performance when the N web servers are responding to HTTP requests that reference these shared files. Would every HTTP request result in a file system action to check the modification date of the file?

Alternatively, I am considering having the N slaves all mount this NFS share and use lsyncd (running locally on each slave) to watch the NFS share for changes. When a change is detected, the slave machine would copy the changes into a local copy of the shared folder on its own file system and serve HTTP requests from that local copy. Can lsyncd watch a share mounted via NFS? Is there a possibility that lsyncd might trigger twice when a large file gets uploaded via FTP? The lsyncd page says:
Quote:

Lsyncd watches a local directory trees event monitor interface (inotify or fsevents). It aggregates and combines events for a few seconds and then spawns one (or more) process(es) to synchronize the changes.
I really hope someone might help me to understand how I might address this request.
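
To make the lsyncd idea concrete, this is roughly the slave-side setup I have in mind (the paths and the Debian-style config location are just placeholders, and whether inotify events even fire for changes made remotely on an NFS mount is exactly the part I'm not sure about):
Code:

# hypothetical slave-side lsyncd config -- write it out and (re)start the daemon
cat <<'EOF' | sudo tee /etc/lsyncd/lsyncd.conf.lua
settings {
    logfile = "/var/log/lsyncd.log",
}
-- watch the NFS mount and rsync any changes into the local folder Apache serves
sync {
    default.rsync,
    source = "/mnt/shared/my_folder",   -- the NFS share mounted from the master
    target = "/home/my_folder",         -- local copy on this slave
    delay  = 1,                         -- aggregate events for about a second
}
EOF
sudo service lsyncd restart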

kbp 01-17-2013 07:56 PM

Quote:

Would every HTTP request result in a file system action to check the modification date of the file?
.. why? It should just serve the file. If you're thinking about caching you can control that from the server side.
Personally, I'd just go with the NFS-mounted shared directory.

sneakyimp 01-17-2013 08:02 PM

EDIT: Thanks for your response!

Quote:

Originally Posted by kbp (Post 4872557)
.. why? It should just serve the file.

I guess I'm wondering how, in practice, Apache interacts with the file system when serving files. Would an image be kept in RAM and served from there? If someone changed the file, how would Apache know to refresh the contents of RAM from the stored file? It seems to me that Apache would need to check the last modification date of the file every time it's requested in order to know whether it needed to refresh RAM. In a situation where you have N servers accessing a single shared volume, this sounds like a lot of traffic hammering on one shared drive. Throw in some network congestion and latency and you have a real problem on your hands.
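
I suppose I could just watch what Apache actually does on disk while I request the same image a few times, with something like this (the PID is a placeholder for one of the worker processes):
Code:

# attach to one Apache worker and trace file-related syscalls while hitting the
# same URL repeatedly; 12345 stands in for a real worker PID
sudo strace -f -e trace=file -p 12345 2>&1 | grep my_folder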

Quote:

Originally Posted by kbp (Post 4872557)
If you're thinking about caching you can control that from the server side.

What do you mean by "server side"? Is this an Apache configuration that will cache files? Do the files get cached on the local file system? If so, *how* do we know when to refresh them if the original file contents change? Again, I'm thinking that every slave machine would be checking the contents of the NFS share to make sure it had the latest version. I could be wrong.

Quote:

Originally Posted by kbp (Post 4872557)
Personally, I'd just go with the NFS-mounted shared directory.

Suppose you had a dozen slaves under extremely heavy traffic. Might your NFS-mounted shared directory become a serious bottleneck?

phobozad 01-17-2013 08:22 PM

AFS http://www.openafs.org/

sharadchhetri 01-17-2013 08:22 PM

Quote:

Originally Posted by sneakyimp (Post 4872539)
I have been asked to make sure that N web servers (Amazon EC2 instances: maybe two, maybe three, maybe four...maybe N) all maintain the exact same contents for a particular folder. Let's call the folder /home/my_folder. Any additions, subtractions, or changes to the files and directories in this folder will be performed on one master machine and must be propagated *immediately* (or ASAP) to N slaves.

I have considered using NFS to create a shared directory on some machine and just have the slaves mount it, but I worry about performance when the N web servers are responding to HTTP requests that reference these shared files. Would every HTTP request result in a file system action to check the modification date of the file?

Alternatively, I am considering having the N slaves all mount this NFS share and use lsyncd (running locally on each slave) to watch the NFS share for changes. When a change is detected, the slave machine would copy the changes into a local copy of the shared folder on its own file system and serve HTTP requests from that local copy. Can lsyncd watch a share mounted via NFS? Is there a possibility that lsyncd might trigger twice when a large file gets uploaded via FTP? The lsyncd page says:


I really hope someone might help me to understand how I might address this request.

For EC2, check GlusterFS:

http://www.gluster.org/

kbp 01-17-2013 08:27 PM

I'd expect an NFS server to happily serve 12 clients, even if they were busy. Your best bet is to load test the site configured with each option and see whether it performs acceptably and how much headroom you have.
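
For reference, the basic setup is pretty small (the hostname, subnet and paths below are just examples):
Code:

# on the master: export the folder read-only to the web servers and reload exports
echo '/home/my_folder 10.0.0.0/24(ro,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# on each web server: mount the share where Apache expects to find it
sudo mkdir -p /home/my_folder
sudo mount -t nfs master.internal:/home/my_folder /home/my_folder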

sneakyimp 01-17-2013 09:08 PM

Quote:

Originally Posted by phobozad (Post 4872571)
AFS http://www.openafs.org/

Thanks for the link. It does look powerful, but that's a great deal of information to digest. Unfortunately I'm in a pretty big hurry, and it has a lot of prerequisites.

sneakyimp 01-17-2013 09:12 PM

Quote:

Originally Posted by sharadchhetri (Post 4872572)
for EC2 ,check glusterfs

http://www.gluster.org/

This looks very interesting. Any particular reason you think this is especially appropriate for EC2?

sneakyimp 01-17-2013 09:22 PM

Quote:

Originally Posted by kbp (Post 4872576)
I'd expect an NFS server to happily serve 12 clients, even if they were busy. Your best bet is to load test the site configured with each option and see whether it performs acceptably and how much headroom you have.

I am, of course, planning to set up memcached and some caching, but I'm hoping to handle about 2,000 page requests per second with my cluster. In a worst-case scenario, if I assume 20 assets (CSS, images, JavaScript) per page, all of which are on this shared drive (an unlikely scenario), that would suggest 2,000 * 20 = 40,000 file system checks per second (to determine if a cache refresh is needed). Does that still sound "happy"?
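
Incidentally, NFS clients seem to have attribute-caching mount options (actimeo); I'm wondering whether mounting the share on each slave with something like this would cut down on those checks (how much staleness we could tolerate is another question):
Code:

# mount the share with a longer attribute cache timeout (in seconds) so the
# client doesn't re-validate file attributes on every single request
sudo mount -t nfs -o ro,actimeo=60 master.internal:/home/my_folder /home/my_folder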

I doubt I will have the time & leeway with my employer to test all the options. I hope you are right as I expect I will be attempting this.

Any additional advice would be much appreciated.

sharadchhetri 01-18-2013 03:51 PM

Quote:

Originally Posted by sneakyimp (Post 4872605)
I am, of course, planning to set up memcached and some caching, but I'm hoping to handle about 2,000 page requests per second with my cluster. In a worst-case scenario, if I assume 20 assets (CSS, images, JavaScript) per page, all of which are on this shared drive (an unlikely scenario), that would suggest 2,000 * 20 = 40,000 file system checks per second (to determine if a cache refresh is needed). Does that still sound "happy"?

I doubt I will have the time & leeway with my employer to test all the options. I hope you are right as I expect I will be attempting this.

Any additional advice would be much appreciated.

I have implemented GlusterFS on EC2; for me there was no problem at all, and it is working fine.
Note: start with a two-node GlusterFS setup, check the load, and do some testing for a while. If all looks good, then you can use it.
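
A very rough two-node example (the hostnames and brick paths are only placeholders, and ideally the bricks live on their own EBS volumes rather than the root partition):
Code:

# on both nodes: install glusterfs-server and create a brick directory
sudo mkdir -p /data/brick1

# on node1: add the second peer and create a 2-way replicated volume
sudo gluster peer probe node2
sudo gluster volume create myvol replica 2 node1:/data/brick1 node2:/data/brick1
sudo gluster volume start myvol

# on each web server: mount the volume with the native GlusterFS client
sudo mount -t glusterfs node1:/myvol /home/my_folder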

