Linux - Networking
This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.
Hmmm, I had a rather strange/bizarre, or even clever, idea the other day. I have been using Linux to run web hosting for friends, and an idea occurred to me when a friend asked how he could set up his own hosting.
I had the idea that a group of fairly competent Linux users with >= DSL connections could set up servers in their homes and run a distributed web hosting system.
I was just wondering... how hard would that be to set up? It would be similar to having a server farm/cluster, except that each machine would be in a different house/location.
Do you guys think it could work? Are there any examples of this having already been done? What would be the recommended setup? Separate machines for www/SFTP, mail, DNS, etc., OR a set of machines each running www/SFTP as mirrors, with mail and DNS distributed between them?
If this hasn't been done before, or if anyone is interested, we could set up an experiment and see how it works.
The way I would deploy it (concentrating on www):
* Set up a bunch of (almost) identical web servers.
* Set up a DNS server (it can also be a web server, but doesn't have to be).
* Set that DNS server to give out the IPs of all the web servers in round-robin fashion.
* This will load-balance the servers.
So, e.g. with 3 servers, 192.168.0.{1,2,3}, and the DNS at 192.168.0.4:
the DNS server gets a request for the web server's name and returns the 1st IP;
on the next request it returns the 2nd IP, and so on; the 4th request gets number 1 back again.
DNS itself can't be distributed on an IP basis (it can run on a cluster, but IMO DNS is a single process, so it wouldn't matter).
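A minimal sketch of what that round-robin setup might look like as a BIND zone fragment (names and addresses are placeholders, matching the example above):

```
; Multiple A records for the same name: BIND returns the full record set
; and rotates the order between responses (round robin), so clients that
; take the first answer get spread across the three servers.
www     IN  A   192.168.0.1
www     IN  A   192.168.0.2
www     IN  A   192.168.0.3
```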
How would we make sure that they are kept up to date? If a user, for example, logs in over SFTP and uploads their site, we need to make sure it isn't displayed only on every 3rd request (if 3 servers are used). Keeping it all updated on the fly would require a fair amount of traffic between hosts; but if a lot was changed, mirroring only every 12/24 hours would take ages to propagate.
Ever used FreeBSD? I like it, and it's what I run at the moment. If we can get a few more people, I would consider setting up a SourceForge project or something similar.
Regards
---
P.S. If this system worked, it could become quite a powerful way to host. I would guess you'd need to set up another DNS server to add redundancy...
Distribution: OpenBSD 4.6, OS X 10.6.2, CentOS 4 & 5
Posts: 3,660
Rep:
Well that method is essentially just a mirroring system. It's really not distributing the content, it's duplicating it.
If you wanted to be really crazy, you could have each system host certain sites, but NFS mount all the other sites (on the other hosts) through an IPSec tunnel to each of the other hosts. This would accomplish distributing the content, but you're going to have a real problem with performance I would think.
If you wanted to go with the above example of mirroring the same content to multiple hosts and serving it via DNS round-robin, then you would just have to rsync each host periodically (like, every hour) to make sure they all have the same content. Of course, you would need to figure out how to handle situations like if a file has been deleted from one host, how do you tell if it was really deleted on purpose? The other hosts are going to think that file is just missing so they will try to replace it.
So you might go with a central repository where all the edits happen, which then gets rsync'd out to the serving hosts every hour or every few hours. Now you have a single point of failure (the repository) which can pretty much nuke your site (especially if it gets compromised, because the changes will replicate to the other hosts). Another problem is how to let end users edit their files. If you have them log into the repository, that's again a single point of failure, and it means letting people log into what should be your most secure box--not a good idea. On the other hand, if you have users log into the different remote hosts via DNS round robin, you'll need to replicate all your user accounts to every host, along with their authentication credentials. Regular UNIX password files won't work very well for that, so now you're looking at Kerberos or RADIUS.
I could go round and round in circles forever, but the point is that web hosting is a whole lot more complicated than it seems.
Another thing you may need to consider is replication of SQL databases. A forum, for example, would not work well in this setup without real-time updates, which are likely to be a strain on bandwidth.
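For the database side, the usual answer is the DBMS's own replication rather than file-level syncing. As a sketch, assuming MySQL: the master records changes to a binary log and each slave replays it, which needs roughly these my.cnf lines (the server IDs are arbitrary but must be unique per host):

```
# my.cnf on the master
[mysqld]
server-id = 1
log-bin   = mysql-bin

# my.cnf on each slave
[mysqld]
server-id = 2
```

Each slave is then pointed at the master with CHANGE MASTER TO ... and START SLAVE. Note that this streams every write across the members' DSL links, which is exactly the bandwidth strain mentioned above.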
Quote:
Originally posted by pnh73
It seems that distributed web hosting is about striking a successful balance between redundancy, security, and performance.
Redundancy, yes; performance, maybe*; security, no. Your entire system is as weak as the weakest host, since each system has to trust the others for data replication (plus your authentication would need to be replicated, unless you want a total nightmare). This means that a cracker only has to get lucky (or smart) with one host and they can own your entire distributed network.
*Performance is largely tied to how well you can distribute the network load, and DNS round-robin is notoriously bad at that. What happens if several clients simultaneously connect to a host that only has 128 K/s up? What happens if one client has a really large download that takes, say, 30 minutes (meanwhile round robin keeps throwing other connections at the same box)? DNS round-robin cannot control the number of simultaneous connections, nor can it account for the varying bandwidth capacity of different hosts.
Really, there is nothing new going on here; you're just looking at it from the distributed standpoint right away. Most hosting companies start with a single location, then have to figure out how to make it redundant later on. You're just skipping the first mistake. The rest of the picture is unchanged: you need a lot of complex equipment--intelligent load balancers, database replicators, virtual private networks, etc. It's no different from, for instance, Rackspace.com or NTT/Verio web hosting.