LinuxQuestions.org


pnh73 02-04-2004 01:49 PM

Distributed Webhosting
 
Hmmm, I had a rather strange/bizarre, or even clever, idea the other day. I have been using Linux to run web hosting for friends, and the idea occurred to me when a friend asked me how he could set up his own hosting.

The idea is that a group of fairly competent Linux users, each with a DSL connection or better, could set up servers in their homes and run a distributed web-hosting system.

I was just wondering... how hard would that be to set up? It would be similar to having a server farm/cluster, except that each machine would be in a different house/location.

Do you guys think it could work? Are there any examples of this having already been done? What would be the recommended setup? Separate machines for www/SFTP, mail, DNS, etc., OR a set of machines each running www/SFTP as mirrors, with mail and DNS distributed between them?

If this hasn't been done before, or if anyone is interested, we could set up an experiment and see how it could work.

Thanks for your time

Regards,

nielchiano 02-04-2004 02:38 PM

I'm interested in helping out!!

The way I would deploy it (concentrating on www) is:

* set up a bunch of (almost) identical web servers.
* set up a DNS server (can also be a web server, but doesn't have to be).
* set that DNS server to hand out the IPs of all the web servers in a round-robin fashion.
* this will load-balance the servers.

So, e.g. 3 servers, 192.168.0.{1,2,3}, and the DNS at 192.168.0.4:
the DNS server gets a request for the web server's name and returns the 1st IP;
on the next request it returns the 2nd IP, and so on; the 4th request gets nr. 1 back again, ...
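
For illustration, a minimal sketch of what the zone data might look like with BIND, assuming the example IPs above (BIND rotates the order of the answers by default, which gives you the round robin):

Code:

; hypothetical fragment of an example.com zone -- IPs are placeholders
www    IN    A    192.168.0.1
www    IN    A    192.168.0.2
www    IN    A    192.168.0.3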

DNS itself can't be distributed on a per-IP basis (it could run on a cluster, but IMO DNS is one lightweight process, so... it wouldn't matter much).

Mail, I think, can't be distributed either....

pnh73 02-04-2004 02:47 PM

How would we make sure the servers are kept up to date? If a user, for example, logs in over SFTP and uploads their site, how do we avoid the new version only being served on every third request (if 3 servers are used)? Updating everything on the fly would require a fair amount of traffic between hosts, but if a lot changed it would take ages to mirror if you only synced every 12/24 hours, for example.

Ever used FreeBSD? I like it and it's what I run at the moment. If we can get a few more people I would consider setting up a SourceForge project or something similar.

Regards
---

P.S. If this system worked it could become quite a powerful way to host. I would guess you'd need to set up another DNS server to add redundancy...
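
As a rough sketch of that redundancy (BIND syntax, names and IPs made up), it's mostly a second NS record plus a slave zone on the second box that transfers the zone from the first:

Code:

; in the example.com zone
@    IN    NS    ns1.example.com.
@    IN    NS    ns2.example.com.

; named.conf on ns2: pull the zone from ns1 (IP is a placeholder)
zone "example.com" {
    type slave;
    file "slave/example.com";
    masters { 192.168.0.4; };
};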

chort 02-04-2004 02:52 PM

Well, that method is essentially just a mirroring system. It's really not distributing the content, it's duplicating it.

If you wanted to be really crazy, you could have each system host certain sites, but NFS-mount all the other sites (the ones on the other hosts) through an IPSec tunnel to each of the other hosts. This would accomplish distributing the content, but I would think you're going to have a real problem with performance.
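
As a very rough sketch (host addresses and paths are made up, and the IPSec tunnel setup itself is omitted), each box would export its own sites and mount everyone else's:

Code:

# on host A (192.168.0.1), /etc/exports -- share its local sites with host B
/var/www/sites-a    192.168.0.2(ro,root_squash)

# on host B, mount host A's sites read-only over the tunnel
mount -t nfs -o ro,soft 192.168.0.1:/var/www/sites-a /var/www/sites-a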

If you wanted to go with the above example of mirroring the same content to multiple hosts and serving it via DNS round-robin, then you would just have to rsync each host periodically (like, every hour) to make sure they all have the same content. Of course, you would need to figure out how to handle situations like a file being deleted from one host: how do you tell whether it was really deleted on purpose? The other hosts will think that file is just missing, so they will try to replace it.
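
Something along these lines on each mirror would do it, say as an hourly cron job (the reference host and paths are placeholders); note that --delete is exactly where the "was it deleted on purpose?" question bites:

Code:

# /etc/crontab on each mirror: pull the reference copy once an hour
0 * * * *  root  rsync -az --delete -e ssh master.example.com:/var/www/sites/ /var/www/sites/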

So you might go with a central repository where all the edits happen, which gets rsync'd out to the serving hosts every hour or every few hours. Now you have a single point of failure (the repository) which can pretty much nuke your site (especially if it gets compromised, because the changes will replicate to the other hosts).

Another problem is how to let end users edit their files. If you have them log in to the repository, again, single point of failure; that's also letting people log in to what should be your most secure box--not a good idea. On the other hand, if you have users log in to the different remote hosts via DNS round-robin, you'll need to replicate all your user accounts to every host, along with their authentication credentials. Regular UNIX password files won't work very well for that, so now you're looking at Kerberos or RADIUS.

I could go round and round with circular reasoning forever, but the point is that web hosting is a whole lot more complicated than it seems.

pnh73 02-04-2004 03:14 PM

Quote:

I could go round and round with circular reasoning forever, but the point is that web hosting is a whole lot more complicated than it seems.
It seems that distributed web hosting comes down to striking a successful balance between redundancy, security and performance.

david_ross 02-04-2004 03:30 PM

Another thing that you may need to consider is the replication of sql databases. A forum for example would not work well in this setup without realtime updates which is likely to be a strain on bandwidth.

pnh73 02-04-2004 04:30 PM

Ah, crumbs. Forgot about those. Though we could have a separate DB server which gets backed up/mirrored etc. to another, to add redundancy.
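
As a rough sketch of what that mirroring might look like with MySQL master/slave replication (server names, addresses, user and password are all placeholders):

Code:

# my.cnf on the main DB server -- give it an id and turn on the binary log
[mysqld]
server-id = 1
log-bin   = mysql-bin

# my.cnf on the standby
[mysqld]
server-id = 2

# then, in the mysql client on the standby, point it at the main server:
CHANGE MASTER TO MASTER_HOST='192.168.0.10',
                 MASTER_USER='repl', MASTER_PASSWORD='secret';
START SLAVE;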

chort 02-04-2004 05:22 PM

Quote:

Originally posted by pnh73
It seems that distributed web hosting comes down to striking a successful balance between redundancy, security and performance.
Redundancy yes, performance maybe*, security no. Your entire system is only as strong as its weakest host, since each system has to trust the others for data replication (plus your authentication would need to be replicated, unless you want it to be a total nightmare). This means that a cracker only has to get lucky (or smart) with one host and they can own your entire distributed network.

*Performance is largely tied to how well you can distribute the network load, and DNS round-robin is notoriously bad at that. What happens if several clients simultaneously connect to a host that only has 128K/s up? What happens if one client has a really large download that takes, say, 30 minutes (meanwhile round-robin will keep throwing other connections at the same box)? DNS round-robin can't control the number of simultaneous connections, nor can it account for the varying bandwidth capacity of different hosts.

Really there is nothing new going on here; you're just looking at it from the distributed standpoint right away. Most hosting companies start with a single location and then have to figure out how to make it redundant later on. You're just skipping the first mistake. The rest of the picture is unchanged: you need a lot of complex equipment, intelligent load balancers, database replicators, virtual private networks, etc., etc. It's no different than, for instance, Rackspace.com, NTT/Verio web hosting, etc.

