Linux - Networking
This forum is for any issue related to networks or networking. Routing, network cards, OSI, etc. Anything is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I need to build a website that downloads a large quantity of information; the downloaded data will be saved across many servers, so I have to add a load balancing system.
How can I start?
What is the easiest software for doing something like this?
Could you clarify a bit what you mean by building a website to download a large quantity of information? Do you mean you are building a website that offers a lot of data that can be downloaded? Or is it the other way around, that is, your website will download data from other servers and store it on your servers?
Load balancing a website and its traffic isn't that difficult, but I'm not sure how the data could be handled.
I don't know if "load balancing" or "distributed data store" is the right term.
I have one web server and multiple servers to store information. I have to find some way to decide which data server to use whenever the website needs to download information from other servers.
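One simple way to decide which data server to use is to hash a key (for example the file name) and map it onto the list of servers. The sketch below is only an illustration: the server names and the pick_server function are hypothetical, and nobody in this thread has said this is how it must be done.

```python
import hashlib

# Hypothetical storage servers; replace with your real hosts.
DATA_SERVERS = ["data1.example.com", "data2.example.com", "data3.example.com"]


def pick_server(key, servers=DATA_SERVERS):
    """Map a key (e.g. a file name) to one server, always the same one."""
    # Hash the key so that keys spread roughly evenly over the servers.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes as an integer and reduce it to a list index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    print(pick_server("ubuntu.iso"))
```

The same key always lands on the same server, so the web server can find the data again later without a lookup table. Note that with plain modulo hashing, adding or removing a server changes where most keys map, so existing data would have to be moved.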
In my opinion, if you only have one web server that will do all the work, then you don't need load balancing. Load balancing is a way of distributing the workload between multiple servers that offer the same service, such as a web server, and that "hide" behind a virtual IP. For more information about load balancing you can check out Heartbeat (Google is your friend).
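To make the idea of distributing the workload concrete, here is a minimal round-robin sketch. It is a toy illustration only, not how Heartbeat or any real load balancer works internally, and the server names are made up.

```python
import itertools

# Hypothetical servers that all offer the same service.
WEB_SERVERS = ["web1", "web2", "web3"]


def round_robin(servers):
    """Return an endless iterator that hands out servers in turn."""
    return itertools.cycle(servers)


balancer = round_robin(WEB_SERVERS)
# Each incoming request is sent to the next server in the rotation.
first_four = [next(balancer) for _ in range(4)]
print(first_four)
```

After the last server in the list has been used, the rotation starts again from the first one.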
It seems that you only need the distributed data store to be able to save what's downloaded on multiple servers. I would check out the second link in my previous post (Sector/Sphere). I personally haven't worked with it but it looks like they have a lot of possibilities using scripts. That way you would be able to save data using scripts as filters.
You could create a cluster of all the nodes, and all the nodes in turn could be connected to a single SAN (or a NAS, if you prefer).
With a cluster, one of the nodes could be the "master node". This node would have channel bonding (preferably 2 bonds, active-backup/mode 1, for high availability). Should it go down for some reason, the clustered "download" service could resume from another node.
Downloads will continue uninterrupted and will go to the designated RAID/SAN/NAS, whatever you choose.
Well, a NAS is nothing but network storage, and the computer treats it as network storage. A SAN would be a better fit. Just for storing data there is no need for a cluster; a SAN could do the job. And if you are looking for data redundancy, you can get a SAN with RAID as well.
I would like to know why you are looking at clustering just for data storage and downloads.
Asimba: how can I use multiple servers to store X quantity of information with a SAN or a NAS?
Why is a SAN or a NAS a better choice than a distributed data store?
For the X quantity of information, you could use a daemon or a cron job to monitor it (the TX/RX data reported by ifconfig).
ifconfig will return the TX/RX numbers; how you interpret them is something you need to work out.
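As a rough sketch of that kind of monitoring, the script below reads the RX/TX byte counters per interface. Note the assumption: instead of parsing ifconfig output, it reads /proc/net/dev, the Linux kernel file that exposes these per-interface counters. The function names and the sample text are made up for illustration.

```python
# Sample text in the /proc/net/dev layout (two header lines, then one
# line per interface with 8 receive fields followed by 8 transmit fields).
SAMPLE = """Inter-|   Receive                          |  Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
    lo: 500 5 0 0 0 0 0 0 500 5 0 0 0 0 0 0
  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0
"""


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 9:
            continue
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_counters(path="/proc/net/dev"):
    """Read the live counters on a Linux machine."""
    with open(path) as f:
        return parse_net_dev(f.read())


if __name__ == "__main__":
    print(parse_net_dev(SAMPLE))
```

A cron job could call read_counters() at intervals and compare the values with the previous run to see how much data each node has transferred. The counters are cumulative since the interface came up, so it is the difference between two readings that matters.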
Secondly, unified shared storage is better in terms of maintenance: since you might want to install RAID (whatever kind you choose), you won't have to go to the individual nodes and take care of RAID, I/O scheduling, multi blah blah.
And considering that there is specialized storage hardware such as the HP EVA (Enterprise Virtual Array), you might be better off simply plugging in drives as and when needed.
But again, that's lazy me; you might have a different opinion.
NAS: network-attached storage. It's like setting up a file server and mounting the shares on the client. It's not virtual, and the client knows that it is using shared/networked storage.
SAN: storage area network. Clients treat it as local storage even though it is on the network.
SANs can be expensive and require dedicated hardware. On the other hand, you can build a NAS by installing a software package such as FreeNAS on your own hardware, which is a cheaper option for network storage.