All the information? There's quite a lot of it out there, much more than can be discussed in a single forum thread. People have written entire PhD dissertations about this sort of thing. In fact, there's a free online eBook written by a physicist at Duke describing how to build a Beowulf compute cluster. I'd suggest giving it a perusal.
A couple of questions you need to answer:
- Do all the computers processing data work independently, or do they need to communicate frequently? If the former, do you have software to parcel out chunks of work to each machine? If the latter, is your code parallelized (e.g. using MPI) so it can run on a distributed-memory cluster?
- Depending on the size of your data set and communications pattern, it's likely the networking between the machines will be the bottleneck. Have you taken this into account?
- Do you have the data analysis software readily at hand, or do you need to write it?
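To make the first two questions concrete, here's a rough sketch in Python, assuming the independent-work case. The host names, chunk file names, and the 80% link efficiency are all made up for illustration; the transfer estimate ignores latency and contention from other nodes, so treat it as a best case.

```python
from typing import Dict, List, Sequence


def parcel_out(chunks: Sequence[str], hosts: Sequence[str]) -> Dict[str, List[str]]:
    """Assign chunks of work to hosts round-robin (independent-work case)."""
    if not hosts:
        raise ValueError("need at least one host")
    plan: Dict[str, List[str]] = {host: [] for host in hosts}
    for i, chunk in enumerate(chunks):
        plan[hosts[i % len(hosts)]].append(chunk)
    return plan


def transfer_seconds(data_bytes: float, link_bits_per_second: float,
                     efficiency: float = 0.8) -> float:
    """Best-case time to push data over one link.

    `efficiency` is an assumed fudge factor for protocol overhead,
    not a measured value.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    hosts = ["node01", "node02", "node03"]             # hypothetical host names
    chunks = [f"chunk_{i:03d}.dat" for i in range(8)]  # hypothetical input files
    for host, work in parcel_out(chunks, hosts).items():
        print(host, work)

    # 100 GB over a single gigabit Ethernet link
    minutes = transfer_seconds(100e9, 1e9) / 60
    print(f"about {minutes:.1f} minutes just to move the data")
```

If the transfer time comes out comparable to the compute time per chunk, the network (or the single NFS server behind it) is probably your bottleneck.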
I've built several Beowulf clusters over the past ten-odd years, and here are another few things to keep in mind:
- Laptops (depending on the model) are not really designed and engineered to run flat-out 24/7/365. Your rate of hardware failure (especially fans and hard drives) might wind up being pretty high. Are you prepared for this?
- At the bare minimum, you'll need a master node that exports its hard drive via NFS for common data sharing amongst cluster nodes. It's also useful to have common user accounts; I suggest deploying LDAP (a pain to set up, but way more secure than NIS if you do it right and use SSL). Kerberos can also integrate well with LDAP, but setting it up is quite complex. You'll definitely want to set up SSH keys so users can ssh between machines without typing a password.
- If you have high I/O requirements, you may even need to set up multiple NFS servers or deploy a parallel file system like Lustre, Gluster, or Ceph.
- You might need a batch scheduler. TORQUE, Open Grid Engine, Slurm, and LSF/OpenLava are popular choices. Depending on how complex your scheduling will be, you may need an additional scheduler on top, e.g. Maui.
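To give a feel for what working with a batch scheduler looks like, here's a small Python sketch that writes out a Slurm job script. I'm picking Slurm purely as an example; the other schedulers use their own directive syntax. The job name, node counts, and the `./analyze` command are hypothetical placeholders, not anything from your setup.

```python
def render_slurm_script(job_name: str, nodes: int, tasks_per_node: int,
                        walltime: str, command: str) -> str:
    """Build the text of a minimal Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",            # HH:MM:SS wall-clock limit
        f"#SBATCH --output={job_name}-%j.out",   # %j expands to the job ID
        "",
        f"srun {command}",                       # launch the tasks on the allocation
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "./analyze input.dat" is a made-up command for illustration.
    script = render_slurm_script("analysis", nodes=4, tasks_per_node=2,
                                 walltime="02:00:00",
                                 command="./analyze input.dat")
    print(script)
```

You'd save the output to a file and submit it with `sbatch`; the scheduler then decides when and on which nodes the job runs.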
Without more detail about how your software works and various other requirements, it's really tough to give good answers to your questions. However, there are some basic design considerations that you need to keep in mind.