I have a small Linux cluster of 8 computers (1 master node, 7 slave nodes). The master node has 3 Ethernet cards: one connects directly to the outside world, bypassing the router; one connects to the LAN; and the last connects to an 8-port switch shared with the 7 slave nodes.
Everything was working fine until last weekend, when I changed the outside-world connection from DHCP to a static address so that I could bypass the router and not have to worry about port forwarding. That change itself worked, and it is how the cluster was set up a couple of months ago, before it was moved across the building and I switched it to DHCP. A couple of hours later, though, I noticed that the slave nodes could no longer contact the master node. I'm not sure whether this is related to the change from DHCP to static or whether someone else SSH'd into the cluster and changed settings (there is one other person I know of who could have done this).
I've been racking my brain for the past few days trying to figure out what happened, so any help would be very much appreciated.
Here are my symptoms:
The master node can ping itself, but none of the slave nodes can ping it. The master node still has internet access and can reach the LAN, so the other 2 Ethernet cards seem to be working fine. The link lights on the card connected to the cluster switch blink, and they turn off when I unplug the cable from the switch, which suggests that the card and cable are at least physically working.
Here's the kicker: all 7 slave nodes can ping each other, but none of them can ping the master node. I've tried changing which ports are used on the switch, but regardless of which cable is plugged into which port, the symptoms never change.
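For anyone who wants to repeat the reachability test from any node, the ping checks described above can be scripted roughly as follows. This is only a minimal sketch, not something installed on the cluster: the function names `ping` and `check_hosts` are hypothetical, and it assumes the standard Linux `ping` with its `-c` (count) and `-W` (timeout in seconds) options.

```python
import subprocess


def ping(host, count=1, timeout_s=2):
    """Return True if `host` answers an ICMP echo request.

    Assumes the Linux iputils `ping` (-c = packet count, -W = reply timeout).
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_hosts(hosts, ping_fn=ping):
    """Map each host to True (reachable) or False (no reply).

    `ping_fn` is injectable so the logic can be exercised without a network.
    """
    return {host: ping_fn(host) for host in hosts}


def report(results):
    """Format the results as one human-readable line per host."""
    return [
        f"{host}: {'reachable' if ok else 'NO REPLY'}"
        for host, ok in results.items()
    ]
```

Calling `report(check_hosts([...]))` on a slave node, with the real host names or addresses of the master and the other slaves filled in, gives one line per host, which makes it easy to compare results from different nodes after each configuration change.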
I tried switching the outside connection back to DHCP and it still didn't work, which suggests that some other setting was changed. Also, when I switched from DHCP to static, the shell prompt changed from
"airlinux:~/Desktop" to "master:~/Desktop", and it changed back when I reverted to DHCP.
Anyway, I'm not the person who set this thing up, but somehow I'm in charge of making sure it keeps working (lucky me), so I was hoping someone could give me a little more direction on what to check or how to debug this.