LinuxQuestions.org


tuanle 12-13-2014 03:36 AM

How to connect to a newly installed node via PXE
 
Hi everybody,
This is my first Rocks installation. My small test cluster consists of a frontend (IBM x3550) and two compute nodes (x3650), connected by Cat5 cable with A-B RJ45 connection through an unmanaged switch.
I have two questions:
1. Using the "insert-ethers" command I installed two nodes via PXE from scratch. The installation completed successfully and the nodes rebooted. But after the reboot, the DHCP boot could not start the Linux OS and died, leaving the message "Booting from local disk ...". I could not connect to the nodes, and of course ssh to compute-0-X was impossible. Are my nodes actually installed or not?
After installing the nodes, I changed the boot order for the next reboot from "CD-Network-Harddisk0-..." to "CD-Harddisk0-Network-..."; now the nodes are visible and commands such as ssh and rocks sync users work.
2. When I ran mpirun (OpenMPI and other software were installed on the frontend before the compute nodes were installed), for example "mpirun --hostfile myhostfile -np X abc" (X = number of cores, abc = executable), I got an error: the nodes' /usr (and the OS in general) is only a bare installation, without any of the software, such as the Intel compilers and OpenMPI, that I have on the frontend.

Many thanks for any explanation or suggestion.

Le Tuan,
Hanoi Univ. of Sci. and Technol.

unSpawn 12-22-2014 05:28 PM

Quote:

Originally Posted by tuanle (Post 5283957)
(..) But after the reboot, the DHCP boot could not start the Linux OS and died,

Any particular error message?


Quote:

Originally Posted by tuanle (Post 5283957)
(..) After installing the nodes, I changed the boot order for the next reboot from "CD-Network-Harddisk0-..." to "CD-Harddisk0-Network-..."; now the nodes are visible and commands such as ssh and rocks sync users work.

IMHO the "right" boot order would have been Network-Harddisk-CD while you use PXE to install the compute node, and then Harddisk-Network-CD once the OS and software are installed.


Quote:

Originally Posted by tuanle (Post 5283957)
When I ran mpirun (OpenMPI and other software were installed on the frontend before the compute nodes were installed) (..) I got an error: the nodes' /usr (and the OS in general) is only a bare installation, without any of the software, such as the Intel compilers and OpenMPI, that I have on the frontend.

Install the HPC roll on the compute nodes manually and try again?

tuanle 12-22-2014 09:17 PM

Dear unSpawn
Thank you for your reply.
1. Only "Booting from local disk ...". After about ten DHCP attempts, the only error message is about checking the media, cable, etc.
2. I mean that the order was initially "CD-Network-Harddisk0..", as the Rocks User Manual says, and after installing via PXE (sometimes I installed from the DVD instead, with the same result) I have to change the BIOS boot order to "CD-Harddisk0-Network-..." for the compute nodes to start.
3. The same thing happens if I disconnect the compute nodes from the network. The compute nodes get installed from the DVD if the DVD is in the drive.

About my second question: I was told that I must install application software into the frontend's /export/share/apps. But when I copied the tar.gz installation files to the frontend's /export/share/apps, I could see them only in the frontend's /share/apps, and nothing at all in the compute nodes' /share (to check, I ran ssh compute-0-X; ls -l /share).

unSpawn 12-23-2014 04:08 AM

Quote:

Originally Posted by tuanle (Post 5289323)
1. Only "Booting from local disk ...". After about ten DHCP attempts, the only error message is about checking the media, cable, etc.

Next time please also check the frontend DHCP daemon logs?
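
A minimal sketch of one way to look at those logs, assuming the frontend's dhcpd logs through syslog to /var/log/messages (a common default on the CentOS base that Rocks uses; the path is an assumption, so adjust it if syslog is configured differently). The helper name dhcp_log is made up for illustration and is not a Rocks command.

```shell
# Hypothetical helper: print the most recent dhcpd lines from a syslog file.
# Assumption: dhcpd messages end up in /var/log/messages on the frontend.
dhcp_log() {
    logfile="${1:-/var/log/messages}"   # log file to search
    count="${2:-20}"                    # how many recent lines to show
    grep -i 'dhcpd' "$logfile" | tail -n "$count"
}

# Show recent DHCP activity (run as root on the frontend):
#   dhcp_log
# Follow the log live while a compute node PXE-boots:
#   tail -f /var/log/messages | grep --line-buffered -i dhcpd
```

While a node boots, a DHCPDISCOVER from the node's MAC address followed by DHCPOFFER, DHCPREQUEST and DHCPACK would suggest the DHCP exchange itself works; a DISCOVER with no OFFER points at the frontend side.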


Quote:

Originally Posted by tuanle (Post 5289323)
2. I mean that the order was initially "CD-Network-Harddisk0..", as the Rocks User Manual says, and after installing via PXE (sometimes I installed from the DVD instead, with the same result) I have to change the BIOS boot order to "CD-Harddisk0-Network-..." for the compute nodes to start.

Well, if that works then at least you have got a workaround :-]


Quote:

Originally Posted by tuanle (Post 5289323)
About my second question: I was told that I must install application software into the frontend's /export/share/apps. But when I copied the tar.gz installation files to the frontend's /export/share/apps, I could see them only in the frontend's /share/apps, and nothing at all in the compute nodes' /share (to check, I ran ssh compute-0-X; ls -l /share).

Hmm. No idea how to diagnose or fix that but you could push the HPC roll to each node individually for the time being, yes?
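
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: if /share/apps on the compute nodes is mounted on demand by an automounter, "ls -l /share" alone can look empty until the full path is accessed. A small sketch that lists /share/apps itself on each node; the helper name check_share_apps is hypothetical, and the node names in the usage line are only examples in the compute-0-X pattern from this thread.

```shell
# Hypothetical helper, not a Rocks command: list /share/apps on each node.
check_share_apps() {
    for node in "$@"; do
        echo "== $node"
        # Listing the full path gives an automounter the chance to mount it.
        ssh "$node" 'ls -l /share/apps/' || echo "could not list /share/apps on $node"
    done
}

# Usage (from the frontend):
#   check_share_apps compute-0-0 compute-0-1
```

If the files show up this way, jobs should refer to them by the /share/apps/... path rather than by the frontend's /export/... path.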

tuanle 01-12-2015 04:01 AM

Dear unSpawn,
For now I just live with changing the BIOS boot order to "Harddisk 0" first after installing the x3650 nodes via PXE. I have only a few nodes, so it is not difficult at all. Anyway, please tell me: how can I follow the frontend DHCP log?
As for the last question I posted, I realized that I have to give a path that the compute nodes can find. So instead of "/export/apps/Application_paths..." I used "/share/apps/Application_path...". It was down to my inexperience with Rocks.
But another difficulty has appeared: with only a 10/100 Mbps switch interconnecting the cluster members' "eth0", a calculation takes 2 to 3 times longer than on a single separate server (same hardware as the compute nodes)! So I cannot use CPU cores from more than one cluster member and still get a "normal" rate. Would using a second, Gigabit switch to connect the compute nodes' "eth1", as in an InfiniBand scheme, improve the situation significantly? And if it does, how should I connect the frontend to it as well?
Thanks for any suggestion.
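
A rough back-of-envelope sketch of why the switch speed can matter here, assuming the job spends most of its extra time moving data between nodes and ignoring latency and protocol overhead. The 100 MB volume is an invented figure for illustration, and transfer_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: ideal wire time in seconds for a given data volume.
#   $1 = data volume in megabytes (1 MB = 8 Mbit)
#   $2 = link speed in Mbit/s
transfer_seconds() {
    awk -v mb="$1" -v mbit="$2" 'BEGIN { printf "%.2f\n", mb * 8 / mbit }'
}

transfer_seconds 100 100     # Fast Ethernet (100 Mbit/s): prints 8.00
transfer_seconds 100 1000    # Gigabit Ethernet:           prints 0.80
```

The tenfold difference is only the ideal case on the wire. How much of it shows up in the run time depends on how often the MPI processes exchange data, so measuring the same job on one node and then on two nodes is the more reliable check.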

