LinuxQuestions.org


sundialsvcs 09-01-2005 10:05 AM

Tip: Isolating a Linux problem
 
Unlike Microsoft Windows, Linux is a very de-centralized system. This is one of the things that makes it robust, but it also makes it tricky to isolate where the root-cause of any problem may actually lie. Let me suggest a step-by-step approach:

If you get to a LILO or Grub prompt of any kind, then you can be sure that the basic hardware is functional and that at least the boot device has been found. If you don't get that far, then either the correct boot device has not been specified, or the master boot record (MBR) image on that device is wrong, or perhaps the disk partition table is wrong. (Windows systems usually expect the LBA partition-flag to be set.) LILO can also have problems if you make any change to the boot area (a new kernel, say) without re-running the lilo command, because LILO records where those files physically sit on the disk at the time it is installed. That, of course, is why I use Grub.

Dual-boot systems really aren't such a great idea, especially if you are learning, because each OS tends to assume that it's alone in the world and may unintentionally interfere with the other. Also, it's much harder to diagnose two very complex systems that are trying to live together in the same house. Drag that old machine out of the closet... it still works fine, you know.

If Linux starts to come up, then dies with a "kernel panic" almost immediately, the most likely culprit is that it can't find the main filesystem partition... usually specified with the root= parameter on the kernel command line in your boot-loader configuration. A more complex hardware issue might be present, but that is fairly unlikely.
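
When checking what root= actually says, it can help to pull the command line apart mechanically. What follows is only a minimal Python sketch: the function names kernel_params and describe_root are made up for illustration, and it assumes a simple space-separated command line with no quoted values.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a dict.

    Words like "ro" or "single" map to True; "key=value" words map to
    the value. Only the first "=" splits, so "root=LABEL=/" keeps
    "LABEL=/" as its value.
    """
    params = {}
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        params[key] = value if sep else True
    return params


def describe_root(cmdline):
    """Say how the root filesystem is being identified, if at all."""
    root = kernel_params(cmdline).get("root")
    if root is None or root is True:
        return "no root= given"
    if root.startswith("LABEL="):
        return "by label: " + root[len("LABEL="):]
    return "by device: " + root


if __name__ == "__main__":
    # On a system that did boot, /proc/cmdline holds the line that was used.
    with open("/proc/cmdline") as f:
        print(describe_root(f.read()))
```

On a system that won't boot you would read the same line out of the boot-loader configuration instead and pass it in as a string.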

Notice the difference in how the grub boot-loader numbers devices, vs. how Linux does the same. Notice the difference between the /boot partition-location and the root "/" partition. If you're trying to use root=LABEL=/ then "good luck." Subtleties matter at this point, 'cuz hardware's kinda stoopid sometimes.
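
The numbering difference is easy to get wrong by one. Here is a small Python sketch of the usual translation, under some loud assumptions: it covers GRUB legacy names only (where both disks and partitions count from zero), it assumes the BIOS disk order matches the order in which Linux names the disks (which is not guaranteed), and the function name grub_legacy_to_linux is invented for this example.

```python
import re
import string


def grub_legacy_to_linux(grub_name, disk_prefix="hd"):
    """Translate a GRUB legacy name like "(hd0,0)" to a Linux device name.

    GRUB legacy counts disks and partitions from 0. Linux letters the
    disks (a, b, c, ...) and counts partitions from 1.
    disk_prefix is "hd" for IDE disks or "sd" for SCSI/SATA disks.
    """
    m = re.fullmatch(r"\(hd(\d+),(\d+)\)", grub_name)
    if not m:
        raise ValueError("not a GRUB legacy partition name: %r" % grub_name)
    disk, part = int(m.group(1)), int(m.group(2))
    if disk >= len(string.ascii_lowercase):
        raise ValueError("disk number too large for this sketch")
    letter = string.ascii_lowercase[disk]
    return "/dev/%s%s%d" % (disk_prefix, letter, part + 1)


if __name__ == "__main__":
    print(grub_legacy_to_linux("(hd0,0)"))        # first disk, first partition
    print(grub_legacy_to_linux("(hd0,1)", "sd"))  # same idea on a SCSI/SATA disk
```

Treat the result as a first guess to check against your partition table, not as gospel.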

The next thing that may happen is the so-called initrd, or "initial RAM-disk," a slightly funky stage that the computer goes through when Linux is "halfway up" (so to speak), so that it can configure itself the rest of the way by loading critical device drivers and so on. Major problems usually don't surface there.

The next thing that happens will be the startup sequence, which scrolls by very fast. This is controlled by a script such as /etc/rc.d/rc.sysinit and the scripts in /etc/rc.d/rc5.d/ (for example). One way to diagnose a problem here is with a kernel command-line word such as single (go to single-user mode), or confirm, which should cause the system to walk through the startup sequence one step at a time. Now you are trying to isolate the problem, because it's going to be found in a particular subsystem.
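
If you want to know which step comes before or after the one that hangs, the order is readable from the names in a directory like /etc/rc.d/rc5.d/. The Python sketch below assumes the traditional SysV-init naming convention (K or S, two digits, then a name; K entries are stopped first, then S entries are started, each group in sorted order). The function name rc_order and the entry names in the example are made up.

```python
import re

# K = kill (stop) script, S = start script, followed by a two-digit priority.
RC_NAME = re.compile(r"^([KS])(\d\d)(.+)$")


def rc_order(names):
    """Return (entry, action) pairs in the order a SysV-style rc script
    would run them: all K entries with "stop", then all S entries with
    "start". Names that don't fit the pattern are ignored."""
    kills, starts = [], []
    for name in sorted(names):
        m = RC_NAME.match(name)
        if not m:
            continue
        if m.group(1) == "K":
            kills.append((name, "stop"))
        else:
            starts.append((name, "start"))
    return kills + starts


if __name__ == "__main__":
    import os
    # Assumed path; your distribution may keep runlevel directories elsewhere.
    for entry, action in rc_order(os.listdir("/etc/rc.d/rc5.d")):
        print(action, entry)
```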

The last thing that happens, usually, is to start the graphic display. This is the X Window System, commonly called "X-Windows" or just X; the server itself is usually Xorg (or XFree86 on older systems). If you get to a login: prompt but don't see anything graphical going on, the problem will be somewhere in X. (And the resemblance between "X-Windows" and "Microsoft Windows" is purely coincidental.)

One useful command is dmesg, which replays the kernel's startup messages. Other very useful places to look are the various files in /var/log. Whenever any subsystem pukes ... ;) ... something will be said about it in some log somewhere.
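
Those logs get long, so a crude first pass is to pull out only the lines that look like complaints. This is a rough Python sketch, not a real log analyzer: the keyword list is simply a guess at words that tend to show up when something goes wrong, the function name suspicious_lines is invented, and the sample lines in the example are made up rather than copied from any real log.

```python
import re

# A guessed-at keyword list; adjust it to whatever your problem looks like.
SUSPICIOUS = re.compile(
    r"\b(error|fail(ed|ure)?|panic|oops|unable|timeout)\b", re.IGNORECASE
)


def suspicious_lines(lines):
    """Return (line_number, text) for every line containing a keyword.
    Line numbers start at 1 so they match what an editor shows."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if SUSPICIOUS.search(line)
    ]


if __name__ == "__main__":
    # Made-up sample text, purely for illustration.
    sample = [
        "subsystem A: started\n",
        "subsystem B: failed to start\n",
        "subsystem C: started\n",
    ]
    for number, text in suspicious_lines(sample):
        print(number, text)
```

To use it for real, save the output of dmesg to a file (or pick a file in /var/log that you can read) and pass the open file to suspicious_lines.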

And finally, man, for "manual." As in documentation. It's not great, it's not all complete, but there's a helluva lot of it.

Most of the Linux system is composed of loosely-coupled subsystems, and the first step in tackling a problem in a running system is to decide which one it is. For instance, look at a list of all of the running processes (ps -e, or ps aux; plain ps -a leaves out anything that isn't attached to a terminal) and type man followed by each of their names. Or use locate to see what files are out there with a name like that. Browse them... read-only, of course. Try the command lsmod, which lists the "kernel modules" that you're using, and see what you can find about each of those.
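
The list that lsmod prints comes from /proc/modules, and reading that file yourself shows which modules lean on which. A minimal Python sketch follows, assuming the usual layout of that file (name, size, use count, then a comma-separated "used by" list or a single dash). The function name parse_proc_modules is invented, and the sample text in the test of this idea should be your own system's file, not anything shown here.

```python
def parse_proc_modules(text):
    """Parse text in /proc/modules layout into a dict keyed by module name.

    Each line is expected to start with: name size refcount used_by
    where used_by is "-" or a list like "ext3,usbcore," (note the
    trailing comma). Lines with fewer than four fields are skipped.
    """
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, size, refcount, used_by = fields[:4]
        if used_by == "-":
            users = []
        else:
            users = [u for u in used_by.split(",") if u]
        modules[name] = {
            "size": int(size),
            "refcount": int(refcount),
            "used_by": users,
        }
    return modules


if __name__ == "__main__":
    with open("/proc/modules") as f:
        for name, info in sorted(parse_proc_modules(f.read()).items()):
            print(name, "used by:", ", ".join(info["used_by"]) or "(nothing)")
```

Each name it prints is a candidate for man, locate, or a search on this site.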

A diary will help. Don't try to rely on your memory all the time because sooner or later you find yourself repeating the old cartoon: "Mrs. Johnson? May I leave the room? My brain is full..." Start by describing the problem, in some detail, to yourself so that you can "step back and look at it" and ponder what it might be, writing it down even if you think you already know. Give yourself a little time to think. I find that when I can state the nature of the problem fully and in-writing, I am nine-tenths of the way toward solving it. Be systematic; methodical. Hey, it works for doctors.

A very useful resource is the search feature of this web site in particular: linuxquestions.org. No matter what's happening to you now ... :cry: ... there's an excellent chance that it happened to lots of other folks too ... :cry: :eek: :mad: :confused: ... and they (or somebody) already found the answer ... :cool: ... and wrote it down where you can ... :study: ... find it!

bushidozen 09-01-2005 10:17 AM

Great post. Hopefully many people will find it very helpful.


All times are GMT -5.