LinuxQuestions.org

StupidNewbie 10-27-2009 10:26 AM

What to do when a Linux server is "slow"
 
Hi all:

I am a network admin that's pretty new to Linux. I have a couple SLES10 boxes running and occasionally our programmers will complain that they are "slow."

When someone complains that a server is running slowly but it doesn't seem like a network issue (i.e. ping times are fine, no one else is complaining of slow network response), what should I be looking for on a Linux server? For example, on a Windows workstation you might try cleaning out temporary internet files, looking for a hung process, etc.

Is there equivalent troubleshooting I can do for a Linux server to see if something may be using abnormal amounts of processing power or memory, if there's a runaway or hung process, a directory full that could slow down the system, etc?

I'm not looking for specific instructions, just general troubleshooting tips to gather information about why a server might be responding slowly.

Thanks!

TB0ne 10-27-2009 11:10 AM

Sure. The top, iostat, and sar programs are all good ones. Top can show you what's eating your CPU, memory load, etc. iostat and sar can look at disk IO and other performance metrics.

And since you're dealing with programmers, make sure you ask THEM questions, too. Are they compiling a big wad of code in one window, and are AMAZED that the system is under load in their other terminal? :) Database related stuff going on?? Could be beating the disk up, if you're on a development server, or if the database is on one disk, and the process is getting IO bound.

Check out Nagios and other monitoring tools like that. Gives you handy graphs of CPU, memory, network performance, etc., and is a great thing to have if you're monitoring multiple servers.

StupidNewbie 10-27-2009 11:40 AM

Excellent response. Thanks for the tips. I will check out some of those programs and commands right now. Strangely, one of the programmers sent me an email telling me I should check out Nagios for network monitoring...haha

Other replies still welcome! Thanks!

TB0ne 10-27-2009 12:09 PM

You're welcome...and if you have a particular 'problem' user, one of my best tools for that is a Wiffle-bat. Sure, you've got to walk down to their desk, but results are immediate. :)

pcunix 10-27-2009 01:23 PM

You probably won't need this, but those with less experience may find this http://aplawrence.com/Unixart/slow.html helpful.

onebuck 10-27-2009 04:03 PM

Hi,

Quote:

Originally Posted by TB0ne (Post 3734283)
You're welcome...and if you have a particular 'problem' user, one of my best tools for that is a Wiffle-bat. Sure, you've got to walk down to their desk, but results are immediate. :)

I'm not that nice! I use their code reference book (not the dummies book, too light). :)

salasi 10-28-2009 01:06 PM

Quote:

Originally Posted by StupidNewbie (Post 3734178)
...what should I be looking for on a Linux server? For example, on a Windows workstation you might try cleaning out temporary internet files, looking for a hung process, etc.

Is there equivalent troubleshooting I can do for a Linux server to see if something may be using abnormal amounts of processing power or memory, if there's a runaway or hung process, a directory full that could slow down the system, etc?

I'm not looking for specific instructions, just general troubleshooting tips to gather information about why a server might be responding slowly.

Well, the irritatingly general response to this is:
  • find out what's wrong
  • put it right
Note that this is in slight contrast to the approach that you outline for Windows, where you are 'trying the things that usually work'. So, with Windows, you aren't actually troubleshooting in a logical way, just throwing some pre-canned solutions at the wall and seeing if something works. Hey, you can always tell them to reboot, and if that didn't work, tell them they did it wrong.

TB0ne has already outlined some tools for the 'find out what's wrong' phase. In addition, I like, or liked, ksysguard. It is not as detailed a tool as some of those that TB0ne mentions, but it can be configured to look at several different aspects of performance slowdowns. It is, however, a bit heavy in memory footprint, and if not having enough memory is your problem or part of your problem, then that would be exactly what you don't want. (And the KDE 4 version of ksysguard is a bit shaky, so far.) It does have the advantage that it can be used remotely, if you can figure out how to configure it to give you what you want.

The particular things that you want to look for are (to start with):
  • does any program look to be directly using excessive amounts of CPU?
  • is I/O (usually disk, but maybe network) causing lots of CPU to disappear down a black hole?
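
One rough way to tell those two cases apart is to look at how CPU time is split between real work and waiting on I/O. The following is a minimal Python sketch, not the method of any particular tool. It assumes a Linux box where the first line of /proc/stat is the aggregate "cpu" line; the function names (parse_cpu_line, cpu_percentages, sample_live) are made up for illustration.

```python
# Sketch only: compare two snapshots of the aggregate "cpu" line in
# /proc/stat to see how CPU time was split over the interval.
import time

# Order of the first eight counters on the "cpu" line (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn 'cpu  100 0 50 800 50 ...' into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # zip() stops at the shorter input, so extra or missing columns are tolerated.
    return dict(zip(FIELDS, (int(p) for p in parts[1:])))


def cpu_percentages(before, after):
    """Share of elapsed CPU time per category between two snapshots, in percent."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total <= 0:
        return {name: 0.0 for name in delta}
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


def sample_live(interval=5, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (the interval is an arbitrary choice)."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return cpu_percentages(first, second)
```

Calling sample_live() on the server returns a dict of percentages. A large "user" plus "system" share points towards a CPU-hungry program, while a large "iowait" share with little "user" time suggests processes are mostly sitting waiting for I/O.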

One of the difficulties with the tools that TB0ne suggested is that it is easy to misinterpret the output. It's a complex subject, and while that applies to an extent with any tool, some of the output probably isn't at all obvious at first. You need to read the man page more than once, or a tutorial on the tool in question, if you are not to make mistakes with them.

One thing to watch out for: if you see lots of disk activity, that doesn't necessarily mean that there is a disk problem. It may mean, e.g., that you don't have enough memory (or that you do have memory hogs) and that is causing excessive swapping.
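
To check whether disk activity is really swapping, one option is to watch the kernel's swap counters. This is a minimal Python sketch under the assumption that /proc/vmstat exists and contains the pswpin and pswpout lines (pages swapped in and out since boot); the function names are invented for this example.

```python
# Sketch only: measure swap-in/swap-out rates from two /proc/vmstat snapshots.
import time


def parse_swap_counters(text):
    """Extract the pswpin/pswpout counters (pages since boot) from /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two snapshots."""
    return {
        name: (after[name] - before[name]) / float(seconds)
        for name in ("pswpin", "pswpout")
    }


def sample_live(interval=5, path="/proc/vmstat"):
    """Read the counters twice, `interval` seconds apart (an arbitrary choice)."""
    with open(path) as f:
        first = parse_swap_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_swap_counters(f.read())
    return swap_rate(first, second, interval)
```

If sample_live() keeps returning rates well above zero while the disks are busy, memory pressure is a more likely culprit than the disks themselves; rates at or near zero point back towards ordinary file I/O.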

StupidNewbie 10-29-2009 08:57 AM

Quote:

Originally Posted by salasi (Post 3735588)
The particular things that you want to look for are (to start with):
  • does any program look to be directly using excessive amounts of CPU?
  • is I/O (usually disk, but maybe network) causing lots of CPU to disappear down a black hole?

This is good. Will the tools tb0ne mentioned point me in the right direction for finding these kinds of things? The problem is not that I didn't know that, it's that I don't know how to tell if one of those items is true in Linux.

I'm assuming that the tools tb0ne mentioned will help with that, but I haven't had a chance to look at the man pages for them. Also, I don't have a benchmark for what is "normal" so I wouldn't really know what to look for even if I did have the right tools.

Don't get me wrong though. All of this is GREAT information and I appreciate it! I'll continue researching on my own too :)

TB0ne 10-29-2009 10:26 AM

Quote:

Originally Posted by StupidNewbie (Post 3736640)
This is good. Will the tools tb0ne mentioned point me in the right direction for finding these kinds of things? The problem is not that I didn't know that, it's that I don't know how to tell if one of those items is true in Linux.

I think so, but I'm sure you'll discover other tools that will help you as well. Each environment is different, so finding what's 'best' is a matter of trial and error.

Quote:

I'm assuming that the tools tb0ne mentioned will help with that, but I haven't had a chance to look at the man pages for them. Also, I don't have a benchmark for what is "normal" so I wouldn't really know what to look for even if I did have the right tools.

Normal is relative. :) Best thing I can suggest is to run the tools at different times during the day, each day, for about two weeks or so, and get an average of what's going on. Keep that as your 'normal' statistic, and go from there. While you're doing this, ask your users if they're having performance issues at any particular times/days, so you can then track down what's going on. Perhaps there's a job running that can be moved to off-time, so it doesn't impact the users, or maybe you're running out of resources...knowing WHEN to look is a great first step. Nagios, Big Brother, and other tools that do this monitoring on a 24/7 basis, and give you nice web pages and graphs, are a tremendous asset. You'll have to set them up, but it's well worth the investment.
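
The two-week baseline idea above can be sketched in a few lines of Python. This is only an illustration: it uses the load average as the single statistic, groups readings by hour of day, and treats "twice the usual value" as unusual. All three choices, and the function names, are assumptions rather than anything prescribed here.

```python
# Sketch only: build a per-hour "normal" from collected samples and
# compare a new reading against it.
import os
import time
from collections import defaultdict


def take_sample():
    """Return (hour_of_day, 1-minute load average) for right now (Unix only)."""
    return time.localtime().tm_hour, os.getloadavg()[0]


def hourly_baseline(samples):
    """samples: iterable of (hour_of_day, load) pairs. Returns {hour: mean load}."""
    buckets = defaultdict(list)
    for hour, load in samples:
        buckets[hour].append(load)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}


def is_unusual(hour, load, baseline, factor=2.0):
    """True if `load` exceeds the usual value for that hour by `factor` (arbitrary)."""
    normal = baseline.get(hour)
    if normal is None:
        return False  # no history for this hour, so nothing to compare against
    return load > normal * factor
```

In practice take_sample() would be called on a schedule (for example from cron) and the results appended to a file; after a couple of weeks, hourly_baseline() over that file gives the 'normal' figure to compare complaints against.
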
Quote:

Don't get me wrong though. All of this is GREAT information and I appreciate it! I'll continue researching on my own too :)

That's the only real way to get any good knowledge. :)

chrism01 10-29-2009 08:38 PM

All good advice above. 'Normal' is unique to each box, but, loosely speaking, if the load average is consistently above 5, it's struggling a bit; consistently above 10 is definitely a problem. If swap usage is consistently more than 25% of designated swap, you've got issues.
Note that Linux will run/walk with these loads or greater, but it's not how you'd like to see it.
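
Those rules of thumb are easy to turn into a quick check. The Python sketch below is only an illustration: it reads the standard /proc/loadavg and /proc/meminfo formats, uses the 15-minute load average as a stand-in for "consistently" (an assumption), and the function names are made up.

```python
# Sketch only: apply the load-average and swap rules of thumb from this post.


def parse_loadavg(text):
    """The first three fields of /proc/loadavg are the 1, 5 and 15 minute averages."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_swap_percent(meminfo_text):
    """Percentage of swap in use, from the SwapTotal/SwapFree lines of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            values[key] = int(rest.split()[0])  # value is reported in kB
    total = values.get("SwapTotal", 0)
    if total == 0:
        return 0.0  # no swap configured
    return 100.0 * (total - values["SwapFree"]) / total


def assess(load_15min, swap_percent):
    """Return a list of warnings based on the thresholds 5, 10 and 25%."""
    notes = []
    if load_15min > 10:
        notes.append("load: definitely a problem")
    elif load_15min > 5:
        notes.append("load: struggling a bit")
    if swap_percent > 25:
        notes.append("swap: more than 25% in use")
    return notes


def check_now():
    """Read the live values on a Linux box and assess them."""
    with open("/proc/loadavg") as f:
        load_15 = parse_loadavg(f.read())[2]
    with open("/proc/meminfo") as f:
        swap = parse_swap_percent(f.read())
    return assess(load_15, swap)
```

An empty list from check_now() means neither threshold is exceeded at that moment; a single reading says little about "consistently", so it is worth repeating the check over time.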

salasi 10-30-2009 07:27 AM

Quote:

Originally Posted by StupidNewbie (Post 3736640)
This is good. Will the tools tb0ne mentioned point me in the right direction....

Yes, up to a point. The tools are likely to give you a good pointer as to the detail of the symptom; it is up to you to turn that into 'and what should I do about that', and this can be more or less difficult depending on what you find.

If, e.g., you find that you don't have enough memory or that disk performance might be a bit on the low side, there might be an affordable solution to those problems and you might just not care beyond that. (And, sometimes, just spending money on the user's problem makes the user feel better, even if it doesn't improve performance.) If, however, you are pointed at a less affordable solution, you might be more inclined to be asking 'Why is the memory/disk under so much load? Why isn't the disk subsystem performing better?' and following that path might be a bit more, errr, educational.

Quote:

I'm assuming that the tools tb0ne mentioned will help with that, but I haven't had a chance to look at the man pages for them....Also, I don't have a benchmark for what is "normal" so I wouldn't really know what to look for even if I did have the right tools.

You have the systems available when they are working well. Look at what is happening when things are working well and try to work out what is different when they aren't working well.

Have a look at top (or atop or htop if you prefer one of those) as it's pretty good at pushing the big hitters towards the top of the display. Is it 'one big thing' or the 'aggregation of lots of little things'?
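
The 'one big thing or lots of little things' question can also be answered from a process listing. Here is a minimal Python sketch that parses the output of `ps -eo pcpu,comm`; the helper names are invented for illustration. One caveat: the %CPU figure from ps is averaged over each process's lifetime, so it is not the same number top shows.

```python
# Sketch only: work out how much of the reported CPU use belongs to the
# single biggest process in `ps -eo pcpu,comm` output.
import subprocess


def parse_ps(text):
    """Parse `ps -eo pcpu,comm` output into (cpu_percent, command) pairs."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the "%CPU COMMAND" header
        parts = line.split(None, 1)
        if len(parts) == 2:
            rows.append((float(parts[0]), parts[1].strip()))
    return rows


def biggest_share(rows):
    """Return (command, share), where share is that process's fraction (0..1) of all CPU listed."""
    total = sum(cpu for cpu, _ in rows)
    if total == 0:
        return None, 0.0
    cpu, name = max(rows)
    return name, cpu / total


def live_rows():
    """Run ps on the local machine and parse its output."""
    out = subprocess.check_output(["ps", "-eo", "pcpu,comm"], universal_newlines=True)
    return parse_ps(out)
```

A share close to 1 from biggest_share(live_rows()) suggests one big hitter; a small share on a busy box suggests the load is spread over many processes.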

Quote:

I haven't had a chance to look at the man pages

That is time that you need to make; there is no alternative. And no one is really going to help you much if you haven't read the man page first.

Quote:

Nagios and other monitoring tools like that

Nagios is a good tool, but from where you are (just starting) you might find the learning curve problematic, particularly if you need to be seen to be making progress in the short term. If you have the time, you could do worse than look at some of the simpler tools in the short term while working on something more sophisticated with, e.g., Nagios to cover the possibility that you need to do something longer term/more sophisticated.

I should also mention vmstat, if you have virtual memory problems (but don't believe the first line of its output...as is mentioned in the man page).
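
The reason for distrusting the first line is that vmstat's first report gives averages since the last reboot, not current activity. A minimal Python sketch of reading the output with that in mind follows; it assumes the usual two header lines with column names such as si, so and wa, and the function names are made up.

```python
# Sketch only: parse `vmstat <delay> <count>` output and set aside the
# first data row, which holds averages since boot.
import subprocess


def parse_vmstat(text):
    """Return one dict per data row, keyed by the column names in the header."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line starts with "r"
            header = fields
        elif header and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def drop_since_boot_row(rows):
    """Discard the first row so only the interval samples remain."""
    return rows[1:]


def sample_live(delay=5, count=3):
    """Run vmstat locally; delay and count are arbitrary choices."""
    out = subprocess.check_output(
        ["vmstat", str(delay), str(count)], universal_newlines=True
    )
    return drop_since_boot_row(parse_vmstat(out))
```

In the rows returned by sample_live(), persistently non-zero si and so values indicate swapping, and a high wa value indicates CPU time spent waiting on I/O.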

StupidNewbie 10-30-2009 10:14 AM

Thanks for all the great advice, guys. I think I've got all I need to go much further now. At the same time I'm actually building a Linux From Scratch system at home right now, so that will help with understanding how everything works together.

onebuck 10-30-2009 10:20 AM

Hi,

I must agree that you should be using the 'man command' to familiarize yourself. Just a few links to aid you:

Linux Documentation Project
Rute Tutorial & Exposition
Linux Command Guide
Ultimate Linux Newbie Guide
LinuxSelfHelp
Getting Started with Linux
Advanced Bash-Scripting Guide
Virtualization - Top 10

The above links and others can be found at 'Slackware-Links'. More than just Slackware® links!

:hattip:


All times are GMT -5.