du command to be used in shell script without affecting the server load
Hi,
I need to use the du command in a shell script to monitor disk usage on 1000+ servers. We are doing this via cfengine. I used the du command in the script (which is run by a cron job) to find the size of the directories used by the users. But my manager warns that running du can increase the system load, since there are so many servers involved here. What can I do to run this script without affecting the system load? Is there something like sleep() I can add to the script to resolve this?
I don't know much about cfengine, but surely the du cmd is running on the target nodes anyway?
If you're worried about the overhead of returning the results to the master (if that's what you are doing), consider using snmp instead - it may use less.
It would be nice to know what the goal is; probably df can be used instead, or other tools. Are they networked filesystems (nfs or similar)? Probably you need to run it on only a single host...
Here I am trying to find the size of the directories (which are actually the users' workspaces) inside a filesystem, so df will not help.
First I check the filesystem space (using df), and if the usage is more than 50%, I try to find the directories which take the most space in that filesystem. That is the goal.
It is not NFS. How can I use nice and snmp in this case? I was told to do a random check with du so that it doesn't affect the system load.
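The check-df-then-walk workflow described above could be sketched like this; the 50% threshold and the example mount point are placeholders, not the poster's actual setup:

```shell
#!/bin/sh
# Sketch of the workflow: check df first, and only run du when the
# filesystem is over the threshold. Threshold and paths are placeholders.

usage_pct() {
    # Read "df -P <fs>" output on stdin and print the Use% column
    # (field 5) of the last line, with the trailing '%' stripped.
    awk 'END { sub(/%/, "", $5); print $5 }'
}

check_fs() {
    fs=$1
    pct=$(df -P "$fs" | usage_pct)
    if [ "$pct" -gt 50 ]; then
        # Only walk the tree when the filesystem is over the threshold.
        du -sk "$fs"/* 2>/dev/null | sort -rn | head -10
    fi
}

# e.g. check_fs /home
```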
du needs to walk through the filesystem to calculate usage, and that is always going to cause high I/O load on the server. You could get a similar report by enabling quotas and giving each user a really big quota -- perhaps larger than the filesystem so that the quota never gets in the way. Then you could use repquota to get a quick report of how much space each user was using. Note that this will not be by directory, just a total per filesystem for each user.
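A sketch of summarising such a report; the field layout here (user, quota flags, then blocks used in field 3) is the common repquota -u format and is an assumption:

```shell
#!/bin/sh
# Hypothetical sketch: summarise a "repquota -u" style report. Header
# lines are skipped by matching only rows whose second field looks like
# the two-character quota flags (e.g. "--"); prints "blocks user"
# pairs, largest first.

top_users() {
    awk '$2 ~ /^[+-][+-]$/ { print $3, $1 }' | sort -rn
}

# e.g. repquota -u /home | top_users | head
```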
Hi, thank you @rknichols, but this doesn't concern the users' home dirs. We already have that feature enabled, and the quota check is done as a separate task. This concerns only the directories inside a filesystem which are the workspaces for the users.
Perhaps you could give an example (made up names if you feel paranoid).
However, quotas per user are ALL files per user if enabled on each fs. IE not just homedirs.
How frequently do you need to do this per user / per fs ? EG once a day shouldn't hurt anyone, especially if done out of hrs.
Disk quotas are kept per filesystem, so unless the workspaces are in the same filesystem as the home directories, ... .
Unfortunately disk quotas will not work here. Can I sort the directories by size without using the du command? Is that possible?
The thing is, we have this large filesystem of TB/ZB size with so many directories, and I need to find the directories which are using more space (i.e. more than a particular size limit per user, which has to be calculated from the size reported by df). du is the issue here.
Probably you can find other ways, but it looks like du is the direct way. Actually, you can try ls -lR and parse the output (and other similar tricks as well).
If the dirs you are interested in are top-of-partition dirs, then snmp could be an option - otherwise not.
Certainly, du is the conventional way to check space on a per dir basis. If you are really worried about potential performance issues, consider running at low priority (ie use the 'nice' cmd) and/or run du out of hrs.
As pan64 points out, just listing everything out and then calculating (possibly on another node) is another option.
Incidentally, instead of 'l' use 's' if you only need the sizes & names (ie not owners/perms/dates etc)
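The low-priority run suggested above might look like this; ionice is util-linux-specific and an assumption about the target hosts, and since du's cost is mostly I/O, the idle I/O class arguably matters more than CPU niceness:

```shell
#!/bin/sh
# Sketch of a low-priority du wrapper. ionice (util-linux) is an
# assumption about the hosts; falls back to plain nice if it's missing.
# -c 3 is the "idle" I/O class: du only gets disk time nobody else wants.

slow_du() {
    dir=${1:-.}
    if command -v ionice >/dev/null 2>&1; then
        nice -n 19 ionice -c 3 du -sk "$dir" 2>/dev/null
    else
        nice -n 19 du -sk "$dir" 2>/dev/null
    fi
}

# e.g. slow_du /var/workspaces
```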
I need some other info about the dirs as well, so I am thinking of going with the "nice" command. Can anyone help me with the command inside my script? I am not so familiar with using the nice command.
Below is the du command I use in the script:
The problem with using the nice command here is that the major impact from du is I/O activity, and, when a process wakes up after waiting on a disk I/O operation, it starts running at a fixed priority regardless of its CPU usage history or "nice" setting. I doubt you'll see much difference between running
Code:
RETVAL=$(du -sk --time --time-style=+%s "$1")
and
Code:
RETVAL=$(nice -19 du -sk --time --time-style=+%s "$1")
Note also that it's hard to test this. If you re-run the same du command while all of the needed inodes are still cached in memory, it runs almost instantaneously with little or no I/O activity.
Actually you can try ls -lR and parse the output (and other similar tricks as well).
This is a really good suggestion.
I just ran some strace tests on a small directory structure (85G, 3500 files), and du was consistently issuing around 15 times the number of fstat calls as ls - and taking more than 10 times as much CPU time (not much in my case).
I did several runs to ensure all the data was in cache, and the numbers consistent. The ls output has the dir name on one line and the total on the next - easy enough to parse in awk, perl, whatever.
Not wanting to derail the thread, just adding some support to pan64's idea.
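A sketch of parsing that output, assuming ls -sRk so the totals are in KiB: the directory-name lines end in ':' and the following "total" line carries the number. Note that these totals cover only the files directly inside each directory, not subdirectories, unlike du's recursive sums:

```shell
#!/bin/sh
# Sketch of parsing "ls -sRk" output as discussed: remember the
# directory from lines ending in ':', then print the KiB total from
# the "total N" line that follows it.

ls_totals() {
    awk '/:$/      { dir = substr($0, 1, length($0) - 1) }
         /^total / { print $2, dir }'
}

# e.g. ls -sRk /path | ls_totals | sort -rn | head
```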