LinuxQuestions.org
Old 09-23-2015, 05:06 AM   #1
emlynjose
LQ Newbie
 
Registered: Sep 2015
Posts: 7

Rep: Reputation: Disabled
du command to be used in shell script without affecting the server load


Hi,

I need to use the du command in a shell script to monitor disk usage on 1000+ servers. We are doing this via cfengine. The script, which is run by cron, uses du to find the size of the directories used by the users. But my manager says that running du can affect the system load, since so many servers are involved. What can I do to run this script without affecting the system load? Is there a command like sleep() that I can add to the script to resolve this?

Regards,
Emlyn.
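
On the sleep() question: one option is a per-host start delay (a "splay"), so that 1000+ cron jobs do not all start in the same second. Note that this only spreads the start times across hosts; it does not reduce the I/O that du generates on each server. A minimal sketch, assuming a POSIX shell with cksum and awk available. The helper name splay_seconds and the 600-second window are made up for illustration:

```shell
# Hypothetical helper: derive a stable delay in [0, MAX) seconds from a host name.
# Usage: splay_seconds HOSTNAME MAX_SECONDS
splay_seconds() {
  # cksum prints "<crc> <byte count>"; the CRC is stable for a given name.
  printf '%s' "$1" | cksum | awk -v max="$2" '{ print $1 % max }'
}

# Example use at the top of the cron script (the 600 s window is an assumption):
# sleep "$(splay_seconds "$(hostname)" 600)"
```

Because the delay is derived from the host name, each server always waits the same amount of time, which keeps run times predictable from day to day.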
 
Old 09-23-2015, 05:44 AM   #2
Gary Baker
Member
 
Registered: Mar 2007
Location: Whitsett,NC
Distribution: Slackware 14.1 and MINT 17.1
Posts: 105

Rep: Reputation: 3
Have you explored the nice command? I.e., always be a nice guy. Search for "Linux nice" on Google.
 
Old 09-23-2015, 06:23 AM   #3
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
I don't know much about cfengine, but surely the du cmd is running on the target nodes anyway?
If you're worried about the overhead of returning the results to the master (if that's what you are doing), consider using snmp instead - it may use less.
 
Old 09-23-2015, 06:26 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
It would be nice to know what the goal is; probably df or other tools can be used instead. Are they networked filesystems (nfs or similar)? If so, you probably only need to run it on a single host...
 
Old 09-23-2015, 08:11 PM   #5
emlynjose
LQ Newbie
 
Registered: Sep 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
Thank you for the response!

Here I am trying to find the size of the directories (which are actually the users' workspaces) inside a filesystem, so df will not help.
First I check the filesystem usage (using df), and if it is more than 50%, I try to find the directories that take up the most space in that filesystem. That is the goal.

It is not an nfs. How can I use nice and snmp in this case? I was told to do a random check with du so that it doesn't affect the system load.

Thank you!
BR, Emlyn
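
The two-step check described above (df first, du only when the filesystem is above a threshold) could be sketched roughly as follows. This is only an illustration under assumptions: the function names fs_usage_pct and report_big_dirs are made up, the 50% default comes from the post, and the workspaces are assumed to be the first-level directories of the path given.

```shell
# Hypothetical helper: print the Use% of the filesystem holding $1, without "%".
fs_usage_pct() {
  # -P forces the POSIX one-line-per-filesystem format; field 5 is the capacity.
  df -P -- "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: list first-level directories by size (KiB, largest first),
# but only if the filesystem usage is above the threshold ($2, default 50).
report_big_dirs() {
  _pct=$(fs_usage_pct "$1")
  # Give up if df did not return a plain number.
  case $_pct in ''|*[!0-9]*) return 1 ;; esac
  # Skip the expensive du walk when usage is at or below the threshold.
  [ "$_pct" -gt "${2:-50}" ] || return 0
  du -sk -- "$1"/*/ 2>/dev/null | sort -rn
}
```

This way the directory walk only happens on servers that are actually filling up, so most hosts never run du at all.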
 
Old 09-23-2015, 08:44 PM   #6
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
du needs to walk through the filesystem to calculate usage, and that is always going to cause high I/O load on the server. You could get a similar report by enabling quotas and giving each user a really big quota -- perhaps larger than the filesystem so that the quota never gets in the way. Then you could use repquota to get a quick report of how much space each user was using. Note that this will not be by directory, just a total per filesystem for each user.
 
Old 09-23-2015, 08:57 PM   #7
emlynjose
LQ Newbie
 
Registered: Sep 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
Hi, thank you @rknichols, but this doesn't concern the users' home dirs. We already have quotas enabled, and the quota check is done as a separate task. This concerns only the directories inside a filesystem, which are the workspaces for the users.

BR, Emlyn.
 
Old 09-23-2015, 09:45 PM   #8
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
Perhaps you could give an example (made-up names if you feel paranoid).
However, per-user quotas cover ALL of a user's files on each fs where they are enabled, i.e. not just homedirs.
How frequently do you need to do this per user / per fs? E.g. once a day shouldn't hurt anyone, especially if done out of hours.
 
Old 09-23-2015, 10:33 PM   #9
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
Quote:
Originally Posted by emlynjose
Hi, thank you @rknichols, but this doesn't concern the users' home dirs. We already have quotas enabled, and the quota check is done as a separate task. This concerns only the directories inside a filesystem, which are the workspaces for the users.
Disk quotas are kept per filesystem, so unless the workspaces are in the same filesystem as the home directories, ... .
 
Old 09-29-2015, 05:10 AM   #10
emlynjose
LQ Newbie
 
Registered: Sep 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
Unfortunately disk quotas will not work here. Can I sort the directories by size without using the du command? Is that possible?
The thing is, we have this large filesystem of TB/ZB size with many directories, and I need to find the directories that use more than a particular size limit per user (which has to be calculated from the Size column of the df output). du is the issue here.
 
Old 09-29-2015, 05:38 AM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
You can probably find other ways, but it looks like du is the direct way. Actually, you can try ls -lR and parse the output (and other similar tricks as well).
 
1 members found this post helpful.
Old 09-29-2015, 08:18 PM   #12
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
If the dirs you are interested in are top-of-partition dirs, then snmp could be an option - otherwise not.

Certainly, du is the conventional way to check space on a per dir basis. If you are really worried about potential performance issues, consider running at low priority (ie use the 'nice' cmd) and/or run du out of hrs.

As pan64 points out, just listing everything out and then calculating (possibly on another node) is another option.
Incidentally, instead of 'l' use 's' if you only need the sizes & names (ie not owners/perms/dates etc)
Code:
ls -sR

Last edited by chrism01; 09-29-2015 at 08:21 PM.
 
Old 09-29-2015, 09:21 PM   #13
emlynjose
LQ Newbie
 
Registered: Sep 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
I need some other info about the dir as well, so I am thinking of going with the "nice" command. Can anyone help me with the command inside my script? I am not very familiar with nice.
Below is the du command I use in the script:
Code:
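function dlu_dfinfo {
  # $1: directory to inspect
  [ -z "$1" ] && \
    dlu_die "Wrong or missing argument in dlu_dfinfo"
  # Size in KiB plus last modification time as epoch seconds
  RETVAL=$(du -sk --time --time-style=+%s "$1") || \
    return 1
  # Owner name of the directory
  RETUSER=$(stat -c %U "$1") || \
    return 1
  # For root-owned dirs, when /usr/lib64/lxc is present,
  # use the last path component as the user name instead
  [ "$RETUSER" = root ] && \
    [ -x /usr/lib64/lxc ] && \
    RETUSER=$(echo "$1" | awk -F/ '{ print $NF }')

  echo "$RETVAL $RETUSER"
}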
Below is how the function is being called:
Code:
TMP_INFO=$(dlu_dfinfo "$TMP_TARGET_PATH")
So where can I use the nice command here? Also, how does it work?

BR, Emlyn
 
Old 09-29-2015, 10:25 PM   #14
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
The problem with using the nice command here is that the major impact from du is I/O activity, and, when a process wakes up after waiting on a disk I/O operation, it starts running at a fixed priority regardless of its CPU usage history or "nice" setting. I doubt you'll see much difference between running
Code:
RETVAL=$(du -sk --time --time-style=+%s "$1")
and
Code:
RETVAL=$(nice -n 19 du -sk --time --time-style=+%s "$1")
Note also that it's hard to test this. If you re-run the same du command while all of the needed inodes are still cached in memory, it runs almost instantaneously with little or no I/O activity.
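
Since the impact is mostly I/O, one thing that may be worth trying alongside nice is ionice from util-linux, which can put a process in the "idle" I/O class (-c 3). This is only a sketch under assumptions: the idle class is honored only by I/O schedulers that support priorities (such as CFQ or BFQ), so it may make no difference on a given server, and the wrapper name low_prio_du is made up for illustration.

```shell
# Hypothetical wrapper: run du at the lowest CPU priority and, where ionice
# is available and usable, in the idle I/O class as well.
# Usage: low_prio_du DIR   (prints du's "<KiB><TAB><dir>" line)
low_prio_du() {
  # Probe first: ionice may be missing, or the kernel may refuse the request.
  if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
    nice -n 19 ionice -c 3 du -sk -- "$1"
  else
    nice -n 19 du -sk -- "$1"
  fi
}
```

The caching caveat still applies: to compare the two variants fairly, the test would need to run against directories whose inodes are not already cached.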
 
Old 09-29-2015, 10:44 PM   #15
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Quote:
Originally Posted by pan64
Actually you can try ls -lR and parse the output (and other similar tricks as well).
This is a really good suggestion.
I just ran some strace tests on a small directory structure (85G, 3500 files), and du was consistently issuing around 15 times as many fstat calls as ls, and taking more than 10 times as much CPU time (not much in my case).
I did several runs to ensure all the data was in cache and the numbers were consistent. The ls output has the dir name on one line and the total on the next, which is easy enough to parse in awk, perl, whatever.

Not wanting to derail the thread, just adding some support to pan64's idea.
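
A rough sketch of the awk parsing mentioned above, assuming GNU ls and file names without embedded newlines. The function name ls_dir_totals is made up. Note that the "total" line ls prints covers only the entries directly inside each directory, so unlike du -s the figures are not cumulative over subdirectories:

```shell
# Hypothetical helper: print "<KiB> <directory>" for each directory under $1.
# Totals are per directory (not cumulative), taken from ls's "total" lines.
ls_dir_totals() {
  # LC_ALL=C keeps the word "total" untranslated; -k reports 1024-byte blocks.
  LC_ALL=C ls -skR -- "$1" | awk '
    # A directory header is "path:" on the first line or right after a blank line.
    (NR == 1 || prev == "") && /:$/ { dir = substr($0, 1, length($0) - 1) }
    # The line after the header is "total <blocks>".
    $1 == "total" && dir != "" { print $2, dir; dir = "" }
    { prev = $0 }
  '
}
```

Something like ls_dir_totals /some/workspace | sort -rn | head would then show the directories holding the most data directly. Owner and modification time are not part of this output and would need to be collected separately.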
 
  


All times are GMT -5. The time now is 04:01 PM.
