Sorry for not being explicit from the start. Here is what I have to do:
I work on Unix-based servers (I know this post should go under Unix/Solaris etc., but it's a general question, and my chances are better here), where some processes periodically generate log files. Often there is no purging entry in crontab to delete the old and obsolete files automatically. Over time, even though there is plenty of space on the HDD, this eats up all the inodes, meaning no new files can be written (not to mention how slow the OS becomes when this happens). So I wrote a script to identify exactly which folder contains lots and lots of files, so that I can investigate further. The script works just fine, but it is very memory-inefficient (so I need awk to replace part of it). The script is below:
echo "Start first phase. This will take a while ..."
find / -name "*" -type f -fls result.txt
# this phase walks the whole root file system and writes an ls-style listing (including the full path) of every regular file to result.txt. Nothing fancy here, just a find command
echo "First phase done. Second phase is starting. It also will take a while ..."
sed 's/^.*[0-9][0-9] \(\/.*\)\/.*$/\1/g' result.txt >> intermediar.txt
# the sed command keeps only the directory in which each file is located; the regex describes the pattern found on each line of result.txt
echo "Last phase. Results are shown below:"
# From here on, the script simply counts the occurrences of each distinct path. If a folder contains 100000 text files, each file is printed
# with its full path in result.txt, so the folder's path appears 100000 times and is counted accordingly. The script accepts an
# input variable, which is the minimum number of files a folder must contain in order to be displayed.
# What I need is to replace this counting part of the script with something awk-based. I tried different approaches but failed. I really should read
# the awk manual carefully, but I am asking for help this time, as I don't have enough time to improve my awk skills.
# Thank you in advance.
for i in $(cat intermediar.txt)
if [[ $var == $var1 ]]
if (( $min_val <= $suma ));then
echo "$suma ---> $var"
The problem with this script is in the third phase: "for i in $(cat intermediar.txt)" loads all lines of intermediar.txt (one per file listed in result.txt) at once, so the memory it eats goes off the scale, and on a server this is the last thing anyone would want.
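To make it clearer what I am after, here is a rough sketch of how I imagine the awk-based counting could look. It is only a sketch under some assumptions: the function name count_dirs is made up for illustration, intermediar.txt is assumed to hold one directory path per line, and the awk in use is assumed to support -v (on Solaris this may mean calling nawk or /usr/xpg4/bin/awk instead of plain awk). The idea is that awk reads the file line by line and keeps only one counter per distinct directory, instead of loading every line into the shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script):
#   count_dirs MIN_VAL < intermediar.txt
# Reads one directory path per line from standard input and prints every
# directory that occurs at least MIN_VAL times, as "count ---> path".
count_dirs() {
    awk -v min_val="$1" '
        { count[$0]++ }                      # one counter per distinct directory
        END {
            for (dir in count)               # iteration order is unspecified
                if (count[dir] >= min_val)
                    print count[dir] " ---> " dir
        }
    ' | sort -rn                             # largest counts first
}
```

It would be called in place of the for loop as: count_dirs "$min_val" < intermediar.txt. Because awk takes the whole line as the key, directory names containing spaces should be counted correctly too, which the for loop over $(cat ...) cannot do.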
Thank you in advance for your help.
Last edited by coss_cat; 12-09-2011 at 11:17 AM.