LinuxQuestions.org

smilemukul 04-08-2013 12:30 AM

Difference in sizes of df & du
 
Hi,

I have a situation where df and du report different sizes. So far I have followed these steps:

1. Checked lsof, which showed deleted files that were still open.
2. Ran lsof | grep "deleted" | awk '{print $2}' | xargs kill -9 to kill the processes holding those deleted files.

What I want to know is the correct approach, as a system admin, for handling this kind of issue in a production environment without a reboot, where other applications might also depend on the processes that lsof shows holding deleted files.

Any solution will be appreciated.

sag47 04-08-2013 02:10 AM

It highly depends on what you're doing.

There's a list of open file descriptors for each process in /proc/<PID>/fd. "ls -l" there shows where each descriptor points, so you can see which one is linked to the deleted file. You can then truncate the file through that descriptor to free up the space. For example, say your application has only one file open (other than the default stdin, stdout, and stderr). Normally that deleted file would be open on file descriptor 3, so you would truncate it like this (pretend your PID is 12345):
Code:

cd /proc/12345/fd
echo -n '' > ./3

Remember, before truncating any file descriptor associated with a process be sure that you're operating on the right one with "ls -l"!
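
The "check first, then truncate" advice can also be scripted. This is only a minimal Python 3 sketch, not something from the original post: the function name truncate_deleted_fd is made up for illustration, and it assumes a Linux /proc where the link target of an unlinked file ends in " (deleted)", as the ls -l output does.

```python
import os

def truncate_deleted_fd(pid, fd):
    """Truncate the file behind /proc/<pid>/fd/<fd>, but only if it is deleted.

    Returns the number of bytes the file held before it was truncated.
    """
    link = f"/proc/{pid}/fd/{fd}"
    target = os.readlink(link)           # what "ls -l" prints after the arrow
    if not target.endswith(" (deleted)"):
        raise ValueError(f"fd {fd} of PID {pid} points at {target!r}, "
                         "which is not marked deleted")
    size = os.stat(link).st_size         # stat follows the link to the open file
    # Opening for writing with truncation is what "echo -n '' > ./3" does.
    with open(link, "w"):
        pass
    return size
```

Calling truncate_deleted_fd(12345, 3) would mirror the shell example above, and it refuses to touch a descriptor that still has a name on disk.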

As far as I know there's no good way to reestablish the hard link of the deleted file. If you need to recover the data from the deleted log, you can tail -f the file descriptor:
Code:

tail -c +0 -f /proc/12345/fd/3 > /path/filename
It will also get all subsequent writes to that file since the -f option is set for tail.
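
If you only need a one-time copy rather than a running tail, the same idea can be sketched in Python 3. This is an illustration, not part of the original advice: recover_deleted_file is a hypothetical name, and unlike tail -f it copies what is in the file right now and then stops.

```python
import shutil

def recover_deleted_file(pid, fd, dest_path):
    """Copy the current contents of /proc/<pid>/fd/<fd> to dest_path.

    Returns the number of bytes copied.
    """
    src_path = f"/proc/{pid}/fd/{fd}"
    # Opening the /proc entry gives us our own read position starting at
    # byte 0, independent of the offset the writing process is using.
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
        return dest.tell()
```

Anything the application still holds in its own buffers has not reached the file yet, so it will not be in the copy.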

There is a GitHub project called fdlink which attempts to solve the "restore a deleted hard link" problem, but I have not used that kernel module myself. I'd just recommend using the tail -f command and truncating the file descriptor every once in a while (otherwise the deleted file and the recovered copy will essentially double the data written to the HDD).

You can read more about the proc filesystem in the kernel documentation proc.txt.

See this behavior in action

You can play with this behavior using a python interactive prompt.
Code:

$ mkdir ~/sandbox
$ cd ~/sandbox
$ python

In interactive python mode,
Code:

f=open("test.log","a")
f.write("my log test 1\n")
f.close()

In bash, while the Python interactive session is still running, check the contents of the log file.
Code:

$ cat test.log
my log test 1
$

Open the file again in python interactive,
Code:

f=open("test.log","a")
f.write("my log test 2\n")

In bash, delete test.log while python has it open. Check that it really was deleted, get the python process ID from lsof, and then visit the open file descriptor (pretend lsof reported a PID of 12345).
Code:

$ rm test.log
$ lsof | grep deleted | grep test.log
$ cd /proc/12345/fd
$ ls -l
$ cat 3
my log test 1
$ echo -n '' > ./3
$ cat 3
$ tail -c +0 -f ./3 > ~/sandbox/test.log

At this point the last f.write() call in the Python interactive prompt is still sitting in Python's buffer and hasn't been committed to disk. That happens when you close the file, and the tail command from the previous step will catch it. In Python interactive,
Code:

f.close()
You can now stop the tail -f command since the file has been closed for writing. Go ahead and cat your new test.log file. You should see the last written lines.
Code:

$ cat ~/sandbox/test.log
my log test 2
$

Kind of a neat little experiment.
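
For anyone who would rather replay the whole experiment from one script instead of two terminals, here is a rough Python 3 sketch. It is an adaptation, not the exact steps above: run_experiment is a made-up name, the sandbox is a temporary directory, and the process inspects its own descriptor through /proc/self/fd instead of a second shell.

```python
import os
import tempfile

def run_experiment():
    """Replay the deleted-log experiment in one process and return what it saw."""
    path = os.path.join(tempfile.mkdtemp(), "test.log")
    seen = {}

    with open(path, "a") as f:
        f.write("my log test 1\n")      # written and closed, so it is on disk

    f = open(path, "a")
    f.write("my log test 2\n")          # stays in Python's buffer for now
    os.remove(path)                     # rm test.log while the file is open
    seen["name_still_exists"] = os.path.exists(path)

    fd_link = f"/proc/self/fd/{f.fileno()}"
    seen["link_target"] = os.readlink(fd_link)
    with open(fd_link) as view:         # cat 3
        seen["before_truncate"] = view.read()

    open(fd_link, "w").close()          # echo -n '' > ./3
    seen["size_after_truncate"] = os.fstat(f.fileno()).st_size

    f.flush()                           # the buffered write reaches the file
    with open(fd_link) as view:
        seen["after_flush"] = view.read()
    f.close()
    return seen

if __name__ == "__main__":
    for key, value in run_experiment().items():
        print(f"{key}: {value!r}")
```

The flush stands in for the f.close() step so the script can still read the descriptor afterwards; once the file is closed, the last reference is gone and the data with it.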

SAM

chrism01 04-08-2013 02:13 AM

Basically you'd use lsof & fuser to check which deleted files are still in use, then look at those apps and decide what to do.
This is part of being a SysAdmin: there's no exact procedure, you have to use your initiative.
If you're not sure about an app, find out who does know and talk to them.
Make notes for future ref.
Normally an app shouldn't take up so much space that you have to resort to this anyway.
Only do this as a last resort...
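
One way to do the "check what's in use, then decide" step without touching anything is a read-only report. The following is a minimal Python 3 sketch and not a replacement for lsof or fuser: the function names are made up for illustration, it assumes a kernel that provides /proc/<pid>/comm, and without root it will only see processes you own.

```python
import os

def _command_name(proc_root, pid):
    """Best-effort process name; "?" if it cannot be read."""
    try:
        with open(os.path.join(proc_root, pid, "comm")) as comm:
            return comm.read().strip()
    except OSError:
        return "?"

def deleted_files_by_process(proc_root="/proc"):
    """Return {pid: (command_name, bytes_held)} for processes holding deleted files."""
    report = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue                     # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fd_names = os.listdir(fd_dir)
        except OSError:
            continue                     # process exited, or no permission
        sizes = {}                       # (device, inode) -> size, counted once
        for name in fd_names:
            link = os.path.join(fd_dir, name)
            try:
                if os.readlink(link).endswith(" (deleted)"):
                    st = os.stat(link)
                    sizes[(st.st_dev, st.st_ino)] = st.st_size
            except OSError:
                continue                 # descriptor closed in the meantime
        if sizes:
            report[int(entry)] = (_command_name(proc_root, entry),
                                  sum(sizes.values()))
    return report

if __name__ == "__main__":
    for pid, (name, held) in sorted(deleted_files_by_process().items()):
        print(f"{pid:>7}  {name:<20} {held} bytes in deleted files")
```

The output is a starting point for the conversation with whoever owns the app, not a kill list.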

smilemukul 04-08-2013 05:39 AM

Thanks for the info, but how can I find the correct file descriptor number under /proc/12345/fd, as there are a number of numerical files in it?

pan64 04-08-2013 06:07 AM

I have SuSE 11 and after removing test.log I got another filename: (3 -> <dir>/.nfs0000000000f49c800002022e), and it was not in the list of deleted files (lsof | grep deleted). I assume those files are not really deleted; they are still in use. If you want, you can use du, grep, or whatever on them to identify any one of them. Modifying those files with something like echo "" > ./3 may cause problems (improper functionality?); I would rather kill that app.

sag47 04-08-2013 08:29 AM

Quote:

Originally Posted by smilemukul (Post 4927412)
Thanks for the info, but how can I find the correct file descriptor number under /proc/12345/fd, as there are a number of numerical files in it?

Code:

cd /proc/12345/fd
ls -l

The long format listing (-l or minus ell) will present a list of links. One of them points to the deleted file which has been opened for writing, and its target ends in "(deleted)".
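
When a process has dozens of descriptors, picking the right number by eye gets tedious. Here is a small Python 3 sketch of the same lookup; list_fds and find_fd are hypothetical helper names, and it assumes the " (deleted)" suffix that ls -l shows on Linux.

```python
import os

def list_fds(pid):
    """Return {fd_number: link_target}, like "ls -l /proc/<pid>/fd"."""
    fd_dir = f"/proc/{pid}/fd"
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass                # descriptor was closed while we were looking
    return fds

def find_fd(pid, filename):
    """Return descriptor numbers whose target mentions filename and is deleted."""
    return sorted(fd for fd, target in list_fds(pid).items()
                  if filename in target and target.endswith(" (deleted)"))
```

For the example in this thread, find_fd(12345, "test.log") would be expected to return the single descriptor to operate on.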

Quote:

Originally Posted by pan64 (Post 4927431)
I have SuSE 11 and after removing test.log I got another filename: (3 -> <dir>/.nfs0000000000f49c800002022e), and it was not in the list of deleted files (lsof | grep deleted). I assume those files are not really deleted; they are still in use. If you want, you can use du, grep, or whatever on them to identify any one of them. Modifying those files with something like echo "" > ./3 may cause problems (improper functionality?); I would rather kill that app.

Interesting, I've not experienced that behavior myself. I've only tested this on RHEL, Fedora, and Ubuntu. I'm not doubting your report of the strange behavior; just that I haven't personally seen it. What kernel are you using?

Here's what I see on my Ubuntu system.
Code:

sam@farcry:~/sandbox/deleted_hardlink$ lsof | grep deleted | grep test.log
python    25555        sam    3w      REG              0,21        12  16121975 /home/sam/sandbox/deleted_hardlink/test.log (deleted)
sam@farcry:~/sandbox/deleted_hardlink$ cd /proc/25555/fd
sam@farcry:/proc/25555/fd$ ls
0  1  2  3
sam@farcry:/proc/25555/fd$ ls -l
total 0
lrwx------ 1 sam sam 64 Apr  8 08:34 0 -> /dev/pts/9
lrwx------ 1 sam sam 64 Apr  8 08:34 1 -> /dev/pts/9
lrwx------ 1 sam sam 64 Apr  8 08:34 2 -> /dev/pts/9
l-wx------ 1 sam sam 64 Apr  8 08:34 3 -> /home/sam/sandbox/deleted_hardlink/test.log (deleted)
sam@farcry:/proc/25555/fd$

Here are the systems and kernel versions I've tested the behavior with.
Code:

$ head -n1 /etc/issue
Red Hat Enterprise Linux Server release 6.1 (Santiago)
$ uname -rm
2.6.32-71.el6.x86_64 x86_64
$ python --version
Python 2.6.6

$ head -n1 /etc/issue
Ubuntu 12.04.2 LTS \n \l
$ uname -rm
3.2.0-38-generic x86_64
$ python --version
Python 2.7.3

$ cat /etc/issue
Fedora release 16 (Verne)
$ uname -rm
3.6.11-4.fc16.x86_64 x86_64
$ python --version
Python 2.7.3

I got the behavior I originally described in post #2 on all of those systems.

pan64 04-08-2013 09:04 AM

Quote:

Originally Posted by sag47 (Post 4927498)
What kernel are you using?

Linux hostname 2.6.32.45-0.3-xen #1 SMP 2011-08-22 10:12:58 +0200 x86_64 x86_64 x86_64 GNU/Linux

SUSE Linux Enterprise Desktop 11 SP1 (x86_64)

I think the reason is that my home is located on an NFS drive...
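
That would fit how NFS clients usually behave: a file that is removed while still open gets renamed to a hidden .nfsXXXX placeholder and is only removed once the last handle closes, so it never shows up as "(deleted)". A minimal Python 3 sketch for telling the two cases apart follows. The name classify_fd_target is made up, and the pattern (".nfs" followed by hex digits) is an assumption based only on the filename reported in this thread.

```python
import os
import re

# Assumed shape of an NFS placeholder name, taken from the example above.
_NFS_PLACEHOLDER = re.compile(r"^\.nfs[0-9a-fA-F]+$")

def classify_fd_target(target):
    """Classify a /proc/<pid>/fd link target.

    Returns "deleted" for a locally unlinked file, "nfs-placeholder" for an
    NFS ".nfsXXXX" file, and "normal" for anything else.
    """
    if target.endswith(" (deleted)"):
        return "deleted"
    if _NFS_PLACEHOLDER.match(os.path.basename(target)):
        return "nfs-placeholder"
    return "normal"
```

A check like this could be added to any script that greps for "deleted", so that space held by NFS placeholders is not overlooked.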


All times are GMT -5.