05-07-2015, 12:14 PM   #1
Suyiko (LQ Newbie; Registered: May 2015; Posts: 3)
Disk space "used" in df is nowhere to be found with du


Hello,

I am facing an issue with a filesystem (/dev/sda3): "df -h" reports around 365GB used on it.

Code:
[root@srv_omega /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             443G  365G   56G  87% /
tmpfs                  95G   56K   95G   1% /dev/shm
/dev/sda1             484M   39M  421M   9% /boot
/dev/sdb1             3.6T  1.3T  2.2T  36% /hadoop/disk1
/dev/sdc1             3.6T  1.3T  2.2T  37% /hadoop/disk2
/dev/sdd1             3.6T  1.3T  2.2T  36% /hadoop/disk3
/dev/sde1             3.6T  1.3T  2.2T  37% /hadoop/disk4
/dev/sdf1             3.6T  1.3T  2.2T  36% /hadoop/disk5
/dev/sdg1             3.6T  1.3T  2.2T  36% /hadoop/disk6
/dev/sdh1             3.6T  1.3T  2.2T  36% /hadoop/disk7
/dev/sdi1             3.6T  1.3T  2.2T  36% /hadoop/disk8
/dev/sdj1             3.6T  1.3T  2.2T  36% /hadoop/disk9
/dev/sdk1             3.6T  1.3T  2.2T  36% /hadoop/disk10
/dev/sdl1             3.6T  1.2T  2.3T  36% /hadoop/disk11
/dev/sdm1             3.6T  1.3T  2.2T  36% /hadoop/disk12
/dev/sdn1             3.6T  1.3T  2.2T  36% /hadoop/disk13
/dev/sdo1             3.6T  1.3T  2.2T  37% /hadoop/disk14
/dev/sdp1             3.6T  1.1T  2.4T  30% /hadoop/disk15
cm_processes           95G  8.2M   95G   1% /var/run/cloudera-scm-agent/process
I have checked whether any hidden files might explain the usage, no joy.

Code:
[root@srv_omega /]# pwd
/
[root@srv_omega /]#  ls -lrtha
total 121K
drwxr-xr-x    2 root root 4.0K Jun 28  2011 srv
drwxr-xr-x    2 root root 4.0K Jun 28  2011 mnt
drwxr-xr-x    2 root root 4.0K Jun 28  2011 media
drwxr-xr-x    2 root root 4.0K Dec 20  2012 cgroup
drwx------    2 root root  16K Jun  2  2014 lost+found
drwxr-xr-x    2 root root 4.0K Jun  2  2014 selinux
-rw-r--r--    1 root root    0 Jun  3  2014 .autorelabel
drwxr-xr-x   18 root root 4.0K Jun  5  2014 hadoop
drwxr-xr-x   21 root root 4.0K Jun  5  2014 var
dr-xr-xr-x    9 root root  12K Jun 20  2014 lib64
dr-xr-xr-x    2 root root  12K Jun 21  2014 sbin
dr-xr-xr-x    2 root root 4.0K Jun 21  2014 bin
dr-xr-xr-x    5 root root 1.0K Jun 22  2014 boot
dr-xr-x---    5 root root 4.0K Jun 22  2014 root
drwxr-xr-x    6 root root 4.0K Jun 22  2014 opt
drwxr-xr-x    3 root root 4.0K Dec 10 19:11 home
dr-xr-xr-x   13 root root 4.0K Dec 12 16:18 lib
dr-xr-xr-x 1140 root root    0 Apr 30 15:11 proc
drwxr-xr-x   13 root root    0 Apr 30 15:11 sys
-rw-r--r--    1 root root    0 Apr 30 15:11 .autofsck
drwxr-xr-x    2 root root    0 Apr 30 15:11 misc
drwxr-xr-x    2 root root    0 Apr 30 15:11 net
drwxr-xr-x   15 root root 4.0K Apr 30 15:12 usr
drwxr-xr-x   19 root root 4.6K Apr 30 15:12 dev
dr-xr-xr-x   27 root root 4.0K Apr 30 15:12 ..
dr-xr-xr-x   27 root root 4.0K Apr 30 15:12 .
drwxr-xr-x  122 root root  12K May  4 03:33 etc
drwxrwxrwt   16 root root 4.0K May  7 06:14 tmp
So I tried to find where the space is used with a "du -sh" command:

Code:
[root@srv_omega /]# pwd
/
[root@srv_omega /]# du -sh *
7.8M    bin
29M     boot
4.0K    cgroup
280K    dev
26M     etc
19T     hadoop
124K    home
144M    lib
26M     lib64
16K     lost+found
4.0K    media
0       misc
4.0K    mnt
0       net
7.9G    opt
du: cannot access `proc/9170/task/27326/fdinfo/538': No such file or directory
du: cannot access `proc/45119/task/45119/fd/4': No such file or directory
du: cannot access `proc/45119/task/45119/fdinfo/4': No such file or directory
du: cannot access `proc/45119/fd/4': No such file or directory
du: cannot access `proc/45119/fdinfo/4': No such file or directory
du: cannot access `proc/45160': No such file or directory
0       proc
3.8M    root
17M     sbin
4.0K    selinux
4.0K    srv
0       sys
3.9M    tmp
2.6G    usr
16G     var
So as far as I understand, /hadoop is the only suitable suspect (the cumulative size of all the other folders on "/" is well below 365GB).

Code:
[root@srv_omega hadoop]# cd /
[root@srv_omega /]# cd /hadoop
[root@srv_omega hadoop]# ls -lrtha
total 72K
drwxr-xr-x  2 root root 4.0K Jun  5  2014 disk16
drwxr-xr-x 18 root root 4.0K Jun  5  2014 .
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk1
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk11
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk10
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk13
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk12
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk14
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk2
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk4
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk3
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk6
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk5
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk8
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk7
drwxr-xr-x  4 root root 4.0K Jun 22  2014 disk9
drwxr-xr-x  5 root root 4.0K Nov 19 20:02 disk15
dr-xr-xr-x 27 root root 4.0K Apr 30 15:12 ..
The folders disk1 through disk15 are mount points for separate filesystems, so disk16 seems to be the only candidate, but there is nothing in it.

Code:
[root@srv_omega hadoop]# cd disk16/
[root@srv_omega disk16]# ls -lrtha
total 8.0K
drwxr-xr-x 18 root root 4.0K Jun  5  2014 ..
drwxr-xr-x  2 root root 4.0K Jun  5  2014 .
[root@srv_omega disk16]#
I just don't get it; no folder seems responsible for the 365GB...

Any idea how I could find out where those 365GB are?
 
05-07-2015, 12:49 PM   #2
rknichols (Senior Member; Registered: Aug 2009; Distribution: CentOS; Posts: 2,968)
There are two possibilities. One is a deleted file that is still held open by some process. That file would be hidden from du. You can run (as root)
Code:
lsof | grep -i del
and look through the output for any huge files. The other is a file hidden under an active mount point, possibly because you at some point ran a program without one of those volumes mounted. You can check for those:
Code:
mkdir /tmp/tmproot
mount --bind / /tmp/tmproot
Now you can look in /tmp/tmproot and see the root filesystem by itself, without anything mounted on it. Running "du -s *" in that directory should show you what is using the space. You can remove any unwanted files from there.
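For example, something along these lines should make the largest directories stand out (the sort -h option assumes GNU coreutils, which CentOS provides):
Code:
cd /tmp/tmproot
# summarize each top-level directory of the underlying root filesystem,
# then sort the human-readable sizes so the biggest entries come last
du -sh * | sort -h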

After you are done:
Code:
umount /tmp/tmproot
rmdir /tmp/tmproot
 
4 members found this post helpful.
05-07-2015, 06:53 PM   #3
Kustom42 (Senior Member; Registered: Mar 2012; Distribution: Red Hat; Posts: 1,604)
rknichols is dead-on right. This is caused by a file being deleted while a process still holds an open file descriptor for it (you can see those descriptors under /proc/<pid>/fd). Using lsof will identify the process for you.

This is the ONLY time I will ever say this... but rebooting your Linux box is the easiest way to fix it if you are able to.

If it's production or you can't reboot, use lsof and restart the individual processes instead.
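For instance, something like this (the service name below is only an example; restart whatever process lsof actually points at):
Code:
# list open files whose link count is 0, i.e. deleted but still held open
lsof +L1
# then restart just the offending service so it releases the descriptor
service hadoop-hdfs-datanode restart   # example name only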
 
05-20-2015, 08:32 AM   #4
Suyiko (LQ Newbie; Original Poster; Registered: May 2015; Posts: 3)
Hello,

Sorry for the late reply, I've been busy with other issues :/ I rebooted the server to be sure I was not in the first scenario (a file held open by a process); as I expected, it did not change a thing.

rknichols, you were right: something fishy happened at some point with the mount points. After the "mount --bind" command I found the following in a subfolder:

Quote:
[root@srv_omega hadoop]# pwd
/tmp/tmproot/hadoop
[root@srv_omega hadoop]# du -sm *
1 disk1
1 disk10
1 disk11
1 disk12
1 disk13
1 disk14
347006 disk15
1 disk16
1 disk2
1 disk3
1 disk4
1 disk5
1 disk6
1 disk7
1 disk8
1 disk9
Looking at the fstab and lsblk, my guess is that every disk is correctly mounted now but was not "at some point":

Quote:

[...]
LABEL=DISK1 /hadoop/disk1 ext4 noatime,nodiratime 0 0
LABEL=DISK2 /hadoop/disk2 ext4 noatime,nodiratime 0 0
LABEL=DISK3 /hadoop/disk3 ext4 noatime,nodiratime 0 0
LABEL=DISK4 /hadoop/disk4 ext4 noatime,nodiratime 0 0
LABEL=DISK5 /hadoop/disk5 ext4 noatime,nodiratime 0 0
LABEL=DISK6 /hadoop/disk6 ext4 noatime,nodiratime 0 0
LABEL=DISK7 /hadoop/disk7 ext4 noatime,nodiratime 0 0
LABEL=DISK8 /hadoop/disk8 ext4 noatime,nodiratime 0 0
LABEL=DISK9 /hadoop/disk9 ext4 noatime,nodiratime 0 0
LABEL=DISK10 /hadoop/disk10 ext4 noatime,nodiratime 0 0
LABEL=DISK11 /hadoop/disk11 ext4 noatime,nodiratime 0 0
LABEL=DISK12 /hadoop/disk12 ext4 noatime,nodiratime 0 0
LABEL=DISK13 /hadoop/disk13 ext4 noatime,nodiratime 0 0
LABEL=DISK14 /hadoop/disk14 ext4 noatime,nodiratime 0 0
LABEL=DISK15 /hadoop/disk15 ext4 noatime,nodiratime 0 0
[...]
Quote:
[root@srv_omega hadoop]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.7G 0 disk
├─sda1 8:1 0 499.8M 0 part /boot
├─sda2 8:2 0 15.6G 0 part [SWAP]
└─sda3 8:3 0 449.6G 0 part /
sdb 8:16 0 3.7T 0 disk
└─sdb1 8:17 0 3.7T 0 part /hadoop/disk1
sdd 8:48 0 3.7T 0 disk
└─sdd1 8:49 0 3.7T 0 part /hadoop/disk3
sdc 8:32 0 3.7T 0 disk
└─sdc1 8:33 0 3.7T 0 part /hadoop/disk2
sdf 8:80 0 3.7T 0 disk
└─sdf1 8:81 0 3.7T 0 part /hadoop/disk5
sde 8:64 0 3.7T 0 disk
└─sde1 8:65 0 3.7T 0 part /hadoop/disk4
sdh 8:112 0 3.7T 0 disk
└─sdh1 8:113 0 3.7T 0 part /hadoop/disk7
sdi 8:128 0 3.7T 0 disk
└─sdi1 8:129 0 3.7T 0 part /hadoop/disk8
sdj 8:144 0 3.7T 0 disk
└─sdj1 8:145 0 3.7T 0 part /hadoop/disk9
sdg 8:96 0 3.7T 0 disk
└─sdg1 8:97 0 3.7T 0 part /hadoop/disk6
sdm 8:192 0 3.7T 0 disk
└─sdm1 8:193 0 3.7T 0 part /hadoop/disk12
sdn 8:208 0 3.7T 0 disk
└─sdn1 8:209 0 3.7T 0 part /hadoop/disk13
sdl 8:176 0 3.7T 0 disk
└─sdl1 8:177 0 3.7T 0 part /hadoop/disk11
sdp 8:240 0 3.7T 0 disk
└─sdp1 8:241 0 3.7T 0 part /hadoop/disk15
sdo 8:224 0 3.7T 0 disk
└─sdo1 8:225 0 3.7T 0 part /hadoop/disk14
sdk 8:160 0 3.7T 0 disk
└─sdk1 8:161 0 3.7T 0 part /hadoop/disk10
So now I think I just have to move the data from the disk "sda" to "sdp"; any idea on the proper/safest way to do it?

EDIT: All I can think of is moving the data to another location, then copying it back.

Last edited by Suyiko; 05-20-2015 at 08:39 AM.
 
05-20-2015, 10:20 AM   #5
rknichols (Senior Member; Registered: Aug 2009; Distribution: CentOS; Posts: 2,968)
You can move stuff directly from /tmp/tmproot/{wherever} to any mounted filesystem, e.g.:
Code:
mv /tmp/tmproot/hadoop/disk15/XXX /hadoop/disk15/
It's just like moving files between any other separate filesystems.

BTW, if you're curious about just when "at some point" was, you can use "ls -lc" to examine the inode change times.
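For example (the path is just the disk15 subfolder from your earlier output):
Code:
# in a long listing, -c makes ls show each entry's ctime (inode change time)
# instead of the modification time
ls -lc /tmp/tmproot/hadoop/disk15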

Last edited by rknichols; 05-20-2015 at 10:39 AM. Reason: add "BTW, ..."
 
1 member found this post helpful.
05-20-2015, 11:07 AM   #6
Suyiko (LQ Newbie; Original Poster; Registered: May 2015; Posts: 3)
With "ls -lc" I can see that things went south a long time ago.

Due to the nature of the data (a Hadoop HDFS datanode), I won't be able to clean this up without shutting down the HDFS service, and that may have to wait a while since it is in production. Anyway, I think I have all sides covered now and will manage to get things back to the way they are supposed to be.

Thank you very much for the help; it really helped and I learned quite a bit. Really appreciated.
 
  

