LinuxQuestions.org
Old 12-19-2014, 12:57 AM   #1
srigias
LQ Newbie
 
Registered: Dec 2014
Posts: 3

Why is du command showing incorrect results


Hi,

The output below shows two files of about 21 GB each.

[root@myhost data]# ls -l
total 100092
-rw-rw---- 1 ora4 ora4 22548586496 Dec 18 21:09 temp01.dbf
-rw-rw---- 1 ora4 ora4 22548586496 Dec 18 19:38 temp02.dbf


But when I use the du command, it shows only 49 MB for each file.
[root@myhost data]# du -sh *
49M temp01.dbf
49M temp02.dbf

Could you please let me know how to correct the values?


Sri
 
Old 12-19-2014, 01:18 AM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_10{.0|.1|.2}
Posts: 4,009
Blog Entries: 1

What filesystem is this?

A guess: they are database files and probably sparse, meaning most of their size is reserved space that was never written. That reserved space shows up in ls but not in du, because du shows the space actually used.

There is not anything to "fix", both sizes are correct in their respective contexts.

What do you get with...

Code:
du -sh --apparent-size *
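
To see the same effect on a scratch file, here is a minimal sketch. It assumes GNU coreutils (truncate, stat, du) and a filesystem that supports sparse files; the temporary directory and the name sparse.img are made up for the demonstration and are not from this thread.

```shell
# Work in a throwaway directory (hypothetical names, not from the thread).
workdir=$(mktemp -d)
cd "$workdir"

# Create a 1 GiB file without writing any data: it is one big hole.
truncate -s 1G sparse.img

# Apparent (logical) size in bytes, the number ls -l reports.
apparent=$(stat -c %s sparse.img)

# Allocated size in bytes: block count (%b) times the block unit stat uses (%B).
allocated=$(( $(stat -c %b sparse.img) * $(stat -c %B sparse.img) ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# The same two views through du.
du -h --apparent-size sparse.img
du -h sparse.img
```

On a filesystem with sparse file support, the allocated figure should be far below the apparent one, which is the same pattern as the 49M versus 21 GB above.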

Last edited by astrogeek; 12-19-2014 at 01:30 AM.
 
Old 12-19-2014, 01:48 AM   #3
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,654

UNIX/Linux native filesystems allow files to have "holes".

This is done partly for speed, and partly to save disk space.

When files are being written with records in a random order, there is no need to actually allocate (and write) disk blocks that are full of 0 byte values. It is only necessary to record that the space is virtually present, so that records keep their locations in the disk blocks that actually are written.
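
As a rough illustration of writing one record far into an otherwise empty file, here is a sketch assuming GNU dd, stat and od. The file name records.dat, the 4096-byte record size and the block index 25600 are arbitrary choices for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a single 4 KiB record at block index 25600 (byte offset 100 MiB).
# seek= skips that many output blocks without writing them, leaving a hole.
dd if=/dev/urandom of=records.dat bs=4096 seek=25600 count=1 status=none

apparent=$(stat -c %s records.dat)
allocated=$(( $(stat -c %b records.dat) * $(stat -c %B records.dat) ))
echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# Reading from the hole returns zero bytes even though nothing was stored.
hole_bytes=$(head -c 16 records.dat | od -An -tx1 | tr -d ' \n')
echo "first 16 bytes: $hole_bytes"
```

The apparent size covers the whole range up to the end of the record, while only the blocks that were really written need to be allocated.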

The original use was for hash access to disk files, specifically a dictionary lookup. Instead of storing every word in a dictionary, it used a master dictionary (root words only, with prefix and suffix identifications), and a hash table containing a single bit per entry: 1 for valid, 0 for error. To make lookups of root words fast, a "perfect hash" function was used for every root word entry. This hashing caused the lookup values to be spread over a 4GB range (much too large for the 5 MB disks in use in 1973), so to make them "fit", the huge number of blocks that had 0 bit values were just not written. If you did read them, you got a block with 0 values.

Now it did introduce a problem - if you copied the file the blocks would get allocated. And in 1973, you ran out of disk space when you did that.

Doesn't happen as often now - but for speed of writing (and reading), not having to perform the overhead of actually allocating/writing unused disk blocks is a big performance improvement for single record writes. It is still useful for hash functions too - now the hash can generate 64 bit values without running out of disk space.

Over time, cp (now derived/taken from the GNU project) has learned to recognize files with holes and to preserve the holes during copies (basically it scans the file for block boundaries where the data block contains nothing but nulls, then seeks to the next block boundary with data without writing anything in between).
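
GNU cp exposes this behaviour through its --sparse option. The following is a small sketch, assuming GNU coreutils; the file names are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A 64 MiB file consisting entirely of a hole.
truncate -s 64M holes.dat

# --sparse=always asks GNU cp to create holes wherever it finds runs of zeros;
# --sparse=never asks it to write every byte out.
cp --sparse=always holes.dat copy_sparse.dat
cp --sparse=never  holes.dat copy_full.dat

# Both copies have the same apparent size and the same contents...
size_sparse=$(stat -c %s copy_sparse.dat)
size_full=$(stat -c %s copy_full.dat)
same_content=no
cmp -s holes.dat copy_full.dat && same_content=yes
echo "sizes: $size_sparse $size_full, identical contents: $same_content"

# ...but du can report different allocated sizes for them.
du -h holes.dat copy_sparse.dat copy_full.dat
```

The copy made with --sparse=never is the modern equivalent of the 1973 problem described above: the data is the same, but every block gets allocated.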

The difference you see between ls and du is the difference between the logical size (the expected data size) and the physical size (the data actually recorded).

Even for files without holes you can see a difference - the ls command will show the size of the recorded data, and the du command will show the size of the data plus any overhead blocks needed to manage the storage of the data. This overhead is usually reported as "meta-data" as it is data that only tells the kernel where the actual data blocks are.

An example of such is shown in http://dysphoria.net/OperatingSystem...tion_unix.html. Not all filesystems are like this, but most Linux native filesystems are "close".

When the file is small, only direct pointers to data blocks are used. When the file is larger, it requires more pointers to data blocks, so additional blocks are allocated for indirect pointers. If the file is REALLY large, then blocks are allocated that only point to blocks containing pointers (double indirect). If the file is REALLY HUGE, then there are blocks that point to blocks that point to blocks containing pointers (triple indirect). In the example image, these are shown in various shades of blue, and the image shows this only for single and double indirect blocks.
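
To put rough numbers on those tiers, here is a small arithmetic sketch. The figures are assumptions in the style of classic ext2, not values taken from the linked diagram: 4 KiB blocks, 4-byte block pointers, and 12 direct pointers in the inode.

```shell
block_size=4096      # bytes per block (assumed)
ptr_size=4           # bytes per block pointer (assumed)
direct_ptrs=12       # direct pointers in the inode (assumed, ext2-style)

# How many pointers fit into one indirect block.
ptrs_per_block=$(( block_size / ptr_size ))

# Bytes of file data reachable through each tier.
direct_bytes=$(( direct_ptrs * block_size ))
single_bytes=$(( ptrs_per_block * block_size ))
double_bytes=$(( ptrs_per_block * ptrs_per_block * block_size ))
triple_bytes=$(( ptrs_per_block * ptrs_per_block * ptrs_per_block * block_size ))

echo "pointers per block: $ptrs_per_block"
echo "direct:          $(( direct_bytes / 1024 )) KiB"
echo "single indirect: $(( single_bytes / 1024 / 1024 )) MiB"
echo "double indirect: $(( double_bytes / 1024 / 1024 / 1024 )) GiB"
echo "triple indirect: $(( triple_bytes / 1024 / 1024 / 1024 / 1024 )) TiB"
```

Under these assumptions, a fully allocated file the size of the 22548586496-byte files in the first post would reach into the triple indirect tier, since it is larger than the direct, single indirect and double indirect ranges combined.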

The ls command shows only the byte length of the data. The du command shows the blocks used for the data, plus the blocks used for the overhead.
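
A quick way to watch du come out larger than ls is a very small file. This is a minimal sketch, assuming GNU coreutils; the name tiny.txt is made up. Note that what it mostly shows is space being handed out in whole blocks. On many filesystems the allocated figure is one whole block (often 4096 bytes), but some store very small files inside the inode and report 0, so no particular output is promised here.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A file holding a single byte of data.
printf 'x' > tiny.txt

# Byte length of the data, as ls -l reports it.
apparent=$(stat -c %s tiny.txt)

# Allocated bytes, as du counts them: block count (%b) times block unit (%B).
allocated=$(( $(stat -c %b tiny.txt) * $(stat -c %B tiny.txt) ))

echo "ls view: $apparent byte"
echo "du view: $allocated bytes"
```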

You can search for documentation on how the various filesystems are laid out. ext2/3 is similar to the above, ext4 adds extents that work a bit differently, and xfs differs more drastically (its target is really large volumes, where the overhead of indirect blocks would be huge, and it wanted more flexible volume sizing).
 
  

