LinuxQuestions.org


Zeno McDohl 06-02-2005 08:23 PM

Linux file sizes (shell)
 
I don't fully understand file sizes on *nix.
Code:

[zeno@boralis quests]$ du -ha Zeno*
4.0K    Zeno.qdt
[zeno@boralis quests]$ du -ba Zeno*
35      Zeno.qdt

It displays it as 4KB, but then with -b (bytes) it says 35. Can anyone explain this to me? I thought there were 1024 bytes in a kilobyte.

trickykid 06-02-2005 08:37 PM

It's reporting the space the file takes up on disk, not the length of its contents. Disk space is handed out in whole blocks, typically 4k each, so a file smaller than one block still occupies a full 4k. The -h in your first command stands for human readable; it only changes the units the number is printed in.
You'll notice that directories show up as 4k blocks as well... when their contents are probably far smaller than that.. ;)
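If you want to see both numbers for yourself, here is a rough sketch. The file name small.qdt is made up for the example, and the disk-usage figure depends on your filesystem and block size, so treat the comments as typical rather than guaranteed.

```shell
# Make a 35-byte file in a scratch directory (small.qdt is a made-up name).
tmpdir=$(mktemp -d)
printf '%035d' 0 > "$tmpdir/small.qdt"

# Apparent size: the number of bytes in the file.
apparent=$(( $(wc -c < "$tmpdir/small.qdt") ))
echo "apparent size: $apparent bytes"

# Disk usage: space allocated in whole blocks, printed in 1K units.
# Often 4 on a filesystem with 4K blocks; it can read 0 until the data
# has been flushed to disk, and some filesystems pack tiny files.
du -k "$tmpdir/small.qdt"

rm -rf "$tmpdir"
```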

Zeno McDohl 06-02-2005 08:40 PM

I see. So...
Code:

[zeno@boralis mud]$ du -b quests
6000    quests

In KB, what would that be? Is that result an accurate amount?

trickykid 06-02-2005 08:47 PM

Quote:

Originally posted by Zeno McDohl
I see. So...
Code:

[zeno@boralis mud]$ du -b quests
6000    quests

In KB, what would that be? Is that result an accurate amount?

Well, -b reports sizes in bytes. How many bytes are in a kilobyte? 1024, so divide by that. I hope you can do basic math.. ;)
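For the record, the conversion can be done in the shell too. This is only a sketch of the arithmetic, using the 6000 from your output:

```shell
bytes=6000

# Shell arithmetic is integer-only, so the remainder is dropped.
whole_kb=$(( bytes / 1024 ))
echo "$whole_kb KB (rounded down)"

# awk does floating-point math; LC_ALL=C keeps the decimal point a dot.
kb=$(LC_ALL=C awk -v b="$bytes" 'BEGIN { printf "%.2f", b / 1024 }')
echo "$kb KB"
```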

trickykid 06-02-2005 08:49 PM

But remember, different filesystems, and how you formatted them, will show different results. The byte count from -b is usually the most accurate figure; for larger totals the output in kilobytes is accurate enough. I've always noticed that the human readable form (-h) rounds the number to make it easier to read instead of showing a bunch of decimal places.

Zeno McDohl 06-02-2005 08:52 PM

Yeah, I was working on my shell and got a little confused because I forgot to use -b and it displayed ~300. I see now, thanks.

Zeno McDohl 06-02-2005 09:01 PM

Actually now I'm not sure.
Code:

[zeno@boralis mud]$ du -h
396K    ./quests

Code:

[zeno@boralis mud]$ du -b
6000    ./quests

Is the -h very inaccurate? This is why I'm thinking I can't do math. ;)
Because... 6000/1024 ~ 6. Not 396..?

pmarques 06-03-2005 07:08 AM

No, no... the -b option doesn't only give byte granularity, it also gives "apparent size". With just the -h option, "du" will report actual "disk usage".

Every file is composed of N blocks. A block is typically 4k bytes for a large enough drive, i.e. any modern drive.

However, most filesystems in Linux support what we call "sparse files".

If you open a file, seek to position X, and then write a byte, the apparent file size will be X + 1, but the real space occupied on disk won't include the blocks that were never written to; it will be just one block. Basically, only file positions that have been written to take up space on the disk, whereas positions that were never written are read back as all zeros.

You can try this yourself doing:

dd if=/dev/zero of=my_test_file bs=1 seek=4M count=1

This should produce a file with an apparent size of just over 4 MB (4 MiB plus one byte) that takes up only one or two blocks on disk.
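To check the result, something along these lines should work. It is a sketch that runs the same dd command in a scratch directory and then prints both sizes; the allocated figure assumes your filesystem supports sparse files.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/my_test_file"

# Seek 4 MiB into the file (the same offset as seek=4M) and write one byte.
dd if=/dev/zero of="$f" bs=1 seek=$((4 * 1024 * 1024)) count=1 2>/dev/null

# Apparent size in bytes: the seek offset plus the one byte written.
apparent=$(( $(wc -c < "$f") ))
echo "apparent size: $apparent bytes"

# Allocated space in 1K units; with sparse-file support this should be
# a few KB at most rather than roughly 4096.
du -k "$f"

rm -rf "$tmpdir"
```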

pmarques 06-03-2005 07:18 AM

Oops, I just read your post again and this wasn't your problem.

Your problem is that you're checking the size of a directory that is full of small files.

If you have a directory with 600 files of 10 bytes each, the apparent size is 6000 bytes, but the actual disk usage will be a block for each file: 600 * 4 KB = 2400 KB.
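You can reproduce that scenario with a throwaway directory. The sketch below uses arbitrary file names, and the number du prints will depend on your filesystem, so the 2400 KB is only the one-block-per-file estimate.

```shell
tmpdir=$(mktemp -d)

# Create 600 files of 10 bytes each (file names are arbitrary).
i=1
while [ "$i" -le 600 ]; do
    printf '0123456789' > "$tmpdir/file$i"
    i=$((i + 1))
done

# Sum of the file lengths, i.e. the apparent size of the contents.
total=$(( $(cat "$tmpdir"/* | wc -c) ))
echo "apparent total: $total bytes"

# Estimate assuming one 4K block per file.
estimate_kb=$((600 * 4))
echo "estimate at one 4K block per file: $estimate_kb KB"

# What du reports, in 1K units (this also counts the directory itself).
du -sk "$tmpdir"

rm -rf "$tmpdir"
```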

Different filesystems will handle this differently, however. That is why we have so many ;)

I remember that ReiserFS (I'm not sure if it was version 3 or 4) was trying to store small files together with the metadata information of the directory so that they wouldn't waste a full block, but I was never a ReiserFS fan.

If you use a lot of small files you can try to set the block size to a smaller value to have less "slack", at some performance cost. The block size is chosen when the filesystem is created, so changing it means reformatting, which isn't an easy operation if you don't have enough storage space to temporarily move all your files...
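Before going down that road it is worth checking what block size you have now. A minimal sketch, assuming GNU stat (the -f and -c options and the %S format are GNU coreutils; other stat implementations differ, in which case this just prints "unknown"):

```shell
# Fundamental block size of the filesystem holding the current directory.
bsize=$(stat -f -c %S . 2>/dev/null || echo unknown)
echo "block size: $bsize bytes"
```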


All times are GMT -5.