LinuxQuestions.org

H_TeXMeX_H 01-16-2011 11:03 AM

Portable sparse defrag by moving files using dd ?
 
I want to make a simple, portable sparse defragmentation script, but I'm not sure if the theory behind it is sound, so here's my theory:

Assumptions (all should be true, but check them):

dd can be used to create contiguous files, and is used to make swap files because of this.

Fragmentation occurs when files that exist increase in size so that they collide with the next file on the disk and must be split up.

Linux and Linux filesystems try to keep files sparse so that files have room to expand and the collision is less likely.

Theory:

For all files in the filesystem, I will use dd to make contiguous copies to replace the originals.

I use find to list all files on the filesystem ('find / -type f'), and for each one run:

Code:

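# rename the original, copy it back under the old name,
# and delete the renamed original only if the copy succeeded
mv file file.defrag &&
dd bs=4K if=file.defrag of=file &&
rm -f file.defrag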
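
Applied to a whole directory tree, the three commands above might look roughly like the sketch below. This is only an illustration under some assumptions: rewrite_file and rewrite_tree are made-up helper names, bs=4096 stands in for bs=4K, file names containing newlines are not handled, and the rewritten files get default ownership and permissions (hard links are broken too). Whether the new copy really ends up less fragmented is up to the filesystem.

```shell
#!/bin/sh
# Sketch only: rewrite one file via a temporary rename.
# rewrite_file and rewrite_tree are hypothetical helper names.
rewrite_file() {
    f=$1
    mv -- "$f" "$f.defrag" || return 1
    if dd bs=4096 if="$f.defrag" of="$f" 2>/dev/null; then
        rm -f -- "$f.defrag"
    else
        # copy failed (for example, disk full): put the original back
        mv -f -- "$f.defrag" "$f"
        return 1
    fi
}

rewrite_tree() {
    # Collect the file list first, so files created during the run
    # are not picked up again by find.
    list=$(mktemp) || return 1
    find "$1" -xdev -type f > "$list"    # -xdev: stay on one filesystem
    while IFS= read -r f; do
        rewrite_file "$f" || echo "skipped: $f" >&2
    done < "$list"
    rm -f -- "$list"
}
```

It would be called as 'rewrite_tree /some/directory', ideally on a test directory first.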

Requirements:

Space on the filesystem.

Maybe a live CD environment ?

And that should do it ... is it really that simple ? How can I know if this actually works ? Is there any way to see how fragmented files are in Linux ? This should theoretically work regardless of filesystem, as long as it is journaled and running Linux.

EDIT:
This is the usual way to sparse defrag a filesystem that doesn't have a defrag program, like JFS:
https://wiki.archlinux.org/index.php...ragmenting_JFS

It involves moving all files off the disk, recreating the filesystem, and moving them all back. I think it's excessive, and I don't have an extra HDD of large enough size.

stress_junkie 01-16-2011 11:06 AM

I don't think so. I believe that the dd command, as used in your script, will just create files the same way as any other method. I think that you would have to use the bs=<filesize> count=1 somewhere in there to achieve your goal, at least in theory.
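
A minimal sketch of what the bs=<filesize> count=1 variant could look like, assuming the file is small enough to fit in memory as one block (dd reads the whole file in a single read here, so this is not suitable for very large files). copy_one_block is a made-up helper name, and nothing guarantees that the filesystem will actually place the result contiguously.

```shell
#!/bin/sh
# copy_one_block SRC DST: copy SRC to DST using a single dd block.
# copy_one_block is a hypothetical helper name.
copy_one_block() {
    # wc -c prints the size in bytes; the arithmetic expansion strips padding
    size=$(($(wc -c < "$1"))) || return 1
    if [ "$size" -eq 0 ]; then
        : > "$2"    # dd rejects bs=0, so just create an empty file
    else
        dd if="$1" of="$2" bs="$size" count=1 2>/dev/null
    fi
}
```

In the original script this would take the place of the plain dd line, e.g. 'copy_one_block file.defrag file'.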

Don't forget that on some filesystems very small files reside entirely in the inode, so they don't suffer from fragmentation.

Single block files also do not suffer from fragmentation.

I wonder about the practical value of defragmenting files in UNIX/Linux. I've never had a performance problem on systems that have been running for years. I should probably phrase that as "I've never seen disk/file performance degrade over time on UNIX/Linux systems".

These days it is barely worth doing even on Windows systems since disks are so fast. I tell my clients to only defrag their files once a year or less and only if they really feel compelled to do it at all. On Windows systems the performance degradation comes from the registry getting messed up over time, not from file fragmentation. (If you reinstall Windows the machine will usually run as fast as the day it was purchased even though the files are severely fragmented.)

I remember when Winchester disk (PC disks) data throughput was measured in KB/sec. That was slow enough to make file defragmentation worthwhile. I don't see the value of it today.

NOTE: THIS NEXT PART IS INCORRECT!!!
And another thing. :) Don't forget that modern disks reorganize disk writes in their buffer at the hardware level to optimize their access to the platters thereby reducing the benefit of defragmenting files on disk since they are probably being (...partially at least...) defragmented (...or more importantly, optimized...) automatically in the disk buffer. (I'm not really sure if this is completely correct. Actually, the more I think about it the more I think it is not correct.)

JZL240I-U 01-17-2011 02:10 AM

I think your approach already fails at the "mv". As I understand it, the way you use it is just a rename (look at the upper section of http://linux.die.net/man/1/mv). You'll at least have to move the file to a different file system, i.e. another partition.
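
One way to check this is to compare the inode number before and after the rename. A minimal sketch, where inode_of and rename_keeps_inode are made-up helper names: if the inode is unchanged, the data blocks were not touched.

```shell
#!/bin/sh
# inode_of FILE: print the inode number of FILE (hypothetical helper name)
inode_of() { ls -i -- "$1" | awk '{print $1}'; }

# rename_keeps_inode FILE: rename FILE within its directory and back again;
# succeeds if the inode number stayed the same across the rename.
rename_keeps_inode() {
    before=$(inode_of "$1")
    mv -- "$1" "$1.defrag" || return 1
    after=$(inode_of "$1.defrag")
    mv -- "$1.defrag" "$1"
    [ "$before" = "$after" ]
}
```

Usage: 'rename_keeps_inode somefile && echo "same inode, nothing was copied"'.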

jhuizer 01-17-2011 05:54 AM

Quote:

Originally Posted by stress_junkie (Post 4226232)
And another thing. :) Don't forget that modern disks reorganize disk writes in their buffer at the hardware level to optimize their access to the platters thereby reducing the benefit of defragmenting files on disk since they are probably being (...partially at least...) defragmented (...or more importantly, optimized...) automatically in the disk buffer. (I'm not really sure if this is completely correct. Actually, the more I think about it the more I think it is not correct.)

I thought it was the Linux kernel reordering reads/writes for optimal performance.

JZL240I-U 01-17-2011 06:27 AM

Quote:

Originally Posted by jhuizer (Post 4227184)
I thought it was the Linux kernel reordering reads/writes for optimal performance.

Depends on the file system (and its version, i.e. age). For ext4 this is being moved back from the kernel to the file system in the coming / newest(?) version; about the others I don't know.

stress_junkie 01-17-2011 07:30 AM

Quote:

Originally Posted by jhuizer (Post 4227184)
I thought it was the Linux kernel reordering reads/writes for optimal performance.

Yes. You are correct.

I was confusing the way that a disk can reorder the read/writes in its buffer with file optimization. Actually the disk will only put the file fragments on the locations that the kernel tells it to place them. Therefore the disk buffer will not optimize file fragment placement on the disk.

What actually does happen in the disk buffer is a reordering of the read/write commands to optimize the head movement during execution of the list of commands. And this is only on newer disks.

I considered removing the erroneous part of my original post but I felt that it would be cheating. That is why I made those comments at the end.

H_TeXMeX_H 01-17-2011 10:59 AM

One other way I was considering was indeed moving files off the disk onto another disk and moving them back.

I also agree that fragmentation will be experienced mostly with larger files, so I'll probably find files greater than the block size ... right ?
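
Something like this could select those files. It is only a sketch: list_multiblock_files is a made-up name, and the default of 4096 bytes is an assumption taken from the bs=4K above, since the real block size depends on the filesystem.

```shell
#!/bin/sh
# list_multiblock_files DIR [BLOCKSIZE]
# Print regular files under DIR that are larger than BLOCKSIZE bytes.
# BLOCKSIZE defaults to 4096, which is an assumption, not a measured value.
list_multiblock_files() {
    find "$1" -xdev -type f -size +"${2:-4096}"c
}
```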

Still, why wouldn't copying files within the filesystem work ? A new copy of the file is made, and the file is NOT fragmented when the copy is made ... or at least I think it isn't.

That's the problem: I don't know of any tool to assess file fragmentation. If anyone knows of one, please list it. I know there's one for Window$, but that one is not very specific (like which exact files).

JZL240I-U 01-17-2011 11:02 AM

You might deign to read the link I provided...
;)

H_TeXMeX_H 01-17-2011 11:17 AM

I did. I rename the file with 'mv' then copy it to a new file with the original name using 'dd' (this one should in theory not be fragmented), then I delete the one I moved.

JZL240I-U 01-17-2011 11:23 AM

If you rename a file, its internal structure / fragmentation will not be affected.

My understanding is that "dd" faithfully reproduces the aforementioned structure, else you might as well use "cp" (why do you use "dd" and not "cp", by the way?). Perhaps you'd better ask here: http://www.linuxquestions.org/questi...ommand-362506/

H_TeXMeX_H 01-18-2011 10:06 AM

I was hoping dd would only create files that are not fragmented, as it does when creating a swap file. I don't know if cp does this.

Either way, when I have more time, I will try to mess with a script like this and post the results if I can quantify them in some way.

jefro 01-18-2011 04:00 PM

Why not use tar?

JZL240I-U 01-19-2011 02:39 AM

Well, I offered a link to the one true and real dd thread on LQ... ;)

H_TeXMeX_H 01-19-2011 11:22 AM

Quote:

Originally Posted by jefro (Post 4229248)
Why not use tar?

How would I use it here ?

H_TeXMeX_H 05-16-2011 04:19 AM

It looks like 'filefrag' was the program I needed to test this theory. So far it seems that a mere 'cp' will work and will defragment files. 'dd' also works to about the same extent.
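
A rough sketch of how such a comparison could be scripted (not necessarily the exact commands used for the test above). It assumes filefrag from e2fsprogs is installed and prints a summary line of the form "name: N extents found". parse_extents, extents_of and recopy_and_report are made-up helper names, and the result will vary with the filesystem and how much free space it has.

```shell
#!/bin/sh
# parse_extents: read filefrag output on stdin, print only the extent count.
parse_extents() {
    sed -n 's/.*: \([0-9][0-9]*\) extents\{0,1\} found.*/\1/p'
}

# extents_of FILE: number of extents filefrag reports for FILE.
extents_of() {
    filefrag "$1" | parse_extents
}

# recopy_and_report FILE: replace FILE with a fresh copy made by cp
# and print the extent count before and after.
recopy_and_report() {
    before=$(extents_of "$1")
    cp -p -- "$1" "$1.new" && mv -f -- "$1.new" "$1" || return 1
    sync    # flush the new copy to disk before measuring again
    after=$(extents_of "$1")
    echo "$1: $before -> $after extents"
}
```

Fewer extents after the copy would suggest the file is now less fragmented; an unchanged or higher count means the copy did not help for that file.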


All times are GMT -5.