[SOLVED] Copy and Remove files

JJJCR · 07-14-2014, 07:53 PM

Hello guys, I copied some files to another folder and noticed it took only a few minutes.
But when I removed the original files, it took more than an hour.

Now I really don't know whether all the files were actually copied.

I'm using SUSE Linux Enterprise Server 11.

Any ideas, guys, on how Linux does the copying in the background? Or is this normal?

Thanks.

sag47 · 07-14-2014, 08:31 PM

That is not normal latency for that operation; deleting files should be nearly instant. Try removing one file at a time under strace. With strace you'll be able to see which system call the command is hanging on. iostat can show you some I/O statistics as well. If top looks normal, then there is no good reason for an rm operation to take longer than a copy operation, unless it is enumerating millions of tiny files.
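A sketch of the "one file at a time" approach in plain shell, assuming GNU date (for `%N`). It flags any single rm that takes longer than a threshold; a directory where some files stand out is worth a closer look with `strace -T -e trace=unlink,unlinkat rm -- FILE`, where `-T` appends the time spent inside each system call. `DIR` and `SLOW_MS` are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch: remove files one at a time and flag any rm slower than a threshold.
# DIR and SLOW_MS are hypothetical placeholders; DIR is a scratch directory
# filled with demo files so the sketch runs as-is.
DIR=$(mktemp -d)
for i in 1 2 3; do printf 'msg %s\n' "$i" > "$DIR/$i.eml"; done
SLOW_MS=500

slow=0
for f in "$DIR"/*.eml; do
  start=$(date +%s%N)               # GNU date: nanoseconds since the epoch
  rm -- "$f"
  end=$(date +%s%N)
  ms=$(( (end - start) / 1000000 ))
  if [ "$ms" -ge "$SLOW_MS" ]; then
    echo "slow: $f took ${ms} ms"   # worth a closer look with strace -T
    slow=$((slow + 1))
  fi
done
echo "files at or above ${SLOW_MS} ms: $slow"
```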

JJJCR · 07-14-2014, 09:22 PM

Thanks sag47. I was using PuTTY when I was copying and removing the files.

So there's a good chance that all the files were copied successfully?

I will try checking as you suggested. Thanks for the idea.

sag47 · 07-14-2014, 10:36 PM

Quote:

Originally Posted by JJJCR

So there's a good chance that all the files were copied successfully?

Unfortunately my crystal ball is in the shop :-). If you're concerned about the integrity of a file copy operation then you can do a few things.

If it's only a single file or a few files, then run a checksum on the source and the destination (the md5sum command or any of the other *sum commands). For many files, a find one-liner works well here.
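A sketch of such a find one-liner, assuming GNU find and md5sum. Checksums are recorded with paths relative to the source and then checked against the same relative paths in the destination. `SRC` and `DST` are placeholders; here they are scratch directories so the sketch runs as-is:

```shell
#!/bin/sh
# Sketch: verify a finished copy by checksumming every file under SRC and
# checking the same relative paths under DST. SRC and DST are hypothetical
# placeholders; here they are scratch directories with demo files.
SRC=$(mktemp -d)
DST=$(mktemp -d)
printf 'one\n' > "$SRC/1.eml"
printf 'two\n' > "$SRC/2.eml"
cp -a "$SRC/." "$DST/"

SUMS=$(mktemp)
# Record a checksum for every regular file, using paths relative to SRC.
(cd "$SRC" && find . -type f -exec md5sum {} +) > "$SUMS"
# Re-check those relative paths inside DST; --quiet prints only failures.
(cd "$DST" && md5sum -c --quiet "$SUMS")
verify_status=$?
echo "md5sum -c exit status: $verify_status"   # 0 means every file matched
```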

If you're copying large numbers of files, then you're better off using a utility that checks integrity as part of the copy/sync process. rsync comes to mind for that.

Code:

rsync -av /src /dst

See the rsync man page for an explanation of the options. With rsync you can cancel the copy operation halfway through and start it again later; it will pick up where it left off.

If you're using PuTTY over a flaky network, consider using a terminal multiplexer such as screen or tmux. If you get disconnected from your session, the copy operation will continue, and when you connect to the server again you can reattach the disconnected session to view the result of the operation.

JJJCR · 07-14-2014, 10:52 PM

Hahaha, crystal ball in the shop...

Sorry for the vague question.

Actually, I just used the cp command to copy the files, and Linux went straight back to the prompt without any error message.

That is what it usually does when a cp command succeeds.

So, fingers crossed, I assumed that all the .eml files (there could be hundreds of them) were copied successfully.

Then I deleted the originals with the rm command, so as not to waste space on duplicate files.

But to my surprise it took much longer than I expected: cp finished in a few minutes, while rm took hours.

Anyway, if a user complains, it's time to pull the backup. Thanks for the help.
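A safer copy-then-delete pattern, sketched under the assumption of GNU cp and diff: remove the originals only if cp exits with status 0 and a recursive comparison finds no differences. cp prints nothing on success, so its exit status, not the absence of messages, is the real signal. `SRC` and `DST` are placeholders (scratch directories here):

```shell
#!/bin/sh
# Sketch: delete the originals only if the copy succeeded AND a recursive
# comparison finds no differences. SRC and DST are hypothetical placeholders;
# here they are scratch directories with demo files.
SRC=$(mktemp -d)
DST=$(mktemp -d)
printf 'one\n' > "$SRC/1.eml"
printf 'two\n' > "$SRC/2.eml"

# diff runs only if cp exited 0; the rm branch runs only if diff also exited 0
# (diff -rq exits 0 and prints nothing when the trees are identical).
if cp -a "$SRC/." "$DST/" && diff -rq "$SRC" "$DST"; then
  rm -rf -- "$SRC"
  echo "copy verified; originals removed"
else
  echo "copy or comparison failed; originals kept" >&2
fi
```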

sag47 · 07-14-2014, 11:11 PM

Quote:

Originally Posted by JJJCR

But to my surprise it took much longer than I expected: cp finished in a few minutes, while rm took hours.

Anyway, if a user complains, it's time to pull the backup. Thanks for the help.

To me it sounds like there is a more serious underlying problem worth investigating. I recommend consulting with peers on the issue, or continuing to hash it out on LQ until you discover the source of the slow rm operation. Which system call did the operation hang on under strace? What type of filesystem do the files reside on? What type of storage is the filesystem located on? Is it a network file share mount? There's a lot more that could be investigated.
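Two of those questions (filesystem type, and whether the folder sits on a local device or a network share) can be answered with GNU coreutils alone. A minimal sketch; `MAILDIR` is a placeholder for the folder holding the files, defaulting to /tmp only so the sketch runs:

```shell
#!/bin/sh
# Sketch: which filesystem type, and which device or share, holds MAILDIR?
# MAILDIR is a hypothetical placeholder for the real mail folder.
MAILDIR=${MAILDIR:-/tmp}

# Filesystem type as reported by the kernel (e.g. ext2/ext3, xfs, nfs).
fstype=$(stat -f -c %T "$MAILDIR")
# Backing device or remote share: first column of df's data line.
device=$(df -P "$MAILDIR" | awk 'NR == 2 { print $1 }')
echo "filesystem type: $fstype"
echo "backing device:  $device"
```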

JJJCR · 07-15-2014, 12:37 AM

Quote:

Originally Posted by sag47

To me it sounds like there is a more serious underlying problem worth investigating. I recommend consulting with peers on the issue, or continuing to hash it out on LQ until you discover the source of the slow rm operation. Which system call did the operation hang on under strace? What type of filesystem do the files reside on? What type of storage is the filesystem located on? Is it a network file share mount? There's a lot more that could be investigated.

I haven't run any diagnostic tools. The folder where the files reside is a Kerio user mail folder.

I'm not really sure whether the Kerio services were interfering with the rm command.

I did not try stopping the Kerio service before deleting the files; I removed the files while the Kerio service was running.

sag47 · 07-15-2014, 06:59 AM

Quote:

Originally Posted by JJJCR

I haven't run any diagnostic tools. The folder where the files reside is a Kerio user mail folder.

I'm not really sure whether the Kerio services were interfering with the rm command.

I did not try stopping the Kerio service before deleting the files; I removed the files while the Kerio service was running.

lsof is a good command for listing open file handles. Note that on Linux you can delete a file and it still takes up space as long as some process holds an open handle to it. For that reason, deleted files that are still held open by a program can be recovered from /proc.
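A small Linux-only demonstration of that behavior, using a scratch file (paths hypothetical). On a real system, `lsof +L1` lists open files whose link count is below 1, i.e. deleted files that some process still holds open:

```shell
#!/bin/sh
# Sketch: a file deleted while a process still holds it open keeps its data,
# and can be copied back out through /proc/<pid>/fd. Linux-specific.
WORK=$(mktemp -d)
echo "important mail" > "$WORK/msg.eml"

exec 3< "$WORK/msg.eml"     # this shell now holds an open handle (fd 3)
rm "$WORK/msg.eml"          # the directory entry is gone...

ls -l /proc/$$/fd/3         # ...and the link target now ends in "(deleted)"
cp /proc/$$/fd/3 "$WORK/recovered.eml"   # the data is still readable
exec 3<&-                   # closing the handle lets the kernel free the blocks
```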

jpollard · 07-15-2014, 02:24 PM

It really depends on the filesystem.

Most native Linux filesystems are quite fast at deleting files (they use a tree or sparse list for directory entries).

But non-native filesystems aren't so efficient: they usually (not always) store a directory as a simple linear file. Deleting a bunch of files in the wrong order (like always deleting the file at the beginning of the directory list) is slow, because the directory gets repacked on every delete. This is usually not a problem... unless you have thousands of files in a directory.
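Whether that case applies depends on how many entries a single directory holds. A sketch that counts files per directory, assuming GNU find (for `-printf`); `ROOT` is a placeholder for the mail store, here a scratch tree:

```shell
#!/bin/sh
# Sketch: list the directories holding the most files, to see whether one
# huge directory could explain a slow rm. ROOT is a hypothetical placeholder;
# here it is a scratch tree with demo files.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/big" "$ROOT/small"
for i in 1 2 3 4 5; do : > "$ROOT/big/$i.eml"; done
: > "$ROOT/small/1.eml"

# %h prints each file's parent directory; count files per directory,
# largest first.
counts=$(find "$ROOT" -type f -printf '%h\n' | sort | uniq -c | sort -rn)
echo "$counts" | head -n 10
```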

Sometimes it can also be due to the deallocation of data blocks, especially on a disk that is 100% full.
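The 100%-full case is quick to rule out. A sketch using GNU df; `TARGET` is a placeholder for the mail folder (defaulting to /tmp only so it runs). Inode usage is checked too, since a filesystem can run out of inodes before it runs out of blocks:

```shell
#!/bin/sh
# Sketch: how full is the filesystem holding TARGET, in blocks and inodes?
# TARGET is a hypothetical placeholder for the mail folder.
TARGET=${TARGET:-/tmp}

# -P gives one line per filesystem, so the 5th field is the Use% column.
block_use=$(df -P "$TARGET" | awk 'NR == 2 { print $5 }')
# Some filesystems have no fixed inode table and report "-" here.
inode_use=$(df -Pi "$TARGET" | awk 'NR == 2 { print $5 }')
echo "blocks used: $block_use, inodes used: $inode_use"
```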

frankbell · 07-15-2014, 07:18 PM

How large are the files in question? More details about precisely what you are trying to accomplish might help clarify this: are the files on the same partition or on separate ones, in a remote location, etc.?

As a general observation, if you are copying a file to another location on the same partition, the file does not actually get rewritten. The pointers to the location of the file are simply changed. That's why the copy process appears to complete so quickly: nothing really gets copied, it just gets a new address.

If you copy a file to another partition or another physical device, that takes some time because the file must be written to the target partition and then removed from the source partition. It's recreating the data that takes time.
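Comparing inode numbers makes this visible. Strictly speaking, the pointer-only update described above is what `mv` does within one filesystem: the directory entry changes while the inode and its data stay put. `cp` creates a new file and writes the data again, even on the same partition, unless a filesystem feature such as reflinks is in play. A minimal sketch using GNU stat and a scratch directory (all paths hypothetical):

```shell
#!/bin/sh
# Sketch: compare inode numbers to see which operation rewrites data.
# Uses a scratch directory; all paths are hypothetical.
WORK=$(mktemp -d)
mkdir "$WORK/a" "$WORK/b"
echo "hello" > "$WORK/a/msg.eml"

orig_inode=$(stat -c %i "$WORK/a/msg.eml")

cp "$WORK/a/msg.eml" "$WORK/b/copy.eml"      # cp writes a brand-new file
copy_inode=$(stat -c %i "$WORK/b/copy.eml")

mv "$WORK/a/msg.eml" "$WORK/b/moved.eml"     # same filesystem: just a rename
moved_inode=$(stat -c %i "$WORK/b/moved.eml")

echo "original: $orig_inode  copy: $copy_inode  moved: $moved_inode"
```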

JJJCR · 07-15-2014, 10:05 PM

Quote:

Originally Posted by frankbell

How large are the files in question? More details about precisely what you are trying to accomplish might help clarify this: are the files on the same partition or on separate ones, in a remote location, etc.?

As a general observation, if you are copying a file to another location on the same partition, the file does not actually get rewritten. The pointers to the location of the file are simply changed. That's why the copy process appears to complete so quickly: nothing really gets copied, it just gets a new address.

If you copy a file to another partition or another physical device, that takes some time because the file must be written to the target partition and then removed from the source partition. It's recreating the data that takes time.

Hi Frankbell, this would explain why copying was very fast. Thank you so much. It's good to know how the system works, but of course it would be best to know how the developers actually implemented it.

The files were copied within the same partition, just to a different location.

Thanks again, Frankbell.

sag47 · 07-16-2014, 08:26 PM

You've still not answered critical questions which would help us help you troubleshoot your filesystem. Can you answer them please?

frankbell · 07-16-2014, 08:59 PM

Quote:

Hi Frankbell, this would explain why copying was very fast. Thank you so much. It's good to know how the system works, but of course it would be best to know how the developers actually implemented it.

It's been like that since I first started mucking about with computers with DOS 3.2.

File information is stored in a table on each partition. In NTFS, it's called the Master File Table. In Linux filesystems, the entries are called inodes. In other filesystems, it might be called something else.

When you move a file within a partition, all that happens is that this table gets changed. There's no reason to physically move the bits and bytes when changing a database entry (the table is ultimately a database) has the same result for, as my father would have said, all practical purposes.

As to who came up with the idea and why, I don't know. It's shrouded in the mists of Unix history.

A web search for "how file systems work" will turn up many useful links, if you want to pursue this further.

Glad I could help.

JJJCR · 07-16-2014, 10:54 PM

Thanks, Frankbell. Sometimes I need to refresh my memory on how things work.

JJJCR · 12-11-2014, 07:07 PM

Quote:

Originally Posted by sag47

Unfortunately my crystal ball is in the shop :-). If you're concerned about the integrity of a file copy operation then you can do a few things.

If it's only a single file or a few files, then run a checksum on the source and the destination (the md5sum command or any of the other *sum commands). For many files, a find one-liner works well here.

If you're copying large numbers of files, then you're better off using a utility that checks integrity as part of the copy/sync process. rsync comes to mind for that.

Code:

rsync -av /src /dst

See the rsync man page for an explanation of the options. With rsync you can cancel the copy operation halfway through and start it again later; it will pick up where it left off.

If you're using PuTTY over a flaky network, consider using a terminal multiplexer such as screen or tmux. If you get disconnected from your session, the copy operation will continue, and when you connect to the server again you can reattach the disconnected session to view the result of the operation.

Thanks sag47.