
LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   i/o redirection destn file (moved, renamed, edited), will there be data corruption ? (https://www.linuxquestions.org/questions/linux-newbie-8/i-o-redirection-destn-file-moved-renamed-edited-will-there-be-data-corruption-943100/)

vikggg 05-03-2012 09:12 AM

I/O redirection destination file (moved, renamed, edited) - will there be data corruption?
 
I have a script running as a background process that redirects stdout and stderr to a file, like this:

script.py >> log/script.log 2>&1 &

Now, while the script was running, I moved/renamed the log file (since it was growing too big).

What I observed was that even after being renamed, the file kept growing in size.

--- Question 1: Can somebody explain this behavior? (My guess is that the FD remains unchanged.)

Then I cleaned up some garbage messages in it and saved it again.

I was expecting the system to detect the closed FD, create a new file, and continue the I/O redirection. (I learnt the hard way that I was wrong.)

--- Question 2: Saving the file most probably changed the file handle information. What happens to the background process that has the I/O redirection? Does it keep writing via the FD information it had from before? Should I expect data corruption on disk? How do I recover/repair without having to stop my script?

Thanks,
Vik

MensaWater 05-03-2012 09:31 AM

On most UNIX/Linux filesystems files have "names" and "inodes". It is the inode in which the data is actually stored. When you rename (mv) a file (on the same filesystem) you are simply putting a different name on the same inode. Any process that had the file "open" will continue to have it open, because the open file descriptor refers to the inode rather than the name.

This causes a common problem in that people often delete (rm) a log file but because it is "open" only the name is deleted - the inode is still in place so no space is recovered.

You can run "lsof <filename>" to see if any process has the file "open".

The proper way to do this is to stop the process that has the file open, rename or delete the old file, create a new empty file with the old name, then restart the process.

After that, if you chose to save the old file rather than delete it, you typically want to compress it (using something like gzip) so that it frees up space on the filesystem but remains available for review later if necessary. (Note that compression requires space for the old file AND the new compressed file until the compression is complete - sometimes you need to mv the file to another filesystem, do the compression there, then mv the compressed file back to the original filesystem.)

Note that when you use "mv" to move a file from one filesystem to another you are actually copying from the inode original filesystem to a different inode on the new filesystem then removing the old inode.
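The rename behavior described above is easy to demonstrate. The sketch below (a minimal illustration using a temporary directory, not the poster's actual log path) holds a file open in append mode, renames it, and shows that writes through the old file descriptor still land in the renamed file, because the fd tracks the inode:

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "script.log")

w = open(path, "a")             # the "background script" holds this fd open
w.write("before rename\n")
w.flush()

os.rename(path, path + ".old")  # mv on the same filesystem: new name, same inode

w.write("after rename\n")       # the open fd still points at that inode...
w.flush()

with open(path + ".old") as r:
    content = r.read()          # ...so both lines show up under the new name
w.close()
```

After the rename, the original name is gone, yet the renamed file keeps growing - exactly what the original poster observed.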

vikggg 05-03-2012 10:01 AM

lsof -p 27267

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
...
script.py 27267 root 1w REG 8,3 645903454 110985451 /home/scripts/logs/script.log~ (deleted)
script.py 27267 root 2w REG 8,3 645903454 110985451 /home/scripts/logs/script.log~ (deleted)
..

The script is still running and will keep emitting logging output. With the destination file now marked deleted, what happens to all that stdout/stderr? Hopefully it doesn't keep writing to the old inode (the disk usage might just keep growing, and those blocks might eventually get overwritten - data corruption), or does it just throw the stdout/stderr redirection out the window and move on?

The reason I ask is that I don't want to stop the script and re-run it (I would lose 2-3 days of execution time).

MensaWater 05-03-2012 10:22 AM

It DOES continue to write to the same inode. You can't change the stderr/stdout after the start of the process. Do NOT delete a file that is open. It does NOT free up space - it only makes it harder to find out what is using the space later. Once you delete such a file the only way to clear the inode is to stop the process that has it open. (This of course can also be done by a reboot because that will stop all processes.)

If you are running out of space and do NOT want to stop the process then your only option is to increase the size of the filesystem.

suicidaleggroll 05-03-2012 01:59 PM

This won't help you currently, but in the future you have a couple of options. Rather than having the script write each message to stdout and then you piping that into a logfile on execution, you could have the script write each message to the logfile directly. That would allow you to move/rename the logfile as needed without having to stop the script. Alternatively, you could write a logger script that reads from stdin and writes to a rotating logfile itself. Then rather than running "script.py >> logfile", you would run "script.py | logger"

war49 05-04-2012 09:48 AM

Quote:

Originally Posted by vikggg (Post 4669451)

script.py >> log/script.log 2>&1 &

If you're afraid that script.log will grow to be huge, maybe you can use database software to store script.py's output. Or, if you consistently use a log file (script.log), I think it would be better to rotate the log file and gzip the old copies (for example: /var/log/messages, /var/log/messages1.gz, /var/log/messages2.gz, etc.).
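The rotate-and-gzip approach can be automated with logrotate. A minimal snippet might look like the following (the path and size are assumptions; `copytruncate` is the directive that lets rotation happen without stopping the writing process, at the cost of possibly losing a few lines written during the copy):

```
/home/scripts/logs/script.log {
    size 100M
    rotate 5
    compress
    copytruncate
    missingok
}
```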

snowmobile74 05-04-2012 10:57 AM

Interesting, but what if you did this:

cp log/script.log log/script.log.2; echo "" > log/script.log
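This copy-then-truncate trick works here precisely because the script opened the log with `>>`, i.e. O_APPEND: after the truncate, each appending write lands at the new end of file rather than at the writer's old offset, so nothing is lost or corrupted. (One nit: `echo "" > file` leaves a single newline behind, whereas `: > file` empties the file completely.) A small sketch of the append-mode behavior, using a temporary path rather than the poster's log:

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "script.log")

w = open(path, "a")        # "a" = O_APPEND, the same mode ">>" uses
w.write("old noise\n")
w.flush()

os.truncate(path, 0)       # like ": > log/script.log" from another shell

w.write("fresh line\n")    # O_APPEND: the write lands at the new EOF (offset 0)
w.flush()

with open(path) as r:
    content = r.read()     # only the fresh line - no hole, no stale data
w.close()
```

Had the log been opened with plain `>` (no O_APPEND), the writer would have kept its old offset and the truncated file would become sparse, with a run of null bytes before the next write.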

