LinuxQuestions.org

ppostma1 03-01-2011 12:18 PM

File Replication Routine
 
I have been scripting inotify (using inotify-tools) to do file replication.
http://github.com/rvoicilas/inotify-tools/wiki

It works very well for casual use, such as editing or uploading files, but I have to make sure it stands up to a "tar bomb" when we install a new package, e.g. tar -xzf really_big_php_project.tar.gz

Code:

#!/bin/sh

# get the current path
CURPATH=$(pwd)

inotifywait -mr --timefmt '%d/%m/%y %H:%M' --format '%T %w %f %e' \
 -e close_write,delete,create /root/public_html/ | while read date time dir file evnt; do

  # 4913 is the scratch file vim creates to test whether a directory is writable
  if [ "$file" = "4913" ]; then
    continue
  fi

  FILECHANGE="${dir}${file}"
  #if [ ! -f "$FILECHANGE" ]; then
  #  echo "$FILECHANGE not found"
  #  continue
  #fi

  # convert absolute path to relative by stripping the leading "$CURPATH/"
  FILECHANGEREL=${FILECHANGE#"$CURPATH"/}

  var=$(echo $evnt | awk -F"," '{print $1,$2}')
  set -- $var

  # echo "${evnt} ($1 $2) was done on $FILECHANGEREL - ${file} in ${dir}"

  if [ "$1" = "DELETE" ]; then
      if [ "$2" = "ISDIR" ]; then
          echo "rmdir ${FILECHANGE}"
          # -n keeps ssh from reading the loop's stdin and swallowing queued events
          ssh -n root@web2.home "rmdir '${FILECHANGE}'"
      else
          echo "rm -f ${FILECHANGE}"
          ssh -n root@web2.home "rm -f '${FILECHANGE}'"
      fi
  elif [ "$1" = "CLOSE_WRITE" ]; then
      if [ -f "$FILECHANGE" ]; then
          echo "rsync ${FILECHANGE}"
          rsync --relative -vrae 'ssh -p 22' "$FILECHANGEREL" root@web2.home:/root/ < /dev/null

          echo "checking: -d: $date -t: $time -dir: $dir -f: $file -ev: $evnt -ch: $FILECHANGE -rel: $FILECHANGEREL"
      fi
  elif [ "$1" = "CREATE" ] && [ "$2" = "ISDIR" ]; then
      echo "mkdir ${FILECHANGE}"
      ssh -n root@web2.home "mkdir '${FILECHANGE}'"
  fi

echo ""

done

Obviously, it has to complete the current file transfer before it goes back to blocking and waiting for the next file event, so out of 200 files only 9 get transferred. (Yes, I know rsync will do dirs and deletes, but the direct command is more efficient.)

So I switched to a threaded approach (one background process per event):
Code:

#!/bin/sh

# get the current path
CURPATH=`pwd`

inotifywait -mr --timefmt '%d/%m/%y %H:%M' --format '%T %w %f %e' \
 -e close_write,delete,create /home/rbcmin/public_html/ | while read date time dir file evnt; do

./handle.bs "$CURPATH" "$date" "$time" "$dir" "$file" "$evnt" &

done

This returns to the block/wait much quicker and transferred 120 of the 200 files, but it was considerably hard on the network, slowing our web access for a moment.

I'm not sure how to get a script to catch ALL files, queue them, and transfer them one by one (non-threaded) to be gentle on the network. Something along the lines of:
inotifywait -mr --timefmt '%d/%m/%y %H:%M' --format '%T %w %f %e' \
-e close_write,delete,create /root/public_html/ > file > while read date

jmajor 03-01-2011 06:54 PM

Your solution might be to write a small C program in which thread1 listens for changes and builds a list, while thread2 picks up the list as often as it can and fires off an rsync. This would let you use rsync's bandwidth limit feature. Thread1 would not miss events, and you would not kill CPU / RAM / swap usage with one thread per file.

In the tar bomb example, thread2 would pick up a few files as it starts; when those were done, it would return to find a big batch waiting and hand rsync a good-sized batch.
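
A minimal sketch of that two-thread layout, written in Python rather than C purely for brevity. One thread only enqueues paths; the other drains whatever has piled up and hands it over as a single batch. The function names, the sentinel used to stop the worker, and the rsync options chosen (--bwlimit, --files-from) are assumptions for illustration, not anything taken from the scripts above. The transfer step is passed in as a function so the sketch runs without a network.

```python
import queue
import threading

def build_rsync_cmd(list_file, dest, kbps=500):
    """Build one rsync call for a whole batch (assumed options, adjust to taste)."""
    return ["rsync", "-a", "--relative", "--bwlimit=%d" % kbps,
            "--files-from=%s" % list_file, ".", dest]

def drain(q):
    """Block for the first path, then take everything else already queued."""
    batch = [q.get()]
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            return batch

def consumer(q, transfer, stop):
    """Thread 2: send one batch at a time so only one transfer hits the network."""
    while True:
        batch = drain(q)
        done = any(item is stop for item in batch)
        paths = sorted(set(p for p in batch if p is not stop))  # drop duplicates
        if paths:
            transfer(paths)
        if done:
            return

def run_demo(paths):
    """Thread 1 stands in for the inotify listener; it only enqueues."""
    q, stop, sent = queue.Queue(), object(), []
    worker = threading.Thread(target=consumer, args=(q, sent.append, stop))
    worker.start()
    for p in paths:
        q.put(p)
    q.put(stop)          # hypothetical shutdown marker
    worker.join()
    return sent          # list of batches, each a sorted list of paths
```

In real use, transfer would write the batch to a temporary list file and run the command from build_rsync_cmd with subprocess. How the paths are split into batches depends on timing, which is the point: a slow transfer simply means the next batch is bigger.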

ppostma1 02-03-2012 04:00 PM

python fix:
 
This is my basic but fully functional prototype:

import os
from pyinotify import WatchManager
from pyinotify import Notifier
from pyinotify import ProcessEvent
from pyinotify import EventsCodes

wm = WatchManager()

# print EventsCodes.ALL_FLAGS
# print "begin"
# print getattr(EventsCodes(), '__dict__')
# print dir(EventsCodes)
# print "hello"
# print EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE']


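# look every flag up in ALL_FLAGS so the mask does not depend on EventsCodes attributes
mask = (EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE'] | EventsCodes.ALL_FLAGS['IN_CREATE'] |
        EventsCodes.ALL_FLAGS['IN_MOVE_SELF'] | EventsCodes.ALL_FLAGS['IN_MOVED_FROM'] |
        EventsCodes.ALL_FLAGS['IN_MOVED_TO'])  # watched events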
# mask = EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE'] | EventsCodes.ALL_FLAGS['IN_CREATE'] # watched events


class PTmp(ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        print("CloseWrite: %s" % os.path.join(event.path, event.name))

    def process_IN_CREATE(self, event):
        print("Create: %s" % os.path.join(event.path, event.name))
        print(dir(event))
        print("dir %s" % event.dir)
        print("mask %s" % event.mask)
        print("maskname %s" % event.maskname)
        print("name %s" % event.name)
        print("path %s" % event.path)
        print("pathname %s" % event.pathname)
        print("")

    def process_IN_MOVE_SELF(self, event):
        print("move self: %s" % os.path.join(event.path, event.name))
        print(dir(event))
        print("dir %s" % event.dir)
        print("name %s" % event.name)
        print("path %s" % event.path)
        print("pathname %s" % event.pathname)
        print("")

    def process_IN_MOVED_FROM(self, event):
        print("movefrom: %s" % os.path.join(event.path, event.name))
        print(dir(event))
        print("dir %s" % event.dir)
        print("name %s" % event.name)
        print("path %s" % event.path)
        print("pathname %s" % event.pathname)
        print("")

    def process_IN_MOVED_TO(self, event):
        print("moveto: %s" % os.path.join(event.path, event.name))
        print(dir(event))
        print("dir %s" % event.dir)
        print("name %s" % event.name)
        print("path %s" % event.path)
        print("pathname %s" % event.pathname)
        print("")

notifier = Notifier(wm, PTmp())

wdd = wm.add_watch('/path/x', mask, rec=True, auto_add=True)


while True:  # loop forever
    try:
        # process the queue of events
        notifier.process_events()
        if notifier.check_events():
            # read notified events and enqueue them
            notifier.read_events()
        # you can do some tasks here...
    except KeyboardInterrupt:
        # destroy the inotify instance on this interrupt (stop monitoring)
        notifier.stop()
        break


It prints out what needs to be done as a debug/test.
I was able to pipe the output to a parser that executed the commands.
One caveat is how Linux handles files by default (for live environments). Running tar -xzf x.tar.gz extracts big.mp3, but it does so in three steps:
creates file tmp.file
fills file tmp.file
mv tmp.file big.mp3

So a full capture of directory and file events reports a create and a change/write on tmp.file, which no longer exists by the time the parse/execute-sync step checks for it. My handling of the problem was: if the file does not exist, continue.
If multiple changes to the same file are queued before the parse/execute-sync step runs, the first change found in the queue triggers a sync of the final file, and the subsequent sync calls return "up to date".

To optimize it, I would search ahead in the queue of events and, if the same file is changed again, call continue, so that only the last event on a file is handled.
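
A rough sketch of that look-ahead, assuming the queued events are available as a list of (event_name, path) tuples, oldest first. The tuple layout, the event labels and both function names are hypothetical; the prototype above only prints its events, so some parsing step would have to produce this list.

```python
import os

def coalesce(events):
    """Keep only the last queued event for each path, preserving queue order."""
    last_index = {}
    for i, (_, path) in enumerate(events):
        last_index[path] = i          # later events overwrite earlier ones
    return [ev for i, ev in enumerate(events) if last_index[ev[1]] == i]

def still_relevant(event, exists=os.path.exists):
    """Skip writes/creates whose file is already gone (e.g. a renamed temp file)."""
    name, path = event
    return name == "DELETE" or exists(path)

queued = [
    ("CREATE", "tmp.file"),
    ("CLOSE_WRITE", "tmp.file"),
    ("CLOSE_WRITE", "index.php"),
    ("CLOSE_WRITE", "index.php"),
]
# One entry per path survives; the existence check is then applied to each.
to_sync = [ev for ev in coalesce(queued) if still_relevant(ev)]
```

Deletes are let through without the existence check, since the file being absent is exactly what a delete is supposed to replicate.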


Also note that, depending on your Python version, object variables may not behave as expected. In some versions, a class that sets a class-level variable sets both the 'external' and the 'internal' value. In later versions only the internal one is set, so reading the static class variable (this.IN_CLOSE_WRITE, EventsCodes.IN_CLOSE_WRITE) from the same file the class is declared in returns the expected value, but reading it (EventsCodes.IN_CLOSE_WRITE) from another file returns an empty value. I modified the inotify Python code (which has 2 files) so that, where it sets the class variable inside the class file (which becomes internal), it also sets the public value from the other file, so that all values behave the same:
class EventsCodes:
    FLAG_COLLECTIONS = {'OP_FLAGS': {
        'IN_ACCESS' : 0x00000001,

...

EventsCodes.IN_ACCESS = 0x00000001


(I'm no Python expert, nor do I want to be, but I tried the obvious EventCodes.FLAG_COLLECTION.IN_ACCESS, EventCodes.FLAG_COLLECTIONS{IN_ACCESS} and every permutation recommended to me. Variable dumps from within the class showed everything was set, while dumps from outside the class, on either the class or a running object, reported no such values attached.)
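
For what it's worth, FLAG_COLLECTIONS in the excerpt above is a dict of dicts, so its entries are reached with quoted keys in square brackets rather than with dots or braces. The sketch below uses a small stand-in class with the same layout (it is not pyinotify itself, and flag and build_mask are hypothetical helpers) to show the lookup and how a mask could be built from names alone. The numeric values are the standard inotify constants.

```python
from functools import reduce

class EventsCodes:
    """Stand-in with the same nested-dict layout as the excerpt (not pyinotify)."""
    FLAG_COLLECTIONS = {'OP_FLAGS': {
        'IN_ACCESS':      0x00000001,
        'IN_CLOSE_WRITE': 0x00000008,
        'IN_MOVED_FROM':  0x00000040,
        'IN_MOVED_TO':    0x00000080,
        'IN_CREATE':      0x00000100,
    }}

def flag(name):
    """Look a flag up by name in whichever collection holds it."""
    for collection in EventsCodes.FLAG_COLLECTIONS.values():
        if name in collection:
            return collection[name]
    raise KeyError(name)

def build_mask(*names):
    """OR several named flags into one watch mask."""
    return reduce(lambda acc, n: acc | flag(n), names, 0)

# Direct subscript access: collection name first, then flag name.
in_access = EventsCodes.FLAG_COLLECTIONS['OP_FLAGS']['IN_ACCESS']

mask = build_mask('IN_CLOSE_WRITE', 'IN_CREATE', 'IN_MOVED_TO')
```

The prototype's own mask line already reads some flags through EventsCodes.ALL_FLAGS['...'], which is the same idea with the collections flattened into one dict, so that form may avoid the need to patch the library at all.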


All times are GMT -5.