LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 08-17-2011, 02:59 PM   #1
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Rep: Reputation: 49
how to scp files only after they've been there for X seconds


Hi everybody. I have a program that outputs files to a certain directory, let's say /data/output.

The only problem is I need to copy those files to another server. What I'm hoping to do is scp them to that other server, then move them out of /data/output to /data/SentToOtherServer after they are scp'd. The catch is that the files are huge, so they take a long time to write into /data/output, and I don't want anything grabbing them before they are fully written. Is there something I can do timestamp-wise, so they aren't scp'd unless a timestamp on the file is older than, say, 30 seconds, which would mean it's finished writing?
 
Old 08-17-2011, 04:30 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
How about instead using inotify to watch the files for the "close_write" event?
 
Old 08-17-2011, 04:38 PM   #3
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
I'm not familiar with inotify; I'd never even heard of it until I read your post. There is a man page for it on my server, but I can't say I really understand how to use it.
 
Old 08-18-2011, 12:25 AM   #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
While you haven't asked a question, try 'inotifywait --monitor --recursive --quiet --csv --event close_write /data/output' as an example.
 
Old 08-18-2011, 11:15 AM   #5
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
OK, looking at the man page for inotify, I kind of get what that would do. But how do I tie these commands into a .sh script that would scp the file(s) and then move them to another folder? I can write the scp command and the mv command, but I don't know how to tie it all together so it doesn't grab files before they are fully written. Could anyone provide an example?
 
Old 08-18-2011, 11:34 AM   #6
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
Or is there something like -cmin that works in seconds, so you could use that to do a find with?
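For reference, GNU find can express an age cutoff in seconds via `-newermt` with a relative date string (this is a GNU extension, not POSIX; the demo directory and file names below are just for illustration):

```shell
# Demo directory (hypothetical): one file older than 30 seconds, one fresh.
mkdir -p /tmp/agedemo
touch -d '2 minutes ago' /tmp/agedemo/old.dat
touch /tmp/agedemo/new.dat

# GNU find: regular files whose mtime is NOT within the last 30 seconds,
# i.e. presumably finished writing. Only old.dat matches.
find /tmp/agedemo -maxdepth 1 -type f ! -newermt '-30 seconds'
```

The resulting list could then be fed to scp and mv the same way a -mmin based find would be.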
 
Old 08-18-2011, 04:21 PM   #7
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by rjo98 View Post
OK, looking at the man page for inotify, I kind of get what that would do. But how do I tie these commands into a .sh script that would scp the file(s) and then move them to another folder? I can write the scp command and the mv command, but I don't know how to tie it all together so it doesn't grab files before they are fully written. Could anyone provide an example?
If you use inotifywait to detect close_write, you get the event after the writer has closed the file. There is no need for any additional waiting; the file has been closed (for writing) already, and should therefore be ready for copying.

inotifywait is part of the inotify-tools package, and has its own man page (once you install the package). The inotify man pages describe the API, whereas inotifywait is a shell command.

Try running this in the /data/output directory:
Code:
inotifywait -mrq -e close_write --format '%w%f' . | xargs -I COMPLETEDFILE scp COMPLETEDFILE user@remote:path/COMPLETEDFILE
It will transfer just one file at a time, but keeps the directory structure intact. If you want (unlimited) parallel SCPs, you could use
Code:
inotifywait -mrq -e close_write --format '%w%f' . | while read FILE ; do ( scp "$FILE" "user@remote:path/$FILE" & ) ; done
For a reliable service, you do need something a bit more complex. At startup, I'd check the names, sizes, and SHA1 sums of all local files, and compare them to the remote ones. You will also need to buffer the inotifywait output somehow, to make sure you won't miss any events; it has a limited-size buffer, and will discard events if you don't process them fast enough.

You might wish to take a look at the incron package.
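The startup comparison could be sketched along these lines; everything here is hypothetical demo data, and in a real setup the second manifest would be fetched from the remote host over ssh rather than built locally:

```shell
# Build two demo manifests (hypothetical paths; the 'remote' side is faked locally).
mkdir -p /tmp/demo/local /tmp/demo/remote
printf 'aaa' > /tmp/demo/local/a.dat
printf 'bbb' > /tmp/demo/local/b.dat
printf 'aaa' > /tmp/demo/remote/a.dat      # a.dat was already transferred

# Manifests must be sorted on the whole line for comm(1) to work.
( cd /tmp/demo/local  && sha1sum -- * | sort ) > /tmp/demo/local.sha1
( cd /tmp/demo/remote && sha1sum -- * | sort ) > /tmp/demo/remote.sha1

# Lines present only in the local manifest = files still missing
# (or differing) on the remote side -- here, just b.dat.
comm -23 /tmp/demo/local.sha1 /tmp/demo/remote.sha1
```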
 
Old 08-18-2011, 04:43 PM   #8
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
Thanks for the examples, Nominal. Combining commands always confuses me. A couple of questions though: if I would be doing this from a .sh file (I'm assuming), how do I force it to run in the /data/output directory all the time? I also need to move the file to another folder once it's been SCP'd to the other server, but I don't think something like that is in the example, or is it?

Appreciate the example and help.
 
Old 08-18-2011, 09:40 PM   #9
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by rjo98 View Post
Thanks for the examples, Nominal. Combining commands always confuses me. A couple of questions though: if I would be doing this from a .sh file (I'm assuming), how do I force it to run in the /data/output directory all the time? I also need to move the file to another folder once it's been SCP'd to the other server, but I don't think something like that is in the example, or is it?
In a shell script, use cd to change the current working directory, like you always do. The working directory is private to each process, so changing it in one script or shell does not change it in any other.
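A quick way to see that for yourself: a cd inside a subshell does not leak out to the parent.

```shell
cd /tmp
# The subshell gets its own working directory; the parent is unaffected.
( cd / && pwd )   # prints /
pwd               # still prints /tmp
```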

Consider the following Bash script.
Code:
#!/bin/bash

# Directory watched for completed files.
# Subdirectories are not watched.
INCOMING=/data/output

# SCP target for files.
# Note: all files end up in this same directory.
# Password authentication will not work, you need
# to set up authentication keys.
REMOTE=user@remote:/directory/

# Directory scp'd files are moved to.
# Note: all files end up in this same directory.
COMPLETED=/data/sent

# Extra SCP options. Use blowfish, only try 5 secs to connect.
SCPOPTS=(-c blowfish -o ConnectTimeout=5)

# Paths are relative to INCOMING directory.
cd "$INCOMING" || exit $?

# Wait for completed files in the INCOMING directory.
inotifywait -mq -e close_write --format '%f' . | while IFS= read -r FILE ; do

        # Only consider normal files.
        [ -f "$FILE" ] || continue

        # Try to transfer the file(s) using SCP.
        if ! scp "${SCPOPTS[@]}" "$FILE" "$REMOTE" ; then
                printf '%s scp-failure %s\n' "$(date '+%Y-%m-%d %T %z')" "$FILE"
                continue
        fi

        # SCP was successful. Move the file. May overwrite an old one.
        if ! mv -f "$FILE" "$COMPLETED" ; then
                printf '%s mv-failure %s\n' "$(date '+%Y-%m-%d %T %z')" "$FILE"
                continue
        fi

        # Success.
        printf '%s success %s\n' "$(date '+%Y-%m-%d %T %z')" "$FILE"
done
It will output a list of files (closed after being open for writing). The first three fields contain the date, time, and timezone (numeric); the fourth field contains 'success', 'scp-failure', or 'mv-failure'; the fifth contains the file name.

The script will never exit by itself; you need to kill it, e.g. via
Code:
kill -HUP $(ps -C inotifywait -o pid=)
but if you have more than one running, that will kill all of them.

It is quite possible to wrap the above around some job or script, so that close_write events are only watched while the other job/script runs, and afterwards everything is cleaned up -- including scp'ing and moving any files the monitoring might have missed. That will make the script even more complicated, though.

You should also consider what to do with errors, for example if you run out of disk space: should you just output the error, or send an e-mail message? Note that the inotify-tools package is not installed by default on most Linux distributions. If you are a Linux cluster user, first contact your cluster admins to ask whether inotify-tools is installed, and which command-line utility, if any, you can use to send mail from compute nodes. E-mail is not always possible from compute nodes, or may only be possible via a specific command-line client, e.g. /bin/sendmail.
 
Old 08-19-2011, 09:34 AM   #10
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
Wow, that's pretty intense, and impressive! I never would have figured any of that out, haha. Do those printfs just put stuff up on the screen? I'm guessing I already have inotify-tools installed, because I was able to pull up the man pages for this stuff.

So since I would cron this, I really don't need the printfs if they just write to the screen, since nobody would see them while this runs constantly in the background?

Guess there's a lot more to think about than what I simply posted in my original post!
 
Old 08-19-2011, 09:42 AM   #11
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
My original line of thought was to somehow do a find like

find /data/output/* -type f -cmin +1

even though I'd really like to do it right after the file is closed, or a few seconds after, kind of like what this inotify stuff does; then do the scp command, then move the file to /data/sent.

I guess that's kind of simplistic, doesn't account for errors, and isn't very glamorous. Plus I have no idea how to combine them all to work right. Just figured I'd give more background.
 
Old 08-20-2011, 11:12 AM   #12
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by rjo98 View Post
Do those printfs just put stuff up on the screen?
Yes, they are there only as informative output; you can just as well remove them altogether.

Quote:
Originally Posted by rjo98 View Post
So since i would cron this
Well, I wouldn't. Just remove the printfs and let it run all the time. The script does not use much RAM, and it only uses CPU time when something happens. (It does not busy-wait; it blocks on its input when there is nothing to do.)
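One common way to "let it run all the time" is to start it detached with nohup; the watcher command and log path below are hypothetical stand-ins, and to survive reboots you would start it from an init script or (on many crons) an @reboot entry.

```shell
# Start the watcher detached from the terminal, appending all output to a log.
# A trivial stand-in command is used here so the snippet is self-contained;
# in real use you would launch the actual watcher script instead.
nohup sh -c 'echo watcher started' >> /tmp/watcher.log 2>&1 &
wait $!     # only for this demo; a real watcher keeps running in the background
grep 'watcher started' /tmp/watcher.log
```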

You might add another script to cron, to do the same for files that have not been modified in the last N minutes (say, a few hours), so that you "catch" anything the monitoring missed, or could not transfer for some reason. Basically,
Code:
#!/bin/bash
cd /data/output || exit $?
find . -maxdepth 1 -type f -mmin +N -print0 | while IFS= read -r -d '' FILE ; do
        scp "$FILE" "user@remote:path/" || continue
        mv -f "$FILE" /data/completed
done
Note that it may be necessary to add a running flag (say, /var/run/scp-backup.pid containing the PID of the running process) and check whether another copy of the same script is still alive, if the transfers may take longer than your cron interval. Otherwise cron may start another copy of the script while the old one is still running.
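As an alternative to a hand-rolled PID file, util-linux's flock(1) gives the same "only one copy at a time" guarantee with less code (the lock file path here is hypothetical):

```shell
#!/bin/bash
# Take an exclusive, non-blocking lock on file descriptor 9.
# If another copy of the script already holds the lock, exit quietly.
exec 9> /tmp/scp-catchup.lock
flock -n 9 || { echo "another copy is already running" >&2; exit 0; }

# ... the find | scp | mv work would go here, protected by the lock ...
echo "lock acquired, doing the transfers"
```

The lock is released automatically when the script exits, so there is no stale-PID-file problem after a crash or reboot.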

My personal approach to issues like this is much more careful than most. I tend to assume problems will occur, and try to handle them in a useful manner. Your initial idea might work well for you, without any issues, if you happen to select a large enough age limit. My environments tend to vary too much for a simple age limit to work reliably, so I've had to find more reliable methods. They are obviously a bit more complex, but I think their robustness more than makes up for the added complexity.
 
Old 08-21-2011, 07:28 PM   #13
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
Thanks for replying, Nominal. I agree your approach is probably better suited than my very basic idea from the get-go; I was just posting that to show my thought process.
I'm not sure how to let it run all the time in the background, though. Also, if the server is restarted, would however you set that up automatically restart it as well, so it would resume the process?
I'm also confused by your "note that it...", since I'm not sure how you use a PID file, and I thought you said not to cron it (even though I don't know how to make it run all the time like you said).

Sorry for all these questions, but I appreciate you answering them all.
 
Old 08-22-2011, 01:53 PM   #14
anon091
Senior Member
 
Registered: Jun 2009
Posts: 1,795

Original Poster
Rep: Reputation: 49
I'm just afraid this approach may be too far over my head, and I wouldn't be able to support it. But maybe after you answer those questions I'll understand it better. Thanks again.
 
Old 08-22-2011, 01:58 PM   #15
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
Here is a simple solution...

Quote:
Originally Posted by rjo98 View Post
Hi everybody. I have a program that outputs files to a certain directory, let's say /data/output.

The only problem is I need to copy those files to another server. What I'm hoping to do is scp them to that other server, then move them out of /data/output to /data/SentToOtherServer after they are scp'd. The catch is that the files are huge, so they take a long time to write into /data/output, and I don't want anything grabbing them before they are fully written. Is there something I can do timestamp-wise, so they aren't scp'd unless a timestamp on the file is older than, say, 30 seconds, which would mean it's finished writing?
Well... if you have a script that outputs the files, you could always use the && operator, which runs a command *only* if the previous one completes successfully. So:

Code:
./write_files_to_directory.sh && scp * 192.168.1.1:/tmp/
would write the files to the directory, and then, when complete, scp them to 192.168.1.1:/tmp/.
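The && short-circuit behaviour is easy to check in a throwaway shell; the right-hand command runs only when the left-hand one exits with status 0:

```shell
# && runs its right-hand side only if the left-hand side succeeds (exit 0).
true  && echo "this line is printed"
false && echo "this line is never printed"
# The failed && list just has a non-zero status; execution continues.
echo "done"
```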
 
  

