how to scp files only after they've been there for X seconds
Hi everybody. I have a program that outputs files to a certain directory, let's say /data/output.
The only problem is I need to copy those files to another server. What I'm hoping to do is scp them to that other server, then move them out of /data/output to /data/SentToOtherServer after they are scp'd. The problem is that the files are huge, so they take a long time to write into /data/output, and I don't want anything grabbing the files before they are fully written. Is there something I can do timestamp-wise so they aren't scp'd unless the timestamp on a file is older than, say, 30 seconds, which would mean it's finished writing?
I'm not familiar with inotify; I'd never even heard of it until I read your post. There's a man page for it on my server, but I can't say I really understand how to use it.
OK, looking at the man page for inotify, I kind of get what it would do. But how do I tie these commands into a .sh script that would scp the file(s) and then move them to another folder? I can write the scp command and the mv command, but I don't know how to tie it all together so it doesn't grab files before they are fully written. Could anyone provide an example?
If you use inotifywait to detect close_write, you get the event after the writer has closed the file. There is no need for any additional waiting; the file has been closed (for writing) already, and should therefore be ready for copying.
inotifywait is part of the inotify-tools package, and has its own man page (after you install the package). The inotify man pages describe the kernel API, whereas inotifywait is a shell command.
For a reliable service, you do need something a bit more complex. At startup, I'd check the names, sizes, and SHA1SUMs of all local files, and compare them to remote ones. You will need to buffer the inotifywait output somehow, to make sure you won't miss any events; it has a limited-size buffer, and will discard events if you don't process them fast enough.
You might wish to take a look at the incron package.
Thanks for the examples, Nominal. Combining commands always confuses me. A couple of questions, though: if I'm doing this from a .sh file (I'm assuming), how do I force it to run in the /data/output directory all the time? I also need to move each file to another folder once it's been scp'd to that other server, but I don't think something like that is in the example, or is it?
In a shell script, use cd to change the current working directory, like you always do. The working directory is process-specific (private to each process), so changing the directory in one process does not change it in any other process.
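A quick sketch you can paste into any shell to see that for yourself; the parentheses create a subshell (a child process), so its cd does not leak back into your shell:

```shell
# Each process has its own working directory: a cd inside a
# subshell (the parentheses / command substitution below) does
# not affect the parent shell.
start_dir=$(pwd)
subshell_dir=$( cd /tmp && pwd )   # change directory in a child process
current_dir=$(pwd)                 # parent is unaffected
echo "$subshell_dir"
echo "$current_dir"
```

This is why the script below can safely cd into /data/output: it only changes its own working directory, not that of whatever launched it.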
Consider the following Bash script.
Code:
#!/bin/bash
# Directory watched for completed files.
# Subdirectories are not watched.
INCOMING=/data/output
# SCP target for files.
# Note: all files end up in this same directory.
# Password authentication will not work, you need
# to set up authentication keys.
REMOTE=user@remote:/directory/
# Directory scp'd files are moved to.
# Note: all files end up in this same directory.
COMPLETED=/data/sent
# Extra SCP options. Use blowfish, only try 5 secs to connect.
# (Note: the blowfish cipher has been removed from newer OpenSSH
# releases; drop "-c blowfish" or pick a supported cipher there.)
SCPOPTS=(-c blowfish -o ConnectTimeout=5)
# Paths are relative to INCOMING directory.
cd "$INCOMING" || exit $?
# Wait for completed files in the INCOMING directory,
inotifywait -mq -e close_write --format '%f' . | while IFS= read -r FILE ; do
# Only consider normal files.
[ -f "$FILE" ] || continue
# Try to transfer the file(s) using SCP.
if ! scp "${SCPOPTS[@]}" "$FILE" "$REMOTE" ; then
printf '%s scp-failure %s\n' "$(date '+%Y-%m-%d %T %z')" "$FILE"
continue
fi
# SCP was successful. Move the file. May overwrite an old one.
if ! mv -f "$FILE" "$COMPLETED" ; then
printf '%s mv-failure %s\n' "$(date '+%Y-%m-%d %T %z')" "$FILE"
continue
fi
# Success.
printf '%s success %s\n' "$(date '+%Y-%m-%d %T %z')" "$FILE"
done
It will output a list of files (closed after being open for writing). The first (three) fields will contain the date, time, and timezone (numeric). The fourth field will contain 'success', 'scp-failure', or 'mv-failure'. The fifth field will contain the file name.
The script will never exit by itself; you need to kill it via e.g.
Code:
kill -HUP $(ps -C inotifywait -o pid=)
but if you have more than one running, that will kill all of them.
It is quite possible to wrap the above around some job or script, so that close_write events are only watched while the other job/script runs, and afterwards everything is cleaned up, including scp'ing and moving any files the monitoring might have missed. That will make the script even more complicated, though.
You should also consider what to do with errors, for example if you run out of disk space. Should you just output the error, or should you send an e-mail message?
Note that the inotify-tools package is not installed by default on most Linux distributions. If you are a Linux cluster user, first contact your cluster admins to ask whether inotify-tools is installed, and if/which command-line utility you can use to send mail from compute nodes. E-mail is not always possible from compute nodes, or may only be possible via a specific command-line client, e.g. /bin/sendmail.
Wow, that's pretty intense, and impressive! I never would have figured any of that out, haha. Do those printf's just put stuff up on the screen? I'm guessing I already have inotify-tools installed, because I was able to pull up man pages for the stuff.
So since I would cron this, I really don't need the printf's if they just write to the screen, since nobody would see them as this would run constantly in the background?
Guess there's a lot more to think about than what I posted in my original post!
My original line of thought was to somehow do a find like
find /data/output/* -type f -cmin +1
even though I'd really like to do it right after the file is closed, or a few seconds after, kind of like what this inotify stuff does, then do the scp command, then move the file to /data/sent.
I guess that's kind of simplistic and doesn't account for errors, and it's not very glamorous. Plus I have no idea how to combine them all to work right. Just figured I'd give more background.
do those printf's just put stuff up on the screen?
Yes, they are there only as informative output; you can just as well remove them altogether.
Quote:
Originally Posted by rjo98
So since i would cron this
Well, I wouldn't. Just remove the printfs, and let it run all the time. The script does not use that much RAM, and it only uses CPU time when something happens. (It does not busy-wait; it blocks/sleeps on waiting for input when there is nothing to do.)
You might add another script to cron, to do the same for files that have not been modified in the last N minutes (say, a few hours), so that you "catch" anything the monitoring missed, or could not transfer for some reason. Basically,
Code:
#!/bin/bash
cd /data/output || exit $?
find . -maxdepth 1 -type f -mmin +N -print0 | while IFS= read -r -d '' FILE ; do
scp "$FILE" "user@remote:path/" || continue
mv -f "$FILE" /data/completed
done
Note that it may be necessary to add a running flag (say, /var/run/scp-backup.pid containing the PID of the running process), and check whether another copy of the same script is running (still alive), if the transfers may take longer than your cron interval. Otherwise cron may start another copy of the script while the old one is still running.
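A sketch of what such a running flag could look like. The path and names here are just illustrative (a real cron script run as root would typically put the file under /var/run/); kill -0 sends no signal, it only checks whether the recorded process still exists:

```shell
#!/bin/bash
# Hypothetical lock for a cron'd transfer script: refuse to start
# if a previous copy of this script is still alive.
PIDFILE="/tmp/scp-backup.pid"   # /var/run/ would need root

if [ -s "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null ; then
    echo "Previous run (PID $(cat "$PIDFILE")) still active; exiting." >&2
    exit 0
fi

echo "$$" > "$PIDFILE"          # record our own PID
trap 'rm -f "$PIDFILE"' EXIT    # remove the flag when we exit, however we exit

# ... the actual find | scp | mv work would go here ...
echo "lock acquired by PID $$"
```

The trap ensures the flag file is removed even if the script dies partway through; a stale file pointing at a dead PID is also handled, because kill -0 fails for it.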
My personal approach to issues like this is much more careful than most. I tend to assume problems will occur, and try to handle them in an useful manner. Your initial idea might work well for you, without any issues, if you happen to select a large enough age limit. My environments tend to vary too much for a simple age limit to work reliably, so I've had to find more reliable methods. They are obviously a bit more complex, but I think their robustness more than makes up for the added complexity.
Thanks for replying, Nominal. I agree your approach is probably better suited than my very basic idea from the get-go; I was just posting that to show my thought process.
I'm not sure how to let it run all the time in the background, though. Also, if the server is restarted, would however you set that up automatically restart it as well, so it would resume doing the process?
I'm also confused by your "Note that it..." paragraph, as I'm not sure how you use a PID file, and I thought you said not to cron it (even though I don't know how to make it run all the time like you said).
Sorry for all these questions, but i appreciate you answering them all.
I'm just afraid this approach may be too far over my head, and I wouldn't be able to support it. But maybe after you answer those questions I'll understand it better. Thanks again.
Well, if you have a script that outputs the files, you could always use the && operator, which will run a command *only* after the previous one completes successfully. So:
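The example that followed appears to have been cut off; a sketch of the idea, with echo stand-ins for the real steps (replace them with your output program, the scp command, and the mv command):

```shell
# "&&" runs the next command only if the previous one exited
# successfully (status 0). These functions are stand-ins for
# the real program, scp, and mv.
write_files() { echo "files written"; }   # the producing program
send_files()  { echo "files sent";    }   # the scp step
archive()     { echo "files moved";   }   # the mv step

write_files && send_files && archive
```

With real commands chained this way, an scp that fails (network error, full remote disk) stops the chain, so the mv into /data/SentToOtherServer never runs on an untransferred file.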