LinuxQuestions.org


johncc 01-22-2010 11:21 AM

Incremental tar problem
 
I'm having real trouble trying to do an incremental tar. My understanding is that I keep using the same list file, and the first time it should get created, after which it should create smaller tars with only new changes.

My script is as follows:-

#!/bin/bash
set -e
set -u

shopt -s extglob

src_dir=/mnt/storage/Public
working_dir=/mnt/storage/Backups/tmp
user=johncc
dest_addr=192.168.1.2
dest_dir=/shares/internal/PUBLIC/Backup

basename=Public-$(date +%Y-%W)
listname=$basename.snar
destname=$basename-$(date +%w).tar.bz2

echo "Basename: $basename"
echo "Listname: ${working_dir}${listname}"
echo "Destname: ${working_dir}${destname}"

if [[ -f "${working_dir}/${listname}" ]]; then
echo "List file exists."
else
echo "List file does NOT exist."
fi

if [[ -r "${working_dir}/${listname}" ]]; then
echo "List file is readable."
else
echo "List file is NOT readable."
fi

tar --verbose --create --ignore-failed-read --listed-incremental="${working_dir}/${listname}" --bzip2 \
--file "${working_dir}/${destname}" "$src_dir" && \
scp "${working_dir}/${basename}"* ${user}@${dest_addr}:${dest_dir} && \
( rm -f "${working_dir}/${basename}"*.tar.bz2 ; rm -f "${working_dir}/!(${basename}).snar" )

What happens in practice is that although the list file exists and nothing has changed, it still creates a full backup. The output of tar is not very helpful. I modded the script to echo the tar command, and this is what I get:-

Basename: Public-2010-03
Listname: /mnt/storage/Backups/tmpPublic-2010-03.snar
Destname: /mnt/storage/Backups/tmpPublic-2010-03-5.tar.bz2
List file exists.
List file is readable.
tar --verbose --create --ignore-failed-read --listed-incremental="/mnt/storage/Backups/tmp/Public-2010-03.snar" --bzip2 --file "/mnt/storage/Backups/tmp/Public-2010-03-5.tar.bz2" "/mnt/storage/Public"

Can anyone see what I'm doing wrong here?

Thanks!

jschiwal 01-22-2010 11:54 AM

Your script is creating a new name for the .snar file which contains the date. For a differential backup, you make a working copy of the .snar file created by the first full backup and pass that copy to tar. Then all files modified since the full backup will be archived. This allows you to restore by restoring the full backup and the last differential backup.

If you reuse the same .snar file, it will be updated after each dump, so you will be creating an incremental backup. Each incremental backup will only back up files changed since the previous backup.
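
To make the difference concrete, here is a minimal sketch. It is not taken from the tar manual: it assumes GNU tar, and the temp directory and file names are made up for illustration.

```shell
#!/bin/bash
# Incremental vs. differential with GNU tar snapshot (.snar) files.
# Everything lives in a throwaway temp directory; all names are hypothetical.
work=$(mktemp -d)
mkdir "$work/src"
echo one > "$work/src/a.txt"

# Level 0 (full): week.snar does not exist yet, so tar creates it.
tar --create --listed-incremental="$work/week.snar" \
    --file "$work/full.tar" -C "$work" src

# Keep an untouched copy of the level-0 snapshot for differential runs.
cp "$work/week.snar" "$work/level0.snar"

# Incremental 1: reuses week.snar, which tar updates in place.
echo two > "$work/src/b.txt"
tar --create --listed-incremental="$work/week.snar" \
    --file "$work/incr1.tar" -C "$work" src

# Incremental 2: only picks up what changed since incremental 1.
echo three > "$work/src/c.txt"
tar --create --listed-incremental="$work/week.snar" \
    --file "$work/incr2.tar" -C "$work" src

# Differential: run against a fresh copy of the level-0 snapshot,
# so everything changed since the full backup is archived.
cp "$work/level0.snar" "$work/diff.snar"
tar --create --listed-incremental="$work/diff.snar" \
    --file "$work/diff.tar" -C "$work" src

incr2_list=$(tar --list --file "$work/incr2.tar")
diff_list=$(tar --list --file "$work/diff.tar")
echo "incremental 2 contains:"; echo "$incr2_list"
echo "differential contains:";  echo "$diff_list"
rm -rf "$work"
```

The second incremental should list c.txt but not b.txt, while the differential should list both. Directories show up in every archive, because tar records their contents each time.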

Section 5.2 of the tar info manual is where this is explained.
---
FYI, I downloaded the tar package source and ran "./configure && make pdf" to create a more readable book version of the info manual for tar. Some things, like backups, are important enough that I will print out the documentation. I find I can learn and retain the info better. (Maybe I'm showing my age)

johncc 01-22-2010 04:05 PM

Quote:

Originally Posted by jschiwal (Post 3836730)
Your script is creating a new name for the .snar file which contains the date.

Thanks for the reply. That's not quite correct, I believe. The .snar file is named with the year and the week number. The tar.bz2 file is named with the year, week number and week day (number). If you look at the output in my original post, the files are Public-2010-03.snar and Public-2010-03-5.tar.bz2. The idea is that the .snar file should keep the same name for a week, meaning that I should get one full backup per week and 6 incremental daily backups.

As you can see, I put the check in to echo if the .snar file exists and is readable when the script runs, and sure enough it does.
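
For anyone following along, the naming scheme can be checked for arbitrary days with a small sketch like this. It assumes GNU date (for the -d option), and names_for is a hypothetical helper, not part of the script above.

```shell
#!/bin/bash
# Print the snapshot and archive names the script would use on a given day.
# names_for is a made-up helper; "date -d" is a GNU date extension.
names_for() {
    local day=$1
    local base
    base=Public-$(date -d "$day" +%Y-%W)   # %W = week of year, weeks start on Monday
    echo "$base.snar $base-$(date -d "$day" +%w).tar.bz2"   # %w = day of week, 0 = Sunday
}

for day in 2010-01-18 2010-01-22 2010-01-24 2010-01-25; do
    echo "$day -> $(names_for "$day")"
done
```

Since %W counts weeks from Monday, the .snar name changes on Mondays, so that is the day the full backup would be taken. Sunday's archive ends in -0 but still shares that week's .snar.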

jschiwal 01-22-2010 07:42 PM

OK. Sorry I missed that.

I have in the past used something like:
tar -C <base_dir> --create --listed-incremental=<snar file> -f - <dir list> | tee /mnt/backups/<backup_name> | ssh user@host tar -C <remote_base_dir> -xvf - >logfile

This backed files up to a locally mounted external drive and replicated the files at a remote location. I did test that only newer files were backed up during an incremental backup.

You did test that the .snar file was present and readable. But you didn't test that it was not empty. Also, the script is deleting the snar file in the working directory. Why? The tar command uses the snar file in that same directory. I don't trust what is happening with the snar file, or that the snar file is valid when you run the tar command.

You could do something like:
tar --verbose --create --ignore-failed-read --listed-incremental="${working_dir}/${listname}" --bzip2 \
--file - "$src_dir" |
ssh ${user}@${dest_addr} "cat >${dest_dir}/${destname}"

This way you don't need to have space available to create a temporary archive.

In any event, I would keep the .snar file available locally and not remove it.
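
A rough sketch of the streaming idea that can be tried without a remote host follows. The stream_backup function and its argument order are hypothetical, and "sh -c" stands in for "ssh user@host" in the dry run. The point is that the redirection is part of the quoted command string, so it runs on the destination side rather than locally.

```shell
#!/bin/bash
# stream_backup SNAR SRC_PARENT SRC_NAME DEST REMOTE_CMD...
# REMOTE_CMD is whatever runs a command string on the destination,
# e.g. "ssh user@host". Name and argument order are made up for this sketch.
stream_backup() {
    local snar=$1 src_parent=$2 src_name=$3 dest=$4
    shift 4
    # The archive goes to stdout; the quoted "cat > ..." is executed
    # by REMOTE_CMD, so the file is created on the destination.
    tar --create --listed-incremental="$snar" --file - \
        -C "$src_parent" "$src_name" |
        "$@" "cat > '$dest'"
}

# Local dry run: "sh -c" plays the role of "ssh user@host".
work=$(mktemp -d)
mkdir "$work/src" "$work/remote"
echo hello > "$work/src/file.txt"
stream_backup "$work/week.snar" "$work" src "$work/remote/backup.tar" sh -c
streamed_list=$(tar --list --file "$work/remote/backup.tar")
echo "$streamed_list"
rm -rf "$work"
```

For a real run the last arguments would be something like: ssh "${user}@${dest_addr}". Note that in a pipeline the exit status is that of the last command, so "set -o pipefail" is worth adding if a tar failure should stop the script.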

jschiwal 01-22-2010 07:55 PM

Sorry I didn't look closely enough at that.

However,

( rm -f "${working_dir}/${basename}"*.tar.bz2 ; rm -f "${working_dir}/!(${basename}).snar" )

the snar file is getting deleted after the files are copied to the destination. The .snar file is modified when you run tar. You need to reuse the modified .snar file when performing the next incremental backup.

Given the line above, I don't trust the results of the tests. They test for the existence of the .snar file, and its readability, but it could be empty or not updated.
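
A slightly stricter check could look like the sketch below. check_snar and its messages are made up for illustration. It only tells you the file is non-empty, not that tar will accept it as a valid snapshot.

```shell
#!/bin/bash
# check_snar FILE: report whether a snapshot file looks usable.
# Hypothetical helper; "ok" only means "regular, readable and non-empty".
check_snar() {
    local snar=$1
    if [ ! -f "$snar" ]; then
        echo "missing"
    elif [ ! -r "$snar" ]; then
        echo "unreadable"
    elif [ ! -s "$snar" ]; then
        echo "empty"
    else
        echo "ok"
    fi
}

work=$(mktemp -d)
: > "$work/empty.snar"                          # zero-length file
printf 'placeholder\n' > "$work/nonempty.snar"  # stand-in for a real snapshot
check_snar "$work/none.snar"       # prints: missing
check_snar "$work/empty.snar"      # prints: empty
check_snar "$work/nonempty.snar"   # prints: ok
rm -rf "$work"
```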

:D
"Don't touch that, you don't know where it's been."


---

You might consider running tar like:
tar -C $base_dir --create --bzip2 --listed-incremental=<snar_file> -f - <dir_list> | ssh <user@host> "cat > ${remote_dir}/${backup_name}"

This will pipe the file to the destination instead of creating a temporary backup locally and copying the files to the destination.
Otherwise you need at least the size of a full backup free locally.

In any case, I wouldn't delete the *.snar files locally.

johncc 01-24-2010 05:49 PM

Hi Jschiwal, and thanks for all the time you put into this. I've removed all the quoting around file names in the script because I don't actually need it in this case (no spaces in the paths) and because I wanted to see if it was causing a problem.

Actually the command you pointed out had a bug in (because of the quotes), but one which meant it was doing nothing at all.

It should have read:-

rm -f ${working_dir}/!(${basename}).snar

Quoting stops the shell from treating !( ) as a glob pattern (extended globbing is turned on by the "shopt -s extglob" line). The above line deletes any snar *apart* from the current snar file. In other words, it cleans up old ones. It seems to work as intended now.
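
Here is a small self-contained sketch of the quoted versus unquoted behaviour. The temp directory and the two .snar names are made up for the demo. It keeps the variables quoted and leaves only the pattern itself unquoted, which should also be safe with spaces in the paths.

```shell
#!/bin/bash
shopt -s extglob   # must be enabled before the !( ) line below is parsed

dir=$(mktemp -d)
touch "$dir/Public-2010-02.snar" "$dir/Public-2010-03.snar"
basename=Public-2010-03

# Quoted: the whole thing is a literal file name, nothing matches,
# and "rm -f" stays silent about it.
rm -f "$dir/!(${basename}).snar"
after_quoted=$(ls "$dir" | wc -l)

# Pattern unquoted, variables quoted: removes every .snar except the current one.
rm -f "$dir"/!("$basename").snar
after_unquoted=$(ls "$dir")

echo "files left after quoted rm:   $after_quoted"
echo "files left after unquoted rm: $after_unquoted"
rm -rf "$dir"
```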

The tar still behaves as before though, creating a full backup each time.

jschiwal 01-24-2010 06:48 PM

Only two things I can think of that could cause a full backup every time. First is if there isn't a snar file. Second is if somehow the timestamps are being updated.

You can check the latter by logging tar with the -vv option and redirecting the output.
I would try commenting out the line that deletes the snar files just to make sure.
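
Something along these lines would capture the -vv output of two consecutive runs in one log for comparison. It is only a sketch: it assumes GNU tar and uses a throwaway directory instead of your real paths.

```shell
#!/bin/bash
# Run the same incremental backup twice and log tar's -vv output for each run.
# All paths are temporary stand-ins, not the real backup locations.
work=$(mktemp -d)
mkdir "$work/src"
echo data > "$work/src/file.txt"
log="$work/tar-run.log"

for run in 1 2; do
    echo "=== run $run ===" >> "$log"
    # stdout (the verbose listing) and stderr (warnings) both go to the log.
    tar -vv --create --listed-incremental="$work/week.snar" \
        --file "$work/run$run.tar" -C "$work" src >> "$log" 2>&1
done

log_text=$(cat "$log")
echo "$log_text"
rm -rf "$work"
```

Comparing the two sections of the log shows which members tar decided to dump again on the second run, along with their timestamps.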

---

I tested out using !(pattern) manually and in a script:
Code:

jschiwal@qosmio:~/Documents> ls bash-4.0/!(${basename}).pdf
bash-4.0/bash.pdf  bash-4.0/bashref.pdf  bash-4.0/rose94.pdf
jschiwal@qosmio:~/Documents> cat testscr
shopt -s extglob
ls bash-4.0/!(${basename}).pdf
set bash-4.0/!(${basename}).pdf
echo $*
jschiwal@qosmio:~/Documents> ./testscr
bash-4.0/article.pdf  bash-4.0/bash.pdf  bash-4.0/bashref.pdf  bash-4.0/rose94.pdf
bash-4.0/article.pdf bash-4.0/bash.pdf bash-4.0/bashref.pdf bash-4.0/rose94.pdf

Enabling extglob and entering "ls bash-4.0/!(${basename}).pdf" at the prompt excludes article.pdf.
Inside a script it includes it.

This same behavior would delete your current snar file in your script, causing a full backup every time.

I haven't looked into it further. Is extglob for interactive shells?

jschiwal 01-24-2010 07:13 PM

Correction:

I hadn't defined "basename" in the script. It does work after making the correction.


All times are GMT -5.