Back up file system changes after installing new software

pablopla · 12-19-2008, 09:41 AM

Hi,

If I'm starting from a fresh Linux installation, is it possible to automatically scan the filesystem after installing new software and save only the affected files and folders?

Is it possible to copy those files to a fresh installation of the same distro on the same hardware and get the same behaviour as if I had installed the software?

Thanks

eco · 12-19-2008, 11:10 AM

Hi,

Can I ask why you would want to do that? If you want to setup one box and then have many with the same configuration, just take an image (use dd) of the disk and copy it to the other boxes.
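A minimal sketch of eco's dd suggestion, assuming GNU dd and a source disk that is not mounted read-write while it is copied (ideally you boot from a live distro). The helper name `image_disk` and the paths in the usage comment are hypothetical; check device names with `fdisk -l` first, because dd silently overwrites whatever `of=` points at.

```shell
#!/bin/sh
# image_disk SOURCE DEST  (hypothetical helper)
# Copies a whole disk, or any file, byte-for-byte into DEST.
#   e.g. image_disk /dev/sda /mnt/backup/sda.img   (hypothetical paths)
# To restore onto another box, swap the roles: image_disk sda.img /dev/sda
image_disk() {
    local src=$1 dest=$2
    # bs=4M reads in large chunks for speed; conv=fsync flushes the
    # output to disk before dd exits.
    dd if="$src" of="$dest" bs=4M conv=fsync
}
```

The image carries everything, including the hostname and network settings, so the cloned boxes still need the per-host changes discussed further down the thread.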

I can't think of an easy way of copying only the files that changed, and it would be even harder to get them working on another box just by copying them.

Does anyone else know of software that can do that?

theNbomr · 12-19-2008, 11:19 AM

To find files that have been modified recently, use find -mtime -X, where 'X' is some number of days, or find -mmin -X, where 'X' is some integer number of minutes. The output can be piped to xargs for further processing. For more details,

Code:

man find
man xargs

--- rod.
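Applied to the original question, a common variant of rod's find approach is to create a marker file just before installing and list everything changed after it. This sketch assumes GNU find and xargs (for -print0, -0 and -r); the helper name and marker path are hypothetical. It uses -cnewer (inode change time) rather than -newer, because package managers such as rpm usually restore the modification times stored in the package, while the change time is always set to "now" when a file is created. Deleted files will not show up at all.

```shell
#!/bin/sh
# list_changed_since MARKER DIR...  (hypothetical helper)
# Prints, NUL-separated, every file under DIR whose inode changed after
# MARKER was last modified.
list_changed_since() {
    local marker=$1
    shift
    # -xdev: stay on one filesystem, so /proc, /sys and mounts are skipped
    find "$@" -xdev -cnewer "$marker" -print0
}

# Typical use (marker path hypothetical):
#   touch /root/before-install
#   ...install the software...
#   list_changed_since /root/before-install / | xargs -0 -r ls -ld
```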

jschiwal · 12-19-2008, 12:04 PM

Some things will need to be changed if you simply clone a disk image and use it on another machine. The IP addresses of the NICs and the hostname of the computer need to be unique. Also be careful if an install script records information that is unique to a host.
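To find configuration files on a clone that still carry the original machine's identity, you can search /etc for the old hostname and IP address. A minimal sketch; the helper name and the values in the usage comment are hypothetical. A plain text search misses identities stored in other forms; the SSH host keys in /etc/ssh, for example, are also unique per host and are usually regenerated on a clone.

```shell
#!/bin/sh
# find_host_refs DIR PATTERN...  (hypothetical helper)
# Lists files under DIR that contain any PATTERN as a fixed string.
find_host_refs() {
    local dir=$1
    shift
    for pattern in "$@"; do
        # -r recurse, -l print file names only, -s hide unreadable-file
        # errors, -F treat the pattern as a literal string
        grep -rlsF -- "$pattern" "$dir"
    done | sort -u
}

# Usage: find_host_refs /etc "$(hostname)" 192.168.1.10
```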

You may find it useful to mount some system directories from a server, or from the first box, on the second. The Filesystem Hierarchy Standard outlines which directories can be shared. This would allow you to install software packages centrally. You may need to copy some files to unshared directories such as /etc. Another advantage is that there will be less to back up: if the /bin, /usr/bin, /sbin, /usr/sbin, /lib and /usr/lib directories are shared, only the copies on the server need to be backed up.
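As a sketch of the shared-directory idea, the hypothetical helper below prints /etc/fstab entries that mount directories read-only over NFS from a server. The server name `fileserver` is an assumption, NFS is only one possible network filesystem, and the output is meant to be reviewed and pasted into /etc/fstab by hand. Keep in mind that /bin, /sbin and /lib are needed early in boot, before network mounts are available, which is why the FHS lists /usr (and /opt) rather than those as shareable.

```shell
#!/bin/sh
# fstab_nfs_lines SERVER DIR...  (hypothetical helper)
# Prints one read-only NFS /etc/fstab entry per directory, mounting
# SERVER:DIR on the same path locally.
fstab_nfs_lines() {
    local server=$1
    shift
    for dir in "$@"; do
        # fields: source  mountpoint  type  options  dump  fsck-pass
        printf '%s:%s\t%s\tnfs\tro\t0 0\n' "$server" "$dir" "$dir"
    done
}

# Usage: fstab_nfs_lines fileserver /usr /opt
```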

Your package system can help. For example, for rpm, you can list the files in a package. Filter out the shared directories and you are left with a list of files that you need to copy.
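With rpm, `rpm -ql PACKAGE` lists the files a package installed (on Debian-based systems, `dpkg -L PACKAGE` does the same). A minimal filter, assuming the shared directories are exactly the ones named above; the helper and package names are hypothetical.

```shell
#!/bin/sh
# filter_unshared  (hypothetical helper)
# Reads one path per line on stdin and drops anything under the
# directories assumed to be shared from a server.
filter_unshared() {
    # (/|$) ensures /lib matches /lib and /lib/..., but not /library
    grep -Ev '^/(bin|sbin|lib|usr/bin|usr/sbin|usr/lib)(/|$)'
}

# Usage: rpm -ql some-package | filter_unshared
```

What remains is the list of files you would still have to copy to each box.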

Since you are interested in files that have changed, you could look into tar backups. See the info manual "5.2 Using `tar' to Perform Incremental Dumps". You can use something like "tar -C sourcedir -cf - . | tar -C targetdir -xpf -" to transfer files in bulk as well. Here is an example using ssh to connect:

Code:

jschiwal@qosmio:~> tar -g logsnap -cf - logs | ssh hpmedia 'tar -xpvf -'
logs/
logs/#linuxcranks.log
logs/#linuxcranks_#linuxbasement.log
logs/#linuxcranks_#linuxcranks.log
logs/#linuxcranks_#linuxlinktechshow.log
logs/#linuxcranks_#linuxoutlaws.log
logs/#linuxcranks_#lottalinuxlinks.log
logs/#linuxcranks_#samba-technical.log
logs/#linuxcranks_#samba.log
logs/#linuxcranks_#suse.log
logs/#linuxcranks_#techshow.log
logs/#linuxcranks_#tllts.log
logs/freenode.log
logs/freenode_#linuxactionshow.log
logs/freenode_#opensuse-kde.log
logs/freenode_#openvideo.log
logs/freenode_#suse.log
logs/irc.dslextreme.com #twitlive.log
logs/irc.dslextreme.com #twitlive_#twitlive.log
logs/linuxcranks #linuxcranks.log
logs/linuxcranks #linuxcranks_#linuxcranks.log
logs/thelinuxlink.net.log
logs/thelinuxlink.net_#techshow.log
jschiwal@qosmio:~> touch logs/freenode.log
jschiwal@qosmio:~> tar -g logsnap -cf - logs | ssh hpmedia 'tar -xpvf -'
logs/
logs/freenode.log

The second time I ran it, only the altered file was sent.
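The same incremental behaviour can be tried locally, without ssh, by extracting into a second directory. This sketch assumes GNU tar (for -g, i.e. --listed-incremental); the helper name and directories are hypothetical, and an absolute path for the snapshot file avoids any confusion with -C.

```shell
#!/bin/sh
# sync_incremental SNAPSHOT SRCPARENT NAME DEST  (hypothetical helper)
# Archives SRCPARENT/NAME using a listed-incremental snapshot file and
# unpacks the stream into DEST. The first run sends everything; later
# runs send only files changed since the previous run. -v on the
# extracting side prints the names that arrive.
sync_incremental() {
    local snapshot=$1 srcparent=$2 name=$3 dest=$4
    tar -g "$snapshot" -C "$srcparent" -cf - "$name" | tar -C "$dest" -xpvf -
}

# Usage: sync_incremental "$HOME/logsnap" "$HOME" logs /srv/mirror
```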

Another option is using rsync.
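A rough rsync equivalent, as an assumption about how you might use it rather than something tested in this thread: rsync's default "quick check" compares size and modification time, so on a second run only changed files are transferred. `--out-format='%n'` prints the name of each transferred item. For a remote copy the destination is written as host:path (hpmedia:logs here), and rsync uses ssh for that by default. The helper name is hypothetical.

```shell
#!/bin/sh
# mirror_dir SRC DEST  (hypothetical helper)
# -a (archive) preserves permissions, times, symlinks and, when run as
# root, ownership. The trailing slash on SRC copies its contents rather
# than the directory itself.
mirror_dir() {
    rsync -a --out-format='%n' "$1"/ "$2"/
}

# Usage: mirror_dir ~/logs hpmedia:logs
```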

pablopla · 12-19-2008, 02:22 PM

@jschiwal
Thank you this is very helpful.
Following the manual as you advised, it seems that this is what I need:

Code:

tar --create --file=archive.1.tar -g incremental.snar /home/user

I don't understand your command:

Code:

tar -g logsnap -cf - logs | ssh hpmedia 'tar -xpvf -'

jschiwal · 12-19-2008, 07:28 PM

The first example creates an archive "archive.1.tar" of /home/user. The archive is written to the current directory, where the command is run. You could use the "-C <directory>" option to set the directory tar works from explicitly.

In the second command, instead of creating a physical file on the same computer, the archive is written to stdout and piped to the input of the tar command running on a remote computer. The ssh command creates a secure connection to the host hpmedia, so the archive is streamed over the network and extracted on the other machine. No archive file is ever written; the data is only sent over the network.

With the "-g logsnap" argument on the local tar command, the first time you run this command the ~/logs/ directory is fully archived. Now imagine that one or more files in ~/logs/ are edited. The next time the command is run, an incremental archive is created instead, and only the modified and new files are sent. Using ssh, this can be done securely over the internet.

I used the -v option on the tar command that extracts the files, so I could see which files were being created on the remote computer. The "touch logs/freenode.log" command updated that file's timestamp. Then I reran the command to demonstrate that only the file modified since the previous run was copied to the remote machine. This does what you asked about in your first post: copying files from one computer to another, and later copying only the files that have changed.

You can tweak this command using -C <dir>, as the examples in the tar info manual do, to set the directory tar works from. So `-C /home' will start tar from the /home directory. You can also use -C on the tar command on the right-hand side so that the files are extracted where you want them.
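Putting -C on both sides, a local sketch of the pattern looks like this (GNU tar assumed; the helper name and directories are hypothetical). Over ssh, the right-hand side would become something like `ssh hpmedia 'tar -C /backup -xpf -'`, with /backup again a hypothetical target.

```shell
#!/bin/sh
# copy_tree SRCPARENT NAME DESTDIR  (hypothetical helper)
# Streams SRCPARENT/NAME into DESTDIR/NAME. -C on the left makes the
# archived paths relative (NAME/...); -C on the right decides where
# they land. -p keeps the permissions on extraction.
copy_tree() {
    tar -C "$1" -cf - "$2" | tar -C "$3" -xpf -
}

# Usage: copy_tree /home user /mnt/backup
```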

Some ssh details
Please note that I am using public key authentication for ssh access. I use a passphrase, but I used ssh-agent so that I wasn't prompted for it when I ran the command. The passphrase is requested on the client side, to unlock the private key. If you read the comments above the "UsePAM yes" line in the /etc/ssh/sshd_config file, you should have no problem configuring public key authentication. That paragraph tells you which options to enable and which to disable.

You also need to add the contents of your client's ~/.ssh/id_rsa.pub to the server's ~/.ssh/authorized_keys file.
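The usual tool for this step is `ssh-copy-id user@server`, which ships with OpenSSH on most distributions. If you do it by hand, getting the permissions right matters, because sshd (with its default StrictModes setting) ignores an authorized_keys file that others can write to. A minimal sketch of the manual step, run on the server; the helper name is hypothetical, and the paths are parameters so it can be tried on scratch files first.

```shell
#!/bin/sh
# install_pubkey PUBKEY SSHDIR  (hypothetical helper)
# Appends a client's public key to SSHDIR/authorized_keys and sets the
# permissions sshd expects: 700 on the directory, 600 on the file.
install_pubkey() {
    local pubkey=$1 sshdir=$2
    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    cat "$pubkey" >> "$sshdir/authorized_keys"
    chmod 600 "$sshdir/authorized_keys"
}

# Usage on the server, after copying id_rsa.pub over:
#   install_pubkey /tmp/id_rsa.pub ~/.ssh
# On the client, unlock the key once per session so you aren't prompted:
#   eval "$(ssh-agent)"; ssh-add ~/.ssh/id_rsa
```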

Another tar example
In the example using tar to copy modified files, please realize that I am not creating a backup. I am using tar to transport the files that would be backed up if I were to create a backup. This next example does both at once!

Code:

tar -g policies.ts -czf - policies/ | tee /mnt/maxtor/policies-12-19-08.tar.gz | ssh hpmedia 'tar -xzvpf -'

The "everything is a file" philosophy of Unix sure comes in handy, doesn't it? The `tee' command is like a tap: when you insert it into a pipeline, it saves its standard input to a file (policies-12-19-08.tar.gz) and also passes it on to stdout. The /mnt/maxtor/ directory is a mounted Samba share on a Maxtor Central Axis networked drive.

If you use this in a script, the filename will need to change each time it runs, to avoid overwriting the last backup. Using something like "$(date)" in the filename will make it unique.
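A minimal script version of the tee idea, assuming GNU tar and a mounted backup directory; the helper name and paths are hypothetical. Plain `date` output contains spaces and colons, so a format such as `date +%Y%m%d-%H%M%S` is easier to work with in filenames. For the remote case, the last stage would be `ssh hpmedia 'tar -xzpf -'`, and adding `-g snapshotfile` to the first tar makes it incremental as in the post.

```shell
#!/bin/sh
# backup_and_copy SRCPARENT NAME BACKUPDIR DESTDIR  (hypothetical helper)
# Streams a gzipped tar of SRCPARENT/NAME once: tee saves a dated copy
# in BACKUPDIR while the same bytes are unpacked into DESTDIR.
backup_and_copy() {
    local srcparent=$1 name=$2 backupdir=$3 destdir=$4
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    tar -C "$srcparent" -czf - "$name" \
        | tee "$backupdir/$name-$stamp.tar.gz" \
        | tar -C "$destdir" -xzpf -
}

# Usage: backup_and_copy "$HOME" policies /mnt/maxtor /srv/copy
```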

Back to more ssh details
If the incremental backup/transfers are for system directories, you will need to allow root ssh access. This is another reason to only use public key authentication. Brute force script kiddie attacks won't work because you simply don't use the authentication they are trying. You can't try a key if there isn't a keyhole!
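Before allowing root logins, it is worth checking what the server will actually do. In sshd_config, keywords are case-insensitive and the first value given for a keyword is the one used, so a small awk sketch can report the effective setting. The helper is hypothetical and ignores Match blocks, Include lines and the "Keyword=value" form. For key-only root access you would typically want `PasswordAuthentication no` and `PermitRootLogin prohibit-password` (spelled `without-password` in older OpenSSH releases), then reload sshd.

```shell
#!/bin/sh
# sshd_setting CONFIG KEYWORD  (hypothetical helper)
# Prints the first value set for KEYWORD in CONFIG, skipping comments
# and blank lines. Prints nothing if the keyword is not set.
sshd_setting() {
    awk -v key="$2" '
        /^[ \t]*#/ || NF < 2 { next }
        tolower($1) == tolower(key) { print $2; exit }
    ' "$1"
}

# Usage:
#   sshd_setting /etc/ssh/sshd_config PasswordAuthentication
#   sshd_setting /etc/ssh/sshd_config PermitRootLogin
```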

pablopla · 12-20-2008, 05:01 AM

@jschiwal
Every word you say is gold to me...

Are there files I can't archive, like files with file type 1?

If I'm not mounting folders from a server as you suggested in your first post, do I need to back up my whole file system, or are there folders I should avoid (dev, mnt...)?

Are there files that are actually running processes?

jschiwal · 12-20-2008, 08:07 AM

Drat, I had about 4 paragraphs written, and must have hit the wrong key combination. It went back to the twit site I was on earlier. Oh well!

Quote:

If I'm not mounting folders from a server as you suggested in your first post, do I need to back up my whole file system, or are there folders I should avoid

Yes, there are. You don't want to back up /dev, because it is populated when you boot and maintained by the udev daemon, which creates device nodes on the fly when you plug in a new device. Don't back up /proc either; it is a pseudo filesystem. I would also recommend against backing up /tmp or /var/tmp, since they only hold temporary files. Your distro probably has a setting to delete the contents of /tmp when you shut down.

Be careful about links. You will probably want to back up links as links, rather than follow them and back up the files they point to.
Symbolic links are the most common. They are used very frequently in /lib and /usr/lib to use a shortened name to refer to the same library. Look at `ls -l /lib/' and you can see for yourself.

Hard links are a rarer case. A hard link has to refer to a file on the same filesystem, but that file may be in a directory that isn't being backed up. A hard link is really just a directory entry pointing to a file's inode: it is the same as a normal directory entry, except that it refers to an inode that already exists. If you delete a file (link), it won't actually be deleted until the link count drops to zero. I'm not certain, but when you restore a hard-linked file, you may end up with two separate files instead. You will need to experiment. You normally won't run into this situation unless you create a chroot jail and use hard links instead of copying the files themselves, which saves disk space.
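Here is the kind of experiment suggested above, as a sketch assuming GNU tar; the helper name is hypothetical. By default tar stores a symbolic link as a link (it only follows links with -h, i.e. --dereference). When both names of a hard link are inside the archive, the second is stored as a link to the first, so they come back as one file with two names; if only one name is archived, it is restored as an ordinary file.

```shell
#!/bin/sh
# link_roundtrip DIR  (hypothetical helper)
# Creates a file, a symlink and a hard link in DIR/src, copies the tree
# with a tar pipe into DIR/dst, and reports how the links came back.
link_roundtrip() {
    local d=$1
    mkdir -p "$d/src" "$d/dst"
    echo data > "$d/src/file"
    ln -s file "$d/src/symlink"            # relative symbolic link
    ln "$d/src/file" "$d/src/hardlink"     # second name for the same inode
    tar -C "$d/src" -cf - . | tar -C "$d/dst" -xpf -
    # -h: still a symlink?   -ef: same device and inode?
    if [ -h "$d/dst/symlink" ]; then echo "symlink kept"; fi
    if [ "$d/dst/file" -ef "$d/dst/hardlink" ]; then echo "hard link kept"; fi
}

# Usage: link_roundtrip "$(mktemp -d)"
```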

Be careful about permissions and ownership. You want to preserve the ownership and permissions of files. Sometimes a system file is owned by a system user and expects to be run that way. Sometimes a file in /etc/ has a different group, which allows a service running under that group to use the file. Even the permissions of ~/.ssh/ and ~/.ssh/id_rsa are important: if the private key is group or world readable, ssh will refuse to use it, because it could have been stolen. Hint: notice the -p option in my example and in the examples in the tar manual.

Don't back up the /media directory. It is where external devices are mounted.
If you mount a device or share somewhere inside the tree you are backing up, don't back up that mount point. This is obvious.
Most likely you don't want to back up /mnt, but there could be an exception. Suppose you add an internal drive and mount it on /mnt/podcasts. You may want to back it up. Decide for each mount point under /mnt/ whether it needs to be backed up, rather than backing up the entire /mnt directory. It would probably be a good idea to create an entry in /etc/fstab for it; for a removable device, use the `noauto' mount option. Having an entry in /etc/fstab helps you mount the same device on the same mount point every time. If you backed it up under the name /mnt/podcasts and later as /mnt/music, you would end up either backing up files you already have or having restores fail.
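A sketch of a full-system archive that applies these exclusions, assuming GNU tar; the helper name and paths are hypothetical, and the pattern list is an example rather than a complete policy. Excluding `./proc/*` rather than `./proc` keeps the empty directories, which you need as mount points after a restore. --one-file-system stops tar from descending into other mounted filesystems, so a mount such as /mnt/podcasts would have to be archived separately. With recent GNU tar the --exclude options must come before the directory argument.

```shell
#!/bin/sh
# backup_root ROOT ARCHIVE  (hypothetical helper)
# Archives everything under ROOT except the contents of runtime,
# temporary and removable-media directories.
backup_root() {
    local root=$1 archive=$2
    tar -C "$root" --one-file-system -cf "$archive" \
        --exclude='./dev/*' --exclude='./proc/*' --exclude='./sys/*' \
        --exclude='./tmp/*' --exclude='./var/tmp/*' \
        --exclude='./media/*' --exclude='./mnt/*' \
        .
}

# Usage (as root, ideally from a live distro with the installed system
# mounted on a hypothetical /mnt/target):
#   backup_root /mnt/target /media/usbdisk/system.tar
```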

Quote:

Are there files I can't archive

Other file types, such as device nodes and sockets, will probably be located in directories that you don't back up. Device nodes should all be in /dev/. There will be some sockets in /var/run/; they won't be in use if you perform your backup from a live distro, since none of the installed system's services are running then. If you use Fedora, which uses SELinux, you may run into a situation where even root is denied access to a file. You will need to read up on its documentation if this is your distro. Often there is a group or system user that is allowed to read any file but can't execute every file.
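To check whether any device nodes, sockets or named pipes sit in the directories you do back up, `find -type` can list them. A minimal sketch; the helper name and the directories in the usage comment are only examples.

```shell
#!/bin/sh
# list_special DIR...  (hypothetical helper)
# Prints block (b) and character (c) devices, sockets (s) and named
# pipes (p) found under DIR, staying on one filesystem (-xdev).
list_special() {
    find "$@" -xdev \( -type b -o -type c -o -type s -o -type p \) -print
}

# Usage: list_special /etc /var /usr
```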

If a file has a lock, you may not be able to read it. It is normal to have some files not be backed up because either a lock prevented it or more likely it changed before the backup finished. This is why the best backups are performed by booting up with a live distro so the filesystems aren't live.
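GNU tar reports the "file changed as we read it" case through its exit status: 0 means success, 1 means some files changed while they were being archived (the archive is still written, but those files may not be exact copies), and 2 means a fatal error. A backup script can treat these differently; a minimal sketch with a hypothetical wrapper name:

```shell
#!/bin/sh
# run_tar ARGS...  (hypothetical helper)
# Runs GNU tar and classifies its exit status. Returns 0 for success or
# for "files changed while reading" (a warning); otherwise returns
# tar's own non-zero status.
run_tar() {
    local status
    if tar "$@"; then status=0; else status=$?; fi
    case $status in
        0) echo "backup OK" ;;
        1) echo "warning: some files changed during the backup" >&2 ;;
        *) echo "error: tar failed with status $status" >&2
           return "$status" ;;
    esac
    return 0
}

# Usage: run_tar -C / -cf /media/usbdisk/etc.tar etc
```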

Good Luck!

p.s.
By the way, I'm having problems with the touchpad on my new laptop, and had this post scrambled a few times. I hope I was able to correct everything. Maybe I shouldn't have had those two beers.