Which rsync parameters to backup huge .tar.gz file?
Linux - General: This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
Hello,
my virtualization host server produces a monthly "snapshot" of an OpenVZ VPS. The file is named something like vzdump1234.tar.gz and is around 250GB; the archive's contents change between runs, and I'd guess about 10% of the VPS files are new or modified each month.
I want to transfer this archive to a remote server every month. Which rsync parameters make sense, given that:
a) the least possible data should be transferred;
b) local and remote disk space is very limited, so storing two full copies of the archive could be a problem;
c) the remote server is a low-CPU VPS;
d) both servers' HDDs are a real bottleneck for overall performance?
Again, as with many of your previous threads, you provide next to no useful details that would let anyone help you. You omit the bandwidth between the machines, why you want to do this rather than backing the file up locally on the machine where the snapshot was taken, etc. To go through your post:
a) least data transfer is used: you say you have to move a 250GB file... that IS the least data transfer for that amount of data; rsync can't make the file smaller.
b) local and remote server space is very limited and storing double the size of the archive might be an issue: if you can't store it, it's pointless to transfer it.
c) remote server is a low-CPU VPS: meaningless in this context. Either you have the system resources to do the job, or you live with it being slow. If you want to do it, those are your two choices.
d) both servers' HDDs are quite a bottleneck of overall performance: again, meaningless in this context. How, exactly, do you think any piece of software can magically make your HDDs faster?
Transferring a compressed file will eliminate the benefit of rsync's delta transfer. Upon reaching the first changed byte, the entire remainder of the file will be different due to the compression, and thus will be transmitted in its entirety. At least rsync is intelligent enough to see the ".gz" extension and not attempt re-compression of the already compressed file.
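You can see that effect locally with a minimal sketch (stand-in data under a temp directory, not the OP's actual dump): flip one early byte of a compressible file, gzip both versions with -n so the headers match, and check where the compressed streams first diverge. It's almost immediately, so rsync's rolling checksums find nothing to reuse. Some gzip builds offer a --rsyncable option that periodically resynchronizes the compressed stream at a small cost in ratio; if the dump could be produced that way, delta transfer would get some traction again, but whether that's possible depends on how vzdump invokes gzip.

```shell
#!/bin/sh
# Demo: a one-byte change near the start of a file makes the gzip output
# diverge almost immediately, which is why rsync's delta transfer gets no
# traction on compressed archives. All paths are illustrative stand-ins.
set -e
d=$(mktemp -d)
seq 1 100000 > "$d/v1"                          # compressible stand-in data
{ printf 'X'; tail -c +2 "$d/v1"; } > "$d/v2"   # flip only the first byte
gzip -nc "$d/v1" > "$d/v1.gz"                   # -n: omit name/mtime header
gzip -nc "$d/v2" > "$d/v2.gz"
# Report the first differing byte of the compressed streams:
cmp -l "$d/v1.gz" "$d/v2.gz" | head -n 1
```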
Maybe mount the final file, the way you'd mount an ISO image, and then let some program (Unison or rsync) move only the differences across the directory structure.
I think you could then use compression to transmit less after the first sync.
Some thoughts only.
I think that, with the file mounted as suggested above, you could use compression to transmit less after the first sync. Jigdo is kind of what I'm thinking of.
Yeah, but the OP hasn't provided any details at all, aside from sending me a nasty PM. We don't know how that file is getting generated, or if there's even an option to get it in ISO format. For all we know, this is generated from the VPS system, and is ONLY available as a .gz file. If they're using vzdump, it creates a .pm file, which isn't mountable. So even if you use the rsync compression options, you still have to shovel over 250GB, and can't do a delta of it.
Not to mention the fact they imply they don't have the disk space to store it. Backing up an entire VM container file with vzdump creates an image, but having multiple images is pointless. Take ONE image, then a delta rsync each day. Container fail? Restore the image via the bare metal restore utilities or vzrestore, and restore the rsync delta after you've got a working system.
OP, you STILL need to read the "Question Guidelines" link, and start showing some effort of your own. Sorry, but after 4 years you're not a 'newbie' anymore, and having worked with rsync for at least 3 of them, you should be able to figure out some options on your own.
...where the OP is (essentially) asking this same question. They didn't come back there to answer wpeckham's follow-up questions, either, and apparently the OP can't post to the OpenVZ forums at all, since they were banned from that forum. The openvz-diff-backups tool is in beta right now, written specifically to take deltas of existing PMs and shovel them over SSH tunnels to different locations. The OP seems not to have looked.
If the OP is out of space on the source, then I'll agree that compressing data to a file is useless. The only two ways to stop it in time are a snapshot by some means or a live state. (I wish I could remember the name of that open-source Linux live-state program.)
Using the highest compression level of the selected compression program may help. Knowing what kind of data it is may point to a better choice of compression program.
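As a quick local illustration of the compression-level trade-off (on repetitive stand-in data, not the OP's dump): higher levels spend more CPU to shave bytes, and how much they shave depends entirely on the data, so it's worth measuring before committing a low-CPU VPS to level 9.

```shell
#!/bin/sh
# Compare gzip's fastest (-1) and best (-9) levels on stand-in data.
set -e
w=$(mktemp -d)
seq 1 200000 > "$w/sample"              # repetitive, compresses well
gzip -1 -nc "$w/sample" > "$w/fast.gz"  # cheap on CPU, looser output
gzip -9 -nc "$w/sample" > "$w/best.gz"  # more CPU, tighter output
wc -c "$w/sample" "$w/fast.gz" "$w/best.gz"
```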
True. But based on the sparse information given here, we can probably assume it's a vzdump file, containing a .pm snapshot, created with the --compress option, which uses gzip. https://openvz.org/Backup_of_a_runni...er_with_vzdump
If the file is already 250GB compressed/gzipped, it probably can't get much tighter... which takes us back to the OP saying there's not enough space on the target. The OP hasn't given any information despite being asked, which seems to be a theme with the OP.