LinuxQuestions.org


andrew.comly 12-26-2012 10:30 AM

File Backup
 
What's the best way in Linux to backup files and folders?

Currently my personal files total 48.5 GB: 38,466 files in 2,971 sub-folders. I have tried PCManFM and Dolphin FM, but they end up failing halfway through around 50% of the time.

I have right-clicked in PCManFM to compress, but after it compresses everything, when I go into the tar.gz with the archive manager, not all folders are there!

Maybe entering some command in the terminal is the only sure way?

vishesh 12-26-2012 10:58 AM

You can create a compressed tar backup using the tar command:

tar czf <path/to/tarfilename.tar.gz> <path of folder to be backed up>
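
For example (these paths are just placeholders; use your own folder and backup location):

tar czf /mnt/backup/personal-backup.tar.gz /home/user/personal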

Thanks

malekmustaq 12-26-2012 11:31 AM

andrew.comly,

Hi, Welcome to LQ.

Quote:

Currently my personal files total 48.5 GB: 38,466 files in 2,971 sub-folders. I have tried PCManFM and Dolphin FM, but they end up failing halfway through around 50% of the time.
I assume you have a separate space to back up to.

Quote:

Maybe entering some command in the terminal is the only sure way?

There are of course good commercial backup utilities for GNU/Linux, but the same objective can be attained with simple terminal commands.

Use 'rsync', simple and sure.
Code:

~$ rsync --archive /personal/files /mounted/volume/
or you can add the "-z" (compress) option and explore the many other options available; just read the manual.
Code:

~$ man rsync
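
Take note also that a trailing slash on the source changes what rsync copies:
Code:

~$ rsync --archive /personal/files /mounted/volume/    # copies the "files" folder itself into the volume
~$ rsync --archive /personal/files/ /mounted/volume/   # copies only the contents of "files"
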
Hope that helps.

Good luck.

shivaa 12-26-2012 11:32 AM

You can create a .tar of multiple files, and then compress the tar file.
To create a .tar:
Code:

tar -cvf sample.tar /path/to/file(s)
To check its contents:
Code:

tar -tvf sample.tar
To compress:
Code:

gzip /path/to/file
To check content of tar+zip file:
Code:

tar -ztvf /path/to/file
After creating the tar/zip, copy it to an external drive as a backup.
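
For example, assuming the external drive is mounted at /media/external (adjust the paths to your own setup), the whole thing can be done in two steps:
Code:

tar -czvf sample.tar.gz /path/to/files
cp sample.tar.gz /media/external/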

lleb 12-26-2012 04:10 PM

where are you putting the backup files? are they going to a NAS, to another computer, or to the same computer in a different directory? where.... this makes a slight difference in how best to back up the data.

if going to a NAS, then you might want to look into either mounting the CIFS share and moving the data around, or you could even activate FTP on the NAS and use lftp to transfer a tarball (see above on how to create one) to the NAS.

if going to another Linux computer then rsync with ssh keys would be the best way to go.

rsync would also be a great way to move the data around locally too.

I typically will use:

rsync -aviS /source/files /destination/

for my backups. with this much data you will see a network hiccup, so you might want to run this at night via cron, or you might want to consider adding -z in there for compression. see man rsync for more details.
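
for example, a nightly entry added with crontab -e might look something like this (the 2am time, paths, and log file location are just placeholders):

0 2 * * * rsync -aviS /source/files /destination/ >> /home/user/backup.log 2>&1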

andrew.comly 12-28-2012 11:59 AM

Quote:

Originally Posted by shivaa (Post 4857164)
You can create a .tar of multiple files, and then compress the tar file.
To create a .tar:
Code:

tar -cvf sample.tar /path/to/file(s)
To check its contents:
Code:

tar -tvf sample.tar
To compress:
Code:

gzip /path/to/file
To check content of tar+zip file:
Code:

tar -ztvf /path/to/file
After creating the tar/zip, copy it to an external drive as a backup.


The above "sudo tar -cvf AC20121229.tar /mnt/D/AC" yields the following:
tar: AC20121229.tar: Wrote only 4095 of 10240 bytes
tar: Error is not recoverable: exiting now
"

lleb 12-28-2012 03:56 PM

how about you copy/paste the exact output of the command for us as well as answer the above questions about location and space.

you can run df -Th from either $ or # after you have mounted the destination.

ruario 12-28-2012 04:26 PM

To those recommending gzipped tar files: that is a bad idea. Personally I would copy the files (perhaps with rsync) to an external disk formatted with a Linux filesystem. If you need to copy to a disk or network drive where UNIX file permissions and attributes cannot be maintained (e.g. a disk using a Windows file format) and you want to use an archive format to retain permissions/attributes, consider the implications of gzipping the archive itself.

To expand I'll link to a post I made previously on this topic:

Quote:

Originally Posted by ruario (Post 4790081)
you might want to reconsider gzip-compressed tars because a single corrupt bit near the beginning of the archive means the rest of the file is a write-off. This is less of an issue when using an actual disk for backup as opposed to media like DVDs, Blu-ray, etc., but still something to consider. Personally I would either skip compression or use xar, dar or afio instead, all of which can compress files individually as they are added (afio gives you the most compression options, since you can specify any compressor you like). This is safer as any corruption will mean only losing some of your files. Alternatively (or better yet, in addition) look at making parity archive volume sets. Check out the par2cmdline utils, an implementation of the PAR v2.0 specification.
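
To give a rough idea of the parity approach (the file names below are just placeholders), creating and later checking recovery data with the par2cmdline tools goes something like this:
Code:

par2 create -r10 backup.tar.par2 backup.tar   # create recovery data with ~10% redundancy
par2 verify backup.tar.par2                   # check the archive against the recovery data
par2 repair backup.tar.par2                   # attempt a repair if verification fails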

EDIT 1: And if you won't take my word for it, here is what "UNIX Power Tools, 3rd Edition (O'Reilly)" has to say:

Quote:

Originally Posted by Section 38.5.4. To gzip, or Not to gzip
Although compression using gzip can greatly reduce the amount of backup media required to store an archive, compressing entire tar files as they are written to floppy or tape makes the backup prone to complete loss if one block of the archive is corrupted, say, through a media error (not uncommon in the case of floppies and tapes). Most compression algorithms, gzip included, depend on the coherency of data across many bytes to achieve compression. If any data within a compressed archive is corrupt, gunzip may not be able to uncompress the file at all, making it completely unreadable to tar. The same applies to bzip2. It may compress things better than gzip, but it has the same lack of fault-tolerance.

EDIT 2: If you do ever need to attempt to recover an important gzipped file, you should read this to see exactly what is involved.

ruario 12-28-2012 04:52 PM

Two more thoughts to the original poster:

Older versions of gzip had problems decompressing files larger than 4 GB (e.g. gzip 1.2.4 had such an issue). Your member info says that you use Lubuntu 12.10, so in theory this should not be a problem (as it ships with gzip 1.5), but perhaps you are describing an issue on another machine with an older distro (and hence an old gzip)? If so, that could be your problem.

There are multiple tar implementations, which can use different default formats (additionally some distros compile GNU tar with different defaults). Modern GNU tar should have no problem with such a large archive as long as you are using GNU or PAX formats. To be 100% sure one of these is being used I would specify either --format=gnu or --format=pax.
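
For example (sticking with the placeholder paths used earlier in the thread), forcing one of those formats explicitly would look like:
Code:

tar --format=pax -cvf backup.tar /personal/files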

That all said, I would once again suggest either doing a straight copy (perhaps with rsync), using a tar without compression or using another archive format that can do internal compression.

andrew.comly 12-29-2012 10:33 AM

I am extremely busy on weekends, but I will have some spare time starting Sunday night (China time). Any patience is much appreciated!

Andrew Comly

andrew.comly 12-31-2012 08:42 PM

rsync seems to work, but...
 
1 Attachment(s)
Your "~$ rsync --archive /personal/files /mounted/volume/" was most helpful. In addition, I did add on a few extra options after reading the "rsync --help" in detail to make the following:
rsync --archive -vE --delete --stats /media/a/AC/Recent/AC/ /AC

I found the "--stats" option quite useful, as it gives the essential number-of-files information, as the attached jpg indicates. Unfortunately, no "number of folders" info is given, and worst of all, because my system crashed after I tried a "tar --format=gnu" command, I am reluctant to install Dolphin (it's built for Kubuntu, not Lubuntu).

How can the terminal be used to look up the number of folders for the above job, and how can I obtain both the number of files and folders in the source folder? Part of backing up should definitely include verifying that the copied information matches the source exactly; this detail is quite essential.

andrew.comly 12-31-2012 09:18 PM

Outcomes of "tar -... --format=gnu SOURCE TARGET"
 
The tar command just didn't work: I tried the "--format=gnu" option and it only copied ~20% of my information.

Strangely enough, a few minutes after this I encountered a strange crash error message that took me to the official KDE bug-reporting website, but because I did not have some 'gpd' (or something like this) program installed, I just wasn't able to report the vital crash information. Even opening a terminal and running "sudo apt-get xxx" didn't work, giving me the erroneous information that I did not have enough disk space. This is quite absurd, because on my Samsung at that time the drive I used for my personal files was on a different partition than the system files; more specifically: sys drive - 80GB; personal file drive - 100GB, with no more than 15GB used on the system drive. The terminal reported that there was insufficient space to install the mere 5KB the '~gpd' program would require.

I wonder if the crash is due to the fact that I was running Dolphin FM and PCManFM concurrently; Dolphin is originally made for KDE/Kubuntu, not Lubuntu. This might explain the KDE ladybug crash-report website.

Anyways, I then rebooted to find the dreaded black page with only a blinking cursor. I then booted from the Lubuntu 12.10 flash drive, trying first the "~reinstall system only" option (the option to re-install Lubuntu without deleting installed programs), which didn't work, so I ended up having to completely reinstall everything. I certainly don't blame your advice for this, but I thought I should let you know what happened after I tried the above "tar --format=gnu SOURCE TARGET" command.

Certainly the rsync command should do the job, except that I don't know how to verify that the number of folders/files on the target and the source match, now that I am without Dolphin. I dare not install Dolphin again; is there some way to use the terminal to obtain this information?

malekmustaq 12-31-2012 11:56 PM

Quote:

Certainly the rsync command should do the job, except I don't know how to verify # folders/files on both the target and source files match now without dolphin.


You can record everything rsync is doing -- even the errors happening along the way -- into a file, which you can then examine by grepping or searching for whatever you need once the job is done. In other words, you can put everything on record.

Try this:
Code:

rsync --archive --verbose source/ destination/ >myrecord.txt 2>&1
Now, when the job is done you can ask the record for anything: errors, file names, folder names, etc.
Code:

grep -i filename myrecord.txt
grep -i error myrecord.txt

Or you can open myrecord.txt with any text editor, then "Ctrl+F" and search for whatever-foo <press Enter> -- you know that already :).

BTW, just a reminder: if you are dealing with a huge quantity of files on large volumes, it is most advisable to use the terminal (rsync), not file managers like Dolphin, which handle memory differently.

Hope that helps.

Good luck.

andrew.comly 01-05-2013 12:21 AM

rsync --archive /src /destination >myrecord.txt 2>&1: Error MSG
 
1 Attachment(s)
malekmustaq,

Thanks for your idea. Unfortunately, when trying this I encounter the following error message (screen snapshot attached):
a@SAM:/$ sudo rsync --archive --delete --vEu --stats /AC /a/home/AC >myrecord.txt 2>&1
<Enter>
bash: myrecord.txt: Permission denied

The above error message is still present even with "sudo" in front of the command, and even when I create a "myrecord" file beforehand (no .txt, since it's a leafpad document) in the "/" directory, the same "bash: myrecord.txt: Permission denied" is still encountered.

Andrew Comly

andrew.comly 01-05-2013 10:55 AM

rsync: Source & Destination still doesn't exactly match.
 
1 Attachment(s)
malekmustaq / All others :-),

So I ran the "rsync --archive --delete -vE -u --stats /AC /home/a/AC" command, and it seemed to work. Next, in order to check that the source and destination match, I looked at a Mac OS X hints webpage and discovered the idea of combining the two commands "ls" and "wc", and got the following result:

a@SAM:/$ cd /AC
a@SAM:/AC$ ls -R | wc -l
46789

a@SAM:/AC$ cd /home/a/AC
a@SAM:~/AC$ ls -R | wc -l
46781
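
(For comparison, I suppose the counts could also be broken down by type with "find", and the two trees compared directly -- these are just my own paths:)

find /AC -type f | wc -l          # number of files in the source
find /AC -type d | wc -l          # number of folders in the source
find /home/a/AC -type f | wc -l   # number of files in the copy
find /home/a/AC -type d | wc -l   # number of folders in the copy
diff -r /AC /home/a/AC            # show any differences between the two trees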

I didn't take any Computer Science classes, but isn't the subject of computers supposed to be a “hard” science? If so, why am I still short 8 files? I guess computer science just isn't as much of a "hard science" as mathematics or physics! {hard science = science in which facts and theories can be firmly and exactly measured, tested or proved, as opposed to soft science, e.g. sociology or economics}

Sincerely,
Andrew

