LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   How to split very large files to copy to an external hard drive? (https://www.linuxquestions.org/questions/linux-newbie-8/how-to-split-very-large-files-to-copy-to-an-external-hard-drive-4175662742/)

grumpyskeptic 10-18-2019 04:02 AM

How to split very large files to copy to an external hard drive?
 
Using Linux Mint 17.3 Rosa Cinnamon.

I have some very large files I want to move to an external hard drive that has a USB plug, but I get an error when I try to copy files much over 4GB.

I realise now that I should have re-formatted it to NTFS format when it was new, but it is too late as it has too much stuff on it.

The files are for example:

myfile.zip about 9GB

myotherfile.iso about 4.6GB

I have found out about the "split" command, but it seems to be aimed at files with lines in them, unlike mine. Also, I have not been able to get a clear idea of what actual prefixes and suffixes to use.

Please note that I want to be able to put the file back together again in ten or more years time, so I need commands that are likely to be around for a long time.

Questions please:

1. What actual commands should I use to split the files?

2. What actual commands should I use to re-assemble the file parts?

3. Is it possible to assemble the file parts into a whole file on the external hard-drive, even though they were moved there in parts?

4. Is it possible to safely re-format the external hard-drive without losing the files on it? I have seen something on the internet for Windows that says it can do this.

Thanks.

michaelk 10-18-2019 04:27 AM

The split command has the -b option which will split the file based upon the number of bytes. Lots of examples can be found by searching the internet.

The cat command can easily reassemble the files. The suffix is not really important.
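For example, something like this (filenames here are just placeholders; verify the checksum before deleting the original):

Code:

$ md5sum myfile.zip                        # note the original checksum first
$ split -b 1G myfile.zip myfile.zip.part_  # pieces: myfile.zip.part_aa, _ab, ...
$ cat myfile.zip.part_* > rejoined.zip     # the shell glob sorts aa, ab, ... in order
$ md5sum rejoined.zip                      # should match the original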

No. If the drive is formatted as FAT32 the maximum file size is 4GB, so you can't reassemble the file on the USB drive.

If this is a cheap USB flash drive I would not want to use it as long-term storage for 10 years. I would also want to be able to verify the data, which means not keeping the files split.


Windows has a convert command which will convert FAT32 to NTFS without data loss. I've never had to use it... However, always have a verified backup of important data just in case.
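From the Windows side that would look something like this (E: is just an example; use whatever drive letter Windows assigns):

Code:

C:\> convert E: /FS:NTFS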

syg00 10-18-2019 04:37 AM

Excellent response - as usual. Note that you can simply leave the segments there for assembly later, but FAT is a pox, subject to regular corruption even at rest. Your data are exposed - use something better.

Firerat 10-18-2019 10:03 AM

man tar

Code:

<snip>
      -L, --tape-length=N
              Change  tape after writing Nx1024 bytes.  If N is followed by a size suffix (see the subsection Size suffixes below), the suffix
              specifies the multiplicative factor to be used instead of 1024.

              This option implies -M.

      -M, --multi-volume
              Create/list/extract multi-volume archive.
<snip>

your limit is 4GB - 1 byte

so just to keep life easy, have your tape length 3G ;)

but play around with much smaller tape sizes / data sets to test archive / extraction of multi-volume
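A rough sketch of the multi-volume round trip (GNU tar; the volume names are arbitrary, and tar will prompt for another name if these run out):

Code:

$ tar -c -M -L3G -f vol1.tar -f vol2.tar -f vol3.tar myfile.zip   # write 3GiB volumes
$ tar -x -M -f vol1.tar -f vol2.tar -f vol3.tar                   # feed them back in order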

tar -L3G might be overkill,

you can probably just use split and reconstitute files with cat.

Shadow_7 10-18-2019 10:39 AM

If it were formatted ext4 with 4k blocks (the default) you could have file sizes larger than a terabyte.

You can use split; it has options to divide based on line count, on bytes, or on a number of desired chunks, and so on. You can then just cat the pieces together into a file to rebuild it.

$ split --bytes=4000000000 --suffix-length=2 --numeric-suffixes=01 file.dat file_dat_
$ cat file_dat_?? > file_new.dat

Of course none of this retains the permissions or date/time stamps of the original file(s). There are some compression formats, like rar, that will generate chunks. They're not that useful for compressing already-compressed (media) files, but they will create chunks and preserve permissions and timestamps.
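If the metadata matters, one workaround (just a sketch; filenames made up) is to split a tar stream instead of the raw file, since tar records permissions and timestamps inside the archive:

Code:

$ tar -cf - file.dat | split --bytes=1G - file_dat.tar.part_   # split the tar stream
$ cat file_dat.tar.part_* | tar -xf -                          # rejoin and unpack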

grumpyskeptic 10-19-2019 08:38 AM

Thank you Shadow_7 and others.

I find descriptions on the internet of what split does very confusing.

"$ split --bytes=4000000000 --suffix-length=2 --numeric-suffixes=01 file.dat file_dat_"

Does this mean that it will split a file called file.dat into chunks of 4GB, and name the chunks as file_dat_.01, file_dat_.02, etc? Would this work with .zip and .iso files?

What exactly would I need to do to turn myfile.zip of about 9GB into parts of 1GB: myfile.zipsplit01, myfile.zipsplit02.....?

And what would I need to do to turn myotherfile.iso of about 4.6GB into parts of 1GB: myotherfile.isosplit01, myotherfile.isosplit02.....?

Thanks.

Firerat 10-19-2019 09:09 AM

split does not care what the file actually contains


--bytes=4294967296 is 1 byte too many
--bytes=4294967295 is fine
--bytes=4000000000 is fine and easy

not much I can add

sometimes it is just quicker to experiment yourself instead of seeking permission or full instructions from someone else

you really shouldn't have too much difficulty figuring out what you've asked for yourself
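but since you asked for the exact commands, something along these lines should do it (untested here - try it on a small file first):

Code:

$ split --bytes=1000000000 --numeric-suffixes=1 --suffix-length=2 myfile.zip myfile.zipsplit
$ split --bytes=1000000000 --numeric-suffixes=1 --suffix-length=2 myotherfile.iso myotherfile.isosplit

that gives you myfile.zipsplit01, myfile.zipsplit02, ... and to put them back together:

Code:

$ cat myfile.zipsplit?? > myfile.zip
$ cat myotherfile.isosplit?? > myotherfile.iso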

pan64 10-19-2019 09:13 AM

you can easily try how it works on a small[er] file too.
https://www.linuxtechi.com/split-com...or-linux-unix/
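For instance, a throwaway round trip (all filenames made up):

Code:

$ dd if=/dev/urandom of=test.bin bs=1M count=10   # 10MiB scratch file
$ split --bytes=3M test.bin test_bin_             # -> test_bin_aa .. test_bin_ad
$ cat test_bin_* > test_rejoined.bin
$ cmp test.bin test_rejoined.bin && echo OK       # prints OK if identical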

Shadow_7 10-19-2019 02:07 PM

Sometimes it's easier to try it, or read the source, than to ask questions and hope that someone who knows answers, and answers correctly. Man pages can be confusing since there are many options that conflict, like split's divide-by-lines / chunks / bytes options. They're mutually exclusive; trying more than one of them on the same command might turn up some interesting bugs or unpredictable behaviors.

--bytes=4000000000

If that generates 4GB chunks, then:

--bytes=1000000000

Should generate 1GB chunks, at least by the marketing (decimal) definition of 1GB. An actual binary 4GB chunk would be 2^32 bytes, and to fit under the FAT32 limit it has to be less than that, i.e. at most (2^32)-1:

2^32 == 4,294,967,296
(2^32)-1 == 4,294,967,295

In terms of the 1024-based definition of size (versus marketing), the corresponding figure for 1GB is:

(2^30)-1 == 1,073,741,823

so --bytes=1000000000 gives chunks that come in under a binary 1GB.
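You can check the arithmetic in the shell itself:

Code:

$ echo $((2**32 - 1))
4294967295
$ echo $((2**30 - 1))
1073741823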

CoreyCananza 10-19-2019 02:49 PM

I believe zip folders will work for that

Firerat 10-19-2019 02:58 PM

I hate GB and GiB

I never remember which is which

GB gigabyte is 1000³
GiB gibibyte is 1024³

I have probably tried to remember it by telling myself that GiB is "larger" than GB, so it is the bigger number.
But I always have to double-check
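One way to stop remembering altogether is to ask numfmt (part of GNU coreutils):

Code:

$ numfmt --from=si 1G     # decimal / marketing
1000000000
$ numfmt --from=iec 1G    # 1024-based
1073741824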

Shadow_7 10-19-2019 05:30 PM

It's all GB to me. 1024 base, not base 10. AKA powers of 2. We existed before marketing ignorance. And then there's bits versus bytes for networking and media types. Probably a good thing I don't fill out "standardized" tests of multiple choice. Where all the answers are technically correct depending on what department / field of study you specialize in. And yet no option says all of the above. Like most things high school, you have to give the answer that you think the teacher/tester wants, not the answer you believe is correct.

