LinuxQuestions.org
Old 05-24-2012, 07:56 AM   #1
414N
Member
 
Registered: Sep 2011
Location: Italy
Distribution: Slackware
Posts: 647

Rep: Reputation: 189
Searching for info regarding autosplit of files >4GiB on FAT32 partitions


Hi everybody!
I opened this thread because I'm looking for hints/suggestions on adding auto-split functionality when copying files bigger than X bytes onto a filesystem that only supports single files no bigger than X bytes.
While the problem is fairly generic, the most common example is files > 4GiB being copied onto a FAT32 partition, so I put that in the title of the thread.
I'll start by explaining what I'd like to achieve and what ideas I have at the moment.

Objective

To be able to copy a file larger than X bytes onto a partition whose filesystem only supports single files no bigger than X bytes, transparently to the user.
I often stumble upon this missing feature when copying recordings of TV shows from my PC (after having cut off commercials) to a FAT32-formatted USB thumbdrive. I know perfectly well how to split those files manually, but I'd like to come up with an automated, generic solution that works equally well across different desktop environments.
To sum it all up, the solution should:
  • be completely transparent to the user during a copy operation
  • be as desktop environment agnostic as possible
The problems
  • Depending on the type of the file(s) being copied, the split may need to be done with a format-aware user-space tool instead of a plain chunk-based split (as done via the split command). For example, mpeg video files can be split via the split command, but avi, mkv, rar, zip etc. cannot. This introduces a list of additional dependencies: one tool per file type that needs special handling.
  • Staying on the file type topic, different file types may need different renaming policies when splitting: while for multimedia files it may be strongly desirable to keep the file extension intact (media players may rely on the extension alone to recognize a multimedia file), other files may need an incremental suffix right in the extension (think of RAR's r00, r01 ... split volume archives).
  • The maximum allowed file size is filesystem-dependent, so, depending on the destination filesystem type, a split may not be needed at all.
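For reference, the plain chunk-based split from the first bullet can be sketched with split(1) and cat(1); the chunks are not individually usable, only the reassembled file is (function names here are illustrative, not an existing tool):

```shell
#!/bin/sh
# Plain chunk-based split: works for any file type, but the pieces are
# only usable again after reassembly with cat.

# FAT32 caps a single file at 4 GiB - 1 byte.
FAT32_MAX=4294967295

chunk_split() {  # chunk_split FILE SIZE -> FILE.000, FILE.001, ...
    split -d -a 3 -b "$2" "$1" "$1."
}

chunk_join() {   # chunk_join FILE -> FILE restored from its numbered chunks
    cat "$1".[0-9][0-9][0-9] > "$1"
}
```

So `chunk_split show.mpg "$FAT32_MAX"` fills the thumbdrive with show.mpg.000, show.mpg.001, ... and `chunk_join show.mpg` on the target machine rebuilds the original.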

Implementation ideas

Here are some possible implementation ideas I've come up with (still very high level):
Kernel based implementation
I'm not very acquainted with kernel development and I know very little of its low-level architecture, but I'd guess that implementing this auto-split functionality would mean modifying filesystem code (namely the vfat source for FAT32 filesystems). That doesn't seem like a good idea, though, because some splits require external user-space tools, which could hinder the security of the system. Some kind of "hook" inside filesystem code, called every time a copy operation is performed, could be a better approach, but I think it implies an interface change for a lot of code, making it impractical.
FUSE based implementation
From what I've seen in some tutorials/documentation (though I may have overlooked the important parts), FUSE allows one to create filesystem drivers in userspace, but it doesn't allow extending an existing filesystem (namely, vfat) with additional functionality. A FUSE filesystem needs to be mounted too, so some "transparency points" are lost.
Other ideas could be:
  • cp based implementation
  • File manager based implementation (goodbye desktop-agnosticism!)
What I would like to come out of this thread is a bunch of hints (even RTFMs are welcome ) on where to look at in the code or the documentation and comments/opinions on the idea (if it's feasible or if it's just pure madness).
Thanks for reading all of this
 
Old 05-25-2012, 03:33 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,629

Rep: Reputation: 7265
I think you only need a special user-space copy which is able to recognize the underlying filesystem and, based on that info, split "on the fly". I do not think you would need to implement a kernel module or similar, just a shell script.
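A minimal sketch of that idea (the function name is made up, and the "msdos" type string is what GNU stat typically reports for vfat mounts, so treat it as an assumption): detect the destination filesystem with stat(1) and split only when the file would not fit.

```shell
#!/bin/sh
# splitcp: copy SRC into DESTDIR, splitting on the fly when the destination
# filesystem is FAT and the file exceeds the single-file limit.
splitcp() {
    src=$1 destdir=$2
    fat32_max=4294967295

    # GNU stat: -f queries the filesystem, %T prints its type name
    # (vfat mounts usually report "msdos").
    fstype=$(stat -f -c %T "$destdir")
    size=$(stat -c %s "$src")

    if [ "$fstype" = msdos ] && [ "$size" -gt "$fat32_max" ]; then
        # Numbered chunks just under 4 GiB each: name.000, name.001, ...
        split -d -a 3 -b "$fat32_max" "$src" "$destdir/$(basename "$src")."
    else
        cp "$src" "$destdir/"
    fi
}
```

Usage would then be simply `splitcp big-recording.mpg /mnt/usb`, with the split (or not) decided per destination.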
 
Old 05-25-2012, 03:42 AM   #3
Doc CPU
Senior Member
 
Registered: Jun 2011
Location: Stuttgart, Germany
Distribution: Mint, Debian, Gentoo, Win 2k/XP
Posts: 1,099

Rep: Reputation: 344
Hi there,

Quote:
Originally Posted by 414N View Post
I opened this thread because I'm looking for hints/suggestions on adding an auto-split functionality when copying files bigger than X bytes on a filesystem which supports only single files not bigger than X bytes.
While the problem is kinda generic, the more common example would be files > 4GiB to be copied on a FAT32 partition, so I put that in the title of the thread.
I see your point and I appreciate what you're planning. Guess it could also be handy for putting large files on CDs, or even DVDs (floppy disks are slightly out of date, I guess, but the problem used to be the same).

Quote:
Originally Posted by 414N View Post
  • be completely transparent to the user during a copy operation
  • be as desktop environment agnostic as possible
I agree that the solution should be searched for in the file system layer, rather than application layer.

Quote:
Originally Posted by 414N View Post
Depending on the file type of the file(s) being copied, the split may need to be done through the use of a user-space tool instead of a chunk-based split (as is done via the split command). For example, mpeg video files can be split via the split command, but avi, mkv, rar, zip etc. cannot.
Why not? You can split a large file at any arbitrary point you like - unless you assume that each resulting chunk is a fully usable file on its own. Do you assume that? It would mean that your "split-copy" is restricted to a limited set of file types. What about vhd files (virtual disk images)? What about DVD images?

I'd rather favor a file type independent solution: A special copy handler that encapsulates the file into another container format. A proper file name extension, a few bytes allowing safe recognition and a header containing information like full file size should be enough. In the reverse direction, this copy handler must be able to recognize the container and recreate the original file (maybe even prompt you to change CDs).

When I think about it, I see that this is very much like the way popular archiving tools work ...
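That container idea could be sketched with a tiny sidecar header carrying the magic bytes and the original size, instead of a real container format (the "SPLITCP1" magic and all names here are invented for illustration):

```shell
#!/bin/sh
# Container-style split: a sidecar header plus numbered chunks.
pack() {
    src=$1 destdir=$2 chunk=$3
    name=$(basename "$src")
    # Header: magic string, original size, chunk size -- enough to
    # recognize the container and rebuild the file safely.
    {
        echo "SPLITCP1"
        stat -c %s "$src"
        echo "$chunk"
    } > "$destdir/$name.hdr"
    split -d -a 3 -b "$chunk" "$src" "$destdir/$name."
}

unpack() {
    hdr=$1 out=$2
    name=${hdr%.hdr}
    # Refuse anything that doesn't carry our magic bytes.
    [ "$(head -n 1 "$hdr")" = "SPLITCP1" ] || return 1
    cat "$name".[0-9][0-9][0-9] > "$out"
    # Verify the rebuilt size against the header.
    [ "$(stat -c %s "$out")" = "$(sed -n 2p "$hdr")" ]
}
```

The size check in unpack is what gives the "safe recognition" part: a truncated or missing chunk makes the rebuild fail loudly instead of silently producing a corrupt file.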

[X] Doc CPU
 
Old 05-25-2012, 05:11 AM   #4
414N
Member
 
Registered: Sep 2011
Location: Italy
Distribution: Slackware
Posts: 647

Original Poster
Rep: Reputation: 189
Quote:
Originally Posted by pan64 View Post
I think you only need a special user-space copy which is able to recognize the underlying filesystem and, based on that info, split "on the fly". I do not think you would need to implement a kernel module or similar, just a shell script.
Yeah, but the main issue of calling this shell script automatically during every copy job remains. Remember that my main goal is to completely automate the split, i.e. the user just tells the system (via cli, file managers etc.) to copy files from a source directory to a destination directory and the split is done automatically, if needed.
Quote:
Originally Posted by Doc CPU View Post
I see your point and I appreciate what you're planning. Guess it could also be handy for putting large files on CDs, or even DVDs (floppy disks are slightly out of date, I guess, but the problem used to be the same).
This is a slightly different problem: while the filesystem used on the medium can, theoretically, support files of "any size", you're then limited by the medium's physical capacity. For example, ISO9660 can hold single files of up to 8 TiB (as stated on Wikipedia, albeit losing some compatibility), but a single CD or DVD cannot contain such a file because it is limited first by its physical capacity.
Maybe this kind of operation is better done via CD/DVD burning software, which could better manage media insertions/removals during the splitting.
Quote:
Originally Posted by Doc CPU View Post
I agree that the solution should be searched for in the file system layer, rather than application layer.
I guess that's the only feasible way of making the split process as transparent as possible, but I'm open to all proposals
Quote:
Originally Posted by Doc CPU View Post
Why not? You can split a large file at any arbitrary point you like - unless you assume that each resulting chunk is a fully usable file on its own. Do you assume that? It would mean that your "split-copy" is restricted to a limited set of file types. What about vhd files (virtual disk images)? What about DVD images?
I'm sorry for not having made it clear, but I'd like to maintain the usability of every chunk, especially when splitting multimedia files.
Still thinking about the problem at a very high level, one could assume this special copy handler works like this:
  • if no "rules" are specified, a simple chunk split is done;
  • if a "rule" exists for the file type being copied, then we follow that "rule" (which may mean using a userspace tool to perform the split in order to maintain file usability).
These "rules" would be some kind of information extending the original program's functionality, i.e. some kind of plugins.
This way, you can perform simple splits for all files (if no "rules" are loaded) or let the "rules" manage the split.
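The rule dispatch itself could be as simple as a case statement keyed on the extension; everything here is a sketch (the function name and the format-aware tool are hypothetical, only the fallback branch is real):

```shell
#!/bin/sh
# Rule dispatch sketch: pick a splitter by file extension, fall back to a
# plain chunk split when no rule matches.
split_by_rule() {
    src=$1 destdir=$2 max=$3
    case "$src" in
        # Hypothetical rule: a format-aware tool that cuts at keyframes
        # would keep every chunk playable (e.g. an mkvmerge wrapper).
        *.avi|*.mkv) echo "would call a format-aware splitter on $src" ;;
        # Default rule: dumb chunk split; pieces usable only after cat.
        *) split -d -a 3 -b "$max" "$src" "$destdir/$(basename "$src")." ;;
    esac
}
```

A real plugin system would load these rules from per-format files instead of hard-coding the case branches, but the dispatch shape stays the same.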
Quote:
Originally Posted by Doc CPU View Post
I'd rather favor a file type independent solution: A special copy handler that encapsulates the file into another container format. A proper file name extension, a few bytes allowing safe recognition and a header containing information like full file size should be enough. In the reverse direction, this copy handler must be able to recognize the container and recreate the original file (maybe even prompt you to change CDs).

When I think about it, I see that this is very much like the way popular archiving tools work ...
Yep, I think so too
By the way, thanks pan64 and Doc CPU for these initial comments.

Last edited by 414N; 05-25-2012 at 06:24 AM.
 