Linux - Software: This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
I have a problem I don't have an answer for, so hopefully someone has a bright idea.
I have a few processes that create several files on a disk. The actual format is not that important, but it is block oriented. All those separate files then need to be concatenated into a single file. Obviously you can do this by simply creating a new file and copying the data into it. However, that means moving the data over the network again.
Because the data is already on the disk, isn't there a way to attach the blocks from one file to another without actually touching the data?
Because we are talking about a lot of data here (several tens of gigs), this is a very worthwhile exercise for me :-)
I'm not sure I understand, but cat may be what you're looking for.
Read the manual (man cat), but basically it works like this:
Code:
cat file1 file2 ... fileN > output_file
The problem with this solution is that the data is read and written again.
This is exactly what I don't want: I don't want to touch the data at all,
because it is a huge amount and it is already stored on the disk.
Is it a network drive? Log into the machine on which the drive is attached and do it locally. Nothing has to go over the network.
AFAIK, there's no "Linux" way to combine files without reading data from at least some of the files. The best you could hope for would be to append the smaller files to the largest one; that way the largest file doesn't have to be read - the file is just opened, and your program seeks to the end of it.
It's possible (although I think pretty unlikely) that some filesystem may provide an extra feature to do this sort of thing, but I doubt there's any way to access such a feature through the regular Unix file I/O calls.
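To make the append-to-the-largest idea concrete, here is a minimal shell sketch (the file names big, small1, and small2 are made up for illustration). Only the smaller files are read; the largest one is opened in append mode and its existing contents are never re-read:

```shell
# create some sample files (stand-ins for the real data)
printf 'AAAA' > big
printf 'BB'   > small1
printf 'C'    > small2

# append the smaller files to the largest one;
# >> opens big in append mode, so its existing data is never read back
cat small1 small2 >> big
```

Of course the smaller files are still read and written once each, which is exactly the I/O the original poster wants to avoid; this only saves re-reading the largest file.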
Yes, those files are on a network drive. In fact it is a very big filer.
I also agree that the normal Linux C library does not have any functions to do this.
However, I was hoping the lower levels might be able to do such a thing.
I guess when you do a write(), somewhere a new disk block is allocated, filled with
data, and finally appended to the inode's block list for that file.
I was hoping that API was also exposed, so you could manipulate the file's
bookkeeping manually. Obviously this is dangerous, but the time savings could be huge.
Simply reading this data over and over again is not a good option, because
it inflicts way too much I/O; whether that happens locally or remotely does not matter.
Writing the data to disk once is already bad enough. We don't want to touch it again
if not really needed.
If it's a process which you intend to perform regularly, it might be worth trying to implement your own filesystem which does this sort of thing. I very much doubt there would be a low-risk way to do it on pre-existing data. In any case, it'd have to be a pretty big benefit, because writing filesystems is no picnic.
Do you know of any OS / filesystem which provides such a feature?
No, I don't know of an OS/filesystem out there that does something like this.
And yes, this might be a very big task. However, in the very near future we will be handling very large files (several hundreds of gigs), so I/O is going to be a real bottleneck.
The problem is that we can distribute our processing over several hundred CPUs without any problem. However, they all create big files that need to be glued together at the end. This gluing is a very costly operation: it means we have to read and write all the data again.
And the only thing that is happening is a simple copy of data. That seems very inefficient to me.
Another solution would be to invent our own file format, which might consist of multiple files. The problem with that is that it would not be compatible with existing formats.
Otto
Now I'm puzzled. Are you saying that you have the files on several hundred different systems, and you want to access those files as though they were a single file -- without the overhead of creating a single file on one system?
If that's your goal, why not drop the data into, e.g., a MySQL database on each system, and let those databases access a table created by merging the separate tables? That would give each system access to its local data with no network overhead, and access to the "combined" data with the network overhead. If you wanted a local copy at some central location, you could define it as a table created from the merged remote data, so only changed data would need to be moved.
In other words, it sounds to me like you're trying to re-invent distributed database systems when all you need to do is use existing software.
The data in this case is used by some other program to drive certain hardware,
so I'm not really in control of what the format should look like.
However, because these files are so big, accessing the data is a real issue.
There are multiple reasons for this. We jump around in the file quite
a lot, simply because the data is scattered. Searching for particular data is also quite
hard; you simply can't build easy B-trees etc. for this amount of data.
Our main goal is performance. We are trying to optimize our routines as much as possible,
but once you start hitting I/O there is not a lot you can do besides minimizing it.
By the way, the data is shape data that expresses some geometry.
It still seems odd to me that, with all the data on the disk already, there is no way
to glue the files together apart from copying them all over again.
If this mechanism does not exist in the lower levels of the OS/FS, then I'm sure
some database has the same problems that I have.
PTrenholme: I think he means that he wants to copy all the files to one machine and then glue them together, and that he wants to avoid I/O in the gluing stage somehow.
cwomeijer: If you can pre-calculate the size of the combined file (just do a stat on all the remote machines and add the sizes together), you could allocate the space for the whole file and then have the transfer programs write directly into their segment of it.
The conditions would be:
1. Before you create the (empty) combined file, you need to know the sizes of all the sub-files from the remote systems.
2. You need to be able to have several processes writing to different parts of the same file simultaneously. I suppose this is the same sort of thing large databases do all the time, so I assume it's possible, although I've never implemented anything like that.
3. Allocating disk space for a large empty file must be more efficient than copying a file, or else you don't gain anything.
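A rough sketch of that scheme with standard tools, assuming GNU coreutils (truncate, dd) and made-up file names and sizes. Each "transfer process" writes only its own byte range into the pre-sized output with dd, so the segments never overlap and nothing already written is re-read:

```shell
# the total would come from summing `stat -c %s` over the remote files;
# 7 bytes here is just an illustrative assumption
truncate -s 7 merged          # creates a file of the final size (sparse)

# each transfer process writes its own segment at its known offset;
# conv=notrunc keeps dd from truncating the shared output file
printf 'AAAA' | dd of=merged bs=1 seek=0 conv=notrunc status=none
printf 'BBB'  | dd of=merged bs=1 seek=4 conv=notrunc status=none
```

Note that truncate only creates a sparse file; on filesystems that support it, fallocate would reserve real blocks up front, which is closer to condition 3 above.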
Maybe something like this already exists. Thinking about it, it's kind of what bittorrent does isn't it? Maybe you could even use bittorrent as the implementation. All you'd need to do is work out how to create the tracker which tells the destination host to download the chunks from the various other hosts...
There is no such possibility.
Even if you could alter the filesystem structures, it would produce unexpected results because of so-called slack space.
When you write a file to disk, a certain number of data blocks is allocated, which guarantees enough space to store the contents of your file.
But it is extremely unlikely that the file's size is exactly the sum of the sizes of all allocated blocks; in most cases it is smaller than that,
and the last block is not filled to the end. (The filesystem ignores this unused space because it knows the exact size of the file.)
If you just modified the metadata so that the blocks of both files are seen as blocks of a single file, two problems would appear:
1) The new file is bigger than the sum of the two files, because the system assumes that blocks in the middle of a file are completely filled with data.
2) In the middle of the new file you get some random data (slack space) from the last block of the first file, which can hold leftover data from whatever was stored in that place earlier. It was ignored by the filesystem because all slack space at the end of a file is ignored, but now it has become part of your file, and there is no way to distinguish it from valid data.
The only way to join files through modification of filesystem metadata would be if your files occupied all the blocks allocated to them entirely, which is very improbable.
So there is most probably no filesystem that offers such a feature,
and the only option is to use cat or another utility of this kind. But that will always transfer your data from the storage hardware to the host running the utility and back, if this is a stand-alone network drive.
That answer is for the single-drive problem.
If you have multiple computers, you can copy the data from them into a single file on the destination host.
This should be possible (and quite easy to implement) if the files do not change:
you can attach to each remote filesystem and append each file to the existing file on the destination host.
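The size-versus-allocation mismatch described above is easy to observe with stat: a 5-byte file still gets at least one whole filesystem block, and the rest of that block is slack space. (The exact block counts printed depend on the filesystem and its block size.)

```shell
printf 'hello' > f            # 5 bytes of real data

# %s = logical file size, %b = number of %B-byte blocks actually allocated;
# on a typical ext4 filesystem the allocation is a full 4096-byte block
stat -c 'size=%s  allocated=%b blocks of %B bytes' f
```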
A number of things here:
1) We do have the ability to insert dummy data at certain points,
so we would be able to create files whose sizes are exact multiples of a given block size.
2) We do not know the total file size, nor that of the intermediate files, at the moment of creation.
3) We have a utility that copies the files and glues them together. However, this is the only process that does not scale,
and we want to get rid of it.
I would be interested to know how distributed file systems would handle this case.
Why not just create the single file to begin with, instead of creating multiple files and then combining them later?
I'm not sure what you are dealing with, but a solution similar to a syslog server comes to mind. All the remote systems send their data to the central syslog server, and it creates a single log file with all the combined data. Then, for searching etc., you would use a parsing application like Sawmill.
By all appearances, you are trying to fix on the back end a problem that was created on the front end.
Some more questions to clarify the situation a bit:
1) Do you want to combine two or more files which are still growing while you do this?
2) Should the single big file be stored on one computer, or should a copy be on all computers?
If 1 is true, sorry, but I see no option for that.
If you want to store the big file on a single computer (assuming 1 is false), there are two options:
a) copy all files that are already of known size, appending them to the end of the big file;
b) use a central-storage approach, like the syslog setup that farslayer mentioned.
Solution b is even possible when new data keeps appearing, but it is not so easy if you want the already-existing data to be merged first: there must be some point in time when you first transfer and merge the data that already exists, and then append newly created data in syslog-server fashion.
Your processing nodes should send small portions of data to the central server for storage whenever they have a full chunk; they can write that data to local storage too, just in case.
--------------------------------------------
If you have to merge files one at a time, appending to an existing file without re-reading it first (the one file you are appending the other files to), you can do it with
Code:
cat File_to_be_appended >> File_to_which_you_append_new_files