LinuxQuestions.org > Forums > Linux Forums > Linux - Software
Old 01-17-2009, 12:50 PM   #1
sleepy0110
LQ Newbie
 
Registered: Sep 2008
Posts: 6

Rep: Reputation: 0
Looking for a more robust version of split


I'm looking for a program that allows you to split a file where you specify x bytes go into file 1 and y bytes go into file 2.

I'm really just looking for a way to strip off the first couple of bytes of a file.

I don't think dd will work in my situation because HDD space is very limited and dd leaves the old file behind.

Again, I'm just looking to strip a few bytes off the beginning of a file without leaving the original file behind.

If there are any programmers reading this: writing this in C would definitely be possible, I just can't find the functions I'm looking for. Any ideas?

Thanks.
 
Old 01-18-2009, 06:00 AM   #2
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
You can do this using 'dd' in a bash script, or if you know C well enough you can write yourself a program.
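For the record, a minimal dd sketch of the two-way split the OP first asked about (the file names here are made up, and it does need enough free space for both pieces before you delete the original):

```shell
# split /tmp/orig.bin: first 3 bytes into part1, the rest into part2
# (count and skip are measured in blocks of bs, so bs=1 makes them byte counts)
printf 'ABCDEFGH' > /tmp/orig.bin
dd if=/tmp/orig.bin of=/tmp/part1.bin bs=1 count=3 2>/dev/null
dd if=/tmp/orig.bin of=/tmp/part2.bin bs=1 skip=3 2>/dev/null
rm /tmp/orig.bin    # drop the original only after checking the parts
```

bs=1 keeps the arithmetic simple but is slow on big files; the usual speed-up is a larger bs with skip/count adjusted to match.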
 
Old 01-18-2009, 01:08 PM   #3
bgoodr
Member
 
Registered: Dec 2006
Location: Oregon
Distribution: RHEL[567] x86_64, Ubuntu 17.10 x86_64
Posts: 221

Rep: Reputation: 36
Quote:
Originally Posted by sleepy0110 View Post
I just can't [call??] the functions I'm looking for.
Why is that a limitation? Why can't you call the function indirectly by calling the executable that was built from a C program?

Quote:
Originally Posted by sleepy0110 View Post
I don't think dd will work in my situation because HDD space is very limited and dd leaves the old file behind.
This tells me that the file you want to operate on is larger than the available free space. Therefore, using dd won't cut it since it needs to write out to some other file just long enough to write it back over the original file. Looking at the dd man page, I did not see an "edit-in-place" option like sed's -i option, but would be curious if anyone knows how to do that just with dd alone and not writing a separate C/C++ program.
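For what it's worth, here is a sketch of an in-place variant, only lightly tested: conv=notrunc stops dd from truncating the output, so dd can read and write the same file as long as the read offset stays ahead of the write offset, and the coreutils truncate command then chops off the duplicated tail:

```shell
# strip the first 3 bytes of /tmp/f.bin in place, with no temporary copy:
# reads run 3 bytes ahead of writes, so no unread data gets clobbered
printf 'ABCDEFGH' > /tmp/f.bin
dd if=/tmp/f.bin of=/tmp/f.bin bs=1 skip=3 conv=notrunc 2>/dev/null
truncate -s -3 /tmp/f.bin    # cut the now-duplicated last 3 bytes
```

Whether every dd implementation guarantees the block-by-block ordering this relies on is an assumption, so try it on a copy first.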

bg
 
Old 01-18-2009, 01:15 PM   #4
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Quote:
Originally Posted by bgoodr View Post
This tells me that the file you want to operate on is larger than the available free space. Therefore, using dd won't cut it since it needs to write out to some other file just long enough to write it back over the original file. Looking at the dd man page, I did not see an "edit-in-place" option like sed's -i option, but would be curious if anyone knows how to do that just with dd alone and not writing a separate C/C++ program.

bg
I haven't actually tried it, but couldn't you write some data from the source file to another file, then erase that portion of the source file using '/dev/null'? Would this save space? Not sure.
 
Old 01-18-2009, 02:08 PM   #5
bgoodr
Member
 
Registered: Dec 2006
Location: Oregon
Distribution: RHEL[567] x86_64, Ubuntu 17.10 x86_64
Posts: 221

Rep: Reputation: 36
Quote:
Originally Posted by H_TeXMeX_H View Post
I haven't actually tried it, but can't you write some data from the source file to another file, then erase that portion of the source file using '/dev/null'. Would this save space ? Not sure.
Well, yes and no. Writing part of the file to another file is possible, and can be done by dd alone as far as I know. If he had the space available on disk (or even on some spare disk), he could write out all of the file less the few bytes at the start, then do a regular cp to write the temporary file back over the original. But that requires almost double the space for the temporary file.
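That two-step route might look like this (hypothetical paths; the temporary copy is what costs the extra space):

```shell
# copy everything except the first 3 bytes to a temporary file,
# then write it back over the original
printf 'ABCDEFGH' > /tmp/big.bin
dd if=/tmp/big.bin of=/tmp/big.tmp bs=1 skip=3 2>/dev/null
cp /tmp/big.tmp /tmp/big.bin
rm /tmp/big.tmp
```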

Reading between the lines, what I think he wants is some way to just move all bytes in the file back a few bytes just like you would do in a text editor to just delete those few bytes. There are probably low-level Linux-specific ways to do this, but by now I bet he has run down to the store for another spare USB drive just to do it the straight dd way.

For the OP's reference: Below is a transcript of a dd session on a small file. Same applies for a larger file. Notice the difference between seek and skip:

Code:
brentg@yoga:~$ cd /tmp
brentg@yoga:/tmp$ echo this is a test >file1
brentg@yoga:/tmp$ dd if=file1 of=file2 bs=1 seek=1
15+0 records in
15+0 records out
15 bytes (15 B) copied, 8.8839e-05 seconds, 169 kB/s
brentg@yoga:/tmp$ cat file2
this is a test
brentg@yoga:/tmp$ dd if=file1 of=file2 bs=1 seek=2
15+0 records in
15+0 records out
15 bytes (15 B) copied, 6.7328e-05 seconds, 223 kB/s
brentg@yoga:/tmp$ cat file2
tthis is a test
brentg@yoga:/tmp$ dd if=file1 of=file2 bs=1 seek=3
15+0 records in
15+0 records out
15 bytes (15 B) copied, 6.8445e-05 seconds, 219 kB/s
brentg@yoga:/tmp$ cat file2
ttthis is a test
brentg@yoga:/tmp$ dd if=file1 of=file2 bs=1 skip=1
14+0 records in
14+0 records out
14 bytes (14 B) copied, 8.2972e-05 seconds, 169 kB/s
brentg@yoga:/tmp$ cat file2
his is a test
brentg@yoga:/tmp$ dd if=file1 of=file2 bs=1 skip=2
13+0 records in
13+0 records out
13 bytes (13 B) copied, 7.8503e-05 seconds, 166 kB/s
brentg@yoga:/tmp$ cat file2
is is a test
brentg@yoga:/tmp$
bg
 
Old 01-19-2009, 12:01 AM   #6
almatic
Member
 
Registered: Mar 2007
Distribution: Debian
Posts: 547

Rep: Reputation: 67
Code:
#include <cstdio>    // remove()
#include <cstdlib>   // exit()
#include <fstream>
#include <iostream>

using namespace std;

int main()
{
    unsigned long i;
    char FileName[128];   // 25 was too small for a full path
    fstream fl;
    cout << "\nName of File (full path) : ";
    cin >> FileName;
    fl.open(FileName, ios::in | ios::binary);
    if (!fl)
    {
        cerr << "\nNo such file\n";
        exit(1);
    }
    fl.seekg(0, ios::end);
    unsigned long FileSize = streamoff(fl.tellg());
    fl.seekg(0, ios::beg);
    cout << "\nHow many bytes should I cut off the beginning : ";
    cin >> i;
    char* cbuffer = new char[FileSize-i]; 
    fl.seekg(i, ios::beg);
    fl.read(cbuffer, (FileSize-i)*sizeof(char));
    fl.close();
    remove(FileName);
    fl.open(FileName, ios::out | ios::binary);
    if (!fl)
    {
        cerr << "\nCannot open file !";
        exit(1);
    }
    fl.write(cbuffer, (FileSize-i)*sizeof(char));
    fl.close();
    delete [] cbuffer;
    cout << "Done !\n\n";
    return 0;
}
 
Old 01-19-2009, 09:15 PM   #7
bgoodr
Member
 
Registered: Dec 2006
Location: Oregon
Distribution: RHEL[567] x86_64, Ubuntu 17.10 x86_64
Posts: 221

Rep: Reputation: 36
almatic: That reads the entire file into memory. That's OK if the file is small, but not if it is large: you will force the system to start swapping at some point. However, that code could be changed to read a block, back it up X number of bytes by writing it over the previous block, and then at the end use the truncate64 function to trim the last X bytes off.

Actually, I'd be surprised if someone hasn't already created a user-level utility command to do just that.

bgoodr
 
Old 01-20-2009, 08:30 AM   #8
almatic
Member
 
Registered: Mar 2007
Distribution: Debian
Posts: 547

Rep: Reputation: 67
Quote:
Originally Posted by bgoodr View Post
However, that code could be changed to read a block, backup X number of bytes, and then write over the previous blocks. Then at the end you can use the truncate64 function to trim off the last X bytes.
Ok, I didn't know that function, but the change is rather trivial, isn't it? It was just for demonstration, because the poster stated he didn't know the necessary functions. He has probably already solved it anyway.

Quote:
Actually, I'd be surprised if someone hasn't already created a user-level utility command to do just that.
Why would anyone make such a useless utility?
 
Old 01-20-2009, 08:43 AM   #9
almatic
Member
 
Registered: Mar 2007
Distribution: Debian
Posts: 547

Rep: Reputation: 67
Ok, I have now quickly changed the program and added the truncate function you linked. I tested it on a 1 GB VOB and it took just 5 seconds to cut that nasty byte.
It's still beyond me why anyone would need that ...

Code:
#include <cstdlib>    // exit()
#include <fstream>
#include <iostream>
#include <unistd.h>   // truncate()

using namespace std;

int main()
{
    unsigned long i;
    unsigned long bytecount = 0;
    long scale;
    char FileName[128];
    fstream fl;
    cout << "\nName of File (full path) : ";
    cin.getline(FileName, 128);
    // open once for reading and writing; ios::out alone would truncate the file
    fl.open(FileName, ios::in | ios::out | ios::binary);
    if (!fl)
    {
        cerr << "\nNo such file\n";
        exit(1);
    }
    fl.seekg(0, ios::end);
    unsigned long FileSize = streamoff(fl.tellg());
    // choose a buffer size roughly proportional to the file size
    if (FileSize > 104857600)
        scale = 10485760;
    else if (FileSize > 10485760)
        scale = 1048576;
    else if (FileSize > 1048576)
        scale = 10240;
    else
        scale = 1024;
    cout << "\nHow many bytes should I cut off the beginning : ";
    cin >> i;
    char* cbuffer = new char[scale];
    while (bytecount < FileSize - i)
    {
        // read a block starting i bytes further on ...
        fl.seekg(i + bytecount, ios::beg);
        fl.read(cbuffer, scale);
        streamsize got = fl.gcount();   // may be short on the last block
        fl.clear();                     // clear the EOF state so we can keep writing
        // ... and write it back shifted toward the front
        fl.seekp(bytecount, ios::beg);
        fl.write(cbuffer, got);
        bytecount += got;
    }
    delete [] cbuffer;
    fl.close();
    truncate(FileName, FileSize - i);   // drop the now-duplicated tail
    cout << "Done !\n\n";
    return 0;
}
 
Old 01-20-2009, 08:37 PM   #10
bgoodr
Member
 
Registered: Dec 2006
Location: Oregon
Distribution: RHEL[567] x86_64, Ubuntu 17.10 x86_64
Posts: 221

Rep: Reputation: 36
Quote:
Originally Posted by almatic View Post
It's still beyond me why anyone would need that ...
Perhaps he is chopping a boot sector off of a disk image? I dunno. Maybe he's a troll and we fell into his trap?
 