LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software
10-24-2007, 09:53 AM   #1
rhkramer
LQ Newbie
 
Registered: Oct 2007
Posts: 6

Rep: Reputation: 1
Splitting a file at (any) record boundary to get files not to exceed a certain size?


I've tried googling and looking at split, csplit, fsplit, and a few others, and I've looked at other threads on LQ. So far I haven't found what I'm looking for.

I'm looking for a file utility to split files in perhaps an unusual way.

I have several very large files that consist of multiple records. Each record consists of a variable number of lines and ends with a record separator / record terminator (currently "\nmorF\n").

I want to split those files at record boundaries so that the pieces don't exceed a certain size. In other words, each piece should hold a variable number of whole (entire) records totaling no more than (or about) a specified size in KB.

Oh, and (of course) the split files will need names, ideally the original filename as a prefix followed by a numeric suffix.

I've done some googling, which I'll continue, but it is not very promising so far. If I can't find something, I'll probably try writing something in Ruby (because I'm starting to get the hang of Ruby), but if anybody has something prewritten in awk, sed, perl, ... that should work (assuming I have to do little or no modification to the code ;-) .
 
10-26-2007, 03:27 AM   #2
aus9
LQ Addict
 
Registered: Oct 2003
Location: Australia
Distribution: Mainly Debian based
Posts: 5,406

Rep: Reputation: Disabled
Yes, I know you have looked at split, but let's take another look at it.

Suppose you have a file called file1.

The file contains that special separator, which for simplicity I will relabel "a".

touch file1
vi file1 so its contents are line1 a line2 b
split -l 1 file1 produces files xaa and xab
cat xaa > a
cat xab > b

If "a" is several lines down, then obviously the line count given to split -l changes.
 
10-26-2007, 05:39 AM   #3
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
Here's a Perl program which does something like that:
Code:
#!/usr/bin/perl

use strict;
use warnings;
use FileHandle;

# Set the record break string.
$/ = "ENDREC";

# maximum file size is 1 KiB
my $max_file_size = 1024;

# The output file name template:
my $oft = "output_%03d";


# Start "full" so that the first record read opens the first output file.
my $current_file_size = $max_file_size;
my $current_file_number = 0;
my $output_fh = FileHandle->new;

while(<>) {
        my $record_size = length($_);
        # Start a new file when this record would push the current one over the limit.
        # A single record larger than the limit is still written, to a file of its own.
        if ($current_file_size + $record_size > $max_file_size) {
                $output_fh->close;   # FileHandle module is nice enough not to complain if not already open.
                $current_file_number++;
                $output_fh->open(sprintf($oft, $current_file_number), "w") || die "cannot open file for writing : $!\n";
                $current_file_size = 0;
        }
        print $output_fh $_;
        $current_file_size += $record_size;
}

$output_fh->close;
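
A variant adapted to the details in the original question might look like the sketch below. It is only an illustration under stated assumptions: the sub name split_records, the "<input file>.NNN" naming scheme, and the command-line form are all made up here, not taken from the thread. It uses "\nmorF\n" as the record terminator, takes the size limit in KiB, and uses the input filename as the prefix for the numbered output files. As with the program above, a single record larger than the limit still lands in a file of its own, so that file will exceed the limit.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Split $input into files named "$input.001", "$input.002", ... so that each
# holds only whole records and stays within $max_kb KiB where possible.
# Returns the list of files written. (Name and interface are illustrative.)
sub split_records {
    my ($input, $max_kb, $terminator) = @_;
    my $max_bytes = $max_kb * 1024;

    # Limit the change to $/ to this sub, so other reads are unaffected.
    local $/ = $terminator;

    open(my $in, '<', $input) or die "cannot open $input: $!\n";

    my ($out, $bytes_written, @written);
    while (my $record = <$in>) {
        # No decoding layer is in use, so length() counts bytes here.
        my $record_size = length($record);

        # Start a new file for the first record, or when this record
        # would push the current file over the limit.
        if (!defined $out || $bytes_written + $record_size > $max_bytes) {
            if (defined $out) {
                close($out) or die "cannot close output: $!\n";
            }
            my $name = sprintf("%s.%03d", $input, scalar(@written) + 1);
            open($out, '>', $name) or die "cannot open $name: $!\n";
            push @written, $name;
            $bytes_written = 0;
        }

        print {$out} $record;
        $bytes_written += $record_size;
    }

    if (defined $out) {
        close($out) or die "cannot close output: $!\n";
    }
    close($in);
    return @written;
}

# Hypothetical usage: split_records.pl <max KiB> <file> [<file> ...]
if (@ARGV >= 2) {
    my ($max_kb, @files) = @ARGV;
    split_records($_, $max_kb, "\nmorF\n") for @files;
}
```

Saved under a name such as split_records.pl (the name is arbitrary), running it as "perl split_records.pl 512 bigfile.txt" would write bigfile.txt.001, bigfile.txt.002, and so on next to the input file.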
 
10-26-2007, 08:31 AM   #4
rhkramer
LQ Newbie
 
Registered: Oct 2007
Posts: 6

Original Poster
Rep: Reputation: 1
Thanks to you both!

(I presume this reply will be below matthewg42's reply--I didn't see a "post reply" button below aus9's message.)

I should have mentioned that several of my files have nearly 2000 records totaling nearly 10 MB in size (and the problem is that Kate's folding / syntax highlighting seems to just fall over at that point)--the vi approach won't really cut it. ;-)

Thanks for the Perl program--it looks like I could modify it myself if need be, and it answers a question that is relevant in Ruby as well--I didn't know if I could assign to $/ directly.
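
On assigning to $/ directly in Perl: it is an ordinary global variable, so a plain assignment stays in effect for every later read in the program. A common way to limit the change is local, as in the minimal sketch below. The sub name read_records and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Read all records from a filehandle using a custom terminator.
# 'local' restores the previous value of $/ when the sub returns.
sub read_records {
    my ($fh, $terminator) = @_;
    local $/ = $terminator;
    my @records = <$fh>;
    chomp(@records);    # chomp strips the current value of $/, not just "\n"
    return @records;
}

# Sample data held in memory; a real script would open a file instead.
my $data = "line 1\nline 2\nmorF\nline 3\nmorF\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!\n";
my @records = read_records($fh, "\nmorF\n");
close($fh);

print scalar(@records), " records\n";                  # prints "2 records"
print "\$/ is back to a newline\n" if $/ eq "\n";    # the default was restored
```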

I've started on a Ruby program--I may finish it anyway just for the learning experience.

regards,
Randy Kramer
 
All times are GMT -5.