split very large 200mb text file by every N lines (sed/awk fails)
Hi All,
I have a large text file with over a million lines, and I need to split it by every N lines. In the end, I need three separate files: the first with every third line starting from the very first line (there is no header), the second with every third line starting from the second line, and the third with every third line starting from the third line. Unfortunately, the commands I have tried so far all fail at some point and start shifting in the sequence after a certain number of lines (probably an integer overflow):
$ sed -n '2~3p' somefile
$ awk 'NR%3==0'
$ perl -ne 'print ((0 == $. % 3) ? $_ : "")'
Are there any other commands I should try that would work for the entire file?
Thanks!
Doug
a simple start
Quote:
Code:
$> cd /home/user
Caveats: As I tested this, csplit produced the first file, ./smallpart00, with only _two_ lines. If there are fewer than 3 lines in the last small file, the last line will still get added to one or another of the three collected-lines files, so you'd need to edit them accordingly. Watch out for line-order issues. I've found that the limit of at most 100 splits is _not_ valid any longer, at least not for the version distributed with Debian's GNU coreutils 6.10-6.
Quote:
So, why "probably"? I.e., why wouldn't you write slightly more code and establish the exact root cause? Do you want us to do the debugging?
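One way to test the overflow theory, as a rough sketch: split the file with awk, interleave the three parts again with paste, and compare the result with the original. The function name verify_split and the part0/part1/part2 file names are made up for illustration, and the sketch assumes the input ends with a newline.

```shell
#!/bin/bash
# Hypothetical round-trip check (not from the thread): split every third
# line into three files, re-interleave them, and compare with the source.
verify_split() {
    local src=$1 dir status
    dir=$(mktemp -d) || return 1
    # (NR - 1) % 3 is 0, 1, 2 for lines 1, 2, 3, and then repeats.
    awk -v dir="$dir" '{ print > (dir "/part" ((NR - 1) % 3)) }' "$src"
    # paste -d '\n' emits one line from each file in turn; head trims the
    # empty padding lines paste adds when the parts differ in length.
    paste -d '\n' "$dir/part0" "$dir/part1" "$dir/part2" |
        head -n "$(( $(wc -l < "$src") ))" |
        cmp -s - "$src"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

Calling `verify_split somefile && echo "split is consistent"` should print the message if the split never drifts. If it does, the stray rows are more likely to come from the input itself than from a counter overflow.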
Quote:
SomeNumber choice_of_three_words text --->
SomeNumber choice_of_two_words text --->
SomeNumber one_word text --->
Every time I have tried the sed command, one of the three result files ends up with a mix of words, starting about 22,000 rows down, that could never otherwise end up in that file. I have checked the original data file to make sure the problem is not there. hasienda, are the very last lines the only ones I need to check?
Thank you very much for your help,
Doug
bash script
This will append to any existing files, takes input from stdin, takes file names as arguments, and doesn't bother checking for correct usage, but it works to split the lines in a round-robin fashion.
Code:
#! /bin/bash
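The body of the script did not survive in this post. A minimal sketch of what such a round-robin splitter might look like, following the description above (lines from stdin, file names as arguments, appending to existing files), is below. It is not the poster's original code, and the function name round_robin is made up for illustration.

```shell
#!/bin/bash
# Sketch of a round-robin line splitter (an assumption, not the original script).
# Reads lines from stdin and appends them in turn to each file named as an argument.
round_robin() {
    local line first
    # Bail out if no output files were given.
    [ "$#" -gt 0 ] || return 1
    # IFS= and -r keep leading whitespace and backslashes intact; the
    # [ -n "$line" ] test also catches a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" >> "$1"
        # Rotate the argument list so the next line goes to the next file.
        first=$1
        shift
        set -- "$@" "$first"
    done
}
```

It would be called as `round_robin first.txt second.txt third.txt < somefile`. The shell keeps no line counter here, so there is nothing to overflow, but a read loop like this will probably be noticeably slower than awk on a million lines.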
Quote:
For example, turn your Perl one-liner into a full-blown script and debug it. Here is a Perl for Windows, for example: http://strawberryperl.com/ -> http://strawberryperl.com/releases.html -> http://strawberryperl.com/download/s...6-portable.zip
Quote:
Worked GREAT! Thank you very much for your help!
Doug