Old 12-06-2009, 06:59 PM   #1
zklone
LQ Newbie
 
Registered: Dec 2009
Posts: 1

Rep: Reputation: 0
Scripting: split file into array of 12-line chunks


Hi,

I need to split a file into an array, splitting at every 12th line.
e.g.

line 1
line 2
...
line 24


then the array will look something like

items[0] = line 1 ... line 12
items[1] = line 13 ... line 24

Right now I am reading the file line by line and putting the lines into an array. This is a little slow.

If there is a better way, please point me in the right direction.

Thanks,
Mike
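
One direct way to build such an array in bash is to read the whole file in one pass and re-join groups of 12 lines (an untested sketch; assumes bash 4+ for mapfile, with /path/to/file standing in for the real file):

Code:
# Read every line in one pass (mapfile strips the trailing newlines).
mapfile -t lines < /path/to/file

items=()
for ((i = 0; i < ${#lines[@]}; i += 12)); do
    # Re-join the next 12 lines into a single array element.
    printf -v chunk '%s\n' "${lines[@]:i:12}"
    items+=("$chunk")
done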
 
Old 12-06-2009, 07:12 PM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Tell us exactly what problem you are solving.
 
Old 12-06-2009, 07:28 PM   #3
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 59
Quote:
Originally Posted by zklone View Post
Right now I am reading the file line by line and putting the lines into an array. This is a little slow.
I'm not sure how you can avoid doing this (in one way or another). You have to read the file to get the items at all, so to that degree you're I/O bound.

If the lines are uniform, you could theoretically do something involving bytes or size, but that strikes me as an unlikely possibility.
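
If the lines really were uniform, a chunk could be pulled out by byte offset without scanning anything before it (an untested sketch; the 80-byte line length is hypothetical):

Code:
# 12 lines of 80 bytes each (newline included) = 960 bytes per chunk.
# Seek straight to chunk N instead of reading the file from the top.
N=3
dd if=/path/to/file bs=960 skip="$N" count=1 2>/dev/null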
 
Old 12-06-2009, 08:25 PM   #4
lwasserm
Member
 
Registered: Mar 2008
Location: Baltimore Md
Distribution: ubuntu
Posts: 184

Rep: Reputation: 41
I don't know if it will run faster, but you could use something like this (untested code, just for the concept):

Code:
INDEX=0
LINENUMBER=1
TOTALLINES=$(wc -l < /path/to/file)

# Loop until every 12-line chunk has been read.
while (( LINENUMBER <= TOTALLINES )); do
    # -n with an explicit range prints only lines LINENUMBER..LINENUMBER+11.
    ARRAY[INDEX]=$(sed -n "${LINENUMBER},+11p" /path/to/file)
    ((INDEX++))
    ((LINENUMBER += 12))
done
Note that sed starts line numbering at 1, not at 0.
 
Old 12-06-2009, 08:31 PM   #5
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by lwasserm View Post
I don't know if it will run faster, but you could use something like this (untested code, just for the concept):

Code:
INDEX=0
LINENUMBER=1
TOTALLINES=$(wc -l < /path/to/file)

# Loop until every 12-line chunk has been read.
while (( LINENUMBER <= TOTALLINES )); do
    # -n with an explicit range prints only lines LINENUMBER..LINENUMBER+11.
    ARRAY[INDEX]=$(sed -n "${LINENUMBER},+11p" /path/to/file)
    ((INDEX++))
    ((LINENUMBER += 12))
done
Note that sed starts line numbering at 1, not at 0.
This will cause sed to read the entire file every time through the loop, not just the 12 lines requested.

Please see the thread below:
sed script to parse a file into smaller files with set # of lines

Kevin Barry
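
For a single chunk, sed can also be told to quit at the last wanted line, so it never scans the rest of the file (an untested sketch; the start line of 13 is just for illustration):

Code:
# Print lines 13..24, then quit so sed stops reading right there.
START=13
sed -n "${START},$((START + 11))p; $((START + 11))q" /path/to/file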
 
Old 12-06-2009, 09:04 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,222

Rep: Reputation: 1019
I did some testing a while back and found perl was faster at subsetting a (huge) file than sed, even when both were stopped after the requisite lines (only) were found rather than continuing to read.
As usual, YMMV.
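
That kind of early-exit subsetting looks something like this in perl (an untested sketch; the 13..24 range is just for illustration):

Code:
# Print lines 13..24, then exit so perl stops reading the rest of the file.
perl -ne 'print if $. >= 13; exit if $. == 24;' /path/to/file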
 
Old 12-06-2009, 11:07 PM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
With an 80 million(?) line file, you can (for the last post in that thread):
1) lose the cat, because it's useless,
2) avoid using bash's while-read loop to read big files,
3) if a bash solution is desired, skip the external sed command and use bash's own string substitution,
4) or use awk.
 
Old 12-07-2009, 01:09 AM   #8
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by ghostdog74 View Post
With an 80 million(?) line file, you can (for the last post in that thread):
1) lose the cat, because it's useless,
2) avoid using bash's while-read loop to read big files,
3) if a bash solution is desired, skip the external sed command and use bash's own string substitution,
4) or use awk.
Please post an example, either here or in the other thread.
Kevin Barry
 
Old 12-07-2009, 04:50 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by ta0kira View Post
Please post an example, either here or in the other thread.
Kevin Barry
An example for which point? 1, 2, 3, or 4?
 
Old 12-07-2009, 01:44 PM   #10
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by ghostdog74 View Post
An example for which point? 1, 2, 3, or 4?
Your solution to the problem, taking all four into account.
Kevin Barry
 
Old 12-07-2009, 06:08 PM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by ta0kira View Post
Your solution to the problem, taking all four into account.
Kevin Barry
1) Instead of
Code:
cat "$file" | while ...
use input redirection:
Code:
while read -r line
do
    ...
done < "$filename"
or open and close the file descriptor explicitly:
Code:
exec 4< "$filename"
while read -r line <&4
do
    ...
done
exec 4<&-
2) It's well known that processing large files with bash's while-read loop is (much) slower than using tools like awk; you can search some of my previous posts (way back) where I demonstrated this. A rough way to measure the difference is sketched after this post.

3) I am not sure what that sed line is doing, i.e. the s/./&/, care to explain?

4) I have already provided an awk suggestion in that thread.
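
A rough timing harness for point 2 (just a sketch; big.txt is a hypothetical large test file, and both commands deliberately do nothing with each line):

Code:
# Compare a do-nothing pass over the same large file.
time bash -c 'while read -r line; do :; done < big.txt'
time awk '{}' big.txt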
 
Old 12-07-2009, 09:39 PM   #12
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by ghostdog74 View Post
2) It's well known that processing large files with bash's while-read loop is (much) slower than using tools like awk; you can search some of my previous posts (way back) where I demonstrated this.
OK, but I'm not sure how you direct output to separate files with awk without running through the file more than once. That's my awk ignorance, though, which is why I was hoping you had an example.
Quote:
Originally Posted by ghostdog74 View Post
3) I am not sure what that sed line is doing, i.e. the s/./&/, care to explain?
I'm not sure, either. It's something the OP had in his or her original script, and again, I didn't test my code; it was an example. This isn't the thread to argue about such things, so it would be helpful if you'd show what you mean by "awk can do it better."
Kevin Barry
 
Old 12-07-2009, 10:02 PM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241Reputation: 241Reputation: 241
Quote:
Originally Posted by ta0kira View Post
OK, but I'm not sure how you direct output to separate files with awk without running through the file more than once.
you mean this?
Code:
awk 'NR % 4 == 1 { ++c } { print $0 > ("file-" c ".txt") }' file
NR%4==1 is true on the first line of every group of 4 (lines 1, 5, 9, ...), e.g.
Code:
$ more file
1
2
3
4
5
6
7
8
9
10
$ awk 'NR%4==1' file 
1
5
9
Using this concept, the OP can change it to NR%4000000==1 for his requirement. Notice that the count variable "c" is incremented on the first line of every group; this variable "c" is appended to the file name.
The awk one-liner above does what the OP did with that bunch of seds:
Code:
sed -n '1,4000000 s/./&/w $FileName.01' $FileName
...
...
sed -n '76000001,$ s/./&/w $FileName.20' $FileName
The s/./&/w is just writing the line to the file (my guess), which in awk is a simple print with ">" redirection.
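
For this thread's 12-line requirement the same pattern adapts directly (an untested sketch; the chunk-*.txt file names are made up):

Code:
# Start a new output file on the first line of every 12-line group.
awk 'NR % 12 == 1 { ++c } { print > ("chunk-" c ".txt") }' /path/to/file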

Quote:
I'm not sure, either. It's something that OP had in his or her original script, and again, I didn't test my code.
Whatever it is, there's no need to call sed (echo or printf will do).

Quote:
it would be helpful if you'd show what you mean by "awk can do it better."
What I mean by "better" is speed performance on big files, as compared to bash's while-read loop.
 
  

