12-06-2009, 06:59 PM
|
#1
|
|
LQ Newbie
Registered: Dec 2009
Posts: 1
Rep:
|
Scripting: split file into 12 lines array
Hi,
I need to split a file into an array, with one array element for every 12 lines.
e.g.
line 1
line 2
...
line 24
then the array will look something like
items[0] = line 1 ... line 12
items[1] = line 13 ... line 24
Right now I am reading the file line by line and putting the lines into an array. This is a little slow.
If there is a better way, please point me in the right direction.
Thanks,
Mike
|
|
|
|
12-06-2009, 07:12 PM
|
#2
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
Tell us exactly what problem you are solving.
|
|
|
|
12-06-2009, 07:28 PM
|
#3
|
|
Member
Registered: May 2007
Distribution: Debian
Posts: 754
Rep:
|
Quote:
Originally Posted by zklone
Right now I am reading the file line by line and putting the lines into an array. This is a little slow.
|
I'm not sure how you can avoid doing this (in one way or another). You have to read the file to get the items at all, so to that degree you're I/O bound.
If the lines are uniform, you could theoretically do something involving bytes or size, but that strikes me as an unlikely possibility.
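If every line really did have the same byte length, a given chunk could be fetched by offset without reading the lines before it. The following is only a sketch under that assumption; the `read_chunk` helper, its arguments, and the sample file are hypothetical and do not come from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print chunk number K (0-based) of FILE, assuming
# every line is exactly WIDTH bytes long (newline included) and a chunk
# is LINES_PER_CHUNK lines. dd skips K whole blocks instead of reading them.
read_chunk() {
    local file=$1 width=$2 lines_per_chunk=$3 k=$4
    dd if="$file" bs=$(( width * lines_per_chunk )) skip="$k" count=1 2>/dev/null
}

# Sample data (assumption): 30 lines of 5 bytes each, "0001\n" .. "0030\n".
sample=$(mktemp)
for i in $(seq 1 30); do printf '%04d\n' "$i"; done > "$sample"

# Second chunk of 12 lines, i.e. lines 13 to 24.
second=$(read_chunk "$sample" 5 12 1)
rm -f "$sample"

printf '%s\n' "$second"
```

This still starts one dd process per chunk, so it only pays off when a few chunks are needed from a large file, and it breaks as soon as a single line has a different length.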
|
|
|
|
12-06-2009, 08:25 PM
|
#4
|
|
Member
Registered: Mar 2008
Location: Baltimore Md
Distribution: ubuntu
Posts: 184
Rep:
|
I don't know if it will run faster, but you could use something like this (untested code, just for concept)
Code:
INDEX=0
LINENUMBER=1
while whatever-is-appropriate; do
    # GNU sed: print the 12 lines starting at LINENUMBER
    ARRAY[INDEX]=$(sed -n "${LINENUMBER},+11p" /path/to/file)
    ((INDEX++))
    ((LINENUMBER+=12))
done
Note that sed starts line numbering at 1, not at 0.
|
|
|
|
12-06-2009, 08:31 PM
|
#5
|
|
Senior Member
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 2,962
Rep: 
|
Quote:
Originally Posted by lwasserm
I don't know if it will run faster, but you could use something like this (untested code, just for concept)
Code:
INDEX=0
LINENUMBER=1
while whatever-is-appropriate; do
    # GNU sed: print the 12 lines starting at LINENUMBER
    ARRAY[INDEX]=$(sed -n "${LINENUMBER},+11p" /path/to/file)
    ((INDEX++))
    ((LINENUMBER+=12))
done
Note that sed starts line numbering at 1, not at 0.
|
This will cause sed to read the entire file every time through the loop, not just the 12 lines requested.
Please see the thread below:
sed script to parse a file into smaller files with set # of lines
Kevin Barry
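One way to avoid rescanning the file on every iteration is to read it once and then slice the lines in memory. This is a minimal sketch, assuming bash 4 or later (for `mapfile`) and a file small enough to hold in memory; `chunk_file`, `ITEMS`, and the sample data are illustrative names, not something posted in the thread.

```shell
#!/usr/bin/env bash
# chunk_file FILE SIZE: read FILE once, then store each group of SIZE lines
# (joined by newlines) as one element of the global array ITEMS.
# Requires bash 4+ for mapfile. Names are illustrative, not from the thread.
chunk_file() {
    local file=$1 size=$2 i
    local -a lines
    local IFS=$'\n'      # "${lines[*]...}" joins on the first character of IFS
    mapfile -t lines < "$file"
    ITEMS=()
    for (( i = 0; i < ${#lines[@]}; i += size )); do
        ITEMS+=( "${lines[*]:i:size}" )
    done
}

# Sample data (assumption): 30 numbered lines.
sample=$(mktemp)
seq 1 30 > "$sample"
chunk_file "$sample" 12
rm -f "$sample"

echo "number of chunks: ${#ITEMS[@]}"
printf '%s\n' "${ITEMS[1]}"
```

The last element simply holds whatever lines are left over when the line count is not a multiple of the chunk size.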
|
|
|
|
12-06-2009, 09:04 PM
|
#6
|
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 11,223
|
I did some testing a while back, and found perl was faster at subsetting a (huge) file than sed, even if both were stopped after the requisite lines (only) were found rather than continuing to read.
As usual, YMMV.
|
|
|
|
12-06-2009, 11:07 PM
|
#7
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
Quote:
Originally Posted by ta0kira
|
With a file of 80 million(?) lines, you can (regarding the last post in that thread):
1) lose the cat, because it's useless;
2) avoid using bash's while read loop to read big files;
3) if a bash solution is desired, skip the external sed command and use bash's own string substitution;
4) or use awk.
|
|
|
|
12-07-2009, 01:09 AM
|
#8
|
|
Senior Member
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 2,962
Rep: 
|
Quote:
Originally Posted by ghostdog74
With a file of 80 million(?) lines, you can (regarding the last post in that thread):
1) lose the cat, because it's useless;
2) avoid using bash's while read loop to read big files;
3) if a bash solution is desired, skip the external sed command and use bash's own string substitution;
4) or use awk.
|
Please post an example, either here or in the other thread.
Kevin Barry
|
|
|
|
12-07-2009, 04:50 AM
|
#9
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
Quote:
Originally Posted by ta0kira
Please post an example, either here or in the other thread.
Kevin Barry
|
An example for which point? 1, 2, 3, or 4?
|
|
|
|
12-07-2009, 01:44 PM
|
#10
|
|
Senior Member
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 2,962
Rep: 
|
Quote:
Originally Posted by ghostdog74
An example for which point? 1, 2, 3, or 4?
|
Your solution to the problem taking into account all 4.
Kevin Barry
|
|
|
|
12-07-2009, 06:08 PM
|
#11
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
Quote:
Originally Posted by ta0kira
Your solution to the problem taking into account all 4.
Kevin Barry
|
1) Instead of
Code:
cat "$file" | while....
use input redirection:
Code:
while read ...
do
....
done < "$filename"
or open and close the file explicitly:
Code:
exec 4<"$filename"
while read -r line <&4
do
....
done
exec 4<&-
2) It's well known that processing large files with bash's while read loop is much slower than using tools like awk. You can search some of my previous posts (way back) in which I demonstrated this.
3) I am not sure what that sed line is doing, i.e. s/./&/. Care to explain?
4) I have already provided an awk suggestion in that thread.
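Beyond the extra process, there is a practical reason to prefer redirection when the goal is to fill an array: in bash's default configuration the loop at the end of a pipeline runs in a subshell, so anything stored inside it is lost. A small sketch to illustrate this; the sample file and the array names are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
# Sample data (assumption): three short lines.
sample=$(mktemp)
printf 'a\nb\nc\n' > "$sample"

# Piped: with bash defaults the loop runs in a subshell, so the array
# filled inside it is gone once the pipeline finishes.
piped=()
cat "$sample" | while IFS= read -r line; do
    piped+=("$line")
done

# Redirected: the loop runs in the current shell, so the array survives.
redirected=()
while IFS= read -r line; do
    redirected+=("$line")
done < "$sample"

rm -f "$sample"
echo "piped: ${#piped[@]} elements, redirected: ${#redirected[@]} elements"
```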
|
|
|
|
12-07-2009, 09:39 PM
|
#12
|
|
Senior Member
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 2,962
Rep: 
|
Quote:
Originally Posted by ghostdog74
2) It's well known that processing large files with bash's while read loop is much slower than using tools like awk. You can search some of my previous posts (way back) in which I demonstrated this.
|
Ok, but I'm not sure how you direct to separate files with awk without running through the file more than once. That's my awk ignorance, though, which is why I was hoping you had an example.
Quote:
Originally Posted by ghostdog74
3) I am not sure what that sed line is doing, i.e. s/./&/. Care to explain?
|
I'm not sure, either. It's something that OP had in his or her original script, and again, I didn't test my code. It was an example. This isn't the thread to argue about such things; therefore, it would be helpful if you'd show what you mean by "awk can do it better."
Kevin Barry
|
|
|
|
12-07-2009, 10:02 PM
|
#13
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
Quote:
Originally Posted by ta0kira
Ok, but I'm not sure how you direct to separate files with awk without running through the file more than once.
|
you mean this?
Code:
awk 'NR%4==1{++c}{print $0 > ("file-" c ".txt")}' file
NR%4==1 is true on the first line of every group of 4 (lines 1, 5, 9, ...), e.g.
Code:
$ more file
1
2
3
4
5
6
7
8
9
10
$ awk 'NR%4==1' file
1
5
9
Using this concept, OP can change it to NR%4000000==1 for his requirement. Notice that the counter variable "c" is incremented on the first line of every 4-line group, and its value becomes part of the output file name.
The awk one-liner above summarizes what OP did with that bunch of seds:
Code:
sed -n '1,4000000 s/./&/w $FileName.01' $FileName
...
...
sed -n '76000001,$ s/./&/w $FileName.20' $FileName
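My guess is that s/./&/w is just writing to the file: the substitution replaces the first character with itself, so the w flag writes out every non-empty line in the range. In awk that is a simple print with redirection ">".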
Quote:
|
I'm not sure, either. It's something that OP had in his or her original script, and again, I didn't test my code.
|
Whatever it is, there's no need to call sed (echo or printf will do).
Quote:
|
it would be helpful if you'd show what you mean by "awk can do it better."
|
What I mean by "better" is speed on big files, compared to bash's while read loop.
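For the 12-line grouping asked about at the top of this thread, the same idea could be sketched as follows. This is not code from the thread: the chunk size variable `n`, the `prefix` variable, the output file names, and the sample input are all assumptions, and `close()` is added so that only one output file is open at a time.

```shell
#!/usr/bin/env bash
# Sample data (assumption): 30 numbered lines in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
seq 1 30 > "$workdir/input.txt"

# One pass over the input: start a new output file on the first line of
# every n-line group, closing the previous one first.
awk -v n=12 -v prefix="$workdir/chunk-" '
    (NR - 1) % n == 0 {
        if (out != "") close(out)
        out = sprintf("%s%03d.txt", prefix, ++c)
    }
    { print > out }
' "$workdir/input.txt"

ls "$workdir"
```

Writing the test as (NR - 1) % n == 0 rather than NR % n == 1 keeps it working for n=1 as well. If plain files are all that is needed, the standard split utility with its -l option covers the same ground.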
|
|
|
|