LinuxQuestions.org
Old 06-22-2017, 06:35 AM   #16
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,130

Rep: Reputation: 4121

The standard answer for speeding up text-processing code is to use (properly constructed) perl.
The python code is overly complex, which no doubt adds to the runtime. The awk shouldn't be written to mirror that code, but should use awk idioms instead.
Also, the python code in post #1 won't produce the output in post #7, as no attempt was made to account for the header. Here is a quick awk attempt - it should be (much?) faster.
Code:
awk 'BEGIN { fl = 1; i = 0 }
     NR == 1 { next }
     !_[$1]++ { i++ }
     { if (i % 4) { print $0 > "out" fl ".txt" }
       else { delete _; print $0 > "out" ++fl ".txt"; _[$1]++; i = 1 } }' Input.txt
 
1 member found this post helpful.
Old 06-22-2017, 06:53 AM   #17
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,879

Rep: Reputation: 7317
You can use with to open a file; see here: https://stackoverflow.com/questions/...file-in-python (for example).
You still closed only one output file, but opened a lot...
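For example, a minimal sketch of that structure in Python - the helper name, chunk size, and file-name pattern here are illustrative, not from this thread:

```python
def split_by_id(path, ids_per_file=3, prefix="out"):
    """Split `path` into out1.txt, out2.txt, ..., keeping all lines that
    share a first-column ID in the same output file, and closing each
    finished chunk file before opening the next one."""
    seen = set()
    file_no = 0
    out = None
    with open(path) as src:          # `with` guarantees the input is closed
        next(src)                    # skip the header line
        for line in src:
            key = line.split()[0]
            if key not in seen:
                if out is None or len(seen) == ids_per_file:
                    if out:
                        out.close()  # close the finished chunk
                    file_no += 1
                    out = open(f"{prefix}{file_no}.txt", "w")
                    seen = set()
                seen.add(key)
            out.write(line)
    if out:
        out.close()                  # don't forget the last chunk
```

The point is that every output file handle gets closed as soon as its chunk is done, instead of leaving many handles open.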
 
Old 06-22-2017, 08:23 AM   #18
Asoo
Member
 
Registered: Apr 2017
Posts: 33

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
probably you need to generate several pieces instead of that one big file.
I want to split the file into chunks, but grouped by ID: all of an ID's information should be in the same file and not in any other file. I could have split the file using csplit or split, but then the same ID's information would not stay together in one file.
 
Old 06-22-2017, 08:32 AM   #19
Asoo
Member
 
Registered: Apr 2017
Posts: 33

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00 View Post
The standard answer for speeding up text-processing code is to use (properly constructed) perl.
The python code is overly complex, which no doubt adds to the runtime. The awk shouldn't be written to mirror that code, but should use awk idioms instead.
Also, the python code in post #1 won't produce the output in post #7, as no attempt was made to account for the header. Here is a quick awk attempt - it should be (much?) faster.
Code:
awk 'BEGIN { fl = 1; i = 0 }
     NR == 1 { next }
     !_[$1]++ { i++ }
     { if (i % 4) { print $0 > "out" fl ".txt" }
       else { delete _; print $0 > "out" ++fl ".txt"; _[$1]++; i = 1 } }' Input.txt
Thank you so much for the reply.

Actually, I just added the first line to indicate the IDs separately (so I just initiated NR == 0). Yes, it worked perfectly.
 
Old 06-22-2017, 09:37 AM   #20
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,130

Rep: Reputation: 4121
Remove the test altogether - if you have a lot of data, no sense testing every record.
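For example, assuming the artificial header line is dropped from the input first, the one-liner from post #16 might be trimmed to something like this (a sketch, not a tested drop-in):

```shell
# Same splitting logic as post #16, minus the per-record NR == 1 test,
# assuming Input.txt no longer carries the extra first line.
awk 'BEGIN { fl = 1; i = 0 }
     !_[$1]++ { i++ }
     { if (i % 4) { print $0 > "out" fl ".txt" }
       else { delete _; print $0 > "out" ++fl ".txt"; _[$1]++; i = 1 } }' Input.txt
```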
 
Old 06-22-2017, 09:58 AM   #21
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
I realize that you've marked this as solved.

I would've approached this very differently; however, I'll also admit that I saw the original examples and thought the problem was pretty simple, not noticing that you were citing a very large amount of data to be processed.

My solution would've been a program rather than a script or scripting language. For small files, a script would do.

I would've written a program that opened the original file read-only, opened a new output file, and then processed the records in a simple loop that tests the first value and decides whether or not to write each record to the output file.

Based on my experience doing similar things with text files, I believe this solution would be very fast.
 
Old 06-23-2017, 02:04 AM   #22
Asoo
Member
 
Registered: Apr 2017
Posts: 33

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rtmistler View Post
I realize that you've marked this as solved.

I would've approached this very differently; however, I'll also admit that I saw the original examples and thought the problem was pretty simple, not noticing that you were citing a very large amount of data to be processed.

My solution would've been a program rather than a script or scripting language. For small files, a script would do.

I would've written a program that opened the original file read-only, opened a new output file, and then processed the records in a simple loop that tests the first value and decides whether or not to write each record to the output file.

Based on my experience doing similar things with text files, I believe this solution would be very fast.
Can you elaborate on your solution? I can try this one as well - if it is much faster... then why not!
 
Old 06-23-2017, 02:06 AM   #23
Asoo
Member
 
Registered: Apr 2017
Posts: 33

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00 View Post
Remove the test altogether - if you have a lot of data, no sense testing every record.
It is removed. Thanks!

Last edited by Asoo; 06-23-2017 at 09:15 AM.
 
Old 06-23-2017, 06:46 AM   #24
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
Quote:
Originally Posted by Asoo View Post
Can you elaborate your solution? I can try with this one also, if it is much faster... then why not!
My short summary would be:
  • C program.
  • open() using read-only for one file and write/create for the other file.
  • read() from the source file in a loop until EOF.
  • Conditionally write() to the output file.
One concern: if you didn't understand the earlier descriptive text, then you're probably not a C programmer familiar with file operations. In that case I'd suggest not following this solution, unless you want to come up to speed with C programming well enough to accomplish it.
Quote:
Originally Posted by rtmistler View Post
I would've written a program that opened the original file read-only, opened a new output file, and then processed the records in a simple loop that tests the first value and decides whether or not to write each record to the output file.
 
Old 06-23-2017, 09:14 AM   #25
Asoo
Member
 
Registered: Apr 2017
Posts: 33

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rtmistler View Post
My short summary would be:
  • C program.
  • open() using read-only for one file and write/create for the other file.
  • read() from the source file in a loop until EOF.
  • Conditionally write() to the output file.
One concern: if you didn't understand the earlier descriptive text, then you're probably not a C programmer familiar with file operations. In that case I'd suggest not following this solution, unless you want to come up to speed with C programming well enough to accomplish it.
Yeah, I have only worked in Java and Python, so coding this in C would take a lot of time. Thank you so much for your help.
 
Old 06-29-2017, 08:07 AM   #26
Asoo
Member
 
Registered: Apr 2017
Posts: 33

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00 View Post
The standard answer for speeding up text-processing code is to use (properly constructed) perl.
The python code is overly complex, which no doubt adds to the runtime. The awk shouldn't be written to mirror that code, but should use awk idioms instead.
Also, the python code in post #1 won't produce the output in post #7, as no attempt was made to account for the header. Here is a quick awk attempt - it should be (much?) faster.
Code:
awk 'BEGIN { fl = 1; i = 0 }
     NR == 1 { next }
     !_[$1]++ { i++ }
     { if (i % 4) { print $0 > "out" fl ".txt" }
       else { delete _; print $0 > "out" ++fl ".txt"; _[$1]++; i = 1 } }' Input.txt
The code works fine, but in some files a few columns are missing from the last entry. I have a file with more than 3 columns, and it is missing the last few columns of the last row only. Any suggestions?
 
  

