Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-tos, this is the place!
The standard answer for speeding up text-processing code is to use (properly constructed) perl.
The Python code is overly complex, which no doubt adds to the runtime. awk shouldn't be written to mirror that code; it should use awk idioms.
Also, the Python code in post #1 won't produce the output in post #7, since no attempt was made to account for the header. Here is a quick awk attempt - it should be (much?) faster.
Probably you need to generate several pieces instead of that one big file.
I want to split the file into chunks, but grouped by ID: all rows for a given ID must end up in the same file, never split across two files. I could have split the file with csplit or split, but then rows with the same ID would not all land in the same file.
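The grouping requirement above can be sketched with awk along the lines the earlier post suggests. This is only a minimal illustration, not the original attempt from the thread: the sample data, the field layout, and the chunk size of two data rows are all assumptions, and it assumes rows sharing an ID are adjacent in the input (i.e. the file is sorted by ID).

```shell
#!/bin/sh
# Hypothetical sample data: a header line plus rows grouped by ID.
cat > data.txt <<'EOF'
ID,val1,val2
1,a,b
1,c,d
2,e,f
3,g,h
3,i,j
EOF

# Start a new chunk only when the ID changes AND the current chunk has
# reached "max" data rows, so one ID's rows never straddle two chunks.
# Each chunk gets its own copy of the header.
awk -F, '
BEGIN { chunk = 1; max = 2 }
NR == 1 { header = $0; next }                     # remember the header
$1 != prev && count >= max { chunk++; count = 0 } # ID boundary, chunk full
count == 0 { print header > ("chunk_" chunk ".txt") }
{
    print > ("chunk_" chunk ".txt")               # append record to chunk
    count++
    prev = $1
}
' data.txt
```

With the sample above this produces chunk_1.txt (both ID-1 rows) and chunk_2.txt (the ID-2 row plus both ID-3 rows, since the ID-3 group had not yet hit the boundary), each starting with the header.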
I would've approached this very differently, though I'll admit that I saw the original examples and thought the task was pretty simple, not noticing that you were citing a very large amount of data to be processed.
My solution would be a compiled program rather than a script or scripting language. For small files, a script would do.
I would write a program that opens the original file read-only, opens a new output file for writing, and then processes the records in a simple loop that tests the first value and decides whether or not to write that record to the output file.
Based on my experience doing similar things with text files, I expect this approach would be very fast.
Can you elaborate on your solution? I can try this one as well - if it is much faster, then why not!
My short summary would be:
C program.
open() using read-only for one file and write/create for the other file.
read() from the source file in a loop until EOF.
Conditionally write() to the output file.
One concern: if you didn't follow the earlier description, then you are probably not a C programmer familiar with file operations. In that case I'd suggest not taking this route, unless you want to come up to speed with C programming far enough to accomplish it.
Quote:
Originally Posted by rtmistler
I would've written a program that would've opened the original file as read-only, opened a new write-to file and then processed the records in a simple loop which would test the first value and choose to write that record to the output file versus not.
Yeah, I have only worked in Java and Python, so coding this in C would take me quite a while. Thank you so much for your help.
The code works fine, but in some files a few columns are missing from the last entry. I have a file with more than 3 columns, and it drops the last few columns of the last row only. Any suggestions?