[SOLVED] AWK: Remove lines matching a supplied list of objects?
Yeah grail, that's what I wanted. ygrex, that only deletes the matching object itself, which can be done several different ways I already have; I need the entire line containing a match to be deleted, as grail's example shows.
But it's still slow as hell. I guess I'm going to need to figure out a way to fork this, or maybe use GNU Parallel.
Any suggestions for a better, more efficient way to do this? It works fine with smaller files, but the target file has millions of lines, and the 2remove file is thousands of lines, sometimes tens of thousands.
You might have reached a point where the size of the files and the resources available are starting to become problematic.
Running in parallel won't solve anything (the resources are still the same); it might even make things slower.
Looking at the answers given, I would suggest using grail's awk solution (post #3). Practical experience has shown me that a well-written awk script is pretty efficient. Just be patient and wait for the results.
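For readers who can't see post #3, the classic two-file awk pattern for this job looks something like the sketch below (the filenames "target" and "2remove" come from the thread; the sample data is made up, and grail's actual script may differ in detail):

```shell
# Hypothetical sample data: "target" holds the records, "2remove" the objects to drop
printf 'alpha one\nbeta two\ngamma three\n' > target
printf 'beta\n' > 2remove

# Pass 1 (NR==FNR, i.e. while reading 2remove): store each object as an array key.
# Pass 2: for each target line, skip it if any stored object appears as a substring.
awk 'NR==FNR { bad[$0]; next }
     { for (p in bad) if (index($0, p)) next; print }' 2remove target > kept
```

Note the cost: every target line is scanned against every pattern, so the work grows as lines × patterns — which is exactly why this crawls when the target has millions of lines and 2remove has tens of thousands.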
Not sure if this applies to you, but do be careful when running this on a production server. It will definitely have an impact and might even slow things down to a point where they become unusable/unresponsive.
You might want to run this job at a slow time (night?) and/or on a dedicated, non-production server.
Good words of wisdom. Unfortunately these lists must be updated frequently, and waiting days for them to complete is simply not an option; though at the moment it would seem to be the only option.
However, I am considering simply dividing up the work: chopping the target file into pieces and distributing them to workstations to speed up the process. While crude and primitive, it lends to my belief that GNU Parallel could be leveraged to achieve this in a more elegant fashion.
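The chop-and-distribute idea can be sketched on a single machine with split(1) and background jobs (GNU Parallel would replace the loop when spreading chunks across workstations). Filenames and sample data below are hypothetical, and as noted earlier in the thread, this may buy little if disk I/O is the real bottleneck:

```shell
# Hypothetical data: six records, one of which matches the remove list
printf 'a1\nbeta x\na2\na3\na4\na5\n' > target
printf 'beta\n' > 2remove

split -l 2 target chunk.          # chop target into 2-line pieces (chunk.aa, chunk.ab, ...)
for c in chunk.??; do             # filter each piece in a background job
  awk 'NR==FNR { bad[$0]; next }
       { for (p in bad) if (index($0, p)) next; print }' 2remove "$c" > "$c.out" &
done
wait
cat chunk.??.out > kept           # reassemble in order; clean up chunk.* when done
```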
Now read the to-remove file into a hash (disk-based, to allow for really large lists).
Then, for each record of the target file, check the hash for a match (much faster than pattern matching) and only output the record if it isn't in the hash.
Perl hash files are fast, and when disk-based, the frequently looked-up entries end up cached in memory.
Now there's a fresh idea (as far as my brain is concerned). Thanks!
I think I'm going to go down that yellow Perl/brick road. I know first-hand how advantageous cracking passwords can be using rainbow tables, and this sounds familiar in that regard. A-googling I go! I'm going to close this thread as solved and go bug the piss out of some Perl monks. Thanks again, everyone!
Rainbow files tend to be rather large (a table for 56-bit keys plus salt runs around 2 GB). The larger the key space, the larger the file; from the defender's point of view, the goal is to make the required table so large that it is impractical to store.
Of course, building rainbow files for only the most common passwords/phrases would reduce that size, but it also introduces the possibility of missing a password.