LinuxQuestions.org
08-02-2020, 04:58 AM   #1
linux2021
LQ Newbie
 
Registered: Aug 2020
Posts: 8

Rep: Reputation: Disabled
Not sure how to use the parallel command with awk


Greetings All,

I'm looking for a way to speed up this code.

Code:
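while read -r line; do
    # Split the pattern line into four variables
    read -r a b c d <<< "$line"
    # Quote the variables so empty or unusual values do not break the awk call
    awk -v v1="$a" -v v2="$b" -v v3="$c" -v v4="$d" '$0~v1 && $0~v2 && $0~v3 && $0~v4 {print v1, v2, v3, v4}' gamenums.txt
done < patterns.txt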
The script works fine but is very slow when processing many lines. Can parallel or some other approach speed it up?

I appreciate any suggestions. Thanks

Last edited by linux2021; 08-02-2020 at 06:39 AM.
 
08-02-2020, 05:21 AM   #2
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 1,161

Rep: Reputation: Disabled
Is it GNU parallel or the parallel from moreutils? They have different syntax.

But in both cases you have to specify the list of input files that parallel will be run against. If you only have one file, gamenums.txt, I don't see how you can use parallel.

Think of parallel as an xargs that runs the specified command against each file in the list but, unlike xargs by default, does it, well, in parallel.

Instead of different input files, the list could also hold different sets of arguments to the command.
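
To make the "sets of arguments" idea concrete, here is a minimal sketch that uses xargs itself with its -P option (available in GNU and BSD xargs) rather than parallel. The two argument sets and the function name run_arg_sets are made up for illustration.

```shell
# Each input line is one set of arguments (made-up values).
# -L 1 : run one command per input line
# -P 2 : run up to two commands at the same time
run_arg_sets() {
    printf '%s\n' '1 2 3 4' '5 6 7 8' | xargs -L 1 -P 2 echo args:
}

# Completion order is not guaranteed with -P, so sort for stable output.
run_arg_sets | sort
```

Each line of input becomes the argument list of its own echo, which is the same shape as feeding one line of patterns.txt to one awk run.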

Last edited by shruggy; 08-02-2020 at 05:26 AM.
 
08-02-2020, 05:33 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,097

Rep: Reputation: 3325
Every time I look at GNU parallel I get a headache. The authors think it's a cinch, but I find it arcane to wrestle into shape.
 
08-02-2020, 05:45 AM   #4
linux2021
LQ Newbie
 
Registered: Aug 2020
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by shruggy
Is it GNU parallel or the parallel from moreutils? They have different syntax.

But in both cases you have to specify the list of input files that parallel will be run against. If you only have one file, gamenums.txt, I don't see how you can use parallel.

Think of parallel as an xargs that runs the specified command against each file in the list but, unlike xargs by default, does it, well, in parallel.

Instead of different input files, the list could also hold different sets of arguments to the command.

It is GNU parallel, from Ubuntu.

Quote:
Originally Posted by syg00
Every time I look at GNU parallel I get a headache. The authors think it's a cinch, but I find it arcane to wrestle into shape.
I agree. It's very complicated to understand unless you're a kernel programmer.
 
08-02-2020, 06:35 AM   #5
linux2021
LQ Newbie
 
Registered: Aug 2020
Posts: 8

Original Poster
Rep: Reputation: Disabled
I updated my question. If anyone knows any bash tricks for speeding up while loops or reading from files, I would like to try your suggestions. Thanks
 
08-02-2020, 07:06 AM   #6
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 1,161

Rep: Reputation: Disabled
I'd try something like
Code:
parallel -C' ' -q awk -v v1={1} -v v2={2} -v v3={3} -v v4={4} '$0~v1 && $0~v2 && $0~v3 && $0~v4 {print v1, v2, v3, v4}' gamenums.txt <patterns.txt
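
In that command, -C' ' (short for --colsep) splits each line of patterns.txt into columns at spaces, {1} to {4} stand for those columns, and -q (--quote) keeps the awk program intact when parallel runs the command.

For anyone without GNU parallel installed, roughly the same fan-out can be sketched with xargs -P. This is an analogue, not the command above: the sample data, the temporary directory, and the names prog and match_parallel are assumptions made only for illustration.

```shell
# Hypothetical sample data, only for illustration.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '11 22 33 44' '55 66 77 88' '12 34 56 78' > patterns.txt
printf '%s\n' '11 22 33 44 90' '55 60 77 88 91' '78 56 34 12 99' > gamenums.txt

# The awk program from the thread, kept in a variable to avoid nested quoting.
prog='$0~v1 && $0~v2 && $0~v3 && $0~v4 {print v1, v2, v3, v4}'

# -L 1 starts one job per line of patterns.txt; -P 4 runs up to four at once.
# Inside sh -c, $1 is the awk program and $2..$5 are the four patterns.
match_parallel() {
    xargs -L 1 -P 4 sh -c \
        'awk -v v1="$2" -v v2="$3" -v v3="$4" -v v4="$5" "$1" gamenums.txt' \
        _ "$prog" < patterns.txt
}

# Jobs finish in no particular order, so sort if the order matters.
match_parallel | sort
```

Either way, every job still reads all of gamenums.txt; the gain comes only from running several of those scans at once.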

Last edited by shruggy; 08-02-2020 at 07:35 AM.
 
08-02-2020, 07:45 AM   #7
linux2021
LQ Newbie
 
Registered: Aug 2020
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by shruggy
I'd try something like
Code:
parallel -C' ' -q awk -v v1={1} -v v2={2} -v v3={3} -v v4={4} '$0~v1 && $0~v2 && $0~v3 && $0~v4 {print v1, v2, v3, v4}' gamenums.txt <patterns.txt
Thanks for this suggestion. I am a bit confused, though: I already have a redirect from patterns.txt on the done line.

If I paste your code, I will have two redirects of patterns.txt. Which one do I omit: the one in your code or the one on the done line? Thanks
 
08-02-2020, 08:07 AM   #8
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 1,161

Rep: Reputation: Disabled
No, you don't need any loops. This is a one-liner that completely replaces your script.
 
08-02-2020, 08:16 AM   #9
linux2021
LQ Newbie
 
Registered: Aug 2020
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by shruggy
No, you don't need any loops. This is a one-liner that completely replaces your script.
Thanks, it's much better
 
08-13-2020, 09:23 AM   #10
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: CentOS 6 & 7
Posts: 3,411

Rep: Reputation: 916
For each line of patterns.txt you are splitting the line, starting a new awk, and reading all of gamenums.txt to look for a matching line. It would be better to write a small program (for example in Python) that reads the smaller of the two files into memory, sorts it, and then runs through the other one line by line, searching for matches in better than linear time. This converts your problem from O(n^2) to O(n log n).
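
The "read one file into memory" part can be tried without leaving awk, as in the sketch below. Note that this is not the sort-based search described above: it still tests every pattern set against every line, but it starts awk once and reads gamenums.txt once. The sample data, the temporary directory, and the name match_single_pass are assumptions for illustration, and patterns.txt is assumed to be the smaller, non-empty file.

```shell
# Hypothetical sample data, only for illustration.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '11 22 33 44' '55 66 77 88' '12 34 56 78' > patterns.txt
printf '%s\n' '11 22 33 44 90' '55 60 77 88 91' '78 56 34 12 99' > gamenums.txt

match_single_pass() {
    awk '
        # First file (patterns.txt): remember the four patterns of each line.
        NR == FNR { n++; a[n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
        # Second file (gamenums.txt): test every stored pattern set against the line.
        {
            for (i = 1; i <= n; i++)
                if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i])
                    print a[i], b[i], c[i], d[i]
        }
    ' patterns.txt gamenums.txt
}

match_single_pass
```

Matches come out in gamenums.txt line order rather than patterns.txt order, so pipe the result through sort if the original ordering matters.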
 
  


Tags
parallel

